Article

Scheduling to Minimize Age of Incorrect Information with Imperfect Channel State Information

Department of Electrical and Computer Engineering, University of Maryland, College Park, MD 20742, USA
* Author to whom correspondence should be addressed.
Entropy 2021, 23(12), 1572; https://doi.org/10.3390/e23121572
Submission received: 3 November 2021 / Revised: 21 November 2021 / Accepted: 23 November 2021 / Published: 25 November 2021
(This article belongs to the Special Issue Age of Information: Concept, Metric and Tool for Network Control)

Abstract

In this paper, we study a slotted-time system where a base station needs to update multiple users at the same time. Due to limited resources, only a subset of the users can be updated in each time slot. We consider the problem of minimizing the Age of Incorrect Information (AoII) when imperfect Channel State Information (CSI) is available. Leveraging the notion of the Markov Decision Process (MDP), we obtain the structural properties of the optimal policy. By introducing a relaxed version of the original problem, we develop Whittle’s index policy under a simple condition. However, indexability is required to ensure the existence of Whittle’s index. To avoid the indexability requirement, we develop the Indexed priority policy based on the optimal policy for the relaxed problem. Finally, numerical results are laid out to showcase the application of the derived structural properties and highlight the performance of the developed scheduling policies.

1. Introduction

The Age of Incorrect Information (AoII) is introduced in [1] as a combination of age-based metrics (e.g., Age of Information (AoI)) and error-based metrics (e.g., Minimum Mean Square Error). In communication systems, AoII captures not only the information mismatch between the source and the destination but also the aging process of inconsistent information. Hence, two functions dominate AoII. The first is the time penalty function, which reflects how the inconsistency of information affects the system over time. In real-life applications, inconsistent information will affect different communication systems in different ways. For example, machine temperature monitoring is time-sensitive because the damage caused by overheating will accumulate quickly. However, reservoir water level monitoring is less sensitive to time. Therefore, by adopting different time penalty functions, AoII can capture different aging processes of the mismatch in different systems. The second is the information penalty function, which captures the information mismatch between the source and the destination. It allows us to measure mismatches in different ways, depending on how sensitive different systems are to information inconsistencies. For example, the navigation system requires precise information to give correct instructions, but the real-time delivery tracking system does not need very accurate location information. Since we can choose different penalty functions for different systems, AoII is adaptable to various communication goals, which is why it is regarded as a semantic metric [2].
Since the introduction of AoII, several studies have been performed to reveal its fundamental nature. The authors of [3] consider a system with random packet delivery times and compare AoII with AoI and real-time error via extensive numerical results. The authors of [4] study the problem of minimizing AoII under a general time penalty function. Three real-life applications are considered to showcase the performance advantages of AoII over AoI and real-time error. In [5], the authors investigate the AoII that considers the quantified mismatch between the source and the destination. The optimization problem is studied when the system is resource-constrained. The authors of [6] study the AoII minimization problem in the context of scheduling. They consider a system where the central scheduler needs to update multiple users at the same time. However, the central scheduler cannot know the states of the sources before receiving the updates. By introducing the belief value, Whittle’s index policy is developed and evaluated. In this paper, we also consider the problem of minimizing AoII in scheduling. Different from [6], we consider a generic time penalty function and study the minimization problem in the presence of imperfect Channel State Information (CSI). Due to the presence of CSI, Whittle’s index policy becomes infeasible in general. Hence, we introduce another scheduling policy that is more versatile and has comparable performance to Whittle’s index policy.
The problem of scheduling to minimize AoI is studied under various system settings in [7,8,9,10,11]. The problem studied in this paper is different and more complicated because AoII considers the aging process of inconsistent information rather than the aging process of updates. Meanwhile, none of them consider the case where CSI is available. The problem of optimizing information freshness in the presence of CSI is studied in [12,13]. However, they focus on the system with a single user and mainly discuss the case where CSI is perfect. The scheduling problems with the goal of minimizing an error-based performance measure are considered in [14,15,16]. Our problem is fundamentally different because AoII also considers the time effect. Moreover, we consider the system where a base station observes multiple sources simultaneously and needs to send updates to multiple destinations.
The main contributions of this work can be summarized as follows. (1) We study the problem of minimizing AoII in a multi-user system where imperfect CSI is available. Meanwhile, the time penalty function is generic. (2) We derive the structural properties of the optimal policy for the considered problem. (3) We establish the indexability of the considered problem under a simple condition and develop Whittle’s index policy. (4) We obtain the optimal policy for a relaxed version of the original problem. By exploring the characteristics of the relaxed problem, we provide an efficient algorithm to obtain the optimal policy. (5) Based on the optimal policy for the relaxed problem, we develop the Indexed priority policy that is free from indexability and has comparable performance to Whittle’s index policy.
The remainder of this paper is organized in the following way. In Section 2, we introduce the system model and formulate the primal problem. Section 3 explores the structural properties of the optimal policy for the primal problem. Under a simple condition, we develop Whittle’s index policy in Section 4. Section 5 presents the optimal policy for a relaxed version of the primal problem. On this basis, we develop the Indexed priority policy in Section 6. Finally, in Section 7, the numerical results are laid out.

2. System Overview

2.1. Communication Model

We consider a slotted-time system with N users and one base station. Each user is composed of a source process, a channel, and a receiver. We assume all the users share the same structure, but the parameters are different. The structure of the communication model is provided in Figure 1.
For user $i$, the source process is modeled by a two-state Markov chain where transitions happen between the two states with probability $p_i > 0$ and self-transitions happen with probability $1 - p_i$. At any time slot $t$, the state of the source process $X_{i,t} \in \{0, 1\}$ will be reported to the base station as an update, and the base station will decide whether to transmit this update through the corresponding channel. The channel is unreliable, but the estimate of the Channel State Information (CSI) is available at the beginning of each time slot. Let $r_{i,t} \in \{0, 1\}$ be the CSI at time $t$. We assume that $r_{i,t}$ is independent across time and user indices. $r_{i,t} = 1$ if and only if the transmission attempt at time $t$ will succeed and $r_{i,t} = 0$ otherwise. Then, we denote by $\hat{r}_{i,t} \in \{0, 1\}$ the estimate of $r_{i,t}$. We assume that $\hat{r}_{i,t}$ is an independent Bernoulli random variable with parameter $\gamma_i$, i.e., $\hat{r}_{i,t} = 1$ with probability $\gamma_i \in [0, 1]$ and $\hat{r}_{i,t} = 0$ with probability $1 - \gamma_i$. However, the estimate is imperfect. We assume that the error depends only on the user and its estimate. More precisely, we define the probability of error as $p_{e,i}^{\hat{r}_i} \triangleq \Pr[r_i \neq \hat{r}_i \mid \hat{r}_i]$. We assume $p_{e,i}^{\hat{r}_i} < 0.5$ because we can flip the estimate if $p_{e,i}^{\hat{r}_i} > 0.5$. We are not interested in the case of $p_{e,i}^{\hat{r}_i} = 0.5$ since $\hat{r}_{i,t}$ is useless in this case. Although the channel is unreliable, each transmission attempt takes exactly one time slot regardless of the result, and the successfully transmitted update will not be corrupted. Every time an update is received, the receiver will use it as the new estimate $\hat{X}_{i,t}$. The receiver will send an ACK/NACK packet to inform the base station of its reception of the new update. Since an ACK/NACK packet is generally very small and simple, we assume that it is transmitted reliably and received instantaneously.
Then, if ACK is received, the base station knows that the receiver’s estimate changed to the transmitted update. If NACK is received, the base station knows that the receiver’s estimate did not change. Therefore, the base station always knows the estimate at the receiver side.
At the beginning of each time slot, the base station receives updates from each source and the estimates of CSI from each channel. The old updates and estimates are discarded upon the arrival of new ones. Then, the base station decides which updates to transmit, and the decision is independent of the transmission history. Due to the limited resources, at most M < N updates are allowed per transmission attempt. We consider a base station that always transmits M updates.

2.2. Age of Incorrect Information

All the users adopt AoII as a performance metric, but the choices of penalty functions vary. Let $X_t$ and $\hat{X}_t$ be the true state and the estimate of the source process, respectively. Then, in a slotted-time system, AoII can be expressed as follows
$$\Delta_{AoII}(X_t, \hat{X}_t, t) = \sum_{k=U_t+1}^{t} \Big( g(X_k, \hat{X}_k) \times F(k - U_t) \Big), \tag{1}$$
where $U_t$ is the last time instance before time $t$ (including $t$) that the receiver’s estimate is correct. $g(X_t, \hat{X}_t)$ can be any information penalty function that captures the difference between $X_t$ and $\hat{X}_t$. $F(t) \triangleq f(t) - f(t-1)$ where $f(t)$ can be any time penalty function that is non-decreasing in $t$. We consider the case where the users adopt the same information penalty function $g(X_t, \hat{X}_t) = |X_t - \hat{X}_t|$ but possibly different time penalty functions. To ease the analysis, we require $f(t)$ to be unbounded. Combined together, we require $f(t_1) \le f(t_2)$ if $t_1 < t_2$ and $\lim_{t \to +\infty} f(t) = +\infty$. Without loss of generality, we assume $f(0) = 0$. As the source is modeled by a two-state Markov chain, $g(X_t, \hat{X}_t) \in \{0, 1\}$. Hence, Equation (1) can be simplified to
$$\Delta_{AoII}(X_t, \hat{X}_t, t) = \sum_{k=U_t+1}^{t} F(k - U_t) = f(s_t),$$
where $s_t \triangleq t - U_t$. Therefore, the evolution of $s_t$ is sufficient to characterize the evolution of AoII. To this end, we distinguish between the following cases.
  • When the receiver’s estimate is correct at time $t+1$, we have $U_{t+1} = t+1$. Then, by definition, $s_{t+1} = 0$.
  • When the receiver’s estimate is incorrect at time $t+1$, we have $U_{t+1} = U_t$. Then, by definition, $s_{t+1} = t + 1 - U_t = s_t + 1$.
To sum up, we get
$$s_{t+1} = \mathbb{1}\{U_{t+1} \neq t+1\} \times (s_t + 1). \tag{2}$$
A sample path of $s_t$ is shown in Figure 2. In the remainder of this paper, we use $f_i(\cdot)$ to denote the time penalty function user $i$ adopts.
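The recursion in Equation (2) can be checked against the defining sum in Equation (1) with a few lines of Python. The following is a minimal sketch, not part of the original analysis: the function names, the sample path, and the quadratic penalty are illustrative assumptions, and the estimate is assumed to have been correct just before the path starts.

```python
def aoii_recursive(source, estimate, f=lambda s: s):
    """AoII via s_{t+1} = 1{estimate wrong at t+1} * (s_t + 1); returns f(s_t) per slot."""
    s, path = 0, []  # assumption: the estimate was correct just before slot 0
    for x, x_hat in zip(source, estimate):
        s = 0 if x == x_hat else s + 1
        path.append(f(s))
    return path


def aoii_direct(source, estimate, f=lambda s: s):
    """AoII via the sum over k = U_t + 1, ..., t with F(k) = f(k) - f(k - 1)."""
    path, last_correct = [], -1  # same assumption: U = -1 before the path starts
    for t, (x, x_hat) in enumerate(zip(source, estimate)):
        if x == x_hat:
            last_correct = t
        total = 0
        for k in range(last_correct + 1, t + 1):
            g = abs(source[k] - estimate[k])  # information penalty |X_k - X_hat_k|
            total += g * (f(k - last_correct) - f(k - last_correct - 1))
        path.append(total)
    return path


# Hypothetical sample path of a binary source and the receiver's estimate.
source = [0, 1, 1, 1, 0, 0]
estimate = [0, 0, 0, 1, 1, 1]
quadratic = lambda s: s ** 2  # one admissible time penalty with f(0) = 0

print(aoii_recursive(source, estimate, quadratic))  # [0, 1, 4, 0, 1, 4]
print(aoii_direct(source, estimate, quadratic))     # [0, 1, 4, 0, 1, 4]
```

Both routines agree because the sum of $F(\cdot)$ telescopes to $f(s_t) - f(0)$, which is why the sketch relies on $f(0) = 0$.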
Remark 1.
Under this particular choice of the penalty function, $s_t$ can be interpreted as the time elapsed since the last time the receiver’s estimate was correct. Please note that $s_t$ is different from the Age of Information (AoI) [17], which is defined as the time elapsed since the generation time of the last received update. We can see that AoI considers the aging process of the update, while AoII considers the aging process of the estimation error. At the same time, $s_t$ is also fundamentally different from the holding time, which, according to [18,19], is defined as the time elapsed since the last successful transmission. We notice that the receiver’s estimate can become correct even when no new update is successfully transmitted. Moreover, the information carried by the update may have become incorrect by the time it is received. We also notice that [18,19] consider the problem of minimizing the estimation error. However, by adopting AoII as the performance metric, we study the impact of estimation error on the system.

2.3. System Dynamic

In this section, we tackle the system dynamic. We notice that the status of user $i$ can be captured by the pair $x_{i,t} \triangleq (s_{i,t}, \hat{r}_{i,t})$. In the following, we will use $x_{i,t}$ and $(s_{i,t}, \hat{r}_{i,t})$ interchangeably. Then, the system dynamic can be fully characterized by the dynamic of $\mathbf{x}_t \triangleq (x_{1,t}, \ldots, x_{N,t})$. Hence, it suffices to characterize the value of $\mathbf{x}_{t+1}$ given $\mathbf{x}_t$ and the base station’s action. To this end, we denote, by $\mathbf{a}_t = (a_{1,t}, \ldots, a_{N,t})$, the base station’s action at time $t$. $a_{i,t} = 1$ if the base station transmits the update from user $i$ at time $t$ and $a_{i,t} = 0$ otherwise. We notice that given action $\mathbf{a}_t$, users are independent and the action taken on user $i$ will only affect itself. Consequently
$$\Pr(\mathbf{x}_{t+1} \mid \mathbf{x}_t, \mathbf{a}_t) = \prod_{i=1}^{N} \Pr(x_{i,t+1} \mid x_{i,t}, \mathbf{a}_t) = \prod_{i=1}^{N} \Pr(x_{i,t+1} \mid x_{i,t}, a_{i,t}).$$
Combined with the fact that all the users share the same structure, it is sufficient to study the dynamic of a single user. In the following discussions, we drop the user-dependent subscript $i$. We recall that $\hat{r}_{t+1}$ is an independent Bernoulli random variable. Then, we have
$$\Pr(x_{t+1} \mid x_t, a_t) = P(\hat{r}_{t+1}) \times \Pr(s_{t+1} \mid x_t, a_t).$$
By definition, $P(\hat{r}_{t+1} = 1) = \gamma$ and $P(\hat{r}_{t+1} = 0) = 1 - \gamma$. Then, we only need to tackle the value of $\Pr(s_{t+1} \mid x_t, a_t)$. To this end, we distinguish between the following cases
  • When $x_t = (0, \hat{r}_t)$, the estimate at time $t$ is correct (i.e., $\hat{X}_t = X_t$). Hence, for the receiver, $X_t$ carries no new information about the source process. In other words, $\hat{X}_{t+1} = \hat{X}_t$ regardless of whether an update is transmitted at time $t$. We recall that $U_{t+1} = U_t$ if $\hat{X}_{t+1} \neq X_{t+1}$ and $U_{t+1} = t+1$ otherwise. Since the source is binary, we obtain $U_{t+1} = U_t$ if $X_{t+1} \neq X_t$, which happens with probability $p$, and $U_{t+1} = t+1$ otherwise. According to (2), we obtain
    $$\Pr(1 \mid (0, \hat{r}_t), a_t) = p,$$
    $$\Pr(0 \mid (0, \hat{r}_t), a_t) = 1 - p.$$
  • When $a_t = 0$ and $x_t = (s_t, \hat{r}_t)$, where $s_t > 0$, the channel will not be used and no new update will be received by the receiver, and so, $\hat{X}_{t+1} = \hat{X}_t$. We recall that $U_{t+1} = U_t$ if $\hat{X}_{t+1} \neq X_{t+1}$ and $U_{t+1} = t+1$ otherwise. Since $X_t \neq \hat{X}_t$ and the source is binary, we have $U_{t+1} = U_t$ if $X_{t+1} = X_t$, which happens with probability $1 - p$, and $U_{t+1} = t+1$ otherwise. According to (2), we obtain
    $$\Pr(s_t + 1 \mid (s_t, \hat{r}_t), a_t = 0) = 1 - p,$$
    $$\Pr(0 \mid (s_t, \hat{r}_t), a_t = 0) = p.$$
  • When $a_t = 1$ and $x_t = (s_t, 1)$ where $s_t > 0$, the transmission attempt will succeed with probability $1 - p_e^1$ and fail with probability $p_e^1$. We recall that $U_{t+1} = U_t$ if $\hat{X}_{t+1} \neq X_{t+1}$ and $U_{t+1} = t+1$ otherwise. Then, when the transmission attempt succeeds (i.e., $\hat{X}_{t+1} = X_t$), $U_{t+1} = U_t$ if $X_{t+1} \neq X_t$ and $U_{t+1} = t+1$ otherwise. When the transmission attempt fails (i.e., $\hat{X}_{t+1} = \hat{X}_t \neq X_t$), we have $U_{t+1} = U_t$ if $X_{t+1} = X_t$ and $U_{t+1} = t+1$ otherwise. Combining (2) with the dynamic of the source process, we obtain
    $$\Pr(s_t + 1 \mid (s_t, 1), a_t = 1) = p_e^1 (1 - p) + (1 - p_e^1) p \triangleq \alpha,$$
    $$\Pr(0 \mid (s_t, 1), a_t = 1) = p_e^1 p + (1 - p_e^1)(1 - p) = 1 - \alpha.$$
  • When $a_t = 1$ and $x_t = (s_t, 0)$, where $s_t > 0$, following the same line, we obtain
    $$\Pr(s_t + 1 \mid (s_t, 0), a_t = 1) = p_e^0 p + (1 - p_e^0)(1 - p) \triangleq \beta,$$
    $$\Pr(0 \mid (s_t, 0), a_t = 1) = p_e^0 (1 - p) + (1 - p_e^0) p = 1 - \beta.$$
Combining the cases above, we obtain the value of $\Pr(s_{t+1} \mid x_t, a_t)$ in all cases. As only $M$ out of $N$ updates are allowed per transmission attempt, we require that transmission attempts always help minimize AoII. This is equivalent to imposing $\Pr(s_{t+1} > s_t \mid (s_t, \hat{r}_t), a_t = 0) > \Pr(s_{t+1} > s_t \mid (s_t, \hat{r}_t), a_t = 1)$ for any $(s_t, \hat{r}_t)$. Leveraging the results above, it is sufficient to require $p < 0.5$. As all the users share the same structure, we assume, for the rest of this paper, that $0 < p_i < 0.5$ for $1 \le i \le N$.
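The four cases of this section fit in one small function. The sketch below is illustrative only: the function names and the numerical parameters are assumptions, and the helper `transmission_helps` checks the requirement only for states with $s_t > 0$, where the action actually changes the transition probabilities.

```python
def growth_probability(s, r_hat, a, p, pe1, pe0):
    """Pr(s_{t+1} = s_t + 1 | (s_t, r_hat), a_t) for a single user."""
    if s == 0:
        return p                                  # estimate correct: action is irrelevant
    if a == 0:
        return 1 - p                              # no transmission, source must not flip
    if r_hat == 1:
        return pe1 * (1 - p) + (1 - pe1) * p      # alpha
    return pe0 * p + (1 - pe0) * (1 - p)          # beta


def transmission_helps(p, pe1, pe0):
    """True if transmitting strictly lowers the growth probability for s > 0."""
    return all(
        growth_probability(1, r_hat, 1, p, pe1, pe0)
        < growth_probability(1, r_hat, 0, p, pe1, pe0)
        for r_hat in (0, 1)
    )


# Hypothetical parameters: p = 0.2, p_e^1 = 0.1, p_e^0 = 0.3.
alpha = growth_probability(3, 1, 1, 0.2, 0.1, 0.3)
beta = growth_probability(3, 0, 1, 0.2, 0.1, 0.3)
print(alpha, beta)                        # approximately 0.26 and 0.62
print(transmission_helps(0.2, 0.1, 0.3))  # True, since p < 0.5
print(transmission_helps(0.7, 0.1, 0.3))  # False, since p > 0.5
```

The remaining probability mass in every case goes to $s_{t+1} = 0$, so one number per state-action pair is enough to describe the transition.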

2.4. Problem Formulation

The communication goal is to minimize the expected AoII. Therefore, the problem can be formulated as the following
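$$\arg\min_{\phi \in \Phi} \ \lim_{T \to \infty} \frac{1}{T} \mathbb{E}_\phi \left( \sum_{t=0}^{T-1} \sum_{i=1}^{N} f_i(s_{i,t}) \right) \tag{4a}$$
$$\text{subject to} \ \sum_{i=1}^{N} a_{i,t} = M \quad \forall t, \tag{4b}$$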
where $\Phi$ is the set of all causal policies. We refer to the constrained minimization problem reported in problem (4) as the Primal Problem (PP). We notice that the PP is a Restless Multi-Armed Bandit (RMAB) problem. Finding the optimal policy for this type of problem is intractable in general, since RMAB problems are PSPACE-hard [20]. However, we can still derive the structural properties of the optimal policy. These structural properties can guide the development of scheduling policies and serve as an indicator of the good performance of the developed scheduling policies.

3. Structural Properties of the Optimal Policy

In this section, we investigate the structural properties of the optimal policy for PP. We first define an infinite-horizon average-cost Markov Decision Process (MDP) $\mathcal{M}^N(w, M) = (\mathcal{X}^N, \mathcal{A}^N(M), \mathcal{P}^N, \mathcal{C}^N(w))$, where
  • $\mathcal{X}^N$ denotes the state space. The state is $\mathbf{x} = (x_1, \ldots, x_N)$ where $x_i = (s_i, \hat{r}_i)$.
  • $\mathcal{A}^N(M)$ denotes the action space. The feasible action is $\mathbf{a} = (a_1, \ldots, a_N)$ where $a_i \in \{0, 1\}$ and $\sum_{i=1}^{N} a_i = M$. Note that the feasible actions are independent of the state and the time.
  • $\mathcal{P}^N$ denotes the state transition probabilities. We define $P_{\mathbf{x}, \mathbf{x}'}(\mathbf{a})$ as the probability that action $\mathbf{a}$ at state $\mathbf{x}$ will lead to state $\mathbf{x}'$. It is calculated by
    $$P_{\mathbf{x}, \mathbf{x}'}(\mathbf{a}) = \prod_{i=1}^{N} P(\hat{r}'_i) \, P_{s_i, s'_i}(a_i, \hat{r}_i),$$
    where $P_{s_i, s'_i}(a_i, \hat{r}_i)$ is the transition probability from $s_i$ to $s'_i$ when the estimate of CSI is $\hat{r}_i$ and action $a_i$ is taken. The values of $P_{s_i, s'_i}(a_i, \hat{r}_i)$ can be obtained easily from the results in Section 2.3.
  • $\mathcal{C}^N(w)$ denotes the instant cost. When the system is at state $\mathbf{x}$ and action $\mathbf{a}$ is taken, the instant cost is $C(\mathbf{x}, \mathbf{a}) \triangleq \sum_{i=1}^{N} C(x_i, a_i) \triangleq \sum_{i=1}^{N} \big( f_i(s_i) + w a_i \big)$.
We notice that PP can be cast into $\mathcal{M}^N(0, M)$. Since $w = 0$, the instant cost is independent of action $\mathbf{a}$. Therefore, we abbreviate $C(\mathbf{x}, \mathbf{a})$ as $C(\mathbf{x})$. To simplify the analysis, we consider the case of $M = 1$. Equivalently, we investigate the structural properties of the optimal policy for $\mathcal{M}^N(0, 1)$.
Remark 2.
For the case of M > 1 , we can apply the same methodology. However, as M increases, the action space will grow quickly, resulting in the need to consider more feasible actions in each step of the proof. Hence, to better demonstrate the methodology, we only consider the case of M = 1 in this paper.
It is well known that the optimal policy for $\mathcal{M}^N(0, 1)$ can be characterized by the value function. We denote the value function of state $\mathbf{x}$ as $V(\mathbf{x})$. A canonical procedure to calculate $V(\mathbf{x})$ is applying the Value Iteration Algorithm (VIA). To this end, we define $V_\nu(\cdot)$ as the estimated value function at iteration $\nu$ of VIA and initialize $V_0(\cdot) = 0$. Then, VIA updates the estimated value functions in the following way
$$V_{\nu+1}(\mathbf{x}) = C(\mathbf{x}) - \theta + \min_{\mathbf{a} \in \mathcal{A}^N(1)} \left\{ \sum_{\mathbf{x}' \in \mathcal{X}^N} P_{\mathbf{x}, \mathbf{x}'}(\mathbf{a}) V_\nu(\mathbf{x}') \right\},$$
where $\theta$ is the optimal value of $\mathcal{M}^N(0, 1)$. VIA is guaranteed to converge to the value function [21]. More precisely, $V_\nu(\cdot) = V(\cdot)$ when $\nu \to +\infty$. However, the exact value function is impossible to get since we need infinite iterations and the state space is infinite. Instead, we provide two structural properties of the value function.
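Although the exact value function is out of reach, the iteration can be explored numerically on a small instance. The sketch below is an illustration under assumptions that are not in the paper: it takes $N = 2$ and $M = 1$, truncates each $s_i$ at a hypothetical cap `S`, and replaces the unknown $\theta$ by subtracting the value at a reference state after each sweep (relative value iteration). All names and parameters are illustrative.

```python
import itertools


def grow(s, r_hat, a, p, pe1, pe0):
    """Probability that s increases by one (Section 2.3)."""
    if s == 0:
        return p
    if a == 0:
        return 1 - p
    if r_hat == 1:
        return pe1 * (1 - p) + (1 - pe1) * p
    return pe0 * p + (1 - pe0) * (1 - p)


def relative_value_iteration(users, S=8, sweeps=200):
    """users: two dicts with keys p, pe1, pe0, gamma, f. Returns (V, policy)."""
    states = list(itertools.product(range(S + 1), (0, 1), range(S + 1), (0, 1)))
    V = {x: 0.0 for x in states}
    policy = {}
    g0, g1 = users[0]["gamma"], users[1]["gamma"]
    par = [(u["p"], u["pe1"], u["pe0"]) for u in users]
    for _ in range(sweeps):
        # W[s1][s2]: expectation of V over the next CSI estimates.
        W = [[sum((g0 if r1 else 1 - g0) * (g1 if r2 else 1 - g1) * V[(s1, r1, s2, r2)]
                  for r1 in (0, 1) for r2 in (0, 1))
              for s2 in range(S + 1)] for s1 in range(S + 1)]
        new_V = {}
        for x in states:
            s1, r1, s2, r2 = x
            up1, up2 = min(s1 + 1, S), min(s2 + 1, S)  # truncation (assumption)
            q_values = []
            for j in (0, 1):  # index of the user whose update is transmitted
                q1 = grow(s1, r1, int(j == 0), *par[0])
                q2 = grow(s2, r2, int(j == 1), *par[1])
                q_values.append(q1 * q2 * W[up1][up2] + q1 * (1 - q2) * W[up1][0]
                                + (1 - q1) * q2 * W[0][up2] + (1 - q1) * (1 - q2) * W[0][0])
            policy[x] = 0 if q_values[0] <= q_values[1] else 1
            new_V[x] = users[0]["f"](s1) + users[1]["f"](s2) + min(q_values)
        ref = new_V[(0, 0, 0, 0)]  # stands in for theta (relative value iteration)
        V = {x: v - ref for x, v in new_V.items()}
    return V, policy


# Two hypothetical, statistically identical users with a linear time penalty.
user = {"p": 0.2, "pe1": 0.1, "pe0": 0.3, "gamma": 0.6, "f": lambda s: s}
V, policy = relative_value_iteration([user, user], S=6, sweeps=100)
print(V[(3, 1, 2, 0)] >= V[(2, 1, 2, 0)])  # monotone in s_1 on this instance
```

On such a truncated instance one can inspect how the estimated value function changes with each $s_i$ and how it behaves when two identical users are exchanged; this is only a numerical illustration and not a substitute for the analysis that follows.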
Lemma 1 (Monotonicity).
For $\mathcal{M}^N(0, 1)$, $V(\mathbf{x})$ is non-decreasing in $s_i$ for $1 \le i \le N$.
Proof. 
Leveraging the iterative nature of VIA, we use mathematical induction to prove the desired results. The complete proof can be found in Appendix A.    ☐
Before introducing the next structural property, we make the following definition.
Definition 1 (Statistically identical).
Two users are said to be statistically identical if the user-dependent parameters and the adopted time penalty functions are the same.
For the users that are statistically identical, we can prove the following
Lemma 2 (Equivalence).
For $\mathcal{M}^N(0, 1)$, if users $j$ and $k$ are statistically identical, $V(\mathbf{x}) = V(\mathscr{P}(\mathbf{x}))$ where $\mathscr{P}(\mathbf{x})$ is state $\mathbf{x}$ with $x_j$ and $x_k$ exchanged.
Proof. 
Leveraging the iterative nature of VIA, we use mathematical induction to prove the desired results. At each iteration, we show that for each feasible action at state $\mathbf{x}$, we can find an equivalent action at state $\mathscr{P}(\mathbf{x})$. Two actions are equivalent if they lead to the same value function. The complete proof can be found in Appendix B.    ☐
Equipped with the above lemmas, we proceed with characterizing the structural properties of the optimal policy. We recall that the optimal action at each state can be characterized by the value function. Hence, we denote, by $V^j(\mathbf{x})$, the value function resulting from choosing user $j$ to update at state $\mathbf{x}$. Then, $V^j(\mathbf{x})$ can be calculated by
$$V^j(\mathbf{x}) = C(\mathbf{x}) - \theta + \sum_{\mathbf{x}' \setminus x'_j} \prod_{i \neq j} P_{x_i, x'_i}(0) \sum_{\hat{r}'_j} P(\hat{r}'_j) \sum_{s'_j} P_{s_j, s'_j}(1, \hat{r}_j) V(\mathbf{x}').$$
If $V^j(\mathbf{x}) < V^k(\mathbf{x})$ for all $k \neq j$, it is optimal to transmit the update from user $j$. When $V^j(\mathbf{x}) = V^k(\mathbf{x})$, the two choices are equally desirable. In the following, we will characterize the properties of $\delta_{j,k}(\mathbf{x}) \triangleq V^j(\mathbf{x}) - V^k(\mathbf{x})$ for any $j$ and $k$.
Theorem 1 (Structural properties).
For $\mathcal{M}^N(0, 1)$, $\delta_{j,k}(\mathbf{x})$ has the following properties
  • $\delta_{j,k}(\mathbf{x}) \le 0$ if $\hat{r}_k = p_{e,k}^0 = 0$. The equality holds when $s_j = 0$ or $\hat{r}_j = p_{e,j}^0 = 0$.
  • $\delta_{j,k}(\mathbf{x})$ is non-increasing in $\hat{r}_j$ and is non-decreasing in $\hat{r}_k$ when $s_j, s_k > 0$. At the same time, $\delta_{j,k}(\mathbf{x})$ is independent of $\hat{r}_i$ for any $i \neq j, k$.
  • $\delta_{j,k}(\mathbf{x}) \le 0$ if $s_k = 0$. The equality holds when $s_j = 0$ or $\hat{r}_j = p_{e,j}^0 = 0$.
  • $\delta_{j,k}(\mathbf{x})$ is non-increasing in $s_j$ if $\Gamma_j^{\hat{r}_j} \le \Gamma_k^{\hat{r}_k}$ and is non-decreasing in $s_k$ if $\Gamma_j^{\hat{r}_j} \ge \Gamma_k^{\hat{r}_k}$ when $s_j, s_k > 0$. We define $\Gamma_i^1 \triangleq \frac{\alpha_i}{1 - p_i}$ and $\Gamma_i^0 \triangleq \frac{\beta_i}{1 - p_i}$ for $1 \le i \le N$.
  • $\delta_{j,k}(\mathbf{x}) \le 0$ if $s_j \ge s_k$, $\hat{r}_j \ge \hat{r}_k$, and users $j$ and $k$ are statistically identical.
Proof. 
The proof can be found in Appendix C.    ☐
We notice that $\Gamma_i^{\hat{r}_i}$ can be written as
$$\Gamma_i^{\hat{r}_i} = \frac{\Pr(s_i + 1 \mid (s_i, \hat{r}_i), a_i = 1)}{\Pr(s_i + 1 \mid (s_i, \hat{r}_i), a_i = 0)} < 1,$$
where $s_i$ can be any positive integer. Consequently, $\Gamma_i^{\hat{r}_i}$ is independent of any $s_i > 0$ and indicates the decrease in the probability of increasing $s_i$ caused by action $a_i = 1$. When $\Gamma_i^{\hat{r}_i}$ is large, action $a_i = 1$ will achieve a small decrease in the probability of increasing $s_i$. In the following, we provide an intuitive interpretation of why the monotonicity in Property 4 of Theorem 1 depends on $\Gamma_i^{\hat{r}_i}$. We take the case of $\Gamma_j^{\hat{r}_j} \le \Gamma_k^{\hat{r}_k}$ as an example and assume that there are only users $j$ and $k$ in the system. Then, according to Section 2.3, the dynamic of $s_j$ and $s_k$ can be divided into the following three cases
  • Neither $s_j$ nor $s_k$ increases. In this case, both $s_j$ and $s_k$ become zero.
  • Either $s_j$ or $s_k$ increases and the other becomes zero. We denote by $P_j^k$ the probability that only $s_k$ increases when $a_j = 1$. The notation for other cases is defined analogously. The probabilities can be obtained easily using the results in Section 2.3.
  • Both $s_j$ and $s_k$ increase. We denote by $P_j$ the probability that both $s_j$ and $s_k$ increase when $a_j = 1$. $P_k$ is defined analogously. The probabilities can be obtained easily using the results in Section 2.3.
We notice that $\delta_{j,k}(\mathbf{x})$ implies the tendency of the base station to choose between the two users. The larger $\delta_{j,k}(\mathbf{x})$ is, the more the base station tends to choose user $k$. Thus, we investigate the base station’s propensity to choose user $k$ when $s_k$ increases but $s_j$ stays the same. We ignore the case where the resulting $s_k$ is zero since it is independent of the increase in $s_k$. With this in mind, we first notice that $P_k^k \le P_j^k$. Meanwhile, we can easily verify that $\frac{P_j}{P_k} = \frac{\Gamma_j^{\hat{r}_j}}{\Gamma_k^{\hat{r}_k}}$. When $\Gamma_j^{\hat{r}_j} \le \Gamma_k^{\hat{r}_k}$, we have $P_j \le P_k$. Then, there exists a subtle trade-off. More precisely, choosing user $k$ will result in $P_k^k \le P_j^k$, but at the cost of $P_k \ge P_j$. Hence, in this case, the propensity of the base station is hard to determine. Following the same line, we can show that choosing user $j$ will lead to $P_j^j \le P_k^j$ and $P_j \le P_k$. Thus, there exists no such trade-off when we investigate the base station’s propensity to choose user $j$ as $s_j$ increases but $s_k$ stays the same.
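The identity $P_j / P_k = \Gamma_j^{\hat{r}_j} / \Gamma_k^{\hat{r}_k}$ and the inequality $P_k^k \le P_j^k$ are easy to confirm numerically. The sketch below assumes two users with $s_j, s_k > 0$; the function names and the parameter values are hypothetical and chosen only for illustration.

```python
def send_growth(p, pe1, pe0, r_hat):
    """Growth probability of s > 0 under transmission: alpha if r_hat = 1, else beta."""
    if r_hat == 1:
        return pe1 * (1 - p) + (1 - pe1) * p
    return pe0 * p + (1 - pe0) * (1 - p)


def big_gamma(p, pe1, pe0, r_hat):
    """Gamma_i^{r_hat}: growth probability with transmission over that without."""
    return send_growth(p, pe1, pe0, r_hat) / (1 - p)


def two_user_probabilities(user_j, user_k, r_j, r_k):
    """Return (P_j, P_k, P_k^k, P_j^k) for s_j, s_k > 0."""
    send_j = send_growth(*user_j, r_j)
    send_k = send_growth(*user_k, r_k)
    p_j, p_k = user_j[0], user_k[0]
    both_when_j = send_j * (1 - p_k)          # P_j: both grow, user j transmits
    both_when_k = (1 - p_j) * send_k          # P_k: both grow, user k transmits
    only_k_when_k = send_k * p_j              # P_k^k: only s_k grows, user k transmits
    only_k_when_j = (1 - p_k) * (1 - send_j)  # P_j^k: only s_k grows, user j transmits
    return both_when_j, both_when_k, only_k_when_k, only_k_when_j


# Hypothetical users given as (p, p_e^1, p_e^0), both observing r_hat = 1.
user_j, user_k = (0.2, 0.1, 0.3), (0.3, 0.2, 0.4)
P_j, P_k, P_kk, P_jk = two_user_probabilities(user_j, user_k, 1, 1)
ratio_gamma = big_gamma(*user_j, 1) / big_gamma(*user_k, 1)
print(abs(P_j / P_k - ratio_gamma) < 1e-12)  # True
print(P_kk <= P_jk)                           # True
```

With these numbers $\Gamma_j^1 < \Gamma_k^1$, so the example falls in the case discussed above where $P_j \le P_k$.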
Leveraging Theorem 1, we can provide some specific structural properties of the optimal policy.
Corollary 1 (Application of Theorem 1).
When $M = 1$, the optimal policy for PP must satisfy the following
  • The user $i$ with $\hat{r}_i = p_{e,i}^0 = 0$ or $s_i = 0$ will not be chosen unless it is to break the tie.
  • When user $j$ is chosen at state $\mathbf{x}_1$, then for state $\mathbf{x}_2$, such that $\hat{r}_{1,j} \le \hat{r}_{2,j}$ and $s_{1,i} = s_{2,i}$ for $1 \le i \le N$, the optimal choice must be in the set $G = \{j\} \cup \{k : \hat{r}_{1,k} < \hat{r}_{2,k}\}$.
  • When $N = 2$, we consider two states, $\mathbf{x}_1$ and $\mathbf{x}_2$, which differ only in the value of $s_j$. Specifically, $s_{1,j} \le s_{2,j}$. If user $j$ is chosen at state $\mathbf{x}_1$ and $\Gamma_j^{\hat{r}_{1,j}} \le \Gamma_k^{\hat{r}_{1,k}}$, the optimal choice at state $\mathbf{x}_2$ will also be user $j$.
  • When $N = 2$, we consider two states, $\mathbf{x}_1$ and $\mathbf{x}_2$, which differ only in the value of $s_k$. Specifically, $s_{1,k} \ge s_{2,k}$. If user $j$ is chosen at state $\mathbf{x}_1$ and $\Gamma_j^{\hat{r}_{1,j}} \ge \Gamma_k^{\hat{r}_{1,k}}$, the optimal choice at state $\mathbf{x}_2$ will also be user $j$.
  • When all users are statistically identical, the optimal choice at any time slot must be either the user with $x = (s_{max,1}, 1)$ where $s_{max,1} \triangleq \max_{s_i} \{(s_i, 1)\}$ or the user with $x = (s_{max,0}, 0)$ where $s_{max,0} \triangleq \max_{s_i} \{(s_i, 0)\}$. Moreover,
    • If $s_{max,1} \ge s_{max,0}$, it is optimal to choose the user with $x = (s_{max,1}, 1)$.
    • If $s_{max,1} < s_{max,0}$, the optimal choice will switch from the user with $x = (s_{max,0}, 0)$ to the user with $x = (s_{max,1}, 1)$ when $s_{max,1}$ increases from $0$ to $s_{max,0}$ solely.
Proof. 
The first property follows directly from Property 1 and Property 3 of Theorem 1. For the second property, leveraging Property 2 of Theorem 1, we have $\delta_{j,k}(\mathbf{x}_2) \le \delta_{j,k}(\mathbf{x}_1) \le 0$ if $\hat{r}_{1,j} \le \hat{r}_{2,j}$, $\hat{r}_{1,k} \ge \hat{r}_{2,k}$, and $s_{1,i} = s_{2,i}$ for $1 \le i \le N$. Thus, the optimal choice will not be user $k$ in this case. Then, we can conclude that the optimal choice must be in the set $G = \{j\} \cup \{k : \hat{r}_{1,k} < \hat{r}_{2,k}\}$.
For the third property, we have proved in Property 4 of Theorem 1 that $\delta_{j,k}(\mathbf{x})$ is non-increasing in $s_j$ if $\Gamma_j^{\hat{r}_j} \le \Gamma_k^{\hat{r}_k}$. Hence, $\delta_{j,k}(\mathbf{x}_2) \le \delta_{j,k}(\mathbf{x}_1) \le 0$. As we consider the case of $N = 2$, the optimal choice at state $\mathbf{x}_2$ will also be user $j$. The fourth property can be shown in a similar way by noticing that $\delta_{j,k}(\mathbf{x})$ is non-decreasing in $s_k$ when $\Gamma_j^{\hat{r}_j} \ge \Gamma_k^{\hat{r}_k}$.
For the last property, we recall from Property 5 of Theorem 1 that it is always better to choose the user with a larger $s$ if they are statistically identical and have the same $\hat{r}$. Thus, we can conclude that the optimal choice must be either the user with $x = (s_{max,1}, 1)$ or the user with $x = (s_{max,0}, 0)$. Without loss of generality, we assume $x_j = (s_{max,1}, 1)$ and $x_k = (s_{max,0}, 0)$. Now, we distinguish between the following cases
  • According to Property 5 of Theorem 1, we can conclude that it is optimal to choose user $j$ when $s_{max,1} \ge s_{max,0}$.
  • To determine the optimal choice in the case of $s_{max,1} < s_{max,0}$, we recall that the optimal choice will be user $k$ (i.e., $\delta_{j,k}(\mathbf{x}) \ge 0$) if $s_j = 0$ and will be user $j$ (i.e., $\delta_{j,k}(\mathbf{x}) \le 0$) if $s_j = s_k$. At the same time, Property 4 of Theorem 1 tells us that $\delta_{j,k}(\mathbf{x})$ is non-increasing in $s_j$ when users $j$ and $k$ are statistically identical. Therefore, we can conclude that the optimal choice will switch from user $k$ to user $j$ when $s_j$ increases from $0$ to $s_k$ solely.
   ☐

4. Whittle’s Index Policy

Whittle’s index policy is a well-known low-complexity heuristic that shows a strong performance in many problems that belong to RMAB [22,23,24]. In this section, we develop Whittle’s index policy for PP. We first present the general procedures we adopt to obtain Whittle’s index.
  • We first formulate a relaxed version of PP and apply the Lagrangian approach.
  • Then, we decouple the problem of minimizing the Lagrangian function into N decoupled problems, each of which only considers a single user. By casting the decoupled problem into an MDP, we investigate the structural properties and performance of the optimal policy.
  • Leveraging the results above and under a simple condition, we establish the indexability of the decoupled problem.
  • Finally, we obtain the expression of Whittle’s index by solving the Bellman equation.

4.1. Relaxed Problem

The first step in obtaining Whittle’s index is to formulate the Relaxed Problem (RP). More precisely, instead of requiring the limit on the number of updates allowed per transmission attempt to be met in each time slot, we relax the constraint such that the limit is not violated in an average sense. Then, RP can be formulated as
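$$\arg\min_{\phi \in \Phi} \ \bar{\Delta} \triangleq \lim_{T \to \infty} \frac{1}{T} \mathbb{E}_\phi \left( \sum_{t=0}^{T-1} \sum_{i=1}^{N} f_i(s_{i,t}) \right)$$
$$\text{subject to} \ \bar{\rho}_\phi \triangleq \lim_{T \to \infty} \frac{1}{T} \mathbb{E}_\phi \left( \sum_{t=0}^{T-1} \sum_{i=1}^{N} a_{i,t} \right) \le M.$$
As RP is specified, we apply the Lagrangian approach. First of all, we write RP into its Lagrangian form.
$$\mathcal{L}(\lambda, \phi) = \lim_{T \to \infty} \frac{1}{T} \mathbb{E}_\phi \left( \sum_{t=0}^{T-1} \sum_{i=1}^{N} \big( f_i(s_{i,t}) + \lambda a_{i,t} \big) \right) - \lambda M,$$
where $\lambda \ge 0$ is the Lagrange multiplier. Then, we investigate the problem of minimizing the Lagrangian function. Since $\lambda M$ is independent of policies, we can ignore it. More precisely, we consider the following minimization problem
$$\underset{\phi \in \Phi}{\text{minimize}} \ \lim_{T \to \infty} \frac{1}{T} \mathbb{E}_\phi \left( \sum_{t=0}^{T-1} \sum_{i=1}^{N} \big( f_i(s_{i,t}) + \lambda a_{i,t} \big) \right).$$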

4.2. Decoupled Model

In this section, we formulate the decoupled problem and investigate its optimal policy. The decoupled model associated with each user follows the system model with N = 1 . Since all the users share the same structure, we drop the user-dependent subscript i for simplicity. Then, the decoupled problem can be formulated as
$$\underset{\phi \in \Phi}{\text{minimize}} \ \lim_{T \to \infty} \frac{1}{T} \mathbb{E}_\phi \left( \sum_{t=0}^{T-1} \big( f(s_t) + \lambda a_t \big) \right), \tag{8}$$
where $\Phi$ is the set of all causal policies when $N = 1$. We notice that problem (8) can be cast into the MDP $\mathcal{M}^1(\lambda, -1)$. We define $M = -1$ when there is no restriction on the number of updates allowed per transmission attempt.
We first investigate the structural properties of the optimal policy for $\mathcal{M}^1(\lambda, -1)$ when $\lambda$ is a given non-negative constant. We start with characterizing the corresponding value function $V(x)$.
Corollary 2 (Extension of Lemma 1).
For $\mathcal{M}^1(\lambda, -1)$, $V(x)$ is non-decreasing in $s$.
Proof. 
The proof follows the same steps as in the proof of Lemma 1. The complete proof can be found in Appendix D.    ☐
Equipped with the above corollary, we can characterize the structural properties of the optimal policy for (8).
Proposition 1 (Optimal policy for decoupled problem).
The optimal policy for the decoupled problem is a threshold policy with the following properties.
  • The optimal policy can be fully captured by $\mathbf{n} = (n_0, n_1)$. More precisely, when the system is at state $(s, \hat{r})$, it is optimal to make a transmission attempt only when $s \ge n_{\hat{r}}$.
  • $n_0 \ge n_1 > 0$.
Proof. 
We define $\Delta V(x) \triangleq V^1(x) - V^0(x)$, where $V^a(x)$ is the value function resulting from taking action $a$ at state $x$. Then, the optimal action at state $x$ is $a = 1$ if $\Delta V(x) < 0$, and $a = 0$ is optimal otherwise. We use Corollary 2 to characterize the sign of $\Delta V(x)$. The complete proof can be found in Appendix E.    ☐
In the following, we evaluate the performance of the threshold policy detailed in Proposition 1. More precisely, we calculate the expected AoII $\bar{\Delta}_{\mathbf{n}}$ and the expected transmission rate $\bar{\rho}_{\mathbf{n}}$ resulting from the adoption of threshold policy $\mathbf{n}$. As the following sections show, $\bar{\Delta}_{\mathbf{n}}$ and $\bar{\rho}_{\mathbf{n}}$ are essential for establishing the indexability and obtaining the expression of Whittle’s index.
Proposition 2 (Performance).
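Under threshold policy $\mathbf{n} = (n_0, n_1)$,
$$\bar{\Delta}_{\mathbf{n}} = \pi_0\, p \left( \sum_{k=1}^{n_1 - 1} f(k) (1-p)^{k-1} + (1-p)^{n_1 - 1} \left( \sum_{k=n_1}^{n_0 - 1} f(k)\, c_1^{k - n_1} + c_1^{n_0 - n_1} \sum_{k=n_0}^{+\infty} f(k)\, c_2^{k - n_0} \right) \right),$$
$$\bar{\rho}_{\mathbf{n}} = \pi_0\, p\, (1-p)^{n_1 - 1} \left( \frac{\gamma}{1 - c_1} + c_1^{n_0 - n_1} \left( \frac{1}{1 - c_2} - \frac{\gamma}{1 - c_1} \right) \right),$$
where
$$\pi_0 = \frac{1}{2 + p (1-p)^{n_1 - 1} \left( \dfrac{1}{1 - c_1} - \dfrac{1}{p} + c_1^{n_0 - n_1} \left( \dfrac{1}{1 - c_2} - \dfrac{1}{1 - c_1} \right) \right)},$$
$c_1 = (1-\gamma)(1-p) + \gamma\alpha$, and $c_2 = (1-\gamma)\beta + \gamma\alpha$.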
Proof. 
We notice that the dynamics of AoII under the threshold policy can be fully captured by a Discrete-Time Markov Chain (DTMC). Then, combined with the fact that $\hat{r}$ is an independent Bernoulli random variable, we can obtain the desired results from the stationary distribution of the induced DTMC. The complete proof can be found in Appendix F.    ☐
As $f(\cdot)$ can be any non-decreasing function, $\bar{\Delta}_{\mathbf{n}}$ can grow indefinitely. Thus, it is necessary to require that there exists at least one threshold policy that yields a finite $\bar{\Delta}_{\mathbf{n}}$. By noting that $1 - p \ge c_1 \ge c_2$, we have
$$\bar{\Delta}_{\mathbf{n}} \ge \pi_0\, p \left( \sum_{k=1}^{n_1 - 1} f(k)\, c_2^{k-1} + c_2^{n_1 - 1} \left( \sum_{k=n_1}^{n_0 - 1} f(k)\, c_2^{k - n_1} + c_2^{n_0 - n_1} \sum_{k=n_0}^{+\infty} f(k)\, c_2^{k - n_0} \right) \right) = \pi_0\, p \sum_{k=1}^{+\infty} f(k)\, c_2^{k-1}.$$
The equality is achieved when $n_0 = n_1 = 1$. Then, we can conclude that it is sufficient to require $\sum_{k=1}^{+\infty} f(k)\, c_2^{k-1} < +\infty$. This will be the underlying assumption throughout the rest of this paper.
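The closed forms in Proposition 2 can be evaluated numerically. The following is a minimal sketch, not the authors' implementation: the function name `threshold_performance`, the truncation point `k_max` for the infinite sums, and the parameter values in the usage line are illustrative assumptions, and $\alpha$ and $\beta$ are passed in directly as numbers.

```python
import math

def threshold_performance(n0, n1, p, gamma, alpha, beta, f, k_max=2000):
    """Closed-form (pi_0, expected AoII, transmission rate) under n = (n0, n1).

    Assumes n0 >= n1 >= 1 (n0 may be math.inf). The infinite sums are
    truncated at k_max, which is an illustrative choice.
    """
    q = 1 - p
    c1 = (1 - gamma) * q + gamma * alpha
    c2 = (1 - gamma) * beta + gamma * alpha
    finite = not math.isinf(n0)
    w = c1 ** (n0 - n1) if finite else 0.0  # c1^(n0 - n1), zero when n0 = +inf
    pi0 = 1.0 / (2 + p * q ** (n1 - 1) * (
        1 / (1 - c1) - 1 / p + w * (1 / (1 - c2) - 1 / (1 - c1))))
    # Three ranges of s: idle, transmit only when r_hat = 1, always transmit.
    low = sum(f(k) * q ** (k - 1) for k in range(1, n1))
    mid_end = n0 if finite else k_max + 1
    mid = sum(f(k) * c1 ** (k - n1) for k in range(n1, mid_end))
    high = (sum(f(k) * c2 ** (k - n0) for k in range(n0, k_max + 1))
            if finite else 0.0)
    aoii = pi0 * p * (low + q ** (n1 - 1) * (mid + w * high))
    rate = pi0 * p * q ** (n1 - 1) * (
        gamma / (1 - c1) + w * (1 / (1 - c2) - gamma / (1 - c1)))
    return pi0, aoii, rate

# Example call with a linear time penalty f(k) = k (values are illustrative).
print(threshold_performance(3, 2, 0.3, 0.6, 0.34, 0.66, lambda k: k))
```

For $n_0 = n_1 = 1$ the returned AoII reduces to $\pi_0\, p \sum_{k} f(k)\, c_2^{k-1}$, which is the equality case discussed above.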

4.3. Indexability

In this section, we establish the indexability of the decoupled problem, which ensures the existence of Whittle’s index. We start with the definition of indexability.
Definition 2
(Indexability). The decoupled problem is indexable if the set of states in which $a = 0$ is the optimal action increases with $\lambda$, that is,
$$\lambda' < \lambda \implies D(\lambda') \subseteq D(\lambda),$$
where $D(\lambda)$ is the set of states in which $a = 0$ is optimal when Lagrange multiplier $\lambda$ is adopted.
The Lagrange multiplier λ can be viewed as a cost associated with each transmission attempt. Intuitively, as  λ increases, the base station should stay idle (i.e., a = 0 ) for a longer time until s becomes large enough to offset the cost. Although it is intuitively correct that the decoupled problem is indexable, the indexability is hard to establish as the optimal policy is characterized by two thresholds. Thus, Whittle’s index does not necessarily exist. However, the indexability can be established when the following condition is satisfied
$$p_{e,i}^{0} = 0 \quad \text{for } 1 \le i \le N. \tag{9}$$
Remark 3.
Condition (9) only requires the estimate $\hat{r}_i$ to be perfect when $\hat{r}_i = 0$. In the case of $\hat{r}_i = 1$, we still allow the estimate to be inaccurate.
When (9) is satisfied, Propositions 1 and 2 reduce to the following
Corollary 3 (Consequences of (9)).
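When (9) is satisfied, the optimal policy for the decoupled problem (8) is the threshold policy $\mathbf{n} = (+\infty, n)$. The corresponding $\bar{\Delta}_n$ and $\bar{\rho}_n$ are
$$\bar{\Delta}_n = \pi_0\, p \left( \sum_{k=1}^{n-1} f(k) (1-p)^{k-1} + (1-p)^{n-1} \sum_{k=n}^{+\infty} f(k)\, c_1^{k-n} \right),$$
$$\bar{\rho}_n = \pi_0\, p\, (1-p)^{n-1} \frac{\gamma}{1 - c_1},$$
where
$$\pi_0 = \frac{1}{2 + p (1-p)^{n-1} \left( \dfrac{1}{1 - c_1} - \dfrac{1}{p} \right)}.$$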
Proof. 
We continue with the same notations as in the proofs of Propositions 1 and 2. It is sufficient to show that $n_0 = +\infty$. To this end, we consider the state $x = (s, 0)$. By following the same steps as in the proof of Proposition 1, we have
$$\Delta V(s, 0) = \lambda \ge 0.$$
Therefore, it is optimal to stay idle (i.e., $a = 0$) at state $x = (s, 0)$ for any $s \ge 0$. Equivalently, $n_0 = +\infty$. Then, the corresponding $\bar{\Delta}_n$ and $\bar{\rho}_n$ can be calculated as a special case of Proposition 2 where $n_0 = +\infty$, $n_1 = n$, and $p_e^0 = 0$.    ☐
Leveraging Corollary 3, we can establish the indexability of the decoupled problem.
Proposition 3 (Indexability of decoupled problem).
The decoupled problem is indexable when (9) is satisfied.
Proof. 
According to Proposition 2.2 of [25], we only need to verify that the expected transmission rate $\bar{\rho}_n$ is strictly decreasing in $n$. From Corollary 3, we have
$$\bar{\rho}_n = \frac{\gamma p}{1 - c_1} \left( \frac{2}{(1-p)^{n-1}} + \frac{p}{1 - c_1} - 1 \right)^{-1}.$$
As $\frac{1}{2} < 1 - p < 1$, we can easily verify that $\bar{\rho}_n$ is strictly decreasing in $n$. Thus, the decoupled problem is indexable when (9) is satisfied.    ☐
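For readers who want the intermediate step, the expression used in the proof can be obtained from Corollary 3 by substituting $\pi_0$ and dividing numerator and denominator by $(1-p)^{n-1}$. The derivation below is a sketch that uses only the quantities defined in Corollary 3.

```latex
\begin{aligned}
\bar{\rho}_n
  &= \pi_0\, p\,(1-p)^{n-1}\,\frac{\gamma}{1-c_1}
   = \frac{\gamma p}{1-c_1}\cdot
     \frac{(1-p)^{n-1}}{2 + (1-p)^{n-1}\left(\frac{p}{1-c_1} - 1\right)} \\
  &= \frac{\gamma p}{1-c_1}
     \left(\frac{2}{(1-p)^{n-1}} + \frac{p}{1-c_1} - 1\right)^{-1}.
\end{aligned}
```

Since $0 < 1 - p < 1$, the term $2/(1-p)^{n-1}$ is at least 2 and grows strictly with $n$, so the bracket is positive and strictly increasing, and $\bar{\rho}_n$ is strictly decreasing.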

4.4. Whittle’s Index Policy

In this section, we proceed with finding the expression of Whittle’s index and defining Whittle’s index policy. First of all, we give the definition of Whittle’s index.
Definition 3 (Whittle’s index).
When the decoupled problem is indexable, Whittle’s index at state $x$ is defined as the infimum $\lambda$ such that both actions are equally desirable. Equivalently, Whittle’s index at state $x$ is defined as the infimum $\lambda$ such that $V^0(x) = V^1(x)$.
Let us denote by $W_x$ the Whittle’s index at state $x$. Then, the expression of Whittle’s index is given by the following proposition.
Proposition 4 (Whittle’s index).
When (9) is satisfied, Whittle’s index is
$$W_x = \begin{cases} 0 & \text{when } x = (0, \hat{r}) \text{ or } x = (s, 0), \\[6pt] \dfrac{(1 - c_1) \sum_{k=s+1}^{+\infty} f(k)\, c_1^{k-s-1} - \bar{\Delta}_s}{\dfrac{(1 - c_1)(1 - p) - \gamma (1 - p - \alpha)}{c_1 (1 - p - \alpha)} + \bar{\rho}_s} & \text{when } x = (s, 1), \end{cases}$$
where $s > 0$ and $c_1 = (1-\gamma)(1-p) + \gamma\alpha$. $\bar{\Delta}_s$ and $\bar{\rho}_s$ are the expected AoII and the expected transmission rate when threshold policy $\mathbf{n} = (+\infty, s)$ is adopted, respectively. At the same time, $W_x$ is non-negative and is non-decreasing in $s$.
Proof. 
Whittle’s indexes at state x = ( 0 , r ^ ) and x = ( s , 0 ) are obtained easily from the proof of Proposition 1. For state x = ( s , 1 ) , we first use backward induction to calculate the expressions of some value functions. Then, the expression of Whittle’s index can be obtained from its definition. The complete proof can be found in Appendix G.    ☐
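The expression in Proposition 4 can be evaluated directly once the infinite sums are truncated. The sketch below is illustrative only: the function name `whittle_index`, the truncation point `k_max`, and the parameter values in the usage line are assumptions, and $\alpha$ is passed in as a number. $\bar{\Delta}_s$ and $\bar{\rho}_s$ are computed from Corollary 3 with threshold $n = s$.

```python
def whittle_index(s, r_hat, p, gamma, alpha, f, k_max=2000):
    """Whittle's index at state (s, r_hat) when condition (9) holds."""
    if s == 0 or r_hat == 0:
        return 0.0
    q = 1 - p
    c1 = (1 - gamma) * q + gamma * alpha
    # Corollary 3 with threshold policy n = (+inf, s).
    pi0 = 1.0 / (2 + p * q ** (s - 1) * (1 / (1 - c1) - 1 / p))
    head = sum(f(k) * q ** (k - 1) for k in range(1, s))
    tail = sum(f(k) * c1 ** (k - s) for k in range(s, k_max + 1))
    aoii = pi0 * p * (head + q ** (s - 1) * tail)
    rate = pi0 * p * q ** (s - 1) * gamma / (1 - c1)
    # Numerator and denominator of the index for x = (s, 1).
    num = (1 - c1) * sum(f(k) * c1 ** (k - s - 1)
                         for k in range(s + 1, k_max + 1)) - aoii
    den = ((1 - c1) * q - gamma * (q - alpha)) / (c1 * (q - alpha)) + rate
    return num / den

# Indices for s = 0..5 with a linear time penalty (values are illustrative).
print([whittle_index(s, 1, 0.3, 0.6, 0.34, lambda k: k) for s in range(6)])
```

Because $c_1 = (1-p) - \gamma(1-p-\alpha)$, the first term of the denominator simplifies to $p/(1-p-\alpha)$, which can serve as a quick sanity check on an implementation.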
Definition 4
(Whittle’s index policy). At any state $x = (x_1, x_2, \ldots, x_N)$, the base station will transmit the updates from the $M$ users with the largest $W_{x_i}$. Ties are broken arbitrarily. $W_{x_i}$ is calculated using Proposition 4 with the parameters of user $i$.
Remark 4.
Whittle’s index policy possesses the structural properties detailed in Corollary 1.
  • The first two properties can be verified by noting that $W_{x_i} \ge 0$ and the equality holds when $\hat{r}_i = 0$ or $s_i = 0$. At the same time, $W_{x_i}$ is non-decreasing in $\hat{r}_i$.
  • The third and fourth properties can be verified by noting that $W_{x_i}$ is non-decreasing in $s_i$.
  • For the last property, we first notice that $W_{x_j} = W_{x_k}$ when users $j$ and $k$ are statistically identical and $x_j = x_k$. Then, the property can be verified by noting that $W_{x_i}$ is non-decreasing in both $s_i$ and $\hat{r}_i$.

5. Optimal Policy for Relaxed Problem

In this section, we provide an efficient algorithm to obtain the optimal policy for RP, based on which we will develop another scheduling policy for PP in the next section that is free from indexability. At the same time, the performance of the optimal policy for RP forms a universal lower bound because the following ordering holds
$$\bar{\Delta}_{AoII}^{RP} \le \bar{\Delta}_{AoII}^{PP},$$
where $\bar{\Delta}_{AoII}^{RP}$ and $\bar{\Delta}_{AoII}^{PP}$ are the minimal expected AoII of RP and PP, respectively.
Remark 5.
Note that the optimal policy for RP may not necessarily be a valid policy for PP, as the transmitter may transmit more than M updates in one transmission attempt under RP-optimal policy.
To solve RP, we follow the discussion in Section 4.1. More precisely, we take the Lagrangian approach and consider the problem reported in (7). We will see in the following discussion that the optimal policy for RP can be characterized by the optimal policies for problem (7). Therefore, we first cast problem (7) into the MDP $\mathcal{M}_N(\lambda, -1)$. However, the optimal policy for $\mathcal{M}_N(\lambda, -1)$ is difficult to obtain because the state space is infinite. Even though we can make the state space finite by imposing an upper limit on the value of $s$, the state space and the action space grow exponentially with the number of users in the system. To overcome the difficulty, we investigate the optimal policy for $\mathcal{M}_1^i(\lambda, -1)$ where $1 \le i \le N$. The superscript $i$ means that the only user in the system is user $i$. We will show later that the optimal policy for $\mathcal{M}_N(\lambda, -1)$ can be fully characterized by the optimal policies for $\mathcal{M}_1^i(\lambda, -1)$ where $1 \le i \le N$.

5.1. Optimal Policy for Single User

In this section, we tackle the problem of finding the optimal policy for $\mathcal{M}_1^i(\lambda, -1)$. Since the users share the same structure, we ignore the superscript $i$ for simplicity. To find the optimal policy, we first use the Approximating Sequence Method (ASM) introduced in [26] to make the state space finite. More precisely, we impose $s \le m$ where $m$ is a predetermined upper limit. The state transition probabilities $P_{s,s'}(a, \hat{r})$ are replaced by the truncated probabilities $\tilde{P}_{s,s'}(a, \hat{r})$ in the following way
$$\tilde{P}_{s,s'}(a, \hat{r}) = \begin{cases} P_{s,s'}(a, \hat{r}) & \text{if } s' < m, \\ P_{s,s'}(a, \hat{r}) + \sum_{z > m} P_{s,z}(a, \hat{r}) & \text{if } s' = m. \end{cases} \tag{10}$$
The action space and the instant cost remain unchanged. Then, we can apply Relative Value Iteration (RVI) with convergence criterion $\epsilon$ to obtain the optimal policy. We notice that $\mathcal{M}_1(\lambda, -1)$ coincides with the decoupled model studied in Section 4.2. Hence, we can utilize the threshold structure of the optimal policy to improve RVI. To this end, we classify a state as active if the optimal action at this state is $a = 1$. Then, the threshold structure detailed in Proposition 1 tells us the following. For any state $x = (s, \hat{r})$, if there exists an active state $x_1 = (s_1, \hat{r}_1)$ with $s_1 \le s$ and $\hat{r}_1 \le \hat{r}$, then $x$ must also be active. Hence, we can determine the optimal action at state $x$ immediately instead of comparing all feasible actions. In this way, we can reduce the running time of RVI. The pseudocode for the improved RVI can be found in Algorithm A1 of Appendix M. A similar technique is also presented in [5].
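As an illustration of the truncation in (10) combined with RVI, the following sketch runs plain RVI (without the threshold-based speed-up of Algorithm A1) on the truncated single-user model. It is not the authors' code. The function name `rvi_single_user` is hypothetical, and the transition model is an assumption based on our reading of the system model: at $s = 0$ the AoII moves to 1 with probability $p$ regardless of the action, and at $s > 0$ it grows to $s + 1$ with probability $1 - p$ when idle, $\alpha$ when transmitting with $\hat{r} = 1$, and $\beta$ when transmitting with $\hat{r} = 0$, returning to 0 otherwise.

```python
def rvi_single_user(lam, p, gamma, alpha, beta, f, m=200, eps=1e-6,
                    max_iter=10000):
    """Plain RVI on the truncated single-user MDP; returns (policy, V).

    policy[s][r_hat] is the greedy action (0 = idle, 1 = transmit).
    """
    # Assumed probability that s > 0 grows to s + 1, keyed by (action, r_hat).
    up = {(0, 0): 1 - p, (0, 1): 1 - p, (1, 0): beta, (1, 1): alpha}
    V = [[0.0, 0.0] for _ in range(m + 1)]
    policy = [[0, 0] for _ in range(m + 1)]
    for _ in range(max_iter):
        # r_hat is i.i.d. Bernoulli(gamma): average the value over it.
        EV = [gamma * V[s][1] + (1 - gamma) * V[s][0] for s in range(m + 1)]
        new_V = [[0.0, 0.0] for _ in range(m + 1)]
        for s in range(m + 1):
            nxt = min(s + 1, m)  # mass beyond m is redirected to m, as in (10)
            for r in (0, 1):
                q = []
                for a in (0, 1):
                    u = p if s == 0 else up[(a, r)]
                    q.append(f(s) + lam * a + u * EV[nxt] + (1 - u) * EV[0])
                policy[s][r] = 1 if q[1] < q[0] else 0  # ties favor idling
                new_V[s][r] = min(q)
        diffs = [new_V[s][r] - V[s][r] for s in range(m + 1) for r in (0, 1)]
        ref = new_V[0][0]  # reference state keeps the iterates bounded
        V = [[v - ref for v in row] for row in new_V]
        if max(diffs) - min(diffs) < eps:  # span-based stopping rule
            break
    return policy, V

policy, _ = rvi_single_user(2.5, 0.3, 0.6, 0.34, 1 - 0.3, lambda s: float(s))
print([policy[s][1] for s in range(6)])
```

The threshold structure from Proposition 1 could be exploited inside the loop over $s$ by skipping the action comparison once an active state has been found for the same $\hat{r}$.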
For $\mathcal{M}_1(\lambda, -1)$, when condition (9) is satisfied, Whittle’s index exists and can be calculated efficiently using Proposition 4. Therefore, we can obtain the optimal policy using Whittle’s index and further reduce the computational complexity. To this end, we denote by $\mathbf{n}_\lambda$ the optimal policy for $\mathcal{M}_1(\lambda, -1)$ and present the following proposition.
Proposition 5 (Optimal deterministic policy).
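When (9) is satisfied, the optimal policy for $\mathcal{M}_1(\lambda, -1)$ is $\mathbf{n}_\lambda = (+\infty, n)$ where $n$ is given by
$$n = \begin{cases} 1 & \text{if } \lambda = 0, \\ \max\{ s \in \mathbb{N}_0 : W_s \le \lambda \} + 1 & \text{if } \lambda > 0. \end{cases}$$
$W_s$ is the Whittle’s index at state $(s, 1)$.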
Proof. 
We first notice that $\mathcal{M}_1(\lambda, -1)$ coincides with the decoupled model studied in Section 4.2. Then, we show the optimal action for each state with $\hat{r} = 1$ using the definition of Whittle’s index and the fact that the decoupled problem is indexable when (9) is satisfied. The complete proof can be found in Appendix H.    ☐
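The threshold rule of Proposition 5 amounts to a short upward scan, because $W_s$ is non-decreasing in $s$ and $W_0 = 0$. The sketch below is illustrative: `threshold_from_index`, `index_fn`, and the safety cap `s_cap` are hypothetical names, and `index_fn(s)` stands for any routine returning $W_s$ at state $(s, 1)$.

```python
def threshold_from_index(lam, index_fn, s_cap=10_000):
    """Return n such that policy (+inf, n) transmits at (s, 1) iff s >= n."""
    if lam == 0:
        return 1
    s = 0  # W_0 = 0 <= lam always holds for lam > 0
    # W_s is non-decreasing, so stop at the first index that exceeds lam.
    while s < s_cap and index_fn(s + 1) <= lam:
        s += 1
    if s == s_cap:
        raise ValueError("index never exceeded lam below s_cap")
    return s + 1

# Toy index W_s = s, used only to exercise the rule.
print(threshold_from_index(2.5, lambda s: float(s)))
```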
In the following, we provide a randomized policy that is also optimal for $\mathcal{M}_1(\lambda, -1)$. We will see later that the randomized policy is the key to obtaining the optimal policy for RP.
Theorem 2 (Optimal randomized policy).
There exist two deterministic policies $\mathbf{n}_{\lambda_+}$ and $\mathbf{n}_{\lambda_-}$, which are both optimal for $\mathcal{M}_1(\lambda, -1)$. We consider the following randomized policy $\mathbf{n}_\lambda$: every time the system reaches state $(0, 0)$, the base station chooses $\mathbf{n}_{\lambda_-}$ with probability $\mu$ and $\mathbf{n}_{\lambda_+}$ with probability $1 - \mu$. The chosen policy is followed until the next choice. Then, the randomized policy $\mathbf{n}_\lambda$ is optimal for $\mathcal{M}_1(\lambda, -1)$ under any $\mu \in [0, 1]$.
Proof. 
We show that our system verifies the assumptions given in [27]. Then, leveraging the characteristics of our system, we can obtain the optimal randomized policy. The complete proof can be found in Appendix I.    ☐
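In practice, we approximate $\lambda_+ \approx \lambda + \xi$ and $\lambda_- \approx \lambda - \xi$ where $\xi$ is a small perturbation. Then, the deterministic policies $\mathbf{n}_{\lambda_+}$ and $\mathbf{n}_{\lambda_-}$ can be obtained by following the discussion at the beginning of this subsection. Note that, in most cases, $\mathbf{n}_{\lambda_+}$ and $\mathbf{n}_{\lambda_-}$ are the same.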

5.2. Optimal Policy for RP

In this section, we characterize the optimal policy for RP. Let us denote by $V(x)$ and $V_i(x_i)$ the value functions of $\mathcal{M}_N(\lambda, -1)$ and $\mathcal{M}_1^i(\lambda, -1)$, respectively. Then, we can prove the following
Proposition 6 (Separability).
$V(x) = \sum_{i=1}^{N} V_i(x_i)$ where $x = (x_1, \ldots, x_N)$. In other words, the policy under which each user adopts its own optimal policy is optimal for $\mathcal{M}_N(\lambda, -1)$.
Proof. 
We show $V(x) = \sum_{i=1}^{N} V_i(x_i)$ by comparing the Bellman equations they must satisfy. The complete proof can be found in Appendix J.    ☐
We denote the optimal policy for $\mathcal{M}_N(\lambda, -1)$ as $\phi_\lambda = [\mathbf{n}_{\lambda,1}, \ldots, \mathbf{n}_{\lambda,N}]$ where $\mathbf{n}_{\lambda,i}$ is the optimal policy for $\mathcal{M}_1^i(\lambda, -1)$. For simplicity, we define $\bar{\Delta}(\lambda)$ and $\bar{\rho}(\lambda)$ as the expected AoII and the expected transmission rate associated with $\phi_\lambda$, respectively. $\bar{\Delta}_i(\lambda)$ and $\bar{\rho}_i(\lambda)$ are defined analogously for user $i$ under policy $\mathbf{n}_{\lambda,i}$. We also define $\lambda^* \triangleq \inf\{ \lambda > 0 : \bar{\rho}(\lambda) \le M \}$. With Proposition 6 and the above definitions in mind, we proceed with constructing the optimal policy for RP.
Theorem 3 (Optimal policy for RP).
The optimal policy for RP can be characterized by two deterministic policies $\phi_{\lambda_+^*} = [\mathbf{n}_{\lambda_+^*,1}, \ldots, \mathbf{n}_{\lambda_+^*,N}]$ and $\phi_{\lambda_-^*} = [\mathbf{n}_{\lambda_-^*,1}, \ldots, \mathbf{n}_{\lambda_-^*,N}]$, where $\mathbf{n}_{\lambda_+^*,i}$ and $\mathbf{n}_{\lambda_-^*,i}$ are both optimal deterministic policies for $\mathcal{M}_1^i(\lambda^*, -1)$. We mix $\phi_{\lambda_+^*}$ and $\phi_{\lambda_-^*}$ in the following way: for each user $i$, every time the user reaches state $(0, 0)$, the base station chooses $\mathbf{n}_{\lambda_-^*,i}$ with probability $\mu_i$ and $\mathbf{n}_{\lambda_+^*,i}$ with probability $1 - \mu_i$. The chosen policy is followed by user $i$ until the next choice. The $\mu_i$, $1 \le i \le N$, are chosen so as to satisfy
$$\sum_{i=1}^{N} \bar{\rho}_i(\lambda^*) = \sum_{i=1}^{N} \Big( \mu_i\, \bar{\rho}_i(\lambda_-^*) + (1 - \mu_i)\, \bar{\rho}_i(\lambda_+^*) \Big) = M. \tag{11}$$
Then, the mixed policy, denoted by $\phi_{\lambda^*}$, is optimal for RP.
Proof. 
According to Lemma 3.10 of [27], a policy is optimal for RP if
  • It is optimal for $\mathcal{M}_N(\lambda^*, -1)$;
  • The resulting expected transmission rate is equal to M.
Then, we construct such a policy using Theorem 2 and Proposition 6. The complete proof can be found in Appendix K.    ☐
Since we approximate $\lambda_+^* \approx \lambda^* + \xi$ and $\lambda_-^* \approx \lambda^* - \xi$ in practice, $\bar{\rho}_i(\lambda_+^*) \le \bar{\rho}_i(\lambda_-^*)$ for all $i$ according to the monotonicity given by Lemma 3.4 of [27]. Combining with the definition of $\lambda^*$, we must have $\bar{\rho}(\lambda_+^*) \le M < \bar{\rho}(\lambda_-^*)$. Therefore, we can always find $\mu_i$’s that realize (11). In this paper, we choose
$$\mu_i = \mu = \frac{M - \bar{\rho}(\lambda_+^*)}{\bar{\rho}(\lambda_-^*) - \bar{\rho}(\lambda_+^*)}, \quad \text{for } 1 \le i \le N. \tag{12}$$
Then, we describe the algorithm used to obtain the optimal policy for RP. As detailed in Theorem 3, it is essential to find $\lambda^*$. To this end, we recall that, for any user $i$ under given $\lambda$, the optimal deterministic policy $\mathbf{n}_{\lambda,i}$ can be obtained using the results in Section 5.1 and the resulting expected transmission rate $\bar{\rho}_i(\lambda)$ is given by Proposition 2. Since $\bar{\rho}_i(\lambda)$ is non-increasing in $\lambda$ for all $i$ according to Lemma 3.4 of [27], $\bar{\rho}(\lambda) = \sum_{i=1}^{N} \bar{\rho}_i(\lambda)$ is also non-increasing in $\lambda$. Then, according to the definition of $\lambda^*$, we can use the Bisection search to obtain $\lambda^*$ efficiently. The main steps can be summarized as follows.
  • Initialize $\lambda_- = 0$ and $\lambda_+ = 1$.
  • Do $\lambda_- = \lambda_+$ and $\lambda_+ = 2\lambda_+$ until $\bar{\rho}(\lambda_+) < M$.
  • Run Bisection search on the interval $[\lambda_-, \lambda_+]$ until the tolerance $2\xi$ is met.
Then, $\lambda_-^*$ and $\lambda_+^*$ can simply be the boundaries of the final interval. The pseudocode for the Bisection search can be found in Algorithm A2 of Appendix M. After obtaining $\lambda_-^*$ and $\lambda_+^*$, the optimal policy $\phi_{\lambda^*}$ is detailed in Theorem 3 and the mixing probabilities $\mu_i$’s are given by (12).
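The three steps above, together with the mixing probability in (12), can be sketched as follows. This is an illustration rather than Algorithm A2 itself: `find_lambda_interval`, `mixing_probability`, and the callable `rate` (standing for $\lambda \mapsto \bar{\rho}(\lambda)$) are hypothetical names, the doubling step is read as a loop that checks the stopping condition before each doubling, and the sketch assumes $\bar{\rho}(0) > M$ so that the constraint is active.

```python
def find_lambda_interval(rate, M, xi=0.005, max_doublings=60):
    """Bracket lambda* = inf{lam > 0 : rate(lam) <= M} within width 2 * xi.

    `rate` must be non-increasing in lam.
    """
    lo, hi = 0.0, 1.0
    for _ in range(max_doublings):
        if rate(hi) < M:  # stopping condition of the doubling step
            break
        lo, hi = hi, 2 * hi
    else:
        raise ValueError("rate(lam) never dropped below M")
    while hi - lo > 2 * xi:
        mid = (lo + hi) / 2
        if rate(mid) <= M:
            hi = mid  # mid already satisfies the constraint
        else:
            lo = mid  # mid still transmits too often
    return lo, hi

def mixing_probability(rate, M, lo, hi):
    """Mixing probability from (12), with lo and hi playing lambda_-* and lambda_+*."""
    r_lo, r_hi = rate(lo), rate(hi)
    if r_lo == r_hi:
        return 1.0  # both policies give the same rate, any mu works
    return (M - r_hi) / (r_lo - r_hi)

# Toy non-increasing rate function, used only to exercise the search.
toy_rate = lambda lam: 2.0 / (1.0 + lam) ** 2
lo, hi = find_lambda_interval(toy_rate, 1.0)
print(lo, hi, mixing_probability(toy_rate, 1.0, lo, hi))
```

In the setting of this paper, `rate` would evaluate $\sum_i \bar{\rho}_i(\lambda)$ by first computing each user's optimal deterministic policy for the given $\lambda$.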
Remark 6.
We recall that the optimal deterministic policy for each user can be characterized by two positive thresholds (i.e., $n_0, n_1 > 0$). Consequently, under RP-optimal policy, the base station will never choose the user at state $(0, \hat{r})$. Then, when $M$ increases, the expected transmission rate achieved by RP-optimal policy will saturate before $M$ reaches $N$. When the expected transmission rate saturates, the RP-optimal policy is $\phi^* = [\mathbf{n}_1, \ldots, \mathbf{n}_N]$ where $\mathbf{n}_i = (1, 1)$ for $1 \le i \le N$. The saturation happens when $M$ is larger than or equal to the expected transmission rate achieved by $\phi^*$.

6. Indexed Priority Policy

Although the performance of Whittle’s index policy is known to be good, it requires indexability, which is usually difficult to establish. In this section, based on the primal-dual heuristic introduced in [28], we develop a policy that does not require indexability and has comparable performance to Whittle’s index policy. We start with presenting the primal-dual heuristic.

6.1. Primal-Dual Heuristic

The heuristic is based on the optimal primal and dual solution pair to the linear program associated with RP. To introduce the linear program, we define $\pi_{x_i}^{a_i}(\phi) \ge 0$ as the expected fraction of time that user $i$ is at state $x_i$ and action $a_i$ is taken according to policy $\phi$. Then, for any $\phi$, $\pi_{x_i}^{a_i}(\phi)$ must satisfy the following constraints
$$\pi_{x_i}^{0}(\phi) + \pi_{x_i}^{1}(\phi) = \sum_{x_i'} \sum_{a_i} P_{x_i', x_i}(a_i)\, \pi_{x_i'}^{a_i}(\phi), \quad \forall x_i, i,$$
$$\sum_{x_i} \sum_{a_i} \pi_{x_i}^{a_i}(\phi) = 1, \quad \forall i.$$
The objective function of RP can be rewritten as
$$\underset{\phi \in \Phi}{\text{minimize}}\ \ \sum_{i=1}^{N} \sum_{x_i, a_i} C(x_i)\, \pi_{x_i}^{a_i}(\phi),$$
where $C(x_i) = f_i(s_i)$ is the instant cost at state $x_i$. The constraint on the expected transmission rate can be rewritten as
$$\sum_{i=1}^{N} \sum_{x_i} \pi_{x_i}^{1}(\phi) \le M.$$
Thus, the linear program associated with RP can be formulated as the following
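$$\begin{aligned} \underset{\pi_{x_i}^{a_i}}{\text{minimize}} \quad & \sum_{i=1}^{N} \sum_{x_i, a_i} C(x_i)\, \pi_{x_i}^{a_i} && \text{(13a)} \\ \text{subject to} \quad & \pi_{x_i}^{0} + \pi_{x_i}^{1} - \sum_{x_i'} \sum_{a_i} P_{x_i', x_i}(a_i)\, \pi_{x_i'}^{a_i} = 0, \quad \forall x_i, i, && \text{(13b)} \\ & \sum_{x_i} \sum_{a_i} \pi_{x_i}^{a_i} = 1, \quad \forall i, && \text{(13c)} \\ & \sum_{i=1}^{N} \sum_{x_i} \pi_{x_i}^{1} \le M, && \text{(13d)} \\ & \pi_{x_i}^{a_i} \ge 0, \quad \forall x_i, a_i, i. && \text{(13e)} \end{aligned}$$
The corresponding dual problem is
$$\begin{aligned} \underset{\sigma, \sigma_i, \sigma_{x_i}}{\text{maximize}} \quad & \sum_{i=1}^{N} \sigma_i - M \sigma && \text{(14a)} \\ \text{subject to} \quad & \sigma_{x_i} + \sigma_i - \sum_{x_i'} P_{x_i, x_i'}(0)\, \sigma_{x_i'} \le C(x_i), \quad \forall x_i, i, && \text{(14b)} \\ & \sigma_{x_i} + \sigma_i - \sum_{x_i'} P_{x_i, x_i'}(1)\, \sigma_{x_i'} - \sigma \le C(x_i), \quad \forall x_i, i, && \text{(14c)} \\ & \sigma \ge 0. && \text{(14d)} \end{aligned}$$
Let $\{\bar{\pi}_{x_i}^{a_i}\}$ and $\{\bar{\sigma}, \bar{\sigma}_i, \bar{\sigma}_{x_i}\}$ be the optimal primal and dual solution pair to the problems reported in (13) and (14). We define
$$\bar{\psi}_{x_i}^{0} = \sum_{x_i'} P_{x_i, x_i'}(0)\, \bar{\sigma}_{x_i'} + C(x_i) - \bar{\sigma}_i - \bar{\sigma}_{x_i} \ge 0,$$
$$\bar{\psi}_{x_i}^{1} = \sum_{x_i'} P_{x_i, x_i'}(1)\, \bar{\sigma}_{x_i'} + \bar{\sigma} + C(x_i) - \bar{\sigma}_i - \bar{\sigma}_{x_i} \ge 0.$$
For any state $x = (x_1, \ldots, x_N)$, let $h(x) = \sum_{i=1}^{N} \mathbb{1}\{ \bar{\pi}_{x_i}^{1} > 0 \}$. Then, the heuristic operates in the following way
  • If $h(x) \ge M$, the base station will choose the $M$ users with the largest $\bar{\psi}_{x_i}^{0}$ among the $h(x)$ users.
  • If $h(x) < M$, these $h(x)$ users are chosen by the base station. The base station will choose $M - h(x)$ additional users with the smallest $\bar{\psi}_{x_i}^{1}$.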
However, Linear Programming (LP) is a very general technique and does not appear to take advantage of the special structure of the problem. Although there are algorithms for solving rational LP that take time polynomial in the number of variables and constraints, they run extremely slowly in practice [29]. For our problem, we notice that the users have separate activity areas that are linked through a common resource constraint. Therefore, the primal problem can be solved using Dantzig-Wolfe decomposition. Even so, the problem is still computationally demanding when the system scales up. We recall that we solved the exact problem efficiently using MDP-specific algorithms in Section 5. It is more efficient because of the following reasons
  • According to Proposition 6, we can decompose the problem into N subproblems.
  • For each subproblem, the threshold structure of the optimal policy is utilized to reduce the running time of RVI.
  • As we will see later, the developed policy can be obtained directly from the result of RVI in practice.
In the following, we will translate the results in Section 5 into the optimal primal and dual solution pair and propose Indexed priority policy.

6.2. Indexed Priority Policy

We first define the Lagrangian function associated with (13).
$$\begin{aligned} \mathcal{L}(\pi_{x_i}^{a_i}, \sigma, \sigma_i, \sigma_{x_i}, \psi_{x_i}^{a_i}) = {} & \sum_{i=1}^{N} \sum_{x_i, a_i} C(x_i)\, \pi_{x_i}^{a_i} + \sum_{i, x_i} \sigma_{x_i} \left( \sum_{x_i'} \sum_{a_i} P_{x_i', x_i}(a_i)\, \pi_{x_i'}^{a_i} - \pi_{x_i}^{0} - \pi_{x_i}^{1} \right) \\ & + \sum_{i=1}^{N} \sigma_i \left( 1 - \sum_{x_i} \sum_{a_i} \pi_{x_i}^{a_i} \right) + \sigma \left( \sum_{i=1}^{N} \sum_{x_i} \pi_{x_i}^{1} - M \right) - \sum_{i, x_i, a_i} \psi_{x_i}^{a_i}\, \pi_{x_i}^{a_i}. \end{aligned}$$
Then, the corresponding Lagrangian dual function is
$$g(\sigma, \sigma_i, \sigma_{x_i}, \psi_{x_i}^{a_i}) = \inf_{\pi_{x_i}^{a_i}} \mathcal{L}(\pi_{x_i}^{a_i}, \sigma, \sigma_i, \sigma_{x_i}, \psi_{x_i}^{a_i}).$$
Let $\pi_{x_i}$ be the expected fraction of time that user $i$ is at state $x_i$ under $\phi_{\lambda^*}$, where $\phi_{\lambda^*}$ is the optimal policy detailed in Theorem 3. Then, we define $\{\pi_{x_i}^{a_i}\}$ as follows
  • If $x_i$ is a state where randomization happens (that is, the actions suggested by the two optimal deterministic policies are different), we set $\pi_{x_i}^{0} = a_{\mathbf{n}_{\lambda_-^*, i}}(x_i)\,(1 - \mu_i)\, \pi_{x_i} + a_{\mathbf{n}_{\lambda_+^*, i}}(x_i)\, \mu_i\, \pi_{x_i}$ and $\pi_{x_i}^{1} = \pi_{x_i} - \pi_{x_i}^{0}$, where $\mu_i$ is given by (12) and $a_{\mathbf{n}_{\lambda, i}}(x_i)$ is the action suggested by $\mathbf{n}_{\lambda, i}$ at state $x_i$.
  • For all other $x_i$, we set $\pi_{x_i}^{0} = (1 - a_{\mathbf{n}_{\lambda_-^*, i}}(x_i))\, \pi_{x_i}$ and $\pi_{x_i}^{1} = \pi_{x_i} - \pi_{x_i}^{0}$.
We also define $\sigma = \lambda^*$, $\sigma_i = \theta_i$, and $\sigma_{x_i} = V_i(x_i)$ where $\lambda^*$ is specified in Section 5.2, $\theta_i$ is the optimal value of $\mathcal{M}_1^i(\lambda^*, -1)$, and $V_i(x_i)$ is the value function associated with $\mathcal{M}_1^i(\lambda^*, -1)$. Lastly, we define $\{\psi_{x_i}^{a_i}\}$ as follows
$$\psi_{x_i}^{0} = \sum_{x_i'} P_{x_i, x_i'}(0)\, \sigma_{x_i'} + C(x_i) - \sigma_i - \sigma_{x_i},$$
$$\psi_{x_i}^{1} = \sum_{x_i'} P_{x_i, x_i'}(1)\, \sigma_{x_i'} + \sigma + C(x_i) - \sigma_i - \sigma_{x_i}.$$
Then, we can prove the following proposition.
Proposition 7 (Optimal solution pair).
$\{\pi_{x_i}^{a_i}\}$ and $\{\sigma, \sigma_i, \sigma_{x_i}, \psi_{x_i}^{a_i}\}$ are optimal primal and dual solutions to (13), respectively.
Proof. 
Since (13) is linear and strictly feasible, it is sufficient to show that $\{\pi_{x_i}^{a_i}\}$ and $\{\sigma, \sigma_i, \sigma_{x_i}, \psi_{x_i}^{a_i}\}$ verify the KKT conditions, which can be expressed as the following four conditions.
  • Primal feasibility: the constraints in (13) are satisfied.
  • Dual feasibility: $\sigma \ge 0$ and $\psi_{x_i}^{a_i} \ge 0$ for all $x_i$, $a_i$, and $i$.
  • Complementary slackness: $\sigma \left( \sum_{i=1}^{N} \sum_{x_i} \pi_{x_i}^{1} - M \right) = 0$ and $\psi_{x_i}^{a_i}\, \pi_{x_i}^{a_i} = 0$ for all $x_i$, $a_i$, and $i$.
  • Stationarity: the gradient of $\mathcal{L}(\pi_{x_i}^{a_i}, \sigma, \sigma_i, \sigma_{x_i}, \psi_{x_i}^{a_i})$ with respect to $\{\pi_{x_i}^{a_i}\}$ vanishes.
Clearly, the first condition is satisfied by $\{\pi_{x_i}^{a_i}\}$. For the second condition, $\sigma \ge 0$ since $\sigma = \lambda^* \ge 0$ by definition. For $\psi_{x_i}^{a_i}$, we can verify that $\psi_{x_i}^{a_i} = V_i^{a_i}(x_i) - V_i(x_i)$ where $V_i^{a_i}(x_i)$ is the value function resulting from taking action $a_i$ at state $x_i$. Then, the non-negativity is guaranteed by the Bellman equation. For the third condition, the first term is zero because we choose the $\mu_i$’s given by (12). For the second term, we recall that $\psi_{x_i}^{a_i} = V_i^{a_i}(x_i) - V_i(x_i)$. According to the definition of $\pi_{x_i}^{a_i}$, we know $V_i(x_i) = V_i^{a_i}(x_i)$ if $\pi_{x_i}^{a_i} > 0$. Combined together, we can conclude that $\psi_{x_i}^{a_i} = 0$ when $\pi_{x_i}^{a_i} > 0$. Thus, the third condition is satisfied. For the last condition, setting the gradient equal to zero yields a system of linear equations. More precisely, for each $x_i$ and $1 \le i \le N$
$$\begin{cases} \sum_{x_i'} P_{x_i, x_i'}(0)\, \sigma_{x_i'} + C(x_i) = \sigma_{x_i} + \sigma_i + \psi_{x_i}^{0}, \\ \sum_{x_i'} P_{x_i, x_i'}(1)\, \sigma_{x_i'} + \sigma + C(x_i) = \sigma_{x_i} + \sigma_i + \psi_{x_i}^{1}. \end{cases}$$
Then, $\{\sigma, \sigma_i, \sigma_{x_i}, \psi_{x_i}^{a_i}\}$ verifies the system of linear equations by definition. Since all four conditions are satisfied, we can conclude our proof.    ☐
According to Proposition 7, we know that $\{\pi_{x_i}^{a_i}\}$ and $\{\sigma, \sigma_i, \sigma_{x_i}\}$ defined above are the optimal solutions to problems (13) and (14), respectively. As the optimal solutions are obtained, we can adopt the heuristic detailed in Section 6.1.
The heuristic can be expressed equivalently as an index policy. To this end, we define the index $I_{x_i}$ for state $x_i$ as
$$I_{x_i} \triangleq \bar{\psi}_{x_i}^{0} - \bar{\psi}_{x_i}^{1}.$$
According to the complementary slackness, $I_{x_i}$ can be reduced to the following.
  • For state $x_i$ such that $\bar{\pi}_{x_i}^{1} > 0$ and $\bar{\pi}_{x_i}^{0} = 0$, we have $\bar{\psi}_{x_i}^{1} = 0$. Therefore, $I_{x_i} = \bar{\psi}_{x_i}^{0} \ge 0$.
  • For state $x_i$ such that $\bar{\pi}_{x_i}^{1} > 0$ and $\bar{\pi}_{x_i}^{0} > 0$, we have $\bar{\psi}_{x_i}^{1} = \bar{\psi}_{x_i}^{0} = 0$. Therefore, $I_{x_i} = 0$.
  • For state $x_i$ such that $\bar{\pi}_{x_i}^{1} = 0$ and $\bar{\pi}_{x_i}^{0} > 0$, we have $\bar{\psi}_{x_i}^{0} = 0$. Therefore, $I_{x_i} = -\bar{\psi}_{x_i}^{1} \le 0$.
We can show that I x i possesses the following properties.
Proposition 8 (Properties of I x i ).
For $1 \le i \le N$, $I_{x_i} \ge -\lambda^*$ for any $x_i$. The equality holds when $\hat{r}_i = p_{e,i}^{0} = 0$ or $s_i = 0$. At the same time, $I_{x_i}$ is non-decreasing in both $s_i$ and $\hat{r}_i$.
Proof. 
We notice that $I_{x_i}$ can be expressed as a function of $V_i(x_i)$ and $\lambda^*$. Meanwhile, $\mathcal{M}_1^i(\lambda^*, -1)$ coincides with the decoupled model studied in Section 4.2. Then, we can verify the properties of $I_{x_i}$ using the results in Section 4.2. The complete proof can be found in Appendix L.    ☐
Comparing with the heuristic detailed in Section 6.1, we can define the Indexed priority policy.
Definition 5 (Indexed priority policy).
At any state $x = (x_1, x_2, \ldots, x_N)$, the base station will transmit the updates from the $M$ users with the largest $I_{x_i}$. Ties are broken arbitrarily.
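Both Whittle’s index policy and the Indexed priority policy reduce to the same selection step once the per-user indices are known. A minimal sketch follows, where `schedule` is a hypothetical helper and ties are broken in favor of the lower user number, which is one arbitrary choice among many.

```python
def schedule(indices, M):
    """Return the (sorted) user numbers of the M users with the largest index."""
    # sorted() is stable, so equal indices keep their original user order.
    order = sorted(range(len(indices)), key=lambda i: -indices[i])
    return sorted(order[:M])

# Four users, two transmissions per slot (index values are illustrative).
print(schedule([0.5, 2.0, -1.0, 2.0], 2))
```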
Remark 7.
Indexed priority policy belongs to the class of priority policies introduced in [30]. These priority policies are asymptotically optimal when certain conditions are satisfied.
Remark 8.
Indexed priority policy possesses the structural properties detailed in Corollary 1.
  • The first two properties can be verified by noting that $I_{x_i} \ge -\lambda^*$ and the equality holds when $\hat{r}_i = p_{e,i}^{0} = 0$ or $s_i = 0$. At the same time, $I_{x_i}$ is non-decreasing in $\hat{r}_i$.
  • The third and fourth properties can be verified by noting that $I_{x_i}$ is non-decreasing in $s_i$.
  • For the last property, we first notice that $I_{x_j} = I_{x_k}$ when users $j$ and $k$ are statistically identical and $x_j = x_k$. Then, the property can be verified by noting that $I_{x_i}$ is non-decreasing in both $s_i$ and $\hat{r}_i$.
We notice that $\theta_i$’s and $C(x_i)$’s are canceled out by the definition of $I_{x_i}$. Therefore, $I_{x_i}$ can be calculated using $\lambda^*$ and the value function of $\mathcal{M}_1^i(\lambda^*, -1)$. In practice, we can use either $\lambda_-^*$ or $\lambda_+^*$ to approximate $\lambda^*$, and the value function can be approximated by the result of the RVI detailed in Section 5.1. Since the state space is infinite, we only calculate a finite number of $V_i(x_i)$, the number of which depends on the truncation parameter $m$ of ASM. Meanwhile, the probabilities $P_{x_i, x_i'}(a_i)$ in $I_{x_i}$ are modified according to (10).
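After the cancellation, the index is the difference between the expected next-state values under the two actions, minus $\lambda^*$. The sketch below illustrates this under assumptions that are not part of this section: `priority_index` is a hypothetical helper, `EV[s]` denotes the value function averaged over $\hat{r}$ (for example taken from an RVI run on the truncated model), and the transition model is our reading of the system model, in which $s > 0$ grows to $s + 1$ with probability $1 - p$ when idle, $\alpha$ when transmitting with $\hat{r} = 1$, and $\beta$ when transmitting with $\hat{r} = 0$, and returns to 0 otherwise.

```python
def priority_index(s, r_hat, EV, lam, p, alpha, beta):
    """Index I_x = psi^0 - psi^1 at state x = (s, r_hat) for one user."""
    m = len(EV) - 1  # truncation level of the value table
    if s == 0:
        return -lam  # both actions induce the same transition at s = 0
    up_idle = 1 - p
    up_tx = alpha if r_hat == 1 else beta
    nxt = min(s + 1, m)  # truncated transition, as in (10)
    # Difference of expected next-state values, minus the transmission cost.
    return (up_idle - up_tx) * (EV[nxt] - EV[0]) - lam

# Toy increasing value table, used only to exercise the formula.
EV = [float(k) for k in range(50)]
print([priority_index(s, 1, EV, 1.5, 0.3, 0.34, 0.66) for s in range(5)])
```

With $p_e^0 = 0$, where $\beta = 1 - p$ under the assumed dynamics, the index at $\hat{r} = 0$ equals $-\lambda^*$, in line with Proposition 8.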

7. Numerical Results

In this section, we provide numerical results to showcase the performance of the developed scheduling policies. To eliminate the effect of $N$, we plot the expected average AoII. In particular, we provide the expected average AoII achieved by the Indexed priority policy and Whittle’s index policy when $M = 1$. The policies are calculated using the results detailed in Section 4, Section 5 and Section 6. When obtaining the Indexed priority policy, we set the tolerance in the Bisection search to $\xi = 0.005$. Meanwhile, we choose the truncation parameter in ASM $m = 800$ and the convergence criterion in RVI $\epsilon = 0.01$. We notice that the calculation of Whittle’s index involves an infinite sum. In practice, we approximate the result by replacing $+\infty$ with a large enough number $k_{max}$. Here, we choose $k_{max} = 800$. For both scheduling policies, the resulting expected average AoII is obtained via simulations. Each data point is the average of 15 runs with 15,000 time slots considered in each run.
We also compare the developed policies with the optimal policy for RP, which can be calculated by following the discussion in Section 5.2. We adopt the same choices of parameters as we used to obtain the developed policies. The corresponding performance is calculated using Proposition 2. Like before, the infinite sum is approximated by replacing $+\infty$ with $k_{max} = 800$. We also provide the expected average AoII achieved by the Greedy policy to show the performance advantages of the developed policies. When the Greedy policy is adopted, the base station always chooses the user with the largest AoII. The resulting expected average AoII is obtained via the same simulations as applied to the developed policies.
Figure 3 and Figure 4 illustrate the performance when the source processes have different dynamics and when each user’s communication goal is different, respectively. Figure 3a provides the performance when $p_i = 0.05 + \frac{0.4(i-1)}{N-1}$ for $1 \le i \le N$. For other parameters, the users make the same choices. More precisely, $f_i(s) = s$, $\gamma_i = 0.6$, and $p_{e,i}^{0} = p_{e,i}^{1} = 0.1$ for $1 \le i \le N$. Figure 4a provides the performance when $f_i(s) = s^{0.5 + \frac{i-1}{N-1}}$ for $1 \le i \le N$. Same as before, the users make the same choices for other parameters. More precisely, $p_i = 0.3$, $\gamma_i = 0.6$, and $p_{e,i}^{0} = p_{e,i}^{1} = 0.1$ for $1 \le i \le N$. In Figure 3b and Figure 4b, we force $p_{e,i}^{0} = 0$ for all users to ensure the existence of Whittle’s index. Other choices remain the same as in Figure 3a and Figure 4a. According to Corollary 1, the optimal policy will never choose the user with $\hat{r} = p_e^0 = 0$ unless it is to break the tie. Therefore, in Figure 3b and Figure 4b, we also consider the Greedy+ policy where the base station always chooses the user with the largest AoII among the users with $\hat{r} = 1$. The resulting expected average AoII is obtained via the same simulations as applied to the Greedy policy.
Figure 5 shows the performance in systems where the parameters for each user are generated uniformly and randomly within their ranges. In Figure 5a, we consider $N = 5$, $\gamma \in [0, 1]$, $p \in [0.05, 0.45]$, $p_e^{\hat{r}} \in [0, 0.45]$, and $f(s) = s^{\tau}$, where $\tau \in [0.5, 1.5]$. There are a total of 300 different choices and the results are sorted by the performance of RP-optimal policy in ascending order. Figure 5b adopts the same system settings except that we impose $p_{e,i}^{0} = 0$ for $1 \le i \le N$ to ensure the feasibility of Whittle’s index policy. Meanwhile, we ignore the Greedy policy since the Greedy+ policy achieves a better performance, as indicated by Figure 3b and Figure 4b.
We can make the following observations from the figures.
  • The Greedy+ policy yields a smaller expected average AoII than that achieved by the Greedy policy. Recall that we obtained the Greedy+ policy by applying the structural properties detailed in Corollary 1. Therefore, simple applications of the structural properties of the optimal policy can improve the performance of scheduling policies.
  • The Indexed priority policy has comparable performance to Whittle’s index policy in all the system settings considered. The two policies have their own advantages. The Indexed priority policy has a broader scope of application, while Whittle’s index policy has a lower computational complexity.
  • The performance of the Indexed priority policy and Whittle’s index policy is better than that of the Greedy/Greedy+ policies and is not far from the performance of the RP-optimal policy. Recall that the performance of the RP-optimal policy forms a universal lower bound on the performance of all admissible policies for PP. Hence, we can conclude that both the Indexed priority policy and Whittle’s index policy achieve good performances.

8. Conclusions

In this paper, we studied the problem of minimizing the Age of Incorrect Information in a slotted-time system where a base station needs to schedule M users among N available users and has access to imperfect channel state information in each time slot. The problem is a restless multi-armed bandit problem, which is PSPACE-hard. However, by casting the problem as a Markov decision process, we obtained the structural properties of the optimal policy. Then, we introduced a relaxed version of the original problem and investigated the decoupled model. Under a simple condition, we established the indexability of the decoupled problem and obtained the expression of Whittle’s index. On this basis, we developed Whittle’s index policy. To remove the requirement for indexability, we developed the Indexed priority policy based on the optimal policy for the relaxed problem. The characteristics of the relaxed problem were explored to make the calculation of its optimal policy more efficient. Finally, through numerical results, we showed that simple applications of the structural properties can improve the performance of scheduling policies. Moreover, Whittle’s index policy and the Indexed priority policy achieve good and comparable performances.

Author Contributions

Formal analysis, Y.C.; Investigation, Y.C.; Methodology, Y.C.; Supervision, A.E.; Validation, Y.C.; Writing—original draft, Y.C.; Writing—review & editing, Y.C. and A.E. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Lemma 1

We consider two states, $x_1$ and $x_2$, that differ only in the value of $s_j$. Without loss of generality, we assume $s_{1,j} < s_{2,j}$. Then, it is sufficient to show that, for any $1 \le j \le N$, $V(x_1) \le V(x_2)$. Leveraging the iterative nature of VIA, we use mathematical induction to prove the monotonicity. First of all, the base case (i.e., $\nu = 0$) is true by initialization. We assume the lemma holds at iteration $\nu$. Then, we want to examine whether it holds at iteration $\nu + 1$. The update step reported in problem (5) can be rewritten as follows.
$$V_{\nu+1}(x) = \min_{a \in \mathcal{A}_N(1)} V^{a}_{\nu+1}(x), \tag{A1}$$
where
$$V^{a}_{\nu+1}(x) = C(x) - \theta + \sum_{x' \setminus \{x'_j\}} \prod_{i \neq j} P_{x_i, x'_i}(a_i) \sum_{\hat{r}'_j} P(\hat{r}'_j)\, U^{j}_{\nu}(x, x'),$$
$$U^{j}_{\nu}(x, x') = \sum_{s'_j} P_{s_j, s'_j}(a_j, \hat{r}_j)\, V_{\nu}(x').$$
To prove the desired results, we distinguish between the following cases.
  • We first consider the case of $s_{1,j} = 0 < s_{2,j}$ and $\hat{r}_{1,j} = \hat{r}_{2,j} = 0$. When $a_j = 1$ and for any $x' \setminus \{s'_j\}$, we have
    $$U^{j}_{\nu}(x_1, x') = p_j V_{\nu}(x'; s'_j = 1) + (1 - p_j) V_{\nu}(x'; s'_j = 0),$$
    $$U^{j}_{\nu}(x_2, x') = \beta_j V_{\nu}(x'; s'_j = s_{2,j} + 1) + (1 - \beta_j) V_{\nu}(x'; s'_j = 0),$$
    where $V_{\nu}(x'; s'_j = 0)$ is the estimated value function of the state $x'$ with $s'_j = 0$ at iteration $\nu$ (at the risk of abusing the notation, we use $V(x; s_j = s_1)$ and $V(x; s_j = s_2)$ to represent the value functions of two states that differ only in the value of $s_j$). Then, we get
    $$U^{j}_{\nu}(x_1, x') - U^{j}_{\nu}(x_2, x') \le (p_j - \beta_j)\left( V_{\nu}(x'; s'_j = 1) - V_{\nu}(x'; s'_j = 0) \right) \le 0.$$
    The inequalities hold since $\beta_j > p_j$ and Lemma 1 is true at iteration $\nu$ by assumption. Therefore, we have $U^{j}_{\nu}(x_1, x') \le U^{j}_{\nu}(x_2, x')$ when $a_j = 1$ for any $x' \setminus \{s'_j\}$.
    For the case of $a_i = 1$ where $i \neq j$, we notice that $a_j = 0$. Then, for any $x' \setminus \{s'_j\}$, we obtain
    $$U^{j}_{\nu}(x_1, x') = p_j V_{\nu}(x'; s'_j = 1) + (1 - p_j) V_{\nu}(x'; s'_j = 0),$$
    $$U^{j}_{\nu}(x_2, x') = (1 - p_j) V_{\nu}(x'; s'_j = s_{2,j} + 1) + p_j V_{\nu}(x'; s'_j = 0).$$
    Therefore, when $a_i = 1$, we have
    $$U^{j}_{\nu}(x_1, x') - U^{j}_{\nu}(x_2, x') \le (2 p_j - 1)\left( V_{\nu}(x'; s'_j = 1) - V_{\nu}(x'; s'_j = 0) \right) \le 0.$$
    The inequalities hold since $2 p_j - 1 < 0$ and Lemma 1 is true at iteration $\nu$ by assumption. Combining with the case of $a_j = 1$, $U^{j}_{\nu}(x_1, x') \le U^{j}_{\nu}(x_2, x')$ holds for any $x' \setminus \{s'_j\}$ under any feasible action. Since $x_1$ and $x_2$ differ only in the value of $s_j$ and $C(x)$ is non-decreasing in $s_i$ for $1 \le i \le N$, we can see that $V^{a}_{\nu+1}(x_1) \le V^{a}_{\nu+1}(x_2)$ for any feasible $a$. Then, by (A1), we can conclude that the lemma holds at iteration $\nu + 1$ when $s_{1,j} = 0 < s_{2,j}$ and $\hat{r}_{1,j} = \hat{r}_{2,j} = 0$.
  • When $s_{1,j} = 0 < s_{2,j}$ and $\hat{r}_{1,j} = \hat{r}_{2,j} = 1$, by replacing the $\beta_j$’s in the above case with $\alpha_j$’s, we can achieve the same result.
  • When $0 < s_{1,j} < s_{2,j}$ and $\hat{r}_{1,j} = \hat{r}_{2,j}$, we notice that
    $$P_{s_{1,j}, s_{1,j}+1}(a_j, \hat{r}_{1,j}) = P_{s_{2,j}, s_{2,j}+1}(a_j, \hat{r}_{2,j}), \qquad P_{s_{1,j}, 0}(a_j, \hat{r}_{1,j}) = P_{s_{2,j}, 0}(a_j, \hat{r}_{2,j}).$$
    Then, leveraging the monotonicity of $V_{\nu}(x)$ and $C(x)$, we can conclude with the same result.
Combining the three cases, we prove that the lemma also holds at iteration $\nu + 1$ of VIA. Therefore, the lemma holds at any iteration $\nu$ by mathematical induction. Since the results hold for any $1 \le j \le N$ and VIA is guaranteed to converge to the value function when $\nu \to +\infty$, we can conclude our proof.

Appendix B. Proof of Lemma 2

We inherit the notations in the proof of Lemma 1. We still use mathematical induction to obtain the desired results. The base case $\nu = 0$ is true by initialization. We assume the lemma holds at iteration $\nu$ and examine whether it still holds at iteration $\nu + 1$. In the case of $M = 1$, we rewrite (5) as
$$V_{\nu+1}(x) = \min_{1 \le j \le N} V^{j}_{\nu+1}(x), \tag{A2}$$
where
$$V^{j}_{\nu+1}(x) = C(x) - \theta + \sum_{x'} \prod_{i \neq j} P^{i}_{x_i, x'_i}(0)\, P^{j}_{x_j, x'_j}(1)\, V_{\nu}(x'), \tag{A3}$$
and $P^{i}_{x, x'}(a_i)$ is the probability that action $a_i$ will lead to state $x'$ when user $i$ is at state $x$. To get the desired results, we distinguish between the following cases
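  • We first show that $V^{j}_{\nu+1}(x) = V^{k}_{\nu+1}(\mathcal{P}(x))$. According to (A3), we have
    $$V^{j}_{\nu+1}(x) = C(x) - \theta + \sum_{x'} \prod_{i \neq j,k} P^{i}_{x_i, x'_i}(0)\, P^{k}_{x_k, x'_k}(0)\, P^{j}_{x_j, x'_j}(1)\, V_{\nu}(x').$$
    $$V^{k}_{\nu+1}(\mathcal{P}(x)) = C(\mathcal{P}(x)) - \theta + \sum_{\mathcal{P}(x)'} \prod_{i \neq j,k} P^{i}_{\mathcal{P}(x)_i, \mathcal{P}(x)'_i}(0)\, P^{k}_{\mathcal{P}(x)_k, \mathcal{P}(x)'_k}(1)\, P^{j}_{\mathcal{P}(x)_j, \mathcal{P}(x)'_j}(0)\, V_{\nu}(\mathcal{P}(x)').$$
    It is obvious that for any $\mathcal{P}(x)'$, there always exists $x''$ such that $\mathcal{P}(x'') = \mathcal{P}(x)'$. Then, we obtain
    $$\begin{aligned} V^{k}_{\nu+1}(\mathcal{P}(x)) &= C(\mathcal{P}(x)) - \theta + \sum_{\mathcal{P}(x'')} \prod_{i \neq j,k} P^{i}_{x_i, x''_i}(0)\, P^{k}_{x_j, \mathcal{P}(x'')_k}(1)\, P^{j}_{x_k, \mathcal{P}(x'')_j}(0)\, V_{\nu}(\mathcal{P}(x'')) \\ &= C(\mathcal{P}(x)) - \theta + \sum_{x''} \prod_{i \neq j,k} P^{i}_{x_i, x''_i}(0)\, P^{k}_{x_j, x''_j}(1)\, P^{j}_{x_k, x''_k}(0)\, V_{\nu}(x'') \\ &= C(\mathcal{P}(x)) - \theta + \sum_{x'} \prod_{i \neq j,k} P^{i}_{x_i, x'_i}(0)\, P^{k}_{x_j, x'_j}(1)\, P^{j}_{x_k, x'_k}(0)\, V_{\nu}(x'). \end{aligned}$$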
    The second equality follows from the definition of $\mathcal{P}(\cdot)$, the property of summation, and the assumption at iteration $\nu$. The last equality follows from the variable renaming. Then, by the definition of statistically identical, we have $P^{k}_{x_j, x'_j}(1) = P^{j}_{x_j, x'_j}(1)$, $P^{j}_{x_k, x'_k}(0) = P^{k}_{x_k, x'_k}(0)$, and $C(x) = C(\mathcal{P}(x))$. Therefore, we can conclude that $V^{j}_{\nu+1}(x) = V^{k}_{\nu+1}(\mathcal{P}(x))$.
  • Along the same lines, we can easily show that $V^{k}_{\nu+1}(x) = V^{j}_{\nu+1}(\mathcal{P}(x))$ and $V^{i}_{\nu+1}(x) = V^{i}_{\nu+1}(\mathcal{P}(x))$ for $i \neq j, k$.
Combining the above cases with (A2), we prove that $V_{\nu+1}(x) = V_{\nu+1}(\mathcal{P}(x))$. Then, by induction, we have $V_{\nu}(x) = V_{\nu}(\mathcal{P}(x))$ at any iteration $\nu$. Since VIA is guaranteed to converge to the value function when $\nu \to +\infty$, we can conclude our proof.

Appendix C. Proof of Theorem 1

For arbitrary $j$ and $k$,
$$\delta_{j,k}(x) = \sum_{x' \setminus \{x'_j, x'_k\}} \prod_{i \neq j,k} P_{x_i, x'_i}(0) \sum_{\hat{r}'_j, \hat{r}'_k} P(\hat{r}'_j) P(\hat{r}'_k)\, R_{j,k}(x, x'), \tag{A4}$$
where
$$R_{j,k}(x, x') = \sum_{s'_j, s'_k} \left[ P_{s_k, s'_k}(0, \hat{r}_k) P_{s_j, s'_j}(1, \hat{r}_j) - P_{s_k, s'_k}(1, \hat{r}_k) P_{s_j, s'_j}(0, \hat{r}_j) \right] V(x'). \tag{A5}$$
With this in mind, we will prove the properties one by one.
Property 1—$\delta_{j,k}(x) \le 0$ if $\hat{r}_k = p_{e,k}^{0} = 0$. The equality holds when $s_j = 0$ or $\hat{r}_j = p_{e,j}^{0} = 0$.
When $\hat{r}_k = p_{e,k}^{0} = 0$, transmitting the update from user $k$ will necessarily fail. Therefore, $P_{s_k, s'_k}(0, 0) = P_{s_k, s'_k}(1, 0)$ for any $s_k$ and $s'_k$. Then, we have
$$R_{j,k}(x, x') = \sum_{s'_k} P_{s_k, s'_k}(0, 0) \sum_{s'_j} \left[ P_{s_j, s'_j}(1, \hat{r}_j) - P_{s_j, s'_j}(0, \hat{r}_j) \right] V(x').$$
To identify the sign of $R_{j,k}(x, x')$, we distinguish between the following cases
  • When $s_j = 0$, we can easily show that $R_{j,k}(x, x') = 0$ for any $x' \setminus \{s'_j, s'_k\}$ by noticing that the two possible actions with respect to user $j$ (i.e., $a_j = 1$ and $a_j = 0$) are equivalent when $s_j = 0$. Since $\delta_{j,k}(x)$ is a linear combination of $R_{j,k}(x, x')$’s with non-negative coefficients, we can conclude that $\delta_{j,k}(x) = 0$ in this case.
  • When $s_j > 0$ and $\hat{r}_j = 1$, for any $x' \setminus \{s'_j, s'_k\}$, we have
    $$R_{j,k}(x, x') = \sum_{s'_k} P_{s_k, s'_k}(0, 0)\, (\alpha_j + p_j - 1) \left[ V(x'; s'_j = s_j + 1) - V(x'; s'_j = 0) \right] \le 0. \tag{A6}$$
    The inequality holds because of Lemma 1 and the fact that $\alpha_j + p_j < 1$. We recall that $\delta_{j,k}(x)$ is a linear combination of $R_{j,k}(x, x')$’s with non-negative coefficients. Then, we can conclude that $\delta_{j,k}(x) \le 0$ in this case.
  • When $s_j > 0$ and $\hat{r}_j = 0$, by replacing the $\alpha_j$ in (A6) with $\beta_j$, we can get the same result. In this case, the equality holds when $\beta_j + p_j = 1$, or, equivalently, $p_{e,j}^{0} = 0$.
Combining the cases, we prove the first property.
Property 2—$\delta_{j,k}(x)$ is non-increasing in $\hat{r}_j$ and is non-decreasing in $\hat{r}_k$ when $s_j, s_k > 0$. At the same time, $\delta_{j,k}(x)$ is independent of $\hat{r}_i$ for any $i \neq j, k$.
We first prove the monotonicity of $\delta_{j,k}(x)$ with respect to $\hat{r}_j$. To this end, we define $x_1$ and $x_2$ as two states that differ only in the value of $\hat{r}_j$. Without loss of generality, we assume $\hat{r}_{1,j} = 1$ and $\hat{r}_{2,j} = 0$. Then, we investigate the sign of $\delta_{j,k}(x_1) - \delta_{j,k}(x_2)$. We define $x_i \triangleq x_{1,i} = x_{2,i}$ for $i \neq j$. Then, according to (A4), $\delta_{j,k}(x_1) - \delta_{j,k}(x_2)$ can be written as
$$\delta_{j,k}(x_1) - \delta_{j,k}(x_2) = \sum_{x' \setminus \{x'_j, x'_k\}} \prod_{i \neq j,k} P_{x_i, x'_i}(0) \sum_{\hat{r}'_j, \hat{r}'_k} P(\hat{r}'_j) P(\hat{r}'_k) \left[ R_{j,k}(x_1, x') - R_{j,k}(x_2, x') \right].$$
Since $x_{1,k} = x_{2,k}$, we have $P_{s_{1,k}, s'_k}(a, \hat{r}_{1,k}) = P_{s_{2,k}, s'_k}(a, \hat{r}_{2,k})$ for any $s'_k$. We recall that the transition probability is independent of $\hat{r}$ when $a = 0$. Combining with the fact that $s_{1,j} = s_{2,j}$, we also have $P_{s_{1,j}, s'_j}(0, \hat{r}_{1,j}) = P_{s_{2,j}, s'_j}(0, \hat{r}_{2,j})$ for any $s'_j$. Combining together, we obtain
$$P_{s_{1,k}, s'_k}(1, \hat{r}_{1,k})\, P_{s_{1,j}, s'_j}(0, \hat{r}_{1,j}) = P_{s_{2,k}, s'_k}(1, \hat{r}_{2,k})\, P_{s_{2,j}, s'_j}(0, \hat{r}_{2,j}),$$
$$P_{s_{1,k}, s'_k}(0, \hat{r}_{1,k}) = P_{s_{2,k}, s'_k}(0, \hat{r}_{2,k}).$$
Leveraging the above two equations, we have
$$R_{j,k}(x_1, x') - R_{j,k}(x_2, x') = \sum_{s'_j, s'_k} P_{s_k, s'_k}(0, \hat{r}_k) \left[ P_{s_{1,j}, s'_j}(1, \hat{r}_{1,j}) - P_{s_{2,j}, s'_j}(1, \hat{r}_{2,j}) \right] V(x').$$
Consequently, we obtain
$$\delta_{j,k}(x_1) - \delta_{j,k}(x_2) = \sum_{x' \setminus \{x'_j\}} \prod_{i \neq j} P_{x_i, x'_i}(0) \sum_{\hat{r}'_j} P(\hat{r}'_j) \sum_{s'_j} \left[ P_{s_{1,j}, s'_j}(1, 1) - P_{s_{2,j}, s'_j}(1, 0) \right] V(x').$$
In the following, we characterize the sign of
$$R_1 \triangleq \sum_{s'_j} \left[ P_{s_{1,j}, s'_j}(1, 1) - P_{s_{2,j}, s'_j}(1, 0) \right] V(x').$$
As $s_{1,j} = s_{2,j} > 0$, for any $x' \setminus \{s'_j\}$, we have
$$R_1 = \left[ (1 - \alpha_j) - (1 - \beta_j) \right] V(x'; s'_j = 0) + (\alpha_j - \beta_j)\, V(x'; s'_j = s_{1,j} + 1) \le 0.$$
The inequality follows from Lemma 1 and the fact that $\beta_j > \alpha_j$. Since $\delta_{j,k}(x_1) - \delta_{j,k}(x_2)$ is a linear combination of $R_1$’s with non-negative coefficients, we can conclude that $\delta_{j,k}(x_1) \le \delta_{j,k}(x_2)$. Since $\hat{r}_{1,j} > \hat{r}_{2,j}$, we can see that $\delta_{j,k}(x)$ is non-increasing in $\hat{r}_j$.
In a very similar way, we can show that $\delta_{j,k}(x)$ is non-decreasing in $\hat{r}_k$. We recall that $\hat{r}_i$ will not affect the system dynamic if $a_i = 0$. Consequently, we can conclude that $\delta_{j,k}(x)$ is independent of $\hat{r}_i$ for any $i \neq j, k$.
Combining together, we prove the second property.
Property 3—$\delta_{j,k}(x) \le 0$ if $s_k = 0$. The equality holds when $s_j = 0$ or $\hat{r}_j = p_{e,j}^{0} = 0$.
Since the probabilities are non-negative, it is sufficient to show that $R_{j,k}(x, x')$ satisfies Property 3 for any $x' \setminus \{s'_j, s'_k\}$. More precisely, it is sufficient to show that $R_{j,k}(x, x') \le 0$ for any $x' \setminus \{s'_j, s'_k\}$ when $s_k = 0$ and the equality holds when $s_j = 0$ or $\hat{r}_j = p_{e,j}^{0} = 0$. We recall that $P_{s_k, s'_k}(1, \hat{r}_k) = P_{s_k, s'_k}(0, \hat{r}_k)$ for any $s'_k$ when $s_k = 0$. Hence, for any $x' \setminus \{s'_j, s'_k\}$, we have
$$R_{j,k}(x, x') = \sum_{s'_k} P_{s_k, s'_k}(0, \hat{r}_k) \sum_{s'_j} \left[ P_{s_j, s'_j}(1, \hat{r}_j) - P_{s_j, s'_j}(0, \hat{r}_j) \right] V(x').$$
Then, we investigate the following quantity for any $x' \setminus \{s'_j\}$
$$R_2 \triangleq \sum_{s'_j} \left[ P_{s_j, s'_j}(1, \hat{r}_j) - P_{s_j, s'_j}(0, \hat{r}_j) \right] V(x').$$
To this end, we distinguish between the following cases
  • When $s_j = 0$, we have $P_{s_j, s'_j}(1, \hat{r}_j) = P_{s_j, s'_j}(0, \hat{r}_j)$ for any $s'_j$. Thus, we conclude that $R_2 = 0$ for any $x' \setminus \{s'_j\}$. Consequently, $R_{j,k}(x, x') = 0$ for any $x' \setminus \{s'_j, s'_k\}$.
  • When $s_j > 0$ and $\hat{r}_j = 1$, for any $x' \setminus \{s'_j\}$, we have
    $$R_2 = (\alpha_j - 1 + p_j)\, V(x'; s'_j = s_j + 1) + (1 - \alpha_j - p_j)\, V(x'; s'_j = 0) \le 0. \tag{A7}$$
    The inequality follows from Lemma 1 and the fact that $\alpha_j + p_j < 1$. Thus, $R_{j,k}(x, x') \le 0$ for any $x' \setminus \{s'_j, s'_k\}$.
  • When $s_j > 0$ and $\hat{r}_j = 0$, by replacing the $\alpha_j$ in (A7) with $\beta_j$, we can get the same result. In this case, the equality holds when $\beta_j + p_j = 1$, or, equivalently, $p_{e,j}^{0} = 0$.
Combined together, we can conclude that Property 3 is true.
Property 4—$\delta_{j,k}(x)$ is non-increasing in $s_j$ if $\Gamma_j^{\hat{r}_j} \le \Gamma_k^{\hat{r}_k}$ and is non-decreasing in $s_k$ if $\Gamma_j^{\hat{r}_j} \ge \Gamma_k^{\hat{r}_k}$ when $s_j, s_k > 0$. We define $\Gamma_i^{1} \triangleq \frac{\alpha_i}{1 - p_i}$ and $\Gamma_i^{0} \triangleq \frac{\beta_i}{1 - p_i}$ for $1 \le i \le N$.
As we did in the proof of Property 3, it is sufficient to show that $R_{j,k}(x, x')$ satisfies Property 4 for any $x' \setminus \{s'_j, s'_k\}$. We recall that $R_{j,k}(x, x')$ depends on the values of $\hat{r}_j$ and $\hat{r}_k$. Therefore, we distinguish between the following cases
  • In the case of $\hat{r}_j = \hat{r}_k = 1$ and $s_j, s_k > 0$, for any $x' \setminus \{s'_j, s'_k\}$, (A5) can be written as
    $$\begin{aligned} R_{j,k}(x, x') ={}& \sum_{s'_j, s'_k} \left[ P_{s_k, s'_k}(0, 1) P_{s_j, s'_j}(1, 1) - P_{s_k, s'_k}(1, 1) P_{s_j, s'_j}(0, 1) \right] V(x') \\ ={}& \left[ p_k \alpha_j - (1 - p_j)(1 - \alpha_k) \right] V(x'; s'_j = s_j + 1; s'_k = 0) \\ &+ \left[ (1 - p_k)(1 - \alpha_j) - p_j \alpha_k \right] V(x'; s'_j = 0; s'_k = s_k + 1) \\ &+ \left[ (1 - p_k) \alpha_j - (1 - p_j) \alpha_k \right] V(x'; s'_j = s_j + 1; s'_k = s_k + 1) \\ &+ \left[ p_k (1 - \alpha_j) - p_j (1 - \alpha_k) \right] V(x'; s'_j = 0; s'_k = 0). \end{aligned}$$
    As we can verify
    $$p_k \alpha_j - (1 - p_j)(1 - \alpha_k) < \tfrac{1}{2}(p_k + p_j - 1) < 0,$$
    $$(1 - p_k)(1 - \alpha_j) - p_j \alpha_k > \tfrac{1}{2}(1 - p_k - p_j) > 0.$$
    We define $\Gamma_i^{1} \triangleq \frac{\alpha_i}{1 - p_i}$ and $\Gamma_i^{0} \triangleq \frac{\beta_i}{1 - p_i}$ for $1 \le i \le N$. Then, we have
    $$\Gamma_j^{1} \le \Gamma_k^{1} \;\Leftrightarrow\; (1 - p_k) \alpha_j - (1 - p_j) \alpha_k \le 0.$$
    Combining with Lemma 1, we can conclude that, for any $x' \setminus \{s'_j, s'_k\}$, $R_{j,k}(x, x')$ is non-increasing in $s_j$ if $\Gamma_j^{1} \le \Gamma_k^{1}$ and is non-decreasing in $s_k$ if $\Gamma_j^{1} \ge \Gamma_k^{1}$.
  • In the case of $\hat{r}_j = \hat{r}_k = 0$ and $s_j, s_k > 0$, by replacing the $\alpha$’s in the above case with $\beta$’s, we can conclude with the same result.
  • In the case of $\hat{r}_j = 1$, $\hat{r}_k = 0$, and $s_j, s_k > 0$, for any $x' \setminus \{s'_j, s'_k\}$, (A5) can be written as
    $$\begin{aligned} R_{j,k}(x, x') ={}& \sum_{s'_j, s'_k} \left[ P_{s_k, s'_k}(0, 0) P_{s_j, s'_j}(1, 1) - P_{s_k, s'_k}(1, 0) P_{s_j, s'_j}(0, 1) \right] V(x') \\ ={}& \left[ p_k \alpha_j - (1 - p_j)(1 - \beta_k) \right] V(x'; s'_j = s_j + 1; s'_k = 0) \\ &+ \left[ (1 - p_k)(1 - \alpha_j) - p_j \beta_k \right] V(x'; s'_j = 0; s'_k = s_k + 1) \\ &+ \left[ (1 - p_k) \alpha_j - (1 - p_j) \beta_k \right] V(x'; s'_j = s_j + 1; s'_k = s_k + 1) \\ &+ \left[ p_k (1 - \alpha_j) - p_j (1 - \beta_k) \right] V(x'; s'_j = 0; s'_k = 0). \end{aligned}$$
    As we can verify
    $$p_k \alpha_j - (1 - p_j)(1 - \beta_k) < p_k \left( p_j - \tfrac{1}{2} \right) < 0,$$
    $$(1 - p_k)(1 - \alpha_j) - p_j \beta_k > (1 - p_k) \left( \tfrac{1}{2} - p_j \right) > 0.$$
    At the same time
    $$\Gamma_j^{1} \le \Gamma_k^{0} \;\Leftrightarrow\; (1 - p_k) \alpha_j - (1 - p_j) \beta_k \le 0.$$
    Combined with Lemma 1, we can conclude that, for any $x' \setminus \{s'_j, s'_k\}$, $R_{j,k}(x, x')$ is non-increasing in $s_j$ if $\Gamma_j^{1} \le \Gamma_k^{0}$ and is non-decreasing in $s_k$ if $\Gamma_j^{1} \ge \Gamma_k^{0}$.
  • In the case of $\hat{r}_j = 0$, $\hat{r}_k = 1$, and $s_j, s_k > 0$, by swapping the $\alpha$’s and $\beta$’s in the above case, we can conclude with the same result.
Combined together, we conclude that $R_{j,k}(x, x')$ satisfies Property 4 for any $x' \setminus \{s'_j, s'_k\}$. Consequently, $\delta_{j,k}(x)$ is non-increasing in $s_j$ if $\Gamma_j^{\hat{r}_j} \le \Gamma_k^{\hat{r}_k}$ and is non-decreasing in $s_k$ if $\Gamma_j^{\hat{r}_j} \ge \Gamma_k^{\hat{r}_k}$ when $s_j, s_k > 0$.
Property 5—$\delta_{j,k}(x) \le 0$ if $s_j \ge s_k$, $\hat{r}_j \ge \hat{r}_k$ and users $j$ and $k$ are statistically identical.
According to Property 3, it is sufficient to consider the case where $s_j, s_k > 0$. We notice that the sign of $\delta_{j,k}(x)$ can be captured by the sign of the quantity $Q_{j,k}(x, x') \triangleq \sum_{\hat{r}'_j, \hat{r}'_k} P(\hat{r}'_j) P(\hat{r}'_k) R_{j,k}(x, x')$. Thus, we divide our discussion into the following cases.
  • We first consider the case of $s_j \ge s_k > 0$ and $\hat{r}_j = \hat{r}_k = 0$. Leveraging the definition of statistically identical, for any $x' \setminus \{x'_j, x'_k\}$, we have
    $$Q_{j,k}(x, x') = \sum_{\hat{r}'_j, \hat{r}'_k} P(\hat{r}'_j) P(\hat{r}'_k)\, \kappa_1 \left( V(x'; x'_j = (0, \hat{r}'_j); x'_k = (s_k + 1, \hat{r}'_k)) - V(x'; x'_j = (s_j + 1, \hat{r}'_j); x'_k = (0, \hat{r}'_k)) \right),$$
    where $\kappa_1 = 1 - p_j - \beta_j \ge 0$. Then, by substituting the values of $P(\hat{r}')$ and using Lemma 2, we obtain
    $$\begin{aligned} Q_{j,k}(x, x') ={}& \gamma_j \gamma_k \kappa_1 V(x'; x'_j = (s_k + 1, 1); x'_k = (0, 1)) - \gamma_j \gamma_k \kappa_1 V(x'; x'_j = (s_j + 1, 1); x'_k = (0, 1)) \\ &+ (1 - \gamma_j)(1 - \gamma_k) \kappa_1 V(x'; x'_j = (s_k + 1, 0); x'_k = (0, 0)) - (1 - \gamma_j)(1 - \gamma_k) \kappa_1 V(x'; x'_j = (s_j + 1, 0); x'_k = (0, 0)) \\ &+ \gamma_k (1 - \gamma_j) \kappa_1 V(x'; x'_j = (s_k + 1, 1); x'_k = (0, 0)) - \gamma_k (1 - \gamma_j) \kappa_1 V(x'; x'_j = (s_j + 1, 0); x'_k = (0, 1)) \\ &+ \gamma_j (1 - \gamma_k) \kappa_1 V(x'; x'_j = (s_k + 1, 0); x'_k = (0, 1)) - \gamma_j (1 - \gamma_k) \kappa_1 V(x'; x'_j = (s_j + 1, 1); x'_k = (0, 0)). \end{aligned}$$
    Since users $j$ and $k$ are statistically identical, we have $\gamma_j = \gamma_k$. Then, by Lemma 1, we have $Q_{j,k}(x, x') \le 0$ for any $x' \setminus \{x'_j, x'_k\}$. Since $\delta_{j,k}(x)$ is a linear combination of $Q_{j,k}(x, x')$’s with non-negative coefficients, we can conclude that $\delta_{j,k}(x) \le 0$.
  • For the case of $s_j \ge s_k > 0$ and $\hat{r}_j = \hat{r}_k = 1$, by replacing $\beta_j$ in $\kappa_1$ with $\alpha_j$, we can conclude with the same result.
  • Then, we consider the case of $s_j \ge s_k > 0$, $\hat{r}_j = 1$, and $\hat{r}_k = 0$. We first notice that, for any $x' \setminus \{s'_j, s'_k\}$
    $$\begin{aligned} R_{j,k}(x, x') ={}& \left[ p_k \alpha_j - (1 - p_j)(1 - \beta_k) \right] V(x'; s'_j = s_j + 1; s'_k = 0) \\ &+ \left[ (1 - p_k)(1 - \alpha_j) - p_j \beta_k \right] V(x'; s'_j = 0; s'_k = s_k + 1) \\ &+ \left[ (1 - p_k) \alpha_j - (1 - p_j) \beta_k \right] V(x'; s'_j = s_j + 1; s'_k = s_k + 1) \\ &+ \left[ p_k (1 - \alpha_j) - p_j (1 - \beta_k) \right] V(x'; s'_j = 0; s'_k = 0). \end{aligned}$$
    As users $j$ and $k$ are statistically identical, we have $p_j = p_k$ and $\alpha_j < \beta_k$. Leveraging Lemma 1, we have
    $$R_{j,k}(x, x') \le (\alpha_j + p_j - 1) \left( V(x'; s'_j = s_j + 1; s'_k = 0) - V(x'; s'_j = 0; s'_k = s_k + 1) \right).$$
    Then, for any $x' \setminus \{x'_j, x'_k\}$
    $$Q_{j,k}(x, x') \le \sum_{\hat{r}'_j, \hat{r}'_k} P(\hat{r}'_j) P(\hat{r}'_k)\, \kappa_2 \left( V(x'; x'_j = (0, \hat{r}'_j); x'_k = (s_k + 1, \hat{r}'_k)) - V(x'; x'_j = (s_j + 1, \hat{r}'_j); x'_k = (0, \hat{r}'_k)) \right),$$
    where $\kappa_2 = 1 - p_j - \alpha_j > 0$. As we did in the previous cases, we can leverage Lemmas 1 and 2 to conclude that $Q_{j,k}(x, x') \le 0$ for any $x' \setminus \{x'_j, x'_k\}$. Consequently, $\delta_{j,k}(x) \le 0$ in this case. The details are omitted for the sake of space.
Combined together, we conclude the proof of Property 5.

Appendix D. Proof of Corollary 2

We follow the same steps as in the proof of Lemma 1. To prove the corollary, it is sufficient to show that $V(x_1) \le V(x_2)$ when $s_1 < s_2$ and $\hat{r}_1 = \hat{r}_2$. We use mathematical induction to prove the monotonicity. First of all, the base case (i.e., $\nu = 0$) is true by initialization. We assume the corollary holds at iteration $\nu$. Then, we want to examine whether it holds at iteration $\nu + 1$. For the system with a single user, the update step reported in problem (5) can be simplified and rewritten as follows
$$V_{\nu+1}(x) = \min_{a \in \{0, 1\}} V^{a}_{\nu+1}(x), \tag{A8}$$
where
$$V^{a}_{\nu+1}(x) = C(x, a) - \theta + \sum_{\hat{r}'} P(\hat{r}') \sum_{s'} P_{s, s'}(a, \hat{r})\, V_{\nu}(x'),$$
and $\theta$ is the optimal value for $\mathcal{M}^{1}(\lambda, 1)$. To prove the desired results, we distinguish between the following cases
  • We first consider the case of $s_1 = 0 < s_2$ and $\hat{r}_1 = \hat{r}_2 = 0$. When $a = 1$, we have
    $$V^{1}_{\nu+1}(x_1) = C(x_1, 1) - \theta + \sum_{\hat{r}'} P(\hat{r}') \left( p V_{\nu}(1, \hat{r}') + (1 - p) V_{\nu}(0, \hat{r}') \right),$$
    $$V^{1}_{\nu+1}(x_2) = C(x_2, 1) - \theta + \sum_{\hat{r}'} P(\hat{r}') \left( \beta V_{\nu}(s_2 + 1, \hat{r}') + (1 - \beta) V_{\nu}(0, \hat{r}') \right).$$
    Subtracting the two expressions yields
    $$V^{1}_{\nu+1}(x_1) - V^{1}_{\nu+1}(x_2) \le C(x_1, 1) - C(x_2, 1) + \sum_{\hat{r}'} P(\hat{r}') (p - \beta) \left[ V_{\nu}(1, \hat{r}') - V_{\nu}(0, \hat{r}') \right] \le 0.$$
    The inequalities hold since $\beta > p$, $C(x, a)$ is non-decreasing in $s$, and Corollary 2 is true at iteration $\nu$ by assumption.
    For the case of $a = 0$, we obtain
    $$V^{0}_{\nu+1}(x_1) = C(x_1, 0) - \theta + \sum_{\hat{r}'} P(\hat{r}') \left( p V_{\nu}(1, \hat{r}') + (1 - p) V_{\nu}(0, \hat{r}') \right),$$
    $$V^{0}_{\nu+1}(x_2) = C(x_2, 0) - \theta + \sum_{\hat{r}'} P(\hat{r}') \left( (1 - p) V_{\nu}(s_2 + 1, \hat{r}') + p V_{\nu}(0, \hat{r}') \right).$$
    Therefore, when $a = 0$, we have
    $$V^{0}_{\nu+1}(x_1) - V^{0}_{\nu+1}(x_2) \le C(x_1, 0) - C(x_2, 0) + \sum_{\hat{r}'} P(\hat{r}') (2p - 1) \left[ V_{\nu}(1, \hat{r}') - V_{\nu}(0, \hat{r}') \right] \le 0.$$
    The inequalities hold since $2p - 1 < 0$, $C(x, a)$ is non-decreasing in $s$, and Corollary 2 is true at iteration $\nu$ by assumption. Combined together, we can see that $V^{a}_{\nu+1}(x_1) \le V^{a}_{\nu+1}(x_2)$ for any feasible $a$. Then, by (A8), we can conclude that the corollary holds at iteration $\nu + 1$ when $s_1 = 0 < s_2$ and $\hat{r}_1 = \hat{r}_2 = 0$.
  • When $s_1 = 0 < s_2$ and $\hat{r}_1 = \hat{r}_2 = 1$, by replacing the $\beta$’s in the above case with $\alpha$’s, we can achieve the same result.
  • When $0 < s_1 < s_2$ and $\hat{r}_1 = \hat{r}_2$, we notice that $P_{s_1, s_1 + 1}(a, \hat{r}_1) = P_{s_2, s_2 + 1}(a, \hat{r}_2)$ and $P_{s_1, 0}(a, \hat{r}_1) = P_{s_2, 0}(a, \hat{r}_2)$. Then, leveraging the monotonicity of $V_{\nu}(x)$ and $C(x, a)$, we can conclude with the same result.
Combining the three cases, we prove that the corollary holds at iteration $\nu + 1$ of VIA. Therefore, the corollary holds at any iteration $\nu$ by mathematical induction. Since VIA is guaranteed to converge to the value function when $\nu \to +\infty$, we can conclude our proof.
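The induction above can be checked numerically on a small instance. The following is a minimal sketch, not the authors' implementation: it truncates the AoII at a hypothetical cap `s_max`, normalizes each iterate by the value at state (0, 0) instead of subtracting θ (a common relative value iteration variant), and takes the cost to be f(s) + λa. The expressions used for α and β are inferred from the coefficients appearing in Appendix E and should be treated as assumptions.

```python
def relative_value_iteration(p, gamma, alpha, beta, lam, f, s_max=40, iters=500):
    """Relative value iteration for the single-user model, truncated at s_max.

    Returns V[s][r_hat]. The cap s_max and the normalization by V(0, 0)
    are assumptions made for this sketch.
    """
    V = [[0.0, 0.0] for _ in range(s_max + 1)]
    for _ in range(iters):
        # Value of landing at AoII s, averaged over the next estimate r_hat'.
        ev = [(1 - gamma) * V[s][0] + gamma * V[s][1] for s in range(s_max + 1)]
        new = []
        for s in range(s_max + 1):
            up = min(s + 1, s_max)  # truncated "AoII grows" transition
            row = []
            for r_hat in (0, 1):
                if s == 0:
                    grow_idle = grow_tx = p  # both actions are equivalent at s = 0
                else:
                    grow_idle = 1 - p
                    grow_tx = alpha if r_hat == 1 else beta
                q_idle = f(s) + grow_idle * ev[up] + (1 - grow_idle) * ev[0]
                q_tx = f(s) + lam + grow_tx * ev[up] + (1 - grow_tx) * ev[0]
                row.append(min(q_idle, q_tx))
            new.append(row)
        ref = new[0][0]
        V = [[v - ref for v in row] for row in new]
    return V


p, gamma, pe0, pe1 = 0.3, 0.6, 0.1, 0.1
alpha = p + pe1 * (1 - 2 * p)        # inferred: P(s -> s+1 | a = 1, r_hat = 1)
beta = (1 - p) - pe0 * (1 - 2 * p)   # inferred: P(s -> s+1 | a = 1, r_hat = 0)
V = relative_value_iteration(p, gamma, alpha, beta, lam=1.0, f=lambda s: float(s))
monotone = all(V[s][r] <= V[s + 1][r] + 1e-9
               for s in range(len(V) - 1) for r in (0, 1))
```

The flag `monotone` records whether the computed value function is non-decreasing in s for each fixed estimate, which is the property Corollary 2 asserts.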

Appendix E. Proof of Proposition 1

We define $\Delta V(x) \triangleq V^{1}(x) - V^{0}(x)$ where $V^{a}(x)$ is the value function resulting from taking action $a$ at state $x$. Then, $V^{a}(x)$ can be calculated as follows
$$V^{a}(x) = C(x, a) - \theta + \sum_{x' \in \mathcal{X}} P_{x, x'}(a)\, V(x'), \tag{A9}$$
where $\theta$ is the optimal value for $\mathcal{M}^{1}(\lambda, 1)$. Hence, the optimal action at state $x$ can be fully characterized by the sign of $\Delta V(x)$. More precisely, the optimal action at state $x$ is $a = 1$ if $\Delta V(x) < 0$, and $a = 0$ is optimal otherwise. To determine the sign of $\Delta V(x)$ for each state, we distinguish between the following cases
  • We first consider the state $x = (0, \hat{r})$. Applying the results in Section 2.3 to (A9), we obtain
    $$V^{0}(0, \hat{r}) = -\theta + (1 - \gamma)(1 - p) V(0, 0) + (1 - \gamma) p V(1, 0) + \gamma (1 - p) V(0, 1) + \gamma p V(1, 1),$$
    $$V^{1}(0, \hat{r}) = \lambda + V^{0}(0, \hat{r}).$$
    Therefore, $\Delta V(0, \hat{r}) = \lambda \ge 0$. Thus, the optimal action at state $(0, \hat{r})$ is $a = 0$.
  • Then, we consider the state $x = (s, 0)$ where $s > 0$. Applying the results in Section 2.3 to Equation (A9), we obtain
    $$V^{0}(s, 0) = f(s) - \theta + (1 - \gamma) p V(0, 0) + (1 - \gamma)(1 - p) V(s + 1, 0) + \gamma p V(0, 1) + \gamma (1 - p) V(s + 1, 1),$$
    $$V^{1}(s, 0) = f(s) + \lambda - \theta + (1 - \gamma)(1 - \beta) V(0, 0) + (1 - \gamma) \beta V(s + 1, 0) + \gamma (1 - \beta) V(0, 1) + \gamma \beta V(s + 1, 1).$$
    Then,
    $$\Delta V(s, 0) = \lambda + p_e^{0} (1 - 2p)\, \omega, \tag{A11}$$
    where $\omega = (1 - \gamma) [V(0, 0) - V(s + 1, 0)] + \gamma [V(0, 1) - V(s + 1, 1)] \le 0$.
  • Finally, we consider the state $x = (s, 1)$ where $s > 0$. Following the same trajectory, we have
    $$\Delta V(s, 1) = \lambda + (1 - p_e^{1})(1 - 2p)\, \omega.$$
According to Corollary 2 and the fact that $p < 0.5$, we can see that $\Delta V(s, 0)$ and $\Delta V(s, 1)$ are both a constant $\lambda$ plus a term that is non-increasing in $s$. As the time penalty function is unbounded, the value function must also be unbounded. Then, combining the three cases, we can conclude the following. For fixed $\hat{r}$, there always exists a threshold $n_{\hat{r}} > 0$ such that the optimal action at state $(s, \hat{r})$ where $s \ge n_{\hat{r}}$ is $a = 1$, otherwise $a = 0$ is optimal. Since $\hat{r} \in \{0, 1\}$, the optimal policy can be fully captured by the pair $(n_0, n_1)$.
In the following, we determine the relationship between $n_0$ and $n_1$. We have
$$\Delta V(s, 1) - \Delta V(s, 0) = (1 - p_e^{1} - p_e^{0})(1 - 2p)\, \omega \le 0.$$
At the same time, for the threshold $n_0$, we know $\Delta V(n_0, 0) < 0$. Then, we have $\Delta V(n_0, 1) \le \Delta V(n_0, 0) < 0$. Combined with the fact that $\Delta V(s, \hat{r})$ is non-increasing in $s$, we can conclude that the ordering $n_0 \ge n_1$ is true.
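For readers who want the intermediate step behind the expression for $\Delta V(s, 0)$, the subtraction can be written out as follows. This is a worked sketch based only on the two expressions for $V^{0}(s, 0)$ and $V^{1}(s, 0)$ given above; the closed form of $\beta$ in the last line is inferred by matching coefficients, not quoted from Section 2.3.

```latex
\begin{align*}
\Delta V(s,0) &= V^{1}(s,0) - V^{0}(s,0) \\
&= \lambda + (1-\gamma)\bigl[(1-\beta) - p\bigr]V(0,0) + (1-\gamma)\bigl[\beta - (1-p)\bigr]V(s+1,0) \\
&\quad + \gamma\bigl[(1-\beta) - p\bigr]V(0,1) + \gamma\bigl[\beta - (1-p)\bigr]V(s+1,1) \\
&= \lambda + (1-\beta-p)\Bigl((1-\gamma)\bigl[V(0,0)-V(s+1,0)\bigr] + \gamma\bigl[V(0,1)-V(s+1,1)\bigr]\Bigr) \\
&= \lambda + (1-\beta-p)\,\omega .
\end{align*}
% Matching with \Delta V(s,0) = \lambda + p_e^{0}(1-2p)\,\omega gives
% 1 - \beta - p = p_e^{0}(1-2p), i.e. \beta = (1-p) - p_e^{0}(1-2p).
```

The inferred relation agrees with the remark in Appendix C that $\beta_j + p_j = 1$ is equivalent to $p_{e,j}^{0} = 0$. The same steps with $\alpha$ in place of $\beta$ give $1 - \alpha - p = (1 - p_e^{1})(1 - 2p)$ for the state $(s, 1)$.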

Appendix F. Proof of Proposition 2

We notice that the dynamic of AoII under threshold policy can be fully captured by a Discrete-Time Markov Chain (DTMC). Then, the expected AoII $\bar{\Delta}_{\mathbf{n}}$ and the expected transmission rate $\bar{\rho}_{\mathbf{n}}$ under threshold policy $\mathbf{n} = (n_0, n_1)$ can be obtained from the stationary distribution of the induced DTMC. Let the states of the induced DTMC be the values of $s$. We recall that $\hat{r}$ is an independent Bernoulli random variable with parameter $\gamma$. Combined with the results in Section 2.3, we can easily obtain the state transition probabilities of the induced DTMC, which are shown in Figure A1.
Figure A1. DTMC induced by the threshold policy $\mathbf{n} = (n_0, n_1)$. In the figure, $c_1 = (1 - \gamma)(1 - p) + \gamma \alpha$ and $c_2 = (1 - \gamma) \beta + \gamma \alpha$.
The balance equations of the induced DTMC are the following
$$(1 - p) \pi_0 + p \sum_{k=1}^{n_1 - 1} \pi_k + (1 - c_1) \sum_{k=n_1}^{n_0 - 1} \pi_k + (1 - c_2) \sum_{k=n_0}^{+\infty} \pi_k = \pi_0.$$
$$p\, \pi_0 = \pi_1.$$
$$(1 - p)\, \pi_{k-1} = \pi_k \quad \text{for } 2 \le k \le n_1.$$
$$c_1 \pi_{k-1} = \pi_k \quad \text{for } n_1 + 1 \le k \le n_0.$$
$$c_2 \pi_{k-1} = \pi_k \quad \text{for } n_0 + 1 \le k.$$
$$\sum_{k=0}^{+\infty} \pi_k = 1.$$
Then, we can easily solve the above system of linear equations. After some algebraic manipulation, we obtain the following
$$\pi_0 = \frac{1}{2 + p (1 - p)^{n_1 - 1} \left[ \dfrac{1}{1 - c_1} - \dfrac{1}{p} + c_1^{\,n_0 - n_1} \left( \dfrac{1}{1 - c_2} - \dfrac{1}{1 - c_1} \right) \right]}.$$
$$\pi_k = p (1 - p)^{k - 1} \pi_0 \quad \text{for } 1 \le k \le n_1.$$
$$\pi_k = p (1 - p)^{n_1 - 1} c_1^{\,k - n_1} \pi_0 \quad \text{for } n_1 + 1 \le k \le n_0.$$
$$\pi_k = p (1 - p)^{n_1 - 1} c_1^{\,n_0 - n_1} c_2^{\,k - n_0} \pi_0 \quad \text{for } n_0 + 1 \le k.$$
Equipped with the above results, we proceed with calculating $\bar{\Delta}_{\mathbf{n}}$ and $\bar{\rho}_{\mathbf{n}}$. According to problem (6a), the expected AoII is:
$$\bar{\Delta}_{\mathbf{n}} = \sum_{k=0}^{+\infty} f(k)\, \pi_k.$$
Substituting the expressions of $\pi_k$’s, we can get the expression of $\bar{\Delta}_{\mathbf{n}}$. Proposition 1 tells us the following.
  • For state $(s, \hat{r})$ where $s < n_1$, it is optimal to stay idle (i.e., $a = 0$).
  • For state $(s, \hat{r})$ where $n_1 \le s < n_0$, it is optimal to make a transmission attempt only when $\hat{r} = 1$. We recall that $\hat{r}$ is an independent Bernoulli random variable with parameter $\gamma$. Therefore, the expected proportion of time that the system is at state $(s, 1)$ is $\gamma \pi_s$.
  • For state $(s, \hat{r})$ where $s \ge n_0$, it is optimal to make a transmission attempt regardless of $\hat{r}$.
Combined with problem (6b), we have
$$\bar{\rho}_{\mathbf{n}} = \gamma \sum_{k=n_1}^{n_0 - 1} \pi_k + \sum_{k=n_0}^{+\infty} \pi_k.$$
Substituting the expressions of $\pi_k$’s, we can obtain the closed-form expression of $\bar{\rho}_{\mathbf{n}}$.
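The closed-form stationary distribution can be evaluated directly, which also gives a quick way to sanity-check it against the first balance equation. The sketch below is illustrative only: the function names and the truncation level `k_max` are assumptions, and it reads the closed form as π_0 = 1 / (2 + p(1−p)^(n1−1) [1/(1−c1) − 1/p + c1^(n0−n1) (1/(1−c2) − 1/(1−c1))]) with geometric tails in c1 and c2, for finite thresholds 1 ≤ n1 ≤ n0.

```python
def stationary_distribution(p, c1, c2, n0, n1, k_max):
    """Closed-form pi_0..pi_{k_max} of the DTMC induced by threshold policy (n0, n1).

    Assumes finite thresholds with 1 <= n1 <= n0 and 0 < c1, c2 < 1.
    """
    bracket = (1 / (1 - c1) - 1 / p
               + c1 ** (n0 - n1) * (1 / (1 - c2) - 1 / (1 - c1)))
    pi0 = 1 / (2 + p * (1 - p) ** (n1 - 1) * bracket)
    pi = [pi0]
    for k in range(1, k_max + 1):
        if k <= n1:
            w = p * (1 - p) ** (k - 1)
        elif k <= n0:
            w = p * (1 - p) ** (n1 - 1) * c1 ** (k - n1)
        else:
            w = p * (1 - p) ** (n1 - 1) * c1 ** (n0 - n1) * c2 ** (k - n0)
        pi.append(w * pi0)
    return pi


def expected_aoii_and_rate(pi, gamma, n0, n1, f):
    """Truncated sums for the expected AoII and the expected transmission rate."""
    aoii = sum(f(k) * pk for k, pk in enumerate(pi))
    rate = gamma * sum(pi[n1:n0]) + sum(pi[n0:])
    return aoii, rate
```

Because the tail decays geometrically in c2, a few hundred terms are typically enough for the truncated sums to agree with the infinite ones to machine precision when c2 is well below 1.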

Appendix G. Proof of Proposition 4

We first tackle the Whittle’s indexes at state $(0, \hat{r})$ and $(s, 0)$ where $s > 0$. To this end, we distinguish between the following cases
  • We first consider the state $x = (0, \hat{r})$. By definition, Whittle’s index is the infimum $\lambda$ such that $V^{0}(x) = V^{1}(x)$. According to (A10), we can conclude that $W_x = 0$ when $x = (0, \hat{r})$.
  • Then, we consider the state $x = (s, 0)$ where $s > 0$. We recall that $p_e^{0} = 0$. Then, we can conclude, from (A11), that $W_x = 0$ for all $x = (s, 0)$ where $s > 0$.
Now, we tackle the Whittle’s index at state $x = (s, 1)$ where $s > 0$. For convenience, we denote by $W_n$ the Whittle’s index at state $x = (n, 1)$. According to the monotonicity of $\Delta V(x)$ shown in the proof of Proposition 1, we can conclude that threshold policy $\mathbf{n} = (+\infty, n + 1)$ is optimal when $V^{0}(n, 1) = V^{1}(n, 1)$. Then, we can prove the following
Lemma A1.
When (9) is satisfied and $V^{0}(n, 1) = V^{1}(n, 1)$, $V(s, 1) = V(s, 0) \triangleq V(s)$ for $0 \le s \le n$.
Proof. 
Since the value function satisfies the Bellman equation, it is sufficient to show that $V(s, 1)$ and $V(s, 0)$ satisfy the same Bellman equation. We recall that the Bellman equation for $V(x)$ is given by
$$V(x) = \min_{a \in \{0, 1\}} V^{a}(x),$$
where
$$V^{a}(x) = C(x, a) - \theta + \sum_{x'} P_{x, x'}(a)\, V(x'), \tag{A12}$$
and $\theta$ is the optimal value of the decoupled problem. We recall, from Corollary 3, that the optimal action at state $(s, 0)$ is staying idle (i.e., $a = 0$) for any $s$. We also know that threshold policy $\mathbf{n} = (+\infty, n + 1)$ is optimal when $V^{0}(n, 1) = V^{1}(n, 1)$. Therefore, the optimal actions at states $(s, 0)$ and $(s, 1)$ where $s \le n$ are the same (i.e., $a = 0$). Equivalently, we have
$$V(s, \hat{r}) = V^{0}(s, \hat{r}), \quad \text{for } s \le n. \tag{A13}$$
According to the system dynamic reported in Section 2.3, we know that the state transition probabilities are independent of $\hat{r}$ when $a = 0$. Meanwhile, $\hat{r}$ does not affect the instant cost. Let $x_1 = (s, 1)$ and $x_2 = (s, 0)$. Then, for any $x'$, we have
$$P_{x_1, x'}(0) = P_{x_2, x'}(0).$$
$$C(x_1, 0) = C(x_2, 0).$$
Hence, according to (A12), we can see that $V^{0}(s, 0) = V^{0}(s, 1)$ for any $s \le n$. Combined with (A13), we can conclude that $V(s, 0) = V(s, 1)$ for any $0 \le s \le n$.    ☐
By definition, Whittle’s index W n is the infimum λ such that V 0 ( n , 1 ) = V 1 ( n , 1 ) . In this case, according to Lemma A1, V ( 0 , 1 ) = V ( 0 , 0 ) = V ( 0 ) . Then, V 0 ( n , 1 ) and V 1 ( n , 1 ) can be written as
V 0 ( n , 1 ) = f ( n ) θ + p V ( 0 ) + ( 1 p ) [ ( 1 γ ) V ( n + 1 , 0 ) + γ V ( n + 1 , 1 ) ] .
V 1 ( n , 1 ) = f ( n ) + W n θ + ( 1 α ) V ( 0 ) + α [ ( 1 γ ) V ( n + 1 , 0 ) + γ V ( n + 1 , 1 ) ] .
Without a loss of generality, we assume V ( 0 ) = 0 . Then, equating the two expressions yields
W n = ( 1 p α ) ( γ V ( n + 1 , 1 ) + ( 1 γ ) V ( n + 1 , 0 ) ) .
Combining problems (A14) and (A15), we conclude that W n is
W n = ( 1 p α ) ( V 0 ( n , 1 ) + θ f ( n ) ) 1 p .
Since the optimal action at state ( n , 1 ) is a = 0 , we have V 0 ( n , 1 ) = V ( n , 1 ) = V ( n ) . Finally, we obtain
W n = ( 1 p α ) ( V ( n ) + θ f ( n ) ) 1 p .
Now, we tackle the expression of V ( n ) . When V 0 ( n , 1 ) = V 1 ( n , 1 ) , the optimal action at state ( s , r ^ ) where 0 s < n is staying idle. Then, leveraging Lemma A1, value function V ( s ) where 0 s < n satisfies the following
V ( s ) = θ + f ( 0 ) + p V ( 1 ) w h e n s = 0 , θ + f ( s ) + ( 1 p ) V ( s + 1 ) w h e n 0 < s < n .
By backward induction, we end up with the following equation for 0 < s < n .
V ( s ) = θ ( 1 ( 1 p ) n s ) p + k = 1 n s f ( n k ) ( 1 p ) n s k + ( 1 p ) n s V ( n ) .
Letting s = 1 yields
V ( 1 ) = θ ( 1 ( 1 p ) n 1 ) p + k = 1 n 1 f ( n k ) ( 1 p ) n 1 k + ( 1 p ) n 1 V ( n ) .
From problem (A17), V ( 1 ) also satisfies the following
V ( 1 ) = θ f ( 0 ) p .
Equating the two expressions of V ( 1 ) , we obtain
V ( n ) = f ( 0 ) p ( 1 p ) n 1 + θ 2 p ( 1 p ) n 1 1 p k = 1 n 1 f ( n k ) ( 1 p ) k .
We recall that, when $V^0(n, 1) = V^1(n, 1)$, threshold policy $\mathbf{n} = (+\infty, n + 1)$ is optimal and both actions at state $x = (n, 1)$ are equally desirable. Thus, threshold policy $\mathbf{n} = (+\infty, n)$ is also optimal. Then, we know
$$\theta = \bar{\Delta}_n + W_n \bar{\rho}_n, \tag{A19}$$
where $\bar{\Delta}_n$ and $\bar{\rho}_n$ are the expected AoII and the expected transmission rate under threshold policy $\mathbf{n} = (+\infty, n)$, respectively. Finally, combining (A16), (A18) and (A19), we obtain
$$W_n = \frac{-\dfrac{f(0)}{p(1 - p)^{n}} + \bar{\Delta}_n \dfrac{2 - (1 - p)^{n}}{p(1 - p)^{n}} - (1 - p)^{-n} \displaystyle\sum_{k = 1}^{n} f(k)(1 - p)^{k - 1}}{\dfrac{1}{1 - p - \alpha} - \bar{\rho}_n \dfrac{2 - (1 - p)^{n}}{p(1 - p)^{n}}}.$$
After some algebraic manipulation, we have
$$W_n = \frac{(1 - c_1) \displaystyle\sum_{k = n + 1}^{+\infty} f(k) c_1^{k - n - 1} - \bar{\Delta}_n}{\dfrac{(1 - c_1)(1 - p) - \gamma(1 - p - \alpha)}{c_1(1 - p - \alpha)} + \bar{\rho}_n},$$
where $c_1 = (1 - \gamma)(1 - p) + \gamma\alpha$.
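A minimal Python sketch of how the last expression could be evaluated is given below. It assumes the reading $W_n = \big[(1 - c_1)\sum_{k \ge n + 1} f(k) c_1^{k - n - 1} - \bar{\Delta}_n\big] / \big[\frac{(1 - c_1)(1 - p) - \gamma(1 - p - \alpha)}{c_1(1 - p - \alpha)} + \bar{\rho}_n\big]$ of the formula; the function names, the truncation length `terms`, and the sample inputs are hypothetical. The quantities $\bar{\Delta}_n$ and $\bar{\rho}_n$ are treated as given inputs (they come from Proposition 2, which is not reproduced here).

```python
def whittle_index(n, p, alpha, gamma, f, avg_aoii, avg_rate, terms=2000):
    """Evaluate the closed-form expression of W_n under the stated reading.

    avg_aoii and avg_rate stand for the expected AoII and the expected
    transmission rate under threshold policy (+inf, n); both are inputs.
    The infinite sum is truncated after `terms` terms (an assumption that is
    reasonable when c1 is not close to 1 and f grows polynomially).
    """
    c1 = (1.0 - gamma) * (1.0 - p) + gamma * alpha
    weighted_f = (1.0 - c1) * sum(f(n + 1 + j) * c1**j for j in range(terms))
    gap = 1.0 - p - alpha
    first_term = ((1.0 - c1) * (1.0 - p) - gamma * gap) / (c1 * gap)
    return (weighted_f - avg_aoii) / (first_term + avg_rate)


def simplified_first_term(p, alpha):
    """Since c1 = (1 - p) - gamma (1 - p - alpha), the first term of the
    denominator reduces to p / (1 - p - alpha)."""
    return p / (1.0 - p - alpha)


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    print(whittle_index(3, 0.2, 0.3, 0.6, lambda k: k, 1.0, 0.1))
```

Under this reading, substituting $c_1 = (1 - p) - \gamma(1 - p - \alpha)$ shows that the numerator of the first denominator term equals $c_1 p$, so the term does not depend on $\gamma$.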
In the following, we investigate some properties of Whittle's index. First of all, $W_n$ is non-negative since $1 - p - \alpha$ and $V(n + 1, \hat{r})$ in (A15) are all non-negative. Meanwhile, combining (A15) with the fact that $V(n, \hat{r})$ is non-decreasing in $n$, we can verify that $W_n$ is non-decreasing in $n$. Combined with the Whittle's indexes in two other cases (i.e., $x = (0, \hat{r})$ and $x = (s, 0)$ where $s > 0$), we can easily obtain the properties of $W_x$ as detailed in Proposition 4.

Appendix H. Proof of Proposition 5

We notice that $\mathcal{M}_1(\lambda, 1)$ coincides with the decoupled model studied in Section 4.2. When condition (9) is satisfied, the decoupled problem is indexable, and, according to Corollary 3, we only need to show that $n$ is the optimal threshold for the states with $\hat{r} = 1$. We first tackle the case of $\lambda > 0$. To this end, we divide our discussion into the following cases:
  • For state $(s, 1)$ where $s < n$, $W_s \le \lambda$ by definition. As the problem is indexable, we have $D(W_s) \subseteq D(\lambda)$. We recall that $W_s \triangleq \min\{\lambda \ge 0 : V^0(s, 1) = V^1(s, 1)\}$. Equivalently, $W_s \triangleq \min\{\lambda \ge 0 : (s, 1) \in D(\lambda)\}$. Then, we know $(s, 1) \in D(W_s)$. Combined together, we conclude that $(s, 1) \in D(\lambda)$. In other words, the optimal action at state $(s, 1)$ where $s < n$ is to stay idle (i.e., $a = 0$).
  • For state $(s, 1)$ where $s \ge n$, we first recall that $W_s = \min\{\lambda \ge 0 : (s, 1) \in D(\lambda)\}$. Consequently, for any $\lambda < W_s$, we know $(s, 1) \notin D(\lambda)$. Meanwhile, we have $W_s \ge W_n > \lambda$ by the monotonicity of Whittle's index and the definition of $n$. Hence, we can conclude that $(s, 1) \notin D(\lambda)$. In other words, the optimal action at state $(s, 1)$ where $s \ge n$ is to make the transmission attempt.
Then, we conclude that $n$ is the optimal threshold for the states with $\hat{r} = 1$ when $\lambda > 0$. In the case of $\lambda = 0$, according to the proof of Proposition 1, we can easily verify that the optimal threshold is 1.
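The two cases above amount to locating the first state whose Whittle's index exceeds $\lambda$. A minimal Python sketch follows, assuming the indices for $\hat{r} = 1$ are stored in a non-decreasing list `whittle` with `whittle[s]` $= W_s$; the function name and the sample list are hypothetical.

```python
from bisect import bisect_right


def optimal_threshold(whittle, lam):
    """Return the smallest s with whittle[s] > lam.

    whittle must be non-decreasing in s (monotonicity of Whittle's index).
    If no tabulated index exceeds lam, len(whittle) is returned, meaning the
    threshold lies beyond the tabulated states.
    """
    if lam == 0:
        # The proof treats lambda = 0 separately: the optimal threshold is 1.
        return 1
    # bisect_right gives the position after all entries <= lam,
    # i.e. the index of the first entry strictly greater than lam.
    return bisect_right(whittle, lam)


if __name__ == "__main__":
    indices = [0.0, 0.5, 1.2, 1.2, 3.0]  # illustrative values only
    print(optimal_threshold(indices, 1.2))
```

Binary search is valid here only because of the monotonicity of $W_s$ in $s$; without it, a linear scan over the states would be needed.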

Appendix I. Proof of Theorem 2

We first make the following definitions. When $\mathcal{M}_1(\lambda, 1)$ is at state $x$ and action $a$ is taken, costs $C_1(x, a) \triangleq f(s)$ and $C_2(x, a) \triangleq \lambda a$ are incurred. We denote the expected $C_1$-cost and the expected $C_2$-cost under policy $\phi$ as $\bar{C}_1(\phi)$ and $\bar{C}_2(\phi)$, respectively. Let $G$ be a non-empty set of states. For the given state $i$, we define $R^*(i, G)$ as the class of policies $\phi$ for which the following hold:
  • The probability $P_\phi(x_n \in G \text{ for some } n \ge 1 \mid x_0 = i) = 1$, where $x_n$ is the state of $\mathcal{M}_1(\lambda, 1)$ at time $n$.
  • The expected time $m_{i,G}(\phi)$ of a first passage from $i$ to $G$ under $\phi$ is finite.
  • The expected $C_1$-cost $\bar{C}_1^{i,G}(\phi)$ and the expected $C_2$-cost $\bar{C}_2^{i,G}(\phi)$ of a first passage from $i$ to $G$ under $\phi$ are finite.
With the definitions in mind, we proceed with verifying the assumptions given in [27].
  • For all $d > 0$, the set $A(d) = \{x \mid \text{there exists an action } a \text{ such that } C_1(x, a) + C_2(x, a) \le d\}$ is finite: For any state $x$, the cost satisfies $C_1(x, a) + C_2(x, a) = f(s) + \lambda a \ge f(s)$. The equality holds when $a = 0$. Then, the states in $A(d)$ must satisfy $f(s) \le d$. Combined with the fact that $f(s)$ is a non-decreasing and unbounded function when $s \in \mathbb{N}_0$, we can conclude that $A(d)$ is finite.
  • There exists a stationary policy $e$ such that the induced Markov chain has the following properties: the state space $S$ consists of a single (non-empty) positive recurrent class $R$ and a set $U$ of transient states such that $e \in R^*(i, R)$ for $i \in U$. Moreover, both $\bar{C}_1(e)$ and $\bar{C}_2(e)$ on $R$ are finite: We consider the policy under which the base station makes a transmission attempt at every time slot. According to the system dynamics detailed in Section 2.3, we can see that all the states communicate with state $(0, 0)$ and $(0, 0)$ communicates with all other states. Thus, the state space $S$ consists of a single (non-empty) positive recurrent class and the set of transient states can simply be an empty set. $\bar{C}_1(e)$ and $\bar{C}_2(e)$ are trivially finite as we can verify using Proposition 2.
  • Given any two states $x \ne y$, there exists a policy $\phi$ such that $\phi \in R^*(x, y)$: We notice that, under any policy, the maximum increase of $s$ between two consecutive time slots is 1. Meanwhile, when $s$ decreases, it decreases to zero. Combined with the fact that $\hat{r}$ is an independent Bernoulli random variable, we can conclude that there always exists a path between any $x$ and $y$ with positive probability. $m_{x,y}(\phi)$, $\bar{C}_1^{x,y}(\phi)$, and $\bar{C}_2^{x,y}(\phi)$ are trivially finite.
  • If a stationary policy $\phi$ has at least one positive recurrent state, then it has a single positive recurrent class $R$. Moreover, if $x = (0, 0) \notin R$, then $\phi \in R^*(x, R)$: Given that $\hat{r}$ is an independent Bernoulli random variable, we can easily conclude from the system dynamics that all the states communicate with state $(0, 0)$ and $(0, 0)$ communicates with all other states under any stationary policy. Therefore, any positive recurrent class must contain state $(0, 0)$. Thus, there must be only one positive recurrent class, which is $R = S$.
  • There exists a policy $\phi$ such that $\bar{C}_1(\phi) < \infty$ and $\bar{C}_2(\phi) < K$ where $K \in (0, 1]$: We notice that $\bar{C}_1(\phi)$ and $\bar{C}_2(\phi)$ are nothing but the expected AoII and the expected transmission rate achieved by $\phi$, respectively. Then, we can easily verify that such a policy exists using Proposition 2.
As the assumptions are verified, we proceed with introducing the optimal randomized policy for a given $\lambda$. We say a policy is $\lambda$-optimal if the policy is optimal for $\mathcal{M}_1(\lambda, 1)$. We consider two monotone sequences $\lambda_+^n \downarrow \lambda$ and $\lambda_-^n \uparrow \lambda$. Then, there exist subsequences of $\lambda_+^n$ and $\lambda_-^n$ such that the corresponding sequences of optimal policies converge. Then, according to Lemma 3.7 of [27], the limit points, denoted by $\mathbf{n}_{\lambda_+}$ and $\mathbf{n}_{\lambda_-}$, are both $\lambda$-optimal. By Proposition 3.2 of [27], the Markov chains induced by $\mathbf{n}_{\lambda_+}$ and $\mathbf{n}_{\lambda_-}$ both contain a single non-empty positive recurrent class and state $(0, 0)$ is positive recurrent in both induced Markov chains. Hence, the base station can choose which policy to follow each time the system reaches state $(0, 0)$ while keeping the resulting randomized policy $\lambda$-optimal, as suggested by Lemma 3.9 of [27]. More precisely, we consider the following randomized policy: each time the system reaches state $(0, 0)$, the base station will choose $\mathbf{n}_{\lambda_-}$ with probability $\mu$ and $\mathbf{n}_{\lambda_+}$ with probability $1 - \mu$. The chosen policy will be followed until the next choice. We denote such a policy as $\mathbf{n}_\lambda$ and conclude that $\mathbf{n}_\lambda$ is $\lambda$-optimal under any $\mu \in [0, 1]$.

Appendix J. Proof of Proposition 6

The value functions $V(x)$ and $V_i(x_i)$ must satisfy their own Bellman equations. More precisely,
$$V(x) + \theta = \min_{a \in \mathcal{A}_N(1)} \Big\{ C(x, a) + \sum_{x'} \Pr(x' \mid x, a) V(x') \Big\},$$
$$V_i(x_i) + \theta_i = \min_{a_i \in \{0, 1\}} \Big\{ C(x_i, a_i) + \sum_{x_i'} \Pr(x_i' \mid x_i, a_i) V_i(x_i') \Big\}, \tag{A20}$$
where $\theta$ and $\theta_i$ are the optimal values of $\mathcal{M}_N(\lambda, 1)$ and $\mathcal{M}_1^i(\lambda, 1)$, respectively. We recall from Section 2.3 that the users are independent when action $a$ and current state $x$ are given. Thus,
$$\Pr(x' \mid x, a) = \prod_{i = 1}^{N} \Pr(x_i' \mid x, a),$$
where $x' = (x_1', \ldots, x_N')$. Then, we have
$$\sum_{x' \setminus \{x_i'\}} \Pr(x' \setminus \{x_i'\} \mid x, a) = \sum_{x' \setminus \{x_i'\}} \prod_{j \ne i} \Pr(x_j' \mid x, a) = 1.$$
We also recall from Section 2.3 that the state of user $i$ depends only on its previous state and the action with respect to user $i$. Thus,
$$\Pr(x_i' \mid x, a) = \Pr(x_i' \mid x_i, a_i).$$
Combined together, we obtain
$$\begin{aligned} \sum_{i = 1}^{N} \sum_{x_i'} \Pr(x_i' \mid x_i, a_i) V_i(x_i') &= \sum_{i = 1}^{N} \sum_{x_i'} \Big[ \sum_{x' \setminus \{x_i'\}} \prod_{j \ne i} \Pr(x_j' \mid x, a) \Big] \Pr(x_i' \mid x_i, a_i) V_i(x_i') \\ &= \sum_{i = 1}^{N} \sum_{x_i'} \sum_{x' \setminus \{x_i'\}} \prod_{j = 1}^{N} \Pr(x_j' \mid x, a) V_i(x_i') \\ &= \sum_{x'} \Pr(x' \mid x, a) \sum_{i = 1}^{N} V_i(x_i'). \end{aligned} \tag{A21}$$
Then, we sum (A20) over all users, which yields
$$\sum_{i = 1}^{N} \big(V_i(x_i) + \theta_i\big) = \min_{a} \sum_{i = 1}^{N} \Big\{ C(x_i, a_i) + \sum_{x_i'} \Pr(x_i' \mid x_i, a_i) V_i(x_i') \Big\}.$$
We recall that $C(x, a) = \sum_{i = 1}^{N} C(x_i, a_i)$ by definition. Then, leveraging (A21), we obtain
$$\sum_{i = 1}^{N} V_i(x_i) + \sum_{i = 1}^{N} \theta_i = \min_{a \in \mathcal{A}_N(1)} \Big\{ C(x, a) + \sum_{x'} \Pr(x' \mid x, a) \sum_{i = 1}^{N} V_i(x_i') \Big\}.$$
Since the solution to the Bellman equation is unique [21], we must have $\sum_{i = 1}^{N} V_i(x_i) = V(x)$ and $\sum_{i = 1}^{N} \theta_i = \theta$. Then, we can conclude that it is optimal for $\mathcal{M}_N(\lambda, 1)$ if each user adopts its own optimal policy.
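The key step of the proof is that, under a product-form transition kernel, the expectation of an additive value function splits into per-user expectations. The following minimal Python sketch checks this identity numerically on small made-up kernels; the function names and the numbers are illustrative assumptions, and each inner list stands for the next-state distribution of one user given its current state and action.

```python
from itertools import product


def joint_expectation(kernels, values):
    """Sum over joint next states x' of prod_i Pr_i(x_i') * sum_i V_i(x_i')."""
    total = 0.0
    for nxt in product(*(range(len(k)) for k in kernels)):
        prob = 1.0
        for kernel, state in zip(kernels, nxt):
            prob *= kernel[state]
        total += prob * sum(v[state] for v, state in zip(values, nxt))
    return total


def per_user_expectation(kernels, values):
    """Sum over users i of sum_{x_i'} Pr_i(x_i') * V_i(x_i')."""
    return sum(
        sum(pr * val for pr, val in zip(kernel, value))
        for kernel, value in zip(kernels, values)
    )


if __name__ == "__main__":
    # Three users with 2, 3, and 2 next states (made-up numbers).
    kernels = [[0.2, 0.8], [0.5, 0.3, 0.2], [0.1, 0.9]]
    values = [[1.0, -2.0], [0.0, 4.0, 7.0], [3.0, 3.5]]
    print(joint_expectation(kernels, values), per_user_expectation(kernels, values))
```

The joint computation enumerates all combinations of next states, whereas the per-user computation grows only linearly in the number of users, which is the practical benefit of the decomposition.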

Appendix K. Proof of Theorem 3

In this proof, we call a policy $\lambda^*$-optimal if it is optimal for $\mathcal{M}_N(\lambda^*, 1)$. In Section 4.2, we ensure that, for each user, there exists at least one threshold policy that yields a finite expected AoII. Therefore, we can conclude that, for RP, there exists at least one policy that causes the expected AoII and the expected transmission rate to be both finite. Then, according to Lemma 3.10 of [27], a policy is optimal for RP if
  • It is $\lambda^*$-optimal;
  • The resulting expected transmission rate is equal to $M$.
We first construct a policy $\phi_{\lambda^*}$ that is $\lambda^*$-optimal. We recall from Proposition 6 that a policy is $\lambda^*$-optimal if it consists of the optimal policies for each $\mathcal{M}_1^i(\lambda^*, 1)$ where $1 \le i \le N$. According to Theorem 2, for any $i$, there exist $\mathbf{n}_{\lambda_-^*, i}$ and $\mathbf{n}_{\lambda_+^*, i}$ that are both optimal for $\mathcal{M}_1^i(\lambda^*, 1)$. Then, we can construct the policy $\phi_{\lambda^*}$ in the following way.
  • For user $i$ with $\mathbf{n}_{\lambda_-^*, i} = \mathbf{n}_{\lambda_+^*, i} \triangleq \mathbf{n}_{\lambda^*, i}$, the threshold policy $\mathbf{n}_{\lambda^*, i}$ is used. Then, the deterministic policy $\mathbf{n}_{\lambda^*, i}$ is optimal for $\mathcal{M}_1^i(\lambda^*, 1)$ and
    $$\bar{\rho}_i(\lambda^*) = \bar{\rho}_i(\lambda_-^*) = \bar{\rho}_i(\lambda_+^*).$$
    In this case, the choice of $\mu_i$ makes no difference.
  • For user $i$ with $\mathbf{n}_{\lambda_-^*, i} \ne \mathbf{n}_{\lambda_+^*, i}$, the randomized policy $\mathbf{n}_{\lambda^*, i}$ as detailed in Theorem 2 is used. Then, for any $\mu_i \in [0, 1]$, the randomized policy $\mathbf{n}_{\lambda^*, i}$ is optimal for $\mathcal{M}_1^i(\lambda^*, 1)$ and
    $$\bar{\rho}_i(\lambda^*) = \mu_i \bar{\rho}_i(\lambda_-^*) + (1 - \mu_i) \bar{\rho}_i(\lambda_+^*).$$
Combining the two cases, we conclude that $\phi_{\lambda^*} = [\mathbf{n}_{\lambda^*, 1}, \ldots, \mathbf{n}_{\lambda^*, N}]$ is $\lambda^*$-optimal under any $\mu_i \in [0, 1]$. Hence, as long as the chosen $\mu_i$'s realize $\sum_{i = 1}^{N} \bar{\rho}_i(\lambda^*) = M$, we can conclude that the randomized policy $\phi_{\lambda^*}$ is optimal for RP.
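One simple way to pick the mixing probabilities is to use a common value $\mu_i = \mu$ for all users and solve the rate constraint for $\mu$. The Python sketch below illustrates this; using a common $\mu$ is an assumption made here for illustration (the proof only requires that the chosen $\mu_i$'s meet the constraint), and the function name and sample rates are hypothetical.

```python
def common_mixing_probability(rates_minus, rates_plus, budget):
    """Solve sum_i [mu * rates_minus[i] + (1 - mu) * rates_plus[i]] = budget.

    rates_minus[i] and rates_plus[i] stand for the expected transmission
    rates of user i under the policies at lambda*_- and lambda*_+.
    Users whose two policies coincide have equal entries and do not
    affect the solution.
    """
    total_plus = sum(rates_plus)
    total_minus = sum(rates_minus)
    if total_minus == total_plus:
        if abs(total_plus - budget) > 1e-12:
            raise ValueError("rate constraint cannot be met by mixing")
        return 0.0  # any mu works; return one valid choice
    mu = (budget - total_plus) / (total_minus - total_plus)
    if not 0.0 <= mu <= 1.0:
        raise ValueError("budget lies outside the achievable range")
    return mu


if __name__ == "__main__":
    # Made-up rates for two users; the second user has identical policies.
    print(common_mixing_probability([0.5, 0.4], [0.3, 0.4], 0.8))
```

Because the total rate is linear in $\mu$, a valid $\mu \in [0, 1]$ exists whenever $M$ lies between the two total rates.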

Appendix L. Proof of Proposition 8

We notice that $\mathcal{M}_1^i(\lambda^*, 1)$ coincides with the decoupled model studied in Section 4.2. Therefore, we can use the results in Section 4.2 to prove the properties. Since the users share the same structure, we ignore the user index $i$ for simplicity. According to the definition of $I_x$, we have
$$I_x = \sum_{x'} P_{x, x'}(0) V(x') - \sum_{x'} P_{x, x'}(1) V(x') - \lambda^* = -\Delta V(x).$$
Leveraging the results in the proof of Proposition 1, we have the following
  • For state $x = (0, \hat{r})$, $I_x = -\lambda^*$.
  • For state $x = (s, 0)$ where $s > 0$, $I_x = -\lambda^* - p_e^0 (1 - 2p)\omega$ where $\omega = (1 - \gamma)[V(0, 0) - V(s + 1, 0)] + \gamma[V(0, 1) - V(s + 1, 1)] \le 0$.
  • For state $x = (s, 1)$ where $s > 0$, $I_x = -\lambda^* - (1 - p_e^1)(1 - 2p)\omega$.
From the above three cases, we can easily conclude that $I_x \ge -\lambda^*$ and the equality holds when $\hat{r} = p_e^0 = 0$ or $s = 0$. As is proven in Corollary 2, $V(x)$ is non-decreasing in $s$. Hence, we can conclude that $I_x$ is also non-decreasing in $s$. To show that $I_x$ is monotone in $\hat{r}$, we consider two states $x_1 = (s, 1)$ and $x_2 = (s, 0)$. Then, we have
$$I_{x_2} - I_{x_1} = \Delta V(s, 1) - \Delta V(s, 0) = (1 - p_e^1 - p_e^0)(1 - 2p)\omega \le 0.$$
Therefore, we can conclude that $I_x$ is non-decreasing in $\hat{r}$.
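The three cases can be collected in a small helper. The Python sketch below assumes the reading $I_x = -\lambda^*$ for $s = 0$, $I_x = -\lambda^* - p_e^0(1 - 2p)\omega$ for $\hat{r} = 0$, and $I_x = -\lambda^* - (1 - p_e^1)(1 - 2p)\omega$ for $\hat{r} = 1$; the function name is hypothetical, and $\omega \le 0$ is passed in by the caller because it depends on the value function.

```python
def priority_index(s, r_hat, lam_star, p, pe0, pe1, omega):
    """Index I_x of state x = (s, r_hat) under the stated reading.

    omega is the (non-positive) weighted value difference for state s,
    supplied by the caller; pe0 and pe1 are the estimation error
    probabilities for r_hat = 0 and r_hat = 1.
    """
    if s == 0:
        return -lam_star
    coeff = pe0 if r_hat == 0 else 1.0 - pe1
    return -lam_star - coeff * (1.0 - 2.0 * p) * omega


if __name__ == "__main__":
    # Illustrative parameters only.
    args = dict(lam_star=0.5, p=0.3, pe0=0.1, pe1=0.2, omega=-2.0)
    print(priority_index(3, 0, **args), priority_index(3, 1, **args))
```

With $p < 1/2$, $p_e^0 + p_e^1 \le 1$, and $\omega \le 0$, the helper returns values that are at least $-\lambda^*$ and no smaller for $\hat{r} = 1$ than for $\hat{r} = 0$, in line with the properties stated above.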

Appendix M

Algorithm A1 Improved Relative Value Iteration
Require:
    MDP $\mathcal{M} = (\mathcal{X}, \mathcal{P}, \mathcal{A}, \mathcal{C})$
    Convergence criterion $\epsilon$
procedure RelativeValueIteration($\mathcal{M}$, $\epsilon$)
    Initialize $V_0(x) = 0$; $\nu = 0$
    Choose $x^{ref} \in \mathcal{X}$ arbitrarily
    while $V_\nu$ is not converged (RVI converges when the maximum difference between the results of two consecutive iterations is less than $\epsilon$) do
        for $x = (s, \hat{r}) \in \mathcal{X}$ do
            if $\exists$ active state $(s_1, \hat{r}_1)$ s.t. $s_1 \le s$ and $\hat{r}_1 \le \hat{r}$ then
                $a^*(x) = 1$
                $Q_{\nu + 1}(x) = C(x, 1) + \sum_{x'} P_{x x'}(1) V_\nu(x')$
            else
                for $a \in \mathcal{A}$ do
                    $H_{x, a} = C(x, a) + \sum_{x'} P_{x x'}(a) V_\nu(x')$
                $a^*(x) = \arg\min_a \{H_{x, a}\}$
                $Q_{\nu + 1}(x) = H_{x, a^*}$
        for $x \in \mathcal{X}$ do
            $V_{\nu + 1}(x) = Q_{\nu + 1}(x) - Q_{\nu + 1}(x^{ref})$
        $\nu = \nu + 1$
    return $\mathbf{n} \leftarrow a^*(x)$
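For reference, a minimal Python sketch of plain relative value iteration is given below. It is not the improved variant: the shortcut that skips the minimization for states dominated by an active state is omitted, and the interface (callables `cost` and `trans`, the choice of the first state as reference, the `max_iter` cap) is an assumption made for illustration.

```python
def relative_value_iteration(states, actions, cost, trans, eps=1e-9, max_iter=100_000):
    """Plain relative value iteration for an average-cost MDP.

    cost(x, a) returns the instant cost; trans(x, a) returns a dict that maps
    next states to probabilities. Returns (policy, relative values, estimate
    of the optimal average cost).
    """
    ref = states[0]  # reference state, chosen arbitrarily
    v = {x: 0.0 for x in states}
    policy, q = {}, {}
    for _ in range(max_iter):
        q = {}
        for x in states:
            best_a, best_h = None, float("inf")
            for a in actions:
                h = cost(x, a) + sum(pr * v[y] for y, pr in trans(x, a).items())
                if h < best_h:
                    best_a, best_h = a, h
            policy[x], q[x] = best_a, best_h
        # Subtract the value at the reference state to keep iterates bounded.
        new_v = {x: q[x] - q[ref] for x in states}
        diff = max(abs(new_v[x] - v[x]) for x in states)
        v = new_v
        if diff < eps:
            break
    return policy, v, q[ref]


if __name__ == "__main__":
    # Toy two-state example (made-up numbers): action 1 resets state 1 more reliably.
    def cost(x, a):
        return {(0, 0): 0.0, (0, 1): 1.0, (1, 0): 4.0, (1, 1): 2.0}[(x, a)]

    def trans(x, a):
        if x == 1 and a == 1:
            return {0: 0.9, 1: 0.1}
        return {0: 0.5, 1: 0.5}

    print(relative_value_iteration([0, 1], [0, 1], cost, trans))
```

On a truncated version of the AoII state space, the structural shortcut of Algorithm A1 would replace the inner loop over actions for every state that dominates an already active state, which reduces the work per iteration without changing the fixed point.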
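Algorithm A2 Bisection Search
Require:
    Maximum updates per transmission attempt $M$
    MDP $\mathcal{M}_N(\lambda, 1) = (\mathcal{X}_N, \mathcal{A}_N(1), \mathcal{P}_N, \mathcal{C}_N(\lambda))$
    Tolerance $\xi$
    Convergence criterion $\epsilon$
procedure BisectionSearch($\mathcal{M}_N(\lambda, 1)$, $M$, $\xi$, $\epsilon$)
    Initialize $\lambda_- = 0$; $\lambda_+ = 1$
    $\phi_{\lambda_+} \leftarrow (\mathcal{M}_N(\lambda_+, 1), \epsilon)$ using Section 5.1 and Proposition 6
    $\bar{\rho}(\lambda_+) \leftarrow \phi_{\lambda_+}$ using Proposition 2
    while $\bar{\rho}(\lambda_+) > M$ do
        $\lambda_- = \lambda_+$; $\lambda_+ = 2\lambda_+$
        $\phi_{\lambda_+} \leftarrow (\mathcal{M}_N(\lambda_+, 1), \epsilon)$ using Section 5.1 and Proposition 6
        $\bar{\rho}(\lambda_+) \leftarrow \phi_{\lambda_+}$ using Proposition 2
    while $\lambda_+ - \lambda_- > 2\xi$ do
        $\lambda = \frac{\lambda_+ + \lambda_-}{2}$
        $\phi_{\lambda} \leftarrow (\mathcal{M}_N(\lambda, 1), \epsilon)$ using Section 5.1 and Proposition 6
        $\bar{\rho}(\lambda) \leftarrow \phi_{\lambda}$ using Proposition 2
        if $\bar{\rho}(\lambda) > M$ then
            $\lambda_- = \lambda$
        else
            $\lambda_+ = \lambda$
    return $(\lambda_+^*, \lambda_-^*) \leftarrow (\lambda_+, \lambda_-)$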
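The search can be sketched in Python as follows. The callable `rate` is a hypothetical stand-in for the two steps "solve the relaxed MDP at $\lambda$" and "evaluate the expected transmission rate of the resulting policy"; it is assumed to be non-increasing in $\lambda$ and to drop to the budget or below for large enough $\lambda$, otherwise the doubling loop does not terminate.

```python
def bisection_search(rate, budget, xi=1e-6):
    """Bracket the multiplier at which rate(lam) crosses `budget`.

    Returns (lam_plus, lam_minus) with rate(lam_plus) <= budget and
    lam_plus - lam_minus <= 2 * xi.
    """
    lam_minus, lam_plus = 0.0, 1.0
    # Doubling phase: grow lam_plus until the rate constraint is met.
    while rate(lam_plus) > budget:
        lam_minus, lam_plus = lam_plus, 2.0 * lam_plus
    # Bisection phase: shrink the bracket around the crossing point.
    while lam_plus - lam_minus > 2.0 * xi:
        lam = 0.5 * (lam_plus + lam_minus)
        if rate(lam) > budget:
            lam_minus = lam
        else:
            lam_plus = lam
    return lam_plus, lam_minus


if __name__ == "__main__":
    # Toy rate function for illustration only.
    print(bisection_search(lambda lam: 1.0 / (1.0 + lam), 0.25))
```

In the setting of Algorithm A2, each call to `rate` hides a full policy computation, so the number of bisection steps (roughly $\log_2$ of the initial bracket width over $2\xi$) drives the overall running time.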

References

  1. Maatouk, A.; Kriouile, S.; Assaad, M.; Ephremides, A. The age of incorrect information: A new performance metric for status updates. IEEE/ACM Trans. Netw. 2020, 28, 2215–2228.
  2. Uysal, E.; Kaya, O.; Ephremides, A.; Gross, J.; Codreanu, M.; Popovski, P.; Assaad, M.; Liva, G.; Munari, A.; Soleymani, T.; et al. Semantic communications in networked systems. arXiv 2021, arXiv:2103.05391.
  3. Kam, C.; Kompella, S.; Ephremides, A. Age of incorrect information for remote estimation of a binary markov source. In Proceedings of the IEEE INFOCOM 2020-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Toronto, ON, Canada, 6–9 July 2020; pp. 1–6.
  4. Maatouk, A.; Assaad, M.; Ephremides, A. The age of incorrect information: An enabler of semantics-empowered communication. arXiv 2020, arXiv:2012.13214.
  5. Chen, Y.; Ephremides, A. Minimizing Age of Incorrect Information for Unreliable Channel with Power Constraint. arXiv 2021, arXiv:2101.08908.
  6. Kriouile, S.; Assaad, M. Minimizing the Age of Incorrect Information for Real-time Tracking of Markov Remote Sources. arXiv 2021, arXiv:2102.03245.
  7. Kadota, I.; Sinha, A.; Uysal-Biyikoglu, E.; Singh, R.; Modiano, E. Scheduling policies for minimizing age of information in broadcast wireless networks. IEEE/ACM Trans. Netw. 2018, 26, 2637–2650.
  8. Hsu, Y.P. Age of information: Whittle index for scheduling stochastic arrivals. In Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018; pp. 2634–2638.
  9. Tripathi, V.; Modiano, E. A whittle index approach to minimizing functions of age of information. In Proceedings of the 2019 57th Annual Allerton Conference on Communication, Control, and Computing (Allerton), Monticello, IL, USA, 24–27 September 2019; pp. 1160–1167.
  10. Maatouk, A.; Kriouile, S.; Assad, M.; Ephremides, A. On the optimality of the Whittle’s index policy for minimizing the age of information. IEEE Trans. Wirel. Commun. 2020, 20, 1263–1277.
  11. Sun, J.; Jiang, Z.; Krishnamachari, B.; Zhou, S.; Niu, Z. Closed-form Whittle’s index-enabled random access for timely status update. IEEE Trans. Commun. 2019, 68, 1538–1551.
  12. Nguyen, G.D.; Kompella, S.; Kam, C.; Wieselthier, J.E. Information freshness over a Markov channel: The effect of channel state information. Ad Hoc Networks 2019, 86, 63–71.
  13. Talak, R.; Karaman, S.; Modiano, E. Optimizing age of information in wireless networks with perfect channel state information. In Proceedings of the 2018 16th International Symposium on Modeling and Optimization in Mobile, Ad Hoc, and Wireless Networks (WiOpt), Shanghai, China, 7–11 May 2018; pp. 1–8.
  14. Shi, L.; Cheng, P.; Chen, J. Optimal periodic sensor scheduling with limited resources. IEEE Trans. Autom. Control 2011, 56, 2190–2195.
  15. Leong, A.S.; Dey, S.; Quevedo, D.E. Sensor scheduling in variance based event triggered estimation with packet drops. IEEE Trans. Autom. Control 2016, 62, 1880–1895.
  16. Mo, Y.; Garone, E.; Casavola, A.; Sinopoli, B. Stochastic sensor scheduling for energy constrained estimation in multi-hop wireless sensor networks. IEEE Trans. Autom. Control 2011, 56, 2489–2495.
  17. Kaul, S.; Yates, R.; Gruteser, M. Real-time status: How often should one update? In Proceedings of the 2012 Proceedings IEEE INFOCOM, Orlando, FL, USA, 25–30 March 2012; pp. 2731–2735.
  18. Leong, A.S.; Ramaswamy, A.; Quevedo, D.E.; Karl, H.; Shi, L. Deep reinforcement learning for wireless sensor scheduling in cyber–physical systems. Automatica 2020, 113, 108759.
  19. Wang, J.; Ren, X.; Mo, Y.; Shi, L. Whittle index policy for dynamic multichannel allocation in remote state estimation. IEEE Trans. Autom. Control 2019, 65, 591–603.
  20. Gittins, J.; Glazebrook, K.; Weber, R. Multi-Armed Bandit Allocation Indices; John Wiley & Sons: Hoboken, NJ, USA, 2011.
  21. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach, 3rd ed.; Prentice Hall Press: Hoboken, NJ, USA, 2009.
  22. Whittle, P. Restless bandits: Activity allocation in a changing world. J. Appl. Probab. 1988, 25, 287–298.
  23. Weber, R.R.; Weiss, G. On an index policy for restless bandits. J. Appl. Probab. 1990, 27, 637–648.
  24. Glazebrook, K.D.; Ruiz-Hernandez, D.; Kirkbride, C. Some indexable families of restless bandit problems. Adv. Appl. Probab. 2006, 38, 643–672.
  25. Larrañaga, M. Dynamic Control of Stochastic and Fluid Resource-Sharing Systems. Ph.D. Thesis, Université de Toulouse, Toulouse, France, 2015.
  26. Sennott, L.I. On computing average cost optimal policies with application to routing to parallel queues. Math. Methods Oper. Res. 1997, 45, 45–62.
  27. Sennott, L.I. Constrained average cost Markov decision chains. Probab. Eng. Inf. Sci. 1993, 7, 69–83.
  28. Bertsimas, D.; Niño-Mora, J. Restless bandits, linear programming relaxations, and a primal-dual index heuristic. Oper. Res. 2000, 48, 80–90.
  29. Littman, M.L.; Dean, T.L.; Kaelbling, L.P. On the complexity of solving Markov decision problems. arXiv 2013, arXiv:1302.4971.
  30. Verloop, I.M. Asymptotically optimal priority policies for indexable and nonindexable restless bandits. Ann. Appl. Probab. 2016, 26, 1947–1995.
Figure 1. The structure of the communication model.
Figure 2. A sample path of $s_t$.
Figure 3. Performance when the source processes vary. We choose $p_i = 0.05 + \frac{0.4(i - 1)}{N - 1}$, $f_i(s) = s$, $\gamma_i = 0.6$, $p_{e,i}^0 = p_e^0$, and $p_{e,i}^1 = 0.1$ for $1 \le i \le N$.
Figure 4. Performance when the communication goals vary. We choose $f_i(s) = s^{0.5 + \frac{i - 1}{N - 1}}$, $p_i = 0.3$, $\gamma_i = 0.6$, $p_{e,i}^0 = p_e^0$, and $p_{e,i}^1 = 0.1$ for $1 \le i \le N$.
Figure 5. Performance in systems with random parameters when $N = 5$. The parameters for each user are chosen randomly within the following intervals: $\gamma \in [0, 1]$, $p \in [0.05, 0.45]$, $p_e^0 \in I$, $p_e^1 \in [0, 0.45]$, and $f(s) = s^\tau$ where $\tau \in [0.5, 1.5]$.
