# Scheduling to Minimize Age of Incorrect Information with Imperfect Channel State Information

## Abstract

## 1. Introduction

## 2. System Overview

#### 2.1. Communication Model

#### 2.2. Age of Incorrect Information

- When the receiver’s estimate is correct at time $t+1$, we have ${U}_{t+1}=t+1$. Then, by definition, ${s}_{t+1}=0$.
- When the receiver’s estimate is incorrect at time $t+1$, we have ${U}_{t+1}={U}_{t}$. Then, by definition, ${s}_{t+1}=t+1-{U}_{t}={s}_{t}+1$.
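The two cases above amount to a one-line recursion. A minimal Python sketch of it follows; the function name `next_aoii` and the example trajectory are illustrative and not taken from the paper.

```python
def next_aoii(s_t: int, estimate_correct_next: bool) -> int:
    """Return s_{t+1} given s_t and whether the estimate is correct at t+1."""
    # Correct estimate: U_{t+1} = t+1, so the age resets to zero.
    # Incorrect estimate: U_{t+1} = U_t, so the age grows by one slot.
    return 0 if estimate_correct_next else s_t + 1


# Example: the estimate is wrong for three slots, then becomes correct.
trajectory = [0]
for correct in (False, False, False, True):
    trajectory.append(next_aoii(trajectory[-1], correct))
```

Here `trajectory` records $s_t$ slot by slot, so the penalty grows linearly while the estimate stays wrong and drops to zero once it is corrected.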

**Remark 1.**

#### 2.3. System Dynamic

- When ${x}_{t}=(0,{\widehat{r}}_{t})$, the estimate at time t is correct (i.e., ${\widehat{X}}_{t}={X}_{t}$). Hence, for the receiver, ${X}_{t}$ carries no new information about the source process. In other words, ${\widehat{X}}_{t+1}={\widehat{X}}_{t}$ regardless of whether an update is transmitted at time t. We recall that ${U}_{t+1}={U}_{t}$ if ${\widehat{X}}_{t+1}\ne {X}_{t+1}$ and ${U}_{t+1}=t+1$ otherwise. Since the source is binary, we obtain ${U}_{t+1}={U}_{t}$ if ${X}_{t+1}\ne {X}_{t}$, which happens with probability p and ${U}_{t+1}=t+1$ otherwise. According to (2), we obtain$$Pr(1\mid (0,{\widehat{r}}_{t}),{a}_{t})=p,$$$$Pr(0\mid (0,{\widehat{r}}_{t}),{a}_{t})=1-p.$$
- When ${a}_{t}=0$ and ${x}_{t}=({s}_{t},{\widehat{r}}_{t})$, where ${s}_{t}>0$, the channel will not be used and no new update will be received by the receiver, and so, ${\widehat{X}}_{t+1}={\widehat{X}}_{t}$. We recall that ${U}_{t+1}={U}_{t}$ if ${\widehat{X}}_{t+1}\ne {X}_{t+1}$ and ${U}_{t+1}=t+1$ otherwise. Since ${X}_{t}\ne {\widehat{X}}_{t}$ and the source is binary, we have ${U}_{t+1}={U}_{t}$ if ${X}_{t+1}={X}_{t}$, which happens with probability $1-p$ and ${U}_{t+1}=t+1$ otherwise. According to (2), we obtain$$Pr({s}_{t}+1\mid ({s}_{t},{\widehat{r}}_{t}),{a}_{t}=0)=1-p,$$$$Pr(0\mid ({s}_{t},{\widehat{r}}_{t}),{a}_{t}=0)=p.$$
- When ${a}_{t}=1$ and ${x}_{t}=({s}_{t},1)$ where ${s}_{t}>0$, the transmission attempt will succeed with probability $1-{p}_{e}^{1}$ and fail with probability ${p}_{e}^{1}$. We recall that ${U}_{t+1}={U}_{t}$ if ${\widehat{X}}_{t+1}\ne {X}_{t+1}$ and ${U}_{t+1}=t+1$ otherwise. Then, when the transmission attempt succeeds (i.e., ${\widehat{X}}_{t+1}={X}_{t}$), ${U}_{t+1}={U}_{t}$ if ${X}_{t+1}\ne {X}_{t}$ and ${U}_{t+1}=t+1$ otherwise. When the transmission attempt fails (i.e., ${\widehat{X}}_{t+1}={\widehat{X}}_{t}\ne {X}_{t}$), we have ${U}_{t+1}={U}_{t}$ if ${X}_{t+1}={X}_{t}$ and ${U}_{t+1}=t+1$ otherwise. Combining (2) with the dynamic of the source process we obtain$$Pr({s}_{t}+1\mid ({s}_{t},1),{a}_{t}=1)={p}_{e}^{1}(1-p)+(1-{p}_{e}^{1})p\triangleq \alpha ,$$$$Pr(0\mid ({s}_{t},1),{a}_{t}=1)={p}_{e}^{1}p+(1-{p}_{e}^{1})(1-p)=1-\alpha .$$
- When ${a}_{t}=1$ and ${x}_{t}=({s}_{t},0)$, where ${s}_{t}>0$, following the same line, we obtain$$Pr({s}_{t}+1\mid ({s}_{t},0),{a}_{t}=1)={p}_{e}^{0}p+(1-{p}_{e}^{0})(1-p)\triangleq \beta ,$$$$Pr(0\mid ({s}_{t},0),{a}_{t}=1)={p}_{e}^{0}(1-p)+(1-{p}_{e}^{0})p=1-\beta .$$
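The four cases above can be collected into a single lookup. The following Python sketch is illustrative only: the function name `aoii_transition` and its argument names are not from the paper, and `pe0` is read as the success probability when ${\widehat{r}}_{t}=0$, which is the reading implied by the expression for $\beta$.

```python
def aoii_transition(s, r_hat, a, p, pe1, pe0):
    """Return {next_s: probability} for a single user.

    s     : current AoII (non-negative int)
    r_hat : channel state estimate (0 or 1)
    a     : 1 if a transmission is attempted, else 0
    p     : probability that the binary source changes state in one slot
    pe1   : failure probability of an attempt when r_hat = 1
    pe0   : success probability of an attempt when r_hat = 0 (assumed reading)
    """
    if s == 0:
        grow = p                                   # action is irrelevant
    elif a == 0:
        grow = 1 - p                               # no attempt, estimate stays wrong
    elif r_hat == 1:
        grow = pe1 * (1 - p) + (1 - pe1) * p       # alpha
    else:
        grow = pe0 * p + (1 - pe0) * (1 - p)       # beta
    return {s + 1: grow, 0: 1 - grow}


probs = aoii_transition(s=3, r_hat=1, a=1, p=0.2, pe1=0.1, pe0=0.3)
```

With these example values, $\alpha = 0.1\cdot 0.8+0.9\cdot 0.2=0.26$, so the AoII grows to 4 with probability 0.26 and resets with probability 0.74.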

#### 2.4. Problem Formulation

## 3. Structural Properties of the Optimal Policy

- ${\mathcal{X}}_{N}$ denotes the state space. The state is $x=({x}_{1},\dots ,{x}_{N})$ where ${x}_{i}=({s}_{i},{\widehat{r}}_{i})$.
- ${\mathcal{A}}_{N}\left(M\right)$ denotes the action space. The feasible action is $\mathit{a}=({a}_{1},\dots ,{a}_{N})$ where ${a}_{i}\in \{0,1\}$ and ${\sum}_{i=1}^{N}{a}_{i}=M$. Note that the feasible actions are independent of the state and the time.
- ${\mathcal{P}}_{N}$ denotes the state transition probabilities. We define ${P}_{\mathit{x},{\mathit{x}}^{\prime}}\left(\mathit{a}\right)$ as the probability that action $\mathit{a}$ at state $\mathit{x}$ will lead to state ${\mathit{x}}^{\prime}$. It is calculated by$${P}_{\mathit{x},{\mathit{x}}^{\prime}}\left(\mathit{a}\right)=\prod _{i=1}^{N}P\left({\widehat{r}}_{i}^{\prime}\right){P}_{{s}_{i},{s}_{i}^{\prime}}({a}_{i},{\widehat{r}}_{i}),$$
- ${\mathcal{C}}_{N}\left(w\right)$ denotes the instant cost. When the system is at state $\mathit{x}$ and action $\mathit{a}$ is taken, the instant cost is $C(\mathit{x},\mathit{a})\triangleq {\sum}_{i=1}^{N}C({x}_{i},{a}_{i})\triangleq {\sum}_{i=1}^{N}\left({f}_{i}\left({s}_{i}\right)+w{a}_{i}\right)$.
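As a small illustration of the action space and the instant cost defined above, the Python sketch below enumerates ${\mathcal{A}}_{N}(M)$ and evaluates $C(\mathit{x},\mathit{a})$. The helper names are hypothetical, and the default choice ${f}_{i}(s)=s$ is an assumption made only for the example; the paper leaves ${f}_{i}$ general.

```python
from itertools import combinations


def feasible_actions(N, M):
    """All binary vectors of length N with exactly M ones."""
    actions = []
    for chosen in combinations(range(N), M):
        actions.append(tuple(1 if i in chosen else 0 for i in range(N)))
    return actions


def instant_cost(states, action, w, f=lambda i, s: s):
    """C(x, a) = sum_i (f_i(s_i) + w * a_i), with states[i] = (s_i, r_hat_i).

    f defaults to the identity in s, which is an assumption for illustration.
    """
    return sum(f(i, s) + w * a for i, ((s, _), a) in enumerate(zip(states, action)))


actions = feasible_actions(4, 2)
cost = instant_cost([(2, 1), (0, 0), (5, 1)], (1, 0, 0), w=0.5)
```

The number of feasible actions is $\binom{N}{M}$, which is one reason the joint problem becomes hard to solve directly as $N$ grows.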

**Remark 2.**

**Lemma 1** (Monotonicity)**.**

**Proof.**

**Definition 1** (Statistically identical)**.**

**Lemma 2** (Equivalence)**.**

**Proof.**

**Theorem 1** (Structural properties)**.**

- ${\delta}^{j,k}\left(\mathit{x}\right)\le 0$ if ${\widehat{r}}_{k}={p}_{e,k}^{0}=0$. The equality holds when ${s}_{j}=0$ or ${\widehat{r}}_{j}={p}_{e,j}^{0}=0$.
- ${\delta}^{j,k}\left(\mathit{x}\right)$ is non-increasing in ${\widehat{r}}_{j}$ and is non-decreasing in ${\widehat{r}}_{k}$ when ${s}_{j},{s}_{k}>0$. At the same time, ${\delta}^{j,k}\left(\mathit{x}\right)$ is independent of ${\widehat{r}}_{i}$ for any $i\ne j,k$.
- ${\delta}^{j,k}\left(\mathit{x}\right)\le 0$ if ${s}_{k}=0$. The equality holds when ${s}_{j}=0$ or ${\widehat{r}}_{j}={p}_{e,j}^{0}=0$.
- ${\delta}^{j,k}\left(\mathit{x}\right)$ is non-increasing in ${s}_{j}$ if ${\Gamma}_{j}^{{\widehat{r}}_{j}}\le {\Gamma}_{k}^{{\widehat{r}}_{k}}$ and is non-decreasing in ${s}_{k}$ if ${\Gamma}_{j}^{{\widehat{r}}_{j}}\ge {\Gamma}_{k}^{{\widehat{r}}_{k}}$ when ${s}_{j},{s}_{k}>0$. We define ${\Gamma}_{i}^{1}\triangleq \frac{{\alpha}_{i}}{1-{p}_{i}}$ and ${\Gamma}_{i}^{0}\triangleq \frac{{\beta}_{i}}{1-{p}_{i}}$ for $1\le i\le N$.
- ${\delta}^{j,k}\left(\mathit{x}\right)\le 0$ if ${s}_{j}\ge {s}_{k}$, ${\widehat{r}}_{j}\ge {\widehat{r}}_{k}$, and users j and k are statistically identical.

**Proof.**

- Neither ${s}_{j}$ nor ${s}_{k}$ increases. In this case, both ${s}_{j}$ and ${s}_{k}$ become zero.
- Either ${s}_{j}$ or ${s}_{k}$ increases and the other becomes zero. We denote by ${P}_{j}^{k}$ the probability that only ${s}_{k}$ increases when ${a}_{j}=1$. The notation for other cases is defined analogously. The probabilities can be obtained easily using the results in Section 2.3.
- Both ${s}_{j}$ and ${s}_{k}$ increase. We denote by ${P}_{j}$ the probability that both ${s}_{j}$ and ${s}_{k}$ increase when ${a}_{j}=1$. ${P}_{k}$ is defined analogously. The probabilities can be obtained easily using the results in Section 2.3.

**Corollary 1** (Application of Theorem 1)**.**

- A user i with ${\widehat{r}}_{i}={p}_{e,i}^{0}=0$ or ${s}_{i}=0$ will not be chosen except to break a tie.
- When user j is chosen at state ${\mathit{x}}_{1}$, then for state ${\mathit{x}}_{2}$, such that ${\widehat{r}}_{1,j}\le {\widehat{r}}_{2,j}$ and ${s}_{1,i}={s}_{2,i}$ for $1\le i\le N$, the optimal choice must be in the set $G=\left\{j\right\}\cup \{k:{\widehat{r}}_{1,k}<{\widehat{r}}_{2,k}\}$.
- When $N=2$, we consider two states, ${\mathit{x}}_{1}$ and ${\mathit{x}}_{2}$, which differ only in the value of ${s}_{j}$. Specifically, ${s}_{1,j}\le {s}_{2,j}$. If user j is chosen at state ${\mathit{x}}_{1}$ and ${\Gamma}_{j}^{{\widehat{r}}_{1,j}}\le {\Gamma}_{k}^{{\widehat{r}}_{1,k}}$, the optimal choice at state ${\mathit{x}}_{2}$ will also be user j.
- When $N=2$, we consider two states, ${\mathit{x}}_{1}$ and ${\mathit{x}}_{2}$, which differ only in the value of ${s}_{k}$. Specifically, ${s}_{1,k}\ge {s}_{2,k}$. If user j is chosen at state ${\mathit{x}}_{1}$ and ${\Gamma}_{j}^{{\widehat{r}}_{1,j}}\ge {\Gamma}_{k}^{{\widehat{r}}_{1,k}}$, the optimal choice at state ${\mathit{x}}_{2}$ will also be user j.
- When all users are statistically identical, the optimal choice at any time slot must be either the user with $x=({s}_{max,1},1)$ where ${s}_{max,1}\triangleq {max}_{{s}_{i}}\left\{({s}_{i},1)\right\}$ or the user with $x=({s}_{max,0},0)$ where ${s}_{max,0}\triangleq {max}_{{s}_{i}}\left\{({s}_{i},0)\right\}$. Moreover,
  - If ${s}_{max,1}\ge {s}_{max,0}$, it is optimal to choose the user with $x=({s}_{max,1},1)$.
  - If ${s}_{max,1}<{s}_{max,0}$, the optimal choice switches from the user with $x=({s}_{max,0},0)$ to the user with $x=({s}_{max,1},1)$ as ${s}_{max,1}$ alone increases from 0 to ${s}_{max,0}$.

**Proof.**

- According to Property 5 of Theorem 1, we can conclude that it is optimal to choose user j (the user with $x=({s}_{max,1},1)$) when ${s}_{max,1}\ge {s}_{max,0}$.
- To determine the optimal choice in the case of ${s}_{max,1}<{s}_{max,0}$, we recall that the optimal choice will be user k, the user with $x=({s}_{max,0},0)$ (i.e., ${\delta}^{j,k}\left(\mathit{x}\right)\ge 0$), if ${s}_{j}=0$ and will be user j (i.e., ${\delta}^{j,k}\left(\mathit{x}\right)\le 0$) if ${s}_{j}={s}_{k}$. At the same time, Property 4 of Theorem 1 tells us that ${\delta}^{j,k}\left(\mathit{x}\right)$ is non-increasing in ${s}_{j}$ when users j and k are statistically identical. Therefore, we can conclude that the optimal choice switches from user k to user j as ${s}_{j}$ alone increases from 0 to ${s}_{k}$.
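The last item of Corollary 1 can be read as a pruning rule for the statistically identical case. The Python sketch below is one possible reading, assuming a single user is chosen per slot and breaking ties by the lowest index; the function name `identical_users_candidates` is hypothetical.

```python
def identical_users_candidates(states):
    """Return the user indices that can still be optimal.

    states[i] = (s_i, r_hat_i). All users are assumed statistically identical.
    """
    best = {}  # best[r] = index of the user with the largest s among r_hat == r
    for i, (s, r_hat) in enumerate(states):
        if r_hat not in best or s > states[best[r_hat]][0]:
            best[r_hat] = i
    if 1 in best and 0 in best:
        if states[best[1]][0] >= states[best[0]][0]:
            # s_max,1 >= s_max,0: the user with (s_max,1, 1) is optimal.
            return [best[1]]
        # Otherwise the corollary only narrows the choice to these two users.
        return [best[1], best[0]]
    return list(best.values())


single = identical_users_candidates([(3, 0), (5, 1), (4, 0), (2, 1)])
pair = identical_users_candidates([(6, 0), (2, 1)])
```

In the first call the user with state $(5,1)$ dominates, so only that user remains. In the second call the corollary leaves two candidates, and a further criterion is needed to pick between them.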

## 4. Whittle’s Index Policy

- We first formulate a relaxed version of PP and apply the Lagrangian approach.
- Then, we decouple the problem of minimizing the Lagrangian function into N subproblems, each of which considers only a single user. By casting each decoupled problem as an MDP, we investigate the structural properties and performance of the optimal policy.
- Leveraging the results above and under a simple condition, we establish the indexability of the decoupled problem.
- Finally, we obtain the expression of Whittle’s index by solving the Bellman equation.

#### 4.1. Relaxed Problem

#### 4.2. Decoupled Model

**Corollary 2** (Extension of Lemma 1)**.**

**Proof.**

**Proposition 1** (Optimal policy for decoupled problem)**.**

- The optimal policy can be fully captured by $\mathit{n}=({n}_{0},{n}_{1})$. More precisely, when the system is at state $(s,\widehat{r})$, it is optimal to make a transmission attempt only when $s\ge {n}_{\widehat{r}}$.
- ${n}_{0}\ge {n}_{1}>0$.
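The threshold structure in Proposition 1 translates directly into a two-entry lookup. The Python sketch below is illustrative; the names `check_thresholds` and `threshold_action` and the example thresholds are assumptions, not values from the paper.

```python
def check_thresholds(n):
    """Validate n = (n0, n1) against the ordering n0 >= n1 > 0."""
    n0, n1 = n
    if not (n0 >= n1 > 0):
        raise ValueError("thresholds must satisfy n0 >= n1 > 0")
    return n


def threshold_action(s, r_hat, n):
    """Return 1 (attempt a transmission) only when s >= n[r_hat]."""
    return 1 if s >= n[r_hat] else 0


n = check_thresholds((4, 2))          # hypothetical thresholds
decisions = [threshold_action(s, r, n) for s, r in [(3, 0), (3, 1), (4, 0), (0, 1)]]
```

Because ${n}_{0}\ge {n}_{1}$, a user whose channel is estimated to be good starts transmitting at a smaller AoII than one whose channel is estimated to be bad.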

**Proof.**

**Proposition 2** (Performance)**.**

**Proof.**

#### 4.3. Indexability

**Definition 2.**

**Remark 3.**

**Corollary 3** (Consequences of (9))**.**

**Proof.**

**Proposition 3** (Indexability of decoupled problem)**.**

**Proof.**

#### 4.4. Whittle’s Index Policy

**Definition 3** (Whittle’s index)**.**

**Proposition 4** (Whittle’s index)**.**

**Proof.**

**Definition 4.**

**Remark 4.**

- The first two properties can be verified by noting that ${W}_{{x}_{i}}\ge 0$ and the equality holds when ${\widehat{r}}_{i}=0$ or ${s}_{i}=0$. At the same time, ${W}_{{x}_{i}}$ is non-decreasing in ${\widehat{r}}_{i}$.
- The third and fourth properties can be verified by noting that ${W}_{{x}_{i}}$ is non-decreasing in ${s}_{i}$.
- For the last property, we first notice that ${W}_{{x}_{j}}={W}_{{x}_{k}}$ when users j and k are statistically identical and ${x}_{j}={x}_{k}$. Then, the property can be verified by noting that ${W}_{{x}_{i}}$ is non-decreasing in both ${s}_{i}$ and ${\widehat{r}}_{i}$.

## 5. Optimal Policy for Relaxed Problem

**Remark 5.**

#### 5.1. Optimal Policy for Single User

**Proposition 5** (Optimal deterministic policy)**.**

**Proof.**

**Theorem 2** (Optimal randomized policy)**.**

**Proof.**

#### 5.2. Optimal Policy for RP

**Proposition 6** (Separability)**.**

**Proof.**

**Theorem 3** (Optimal policy for RP)**.**

**Proof.**

- It is optimal for ${\mathcal{M}}_{N}({\lambda}^{*},-1)$;
- The resulting expected transmission rate is equal to M.

- Initialize ${\lambda}_{-}=0$ and ${\lambda}_{+}=1$.
- Repeatedly set ${\lambda}_{-}={\lambda}_{+}$ and ${\lambda}_{+}=2{\lambda}_{+}$ until $\overline{\rho}\left({\lambda}_{+}\right)<M$.
- Run bisection search on the interval $[{\lambda}_{-},{\lambda}_{+}]$ until the tolerance $2\xi $ is met.
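The three steps above form a standard bracket-then-bisect search. A minimal Python sketch follows under several assumptions that are not stated in the source lines: `rho_bar` is a hypothetical callable returning the expected transmission rate for a given multiplier and is taken to be non-increasing in $\lambda$, the tolerance is interpreted as an interval width of at most $2\xi$, and the bracketing loop checks its stopping condition before the first doubling.

```python
def find_multiplier(rho_bar, M, xi, max_doublings=60):
    """Bracket and bisect for the multiplier at which rho_bar crosses M.

    Returns (lam_lo, lam_hi) with rho_bar(lam_hi) < M and lam_hi - lam_lo <= 2*xi.
    """
    lam_lo, lam_hi = 0.0, 1.0
    # Step 2: double the upper end until the rate drops below M.
    for _ in range(max_doublings):
        if rho_bar(lam_hi) < M:
            break
        lam_lo, lam_hi = lam_hi, 2.0 * lam_hi
    else:
        raise ValueError("no upper bracket found; rho_bar may never drop below M")
    # Step 3: bisection, keeping rho_bar(lam_hi) < M throughout.
    while lam_hi - lam_lo > 2.0 * xi:
        mid = 0.5 * (lam_lo + lam_hi)
        if rho_bar(mid) >= M:
            lam_lo = mid
        else:
            lam_hi = mid
    return lam_lo, lam_hi


# Toy rate function used only to exercise the search; it crosses M = 2 at lambda = 4.
lo, hi = find_multiplier(lambda lam: 10.0 / (1.0 + lam), M=2.0, xi=1e-6)
```

In practice each call to `rho_bar` requires solving the N single-user problems for the given multiplier, so the number of bisection steps drives the overall running time.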

**Remark 6.**

## 6. Indexed Priority Policy

#### 6.1. Primal-Dual Heuristic

- If $h\left(\mathit{x}\right)\ge M$, the base station chooses the M users with the largest ${\overline{\psi}}_{{x}_{i}}^{0}$ among these $h\left(\mathit{x}\right)$ users.
- If $h\left(\mathit{x}\right)<M$, the base station chooses all $h\left(\mathit{x}\right)$ of these users, plus the $M-h\left(\mathit{x}\right)$ additional users with the smallest ${\overline{\psi}}_{{x}_{i}}^{1}$.
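The two-branch selection rule above can be sketched as follows. The sketch takes the set of $h(\mathit{x})$ users as an input called `preferred`, since the lines above do not restate how that set is formed; the function name and the example values are hypothetical.

```python
def primal_dual_select(preferred, psi0, psi1, M):
    """Pick M user indices following the two cases of the heuristic.

    preferred : indices of the h(x) users singled out at the current state
    psi0/psi1 : per-user values of psi-bar^0 and psi-bar^1 at the current states
    """
    preferred = list(preferred)
    if len(preferred) >= M:
        # Keep the M preferred users with the largest psi-bar^0.
        return sorted(preferred, key=lambda i: psi0[i], reverse=True)[:M]
    # Serve every preferred user, then fill up with the smallest psi-bar^1.
    others = [i for i in range(len(psi0)) if i not in preferred]
    extra = sorted(others, key=lambda i: psi1[i])[: M - len(preferred)]
    return preferred + extra


psi0 = [0.5, 0.0, 0.9, 0.0]   # illustrative values only
psi1 = [0.0, 0.3, 0.0, 0.1]
few = primal_dual_select([0, 2], psi0, psi1, M=1)
many = primal_dual_select([0, 2], psi0, psi1, M=3)
```

Ties are broken by list order here, which is an arbitrary choice made for the example.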

- According to Proposition 6, we can decompose the problem into N subproblems.
- For each subproblem, the threshold structure of the optimal policy is utilized to reduce the running time of RVI.
- As we will see later, the developed policy can be obtained directly from the result of RVI in practice.

#### 6.2. Indexed Priority Policy

- State ${x}_{i}$ is where randomization happens (randomization happens when the actions suggested by the two optimal deterministic policies are different), and it has a value of ${\pi}_{{x}_{i}}^{0}={a}_{{\mathit{n}}_{{\lambda}_{-}^{*},i}}\left({x}_{i}\right)(1-{\mu}_{i}){\pi}_{{x}_{i}}+{a}_{{\mathit{n}}_{{\lambda}_{+}^{*},i}}\left({x}_{i}\right){\mu}_{i}{\pi}_{{x}_{i}}$ and ${\pi}_{{x}_{i}}^{1}={\pi}_{{x}_{i}}-{\pi}_{{x}_{i}}^{0}$ where ${\mu}_{i}$ is given by (12) and ${a}_{{\mathit{n}}_{\lambda ,i}}\left({x}_{i}\right)$ is the action suggested by ${\mathit{n}}_{\lambda ,i}$ at state ${x}_{i}$.
- For other values of ${x}_{i}$, we have ${\pi}_{{x}_{i}}^{0}=(1-{a}_{{\mathit{n}}_{{\lambda}^{*},i}}\left({x}_{i}\right)){\pi}_{{x}_{i}}$ and ${\pi}_{{x}_{i}}^{1}={\pi}_{{x}_{i}}-{\pi}_{{x}_{i}}^{0}$.

**Proposition 7** (Optimal solution pair)**.**

**Proof.**

- Primal feasibility: the constraints in (13) are satisfied.
- Dual feasibility: $\sigma \ge 0$ and ${\psi}_{{x}_{i}}^{{a}_{i}}\ge 0$ for all ${x}_{i}$, ${a}_{i}$, and i.
- Complementary slackness: $\sigma \left({\sum}_{i=1}^{N}{\sum}_{{x}_{i}}{\pi}_{{x}_{i}}^{1}-M\right)=0$ and ${\psi}_{{x}_{i}}^{{a}_{i}}{\pi}_{{x}_{i}}^{{a}_{i}}=0$ for all ${x}_{i}$, ${a}_{i}$, and i.
- Stationarity: the gradient of $\mathcal{L}({\pi}_{{x}_{i}}^{{a}_{i}},\sigma ,{\sigma}_{i},{\sigma}_{{x}_{i}},{\psi}_{{x}_{i}}^{{a}_{i}})$ with respect to $\left\{{\pi}_{{x}_{i}}^{{a}_{i}}\right\}$ vanishes.

- For state ${x}_{i}$ such that ${\overline{\pi}}_{{x}_{i}}^{1}>0$ and ${\overline{\pi}}_{{x}_{i}}^{0}=0$, we have ${\overline{\psi}}_{{x}_{i}}^{1}=0$. Therefore, ${I}_{{x}_{i}}={\overline{\psi}}_{{x}_{i}}^{0}\ge 0$.
- For state ${x}_{i}$ such that ${\overline{\pi}}_{{x}_{i}}^{1}>0$ and ${\overline{\pi}}_{{x}_{i}}^{0}>0$, we have ${\overline{\psi}}_{{x}_{i}}^{1}={\overline{\psi}}_{{x}_{i}}^{0}=0$. Therefore, ${I}_{{x}_{i}}=0$.
- For state ${x}_{i}$ such that ${\overline{\pi}}_{{x}_{i}}^{1}=0$ and ${\overline{\pi}}_{{x}_{i}}^{0}>0$, we have ${\overline{\psi}}_{{x}_{i}}^{0}=0$. Therefore, ${I}_{{x}_{i}}=-{\overline{\psi}}_{{x}_{i}}^{1}\le 0$.

**Proposition 8** (Properties of ${I}_{{x}_{i}}$)**.**

**Proof.**

**Definition 5** (Indexed priority policy)**.**

**Remark 7.**

**Remark 8.**

- The first two properties can be verified by noting that ${I}_{{x}_{i}}\ge -{\lambda}^{*}$ and the equality holds when ${\widehat{r}}_{i}={p}_{e,i}^{0}=0$ or ${s}_{i}=0$. At the same time, ${I}_{{x}_{i}}$ is non-decreasing in ${\widehat{r}}_{i}$.
- The third and fourth properties can be verified by noting that ${I}_{{x}_{i}}$ is non-decreasing in ${s}_{i}$.
- For the last property, we first notice that ${I}_{{x}_{j}}={I}_{{x}_{k}}$ when users j and k are statistically identical and ${x}_{j}={x}_{k}$. Then, the property can be verified by noting that ${I}_{{x}_{i}}$ is non-decreasing in both ${s}_{i}$ and ${\widehat{r}}_{i}$.

## 7. Numerical Results

- The Greedy+ policy yields a smaller expected average AoII than that achieved by the Greedy policy. Recall that we obtained the Greedy+ policy by applying the structural properties detailed in Corollary 1. Therefore, simple applications of the structural properties of the optimal policy can improve the performance of scheduling policies.
- The Indexed priority policy has comparable performance to Whittle’s index policy in all the system settings considered. The two policies have their own advantages. The Indexed priority policy has a broader scope of application, while Whittle’s index policy has a lower computational complexity.
- The performance of the Indexed priority policy and Whittle’s index policy is better than that of the Greedy/Greedy+ policies and is not far from the performance of the RP-optimal policy. Recall that the performance of the RP-optimal policy forms a universal lower bound on the performance of all admissible policies for PP. Hence, we can conclude that both the Indexed priority policy and Whittle’s index policy perform well.

## 8. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Appendix A. Proof of Lemma 1

- We first consider the case of ${s}_{1,j}=0<{s}_{2,j}$ and ${\widehat{r}}_{1,j}={\widehat{r}}_{2,j}=0$. When ${a}_{j}=1$, for any ${\mathit{x}}^{\prime}-\left\{{s}_{j}^{\prime}\right\}$, we have$${U}_{\nu}^{j}({\mathit{x}}_{1},{\mathit{x}}^{\prime})={p}_{j}{V}_{\nu}({\mathit{x}}^{\prime};{s}_{j}^{\prime}=1)+(1-{p}_{j}){V}_{\nu}({\mathit{x}}^{\prime};{s}_{j}^{\prime}=0),$$$$\begin{array}{cc}\hfill {U}_{\nu}^{j}({\mathit{x}}_{2},{\mathit{x}}^{\prime})& ={\beta}_{j}{V}_{\nu}({\mathit{x}}^{\prime};{s}_{j}^{\prime}={s}_{2,j}+1)+(1-{\beta}_{j}){V}_{\nu}({\mathit{x}}^{\prime};{s}_{j}^{\prime}=0),\hfill \end{array}$$$$\begin{array}{c}\hfill {U}_{\nu}^{j}({\mathit{x}}_{1},{\mathit{x}}^{\prime})-{U}_{\nu}^{j}({\mathit{x}}_{2},{\mathit{x}}^{\prime})\le ({p}_{j}-{\beta}_{j})({V}_{\nu}({\mathit{x}}^{\prime};{s}_{j}^{\prime}=1)-{V}_{\nu}({\mathit{x}}^{\prime};{s}_{j}^{\prime}=0))\le 0.\end{array}$$The inequalities hold since ${\beta}_{j}>{p}_{j}$ and Lemma 1 is true at iteration $\nu $ by assumption. Therefore, we have ${U}_{\nu}^{j}({\mathit{x}}_{1},{\mathit{x}}^{\prime})\le {U}_{\nu}^{j}({\mathit{x}}_{2},{\mathit{x}}^{\prime})$ when ${a}_{j}=1$ for any ${\mathit{x}}^{\prime}-\left\{{s}_{j}^{\prime}\right\}$. For the case of ${a}_{i}=1$ where $i\ne j$, we notice that ${a}_{j}=0$. 
Then, for any ${\mathit{x}}^{\prime}-\left\{{s}_{j}^{\prime}\right\}$, we obtain$${U}_{\nu}^{j}({\mathit{x}}_{1},{\mathit{x}}^{\prime})={p}_{j}{V}_{\nu}({\mathit{x}}^{\prime};{s}_{j}^{\prime}=1)+(1-{p}_{j}){V}_{\nu}({\mathit{x}}^{\prime};{s}_{j}^{\prime}=0),$$$$\begin{array}{cc}\hfill {U}_{\nu}^{j}({\mathit{x}}_{2},{\mathit{x}}^{\prime})& =(1-{p}_{j}){V}_{\nu}({\mathit{x}}^{\prime};{s}_{j}^{\prime}={s}_{2,j}+1)+{p}_{j}{V}_{\nu}({\mathit{x}}^{\prime};{s}_{j}^{\prime}=0).\hfill \end{array}$$Therefore, when ${a}_{i}=1$, we have$$\begin{array}{c}\hfill {U}_{\nu}^{j}({\mathit{x}}_{1},{\mathit{x}}^{\prime})-{U}_{\nu}^{j}({\mathit{x}}_{2},{\mathit{x}}^{\prime})\le (2{p}_{j}-1)({V}_{\nu}({\mathit{x}}^{\prime};{s}_{j}^{\prime}=1)-{V}_{\nu}({\mathit{x}}^{\prime};{s}_{j}^{\prime}=0))\le 0.\end{array}$$The inequalities hold since $2{p}_{j}-1<0$ and Lemma 1 is true at iteration $\nu $ by assumption. Combining with the case of ${a}_{j}=1$, ${U}_{\nu}^{j}({\mathit{x}}_{1},{\mathit{x}}^{\prime})\le {U}_{\nu}^{j}({\mathit{x}}_{2},{\mathit{x}}^{\prime})$ holds for any ${\mathit{x}}^{\prime}-\left\{{s}_{j}^{\prime}\right\}$ under any feasible action. Since ${\mathit{x}}_{1}$ and ${\mathit{x}}_{2}$ differ only in the value of ${s}_{j}$ and $C\left(\mathit{x}\right)$ is non-decreasing in ${s}_{i}$ for $1\le i\le N$, we can see that ${V}_{\nu +1}^{\mathit{a}}\left({\mathit{x}}_{1}\right)\le {V}_{\nu +1}^{\mathit{a}}\left({\mathit{x}}_{2}\right)$ for any feasible $\mathit{a}$. Then, by (A1), we can conclude that the lemma holds at iteration $\nu +1$ when ${s}_{1,j}=0<{s}_{2,j}$ and ${\widehat{r}}_{1,j}={\widehat{r}}_{2,j}=0$.
- When ${s}_{1,j}=0<{s}_{2,j}$ and ${\widehat{r}}_{1,j}={\widehat{r}}_{2,j}=1$, by replacing the ${\beta}_{j}$’s in the above case with ${\alpha}_{j}$’s, we can achieve the same result.
- When $0<{s}_{1,j}<{s}_{2,j}$ and ${\widehat{r}}_{1,j}={\widehat{r}}_{2,j}$, we notice that$$\begin{array}{cc}& {P}_{{s}_{1,j},{s}_{1,j}+1}({a}_{j},{\widehat{r}}_{1,j})={P}_{{s}_{2,j},{s}_{2,j}+1}({a}_{j},{\widehat{r}}_{2,j}),\hfill \\ & {P}_{{s}_{1,j},0}({a}_{j},{\widehat{r}}_{1,j})={P}_{{s}_{2,j},0}({a}_{j},{\widehat{r}}_{2,j}).\hfill \end{array}$$

## Appendix B. Proof of Lemma 2

- We first show that ${V}_{\nu +1}^{j}\left(\mathit{x}\right)={V}_{\nu +1}^{k}\left(\mathcal{P}\left(\mathit{x}\right)\right)$. According to (A3), we have$${V}_{\nu +1}^{j}\left(\mathit{x}\right)=C\left(\mathit{x}\right)-\theta +\sum _{{\mathit{x}}^{\prime}}\left\{\left(\prod _{i\ne j,k}{P}_{{x}_{i},{x}_{i}^{\prime}}^{i}\left(0\right)\right){P}_{{x}_{k},{x}_{k}^{\prime}}^{k}\left(0\right){P}_{{x}_{j},{x}_{j}^{\prime}}^{j}\left(1\right){V}_{\nu}\left({\mathit{x}}^{\prime}\right)\right\}.$$$$\begin{array}{cc}\hfill {V}_{\nu +1}^{k}\left(\mathcal{P}\left(\mathit{x}\right)\right)& =C\left(\mathcal{P}\left(\mathit{x}\right)\right)-\theta +\hfill \\ & \sum _{\mathcal{P}{\left(\mathit{x}\right)}^{\prime}}\left(\prod _{i\ne j,k}{P}_{\mathcal{P}{\left(\mathit{x}\right)}_{i},\mathcal{P}{\left(\mathit{x}\right)}_{i}^{\prime}}^{i}\left(0\right)\right){P}_{\mathcal{P}{\left(\mathit{x}\right)}_{k},\mathcal{P}{\left(\mathit{x}\right)}_{k}^{\prime}}^{k}\left(1\right){P}_{\mathcal{P}{\left(\mathit{x}\right)}_{j},\mathcal{P}{\left(\mathit{x}\right)}_{j}^{\prime}}^{j}\left(0\right){V}_{\nu}\left(\mathcal{P}{\left(\mathit{x}\right)}^{\prime}\right).\hfill \end{array}$$It is obvious that for any $\mathcal{P}{\left(\mathit{x}\right)}^{\prime}$, there always exists $\mathcal{P}\left({\mathit{x}}^{\prime\prime}\right)=\mathcal{P}{\left(\mathit{x}\right)}^{\prime}$. 
Then, we obtain$$\begin{array}{cc}\hfill {V}_{\nu +1}^{k}\left(\mathcal{P}\left(\mathit{x}\right)\right)& =C\left(\mathcal{P}\left(\mathit{x}\right)\right)-\theta +\hfill \\ & \phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\sum _{\mathcal{P}\left({\mathit{x}}^{\prime\prime}\right)}\left(\prod _{i\ne j,k}{P}_{{x}_{i},{x}_{i}^{\prime\prime}}^{i}\left(0\right)\right){P}_{{x}_{j},\mathcal{P}{\left({\mathit{x}}^{\prime\prime}\right)}_{k}}^{k}\left(1\right){P}_{{x}_{k},\mathcal{P}{\left({\mathit{x}}^{\prime\prime}\right)}_{j}}^{j}\left(0\right){V}_{\nu}\left(\mathcal{P}\left({\mathit{x}}^{\prime\prime}\right)\right)\hfill \\ & =C\left(\mathcal{P}\left(\mathit{x}\right)\right)-\theta +\sum _{{\mathit{x}}^{\prime\prime}}\left(\prod _{i\ne j,k}{P}_{{x}_{i},{x}_{i}^{\prime\prime}}^{i}\left(0\right)\right){P}_{{x}_{j},{x}_{j}^{\prime\prime}}^{k}\left(1\right){P}_{{x}_{k},{x}_{k}^{\prime\prime}}^{j}\left(0\right){V}_{\nu}\left({\mathit{x}}^{\prime\prime}\right)\hfill \\ & =C\left(\mathcal{P}\left(\mathit{x}\right)\right)-\theta +\sum _{{\mathit{x}}^{\prime}}\left(\prod _{i\ne j,k}{P}_{{x}_{i},{x}_{i}^{\prime}}^{i}\left(0\right)\right){P}_{{x}_{j},{x}_{j}^{\prime}}^{k}\left(1\right){P}_{{x}_{k},{x}_{k}^{\prime}}^{j}\left(0\right){V}_{\nu}\left({\mathit{x}}^{\prime}\right).\hfill \end{array}$$The second equality follows from the definition of $\mathcal{P}(\cdot)$, the property of summation, and the assumption at iteration $\nu $. The last equality follows from the variable renaming. Then, by the definition of statistically identical, we have ${P}_{{x}_{j},{x}_{j}^{\prime}}^{k}\left(1\right)={P}_{{x}_{j},{x}_{j}^{\prime}}^{j}\left(1\right)$, ${P}_{{x}_{k},{x}_{k}^{\prime}}^{j}\left(0\right)={P}_{{x}_{k},{x}_{k}^{\prime}}^{k}\left(0\right)$, and $C\left(\mathit{x}\right)=C\left(\mathcal{P}\left(\mathit{x}\right)\right)$. Therefore, we can conclude that ${V}_{\nu +1}^{j}\left(\mathit{x}\right)={V}_{\nu +1}^{k}\left(\mathcal{P}\left(\mathit{x}\right)\right)$.
- Along the same lines, we can easily show that ${V}_{\nu +1}^{k}\left(\mathit{x}\right)={V}_{\nu +1}^{j}\left(\mathcal{P}\left(\mathit{x}\right)\right)$ and ${V}_{\nu +1}^{i}\left(\mathit{x}\right)={V}_{\nu +1}^{i}\left(\mathcal{P}\left(\mathit{x}\right)\right)$ for $i\ne j,k$.

## Appendix C. Proof of Theorem 1

${\delta}^{j,k}\left(\mathit{x}\right)\le 0$ if ${\widehat{r}}_{k}={p}_{e,k}^{0}=0$. The equality holds when ${s}_{j}=0$ or ${\widehat{r}}_{j}={p}_{e,j}^{0}=0$.

- When ${s}_{j}=0$, we can easily show that ${R}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})=0$ for any ${x}^{\prime}-\{{s}_{j}^{\prime},{s}_{k}^{\prime}\}$ by noticing that the two possible actions with respect to user j (i.e., ${a}_{j}=1$ and ${a}_{j}=0$) are equivalent when ${s}_{j}=0$. Since ${\delta}^{j,k}\left(\mathit{x}\right)$ is a linear combination of ${R}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})$’s with non-negative coefficients, we can conclude that ${\delta}^{j,k}\left(\mathit{x}\right)=0$ in this case.
- When ${s}_{j}>0$ and ${\widehat{r}}_{j}=1$, for any ${\mathit{x}}^{\prime}-\{{s}_{j}^{\prime},{s}_{k}^{\prime}\}$, we have$$\begin{array}{cc}\hfill {R}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})& =\sum _{{s}_{k}^{\prime}}{P}_{{s}_{k},{s}_{k}^{\prime}}(0,0)({\alpha}_{j}+{p}_{j}-1)\left(V({\mathit{x}}^{\prime};{s}_{j}^{\prime}={s}_{j}+1)-V({\mathit{x}}^{\prime};{s}_{j}^{\prime}=0)\right)\hfill \\ \hfill \phantom{\rule{1.em}{0ex}}& \le 0.\hfill \end{array}$$The inequality holds because of Lemma 1 and the fact that ${\alpha}_{j}+{p}_{j}<1$. We recall that ${\delta}^{j,k}\left(\mathit{x}\right)$ is a linear combination of ${R}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})$’s with non-negative coefficients. Then, we can conclude that ${\delta}^{j,k}\left(\mathit{x}\right)\le 0$ in this case.
- When ${s}_{j}>0$ and ${\widehat{r}}_{j}=0$, by replacing the ${\alpha}_{j}$ in (A6) with ${\beta}_{j}$, we can get the same result. In this case, the equality holds when ${\beta}_{j}+{p}_{j}=1$, or, equivalently, ${p}_{e,j}^{0}=0$.

${\delta}^{j,k}\left(\mathit{x}\right)$ is non-increasing in ${\widehat{r}}_{j}$ and is non-decreasing in ${\widehat{r}}_{k}$ when ${s}_{j},{s}_{k}>0$. At the same time, ${\delta}^{j,k}\left(\mathit{x}\right)$ is independent of ${\widehat{r}}_{i}$ for any $i\ne j,k$.

- When ${s}_{j}=0$, we have ${P}_{{s}_{j},{s}_{j}^{\prime}}(1,{\widehat{r}}_{j})={P}_{{s}_{j},{s}_{j}^{\prime}}(0,{\widehat{r}}_{j})$ for any ${s}_{j}^{\prime}$. Thus, we conclude that ${R}_{2}=0$ for any ${\mathit{x}}^{\prime}-\left\{{s}_{j}^{\prime}\right\}$. Consequently, ${R}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})=0$ for any ${\mathit{x}}^{\prime}-\{{s}_{j}^{\prime},{s}_{k}^{\prime}\}$.
- When ${s}_{j}>0$ and ${\widehat{r}}_{j}=1$, for any ${\mathit{x}}^{\prime}-\left\{{s}_{j}^{\prime}\right\}$, we have$${R}_{2}=({\alpha}_{j}-1+{p}_{j})V({\mathit{x}}^{\prime};{s}_{j}^{\prime}={s}_{j}+1)+(1-{\alpha}_{j}-{p}_{j})V({\mathit{x}}^{\prime};{s}_{j}^{\prime}=0)\le 0$$The inequality follows from Lemma 1 and the fact that ${\alpha}_{j}+{p}_{j}<1$. Thus, ${R}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})\le 0$ for any ${\mathit{x}}^{\prime}-\{{s}_{j}^{\prime},{s}_{k}^{\prime}\}$.
- When ${s}_{j}>0$ and ${\widehat{r}}_{j}=0$, by replacing the ${\alpha}_{j}$ in (A7) with ${\beta}_{j}$, we can get the same result. In this case, the equality holds when ${\beta}_{j}+{p}_{j}=1$, or, equivalently, ${p}_{e,j}^{0}=0$.

- In the case of ${\widehat{r}}_{j}={\widehat{r}}_{k}=1$ and ${s}_{j},{s}_{k}>0$, for any ${x}^{\prime}-\{{s}_{j}^{\prime},{s}_{k}^{\prime}\}$, (A5) can be written as$$\begin{array}{cc}\hfill {R}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})& =\sum _{{s}_{j}^{\prime},{s}_{k}^{\prime}}\left[\left({P}_{{s}_{k},{s}_{k}^{\prime}}(0,1){P}_{{s}_{j},{s}_{j}^{\prime}}(1,1)-{P}_{{s}_{k},{s}_{k}^{\prime}}(1,1){P}_{{s}_{j},{s}_{j}^{\prime}}(0,1)\right)V\left({\mathit{x}}^{\prime}\right)\right]\hfill \\ & =\left({p}_{k}{\alpha}_{j}-(1-{p}_{j})(1-{\alpha}_{k})\right)V({\mathit{x}}^{\prime};{s}_{j}^{\prime}={s}_{j}+1;{s}_{k}^{\prime}=0)\hfill \\ & \phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}+\left((1-{p}_{k})(1-{\alpha}_{j})-{p}_{j}{\alpha}_{k}\right)V({\mathit{x}}^{\prime};{s}_{j}^{\prime}=0;{s}_{k}^{\prime}={s}_{k}+1)\hfill \\ & \phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}+\left((1-{p}_{k}){\alpha}_{j}-(1-{p}_{j}){\alpha}_{k}\right)V({\mathit{x}}^{\prime};{s}_{j}^{\prime}={s}_{j}+1;{s}_{k}^{\prime}={s}_{k}+1)\hfill \\ & \phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}+\left({p}_{k}(1-{\alpha}_{j})-{p}_{j}(1-{\alpha}_{k})\right)V({\mathit{x}}^{\prime};{s}_{j}^{\prime}=0;{s}_{k}^{\prime}=0).\hfill \end{array}$$As we can verify$${p}_{k}{\alpha}_{j}-(1-{p}_{j})(1-{\alpha}_{k})<\frac{1}{2}({p}_{k}+{p}_{j}-1)<0,$$$$(1-{p}_{k})(1-{\alpha}_{j})-{p}_{j}{\alpha}_{k}>\frac{1}{2}(1-{p}_{k}-{p}_{j})>0.$$We define ${\Gamma}_{i}^{1}\triangleq \frac{{\alpha}_{i}}{1-{p}_{i}}$ and ${\Gamma}_{i}^{0}\triangleq \frac{{\beta}_{i}}{1-{p}_{i}}$ for $1\le i\le N$. 
Then, we have$${\Gamma}_{j}^{1}\lesseqgtr {\Gamma}_{k}^{1}\Longrightarrow (1-{p}_{k}){\alpha}_{j}-(1-{p}_{j}){\alpha}_{k}\lesseqgtr 0.$$Combining with Lemma 1, we can conclude that, for any ${\mathit{x}}^{\prime}-\{{s}_{j}^{\prime},{s}_{k}^{\prime}\}$, ${R}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})$ is non-increasing in ${s}_{j}$ if ${\Gamma}_{j}^{1}\le {\Gamma}_{k}^{1}$ and is non-decreasing in ${s}_{k}$ if ${\Gamma}_{j}^{1}\ge {\Gamma}_{k}^{1}$.
- In the case of ${\widehat{r}}_{j}={\widehat{r}}_{k}=0$ and ${s}_{j},{s}_{k}>0$, by replacing the $\alpha $’s in the above case with $\beta $’s, we can conclude with the same result.
- In the case of ${\widehat{r}}_{j}=1$, ${\widehat{r}}_{k}=0$, and ${s}_{j},{s}_{k}>0$, for any ${\mathit{x}}^{\prime}-\{{s}_{j}^{\prime},{s}_{k}^{\prime}\}$, (A5) can be written as$$\begin{array}{cc}\hfill {R}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})& =\sum _{{s}_{j}^{\prime},{s}_{k}^{\prime}}\left[\left({P}_{{s}_{k},{s}_{k}^{\prime}}(0,0){P}_{{s}_{j},{s}_{j}^{\prime}}(1,1)-{P}_{{s}_{k},{s}_{k}^{\prime}}(1,0){P}_{{s}_{j},{s}_{j}^{\prime}}(0,1)\right)V\left({\mathit{x}}^{\prime}\right)\right]\hfill \\ & =\left({p}_{k}{\alpha}_{j}-(1-{p}_{j})(1-{\beta}_{k})\right)V({\mathit{x}}^{\prime};{s}_{j}^{\prime}={s}_{j}+1;{s}_{k}^{\prime}=0)\hfill \\ & \phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}+\left((1-{p}_{k})(1-{\alpha}_{j})-{p}_{j}{\beta}_{k}\right)V({\mathit{x}}^{\prime};{s}_{j}^{\prime}=0;{s}_{k}^{\prime}={s}_{k}+1)\hfill \\ & \phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}+\left((1-{p}_{k}){\alpha}_{j}-(1-{p}_{j}){\beta}_{k}\right)V({\mathit{x}}^{\prime};{s}_{j}^{\prime}={s}_{j}+1;{s}_{k}^{\prime}={s}_{k}+1)\hfill \\ & \phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}+\left({p}_{k}(1-{\alpha}_{j})-{p}_{j}(1-{\beta}_{k})\right)V({\mathit{x}}^{\prime};{s}_{j}^{\prime}=0;{s}_{k}^{\prime}=0).\hfill \end{array}$$As we can verify$${p}_{k}{\alpha}_{j}-(1-{p}_{j})(1-{\beta}_{k})<{p}_{k}\left({p}_{j}-\frac{1}{2}\right)<0,$$$$(1-{p}_{k})(1-{\alpha}_{j})-{p}_{j}{\beta}_{k}>(1-{p}_{k})\left(\frac{1}{2}-{p}_{j}\right)>0.$$At the same time$${\Gamma}_{j}^{1}\lesseqgtr {\Gamma}_{k}^{0}\Longrightarrow (1-{p}_{k}){\alpha}_{j}-(1-{p}_{j}){\beta}_{k}\lesseqgtr 0.$$Combined with Lemma 1, we can conclude that, for any ${\mathit{x}}^{\prime}-\{{s}_{j}^{\prime},{s}_{k}^{\prime}\}$, ${R}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})$ is non-increasing in ${s}_{j}$ if ${\Gamma}_{j}^{1}\le {\Gamma}_{k}^{0}$ and is non-decreasing in ${s}_{k}$ if ${\Gamma}_{j}^{1}\ge {\Gamma}_{k}^{0}$.
- In the case of ${\widehat{r}}_{j}=0$, ${\widehat{r}}_{k}=1$, and ${s}_{j},{s}_{k}>0$, by swapping the $\alpha $’s and $\beta $’s in the above case, we can conclude with the same result.
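The coefficient bounds stated "as we can verify" in the cases above can be spot-checked numerically. The Python sketch below evaluates them on a small grid; it assumes $p<\frac{1}{2}$ and ${p}_{e}^{1}<\frac{1}{2}$, which are the conditions under which the bounds follow by direct algebra, and the grid values themselves are arbitrary. It is a sanity check, not a proof.

```python
from itertools import product


def alpha(p, pe1):
    return pe1 * (1 - p) + (1 - pe1) * p


def beta(p, pe0):
    return pe0 * p + (1 - pe0) * (1 - p)


def coefficient_bounds_hold():
    ps = [0.05, 0.15, 0.25, 0.35, 0.45]     # assumed p < 1/2
    pe1s = [0.0, 0.1, 0.2, 0.3, 0.4]        # assumed p_e^1 < 1/2
    pe0s = [0.0, 0.25, 0.5, 0.75, 1.0]
    # Case r_hat_j = r_hat_k = 1.
    for pj, pk, ej, ek in product(ps, ps, pe1s, pe1s):
        aj, ak = alpha(pj, ej), alpha(pk, ek)
        if not (pk * aj - (1 - pj) * (1 - ak) < 0.5 * (pk + pj - 1) < 0):
            return False
        if not ((1 - pk) * (1 - aj) - pj * ak > 0.5 * (1 - pk - pj) > 0):
            return False
    # Case r_hat_j = 1, r_hat_k = 0.
    for pj, pk, ej, e0 in product(ps, ps, pe1s, pe0s):
        aj, bk = alpha(pj, ej), beta(pk, e0)
        if not (pk * aj - (1 - pj) * (1 - bk) < pk * (pj - 0.5) < 0):
            return False
        if not ((1 - pk) * (1 - aj) - pj * bk > (1 - pk) * (0.5 - pj) > 0):
            return False
    return True


ok = coefficient_bounds_hold()
```

The signs of the first two coefficients are therefore fixed, and only the third coefficient, whose sign is governed by the comparison of the $\Gamma$ values, decides the direction of monotonicity.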

- We first consider the case of ${s}_{j}\ge {s}_{k}>0$ and ${\widehat{r}}_{j}={\widehat{r}}_{k}=0$. Leveraging the definition of statistically identical, for any ${\mathit{x}}^{\prime}-\{{x}_{j}^{\prime},{x}_{k}^{\prime}\}$, we have$$\begin{array}{cc}\hfill {Q}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})=\sum _{{\widehat{r}}_{j}^{\prime},{\widehat{r}}_{k}^{\prime}}P\left({\widehat{r}}_{j}^{\prime}\right)P\left({\widehat{r}}_{k}^{\prime}\right){\kappa}_{1}(& V({x}^{\prime};{x}_{j}^{\prime}=(0,{\widehat{r}}_{j}^{\prime});{x}_{k}^{\prime}=({s}_{k}+1,{\widehat{r}}_{k}^{\prime}))-\hfill \\ & V({\mathit{x}}^{\prime};{x}_{j}^{\prime}=({s}_{j}+1,{\widehat{r}}_{j}^{\prime});{x}_{k}^{\prime}=(0,{\widehat{r}}_{k}^{\prime}))),\hfill \end{array}$$$$\begin{array}{cc}\hfill {Q}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})=& {\gamma}_{j}{\gamma}_{k}{\kappa}_{1}V({\mathit{x}}^{\prime};{x}_{j}^{\prime}=({s}_{k}+1,1);{x}_{k}^{\prime}=(0,1))-\hfill \\ & {\gamma}_{j}{\gamma}_{k}{\kappa}_{1}V({\mathit{x}}^{\prime};{x}_{j}^{\prime}=({s}_{j}+1,1);{x}_{k}^{\prime}=(0,1))+\hfill \\ & (1-{\gamma}_{j})(1-{\gamma}_{k}){\kappa}_{1}V({\mathit{x}}^{\prime};{x}_{j}^{\prime}=({s}_{k}+1,0);{x}_{k}^{\prime}=(0,0))-\hfill \\ & (1-{\gamma}_{j})(1-{\gamma}_{k}){\kappa}_{1}V({\mathit{x}}^{\prime};{x}_{j}^{\prime}=({s}_{j}+1,0);{x}_{k}^{\prime}=(0,0))+\hfill \\ & {\gamma}_{k}(1-{\gamma}_{j}){\kappa}_{1}V({\mathit{x}}^{\prime};{x}_{j}^{\prime}=({s}_{k}+1,1);{x}_{k}^{\prime}=(0,0))-\hfill \\ & {\gamma}_{k}(1-{\gamma}_{j}){\kappa}_{1}V({\mathit{x}}^{\prime};{x}_{j}^{\prime}=({s}_{j}+1,0);{x}_{k}^{\prime}=(0,1))+\hfill \\ & {\gamma}_{j}(1-{\gamma}_{k}){\kappa}_{1}V({\mathit{x}}^{\prime};{x}_{j}^{\prime}=({s}_{k}+1,0);{x}_{k}^{\prime}=(0,1))-\hfill \\ & {\gamma}_{j}(1-{\gamma}_{k}){\kappa}_{1}V({\mathit{x}}^{\prime};{x}_{j}^{\prime}=({s}_{j}+1,1);{x}_{k}^{\prime}=(0,0)).\hfill \end{array}$$Since users j and k are statistically identical, we have ${\gamma}_{j}={\gamma}_{k}$. 
Then, by Lemma 1, we have ${Q}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})\le 0$ for any ${\mathit{x}}^{\prime}-\{{x}_{j}^{\prime},{x}_{k}^{\prime}\}$. Since ${\delta}^{j,k}\left(\mathit{x}\right)$ is a linear combination of ${Q}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})$’s with non-negative coefficients, we can conclude that ${\delta}^{j,k}\left(\mathit{x}\right)\le 0$.
- For the case of ${s}_{j}\ge {s}_{k}>0$ and ${\widehat{r}}_{j}={\widehat{r}}_{k}=1$, by replacing ${\beta}_{j}$ in ${\kappa}_{1}$ with ${\alpha}_{j}$, we can conclude with the same result.
- Then, we consider the case of ${s}_{j}\ge {s}_{k}>0$, ${\widehat{r}}_{j}=1$, and ${\widehat{r}}_{k}=0$. We first notice that, for any ${\mathit{x}}^{\prime}-\{{s}_{j}^{\prime},{s}_{k}^{\prime}\}$$$\begin{array}{cc}\hfill {R}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})=& \left({p}_{k}{\alpha}_{j}-(1-{p}_{j})(1-{\beta}_{k})\right)V({\mathit{x}}^{\prime};{s}_{j}^{\prime}={s}_{j}+1;{s}_{k}^{\prime}=0)+\hfill \\ & \left((1-{p}_{k})(1-{\alpha}_{j})-{p}_{j}{\beta}_{k}\right)V({\mathit{x}}^{\prime};{s}_{j}^{\prime}=0;{s}_{k}^{\prime}={s}_{k}+1)+\hfill \\ & \left((1-{p}_{k}){\alpha}_{j}-(1-{p}_{j}){\beta}_{k}\right)V({\mathit{x}}^{\prime};{s}_{j}^{\prime}={s}_{j}+1;{s}_{k}^{\prime}={s}_{k}+1)+\hfill \\ & \left({p}_{k}(1-{\alpha}_{j})-{p}_{j}(1-{\beta}_{k})\right)V({\mathit{x}}^{\prime};{s}_{j}^{\prime}=0;{s}_{k}^{\prime}=0).\hfill \end{array}$$As users j and k are statistically identical, we have ${p}_{j}={p}_{k}$ and ${\alpha}_{j}<{\beta}_{k}$. Leveraging Lemma 1, we have$$\begin{array}{cc}\hfill {R}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})\le & ({\alpha}_{j}+{p}_{j}-1)(V({\mathit{x}}^{\prime};{s}_{j}^{\prime}={s}_{j}+1;{s}_{k}^{\prime}=0)-\hfill \\ & V({\mathit{x}}^{\prime};{s}_{j}^{\prime}=0;{s}_{k}^{\prime}={s}_{k}+1)).\hfill \end{array}$$Then, for any ${\mathit{x}}^{\prime}-\{{x}_{j}^{\prime},{x}_{k}^{\prime}\}$$$\begin{array}{cc}\hfill {Q}^{j,k}(\mathit{x},{\mathit{x}}^{\prime})\le \sum _{{\widehat{r}}_{j}^{\prime},{\widehat{r}}_{k}^{\prime}}P\left({\widehat{r}}_{j}^{\prime}\right)P\left({\widehat{r}}_{k}^{\prime}\right){\kappa}_{2}& (V({\mathit{x}}^{\prime};{x}_{j}^{\prime}=(0,{\widehat{r}}_{j}^{\prime});{x}_{k}^{\prime}=({s}_{k}+1,{\widehat{r}}_{k}^{\prime}))-\hfill \\ & V({\mathit{x}}^{\prime};{x}_{j}^{\prime}=({s}_{j}+1,{\widehat{r}}_{j}^{\prime});{x}_{k}^{\prime}=(0,{\widehat{r}}_{k}^{\prime}))),\hfill \end{array}$$

## Appendix D. Proof of Corollary 2

- We first consider the case of ${s}_{1}=0<{s}_{2}$ and ${\widehat{r}}_{1}={\widehat{r}}_{2}=0$. When $a=1$, we have$${V}_{\nu +1}^{1}\left({x}_{1}\right)=C({x}_{1},1)-\theta +\sum _{{\widehat{r}}^{\prime}}P\left({\widehat{r}}^{\prime}\right)(p{V}_{\nu}(1,{\widehat{r}}^{\prime})+(1-p){V}_{\nu}(0,{\widehat{r}}^{\prime})),$$$$\begin{array}{c}\hfill {V}_{\nu +1}^{1}\left({x}_{2}\right)=C({x}_{2},1)-\theta +\sum _{{\widehat{r}}^{\prime}}P\left({\widehat{r}}^{\prime}\right)(\beta {V}_{\nu}({s}_{2}+1,{\widehat{r}}^{\prime})+(1-\beta ){V}_{\nu}(0,{\widehat{r}}^{\prime})).\end{array}$$Subtracting the two expressions yields$$\begin{array}{cc}\hfill {V}_{\nu +1}^{1}\left({x}_{1}\right)& -{V}_{\nu +1}^{1}\left({x}_{2}\right)\hfill \\ & \le C({x}_{1},1)-C({x}_{2},1)+\sum _{{\widehat{r}}^{\prime}}P\left({\widehat{r}}^{\prime}\right)\left[(p-\beta )\left({V}_{\nu}(1,{\widehat{r}}^{\prime})-{V}_{\nu}(0,{\widehat{r}}^{\prime})\right)\right]\le 0.\hfill \end{array}$$The inequalities hold since $\beta >p$, $C(x,a)$ is non-decreasing in s, and Corollary 2 is true at iteration $\nu $ by assumption. For the case of $a=0$, we obtain$${V}_{\nu +1}^{0}\left({x}_{1}\right)=C({x}_{1},0)-\theta +\sum _{{\widehat{r}}^{\prime}}P\left({\widehat{r}}^{\prime}\right)(p{V}_{\nu}(1,{\widehat{r}}^{\prime})+(1-p){V}_{\nu}(0,{\widehat{r}}^{\prime})),$$$$\begin{array}{c}\hfill {V}_{\nu +1}^{0}\left({x}_{2}\right)=C({x}_{2},0)-\theta +\sum _{{\widehat{r}}^{\prime}}P\left({\widehat{r}}^{\prime}\right)((1-p){V}_{\nu}({s}_{2}+1,{\widehat{r}}^{\prime})+p{V}_{\nu}(0,{\widehat{r}}^{\prime})).\end{array}$$Therefore, when $a=0$, we have$$\begin{array}{cc}\hfill {V}_{\nu +1}^{0}\left({x}_{1}\right)& -{V}_{\nu +1}^{0}\left({x}_{2}\right)\hfill \\ & \le C({x}_{1},0)-C({x}_{2},0)+\sum _{{\widehat{r}}^{\prime}}P\left({\widehat{r}}^{\prime}\right)\left[(2p-1)\left({V}_{\nu}(1,{\widehat{r}}^{\prime})-{V}_{\nu}(0,{\widehat{r}}^{\prime})\right)\right]\le 0.\hfill \end{array}$$The inequalities hold since $2p-1<0$, $C(x,a)$ is non-decreasing in s, and Corollary 2 is true at iteration $\nu $ by assumption. Combining the two cases, we see that ${V}_{\nu +1}^{a}\left({x}_{1}\right)\le {V}_{\nu +1}^{a}\left({x}_{2}\right)$ for any feasible a. Then, by problem (A8), we can conclude that the corollary holds at iteration $\nu +1$ when ${s}_{1}=0<{s}_{2}$ and ${\widehat{r}}_{1}={\widehat{r}}_{2}=0$.
- When ${s}_{1}=0<{s}_{2}$ and ${\widehat{r}}_{1}={\widehat{r}}_{2}=1$, by replacing the $\beta $’s in the above case with $\alpha $’s, we can achieve the same result.
- When $0<{s}_{1}<{s}_{2}$ and ${\widehat{r}}_{1}={\widehat{r}}_{2}$, we notice that ${P}_{{s}_{1},{s}_{1}+1}(a,{\widehat{r}}_{1})={P}_{{s}_{2},{s}_{2}+1}(a,{\widehat{r}}_{2})$ and ${P}_{{s}_{1},0}(a,{\widehat{r}}_{1})={P}_{{s}_{2},0}(a,{\widehat{r}}_{2})$. Then, leveraging the monotonicity of ${V}_{\nu}\left(x\right)$ and $C(x,a)$, we can conclude with the same result.

## Appendix E. Proof of Proposition 1

- We first consider the state $x=(0,\widehat{r})$. Applying the results in Section 2.3 to problem (A9), we obtain$$\begin{array}{cc}\hfill {V}^{0}(0,\widehat{r})=& -\theta +(1-\gamma )(1-p)V(0,0)+(1-\gamma )pV(1,0)+\hfill \\ & \gamma (1-p)V(0,1)+\gamma pV(1,1),\hfill \end{array}$$$${V}^{1}(0,\widehat{r})=\lambda +{V}^{0}(0,\widehat{r}).$$Therefore, $\Delta V(0,\widehat{r})=\lambda \ge 0$. Thus, the optimal action at state $(0,\widehat{r})$ is $a=0$.
- Then, we consider the state $x=(s,0)$ where $s>0$. Applying the results in Section 2.3 to Equation (A9), we obtain$$\begin{array}{cc}\hfill {V}^{0}(s,0)& =f\left(s\right)-\theta +(1-\gamma )pV(0,0)+(1-\gamma )(1-p)V(s+1,0)+\hfill \\ & \phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\gamma pV(0,1)+\gamma (1-p)V(s+1,1),\hfill \end{array}$$$$\begin{array}{cc}\hfill {V}^{1}(s,0)& =f\left(s\right)+\lambda -\theta +(1-\gamma )(1-\beta )V(0,0)+(1-\gamma )\beta V(s+1,0)+\hfill \\ & \phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\phantom{\rule{4pt}{0ex}}\gamma (1-\beta )V(0,1)+\gamma \beta V(s+1,1).\hfill \end{array}$$Then,$$\Delta V(s,0)=\lambda +{p}_{e}^{0}(1-2p)\omega ,$$
where $\omega \triangleq (1-\gamma )\left[V(0,0)-V(s+1,0)\right]+\gamma \left[V(0,1)-V(s+1,1)\right]$.
- Finally, we consider the state $x=(s,1)$ where $s>0$. Following the same trajectory, we have$$\begin{array}{c}\hfill \Delta V(s,1)=\lambda +(1-{p}_{e}^{1})(1-2p)\omega .\end{array}$$
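The expression for $\Delta V(s,1)$ can be sanity-checked numerically. The following Python sketch evaluates ${V}^{1}(s,1)-{V}^{0}(s,1)$ term by term, using the $(s,0)$ expressions with $\beta $ replaced by $\alpha $, and compares the result with $\lambda +(1-{p}_{e}^{1})(1-2p)\omega $. It uses $\alpha $ as defined in Section 2.3 and takes $\omega =(1-\gamma )[V(0,0)-V(s+1,0)]+\gamma [V(0,1)-V(s+1,1)]$. The function names and the test values are hypothetical and serve only as an illustration.

```python
import random


def delta_v_direct(lam, p, pe1, gamma, V0, Vs1):
    """V^1(s,1) - V^0(s,1), computed term by term.

    V0[r] stands for V(0, r) and Vs1[r] for V(s+1, r), with r in {0, 1}.
    The common terms f(s) - theta cancel in the difference and are omitted.
    """
    alpha = pe1 * (1 - p) + (1 - pe1) * p      # alpha as defined in Section 2.3
    w = (1 - gamma, gamma)                     # P(r_hat' = 0), P(r_hat' = 1)
    v_idle = sum(w[r] * (p * V0[r] + (1 - p) * Vs1[r]) for r in (0, 1))
    v_send = lam + sum(w[r] * ((1 - alpha) * V0[r] + alpha * Vs1[r]) for r in (0, 1))
    return v_send - v_idle


def delta_v_closed_form(lam, p, pe1, gamma, V0, Vs1):
    """lambda + (1 - pe1) * (1 - 2p) * omega."""
    omega = (1 - gamma) * (V0[0] - Vs1[0]) + gamma * (V0[1] - Vs1[1])
    return lam + (1 - pe1) * (1 - 2 * p) * omega


# Arbitrary (hypothetical) values; the identity does not depend on them.
rng = random.Random(0)
V0 = [rng.uniform(0, 5) for _ in range(2)]
Vs1 = [rng.uniform(5, 50) for _ in range(2)]
args = (1.0, 0.3, 0.1, 0.6, V0, Vs1)
print(delta_v_direct(*args), delta_v_closed_form(*args))
```

The two printed values agree up to floating-point error because $1-\alpha -p=(1-{p}_{e}^{1})(1-2p)$.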

## Appendix F. Proof of Proposition 2

**Figure A1.** DTMC induced by the threshold policy $\mathit{n}=({n}_{0},{n}_{1})$. In the figure, ${c}_{1}=(1-\gamma )(1-p)+\gamma \alpha $ and ${c}_{2}=(1-\gamma )\beta +\gamma \alpha $.

- For state $(s,\widehat{r})$ where $s<{n}_{1}$, it is optimal to stay idle (i.e., $a=0$).
- For state $(s,\widehat{r})$ where ${n}_{1}\le s<{n}_{0}$, it is optimal to make a transmission attempt only when $\widehat{r}=1$. We recall that $\widehat{r}$ is an independent Bernoulli random variable with parameter $\gamma $. Therefore, the expected proportion of time that the system is at state $(s,1)$ is $\gamma {\pi}_{s}$.
- For state $(s,\widehat{r})$ where $s\ge {n}_{0}$, it is optimal to make a transmission attempt regardless of $\widehat{r}$.
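The three regions above determine the birth-and-reset chain of Figure A1, from which the expected transmission rate follows. The Python sketch below is one way to compute it, assuming $1\le {n}_{1}\le {n}_{0}$ and treating $\alpha $ and $\beta $ as given inputs; it is not the closed-form expression of Proposition 2. Since the only way to enter $s+1$ is from $s$, the stationary weights satisfy ${\pi}_{s+1}={\pi}_{s}\cdot Pr(s\to s+1)$, and the tail $s\ge {n}_{0}$ is geometric with ratio ${c}_{2}$. The function name and the parameter values are hypothetical.

```python
def transmission_rate(n0, n1, p, gamma, alpha, beta):
    """Expected transmission rate under the threshold policy n = (n0, n1).

    Assumes 1 <= n1 <= n0 (a modelling assumption of this sketch).
    """
    c1 = (1 - gamma) * (1 - p) + gamma * alpha   # n1 <= s < n0
    c2 = (1 - gamma) * beta + gamma * alpha      # s >= n0

    def up(s):
        """Probability of moving from s to s + 1 for s < n0."""
        if s == 0:
            return p          # estimate is correct; an error appears w.p. p
        if s < n1:
            return 1 - p      # idle; the error persists w.p. 1 - p
        return c1             # transmit only when r_hat = 1

    # Unnormalised stationary weights w[s] for s = 0, ..., n0.
    w = [1.0]
    for s in range(n0):
        w.append(w[-1] * up(s))

    tail = w[n0] / (1 - c2)                      # total weight of all s >= n0
    total = sum(w[:n0]) + tail
    # Transmit w.p. gamma in the middle region and w.p. 1 in the tail.
    return (gamma * sum(w[n1:n0]) + tail) / total


# Hypothetical parameter values, for illustration only.
print(transmission_rate(n0=5, n1=3, p=0.3, gamma=0.6, alpha=0.34, beta=0.66))
```

Raising either threshold in this example lowers the rate, which is the behaviour one would expect from a policy that waits longer before transmitting.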

## Appendix G. Proof of Proposition 4

- We first consider the state $x=(0,\widehat{r})$. By definition, Whittle’s index is the infimum $\lambda $ such that ${V}^{0}\left(x\right)={V}^{1}\left(x\right)$. According to (A10), we can conclude that ${W}_{x}=0$ when $x=(0,\widehat{r})$.
- Then, we consider the state $x=(s,0)$ where $s>0$. We recall that ${p}_{e}^{0}=0$. Then, we can conclude, from (A11), that ${W}_{x}=0$ for all $x=(s,0)$ where $s>0$.

**Lemma A1.**

**Proof.**

## Appendix H. Proof of Proposition 5

- For state $(s,1)$ where $s<n$, ${W}_{s}\le \lambda $ by definition. As the problem is indexable, we have $D\left({W}_{s}\right)\subseteq D\left(\lambda \right)$. We recall that ${W}_{s}\triangleq \min\{{\lambda}^{\prime}\ge 0:{V}^{0}(s,1)={V}^{1}(s,1)\}$. Equivalently, ${W}_{s}\triangleq \min\{{\lambda}^{\prime}\ge 0:(s,1)\in D\left({\lambda}^{\prime}\right)\}$. Then, we know $(s,1)\in D\left({W}_{s}\right)$. Combined together, we conclude that $(s,1)\in D\left(\lambda \right)$. In other words, the optimal action at state $(s,1)$ where $s<n$ is to stay idle (i.e., $a=0$).
- For state $(s,1)$ where $s\ge n$, we first recall that ${W}_{s}=\min\{{\lambda}^{\prime}\ge 0:(s,1)\in D\left({\lambda}^{\prime}\right)\}$. Consequently, for any ${\lambda}^{\prime}<{W}_{s}$, we know $(s,1)\notin D\left({\lambda}^{\prime}\right)$. Meanwhile, we have ${W}_{s}\ge {W}_{n}>\lambda $ by the monotonicity of Whittle’s index and the definition of n. Hence, we can conclude that $(s,1)\notin D\left(\lambda \right)$. In other words, the optimal action at state $(s,1)$ where $s\ge n$ is to make the transmission attempt.
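The two cases above say that, for states $(s,1)$, the optimal action is obtained by comparing ${W}_{s}$ with $\lambda $. A minimal Python sketch of this rule follows, assuming the indices are available as a list that is non-decreasing in s (the monotonicity used in the proof). The index values and function names are hypothetical.

```python
from bisect import bisect_right


def threshold_from_indices(W, lam):
    """Smallest s with W[s] > lam, for W non-decreasing in s.

    Returns len(W) when no listed index exceeds lam.
    """
    return bisect_right(W, lam)


def optimal_action(s, W, lam):
    """1 (transmit) when s >= n, 0 (idle) otherwise, for states (s, 1)."""
    return 1 if s >= threshold_from_indices(W, lam) else 0


# Hypothetical index values, for illustration only.
W = [0.0, 0.5, 1.2, 1.2, 3.0]
n = threshold_from_indices(W, 1.2)
print(n, [optimal_action(s, W, 1.2) for s in range(len(W))])
```

Using `bisect_right` rather than `bisect_left` places states with ${W}_{s}=\lambda $ on the idle side, in line with the condition ${W}_{s}\le \lambda $ for $s<n$.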

## Appendix I. Proof of Theorem 2

- The probability ${P}_{\varphi}({x}_{n}\in G\text{ for some }n\ge 1\mid {x}_{0}=i)$ equals 1, where ${x}_{n}$ is the state of ${\mathcal{M}}_{1}(\lambda ,-1)$ at time n.
- The expected time ${m}_{iG}\left(\varphi \right)$ of a first passage from i to G under $\varphi $ is finite.
- The expected ${C}_{1}$-cost ${\overline{C}}_{1}^{i,G}\left(\varphi \right)$ and the expected ${C}_{2}$-cost ${\overline{C}}_{2}^{i,G}\left(\varphi \right)$ of a first passage from i to G under $\varphi $ are finite.

- For all $d>0$, the set $A\left(d\right)=\{x\mid \text{there exists an action } a \text{ such that } {C}_{1}(x,a)+{C}_{2}(x,a)\le d\}$ is finite: For any state x, the cost satisfies ${C}_{1}(x,a)+{C}_{2}(x,a)=f\left(s\right)+\lambda a\ge f\left(s\right)$. The equality holds when $a=0$. Then, the states in $A\left(d\right)$ must satisfy $f\left(s\right)\le d$. Combined with the fact that $f\left(s\right)$ is a non-decreasing and unbounded function when $s\in {\mathbb{N}}_{0}$, we can conclude that $A\left(d\right)$ is finite.
- There exists a stationary policy e such that the induced Markov chain has the following properties: the state space $\mathcal{S}$ consists of a single (non-empty) positive recurrent class R and a set U of transient states such that $e\in {\mathcal{R}}^{*}(i,R)$ for $i\in U$. Moreover, both ${\overline{C}}_{1}\left(e\right)$ and ${\overline{C}}_{2}\left(e\right)$ on R are finite: We consider the policy under which the base station makes a transmission attempt at every time slot. According to the system dynamic detailed in Section 2.3, we can see that all the states communicate with state $(0,0)$ and $(0,0)$ communicates with all other states. Thus, the state space $\mathcal{S}$ consists of a single (non-empty) positive recurrent class and the set of transient states can simply be an empty set. ${\overline{C}}_{1}\left(e\right)$ and ${\overline{C}}_{2}\left(e\right)$ are trivially finite as we can verify using Proposition 2.
- Given any two states $x\ne y$, there exists a policy $\varphi $ such that $\varphi \in {\mathcal{R}}^{*}(x,y)$: We notice that, under any policy, the maximum increase of s between two consecutive time slots is 1. Meanwhile, when s decreases, it decreases to zero. Combined with the fact that $\widehat{r}$ is an independent Bernoulli random variable, we can conclude that there always exists a path between any x and y with positive probability. ${m}_{xy}\left(\varphi \right)$, ${\overline{C}}_{1}^{x,y}\left(\varphi \right)$, and ${\overline{C}}_{2}^{x,y}\left(\varphi \right)$ are trivially finite.
- If a stationary policy $\varphi $ has at least one positive recurrent state, then it has a single positive recurrent class R. Moreover, if $x=(0,0)\notin R$, then $\varphi \in {\mathcal{R}}^{*}(x,R)$: Given that $\widehat{r}$ is an independent Bernoulli random variable, we can easily conclude from the system dynamic that all the states communicate with state $(0,0)$ and $(0,0)$ communicates with all other states under any stationary policy. Therefore, any positive recurrent class must contain state $(0,0)$. Thus, there can be only one positive recurrent class, namely $R=\mathcal{S}$.
- There exists a policy $\varphi $ such that ${\overline{C}}_{1}\left(\varphi \right)<\infty $ and ${\overline{C}}_{2}\left(\varphi \right)<K$ where $K\in (0,1]$: We notice that ${\overline{C}}_{1}\left(\varphi \right)$ and ${\overline{C}}_{2}\left(\varphi \right)$ are nothing but the expected AoII and the expected transmission rate achieved by $\varphi $, respectively. Then, we can easily verify that such a policy exists using Proposition 2.

## Appendix J. Proof of Proposition 6

## Appendix K. Proof of Theorem 3

- It is ${\lambda}^{*}$-optimal;
- The resulting expected transmission rate is equal to M.

- For user i with ${\mathit{n}}_{{\lambda}_{-}^{*},i}={\mathit{n}}_{{\lambda}_{+}^{*},i}\triangleq {\mathit{n}}_{{\lambda}^{*},i}$, the threshold policy ${\mathit{n}}_{{\lambda}^{*},i}$ is used. Then, the deterministic policy ${\mathit{n}}_{{\lambda}^{*},i}$ is optimal for ${\mathcal{M}}_{1}^{i}({\lambda}^{*},-1)$ and$${\overline{\rho}}^{i}\left({\lambda}^{*}\right)={\overline{\rho}}^{i}\left({\lambda}_{-}^{*}\right)={\overline{\rho}}^{i}\left({\lambda}_{+}^{*}\right).$$In this case, the choice of ${\mu}_{i}$ makes no difference.
- For user i with ${\mathit{n}}_{{\lambda}_{-}^{*},i}\ne {\mathit{n}}_{{\lambda}_{+}^{*},i}$, the randomized policy ${\mathit{n}}_{{\lambda}^{*},i}$ as detailed in Theorem 2 is used. Then, for any ${\mu}_{i}\in [0,1]$, the randomized policy ${\mathit{n}}_{{\lambda}^{*},i}$ is optimal for ${\mathcal{M}}_{1}^{i}({\lambda}^{*},-1)$ and$${\overline{\rho}}^{i}\left({\lambda}^{*}\right)={\mu}_{i}{\overline{\rho}}^{i}\left({\lambda}_{-}^{*}\right)+(1-{\mu}_{i}){\overline{\rho}}^{i}\left({\lambda}_{+}^{*}\right).$$
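Because each user's rate is linear in ${\mu}_{i}$, a mixing probability that makes the total rate equal to M can be found by solving one linear equation. The Python sketch below does this under the simplifying assumption that all users share a common $\mu $, which is a choice made for this illustration and not necessarily the one used in the proof. Users with identical thresholds at ${\lambda}_{-}^{*}$ and ${\lambda}_{+}^{*}$ contribute equal rates at both endpoints, so they drop out of the denominator. The function name and the rate values are hypothetical.

```python
def common_mixing_probability(rho_minus, rho_plus, M):
    """Return mu in [0, 1] with sum_i (mu * rho_minus[i] + (1 - mu) * rho_plus[i]) = M.

    rho_minus[i] and rho_plus[i] are user i's rates at lambda_-^* and lambda_+^*.
    """
    low, high = sum(rho_plus), sum(rho_minus)
    if high == low:
        return 1.0            # every user is deterministic; mu is irrelevant
    mu = (M - low) / (high - low)
    if not 0.0 <= mu <= 1.0:
        raise ValueError("M is not bracketed by the two total rates")
    return mu


# Hypothetical per-user rates; the second user has identical thresholds.
rho_minus = [0.5, 0.3]
rho_plus = [0.4, 0.3]
mu = common_mixing_probability(rho_minus, rho_plus, M=0.75)
total = sum(mu * a + (1 - mu) * b for a, b in zip(rho_minus, rho_plus))
print(mu, total)
```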

## Appendix L. Proof of Proposition 8

- For state $x=(0,\widehat{r})$, ${I}_{x}=-{\lambda}^{*}$.
- For state $x=(s,0)$ where $s>0$, ${I}_{x}=-{\lambda}^{*}-{p}_{e}^{0}(1-2p)\omega $ where $\omega =(1-\gamma )\left[V(0,0)-V(s+1,0)\right]+\gamma \left[V(0,1)-V(s+1,1)\right]\le 0$.
- For state $x=(s,1)$ where $s>0$, ${I}_{x}=-{\lambda}^{*}-(1-{p}_{e}^{1})(1-2p)\omega $.

## Appendix M

**Algorithm A1** Improved Relative Value Iteration

| Line | Statement |
|---|---|
| Require: | MDP $\mathcal{M}=(\mathcal{X},\mathcal{P},\mathcal{A},\mathcal{C})$ |
| | Convergence criterion $\epsilon $ |
| 1: | **procedure** RelativeValueIteration($\mathcal{M}$, $\epsilon $) |
| 2: | $\quad$ Initialize ${V}_{0}\left(x\right)=0$; $\nu =0$ |
| 3: | $\quad$ Choose ${x}^{ref}\in \mathcal{X}$ arbitrarily |
| 4: | $\quad$ **while** ${V}_{\nu}$ is not converged (RVI converges when the maximum difference between the results of two consecutive iterations is less than $\epsilon $) **do** |
| 5: | $\quad\quad$ **for** $x=(s,\widehat{r})\in \mathcal{X}$ **do** |
| 6: | $\quad\quad\quad$ **if** ∃ active state $({s}_{1},{\widehat{r}}_{1})$ s.t. ${s}_{1}\le s$ and ${\widehat{r}}_{1}\le \widehat{r}$ **then** |
| 7: | $\quad\quad\quad\quad$ ${a}^{*}\left(x\right)=1$ |
| 8: | $\quad\quad\quad\quad$ ${Q}_{\nu +1}\left(x\right)=C(x,1)+{\sum}_{{x}^{\prime}}{P}_{x{x}^{\prime}}\left(1\right){V}_{\nu}\left({x}^{\prime}\right)$ |
| 9: | $\quad\quad\quad$ **else** |
| 10: | $\quad\quad\quad\quad$ **for** $a\in \mathcal{A}$ **do** |
| 11: | $\quad\quad\quad\quad\quad$ ${H}_{x,a}=C(x,a)+{\sum}_{{x}^{\prime}}{P}_{x{x}^{\prime}}\left(a\right){V}_{\nu}\left({x}^{\prime}\right)$ |
| 12: | $\quad\quad\quad\quad$ ${a}^{*}\left(x\right)=\arg {\min}_{a}\left\{{H}_{x,a}\right\}$ |
| 13: | $\quad\quad\quad\quad$ ${Q}_{\nu +1}\left(x\right)={H}_{x,{a}^{*}}$ |
| 14: | $\quad\quad$ **for** $x\in \mathcal{X}$ **do** |
| 15: | $\quad\quad\quad$ ${V}_{\nu +1}\left(x\right)={Q}_{\nu +1}\left(x\right)-{Q}_{\nu +1}\left({x}^{ref}\right)$ |
| 16: | $\quad\quad$ $\nu =\nu +1$ |
| | $\quad$ **return** $\mathit{n}\leftarrow {a}^{*}\left(x\right)$ |
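As a rough illustration, the following Python sketch runs plain relative value iteration on a single-user model. It is not Algorithm A1 itself: the shortcut in line 6, which skips the minimization once a dominated active state is known, is omitted, so every state evaluates both actions. The sketch also assumes that the age s is truncated at `s_max`, that $f\left(s\right)=s$, and that $\alpha $ and $\beta $ are supplied as numbers. All function names and parameter values are hypothetical.

```python
def make_model(p, gamma, alpha, beta, lam, s_max):
    """Single-user model with the age s truncated at s_max (an assumption)."""
    states = [(s, r) for s in range(s_max + 1) for r in (0, 1)]

    def cost(x, a):
        return x[0] + lam * a           # C(x, a) = f(s) + lambda * a, f(s) = s

    def transitions(x, a):
        s, r = x
        if s == 0:
            up = p                      # estimate correct: error appears w.p. p
        elif a == 0:
            up = 1 - p                  # idle: error persists w.p. 1 - p
        else:
            up = alpha if r == 1 else beta
        nxt = min(s + 1, s_max)         # truncation of the state space
        return [((s2, r2), q * w)
                for r2, w in ((0, 1 - gamma), (1, gamma))
                for s2, q in ((nxt, up), (0, 1 - up))]

    return states, cost, transitions


def relative_value_iteration(states, actions, cost, transitions,
                             eps=1e-8, max_iter=10000):
    """Plain RVI: returns the relative value function and a greedy policy."""
    V = {x: 0.0 for x in states}
    x_ref = states[0]                   # reference state, chosen arbitrarily
    policy = {}
    for _ in range(max_iter):
        Q = {}
        for x in states:
            H = {a: cost(x, a) + sum(pr * V[y] for y, pr in transitions(x, a))
                 for a in actions}
            policy[x] = min(actions, key=H.get)   # ties go to the first action
            Q[x] = H[policy[x]]
        V_new = {x: Q[x] - Q[x_ref] for x in states}
        gap = max(abs(V_new[x] - V[x]) for x in states)
        V = V_new
        if gap < eps:                   # convergence test of line 4
            break
    return V, policy


# Hypothetical parameter values, for illustration only.
states, cost, transitions = make_model(p=0.3, gamma=0.6, alpha=0.34,
                                       beta=0.66, lam=1.0, s_max=30)
V, policy = relative_value_iteration(states, (0, 1), cost, transitions)
print([policy[(s, 1)] for s in range(8)])
```

Subtracting the value at the reference state in every sweep keeps the iterates bounded, which is the reason the relative form is used for average-cost problems.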

**Algorithm A2** Bisection Search

| Line | Statement |
|---|---|
| Require: | Maximum updates per transmission attempt M |
| | MDP ${\mathcal{M}}_{N}(\lambda ,-1)=({\mathcal{X}}_{N},{\mathcal{A}}_{N}(-1),{\mathcal{P}}_{N},{\mathcal{C}}_{N}\left(\lambda \right))$ |
| | Tolerance $\xi $ |
| | Convergence criterion $\epsilon $ |
| 1: | **procedure** BisectionSearch(${\mathcal{M}}_{N}(\lambda ,-1)$, M, $\xi $, $\epsilon $) |
| 2: | $\quad$ Initialize ${\lambda}_{-}=0$; ${\lambda}_{+}=1$ |
| 3: | $\quad$ ${\varphi}_{{\lambda}_{+}}\leftarrow ({\mathcal{M}}_{N}({\lambda}_{+},-1),\epsilon )$ using Section 5.1 and Proposition 6 |
| 4: | $\quad$ $\overline{\rho}\left({\lambda}_{+}\right)\leftarrow {\varphi}_{{\lambda}_{+}}$ using Proposition 2 |
| 5: | $\quad$ **while** $\overline{\rho}\left({\lambda}_{+}\right)\ge M$ **do** |
| 6: | $\quad\quad$ ${\lambda}_{-}={\lambda}_{+}$; ${\lambda}_{+}=2{\lambda}_{+}$ |
| 7: | $\quad\quad$ ${\varphi}_{{\lambda}_{+}}\leftarrow ({\mathcal{M}}_{N}({\lambda}_{+},-1),\epsilon )$ using Section 5.1 and Proposition 6 |
| 8: | $\quad\quad$ $\overline{\rho}\left({\lambda}_{+}\right)\leftarrow {\varphi}_{{\lambda}_{+}}$ using Proposition 2 |
| 9: | $\quad$ **while** ${\lambda}_{+}-{\lambda}_{-}\ge 2\xi $ **do** |
| 10: | $\quad\quad$ $\lambda =\frac{{\lambda}_{+}+{\lambda}_{-}}{2}$ |
| 11: | $\quad\quad$ ${\varphi}_{\lambda}\leftarrow ({\mathcal{M}}_{N}(\lambda ,-1),\epsilon )$ using Section 5.1 and Proposition 6 |
| 12: | $\quad\quad$ $\overline{\rho}\left(\lambda \right)\leftarrow {\varphi}_{\lambda}$ using Proposition 2 |
| 13: | $\quad\quad$ **if** $\overline{\rho}\left(\lambda \right)>M$ **then** |
| 14: | $\quad\quad\quad$ ${\lambda}_{-}=\lambda $ |
| 15: | $\quad\quad$ **else** |
| 16: | $\quad\quad\quad$ ${\lambda}_{+}=\lambda $ |
| | $\quad$ **return** $({\lambda}_{+}^{*},{\lambda}_{-}^{*})\leftarrow ({\lambda}_{+},{\lambda}_{-})$ |
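The control flow of Algorithm A2 can be sketched in a few lines of Python. In the sketch, the two steps "compute the optimal policy for $\lambda $" and "compute its expected transmission rate" are collapsed into a single callable `rho`, which is assumed to be non-increasing in $\lambda $. The stand-in rate function used in the example is purely hypothetical and has no connection to the model in this paper.

```python
def bisection_search(rho, M, xi):
    """Bracket the multiplier at which the transmission rate crosses M.

    rho: callable mapping lambda to the expected transmission rate
         (stands in for lines 3-4, 7-8 and 11-12 of Algorithm A2).
    Returns (lambda_plus, lambda_minus) with lambda_plus - lambda_minus < 2 * xi.
    """
    lam_minus, lam_plus = 0.0, 1.0
    # Doubling phase: grow the upper end until the rate drops below M.
    while rho(lam_plus) >= M:
        lam_minus, lam_plus = lam_plus, 2 * lam_plus
    # Bisection phase: shrink the bracket to the requested tolerance.
    while lam_plus - lam_minus >= 2 * xi:
        lam = (lam_plus + lam_minus) / 2
        if rho(lam) > M:
            lam_minus = lam
        else:
            lam_plus = lam
    return lam_plus, lam_minus


# Hypothetical stand-in for the rate; it equals M = 0.25 at lambda = 3.
lam_plus, lam_minus = bisection_search(lambda lam: 1 / (1 + lam), M=0.25, xi=1e-3)
print(lam_plus, lam_minus)
```

The doubling phase is what removes the need for an a priori upper bound on the multiplier; it terminates as long as the rate eventually falls below M.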


**Figure 3.** Performance when the source processes vary. We choose ${p}_{i}=0.05+\frac{0.4(i-1)}{N-1}$, ${f}_{i}\left(s\right)=s$, ${\gamma}_{i}=0.6$, ${p}_{e,i}^{0}={p}_{e}^{0}$, and ${p}_{e,i}^{1}=0.1$ for $1\le i\le N$.

**Figure 4.** Performance when the communication goals vary. We choose ${f}_{i}\left(s\right)={s}^{0.5+\frac{i-1}{N-1}}$, ${p}_{i}=0.3$, ${\gamma}_{i}=0.6$, ${p}_{e,i}^{0}={p}_{e}^{0}$, and ${p}_{e,i}^{1}=0.1$ for $1\le i\le N$.

**Figure 5.** Performance in systems with random parameters when $N=5$. The parameters for each user are chosen randomly within the following intervals: $\gamma \in [0,1]$, $p\in [0.05,0.45]$, ${p}_{e}^{0}\in I$, ${p}_{e}^{1}\in [0,0.45]$, and $f\left(s\right)={s}^{\tau}$ where $\tau \in [0.5,1.5]$.


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).