# A UoI-Optimal Policy for Timely Status Updates with Resource Constraint


## Abstract


## 1. Introduction

- In contrast to [27,30], we assumed that the context-aware weight is either a first-order irreducible positive recurrent Markov process or independent and identically distributed (i.i.d.) over time. We formulated the updating problem as a CMDP, proved the single-threshold structure of the UoI-optimal policy, and derived the policy through LP exploiting this threshold structure. We also discussed the conditions that the monitored process needs to satisfy for the threshold structure to hold.
- When the distributions of the context-aware weight and of the increment in estimation error were unknown, we used a model-based RL method to learn the state transitions of the whole system and derive a near-optimal RL-based updating policy.
- Simulations were conducted to verify the theoretical analysis of the threshold structure and to show the near-optimal performance of the RL-based updating policy. The results indicate that: (i) the update thresholds decrease as the maximum average update frequency grows; and (ii) the update threshold for the emergency state can actually be larger than that for ordinary states when the probability of transferring from the emergency state to ordinary states tends to 1.

## 2. System Model and Problem Formulation

## 3. Scheduling with CMDP-Based Approach

#### 3.1. Constrained Markov Decision Process Formulation

- State space: The state of the vehicle in slot t, denoted by $s\left(t\right)=\left(Q\left(t\right),\omega \left(t\right)\right)$, consists of the current estimation error and the context-aware weight. We discretize $Q\left(t\right)$ with step size ${\Delta}_{Q}>0$, i.e., the estimation error $Q\left(t\right)\in \mathbb{Q}=\{0,\pm {\Delta}_{Q},\pm 2{\Delta}_{Q},\cdots ,\pm n{\Delta}_{Q},\cdots \}$. For example, when $Q\left(t\right)\in [n{\Delta}_{Q}-\frac{1}{2}{\Delta}_{Q},n{\Delta}_{Q}+\frac{1}{2}{\Delta}_{Q})$, its value is taken as $n{\Delta}_{Q}$. The smaller the step size ${\Delta}_{Q}$, the smaller the performance degradation caused by discretization. The value set of the context-aware weight is denoted by $\mathbb{W}$. The state space $\mathbb{S}=\mathbb{Q}\times \mathbb{W}$ is thus countably infinite.
- Action space: In each slot, the vehicle can take one of two actions, $U\left(t\right)\in \mathbb{U}=\{0,1\}$, where $U\left(t\right)=1$ denotes that the vehicle transmits an update in slot t and $U\left(t\right)=0$ denotes that it waits.
- Transition probability function: After taking action U at state $s=(Q,\omega )$, the next state is denoted by ${s}^{\prime}=({Q}^{\prime},{\omega}^{\prime})$. When the vehicle does not transmit, or the transmission fails, the probability of the estimation error transferring from Q to ${Q}^{\prime}$ is $\mathrm{Pr}\{{Q}^{\prime}-Q=a\}={p}_{a}$. Due to the discretization of the estimation error, the increment $a\in \mathbb{A}=\{0,\pm {\Delta}_{Q},\pm 2{\Delta}_{Q},\cdots ,\pm {A}_{m}\}$, where ${A}_{m}=\lfloor \frac{{A}_{max}}{{\Delta}_{Q}}\rfloor {\Delta}_{Q}>0$, and ${p}_{a}={F}_{A}(a+\frac{1}{2}{\Delta}_{Q})-{F}_{A}(a-\frac{1}{2}{\Delta}_{Q})$, where ${F}_{A}\left(a\right)$ is the CDF of the increment $A\left(t\right)$. In addition, the probability of the context-aware weight transferring from $\omega $ to ${\omega}^{\prime}$ is written as $\mathrm{Pr}\{\omega \to {\omega}^{\prime}\}={p}_{\omega {\omega}^{\prime}}$. Since the context-aware weight $\omega \left(t\right)$ is assumed independent of the estimation error $Q\left(t\right)$, the probability of the state transferring from $s=(Q,\omega )$ to ${s}^{\prime}=({Q}^{\prime},{\omega}^{\prime})$ given action U is:$$\mathrm{Pr}\{s\to {s}^{\prime}|U\}=\mathrm{Pr}\{(Q,\omega )\to ({Q}^{\prime},{\omega}^{\prime})|U\}=\begin{cases}{p}_{\omega {\omega}^{\prime}}\,{p}_{{Q}^{\prime}-Q}, & U=0,\\ {p}_{\omega {\omega}^{\prime}}\left((1-{p}_{s}){p}_{{Q}^{\prime}-Q}+{p}_{s}\,{p}_{{Q}^{\prime}-0}\right), & U=1.\end{cases}$$
- One-step cost: The UoI cost and the update cost incurred by taking action U in state $(Q,\omega )$ are, respectively:$$C(Q,\omega ,U)=\omega {Q}^{2},$$$$D(Q,\omega ,U)=U.$$
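To make the discretization concrete, here is a minimal Python sketch that builds the increment pmf ${p}_{a}$ and the resulting transition probabilities, assuming a zero-mean Gaussian increment; all numeric parameters (`DQ`, `A_MAX`, `SIGMA`, `PS`) are illustrative and not taken from the paper.

```python
import math

# Illustrative parameters: step size, truncation A_max,
# Gaussian increment A(t) ~ N(0, SIGMA^2), success probability p_s.
DQ, A_MAX, SIGMA, PS = 0.5, 3.0, 1.0, 0.9

def norm_cdf(x, sigma):
    """CDF F_A of a zero-mean Gaussian increment."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

# p_a = F_A(a + DQ/2) - F_A(a - DQ/2) on the discretized increment grid.
A_m = math.floor(A_MAX / DQ) * DQ
increments = [i * DQ for i in range(-int(A_m / DQ), int(A_m / DQ) + 1)]
p_a = {a: norm_cdf(a + DQ / 2, SIGMA) - norm_cdf(a - DQ / 2, SIGMA)
       for a in increments}

def transition_prob(q, q_next, update, p_w):
    """Pr{(Q,w) -> (Q',w') | U}, with p_w = p_{ww'} the weight transition."""
    drift = p_a.get(q_next - q, 0.0)      # no update, or update failed
    if update == 0:
        return p_w * drift
    reset = p_a.get(q_next - 0.0, 0.0)    # successful update resets Q to 0
    return p_w * ((1 - PS) * drift + PS * reset)
```

Note that the truncation at ${A}_{m}$ discards a small tail mass; with a finer step ${\Delta}_{Q}$ and larger ${A}_{max}$ the discretized pmf approaches the continuous law.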

#### 3.2. Threshold Structure of the Optimal Policy

**Definition 1.**

**Theorem 1.**

**Lemma 1.**

**Proof of Lemma 1.**

**Lemma 2.**

**Proof of Lemma 2.**

- When ${A}_{m}\le \left|{Q}_{1}\right|$, we have $|{Q}_{1}+a|<|{Q}_{2}+a|$ for all $a\in [-{A}_{m},{A}_{m}]$, so we can derive:$$\begin{aligned}{f}_{\alpha}^{(k+1)}({Q}_{1},\omega )&=\sum _{a=-{A}_{m}}^{{A}_{m}}{p}_{a}{V}_{\alpha}^{(k+1)}\left({Q}_{1}+a,\omega \right)\\ &<\sum _{a=-{A}_{m}}^{{A}_{m}}{p}_{a}{V}_{\alpha}^{(k+1)}\left({Q}_{2}+a,\omega \right)={f}_{\alpha}^{(k+1)}({Q}_{2},\omega ).\end{aligned}$$
- When ${A}_{m}>\left|{Q}_{2}\right|$, there exists an increment ${a}^{\prime}\in {\mathcal{A}}^{\prime}=\{a\,|\,a\in [-{A}_{m},-\frac{1}{2}({Q}_{1}+{Q}_{2}))\}$ such that $|{Q}_{1}+{a}^{\prime}|>|{Q}_{2}+{a}^{\prime}|$ and ${V}_{\alpha}^{(k+1)}\left({Q}_{1}+{a}^{\prime},{\omega}^{\prime}\right)>{V}_{\alpha}^{(k+1)}\left({Q}_{2}+{a}^{\prime},{\omega}^{\prime}\right)$. Notice that $-{Q}_{1}-{a}^{\prime}\in (\frac{1}{2}({Q}_{2}-{Q}_{1}),{A}_{m}-{Q}_{1}]$ and ${Q}_{2}+a\in [-{A}_{m}+{Q}_{2},{A}_{m}+{Q}_{2}]$; hence ${p}_{-{Q}_{1}-{a}^{\prime}-{Q}_{2}}{V}_{\alpha}^{(k+1)}\left(-{Q}_{1}-{a}^{\prime},\omega \right)$ is a term in the summation ${f}_{\alpha}^{(k+1)}({Q}_{2},\omega )={\sum}_{a=-{A}_{m}}^{{A}_{m}}{p}_{a}{V}_{\alpha}^{(k+1)}\left({Q}_{2}+a,\omega \right)$. Similarly, ${p}_{-{Q}_{2}-{a}^{\prime}-{Q}_{1}}{V}_{\alpha}^{(k+1)}\left(-{Q}_{2}-{a}^{\prime},\omega \right)$ is a term in the summation ${f}_{\alpha}^{(k+1)}({Q}_{1},\omega )$. We further define ${\mathcal{A}}^{\prime\prime}=\{a\,|\,a=-{Q}_{1}-{Q}_{2}-{a}^{\prime},\,{a}^{\prime}\in {\mathcal{A}}^{\prime}\}$; since $-{Q}_{1}-{Q}_{2}-{a}^{\prime}\in \left(-\frac{1}{2}({Q}_{1}+{Q}_{2}),{A}_{m}-{Q}_{1}-{Q}_{2}\right]$, we have ${\mathcal{A}}^{\prime}\cap {\mathcal{A}}^{\prime\prime}=\varnothing $. Furthermore, the probability of the estimation error transferring from ${Q}_{1}$ to $-{Q}_{2}-{a}^{\prime}$, i.e., ${p}_{-{Q}_{2}-{a}^{\prime}-{Q}_{1}}$, equals ${p}_{-{Q}_{1}-{a}^{\prime}-{Q}_{2}}$, the probability of the estimation error transferring from ${Q}_{2}$ to $-{Q}_{1}-{a}^{\prime}$. Since $-{a}^{\prime}\in (\frac{1}{2}({Q}_{1}+{Q}_{2}),{A}_{m}]$, we have $|{a}^{\prime}|>|-{Q}_{1}-{Q}_{2}-{a}^{\prime}|$. By our assumption on the increment distribution, it follows that ${p}_{{a}^{\prime}}<{p}_{-{Q}_{1}-{Q}_{2}-{a}^{\prime}}$ for any ${a}^{\prime}\in {\mathcal{A}}^{\prime}$.
Then, we can derive:$$\begin{aligned}&{f}_{\alpha}^{(k+1)}({Q}_{1},\omega )-{f}_{\alpha}^{(k+1)}({Q}_{2},\omega )\\ &=\sum _{a\in {\mathcal{A}}^{\prime}}{p}_{a}{V}_{\alpha}^{(k+1)}\left({Q}_{1}+a,\omega \right)+\sum _{a\in {\mathcal{A}}^{\prime\prime}}{p}_{a}{V}_{\alpha}^{(k+1)}\left({Q}_{1}+a,\omega \right)\\ &\quad -\sum _{a\in {\mathcal{A}}^{\prime}}{p}_{a}{V}_{\alpha}^{(k+1)}\left({Q}_{2}+a,\omega \right)-\sum _{a\in {\mathcal{A}}^{\prime\prime}}{p}_{a}{V}_{\alpha}^{(k+1)}\left({Q}_{2}+a,\omega \right)+M({Q}_{1},{Q}_{2})\\ &=\sum _{a\in {\mathcal{A}}^{\prime}}{p}_{a}\{{V}_{\alpha}^{(k+1)}\left({Q}_{1}+a,\omega \right)-{V}_{\alpha}^{(k+1)}\left({Q}_{2}+a,\omega \right)\}\\ &\quad +\sum _{a\in {\mathcal{A}}^{\prime}}{p}_{-{Q}_{1}-{Q}_{2}-a}\{{V}_{\alpha}^{(k+1)}\left({Q}_{2}+a,\omega \right)-{V}_{\alpha}^{(k+1)}\left({Q}_{1}+a,\omega \right)\}+M({Q}_{1},{Q}_{2})\\ &=\sum _{a\in {\mathcal{A}}^{\prime}}\left({p}_{a}-{p}_{-{Q}_{1}-{Q}_{2}-a}\right)\{{V}_{\alpha}^{(k+1)}\left({Q}_{1}+a,\omega \right)-{V}_{\alpha}^{(k+1)}\left({Q}_{2}+a,\omega \right)\}+M({Q}_{1},{Q}_{2})<0,\end{aligned}$$where $M({Q}_{1},{Q}_{2})$ collects the terms with $a\notin {\mathcal{A}}^{\prime}\cup {\mathcal{A}}^{\prime\prime}$.
- When $|{Q}_{2}|>{A}_{m}>\left|{Q}_{1}\right|$, since ${a}^{\prime}\in [-{A}_{m},-\frac{1}{2}({Q}_{1}+{Q}_{2}))$, we only need to consider the case ${A}_{m}>\frac{1}{2}({Q}_{1}+{Q}_{2})$, in which $-{Q}_{1}-{a}^{\prime}>\frac{1}{2}({Q}_{2}-{Q}_{1})>{Q}_{2}-{A}_{m}$. Therefore, ${p}_{-{Q}_{1}-{a}^{\prime}-{Q}_{2}}{V}_{\alpha}^{(k+1)}\left(-{Q}_{1}-{a}^{\prime},\omega \right)$ is again a term in the summation ${f}_{\alpha}^{(k+1)}({Q}_{2},\omega )$, and we can prove in the same way that ${f}_{\alpha}^{(k+1)}({Q}_{1},\omega )<{f}_{\alpha}^{(k+1)}({Q}_{2},\omega )$ when $|{Q}_{2}|>{A}_{m}>\left|{Q}_{1}\right|$.
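The key smoothing step used in this proof can be checked numerically: for a symmetric, unimodal increment pmf, the operator $f(Q)={\sum}_{a}{p}_{a}V(Q+a)$ preserves monotonicity in $|Q|$. Below is a toy check (not a proof), using an illustrative discretized Gaussian-like pmf rather than any distribution from the paper.

```python
import math

# Symmetric, unimodal increment pmf on a grid [-A_m, A_m] with step DQ.
DQ, A_m = 0.5, 3.0
grid = [i * DQ for i in range(-int(A_m / DQ), int(A_m / DQ) + 1)]
w = [math.exp(-0.5 * a * a) for a in grid]   # symmetric, peaked at a = 0
Z = sum(w)
p_a = dict(zip(grid, (wi / Z for wi in w)))

def smooth(V, Q):
    """One smoothing step: expected value of V after one random increment."""
    return sum(p * V(Q + a) for a, p in p_a.items())

# Both |Q| and Q^2 increase in |Q|; their smoothed versions should too.
for V in (abs, lambda q: q * q):
    vals = [smooth(V, i * DQ) for i in range(20)]
    assert all(x < y for x, y in zip(vals, vals[1:]))
```

Dropping either the symmetry or the unimodality of the pmf can break this monotonicity, which is why Remark 1 imposes both conditions on the increment distribution.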

**Remark 1.**

- The PDF of the increment is symmetric with zero mean, i.e., ${f}_{A}\left(a\right)={f}_{A}(-a)$ and $\mu =0$;
- ${f}_{A}\left({a}_{2}\right)\le {f}_{A}\left({a}_{1}\right)$ for all ${a}_{2}\ge {a}_{1}\ge 0$.

**Theorem 2.**

**Proof of Theorem 2.**

**Theorem 3.**

**Proof of Theorem 3.**

#### 3.3. Numerical Solution of Optimal Strategy

**Theorem 4.**

**Proof of Theorem 4.**

## 4. Scheduling in Unknown Contexts

**Algorithm 1** RL-based Updating Policy

```
Input: l ∈ [0,1], L > 0, K > 0
 1: for episodes k = 1, 2, …, K do
 2:     Set L_k = L·√k, ε_k = l/√k; uniformly draw α ∈ [0,1].
 3:     if α < ε_k then
 4:         Set π_k(s) = π_rand(s).
 5:     else
 6:         for each state s, s′ ∈ S and U ∈ U do
 7:             if N(s,U) > 0 then
 8:                 Let p̃_k(s′|s,U) = N(s,U,s′)/N(s,U).
 9:             else
10:                 p̃_k(s′|s,U) = 1/|S|.
11:             end if
12:         end for
13:         Obtain policy π_k(s) by solving the estimated CMDP.
14:     end if
15:     Randomly choose an initial state s(1).
16:     for slots t = 1, 2, …, ⌈L_k⌉ − 1 do
17:         Choose action U(t) = π_k(s(t)).
18:         Observe the next state s(t+1).
19:         N(s(t),U(t),s(t+1)) ← N(s(t),U(t),s(t+1)) + 1.
20:         N(s(t),U(t)) ← N(s(t),U(t)) + 1.
21:         s(t) ← s(t+1).
22:     end for
23: end for
24: Obtain policy π★(s) by solving the estimated CMDP based on p̃_k(s′|s,U), s,s′ ∈ S, U ∈ U.
Output: the RL-based updating policy π★(s)
```
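Algorithm 1 can be sketched in runnable form as follows. The environment dynamics and the stubbed `solve_cmdp` (standing in for the LP-based CMDP solver of Section 3.3) are illustrative assumptions on a toy state space, not the paper's exact model.

```python
import math, random

random.seed(0)
STATES = list(range(-5, 6))      # truncated, discretized estimation error
ACTIONS = (0, 1)                 # 1 = transmit an update, 0 = wait
N_sa  = {(s, u): 0 for s in STATES for u in ACTIONS}
N_sas = {(s, u, s2): 0 for s in STATES for u in ACTIONS for s2 in STATES}

def estimate_kernel():
    """Empirical transition estimate; uniform where (s, U) is unvisited."""
    p = {}
    for (s, u), n in N_sa.items():
        for s2 in STATES:
            p[(s, u, s2)] = (N_sas[(s, u, s2)] / n if n > 0
                             else 1.0 / len(STATES))
    return p

def env_step(s, u):
    """Toy dynamics: a successful update (prob 0.9) resets the error."""
    if u == 1 and random.random() < 0.9:
        s = 0
    s2 = s + random.choice((-1, 0, 1))
    return max(STATES[0], min(STATES[-1], s2))

def solve_cmdp(p_hat):
    """Placeholder for the LP-based solver: a single-threshold policy."""
    return lambda s: 1 if abs(s) >= 3 else 0

l, L, K = 0.5, 10.0, 20
pi_rand = lambda s: random.choice(ACTIONS)
for k in range(1, K + 1):
    L_k, eps_k = L * math.sqrt(k), l / math.sqrt(k)
    pi_k = pi_rand if random.random() < eps_k else solve_cmdp(estimate_kernel())
    s = random.choice(STATES)
    for t in range(1, math.ceil(L_k)):        # slots 1 .. ceil(L_k) - 1
        u = pi_k(s)
        s2 = env_step(s, u)
        N_sas[(s, u, s2)] += 1
        N_sa[(s, u)] += 1
        s = s2
pi_star = solve_cmdp(estimate_kernel())       # final RL-based policy
```

The growing episode length ${L}_{k}=L\sqrt{k}$ and decaying exploration rate ${\epsilon}_{k}=l/\sqrt{k}$ are taken from the algorithm's inputs; they let the empirical kernel concentrate while exploration vanishes.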

## 5. Simulation Results and Discussion

#### 5.1. Simulation Setup

- The context-aware weight $\omega \left(t\right)$ has the first-order Markov property. The state transition diagram of $\omega \left(t\right)$ is shown in Figure 2 and $\omega \left(t\right)$ is irreducible and positive recurrent. ${p}_{1}$ is the probability of the context-aware weight transferring from the normal state to the urgent state, while ${p}_{2}$ is the probability of the weight transferring from the urgent state to the normal state;
- The context-aware weight $\omega \left(t\right)$ is i.i.d. over time. The probabilities of the weight being in the urgent state and the normal state are denoted by ${p}_{h}$ and ${p}_{l}$, respectively.
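The two weight models can be generated as follows, assuming the weight equals 1 in the normal state and ${\omega}_{e}=100$ in the urgent state (the values used in the figures); the default probabilities are the ones quoted in the figure captions, and the sketch is illustrative rather than the paper's simulator.

```python
import random

random.seed(1)
W_NORMAL, W_URGENT = 1, 100   # assumed normal / urgent weight values

def markov_weights(T, p1=0.001, p2=0.01):
    """First-order Markov weight: p1 = normal -> urgent, p2 = urgent -> normal."""
    w, path = W_NORMAL, []
    for _ in range(T):
        path.append(w)
        if w == W_NORMAL:
            w = W_URGENT if random.random() < p1 else W_NORMAL
        else:
            w = W_NORMAL if random.random() < p2 else W_URGENT
    return path

def iid_weights(T, p_h=0.001):
    """i.i.d. weight: urgent with probability p_h, normal otherwise."""
    return [W_URGENT if random.random() < p_h else W_NORMAL for _ in range(T)]
```

With these defaults, the Markov model spends roughly a fraction ${p}_{1}/({p}_{1}+{p}_{2})\approx 0.09$ of slots in the urgent state in the long run.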

#### 5.2. Numerical Results

## 6. Conclusions

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## Abbreviations

AoI | Age of Information |

AoII | Age of Incorrect Information |

AoS | Age of Synchronization |

CDF | Cumulative Distribution Function |

CMDP | Constrained Markov Decision Process |

CoUD | Cost of Update Delay |

i.i.d. | Independent and Identically Distributed |

IoT | Internet of Things |

LP | Linear Programming |

PDF | Probability Density Function |

RL | Reinforcement Learning |

UoI | Urgency of Information |

URLLC | Ultra-Reliable Low-Latency Communication |

VR | Virtual Reality |

V2X | Vehicle to Everything |

## Appendix A. Proof of Theorem 1

**Assumption A1.**

**Assumption A2.**

**Assumption A3.**

**Assumption A4.**

**Assumption A5.**

**Proof of Lemma A1.**

- Assumption A1: In this problem, $C(s,U)$ is the UoI at state s, namely $C(Q,\omega ,U)=\omega {Q}^{2}$; $D(s,U)$ equals 1 if the vehicle chooses to transmit its status and 0 otherwise, namely $D(Q,\omega ,U)=U$. Assumption A1 therefore holds: for any $b>0$, the number of states $(Q,\omega )$ with $\omega {Q}^{2}\le b$ is finite.
- Assumption A2: Given the capability of current wireless communication technology, we reasonably assume that the successful transmission probability ${p}_{s}$ is relatively close to 1. Under the assumptions above, the Markov chain of the context-aware weight clearly satisfies Assumption A2. Define the probability of the context-aware weight transferring from $\omega $ to ${\omega}^{\prime}$ in k steps for the first time as ${P}_{\omega ,{\omega}^{\prime},k}$. Then, consider the policy $\pi (Q,\omega )=1$ for all $(Q,\omega )\in \mathbb{S}$, i.e., the policy that transmits in every state. Since the evolution of the context-aware weight is independent of the evolution of the estimation error and of the updating policy, we first focus on the estimation error, which can be formulated as a one-dimensional irreducible Markov chain with state space $\mathbb{Q}=\{0,\pm {\Delta}_{Q},\pm 2{\Delta}_{Q},\cdots ,\pm n{\Delta}_{Q},\cdots \}$. We denote the set of states that can transfer to state Q in a single step by ${\mathcal{Z}}_{Q}$, and define ${P}_{Q,{Q}^{\prime},k}^{\prime}$ as the probability of the estimation error transferring from state Q to state ${Q}^{\prime}$ at the k-th step without an arrival to state $Q=0$. Obviously, ${\sum}_{{Q}^{\prime}\in \mathbb{Q}}{P}_{Q,{Q}^{\prime},k}^{\prime}<{(1-{p}_{s})}^{k}$. Then, the probability that the first passage from state $Q\;(Q\ne 0)$ to 0 takes $k+1$ steps is ${\sum}_{{Q}^{\prime}\notin {\mathcal{Z}}_{0}}{P}_{Q,{Q}^{\prime},k}^{\prime}{p}_{s}+{\sum}_{{Q}^{\prime}\in {\mathcal{Z}}_{0}}{P}_{Q,{Q}^{\prime},k}^{\prime}({p}_{s}+(1-{p}_{s}){p}_{0-{Q}^{\prime}})<{(1-{p}_{s})}^{k}$, where ${p}_{0-{Q}^{\prime}}$ is the probability that the increment in estimation error is $-{Q}^{\prime}$. Therefore, the expected time of the first passage from $Q\;(Q\ne 0)$ to 0 is finite. For state $Q=0$, the estimation error stays in this state at the next step with probability ${p}_{s}+{p}_{0-0}$ and first returns to state $Q=0$ at the second transition with probability smaller than $(1-{p}_{s}-{p}_{0-0})$. Then, starting from state $Q=0$, the probability that the estimation error first returns to state $Q=0$ at the $(k+1)$-th ($k>2$) step is smaller than $(1-{p}_{s}-{p}_{0-0}){(1-{p}_{s})}^{k-1}$. Therefore, state $Q=0$ is positive recurrent, and ${\mathbb{R}}_{Q}^{\pi}=\{Q=0\}$ is a positive recurrent class of the induced Markov chain of the estimation error. Furthermore, for any state in ${\mathbb{T}}_{Q}^{\pi}=\mathbb{Q}\setminus {\mathbb{R}}_{Q}^{\pi}$, the expected time of the first passage from that state to state $Q=0$ under $\pi $ is finite, and the probability of not reaching state $Q=0$ in k steps is smaller than ${(1-{p}_{s})}^{k}$. Define the probability of state Q transferring to state ${Q}^{\prime}$ in k steps for the first time as ${P}_{Q,{Q}^{\prime},k}$. Then, the probability of state $(Q,\omega )$ transferring to state $({Q}^{\prime},{\omega}^{\prime})$ in k steps for the first time is ${P}_{Q,{Q}^{\prime},k}{P}_{\omega ,{\omega}^{\prime},k}$. Since ${\sum}_{k=1}^{\infty}{P}_{Q,{Q}^{\prime},k}\,k<\infty $ and ${\sum}_{k=1}^{\infty}{P}_{\omega ,{\omega}^{\prime},k}\,k<\infty $, we have ${\sum}_{k=1}^{\infty}{P}_{Q,{Q}^{\prime},k}{P}_{\omega ,{\omega}^{\prime},k}\,k<\infty $. Therefore, the set of states ${\mathbb{R}}^{\pi}=\left\{(Q,\omega )\right|Q\in {\mathbb{R}}_{Q}^{\pi},\omega \in \mathbb{W}\}$ is a positive recurrent class. Similarly, we can prove that ${\mathbb{T}}^{\pi}=\mathbb{S}\setminus {\mathbb{R}}^{\pi}$ satisfies Assumption A2. Finally, ${\overline{D}}^{\pi}=1<\infty $ and ${\overline{C}}^{\pi}=E\left[\omega \right]\frac{1}{{p}_{s}}{\sigma}^{2}<\infty $.
- Assumption A3: Define ${P}_{Q,min}={min}_{{Q}^{\prime}}{p}_{Q-{Q}^{\prime}}$ and ${P}_{Q,max}={max}_{{Q}^{\prime}}{p}_{Q-{Q}^{\prime}}$. Consider the policy ${\pi}^{\prime}(Q,\omega )=0$ for all states $(Q,\omega )\in \mathbb{S}$, namely the policy that never transmits. Again, we first focus on the Markov chain of the estimation error. Starting from state Q, the probability of transferring to state ${Q}^{\prime}$ at the $(k+1)$-th step $(k\ge 2)$ for the first time is smaller than $(1-{p}_{{Q}^{\prime}-Q}){P}_{{Q}^{\prime},max}{(1-{P}_{{Q}^{\prime},min})}^{k-1}$. Hence, the expected time of the first passage from state Q to state ${Q}^{\prime}$ under policy ${\pi}^{\prime}$ is finite. Similarly, since the Markov chain of the context-aware weight is irreducible, positive recurrent, and independent of the updating policy, we can prove that the expected time of the first passage from state $(Q,\omega )$ to state $({Q}^{\prime},{\omega}^{\prime})$ under policy ${\pi}^{\prime}$ is finite.
- Assumption A4: For the Markov chain of the estimation error, any state returns to state $Q=0$ once a transmission succeeds, so under any policy that transmits, every positive recurrent class contains state $Q=0$, and there is thus only one positive recurrent class. For the policy without transmission, namely ${\pi}^{\prime}(Q,\omega )=0$, state $Q=0$ still lies in a single positive recurrent class. Since the Markov chain of the context-aware weight is irreducible and positive recurrent, we can similarly prove Assumption A4.
- Assumption A5: The policy ${\pi}_{\rho}$ that updates the status with probability $\rho -\delta $, where $\delta $ is a small positive number, satisfies Assumption A5. Under this policy, ${\overline{D}}^{\pi}=\rho -\delta <\rho $ and ${\overline{C}}^{\pi}=E\left[\omega \right]\frac{1}{{p}_{s}(\rho -\delta )}{\sigma}^{2}<\infty $.

## References

- Talak, R.; Karaman, S.; Modiano, E. Speed limits in autonomous vehicular networks due to communication constraints. In Proceedings of the 2016 IEEE 55th Conference on Decision and Control (CDC), Las Vegas, NV, USA, 12–14 December 2016; pp. 4998–5003.
- Hou, I.H.; Naghsh, N.Z.; Paul, S.; Hu, Y.C.; Eryilmaz, A. Predictive Scheduling for Virtual Reality. In Proceedings of the IEEE INFOCOM 2020—IEEE Conference on Computer Communications, Toronto, ON, Canada, 6–9 July 2020; pp. 1349–1358.
- Kaul, S.; Yates, R.; Gruteser, M. Real-time status: How often should one update? In Proceedings of the 2012 Proceedings IEEE INFOCOM, Orlando, FL, USA, 25–30 March 2012; pp. 2731–2735.
- Sun, Y.; Polyanskiy, Y.; Uysal-Biyikoglu, E. Remote estimation of the Wiener process over a channel with random delay. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 321–325.
- Sun, Y.; Polyanskiy, Y.; Uysal, E. Sampling of the Wiener process for remote estimation over a channel with random delay. IEEE Trans. Inf. Theory **2019**, 66, 1118–1135.
- Jiang, Z.; Zhou, S. Status from a random field: How densely should one update? In Proceedings of the 2019 IEEE International Symposium on Information Theory (ISIT), Paris, France, 7–12 July 2019; pp. 1037–1041.
- Bedewy, A.M.; Sun, Y.; Singh, R.; Shroff, N.B. Optimizing information freshness using low-power status updates via sleep-wake scheduling. In Proceedings of the Twenty-First International Symposium on Theory, Algorithmic Foundations, and Protocol Design for Mobile Networks and Mobile Computing, New York, NY, USA, 11–14 October 2020; pp. 51–60.
- Ceran, E.T.; Gündüz, D.; György, A. Average age of information with hybrid ARQ under a resource constraint. IEEE Trans. Wirel. Commun. **2019**, 18, 1900–1913.
- Sun, J.; Jiang, Z.; Krishnamachari, B.; Zhou, S.; Niu, Z. Closed-form Whittle’s index-enabled random access for timely status update. IEEE Trans. Commun. **2019**, 68, 1538–1551.
- Yates, R.D.; Kaul, S.K. Status updates over unreliable multiaccess channels. In Proceedings of the 2017 IEEE International Symposium on Information Theory (ISIT), Aachen, Germany, 25–30 June 2017; pp. 331–335.
- Sun, J.; Wang, L.; Jiang, Z.; Zhou, S.; Niu, Z. Age-Optimal Scheduling for Heterogeneous Traffic with Timely Throughput Constraints. IEEE J. Sel. Areas Commun. **2021**, 39, 1485–1498.
- Tang, H.; Wang, J.; Song, L.; Song, J. Minimizing age of information with power constraints: Multi-user opportunistic scheduling in multi-state time-varying channels. IEEE J. Sel. Areas Commun. **2020**, 38, 854–868.
- Abdel-Aziz, M.K.; Samarakoon, S.; Liu, C.F.; Bennis, M.; Saad, W. Optimized age of information tail for ultra-reliable low-latency communications in vehicular networks. IEEE Trans. Commun. **2019**, 68, 1911–1924.
- Devassy, R.; Durisi, G.; Ferrante, G.C.; Simeone, O.; Uysal-Biyikoglu, E. Delay and peak-age violation probability in short-packet transmissions. In Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018; pp. 2471–2475.
- Inoue, Y.; Masuyama, H.; Takine, T.; Tanaka, T. A general formula for the stationary distribution of the age of information and its application to single-server queues. IEEE Trans. Inf. Theory **2019**, 65, 8305–8324.
- Sun, Y.; Uysal-Biyikoglu, E.; Yates, R.D.; Koksal, C.E.; Shroff, N.B. Update or wait: How to keep your data fresh. IEEE Trans. Inf. Theory **2017**, 63, 7492–7508.
- Zheng, X.; Zhou, S.; Jiang, Z.; Niu, Z. Closed-form analysis of non-linear age of information in status updates with an energy harvesting transmitter. IEEE Trans. Wirel. Commun. **2019**, 18, 4129–4142.
- Kosta, A.; Pappas, N.; Ephremides, A.; Angelakis, V. Non-linear age of information in a discrete time queue: Stationary distribution and average performance analysis. In Proceedings of the ICC 2020—2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–6.
- Kosta, A.; Pappas, N.; Ephremides, A.; Angelakis, V. The cost of delay in status updates and their value: Non-linear ageing. IEEE Trans. Commun. **2020**, 68, 4905–4918.
- Zhong, J.; Yates, R.D.; Soljanin, E. Two freshness metrics for local cache refresh. In Proceedings of the 2018 IEEE International Symposium on Information Theory (ISIT), Vail, CO, USA, 17–22 June 2018; pp. 1924–1928.
- Maatouk, A.; Kriouile, S.; Assaad, M.; Ephremides, A. The age of incorrect information: A new performance metric for status updates. IEEE/ACM Trans. Netw. **2020**, 28, 2215–2228.
- Kadota, I.; Sinha, A.; Uysal-Biyikoglu, E.; Singh, R.; Modiano, E. Scheduling policies for minimizing age of information in broadcast wireless networks. IEEE/ACM Trans. Netw. **2018**, 26, 2637–2650.
- Song, J.; Gunduz, D.; Choi, W. Optimal scheduling policy for minimizing age of information with a relay. arXiv **2020**, arXiv:2009.02716.
- Sun, Y.; Cyr, B. Information aging through queues: A mutual information perspective. In Proceedings of the 2018 IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Kalamata, Greece, 25–28 June 2018; pp. 1–5.
- Kam, C.; Kompella, S.; Nguyen, G.D.; Wieselthier, J.E.; Ephremides, A. Towards an effective age of information: Remote estimation of a Markov source. In Proceedings of the IEEE INFOCOM 2018—IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Honolulu, HI, USA, 15–19 April 2018; pp. 367–372.
- Zheng, X.; Zhou, S.; Niu, Z. Context-aware information lapse for timely status updates in remote control systems. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, 9–13 December 2019; pp. 1–6.
- Zheng, X.; Zhou, S.; Niu, Z. Beyond age: Urgency of information for timeliness guarantee in status update systems. In Proceedings of the 2020 2nd IEEE 6G Wireless Summit (6G SUMMIT), Levi, Finland, 17–20 March 2020; pp. 1–5.
- Zheng, X.; Zhou, S.; Niu, Z. Urgency of Information for Context-Aware Timely Status Updates in Remote Control Systems. IEEE Trans. Wirel. Commun. **2020**, 19, 7237–7250.
- Ioannidis, S.; Chaintreau, A.; Massoulié, L. Optimal and scalable distribution of content updates over a mobile social network. In Proceedings of the IEEE INFOCOM 2009, Rio de Janeiro, Brazil, 19–25 April 2009; pp. 1422–1430.
- Wang, L.; Sun, J.; Zhou, S.; Niu, Z. Timely Status Update Based on Urgency of Information with Statistical Context. In Proceedings of the 2020 32nd IEEE International Teletraffic Congress (ITC 32), Osaka, Japan, 22–24 September 2020; pp. 90–96.
- Nayyar, A.; Başar, T.; Teneketzis, D.; Veeravalli, V.V. Optimal strategies for communication and remote estimation with an energy harvesting sensor. IEEE Trans. Autom. Control **2013**, 58, 2246–2260.
- Cika, A.; Badiu, M.A.; Coon, J.P. Quantifying link stability in Ad Hoc wireless networks subject to Ornstein–Uhlenbeck mobility. In Proceedings of the ICC 2019—2019 IEEE International Conference on Communications (ICC), Shanghai, China, 20–24 May 2019; pp. 1–6.
- Sennott, L.I. Constrained average cost Markov decision chains. Probab. Eng. Inf. Sci. **1993**, 7, 69–83.
- Bertsekas, D.P. Dynamic Programming and Optimal Control; Athena Scientific: Belmont, MA, USA, 2000.
- Sennott, L.I. Average cost optimal stationary policies in infinite state Markov decision processes with unbounded costs. Oper. Res. **1989**, 37, 626–633.
- Liu, B.; Xie, Q.; Modiano, E. RL-QN: A reinforcement learning framework for optimal control of queueing systems. arXiv **2020**, arXiv:2011.07401.
- Chen, X.; Liao, X.; Bidokhti, S.S. Real-time Sampling and Estimation on Random Access Channels: Age of Information and Beyond. arXiv **2020**, arXiv:2007.03652.

**Figure 3.** Threshold structure of the UoI-optimal updating policy when: (**a**) the context-aware weight is a first-order Markov process, $\rho =0.05,{p}_{1}=0.001,{p}_{2}=0.01,{p}_{s}=0.9,{\sigma}^{2}=1,{\omega}_{e}=100$; (**b**) the context-aware weight is i.i.d. over time, $\rho =0.05,{p}_{l}=0.999,{p}_{h}=0.001,{p}_{s}=0.9,{\sigma}^{2}=1,{\omega}_{e}=100$; (**c**) the context-aware weight is a first-order Markov process, $\rho =0.05,{p}_{1}=0.001,{p}_{2}=0.01,{p}_{s}=0.9,{\omega}_{e}=100$, and the increment in the estimation error during one slot $A\left(t\right)\sim \mathrm{Unif}(-3,3)$ for all t; and (**d**) the context-aware weight is a three-state first-order Markov process, taking values ${\omega}_{1}=1,{\omega}_{2}=50,{\omega}_{3}=100$ and evolving according to the state transition matrix ${P}_{3}$, with $\rho =0.05,{p}_{s}=0.9,{\sigma}^{2}=1$.

**Figure 4.**Average UoI of the UoI-optimal updating policy, the RL-based updating policy, the update-index-based adaptive scheme [27], and the AoI-optimal updating policy when ${p}_{1}=0.001,{p}_{2}=0.01,{p}_{s}=0.9,{\sigma}^{2}=1,{\omega}_{e}=100$.

**Figure 5.**Average squared estimation error of the UoI-optimal updating policy and the AoI-optimal updating policy when ${p}_{1}=0.001,{p}_{2}=0.01,{\sigma}^{2}=1,{\omega}_{e}=100$.

**Figure 6.**Update thresholds of the UoI-optimal updating policy with different values of ${\omega}_{e}$ when ${p}_{1}=0.001,{p}_{2}=0.01,{p}_{s}=0.9,{\sigma}^{2}=1$.

**Figure 7.**Update thresholds of the UoI-optimal updating policy with different values of ${p}_{2}$ when ${p}_{1}=0.01,{p}_{s}=0.9,{\sigma}^{2}=1,{\omega}_{e}=100$.

**Figure 8.**Average UoI of the RL-based updating policy with different values of L when ${p}_{1}=0.001,{p}_{2}=0.01,{\sigma}^{2}=1,{\omega}_{e}=100$.

**Figure 9.**Average UoI of the RL-based updating policy with different values of K when ${p}_{1}=0.001,{p}_{2}=0.01,{\sigma}^{2}=1,{\omega}_{e}=100$.


© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Wang, L.; Sun, J.; Sun, Y.; Zhou, S.; Niu, Z.
A UoI-Optimal Policy for Timely Status Updates with Resource Constraint. *Entropy* **2021**, *23*, 1084.
https://doi.org/10.3390/e23081084
