Tracking an Auto-Regressive Process with Limited Communication per Unit Time

Samples from a high-dimensional first-order auto-regressive process generated by an independently and identically distributed random innovation sequence are observed by a sender which can communicate only finitely many bits per unit time to a receiver. The receiver seeks to form an estimate of the process value at every time instant in real-time. We consider a time-slotted communication model in a slow-sampling regime where multiple communication slots occur between two sampling instants. We propose a successive update scheme which uses communication between sampling instants to refine estimates of the latest sample and study the following question: Is it better to collect communication of multiple slots to send better refined estimates, making the receiver wait more for every refinement, or to be fast but loose and send new information in every communication opportunity? We show that the fast but loose successive update scheme with ideal spherical codes is universally optimal asymptotically for a large dimension. However, most practical quantization codes for fixed dimensions do not meet the ideal performance required for this optimality, and they typically will have a bias in the form of a fixed additive error. Interestingly, our analysis shows that the fast but loose scheme is not an optimal choice in the presence of such errors, and a judiciously chosen frequency of updates outperforms it.


I. INTRODUCTION
We consider the setting of real-time decision systems based on remotely sensed observations. In this setting, the decision maker needs to track the remote observations with high precision and in a timely manner. These are competing requirements, since high-precision tracking requires a larger number of bits to be communicated, resulting in a larger transmission delay and increased staleness of information. Towards this larger goal, we study the following problem.
Consider a discrete-time first-order auto-regressive (AR[1]) process $X_t \in \mathbb{R}^n$, $t \ge 0$. A sensor draws a sample from this process periodically, once every $s$ time slots. In each of these time slots, the sensor can send $nR$ bits to a center. The center seeks to form an estimate $\hat{X}_t$ of $X_t$ at time $t$ with small mean square error (MSE). Specifically, we are interested in minimizing the time-averaged error $\frac{1}{T}\sum_{t=1}^{T} E\|X_t - \hat{X}_t\|_2^2$ to enable timely and accurate tracking of $X_t$. We propose and study a successive update scheme where the encoder computes the error in the estimate of the latest sample at the decoder and sends its quantized value to the decoder. The decoder adds this value to its previous estimate to update the estimate of the latest sample, and uses it to estimate the current value using a linear predictor. We instantiate this scheme with a general gain-shape quantizer for error quantization.
Note that we can send this update several times between two sampling instants. In particular, our interest is in comparing a fast but loose scheme, where an update is sent in every slot, with a slower scheme that sends an update every $p$ communication slots. The latter allows the encoder to use more bits for each update, but the decoder needs to wait longer. We analyze this scheme in a universal setting and show that the fast but loose successive update scheme, used with an appropriately selected quantizer, is asymptotically optimal.
To show this optimality, we use a random construction for the quantizer, based on the spherical code given in [1], [2]. Roughly speaking, this ideal quantizer $Q$ yields an error $E\|y - Q(y)\|_2^2$ of about $\|y\|_2^2\, 2^{-2R}$. Practical quantizers at a fixed dimension typically incur an additional fixed additive error, and we present our analysis for such general quantizers. Interestingly, for such a quantizer (which is all we have at a finite $n$), the optimal choice of $p$ can differ from 1. Our analysis provides a theoretically sound guideline for choosing the frequency of updates $1/p$ for practical quantizers.
Our work relates to a large body of literature ranging from real-time compression to control and estimation over networks. The structure of real-time encoders for source coding has been studied in [3]-[9]. The general structure of real-time encoders for Markov sources is studied for communication over error-free channels in [3] and over noisy channels in [4], [5]. A similar structural result for the optimal encoders and decoders which are restricted to be causal is given in [6]. Furthermore, structural results in the context of optimal zero-delay coding of correlated sources are available in [7]-[9]. The setup in all these works is different from the problem we consider, and the results do not extend to our problem.

II. PROBLEM FORMULATION
We begin by providing a formal description of our problem; different components of the model are presented in separate subsections. Throughout the remainder of this paper, the set of real numbers is denoted by $\mathbb{R}$, the $n$-dimensional Euclidean space by $\mathbb{R}^n$ and the associated Euclidean norm by $\|\cdot\|_2$, the set of positive integers by $\mathbb{N}$, the set of non-negative integers by $\mathbb{Z}_+$, the set of consecutive positive integers up to $m$ by $[m] \triangleq \{1, \ldots, m\}$, and the identity matrix of size $n \times n$ by $I_n$.

A. Observed random process and its sampling
For $\alpha \in (0, 1)$, we consider a discrete-time auto-regressive process of order 1 (AR[1] process) in $\mathbb{R}^n$,
$$X_t = \alpha X_{t-1} + \xi_t, \quad t \ge 1, \qquad (1)$$
where $(\xi_t \in \mathbb{R}^n, t \ge 1)$ is an independent and identically distributed (i.i.d.) random sequence with zero mean and covariance matrix $\sigma^2(1 - \alpha^2) I_n$. For simplicity, we assume that $X_0 \in \mathbb{R}^n$ is a zero-mean random variable with covariance matrix $\sigma^2 I_n$. This implies that the covariance matrix of $X_t \in \mathbb{R}^n$ is $\sigma^2 I_n$ for all $t \ge 0$. In addition, we assume that $\|X_t\|_2$ has a bounded fourth moment at all times $t \ge 0$. Specifically, let $\kappa > 0$ satisfy
$$\sup_{t \ge 0} \frac{1}{n}\sqrt{E\|X_t\|_2^4} \le \kappa.$$
It is clear that $X = (X_t \in \mathbb{R}^n, t \ge 0)$ is a Markov process. We denote the set of processes $X$ satisfying the assumptions above by $\mathcal{X}_n$ and the class of all such processes for different choices of dimension $n$ by $\mathcal{X}$. This discrete-time process is sub-sampled periodically at sampling frequency $1/s$, for some $s \in \mathbb{N}$, to obtain samples $(X_{ks} \in \mathbb{R}^n, k \ge 0)$.
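The process model and its sub-sampling can be sketched numerically as follows. This is a minimal illustration, not part of the original analysis: the Gaussian innovations, the function name `simulate_ar1`, and all parameter values are assumptions made for the example (the model only requires i.i.d. zero-mean innovations with covariance $\sigma^2(1-\alpha^2)I_n$).

```python
import numpy as np

def simulate_ar1(n, T, alpha, sigma2, rng):
    """Return an array of shape (T + 1, n) holding X_0, ..., X_T."""
    x = np.empty((T + 1, n))
    # X_0 has zero mean and covariance sigma2 * I_n.
    x[0] = rng.normal(0.0, np.sqrt(sigma2), size=n)
    # Innovation variance sigma2 * (1 - alpha^2) keeps the variance at sigma2.
    innov_std = np.sqrt(sigma2 * (1.0 - alpha ** 2))
    for t in range(1, T + 1):
        x[t] = alpha * x[t - 1] + rng.normal(0.0, innov_std, size=n)
    return x

rng = np.random.default_rng(0)
x = simulate_ar1(n=2000, T=50, alpha=0.8, sigma2=2.0, rng=rng)
per_time_var = (x ** 2).mean(axis=1)  # empirical per-dimension second moment at each t
samples = x[::5]                      # sub-sampling with period s = 5 gives X_0, X_5, ...
```

With a large dimension, `per_time_var` should stay close to `sigma2` at every time step, which is the stationarity property stated above.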

B. Encoder description
The sampled process $(X_{ks}, k \ge 0)$ is passed to an encoder which converts it to a bit stream. The encoder operates in real time and sends $nRs$ bits between any two sampling instants. Specifically, the encoder is given by a sequence of mappings $(\phi_t)_{t \ge 0}$, where the mapping at any discrete time $t = ks$ is denoted by $\phi_t : \mathbb{R}^{n(k+1)} \to \{0, 1\}^{nRs}$.
The encoder output at time $t = ks$ is denoted by the codeword $C_t \triangleq \phi_t(X_0, X_s, \ldots, X_{ks})$. We represent this codeword by an $s$-length sequence of binary strings $C_t = (C_{t,0}, \ldots, C_{t,s-1})$, where each term $C_{t,i}$ takes values in $\{0, 1\}^{nR}$. For $t = ks$ and $0 \le i \le s - 1$, we can view the binary string $C_{t,i}$ as the communication sent at time $t + i$. We elaborate on the communication channel next.

C. Communication channel
The output bit stream of the encoder is sent to the receiver via an error-free communication channel. Specifically, we assume slotted transmission with synchronization, where in each slot the transmitter sends $nR$ bits of communication error-free. That is, we are allowed to send $R$ bits per dimension, per time slot. Note that there is a delay of 1 time unit (corresponding to one slot) in the transmission of each block of $nR$ bits. Therefore, the vector $C_{ks,i}$ of $nR$ bits transmitted at time $ks + i$ is received at time instant $ks + i + 1$ for $0 \le i \le s - 1$. Throughout, we use the notation $I_k \triangleq \{ks, \ldots, (k + 1)s - 1\}$ and $\tilde{I}_k = I_k + 1 = \{ks + 1, \ldots, (k + 1)s\}$, respectively, for the sets of transmit and receive times for the strings $C_{ks,i}$, $0 \le i \le s - 1$.
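The transmit and receive index sets $I_k$ and $\tilde{I}_k$ can be written out as a small sketch. The helper names `transmit_slots` and `receive_slots` are hypothetical and introduced only for illustration.

```python
def transmit_slots(k, s):
    """I_k = {ks, ..., (k+1)s - 1}: slots in which C_{ks,0}, ..., C_{ks,s-1} are sent."""
    return list(range(k * s, (k + 1) * s))

def receive_slots(k, s):
    """I~_k = I_k + 1: each string arrives one slot after it is sent."""
    return [t + 1 for t in transmit_slots(k, s)]

# Example with sampling period s = 3 and sample index k = 2.
sent = transmit_slots(2, 3)      # [6, 7, 8]
received = receive_slots(2, 3)   # [7, 8, 9]
```

The one-slot shift is what makes the last string of a codeword arrive exactly at the next sampling instant $(k+1)s$.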

D. Decoder description
We describe the operation of the receiver at time $t \in I_k$, for some $k \in \mathbb{N}$, such that $i = t - ks \in \{0, \ldots, s - 1\}$. Upon receiving the codewords $C_s, C_{2s}, \ldots, C_{(k-1)s}$ and the partial codeword $(C_{ks,0}, \ldots, C_{ks,i-1})$ at time $t = ks + i$, the decoder estimates the current state $X_t$ of the process using the estimator mapping $\psi_t : \{0, 1\}^{nRt} \to \mathbb{R}^n$. We denote the overall communication received by the decoder until time instant $t$ by $C^{t-1}$. Further, we denote by $\hat{X}_{t|t}$ the real-time causal estimate $\psi_t(C^{t-1})$ of $X_t$ formed at the decoder at time $t$. Thus, the overall real-time causal estimation scheme is described by the mappings $(\phi_t, \psi_t)_{t \ge 0}$. It is important to note that the communication available to the decoder at time $t \in I_k$ can only depend on samples $X_\ell$ up to time $\ell \le ks$. As a convention, we assume that $\hat{X}_{0|0} = 0$.

E. Performance metrics
We call the encoder-decoder mapping sequence $(\phi, \psi) = (\phi_t, \psi_t)_{t \ge 0}$ a tracking code of rate $R$ and sampling period $s$. The tracking error of our tracking code at time $t$ for process $X$ is measured by the mean squared error (MSE) per dimension, given by
$$D_t(\phi, \psi, X) \triangleq \frac{1}{n} E\|X_t - \hat{X}_{t|t}\|_2^2.$$
Our goal is to design $(\phi, \psi)$ with low average tracking error $D_T(\phi, \psi, X)$, given by
$$D_T(\phi, \psi, X) \triangleq \frac{1}{T}\sum_{t=0}^{T-1} D_t(\phi, \psi, X).$$
For technical reasons, we restrict to a finite time horizon setting. For the most part, the time horizon $T$ will remain fixed and will be omitted from the notation. Instead of working with the mean square error, a more convenient parameterization for us will be that of accuracy, given by
$$\delta^T(\phi, \psi, X) \triangleq 1 - \frac{D_T(\phi, \psi, X)}{\sigma^2}.$$
Definition 1 (Maxmin tracking accuracy). The worst-case tracking accuracy for $\mathcal{X}_n$ attained by a tracking code $(\phi, \psi)$ is given by
$$\delta^T(\phi, \psi, \mathcal{X}_n) \triangleq \inf_{X \in \mathcal{X}_n} \delta^T(\phi, \psi, X).$$
The maxmin tracking accuracy for $\mathcal{X}_n$ at rate $R$ and sampling period $s$ is given by
$$\delta_n^T(R, s, \mathcal{X}_n) \triangleq \sup_{(\phi, \psi)} \delta^T(\phi, \psi, \mathcal{X}_n),$$
where the supremum is over all tracking codes $(\phi, \psi)$.
The maxmin tracking accuracy $\delta_n^T(R, s, \mathcal{X}_n)$ is the fundamental quantity of interest for us. Recall that $n$ denotes the dimension of the observations in $X_t$ for $X \in \mathcal{X}_n$ and $T$ the time horizon. However, we will only characterize $\delta_n^T(R, s, \mathcal{X}_n)$ asymptotically in $n$ and $T$. Specifically, we define the asymptotic maxmin tracking accuracy as
$$\delta^*(R, s, \mathcal{X}) \triangleq \lim_{T \to \infty} \lim_{n \to \infty} \delta_n^T(R, s, \mathcal{X}_n).$$
We will provide a characterization of $\delta^*(R, s, \mathcal{X})$ and present a sequence of tracking codes that attains it. In fact, the tracking code we use is an instantiation of our successive update scheme, which we describe in the next section. It is important to note that our results may not hold if we switch the order of the limits above: we need very large codeword lengths depending on a fixed finite time horizon $T$.

III. THE SUCCESSIVE UPDATE SCHEME
In this section, we present our main contribution in this paper, namely the Successive Update tracking code. Before we describe the scheme completely, we present its different components. In every communication slot, the transmitter gets an opportunity to send nR bits. The transmitter may use it to send any information about a previously seen sample. There are various options for the encoder. For instance, it may use the current slot to send some information about a sample it had seen earlier. Or it may use all the slots between two sampling instants to send a quantized version of the latest sample. Interestingly, it will be seen (quite straightforwardly) that there are not so many options for the decoder; it gets roughly fixed once the encoder is chosen.

A. Decoder structure
Once the quantized information is sent by the transmitter, the decoder at the receiver end estimates the state $X_t$ using the codewords received until time $t$. Since we are interested in forming estimates with small MSE, the decoder simply forms the minimum mean square error (MMSE) estimate using all the observations till that point. Specifically, for $t \ge u$, denoting by $\tilde{X}_{u|t}$ the MMSE estimate of $X_u$ formed from the communication $C^{t-1}$ received before time $t$, we know (cf. [44])
$$\tilde{X}_{u|t} = E[X_u \mid C^{t-1}].$$
The following result presents a simple structure for $\tilde{X}_{u|t}$ for our AR[1] model.
Lemma 1 (MMSE Structure). The MMSE estimates $\tilde{X}_{t|t}$ and $\tilde{X}_{t-i|t}$, respectively, of samples $X_t$ and $X_{t-i}$ at any time $t \in I_k$ and $i = t - ks$ using communication $C^{t-1}$ are related as
$$\tilde{X}_{t|t} = \alpha^i \tilde{X}_{t-i|t} = \alpha^i E[X_{ks} \mid C^{t-1}].$$
Proof: Recalling the notation $I_k$, we can represent $t \in I_k$ as $t = ks + i$ for $0 \le i \le s - 1$. From the evolution of the AR[1] process, for $1 \le i \le s - 1$, the sample $X_{ks+i}$ can be expressed in terms of the previous sample $X_{ks}$ as
$$X_{ks+i} = \alpha^i X_{ks} + \sum_{j=1}^{i} \alpha^{i-j}\xi_{ks+j}, \qquad (2)$$
where the innovation sequence $(\xi_{ks+j} : j \ge 1)$ is independent of the process samples $(X_0, \ldots, X_{ks})$. By our specification, the historical observations $C^{t-1}$ at the receiver depend only on the process evolution until time $ks$; namely, $C^{t-1}$ is independent of $(\xi_{ks+j} : j \ge 1)$ conditioned on $(X_0, \ldots, X_{ks})$. In particular, $E[\xi_{ks+j} \mid C^{t-1}] = 0$ for every $j \ge 1$. Thus, taking conditional expectation on both sides of (2), we get
$$E[X_{ks+i} \mid C^{t-1}] = \alpha^i E[X_{ks} \mid C^{t-1}].$$
Therefore, the optimal strategy for the decoder is to use the communication sent to form an estimate for the latest sample and then scale it to form the estimate of the state at the current time instant.
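As a sanity check on the scaling rule of Lemma 1, the sketch below looks at the simplest case in which the decoder knows $X_{ks}$ exactly. Under the model assumptions, predicting $X_{ks+i}$ by $\alpha^i X_{ks}$ then leaves only the accumulated innovation noise, whose per-dimension variance is $\sigma^2(1-\alpha^{2i})$. The Gaussian innovations and the parameter values are assumptions made for the example.

```python
import numpy as np

def prediction_mse(alpha, sigma2, i):
    """Per-dimension MSE of predicting X_{ks+i} by alpha**i * X_ks when X_ks is known."""
    return sigma2 * (1.0 - alpha ** (2 * i))

rng = np.random.default_rng(1)
alpha, sigma2, i, n = 0.9, 1.0, 3, 200_000

# Draw X_ks and run the AR[1] recursion forward for i steps.
x_ks = rng.normal(0.0, np.sqrt(sigma2), size=n)
x = x_ks.copy()
for _ in range(i):
    x = alpha * x + rng.normal(0.0, np.sqrt(sigma2 * (1 - alpha ** 2)), size=n)

# Empirical error of the scaled predictor, to be compared with the closed form.
empirical = np.mean((x - alpha ** i * x_ks) ** 2)
closed_form = prediction_mse(alpha, sigma2, i)
```

The empirical and closed-form values should agree up to Monte Carlo fluctuation, which illustrates why staleness (a larger $i$) directly costs accuracy.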

B. Encoder structure: Refining the error successively
The structure of the decoder exposed in Lemma 1 gives an important insight for encoder design: the communication sent between two sampling instants is used only to form estimates of the latest sample. In particular, the communication $C_{ks+1,i}$ transmitted at time $t = ks + i$ must be chosen to refine the previous estimate from $E[X_{ks} \mid C_0, \ldots, C_{ks}, C_{ks+1,0}, \ldots, C_{ks+1,i-1}]$ to $E[X_{ks} \mid C_0, \ldots, C_{ks}, C_{ks+1,0}, \ldots, C_{ks+1,i-1}, C_{ks+1,i}]$. This principle can be applied (as a heuristic) for any other form of the estimate as follows. Let $\hat{X}_{ks|t}$ denote the estimate for $X_{ks}$ formed at the receiver at time $t$ (which need not be the MMSE estimate $\tilde{X}_{ks|t}$). Our encoder computes the error in the receiver estimate of the last process sample at each time instant $t$. Denoting the error at time $t \in I_k$ by $Y_t \triangleq X_{ks} - \hat{X}_{ks|t}$, the encoder quantizes this error $Y_t$ and sends it as communication $C_{ks+1,i}$.
Simply speaking, our encoder computes and quantizes the error in the current estimate of the last sample at the decoder, and sends it to the decoder to enable the refinement of the estimate in the next time slot. While we have not been able to establish optimality of this encoder structure, our results will show its optimality asymptotically, in the limit as the dimension $n$ goes to infinity.
Even within this structural simplification, a very interesting question remains. Since the process is sampled once in $s$ time slots, we have, potentially, $nRs$ bits to encode the latest sample. At any time $t \in \tilde{I}_k$, the receiver has access to $(C_0, \ldots, C_{(k-1)s})$ and the partial codewords $(C_{ks,0}, \ldots, C_{ks,i-1})$ for $i = t - ks$. A simple approach is for the encoder to use the complete codeword to express the latest sample, while the decoder ignores the partial codewords. This approach will result in slow but very accurate updates of the sample estimates. An alternative fast but loose approach will send $nR$-bit quantizer codewords to refine estimates in every communication slot. Should we prefer fast but loose estimates or slow but accurate ones? Our results will shed light on this conundrum.

C. The choice of quantizers
In our description of the encoder structure above, we did not specify a key design choice, namely the choice of the quantizer. We will restrict to using the same quantizer to quantize the error in each round of communication. The precision of this quantizer will depend on whether we choose a fast but loose paradigm or a slow but accurate one. However, the overall structure will remain the same. Roughly speaking, we allow any gain-shape [45] quantizer which separately sends the quantized value of the gain $\|y\|_2$ and the shape $y/\|y\|_2$ for input $y$. Formally, we use the following abstraction.
The expectation in the previous definition is taken with respect to the randomness in the quantizer, which is assumed to be shared between the encoder and the decoder for simplicity. The parameter $M$, termed the dynamic range of the quantizer, specifies the domain of the quantizer. When the input $y$ does not satisfy $\|y\|_2 \le \sqrt{n} M$, the quantizer simply declares a failure, which we denote by $\perp$. Our tracking code may use any such $(\theta, \varepsilon)$-quantizer family. It is typical in any construction of a gain-shape quantizer to have a finite $M$ and $\varepsilon > 0$. Our analysis for finite $n$ will apply to any such $(\theta, \varepsilon)$-quantizer family and, in particular, will bring out the role of the "bias" $\varepsilon$. However, when establishing our optimality result, we instantiate it using a random spherical code to get the desired performance.

D. Description of the successive update scheme
All the conceptual components of our scheme are ready. We use the structure of Lemma 1 and focus only on updating the estimates of the latest observed sample X ks at the decoder. Our encoder successively updates the estimate of the latest sample at the decoder by quantizing and sending estimates for errors Y t .
As discussed earlier, we must decide if we prefer a fast but loose approach or a slow but accurate approach for sending error estimates. To carefully examine this tradeoff, we opt for a more general scheme where the $nRs$ bits available between two samples are divided into $m = s/p$ sub-fragments of length $nRp$ bits each. We use an $nRp$-bit quantizer to refine error estimates for the latest sample $X_{ks}$ (obtained at time $t = ks$) every $p$ slots, and send the resulting quantizer codewords as partial tracking codewords $(C_{ks,jp}, \ldots, C_{ks,(j+1)p-1})$, $0 \le j \le m - 1$. Specifically, the $k$th codeword transmission interval $I_k$ is divided into $m$ sub-fragments $I_{k,j}$, $0 \le j \le m - 1$, given by
$$I_{k,j} \triangleq \{ks + jp, \ldots, ks + (j+1)p - 1\},$$
and $(C_{ks,jp}, \ldots, C_{ks,(j+1)p-1})$ is transmitted in the communication slots in $I_{k,j}$.
By time instant $t = ks + (j+1)p$, the decoder has received the $j$th sub-fragment $(C_{ks,t-ks}, t \in I_{k,j})$ of $nRp$ bits, and uses it to refine the estimate of the latest source sample $X_{ks}$. Note that the fast but loose and the slow but accurate regimes described above correspond to $p = 1$ and $p = s$, respectively. In the middle of the interval $I_{k,j}$, the decoder ignores the partially received quantization code and retains the estimate $\hat{X}_{ks}$ of $X_{ks}$ formed at time $ks + jp$. It forms an estimate of the current state $X_{ks+i}$ by simply scaling $\hat{X}_{ks}$ by a factor of $\alpha^i$, as suggested by Lemma 1.
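The indexing of time into sample index $k$, sub-fragment index $j$, and offset $i$ can be made concrete with a short sketch. It assumes the zero-based convention $t = ks + jp + i$ with $0 \le j \le m-1$ and $0 \le i \le p-1$; the function names are hypothetical.

```python
def locate(t, s, p):
    """Split a time index t = k*s + j*p + i into (k, j, i); p must divide s."""
    assert s % p == 0, "p must divide the sampling period s"
    k, r = divmod(t, s)
    j, i = divmod(r, p)
    return k, j, i

def sub_fragment(k, j, s, p):
    """Slots {ks + jp, ..., ks + (j+1)p - 1} carrying the j-th nRp-bit update."""
    start = k * s + j * p
    return list(range(start, start + p))

# With s = 6 and p = 2 there are m = 3 updates per sample.
k, j, i = locate(11, s=6, p=2)         # t = 11 = 1*6 + 2*2 + 1
slots = sub_fragment(1, 2, s=6, p=2)   # last update for sample X_6
```

Setting `p = 1` gives one update per slot (fast but loose), while `p = s` gives a single update per sample (slow but accurate).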
Finally, we impose one more additional simplification on the decoder structure. Instead of using MMSE estimates for the latest sample, we simply update the estimate by adding to it the quantized value of the error. Thus, the decoder has a simple linear structure.
We can use any $nRp$-bit quantizer$^2$ $Q_p$ for the $n$-dimensional error vector, whereby this scheme can be easily implemented in practice if $Q_p$ can be implemented. For instance, we can use any standard gain-shape quantizer. The performance of most quantizers can be analyzed explicitly to render them a $(\theta, \varepsilon)$-quantizer family for an appropriate $M$ and function $\theta$. Later, when analyzing the scheme, we will consider a $Q_p$ coming from a $(\theta, \varepsilon)$-quantizer family and present a theoretically sound guideline for choosing $p$.
Recall that we denote the estimate of $X_u$ formed at the decoder at time $t \ge u$ by $\hat{X}_{u|t}$. We start by initializing $\hat{X}_{0|0} = 0$ and then proceed using the encoder and the decoder algorithms outlined above. Note that our quantizer $Q_p$ may declare the failure symbol $\perp$, in which case the decoder must still yield a nominal estimate. We will simply declare the estimate as$^3$ 0 once a failure happens.
We give a formal description of our encoder and decoder algorithms below.

The encoder.
1) Initialize $k = 0$, $j = 0$, and $\hat{X}_{0|0} = 0$.
2) At time $t = ks + jp$, use the decoder algorithm (described below) to form the estimate $\hat{X}_{ks|t}$ and compute the error
$$Y_{k,j} \triangleq X_{ks} - \hat{X}_{ks|t}, \qquad (3)$$
where we use the latest sample $X_{ks}$ available at time $t = ks + jp$.
3) Quantize $Y_{k,j}$ to $nRp$ bits as $Q_p(Y_{k,j})$.
4) If a quantizer failure occurs, i.e., $Q_p(Y_{k,j}) = \perp$, send $\perp$ to the receiver and terminate the encoder.
5) Else, send a binary representation of $Q_p(Y_{k,j})$ as the communication $(C_{ks,jp}, \ldots, C_{ks,(j+1)p-1})$ to the receiver over the next $p$ communication slots$^4$.
6) If $j < m - 1$, increase $j$ by 1; else set $j = 0$ and increase $k$ by 1. Go to Step 2.

The decoder.
1) Initialize $k = 0$, $j = 0$, and $\hat{X}_{0|0} = 0$.
2) At time $t = ks + jp$, if encoding failure has not occurred until time $t$, compute
$$\hat{X}_{ks|ks+jp} = \hat{X}_{ks|ks+(j-1)p} + Q_p(Y_{k,j-1})$$
and output $\hat{X}_{t|t} = \alpha^{t-ks}\hat{X}_{ks|t}$.
3) Else, if encoding failure has occurred and the $\perp$ symbol is received, declare $\hat{X}_{s|t} = 0$ for all subsequent time instants $s \ge t$.
4) At time $t = ks + jp + i$, for $i \in [p - 1]$, output$^5$ $\hat{X}_{t|t} = \alpha^{t-ks}\hat{X}_{ks|ks+jp}$.
5) If $j < m - 1$, increase $j$ by 1; else set $j = 0$ and increase $k$ by 1. Go to Step 2.
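The encoder and decoder loops above can be sketched in a few lines of simulation code. This is an illustrative sketch under stated assumptions, not the scheme's reference implementation: `toy_quantizer` is a hypothetical coordinate-wise uniform quantizer standing in for $Q_p$, the innovations are Gaussian, and at each new sampling instant the estimate of the previous sample is carried over to the new sample by scaling with $\alpha^s$, which is our reading of how the decoder is initialized for a new sample.

```python
import numpy as np

def toy_quantizer(y, bits_per_dim, M):
    """Coordinate-wise uniform quantizer; returns None (failure) outside the dynamic range."""
    n = y.size
    half = M * np.sqrt(n)                      # per-coordinate range [-half, half]
    if np.linalg.norm(y) > half:
        return None                            # plays the role of the failure symbol
    levels = 2 ** bits_per_dim
    step = 2.0 * half / levels
    idx = np.clip(np.floor((y + half) / step), 0, levels - 1)
    return -half + (idx + 0.5) * step          # reconstruct at cell midpoints

def simulate_p_su(x, s, p, R, M, alpha):
    """Average per-dimension MSE of the p-SU loop on a sample path x of shape (T, n)."""
    assert s % p == 0, "p must divide s"
    T, n = x.shape
    est = np.zeros(n)      # decoder's estimate of the latest sample X_ks
    pending = None         # quantized error still in transit
    failed = False
    total = 0.0
    for t in range(T):
        k, r = divmod(t, s)
        if r % p == 0 and not failed:          # t = ks + jp: an update boundary
            if pending is not None:
                est = est + pending            # update sent over the last p slots has arrived
            if r == 0 and t > 0:
                est = alpha ** s * est         # assumed hand-over to the new sample X_ks
            q = toy_quantizer(x[k * s] - est, R * p, M)
            if q is None:
                failed, pending = True, None   # encoder terminates on failure
            else:
                pending = q
        x_hat = np.zeros(n) if failed else alpha ** (t - k * s) * est
        total += np.mean((x[t] - x_hat) ** 2)
    return total / T

def ar1_path(n, T, alpha, sigma2, rng):
    """Gaussian AR[1] sample path X_0, ..., X_{T-1} (Gaussianity is an assumption)."""
    x = np.empty((T, n))
    x[0] = rng.normal(0.0, np.sqrt(sigma2), n)
    for t in range(1, T):
        x[t] = alpha * x[t - 1] + rng.normal(0.0, np.sqrt(sigma2 * (1 - alpha ** 2)), n)
    return x

rng = np.random.default_rng(0)
x = ar1_path(n=4, T=400, alpha=0.9, sigma2=1.0, rng=rng)
mse_fast = simulate_p_su(x, s=4, p=1, R=8, M=3.0, alpha=0.9)   # fast but loose
mse_slow = simulate_p_su(x, s=4, p=4, R=8, M=3.0, alpha=0.9)   # slow but accurate
mse_fail = simulate_p_su(x, s=4, p=1, R=8, M=0.0, alpha=0.9)   # immediate failure
```

With a fine quantizer, as in this example, the fast but loose setting should give the smaller error because its estimates are less stale. With `M = 0.0` the quantizer fails at the first step and the decoder outputs 0 throughout, so the error equals the empirical second moment of the path.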

IV. MAIN RESULTS
We present results in two categories. First, we provide an explicit formula for the asymptotic maxmin tracking accuracy δ * (R, s, X). Next, we present a theoretically-founded guideline for selecting a good p for the successive update scheme with a (θ, ε)-quantizer family. Interestingly, the optimal choice may differ from the asymptotically optimal choice of p = 1.
2 With an abuse of notation, we will use $Q_p$ instead of $Q_{Rp}$ to denote an $nRp$-bit quantizer.
3 In the analysis, we account for all these events as error. Only the probability of failure will determine the contribution of this part to the MSE since the process is mean-square bounded.
4 For simplicity, we do not account for the extra message symbol needed for sending $\perp$.
5 We ignore the partial quantizer codewords received as $(C_{ks,jp+1}, C_{ks,jp+2}, \ldots, C_{ks,jp+i-1})$ till time $t$.
Note that g(s) is a decreasing function of s with g(1) = 1. The result below shows that, for an appropriate choice of the quantizer, our successive update scheme with p = 1 (the fast but loose version) achieves an accuracy of δ 0 (R)g(s) asymptotically, universally for all processes in X.
Furthermore, this bound can be obtained by a successive update scheme with p = 1 and appropriately chosen quantizer Q p .
We provide a proof in Section VI. Note that while we assume that the per dimension fourth moment of the processes in X is bounded, the asymptotic result above does not depend on that bound. Interestingly, the performance characterized above is the best possible.

Furthermore, the upper bound is obtained by considering a Gauss-Markov process.
We provide a proof in Section VII. Thus, δ * (R, s, X) = δ 0 (R)g(s) with the fast but loose successive update scheme being universally (asymptotically) optimal and the Gauss-Markov process being the most difficult process to track. Clearly, the best possible choice of sampling period is s = 1 and the highest possible accuracy at rate R is δ 0 (R), whereby we cannot hope for an accuracy exceeding δ 0 (R).
Alternatively, the results above can be interpreted as saying that we cannot subsample at a frequency less than $1/\lfloor g^{-1}(\delta/\delta_0(R)) \rfloor$ for attaining a tracking accuracy $\delta \le \delta_0(R)$.

B. Guidelines for choosing a good p
The proof of Theorem 2 entails the analysis of the successive update scheme for p = 1. In fact, we can analyze this scheme for any p ∈ N and for any (θ, ε)-quantizer family; we term this tracking code the p-successive update (p-SU) scheme. This analysis can provide a simple guideline for the optimal choice of p depending on the performance of the quantizer.
However, there are some technical caveats. A quantizer family will operate only as long as the input $y$ satisfies $\|y\|_2 \le \sqrt{n} M$. If a $y$ outside this range is observed, the quantizer will declare $\perp$ and the tracking code encoder, in turn, will declare a failure. We denote by $\tau$ the stopping time at which encoder failure occurs for the first time, i.e.,
$$\tau \triangleq \min\{ks + jp : Q_p(Y_{k,j}) = \perp,\ k \ge 0,\ 0 \le j \le m - 1\}.$$
Further, denote by $A_t$ the event that failure does not occur until time $t$, i.e., $A_t \triangleq \{\tau > t\}$. We characterize the performance of a p-SU scheme in terms of the probability of encoder failure in a finite time horizon $T$.
Theorem 4 (Performance of p-SU). For fixed θ, ε, β ∈ [0, 1], consider the p-SU scheme with an nRp bit (θ, ε)-quantizer Q p , and denote the corresponding tracking code by (φ p , ψ p ). Suppose that for a time horizon T ∈ N, the tracking code (φ p , ψ p ) satisfies P (τ ≤ T ) ≤ β. Then, sup where B T (θ, ε, β) satisfies We remark that β can be made small by choosing M to be large for a quantizer family. Furthermore, the inequality in the upper bound for the MSE in the previous result (barring the dependence on β) comes from the inequality in the definition of a (θ, ε)-quantizer, rendering it a good proxy for the performance of the quantizer. The interesting regime is that of very small β where the encoder failure doesn't occur during the time horizon of operation. If we ignore the dependence on β, the accuracy of the p-SU does not depend either on s or on the bound for the fourth moment κ. Motivated by these insights, we define the accuracy-speed curve of a quantizer family as follows.
Definition 3 (The accuracy-speed curve). For α ∈ [0, 1], σ 2 , and R > 0, the accuracy-speed curve for a (θ, ε)-quantizer family Q is given by By Theorem 4, it is easy to see that the accuracy (precisely the upper bound on the accuracy) of a p-SU scheme is better when Γ Q (p) is larger. Thus, a good choice of p for a given quantizer family Q is the one that maximizes Γ Q (p) for 1 ≤ p ≤ s.
We conclude by providing accuracy-speed curves for some illustrative examples. To build some heuristics, note that a uniform quantization of $[-M, M]$ has $\theta(R) = 0$ and $\varepsilon = M 2^{-R}$. For a gain-shape quantizer, we express a vector as $y = \|y\|_2\, y_s$, where the shape vector $y_s$ has $\|y_s\|_2 = 1$. An ideal shape quantizer (which can only be shown to exist asymptotically) using $R$ bits per dimension will satisfy $E\|\hat{y}_s - y_s\|_2^2 \le 2^{-2R}$, similar to the scalar uniform quantizer. In one of the examples below, we consider gain-shape quantizers with such an ideal shape quantizer.
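The heuristic $\varepsilon = M 2^{-R}$ for a uniform quantizer of $[-M, M]$ can be checked directly. The sketch below assumes midpoint reconstruction with $2^R$ equal cells; the function name and parameter values are illustrative.

```python
import numpy as np

def uniform_scalar_quantize(y, R, M):
    """Quantize values in [-M, M] with 2**R equal cells, reconstructing at cell midpoints."""
    levels = 2 ** R
    step = 2.0 * M / levels
    idx = np.clip(np.floor((y + M) / step), 0, levels - 1)
    return -M + (idx + 0.5) * step

M, R = 2.0, 3
ys = np.linspace(-M, M, 10001)
max_err = np.max(np.abs(ys - uniform_scalar_quantize(ys, R, M)))
bound = M * 2.0 ** (-R)   # half the cell width, i.e. epsilon = M * 2^{-R}
```

The error never exceeds half a cell width regardless of the input magnitude, which is why this quantizer contributes a purely additive term ($\theta = 0$).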
Example 1. We begin by considering an ideal quantizer family with θ(R) = 2 −2R and ε = 0. In our asymptotic analysis, we will show roughly that such a quantizer with very small ε exists. For this ideal case, for R > 0, the accuracy-speed curve is given by It can be seen that Γ Q (p) is decreasing in p whereby the optimal choice of p that maximized Γ Q (p) over p ∈ [s] is p = 1. Heuristically, this justifies why asymptotically the fast but loose successive update scheme is optimal.
Example 2 (Uniform scalar quantization). In this example, we consider a coordinate-wise uniform quantizer. Since we seek quantizers for inputs $y \in \mathbb{R}^n$ such that $\|y\|_2 \le M\sqrt{n}$, we can only use a uniform quantizer of $[-M\sqrt{n}, M\sqrt{n}]$ for each coordinate. For this quantizer, we have $\theta = 0$ and $\varepsilon^2 = nM^2 2^{-2R}$, whereby the accuracy-speed curve is given by $\Gamma_Q(p) = \alpha^{2p}(1 - nM^2 2^{-2R}/\sigma^2)$. Thus, once again, the optimal choice of $p$ that maximizes accuracy is $p = 1$.
Example 3 (Gain-shape quantizer). Consider the quantization of a vector $y = a y_s$, where $a = \|y\|_2$. The vector $y$ is quantized by a gain-shape quantizer which quantizes the norm and shape of the vector separately to give $Q(y) = \hat{a}\hat{y}_s$. We use a uniform quantizer within a fixed range $[0, M\sqrt{n}]$ in order to quantize the norm $a$ to $\hat{a}$, while an ideal shape quantizer is employed in quantizing the shape vector $y_s$. Namely, we assume $E\|y_s - \hat{y}_s\|_2^2 \le 2^{-2R}$ and $\|\hat{y}_s\|_2 \le 1$. Suppose that we allot $\ell$ bits out of the total budget of $nR$ bits for norm quantization and the rest for shape quantization. Then, we see that whereby $\theta(R) = 2^{-2(R-\ell/n)+1}$ and $\varepsilon^2 = M^2 2^{-2\ell-1}$. Thus, the accuracy-speed curve is given by Note that the optimal choice of $p$ in this case depends on the choice of $M$.
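The selection rule "pick the $p$ that maximizes $\Gamma_Q(p)$" can be illustrated with the closed form stated in Example 2. The sketch below transcribes that expression as written, $\Gamma_Q(p) = \alpha^{2p}(1 - nM^2 2^{-2R}/\sigma^2)$; the function names and parameter values are assumptions chosen for the example.

```python
def gamma_uniform(p, alpha, sigma2, n, M, R):
    """Accuracy-speed curve of Example 2 for the coordinate-wise uniform quantizer."""
    return alpha ** (2 * p) * (1.0 - n * M ** 2 * 2.0 ** (-2 * R) / sigma2)

def best_p(s, **params):
    """Return the p in {1, ..., s} that maximizes the accuracy-speed curve."""
    return max(range(1, s + 1), key=lambda p: gamma_uniform(p, **params))

params = dict(alpha=0.9, sigma2=1.0, n=4, M=1.0, R=4)
curve = [gamma_uniform(p, **params) for p in range(1, 9)]
p_star = best_p(8, **params)
```

Because the bracketed factor does not depend on $p$ in this expression, the curve decays like $\alpha^{2p}$ whenever that factor is positive, so the maximizer is $p = 1$, in line with the conclusion of the example.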
We illustrated the application of our analysis for idealized quantizers, but it can be used to analyze even very practical quantizers, such as the recently proposed almost optimal quantizer in [46].

V. ANALYSIS OF THE SUCCESSIVE UPDATE SCHEME
From the discussion in Section III, we observe that the successive update scheme is designed to refine the estimate of $X_{ks}$ in each interval $\tilde{I}_k$. This fact helps us in establishing a recursive relation for $D_t(\phi_p, \psi_p, X)$, $t \in \tilde{I}_k$, in terms of $D_{ks}(\phi_p, \psi_p, X)$, which is provided next.

Lemma 5. For a time instant $t = ks + jp + i$ with $k \ge 0$, let $(\phi_p, \psi_p)$ denote the tracking code of a p-SU scheme employing an $nRp$-bit $(\theta, \varepsilon)$-quantizer. Assume that $P(A_t^c) \le \beta^2$. Then, we have
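Proof: From the evolution of the AR[1] process defined in (1), we see that $X_t = \alpha^{t-ks} X_{ks} + \sum_{u=ks+1}^{t} \alpha^{t-u}\xi_u$. Further, for the p-SU scheme, we know that $\hat{X}_{t|t} = \alpha^{t-ks}\hat{X}_{ks|ks+jp}$ at each instant $t = ks + jp + i$. Therefore, we have
$$X_t - \hat{X}_{t|t} = \alpha^{t-ks}\left(X_{ks} - \hat{X}_{ks|ks+jp}\right) + \sum_{u=ks+1}^{t} \alpha^{t-u}\xi_u.$$
Since the estimate $\hat{X}_{ks|ks+jp}$ is a function of the samples $(X_0, \ldots, X_{ks})$, and the sequence $(\xi_u, u > ks)$ is independent of the past, we obtain the per dimension MSE as
$$D_t(\phi_p, \psi_p, X) = \frac{\alpha^{2(t-ks)}}{n} E\|X_{ks} - \hat{X}_{ks|ks+jp}\|_2^2 + \sigma^2\left(1 - \alpha^{2(t-ks)}\right).$$
Further, we divide the error into two terms based on occurrence of the failure event as follows:
$$E\|X_{ks} - \hat{X}_{ks|ks+jp}\|_2^2 = E\left[\|X_{ks} - \hat{X}_{ks|ks+jp}\|_2^2 \mathbb{1}_{A_t}\right] + E\left[\|X_{ks} - \hat{X}_{ks|ks+jp}\|_2^2 \mathbb{1}_{A_t^c}\right]. \qquad (5)$$
Recall that at each instant $t = ks + jp$, we refine the estimate $\hat{X}_{ks|ks+(j-1)p}$ of $X_{ks}$ to $\hat{X}_{ks|ks+jp} = (\hat{X}_{ks|ks+(j-1)p} + Q_p(Y_{k,j-1}))\mathbb{1}_{A_t}$. Upon substituting this expression for $\hat{X}_{ks|ks+jp}$, we obtain where the identity uses the definition of the error $Y_{k,j-1}$ given in (3) and the inequality holds since $Q_p$ is a $(\theta, \varepsilon)$-quantizer.
Repeating the previous step recursively, we get which is the same as Moving to the error term $E[\|X_{ks} - \hat{X}_{ks|ks+jp}\|_2^2 \mathbb{1}_{A_t^c}]$ when encoder failure occurs, recall that the decoder sets the estimate to 0 in the event of an encoder failure. Thus, using the Cauchy-Schwarz inequality, we get
$$\frac{1}{n} E\left[\|X_{ks}\|_2^2 \mathbb{1}_{A_t^c}\right] \le \frac{1}{n}\sqrt{E\|X_{ks}\|_2^4\, P(A_t^c)} \le \kappa\beta.$$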
Substituting the two bounds above in (5), we get the result.
The following recursive bound can be obtained using almost the same proof as that of Lemma 5; we omit the details.
Lemma 6. Let $(\phi_p, \psi_p)$ denote the tracking code of a p-SU scheme employing an $nRp$-bit $(\theta, \varepsilon)$-quantizer. Assume that $P(A_t^c) \le \beta^2$. Then, we have
We also need the following technical observation.

Lemma 7. For a sequence $(X_k \in \mathbb{R} : k \in \mathbb{Z}_+)$ that satisfies the sequence of upper bounds
$$X_k \le a X_{k-1} + b, \quad k \ge 1,$$
with constants $a, b \in \mathbb{R}$ such that $b$ is finite and $a \in (-1, 1)$, we have
$$\limsup_{K \to \infty} \frac{1}{K}\sum_{k=0}^{K-1} X_k \le \frac{b}{1 - a}.$$
Proof: From the sequence of upper bounds, we can inductively show that
$$X_k \le a^k X_0 + b\,\frac{1 - a^k}{1 - a}.$$
Averaging $X_k$ over the horizon $\{0, \ldots, K - 1\}$, we get From the finiteness of $X_0$ and $b$ and the fact that $|a| < 1$, the result follows by taking the limit as $K$ grows arbitrarily large on both sides.
We are now in a position to prove Theorem 4.
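The limiting average in Lemma 7 can be checked numerically in the extremal case where the recursion holds with equality. This sketch assumes the recursion has the form $X_k = aX_{k-1} + b$ and that the limit in question is $b/(1-a)$, which is our reading of the lemma; the function name is hypothetical.

```python
def running_average(a, b, x0, K):
    """Average of X_0, ..., X_{K-1} when X_k = a * X_{k-1} + b holds with equality."""
    total, x = 0.0, x0
    for _ in range(K):
        total += x
        x = a * x + b
    return total / K

a, b, x0 = 0.5, 1.0, 10.0
avg = running_average(a, b, x0, K=100_000)
limit = b / (1.0 - a)   # the initial condition x0 washes out as K grows
```

Here $X_k = 2 + 8 \cdot 0.5^k$, so the average is $2 + 16(1 - 0.5^K)/K$, which approaches the limit 2 at rate $1/K$ independently of $X_0$.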

Proof of Theorem 4:
We begin by noting that, without any loss of generality, we can restrict to T = Ks. This holds since the contributions of the error term within the fixed interval I K are bounded. For T = Ks, the time duration {0, . . . , T } can be partitioned into intervals (I k , k + 1 ∈ [K]). Therefore, we can write the average MSE per dimension for the p-SU scheme for time-horizon T = Ks as From the upper bound for per dimension MSE given in Lemma 5, we get Summing the expression above over j ∈ {0, ..., m − 1} and k ∈ {0, ..., K − 1}, and dividing by T , we get It follows by Lemma 6 that where the A denotes the set of {a k } k≥0 satisfying We denote the right-side of (6) by B T (θ, ǫ, β). Noting that by Lemma 7 any sequence {a k } k≥0 ∈ A satisfies we get that which completes the proof.

VI. ASYMPTOTIC ACHIEVABILITY USING RANDOM QUANTIZER
With Theorem 4 at our disposal, the proof of achievability can be completed by fixing p = 1 and showing the existence of appropriate quantizer. However, we need to handle the failure event, and we address this first. The next result shows that the failure probability depends on the quantizer only through M . Proof: The event A T (of encoder failure not happening until time T for the successive update scheme) occurs when the errors Y k,j satisfies Y k,j 2 2 ≤ nM 2 , for every k ≥ 0 and 0 ≤ j ≤ s − 1 such that t = ks + j ≤ T . For brevity, we denote by Y t the error random variable Y k,j and Y t−1 = (Y 1 , ..., Y t−1 ). We note that Denoting by β 2 T the probability P (A c T ), the previous two inequalities imply We saw earlier in the proof of Lemma 5 that E[ Y T −1 2 2 ]/n depends only on the probability β 2 T −1 that failure doesn't occur until time T − 1. Proceeding as in that proof, we get , where c 1 and c 2 do not depend on n. Therefore, there exists M 0 independent of n such that for all M exceeding M 0 we have β 2 T ≤ β 2 T −1 + η, which completes the proof by summing over T .
The bound above is rather loose, but it suffices for our purpose. In particular, it says that we can choose M sufficiently large to make probability of failure until time T less than any β 2 , whereby Theorem 4 can be applied by designing a quantizer for this M . Indeed, we can use the quantizer of unit sphere from [1], [2], along with a uniform quantizer for gain (which lies in [−M, M ]) to get the following performance. In fact, we will show that a deterministic quantizer with the desired performance exists. Note that we already considered such a quantizer in Example 3. But the analysis there was slightly loose, and assumed the existence of an ideal shape quantizer.
Proof: We first borrow a classic construction from [1], [2], which gives us our desired shape quantizer. Denote by $S_n$ the $(n-1)$-dimensional unit sphere $\{y \in \mathbb{R}^n : \|y\|_2 = 1\}$. For every $\gamma > 0$ and $n$ sufficiently large, it was shown in [1], [2] that there exist $2^{nR}$ vectors $C$ in $S_n$ such that for every $y \in S_n$ we can find $y' \in C$ satisfying Denoting $\cos\theta = \sqrt{1 - 2^{-2(R-\gamma)}}$, consider the shape quantizer $Q_R(y)$ from [2] given by Note that we shrink the length of $y'$ by a factor of $\cos\theta$, which will be seen to yield the gain over the analysis in Example 3. We append to this shape quantizer the uniform gain quantizer For every $y \in \mathbb{R}^n$ such that $\|y\|_2^2 \le nM^2$, we consider the quantizer For this quantizer, for every $y \in \mathbb{R}^n$ with $\|y\|_2^2 = nB^2$ such that $B \le M$, we have
\begin{align*}
\|y - Q(y)\|_2^2 &= \|y\|_2^2 + \|Q(y)\|_2^2 - 2\langle y, Q(y)\rangle \\
&= nB^2 + n\hat{B}^2\cos^2\theta - 2nB\hat{B}\cos\theta\,\langle \tilde{y}, Q_R(\tilde{y})\rangle \\
&\le nB^2 + n\hat{B}^2\cos^2\theta - 2nB\hat{B}\cos^2\theta \\
&= nB^2\sin^2\theta + n(B - \hat{B})^2\cos^2\theta \\
&\le nB^2\sin^2\theta + n\varepsilon^2\cos^2\theta,
\end{align*}
where the first inequality uses the covering property of $C$. Therefore, $Q$ constitutes an $(nR + \ell)$-bit $(2^{-2(R-\gamma)}, \varepsilon)$-quantizer with dynamic range $M$, for all $n$ sufficiently large. Note that this quantizer is deterministic.
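The error expansion in the proof above can be checked numerically. The following is a minimal sketch, not the construction of [1], [2]: it assumes a random spherical codebook (which carries no covering guarantee at small $n$), a midpoint uniform gain grid with cell width $2\varepsilon$, and the composite rule $Q(y) = \sqrt{n}\,\hat{B}\cos\theta\, y'$. The function name `gain_shape_quantize` and all parameter values are hypothetical.

```python
import numpy as np

def gain_shape_quantize(y, codebook, cos_theta, M, eps):
    """Gain-shape quantizer sketch for nonzero y with ||y||_2^2 <= n*M^2.

    Assumed composite rule (for illustration only):
    Q(y) = sqrt(n) * B_hat * cos_theta * y_prime.
    """
    n = y.shape[0]
    norm = np.linalg.norm(y)
    B = norm / np.sqrt(n)                  # gain, so that ||y||^2 = n * B^2
    if B > M:
        raise ValueError("input outside dynamic range M (encoder failure)")
    y_tilde = y / norm                     # shape: a point on the unit sphere
    # Shape codeword: the codebook vector with the largest inner product.
    y_prime = codebook[np.argmax(codebook @ y_tilde)]
    # Uniform gain quantizer: midpoints of cells of width 2*eps covering [0, M].
    step = 2.0 * eps
    levels = int(np.ceil(M / step))
    k = min(int(B // step), levels - 1)
    B_hat = (k + 0.5) * step
    q = np.sqrt(n) * B_hat * cos_theta * y_prime
    return q, B, B_hat, float(y_tilde @ y_prime)

rng = np.random.default_rng(0)
n, M, eps, cos_theta = 8, 2.0, 0.05, 0.5   # illustrative values only
codebook = rng.standard_normal((256, n))
codebook /= np.linalg.norm(codebook, axis=1, keepdims=True)

y = rng.standard_normal(n)
y *= np.sqrt(n) * 1.33 / np.linalg.norm(y)  # force gain B = 1.33 <= M

q, B, B_hat, inner = gain_shape_quantize(y, codebook, cos_theta, M, eps)
err = float(np.sum((y - q) ** 2))
# Exact expansion of the squared error under the assumed composite rule.
expansion = n * B**2 + n * B_hat**2 * cos_theta**2 - 2 * n * B * B_hat * cos_theta * inner
# Final bound of the proof; it applies only when the covering property
# <y_tilde, y_prime> >= cos(theta) actually holds for this input.
bound = n * B**2 * (1 - cos_theta**2) + n * eps**2 * cos_theta**2
```

Whenever `inner >= cos_theta`, `err` stays below `bound`, mirroring the chain of inequalities; a random codebook merely makes this likely for a loose `cos_theta`, whereas the codebook of [1], [2] guarantees it for large $n$.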
Proof of Theorem 2: For any fixed $\beta$ and $\varepsilon$, we can make the probability of failure until time $T$ less than $\beta$ by choosing $M$ sufficiently large. Further, for any fixed $R, \gamma > 0$, by Lemma 9, we can choose $n$ sufficiently large to get an $nR$-bit $(2^{-2(R-\gamma)}, \varepsilon)$-quantizer for vectors $y$ with $\|y\|_2^2 \le nM^2$. Therefore, by Theorem 4 applied for $p = 1$, we get that The proof is completed upon taking the limits as $\varepsilon$, $\gamma$, and $\beta$ go to 0.
VII. CONVERSE BOUND: PROOF OF THEOREM 3
The proof is similar to the converse proof in [32], but now we need to handle the delay per transmission. We rely on the properties of the entropy power of a random variable. Recall that for a continuous random variable $X$ taking values in $\mathbb{R}^n$, the entropy power of $X$ is given by $N(X) = \frac{1}{2\pi e} 2^{2h(X)/n}$, where $h(X)$ is the differential entropy of $X$. Consider a tracking code $(\phi, \psi)$ of rate $R$ and sampling period $s$ and a process $X \in \mathbb{X}_n$. We begin by noting that the state at time $t$ is related to the state at time $t + i$ as $X_{t+i} = \alpha^i X_t + \sum_{j=0}^{i-1} \alpha^j \xi_{t+i-j}$, where the noise $\sum_{j=0}^{i-1} \alpha^j \xi_{t+i-j}$ is independent of $X_t$ (and the past states). In particular, for $t = ks + i$, $1 \le i < s$, we get where we define $\tilde{X}_t := \alpha^{-i} \hat{X}_{t|t}$ and the first identity uses the orthogonality of the noise added in each round to the previous states and noise. Since the Gaussian distribution has the maximum differential entropy among all continuous random variables with a given variance, and the entropy power of a Gaussian random variable equals its variance, we get that Therefore, the previous bound for tracking error yields where the identity uses the assumption that $\xi_t$ are identically distributed for all $t$. Taking the average of these terms for $t = 0, \ldots, T$, we get $D_T(\phi, \psi, X) = \frac{1}{nKs} \cdots$ Note that the $\tilde{X}_{ks+i}$ act as estimates of $X_{ks}$ which depend on the communication received by the decoder until time $ks + i$. We denote the communication received at time $t$ by $C_{t-1}$, whereby $\tilde{X}_{ks+i}$ depends only on $C_1, \ldots, C_{ks+i-1}$. In particular, the communication $C_{ks}, \ldots, C_{ks+i-1}$ was sent as a function of $X_{ks}$, the sample seen at time $t = ks$. From here on, we proceed by invoking the "entropy power bounds" for the MSE terms. For random variables $X$ and $Y$ such that $P_{X|Y}$ has a conditional density, the conditional entropy power is given by $N(X|Y) = \frac{1}{2\pi e} 2^{2h(X|Y)/n}$. Bounding MSE terms by entropy power is a standard step that allows us to track the reduction in error due to a fixed amount of communication.
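The relation between the state at time $t$ and the state at time $t + i$ can be sanity-checked by simulation. The sketch below assumes the standard AR[1] recursion $X_t = \alpha X_{t-1} + \xi_t$ with Gaussian innovations; the values of `n`, `alpha`, and `steps`, and the helper `unrolled`, are hypothetical and chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha, steps = 4, 0.9, 20               # illustrative values only

# xi[t] is the innovation that enters X_t (assumed recursion:
# X_t = alpha * X_{t-1} + xi_t).
xi = rng.standard_normal((steps + 1, n))
X = np.zeros((steps + 1, n))
X[0] = rng.standard_normal(n)
for t in range(1, steps + 1):
    X[t] = alpha * X[t - 1] + xi[t]

def unrolled(t, i):
    """Return alpha^i X_t + sum_{j=0}^{i-1} alpha^j xi_{t+i-j}."""
    noise = sum(alpha**j * xi[t + i - j] for j in range(i))
    return alpha**i * X[t] + noise

# Largest deviation between the simulated path and the unrolled form.
max_gap = max(
    float(np.max(np.abs(X[t + i] - unrolled(t, i))))
    for t in range(5)
    for i in range(1, 6)
)
```

The noise term only involves innovations with indices $t+1, \ldots, t+i$, which is why it is independent of $X_t$ and of all earlier states.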
We begin by using the following standard bound (see [47, Chapter 10]): For a continuous random variable $X$ and a discrete random variable $Y$ taking values in $\{0, 1\}^{nR}$, let $\hat{X}$ be any function of $Y$. Then, it holds that We apply this result to $X_{ks}$ given $C^{ks-1}$ in the role of $X$ and the communication $C_{ks}, \ldots, C_{ks+i-1}$ in the role of $Y$. The previous bound and Jensen's inequality yield Next, we recall the entropy power inequality (cf. [47]): For independent $X_1$ and $X_2$, $N(X_1 + X_2) \ge N(X_1) + N(X_2)$. Noting that $X_{ks} = \alpha^s X_{(k-1)s} + \sum_{j=0}^{s-1} \alpha^j \xi_{ks-j}$, where $\{\xi_i\}$ is an i.i.d. zero-mean sequence independent of $X_{(k-1)s}$, and that $C^{ks-1}$ is a function of $X_1, \ldots, X_{(k-1)s}$, we get where the previous identities utilize the scaling property of differential entropy. Upon combining the bounds given above and simplifying, we get Finally, note that the terms $N(X_{(k-1)s} | C^{ks-1})$ are exactly the same as those considered in [32, eqn. 11e], since they correspond to recovering $X_{(k-1)s}$ using communication that can depend on it. Therefore, a similar expression holds here for the sampled process $\{X_{ks} : k \in \mathbb{N}\}$. Using the recursive bound for the tracking error in (7) and (8), we adapt the results of [32, eqn. 11] to our case to obtain $E[N(X_{(k-1)s} | C^{ks-1})] \ge d^*_{k-1}$, where the quantity $d^*_k$ is given by the recursion with $d^*_0 = 0$. The bound obtained above holds for any given process $X \in \mathbb{X}_n$. To obtain the best possible bound, we substitute $\xi_1$ to be a Gaussian random variable, since that would maximize $N(\xi_1)$. Specifically, we set $\{\xi_k\}$ to be a Gaussian random variable with zero mean and variance $\sigma^2$ to get $N(\xi) = \sigma^2(1 - \alpha^2)$. Thus, taking the supremum over all distributions on both sides of (9), we have sup X∈Xn D T (φ, ψ, X) ≥ σ 2 (1 − α 2s )α 2s 2 −2Rs s(1 − α 2 2 −2R ) As the previous bound holds for all tracking codes $(\phi, \psi)$, it follows that $\delta^*(R, s, \mathbb{X}) \le g(s)\delta_0(R)$.
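Two facts used in this argument can be illustrated with a few lines of arithmetic: the entropy power of a Gaussian vector with i.i.d. components equals its per-component variance, and conditioning on at most $nR$ bits lowers the differential entropy by at most $nR$ bits, so the entropy power shrinks by a factor of at most $2^{-2R}$. The sketch below is only a numerical illustration; the function names and the values of `n`, `var`, and `R` are hypothetical.

```python
import math

def gaussian_diff_entropy_bits(n, var):
    """Differential entropy (in bits) of an n-dim Gaussian with covariance var * I."""
    return 0.5 * n * math.log2(2 * math.pi * math.e * var)

def entropy_power(h_bits, n):
    """Entropy power N = 2^(2h/n) / (2*pi*e), with h measured in bits."""
    return 2 ** (2 * h_bits / n) / (2 * math.pi * math.e)

n, var, R = 16, 0.19, 1.5                  # illustrative values only

h = gaussian_diff_entropy_bits(n, var)
N_prior = entropy_power(h, n)              # equals var for a Gaussian

# Since h(X|Y) >= h(X) - H(Y) >= h(X) - nR for a Y described by nR bits,
# the conditional entropy power cannot drop below this floor.
N_floor = entropy_power(h - n * R, n)      # equals var * 2^(-2R)
```

Since the MSE per dimension is lower bounded by the entropy power, `N_floor` indicates the best error one can hope for after $nR$ bits of communication about a Gaussian sample.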

VIII. DISCUSSION
We restricted our treatment to an AR[1] process with uncorrelated components. This restriction is for clarity of presentation, and some of the results can be extended to AR[1] processes with correlated components. In this case, the decoder will be replaced by a Kalman-like filter in the manner of [27]. A natural extension of this work is the study of an optimum transmission strategy for an AR[n] process in the given setting. In an AR[n] process, the strategy of refining the latest sample is clearly not sufficient, as the value of the process at any time instant depends on the past n samples. If the sampling is periodic, even the encoder does not have access to all these n samples unless we take a sample at every instant. A viable alternative is to take n consecutive samples at every sampling instant. However, even with this structure on the sampling policy, it is not clear how the information must be transmitted. A systematic analysis of this problem is an interesting area of future research.
Another setting which is not discussed in the current work is one where the transmissions are of nonuniform rates. Throughout our work, we have assumed periodic sampling and transmissions at a fixed rate. For the scheme presented in this paper, it is easy to see from our analysis that only the total number of bits transmitted in each sampling interval matters when the dimension is sufficiently large. That is, for our scheme, even framing each packet (sent in each communication slot) using an unequal number of bits will give the same performance as that for equal packet sizes, if the overall bit-budget per sampling period is fixed. A similar phenomenon was observed in [31], which allowed the extension of some of their analysis to erasure channels with feedback. We remark that a similar extension is possible for some of our results, too. This behavior stems from the use of successive batches of bits to successively refine the estimate of a single sample within any sampling interval, whereby at the end of the sampling interval the error corresponds to roughly that for a quantizer using the total number of bits sent during the interval. In general, a study of nonuniform rates for describing each sample, while keeping bits per time-slot fixed, will require us to move beyond uniform sampling. This, too, is an interesting research direction to pursue.
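The end-of-interval behavior described above can be illustrated with a small calculation. The sketch assumes an idealized quantizer that multiplies the per-dimension error by $2^{-2r}$ when $r$ bits per dimension are spent in a slot (no additive error term); the function `end_of_interval_error` and the bit allocations are hypothetical.

```python
def end_of_interval_error(initial_error, bits_per_slot):
    """Error after successive refinement, assuming an ideal 2^(-2r) shrink per slot."""
    err = initial_error
    for r in bits_per_slot:
        err *= 2.0 ** (-2.0 * r)
    return err

# Same total budget of 4 bits per dimension per sampling interval,
# split equally or unequally across s = 4 slots.
equal = end_of_interval_error(1.0, [1.0, 1.0, 1.0, 1.0])
unequal = end_of_interval_error(1.0, [2.5, 0.5, 0.25, 0.75])
single_shot = 2.0 ** (-2.0 * 4.0)          # one quantizer using all the bits
```

Under this idealization the three values coincide, since the shrink factors multiply to $2^{-2\sum_j r_j}$. The calculation covers only the error at the end of the interval; it does not model the intermediate slots or a fixed additive quantizer error.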
Finally, we remark that the encoder structure we have imposed, wherein the error in the estimate of the latest sample is refined at each instant, is optimal only asymptotically and is justified only heuristically for fixed dimensions. Even for one-dimensional observations, it is not clear if this structure is optimal. We believe that this is a question of fundamental interest which remains open.