Potential Well in Poincaré Recurrence

From a physical/dynamical-systems perspective, the potential well represents the proportional mass of points that escape the neighbourhood of a given point. Over the last 20 years, several works have shown the importance of this quantity in obtaining precise approximations for several recurrence time distributions in mixing stochastic processes and dynamical systems. Besides reviewing the different scaling factors used in the literature on recurrence times, the present work contributes two new results: (1) for φ-mixing and ψ-mixing processes, we give a new exponential approximation for hitting and return times using the potential well as the scaling parameter, with explicit and sharp error terms; (2) we analyse the uniform positivity of the potential well. Our results apply to processes on countable alphabets and do not assume a complete grammar.


Introduction
The close relation between Extreme Value Theory (EVT) and the statistical properties of Poincaré recurrence has recently been quite well explored. The starting point is that the exceedances of a stochastic process over a sequence of barrier values a_n > 0, n ∈ N, can be considered as hittings of a sequence of nested sets. More precisely, if one defines the semi-infinite intervals A_n = (a_n, ∞) and considers a sequence of random variables X_1, X_2, . . ., one has the equivalence: max{X_1, . . . , X_t} > a_n if and only if T_{A_n} ≤ t, where, for any measurable set A, T_A denotes the smallest k such that X_k ∈ A. As the sequence of levels a_n diverges, the sets A_n are nested. This equivalence allows building a bridge between two historically independent theories: Extreme Value Theory (EVT) and Poincaré Recurrence Theory (PRT). While EVT focuses on the existence (and identification) of the limit of the distribution of the partial maxima and k-maxima, among others [1][2][3][4][5], the aim of recent works on PRT is to understand the statistical properties of the different notions of return times. The present paper follows the PRT approach. Our interest is the statistics of visits of a random process X_t, t ∈ N, to a given target measurable set. Asymptotic statistics are obtained by studying sequences of target sets A_n, n ≥ 1, usually of measure shrinking to zero. In this context and for certain classes of processes, hitting and return times with respect to a given sequence of target sets converge to the exponential distribution, modelling the unpredictability of rare events. However, this rough statement is full of nuances that need to be established in very precise terms. It turns out that these details carry much information about the system. For instance, for two observables having the same probability, the ergodic theorem says that, macroscopically, their numbers of occurrences are about the same.
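The equivalence between exceedances of a level and hittings of the corresponding interval is purely set-theoretic and can be checked mechanically. The following sketch does so on a simulated i.i.d. uniform sample; the process, levels, and helper name are our own toy illustration, not objects from the paper.

```python
import random

random.seed(0)

def first_hit(xs, level):
    """Smallest k (1-based) with xs[k-1] > level, i.e. the hitting time of (level, inf)."""
    for k, x in enumerate(xs, start=1):
        if x > level:
            return k
    return None  # no exceedance in the sample

# Check: max{X_1,...,X_t} > a  <=>  T_{(a,inf)} <= t, on i.i.d. uniforms.
xs = [random.random() for _ in range(200)]
for a in (0.9, 0.99, 0.999):
    T = first_hit(xs, a)
    for t in (1, 10, 50, 200):
        lhs = max(xs[:t]) > a
        rhs = T is not None and T <= t
        assert lhs == rhs
```

The assertions hold for any sample, since both sides state that some coordinate among the first t exceeds a.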
However, these occurrences can appear scattered in a very different way along time. Under some strong mixing assumptions, it is a well-known fact of the literature that for nested sequences of observables with the same probability, the asymptotic observation of one of them can be distributed as a Poisson process while the other one will follow a compound Poisson process. Thus, the dichotomy Poisson/compound Poisson in the same system is determined also by the intrinsic properties of the target sets considered.
In the setting of the present paper, the target sets are finite strings of symbols (patterns). In this case, even if the process is a sequence of independent random variables, the successive occurrences of a string are not independent, because the structure of the pattern itself enters the game, allowing or preventing consecutive observations due to possible overlaps with itself. This leads to a dichotomy between aperiodic and periodic patterns that yields, in the limit of long patterns, the Poisson/compound Poisson dichotomy mentioned before. In passing, let us mention that this dichotomy exists in EVT as well, where it is referred to as the phenomenon of clustering/non-clustering of maxima, and it has generated a great deal of research over the last two decades [3].
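The self-overlap structure that drives the periodic/aperiodic dichotomy is easy to compute for a concrete string. The following small function is our own illustration (not from the paper): it lists the lags at which a word can overlap a shifted copy of itself.

```python
def overlap_set(w):
    """Lags 1 <= j < len(w) at which the last len(w)-j symbols of w
    equal its first len(w)-j symbols, i.e. w can reappear j steps later."""
    n = len(w)
    return [j for j in range(1, n) if w[j:] == w[:n - j]]

# A "periodic" pattern can reappear after very few steps; an "aperiodic"
# one cannot reappear before it has fully ended.
assert overlap_set("aaaa") == [1, 2, 3]   # constant pattern: overlaps at all lags
assert overlap_set("abab") == [2]         # period-2 pattern
assert overlap_set("abcd") == []          # no self-overlap at all
```

For an i.i.d. process, occurrences of "aaaa" can cluster (a run of a's produces many overlapping occurrences), while occurrences of "abcd" cannot overlap; this is the combinatorial mechanism behind the compound Poisson/Poisson dichotomy.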
Let us now be more specific about what we are doing here. First, we operate in the context of discrete-time stochastic processes with countable alphabet enjoying φ-mixing. Fix any point x, that is, any right-infinite sequence of symbols taken from the alphabet, and consider the nested sequence of neighbourhoods corresponding to the first n symbols of x, namely A_n = (x_0, . . . , x_{n−1}), n ≥ 1. The main theorem of the paper, Theorem 1, gives explicit and computable error terms for the approximation of the hitting time distribution µ(T_{A_n} > t) and the return time distribution µ_{A_n}(T_{A_n} > t) by exponential distributions whose parameter is explicit and depends on A_n.
The first main advantage of Theorem 1 is that it uses the potential well as the scaling parameter. The potential well is the probability, conditioned on starting from A_n, that the pattern A_n does not reappear at the first possible moment it could reappear. The use of this simple and well-defined quantity as the scaling parameter contrasts with previous works, which use parameters whose expressions are rarely explicit and even harder to compute. Another advantage of Theorem 1 is that, unlike a whole body of literature obtaining almost-sure results, our results hold for all x. This allows distinguishing different limiting distributions, as in the periodic/aperiodic dichotomy described above, which almost-sure results cannot detect. Finally, the error terms of our approximations are not in total variation distance, but in the stronger pointwise form with respect to the time scale.
The closest result in the literature, restricted however to return times, is the main theorem of [6]. Our Theorem 1, besides also covering hitting times, extends the class of processes to which the approximations apply. First, we do not assume the finiteness of the alphabet, allowing countably infinite alphabets. Second, our processes are not assumed to have a complete grammar, as was the case in [6]. In ergodic theory terminology, this means that we do not restrict to the full shift. Yet another contribution is that we specify a tighter and simpler error term, under the stronger assumption of ψ-mixing, in both the hitting and return time approximations.
Last but not least, Theorem 1 corrects the exponential approximation obtained by [6] for return times. Indeed, the error term of their Theorem 4.1 contains a mistake for small t's.
The other main novelty of the present work is Theorem 2, stating that (1) the potential well almost surely converges to one as the size of the pattern diverges under φ-mixing, and (2) the potential well is uniformly bounded away from zero when we have ψ-mixing or φ-mixing with summable function φ(n). Naturally, as a conditional probability, the potential well belongs to the interval [0, 1] for any n ≥ 1 and any pattern A_n. However, it was proven that the potential well can be arbitrarily close to zero for β-mixing processes, a slightly weaker mixing assumption than φ-mixing. Indeed, it was shown in [7] that for the binary renewal process, with specific choices of transition probabilities and target sets A_n, n ≥ 1, the potential well of A_n vanishes as n diverges. Note that the border is thin between this β-mixing example and our Theorem 2, which holds for ψ-mixing and φ-mixing with summable φ (see the review [8] on the distinct mixing assumptions). We conjecture that the assumption of summability of the φ rates can be dropped.
A fundamental quantity is the shortest possible return of the pattern A_n, denoted τ(A_n), which is in particular used to define the potential well. It is known [9,10] that, under the assumptions of specification and positive entropy, τ(A_n)/n converges almost surely to one. Since we do not assume a complete grammar, τ(A_n) is not bounded above by n, and it can be strictly larger than n in general. Here, we prove an important technical result (Lemma 2) giving explicit uniform upper bounds, which hold for sufficiently large n under either φ- or ψ-mixing.
To conclude on the importance of the present work as a whole, let us mention that our results are fundamental for the study of further recurrence quantities, such as the return time function [11,12] and the waiting time function [13,14], establishing a link with information theory. These random variables are known to satisfy a counterpart of the famous Shannon-McMillan-Breiman theorem (asymptotic equipartition property). In order to study the fluctuations of these limit theorems, for instance a large deviation principle, we need to control the return/hitting time exponential approximations for any point and any t > 0. This was particularly clear in [15,16], which study the fluctuations of the waiting time and return time, respectively. It is also interesting to notice that [17] first pointed out the importance of seeking exponential approximations for any point x, precisely in order to study the small and large fluctuations of the return time function.
The paper is organized as follows. We describe in Section 2 the setting of the paper in the context of PRT, defining carefully the types of exponential approximations we are interested in and explaining, including through an extensive bibliography, the role of the potential well as the scaling parameter. Section 3 contains the main results, and Section 4 is dedicated to their proofs.

The Framework of Mixing Processes
Consider a countable set A that we call the alphabet. By N, we denote the set of nonnegative integers and, by X := A^N, the set of right-infinite sequences x = (x_0, x_1, . . .) of symbols taken from A. Given a point x ∈ X and any finite set I ⊂ N, the cylinder set with base I is defined as A_I(x) := {y ∈ X : y_i = x_i, i ∈ I}. In the particular case where I = {0, . . . , n − 1}, we write A_n(x) and sometimes abuse notation, writing x_0^{n−1}. We endow X with the σ-algebra F generated by the class of cylinder sets {A_I : I ⊂ N, |I| < ∞}. Furthermore, F_I denotes the σ-algebra generated by A_I(x), x ∈ X. For the special case in which I = {i, . . . , j}, 0 ≤ i ≤ j ≤ ∞, we use the notation F_i^j. We use the shorthand notation a_i^j := (a_i, a_{i+1}, . . . , a_j), 0 ≤ i ≤ j < ∞, for finite strings of consecutive symbols of A. When necessary, A_n(x) will be naturally identified with the sequence x_0^{n−1}. The shift operator σ : X → X shifts the point x = (x_0, x_1, x_2, . . .) to the left by one coordinate: (σx)_i = x_{i+1}, i ≥ 0.
We consider a shift-invariant (or stationary) probability measure µ on (X, F). For any A ∈ F with µ(A) > 0, µ_A(·) := µ(· ∩ A)/µ(A) is the conditional measure µ restricted to A.
Our results are stated under two mixing conditions that we now define. For all n ≥ 1, define:

φ(n) := sup |µ(C|B) − µ(C)|,

ψ(n) := sup |µ(B ∩ C)/(µ(B)µ(C)) − 1|,

where both suprema run over i ≥ 0, B ∈ F_0^i with µ(B) > 0, and C ∈ F_{i+n}^∞ (with µ(C) > 0 in the second case). Note that ψ(n) and φ(n) are nonincreasing sequences, since F_0^i ⊂ F_0^{i+1} for every i ≥ 0. Definition 1. We say that the measure µ on (X, F) is φ-mixing (resp. ψ-mixing) if φ(n) (resp. ψ(n)) goes to zero as n diverges. We say that µ is "summable φ-mixing" if it is φ-mixing with ∑_n φ(n) < ∞.
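As a toy illustration of the size of these coefficients (our own example, not from the paper): for a two-state Markov chain, the elementary events B = {X_0 = i} and C = {X_n = j} already give the lower bound |µ(C|B) − µ(C)| = |P^n(i,j) − π(j)| ≤ φ(n), a quantity that decays geometrically with ratio the second eigenvalue of the transition matrix.

```python
# Two-state chain with second eigenvalue 0.9 + 0.8 - 1 = 0.7 and
# stationary distribution pi = (2/3, 1/3); all values are our choices.
P = [[0.9, 0.1],
     [0.2, 0.8]]
pi = [2 / 3, 1 / 3]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Pn = P
bounds = []
for n in range(1, 6):
    # lower bound on phi(n) coming from the events {X_0 = i}, {X_n = j}
    gap = max(abs(Pn[i][j] - pi[j]) for i in range(2) for j in range(2))
    bounds.append(gap)
    Pn = mat_mul(Pn, P)

# geometric decay with ratio exactly 0.7 for this chain
for a, b in zip(bounds, bounds[1:]):
    assert abs(b / a - 0.7) < 1e-9
```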
We refer to [8] for an exhaustive review of mixing properties and examples.

Recurrence Times and Exponential Approximations
The hitting time of a point y ∈ X with respect to a set A ∈ F is defined by:

T_A(y) := inf{k ≥ 1 : σ^k(y) ∈ A}.

For sets A of small measure (rare events) and under mixing conditions such as the ones introduced in the preceding subsection, it is expected that µ(T_A > t) is approximately exponentially distributed. This is what we call the hitting time exponential approximation. Similarly, when we refer to return times, we mean that we study the approximation of µ_A(T_A > t), that is, the measure of the same event conditioned on the points starting in A.
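The simplest instance of this phenomenon can be checked by hand (a toy case of our own, not the paper's setting): for an i.i.d. process and the target set A = {x : x_0 = a} with µ(A) = p, the hitting time is geometric, µ(T_A > t) = (1 − p)^t, which is close to the exponential tail e^{−pt} when p is small.

```python
import math

p = 0.05  # mu(A); our choice for illustration

for t in (1, 10, 50):
    exact = (1 - p) ** t          # geometric tail of the hitting time
    approx = math.exp(-p * t)     # exponential approximation
    # crude but sufficient bound: the two tails differ by at most t * p^2
    assert abs(exact - approx) <= t * p ** 2
```

The error bound t·p² follows from ln(1 − p) = −p − p²/2 − . . ., and it shrinks relative to the tail itself as p → 0 with tp of order one, which is the regime of the exponential approximation.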
In this paper, we are interested in the case where we fix any point x and consider A_n(x) as the target set. As n diverges, the measure of A_n(x) vanishes, leading to rare events. The scaling parameter of the exponential approximation depends on the point x.
The two main types of approximations that appeared in the literature when approximating the hitting/return time distributions around any point x of the phase space are a total variation distance type and a pointwise type.
• Type 1: Total variation distance. For any x ∈ X,
  - Hitting times: sup_{t≥0} |µ(T_{A_n(x)} > t) − e^{−θ µ(A_n(x)) t}| ≤ ε(A_n(x)).
  - Return times: sup_{t≥0} |µ_{A_n(x)}(T_{A_n(x)} > t) − θ̃ e^{−θ µ(A_n(x)) t}| ≤ ε(A_n(x)).
• Type 2: Pointwise. For any x ∈ X and any t > 0,
  - Hitting times: |µ(T_{A_n(x)} > t) − e^{−θ µ(A_n(x)) t}| ≤ ε(A_n(x), t).
  - Return times: |µ_{A_n(x)}(T_{A_n(x)} > t) − θ̃ e^{−θ µ(A_n(x)) t}| ≤ ε(A_n(x), t).

Note that in the return time approximation, the parameters θ and θ̃ need not be equal. However, such an approximation leads to:

E_{A_n(x)}(T_{A_n(x)}) = ∑_{t≥0} µ_{A_n(x)}(T_{A_n(x)} > t) ≈ θ̃ / (θ µ(A_n(x))).

In view of the Kac lemma, which, we recall, states that E_A(T_A) = 1/µ(A), the last display suggests that θ and θ̃ must be close.
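The Kac lemma invoked above can itself be checked numerically on a toy two-state Markov chain (our own example): with A = {x : x_0 = 1} and µ(A) = π(1) = 1/3, the mean return time to state 1 should be 1/µ(A) = 3.

```python
import random

random.seed(2)

# Two-state chain; P[s] is the probability of jumping to state 0 from state s.
P = {0: 0.9, 1: 0.2}   # stationary distribution: pi = (2/3, 1/3)

def step(s):
    return 0 if random.random() < P[s] else 1

# Record the gaps between successive visits to state 1 along a long run.
s, last, gaps = 1, 0, []
for t in range(1, 1_000_000):
    s = step(s)
    if s == 1:
        gaps.append(t - last)
        last = t

mean_return = sum(gaps) / len(gaps)
assert abs(mean_return - 3.0) < 0.1   # Kac: E_A(T_A) = 1/mu(A) = 3
```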

Potential Well: Definition and Genealogy in PRT
As already explained, the potential well will be used as the scaling parameter in the exponential approximations of Types 1 and 2 defined above. In order to define it, we first need to define the shortest possible return of a set A ∈ F (to itself):

τ(A) := min{k ≥ 1 : µ(A ∩ σ^{−k}A) > 0},

or, equivalently:

τ(A) := min{k ≥ 1 : µ_A(T_A = k) > 0}.

In the case where A = A_n(x), we define τ_n(x) = τ(A_n(x)). The τ_n : X → N constitute a sequence of simple functions. Notice that the above alternative definitions of τ(A) involve the measure, while the traditional definition is completely topological. This is to account for the case where the measure does not have a complete grammar (we say that µ has a complete grammar if µ(A_n(x)) > 0 for any x ∈ X and n ≥ 1).
The first possible return time τ_n(x) is an object of independent interest, which has been studied from several perspectives in the literature. Let us mention that its asymptotic concentration was proven in [9,10], large deviations in [18][19][20], and fluctuations in [21,22].
Obviously, by definition, µ_A(T_A ≥ τ(A)) = 1. If, for a point x ∈ A, we have T_A(x) > τ(A), we say that x escapes from A. The potential well of order n at x is precisely the proportional measure of points of A that escape from A:

ρ(A) := µ_A(T_A > τ(A)).

Since we are interested in the case where A = A_n(x), we use the alternative notation ρ(x_0^{n−1}) instead of ρ(A_n(x)). Besides being explicitly computable in many situations, the potential well is physically meaningful and, as the scaling parameter, provides precise exponential approximations for recurrence times under suitable mixing assumptions.
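For an i.i.d. process, the potential well can be computed exactly by brute force. The sketch below (our own toy case, not the paper's general setting) enumerates the τ(A) symbols that follow the pattern and sums the probability of the continuations in which the pattern reappears exactly τ(A) steps after its start.

```python
from itertools import product

def tau(A):
    """Shortest possible return of the word A: smallest self-overlap lag."""
    n = len(A)
    return next(j for j in range(1, n + 1) if A[j:] == A[:n - j])

def rho_iid(A, p):
    """Potential well rho(A) = mu_A(T_A > tau(A)) for i.i.d. bits, P(1) = p."""
    n, k = len(A), tau(A)
    prob = {0: 1 - p, 1: p}
    hit = 0.0
    for cont in product((0, 1), repeat=k):
        word = list(A) + list(cont)
        if word[k:k + n] == list(A):   # the pattern returns at time tau(A)
            w = 1.0
            for s in cont:
                w *= prob[s]
            hit += w
    return 1.0 - hit

assert tau([0, 0, 0]) == 1
assert abs(rho_iid([0, 0, 0], 0.3) - 0.3) < 1e-12    # escape iff a 1 appears next
assert abs(rho_iid([0, 1, 0], 0.5) - 0.75) < 1e-12   # return requires "10" next
```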
We give below a short genealogy of the scaling parameters that have appeared in the literature in results holding for all points, used to obtain approximations for hitting/return times.

• As far as we know, the first paper to prove exponential approximations for hitting time statistics for all points is due to Aldous and Brown [23]. They obtained Type 1 approximations in the case of reversible Markov chains. The parameter used there was just the inverse of the expectation, which is mandatory when the approximating law is the exponential distribution. However, this does not bring information about the value of the expectation.
• Galves and Schmitt [24] obtained Type 1 approximations for hitting times in ψ-mixing processes. The major breakthrough there was that the authors provided an explicit formula for the parameter (denoted by λ(A)). This quantity can be viewed as the grandfather of ρ. Nonetheless, its explicit significance was not evident.
• References [25,26] gave exponential approximations (Type 1 and Type 2, respectively) of the distribution of hitting times around any point using a scaling parameter. In [26], however, only its existence and necessity were proven, the calculation of λ being intractable in general. The main problem is that λ(A) depends on the recurrence properties of the cylinder A up to large time scales (usually of the order of µ(A)^{−1}).
• In order to circumvent this issue, Reference [25] also provided, in the context of approximations of Type 1, another scaling parameter, easier to compute, but with a slightly larger error term as a price to pay. This quantity, ζ_s(x_0^{n−1}), depends on, at most, the first 2n coordinates of the process and can be seen as the father of the potential well. Both works [25,26] dealt with processes enjoying ψ-mixing or summable φ-mixing.
• The use of the potential well ρ as the scaling parameter was first proposed by Abadi in [27], still in the context of an approximation of Type 1 for hitting and return times. More specifically, it is proven there that, for exponentially α-mixing processes, λ and ζ (grandfather and father of ρ) can be well approximated by ρ.
• The first paper to directly use ρ as the scaling parameter was [6], in which a Type 2 approximation for return times was obtained, with θ̃ = θ = ρ. The process is assumed to be φ-mixing.
• Focusing on proving exponential approximations for hitting and return times for the largest possible class of systems, and still for all points, Abadi and Saussol [28] returned to the approach of Galves and Schmitt. Their results hold under the α-mixing condition, which is the weakest hypothesis used to date, but the scaling parameter is not explicit.
• Focusing on the specific class of binary renewal processes, Reference [7] proved a Type 1 approximation for hitting and return times using the potential well ρ. One interesting aspect of this work lies in the fact that the renewal process is β-mixing (weaker than the φ-mixing assumed by [6]). Moreover, the authors managed to use the renewal property to compute the limit of ρ(A_n(x)) for any point x. In other words, the approximating asymptotic law for hitting and return times was explicitly computed as a function of the parameters of the process. This result shows the usefulness of the potential well, an "easy to compute" scaling parameter.

Main Results
Theorem 1 below presents Type 2 approximations for hitting and return times under φ- and ψ-mixing conditions, with the potential well as the scaling parameter and an explicit error term.
Before stating this result, we first need to define the second order periodicity of the string A_n(x), which plays a crucial role in the size of the error term.

Second Order Periodicity
The short returns that we define here are precisely those that are difficult to treat as (almost) independent. They depend not only on the correlation decay of the system, but also on the particular properties of the string itself. Technically, for an n-cylinder, short means returning within a number of steps of order n.
Consider the cylinder A, and suppose τ(A) = k. Write n = qk + r, where q ∈ N and 0 ≤ r < k, and note that the cylinder overlaps itself at all multiples of k smaller than n. The set P(A) := {mk : 1 ≤ m ≤ q} is the set of indexes of possible returns at multiples of τ(A), but returns can also occur at other time indexes after that. Let:

R(A) := {j ∈ {τ(A) + 1, . . . , n − 1} \ P(A) : µ(A ∩ σ^{−j}A) > 0}.

A point y ∈ A can only return to A before n at time indexes in P(A) ∪ R(A), but there is a crucial difference between them. A point that escapes from A cannot return in P(A), but it can return in R(A). Namely,

A ∩ {T_A > τ(A)} ∩ σ^{−j}A = ∅ for every j ∈ P(A).

We set n_A as the first possible return to A among those points x ∈ A that escape A at τ(A). We refer to [6] for an example that illustrates these ideas under a complete grammar and a finite alphabet. In the general case, notice that n_A can be strictly larger than n if one is not considering the full shift. For a complete example, consider the house of cards Markov chain. It has transition matrix Q on N with entries Q(i, 0) = q_i = 1 − Q(i, i + 1), where q_i, i ≥ 0, is a sequence of real numbers taking values in the interval (0, 1). It is therefore defined on an infinite alphabet and does not have a complete grammar, since several transitions are forbidden due to the sparse nature of Q. Consider the strings A = 00010001000 and B = (n + 1) · · · 2n (for some n ≥ 1) generated by this Markov chain under the stationary measure. Then, we see that τ(A) = 4, R(A) = {9, 10}, and n_A = 9, while, on the other hand, τ(B) = 2n + 1 > n, R(B) = ∅, and n_B = 3n + 2.
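The combinatorial part of the example for A = 00010001000 can be reproduced in a few lines (our own sketch; the pure string-overlap computation below ignores the grammar constraint of the chain, which happens not to exclude any of these overlaps for this particular string):

```python
def overlaps(w):
    """Lags 1 <= j < len(w) at which w overlaps a shifted copy of itself."""
    n = len(w)
    return [j for j in range(1, n) if w[j:] == w[:n - j]]

A = "00010001000"
n = len(A)
lags = overlaps(A)                        # [4, 8, 9, 10]
tau_A = lags[0]                           # shortest possible return
q = n // tau_A
P_A = {m * tau_A for m in range(1, q + 1)}          # returns at multiples of tau
R_A = sorted(j for j in lags if j not in P_A)       # the remaining short returns

assert tau_A == 4
assert P_A == {4, 8}
assert R_A == [9, 10]
```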

Type 2 Approximations Scaled by the Potential Well
For any finite string A, let us denote by By definition, φ(g) is finite for all g ≥ 1. This is not the case for ψ(g). Thus, for ψ-mixing measures, we define:

g_0 := min{g ≥ 1 : ψ(g) < ∞}. (1)

Now, for the error term, define: where n* := min{n, n_A}. Note that cylinders A of size n satisfy n_A ≥ n/2, so ψ is well defined for all n > 2g_0.
We use (A) to denote either ψ(A) or φ(A) when the argument/statement is general.
Theorem 1. Consider a stationary measure µ on (X, F) enjoying either φ-mixing with sup_{A∈A^n} µ(A)τ(A)/n → 0, or simply ψ-mixing. There exist five positive constants C_i, i = 1, . . . , 5, and n_0 ∈ N such that, for all n ≥ n_0 and all A ∈ A^n, the following inequalities hold.
• For all t ≥ 0:
• For all t ≥ τ(A):

Theorem 1 (and its proof) was inspired by [6] and their Theorem 4.1. However, let us first observe that our result provides the first statement in the literature of a Type 2 hitting time approximation with the potential well as the scaling parameter. Moreover, contrary to [6], we assume neither a complete grammar nor a finite alphabet.
Let us make some further important observations concerning this theorem.

Remark 1. Under φ-mixing, the assumption sup_{A∈A^n} µ(A)τ(A)/n → 0 can be dropped under certain circumstances. For instance, if the measure has a complete grammar, we have τ(A) ≤ n, and the assumption is granted by Lemma 1. Another way is to assume that µ is summable φ-mixing, as mentioned after Lemma 2 in Section 4.

Remark 2. According to Lemma 1, if µ is φ-mixing (and, a fortiori, ψ-mixing), there exist constants C and c such that µ(A) ≤ Ce^{−cn} for all n ≥ 1 and A ∈ A^n. On the other hand, since n_A ≥ n/2, uniformly. This is the case, for instance, if one has a complete grammar. On the other hand, notice that τ(A) < n_A. Hence, if τ(A) > 2n, we take w = n and get φ(

Remark 3. Naturally, the statements under ψ-mixing are less general, but have smaller error terms. The error term is the same for t > [2µ(A)]^{−1} for both the hitting and return time approximations. The difference is for small t's, due to the correlation arising from the conditional measure.
process is trivially ψ-mixing with function ψ identically zero. Thus, Theorem 1 states that the error for small t's is ψ(A_n) = np_n. On the other hand, by direct substitution using the above facts, we have for each n ≥ 2: which implies that the exact error in the approximation for the return time at n − 1 is of order np_n, just as stated by Theorem 1.

Remark 6.
The reader may notice a difference between Theorem 1 and Theorem 4.1 of [6] concerning the error term for small t's for return time approximation. Indeed, their statement is incorrect as shown by the preceding example. We recall that the error term for small t's plays a fundamental role when studying the return time spectrum, as was done by [15]. Theorem 1, besides correcting [6], is also fundamental to correct [15], which was based on the exponential approximations given by [6].

Remark 7.
As an example where Theorem 1 applies while Theorem 4.1 of [6] does not, let us consider the house of cards Markov chain defined in the previous subsection. We refer to [7], where it was explained that, if q_i ≥ ε for all i ≥ 1 and some ε > 0, there exists a stationary process (X_m)_{m∈N} with matrix Q, and it is φ-mixing with exponentially decaying φ(n). Therefore, according to Remark 1, it fits the conditions of Theorem 1. However, since it is defined on an infinite alphabet and does not have a complete grammar, such processes are not covered by [6].
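A minimal simulation of the house of cards chain makes the incomplete grammar tangible (our own sketch: the choice q_i = 1/2 for all i is ours and satisfies the uniform lower bound of the remark). The string B = (n+1) · · · 2n can only be observed after an uninterrupted climb of the chain.

```python
import random

random.seed(3)

def step(i, q=0.5):
    """House of cards move: fall to 0 with prob. q_i, else climb to i + 1."""
    return 0 if random.random() < q else i + 1

# Look for the pattern B = 4 5 6 (the case n = 3) along a long trajectory.
n = 3
target = list(range(n + 1, 2 * n + 1))   # [4, 5, 6]
s, window, seen_B = 0, [], False
for _ in range(100_000):
    s = step(s)
    window = (window + [s])[-n:]
    if window == target:
        seen_B = True
assert seen_B   # B occurs with positive probability, after a full ascent
```

By contrast, a string such as "4 6 5" has measure zero: the transition 4 → 6 is forbidden by Q, which is exactly the failure of the complete grammar assumption.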

Uniform Positivity of the Potential Well
Theorem 1 says that the potential well can be used as the scaling parameter to obtain approximations for recurrence times around any point. We now ask about the possible values of this scaling parameter in its range [0, 1].
Abadi and Saussol [29], in the most general case known up to now, proved that for α-mixing processes with at least polynomially decaying α rates, the distribution of the hitting and return times converges, almost surely, to an exponential with parameter one. We refer to [8] for the precise definition of α-mixing; the only important point for us is that summable φ-mixing implies α-mixing with at least polynomially decaying α rates. This fact, combined with Theorem 1, proves, indirectly, that for summable φ-mixing processes the potential well converges almost surely to one, since both theorems must agree on the limiting distribution under these conditions. Theorem 2, Item (a) below, states that the same holds for φ-mixing without any assumption on the rate.
On the other hand, for renewal processes with a certain tail distribution of the inter-arrival times, Abadi, Cardeño, and Gallo [7] proved that for the point x = (00000 . . .), the sequence of potential wells ρ(x_0^{n−1}) converges to zero. In this case, the scaling parameter has a predominant role, indicating the drastic change of scale of the occurrence of events. For instance, in this case, the mean hitting time is much larger than the mean return time. Such renewal processes are β-mixing (see [8] for the definition). Theorem 2, Item (b) below, states that this cannot happen for ψ-mixing processes or summable φ-mixing processes. If the alphabet A is finite, the set {ρ(A) : A ∈ A^n, n < n_1} is finite and has a strictly positive infimum, which implies that the infimum above can be taken over all n ≥ 1.

Proofs of the Results
The statement of Theorem 1 covers φ- and ψ-mixing and both hitting and return times. The case of return times under φ-mixing was already settled by [6], and our proof follows their method. In particular, the next subsection lists a sequence of auxiliary results, some of which are stated without proof.

Preliminary Results
The following lemma plays a fundamental role in Theorems 1 and 2. It was originally proven in [25] under the additional assumption that the function φ is summable; this assumption can be dropped.

Lemma 1.
Let µ be a φ-mixing measure. Then, there exist positive constants C and c such that, for all n ≥ 1 and all A ∈ A^n, one has:

µ(A) ≤ Ce^{−cn}.

Proof. Denote λ := sup{µ(a) : a ∈ A} < 1. Consider a positive integer k_0 and, for all n ≥ k_0, write n = k_0 q + r, with 1 ≤ q ∈ N and 0 ≤ r < k_0. Suppose A = a_0^{n−1}, and apply the φ-mixing property to obtain: Iterating this argument, one concludes: This covers the case n ≥ k_0. By eventually enlarging the constant C, one covers the case n < k_0. This ends the proof.
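In the degenerate i.i.d. case, the iteration in the proof is exact and the exponential bound of Lemma 1 can be verified directly (a toy check of our own, with an arbitrary three-letter weight vector): the cylinder measure is a product, so µ(A) ≤ λ^n with λ = sup_a µ(a) < 1.

```python
from itertools import product

weights = {"a": 0.5, "b": 0.3, "c": 0.2}   # our toy i.i.d. marginal
lam = max(weights.values())                 # lambda = sup_a mu(a) = 0.5

def mu(word):
    """Measure of the n-cylinder `word` under the i.i.d. measure."""
    p = 1.0
    for s in word:
        p *= weights[s]
    return p

for n in range(1, 7):
    worst = max(mu("".join(w)) for w in product(weights, repeat=n))
    assert worst <= lam ** n + 1e-12   # exponential decay, rate -ln(lambda)
```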
For a complete grammar, one has τ(A_n(x)) ≤ n. Since we do not assume this, we need the following lemma, which provides upper bounds for τ(A) when µ is ψ-mixing or summable φ-mixing.

Lemma 2.
Consider µ a ψ-mixing or summable φ-mixing measure. Then, there exists n_2 ∈ N such that, for all n ≥ n_2 and A ∈ A^n,
• τ(A) ≤ 2n, for ψ;

Proof. We start with the case ψ. For n large enough, we have ψ(n) < 1, which implies: Now consider the φ-mixing case. The summability of φ ensures that, for g large enough, we have φ(g) ≤ 1/(g ln g). Thus: Take g = −2/(µ(A) ln µ(A)). The rightmost parenthesis above becomes: which is positive for n large enough.
The multiplicative constant of two in both cases is technical and was chosen for the simplicity of the proof. Actually, it can be replaced by any constant strictly larger than one.
An irreducible aperiodic finite-state Markov chain with some entry equal to zero shows that this constant cannot be taken equal to one in the ψ-mixing case. Whether this bound is optimal in the φ-mixing case is an open question. Note that Lemmas 1 and 2 imply that τ(A)µ(A)/n → 0 uniformly.

The remaining results of this subsection hold for n ≥ n′, where n′ = 1 in the case of φ-mixing and: for the ψ-mixing case (see (1) for the definition of g_0). Let us define: M := ψ(g_0 + 1) + 1.

Proposition 1.
Let µ be a ψ-mixing measure. Then, for all n ≥ n′, A ∈ A^n and k ≥ n_A, the following inequality holds:

Proof. By definition of n_A, we first note that Consider the case in which n_A ≤ n + g_0. In this case, for j ≥ n_A, one trivially has: Thus: Note that A and the union on the right-hand side of the above inequality are separated by a gap of length g_0 + 1. By ψ-mixing, one concludes that the left-hand side is bounded by: For n_A > n + g_0, recall first the convention in Subsection 3.2, which states that µ_A(n_A − g_0) = µ(A). In a similar way to the first case,

For the next proposition, recall that (A) stands either for ψ (2) or for φ (3), according to the mixing property of the measure under consideration. Further, let us use the notation T_A[i] := T_A ∘ σ^i.

Proposition 2.
Let µ be a φ- or ψ-mixing measure. Then, for all n ≥ n′, A ∈ A^n and t ≥ τ(A): where C = 4 for φ and C = 4(M + 1) for ψ.
Proof. The proof for φ can be found in Proposition 4.1, Item (b), of [6], and it remains valid even for a non-complete grammar and an infinite alphabet. We observe that the error term defined therein is: which justifies C = 4 in this case.
Here, we prove the case ψ in the same way. We start by assuming that t ≥ τ(A) + 2n. By the triangle inequality: For the first modulus, by inclusion of sets: If n_A > τ(A) + 2n, the last term is equal to zero. Otherwise, we apply Proposition 1 to obtain: where the last inequality follows from Lemma 2. By ψ-mixing, the modulus (6) is bounded by: Note that the modulus is not needed for (7), and by inclusion: where the equality and the second inequality follow from the stationarity of µ. Therefore, for t ≥ τ(A) + 2n, the sum of (5), (6) and (7) is bounded by: We now consider the case where τ(A) ≤ t < τ(A) + 2n. We have: The last inequality follows from (8). The other inequalities are straightforward. This ends the proof.
The next lemma establishes upper bounds for the tail distribution at the scale given by Kac's lemma, namely 1/µ(A). For technical reasons, we actually choose the scale:

f_A := [2µ(A)]^{−1}.

Lemma 3.
Let µ be a stationary measure. Then, for all n ≥ 1, A ∈ A^n, every positive integer k, and B ∈ F_{k f_A}^∞, the following inequalities hold:

Proof. We start by observing that {T . Thus, applying the ψ-mixing property, we get: Furthermore: and then apply (9) with k − 1 instead of k to get: The equality follows by stationarity. Iterating this argument, one concludes that: Applying the resulting inequality in (9), we get Statement (a). In a similar way, φ-mixing gives: Thus: which ends the proof of (b). The proof of (c) follows the same lines as Item (a), observing that, for A, B ∈ F_0^i and C ∈ F_{i+n}^∞, the ψ-mixing property implies µ_A(B ∩ C) ≤ µ_A(B)µ(C)(ψ(n) + 1).

The next proposition is the key to the proof of Theorem 1, and the idea is the following. We work at the time scale f_A. When t = k f_A, k ∈ N, we simply cut t into k pieces of equal size f_A. Then, the case of general t = k f_A + r, r < f_A, is approximated by its integer part k f_A. Technically, this is done in (b) and (a), respectively.

Proposition 3. Let µ be a φ- or ψ-mixing measure. Then, for all n ≥ n′, A ∈ A^n and every positive integer k, the following inequalities hold: (a) For 0 ≤ r ≤ f_A: for the cases involving ψ and C = 4 for φ.
For the proof of (a)-3, we write a triangle inequality similar to the one above: Then, we proceed as for (a)-1, but applying Item (c) of Lemma 3 and using the ψ-mixing property: where A, B ∈ F_0^i and C ∈ F_{i+n}^∞. For the case r < 2n, we use: and proceed as in (13), applying again Lemma 3-(c). This ends Item (a).
We now come to the proof of Items (b)-1 and (b)-2. For k = 1, we have an equality. For k ≥ 2, we get: We put r = f_A in Item (a)-1 to obtain (b)-1: Furthermore, we get inequality (b)-2, under φ-mixing, proceeding similarly as above: Finally, we prove (b)-3 by applying (a)-3 as follows: The next two lemmas are classical results and are stated without proof. The first one establishes the reversibility of certain sets for stationary measures, and the second one is a discrete version of the mean value theorem, which follows from a straightforward computation.
Lemma 5. Given real numbers a_1, . . . , a_n, b_1, . . . , b_n such that 0 ≤ a_i, b_i ≤ 1, the following inequality holds:

|∏_{i=1}^n a_i − ∏_{i=1}^n b_i| ≤ ∑_{i=1}^n |a_i − b_i|.

Recall the definition of n′ in (4). The proof of Theorem 1 holds for all n ≥ n_0, where n_0 is explicitly given by: which is finite since sup_{A∈A^n} µ(A)τ(A)/n → 0. Then, in particular, we have τ(A) < f_A for all n ≥ n_0 and A ∈ A^n.
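The product inequality of Lemma 5, |∏ a_i − ∏ b_i| ≤ ∑ |a_i − b_i| for a_i, b_i ∈ [0, 1], follows from a telescoping argument; a quick randomized sanity check (our own illustration):

```python
import random

random.seed(4)

def product(xs):
    p = 1.0
    for x in xs:
        p *= x
    return p

for _ in range(1000):
    n = random.randint(1, 10)
    a = [random.random() for _ in range(n)]
    b = [random.random() for _ in range(n)]
    lhs = abs(product(a) - product(b))
    rhs = sum(abs(x - y) for x, y in zip(a, b))
    assert lhs <= rhs + 1e-12
```

In the proof below, this lemma is what converts a per-block error (one factor per time window of length f_A) into an error for the full product over k blocks.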

Proofs of the Statements for Small t's
Here, we assume that t ≤ f A . Proof of hitting time, φ and ψ together. Recall that ε(A) denotes ε φ (A) or ε ψ (A), depending on whether the measure is φ- or ψ-mixing.
Applying the inequality |1 − e −x | ≤ x for x ≥ 0, we obtain the statement for 1 ≤ t ≤ τ(A) as follows: We now consider the case τ(A) < t ≤ f A . For positive i ∈ N, define: Then: where we used Lemma 4 in the last equality. Thus, for τ(A) < t ≤ f A , we apply (25) to obtain: where the two inequalities follow from Lemma 5 and (24). On the other hand, by the triangle inequality: Since |1 − x − e −x | ≤ x 2 /2 for all 0 ≤ x ≤ 1, setting x = ρ(A)µ(A), we get: Furthermore, still for τ(A) + 1 ≤ i ≤ f A + 1, Proposition 2 gives us: where, for the last inequality, we used: Thus, applying (27), we obtain for τ(A) + 1 ≤ i ≤ f A + 1: Therefore, (26) and (28) give us: which concludes the statement of Theorem 1 for hitting times at small t's (with either φ or ψ).
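The proof above rests on two elementary bounds: |1 − e −x | ≤ x for x ≥ 0, and |1 − x − e −x | ≤ x 2 /2 for 0 ≤ x ≤ 1. A quick numerical sanity check on a grid of [0, 1] (illustration only, not part of the proof):

```python
from math import exp

# Grid of 1001 points covering [0, 1], where both bounds are used in the proof.
xs = [i / 1000 for i in range(1001)]

# First elementary bound: |1 - e^{-x}| <= x for x >= 0.
assert all(abs(1 - exp(-x)) <= x for x in xs)

# Second elementary bound: |1 - x - e^{-x}| <= x^2 / 2 for 0 <= x <= 1.
assert all(abs(1 - x - exp(-x)) <= x * x / 2 for x in xs)

print("both elementary bounds hold on the grid")
```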
Proof for return time, φ and ψ together. We first note that the statement is trivial for t = τ(A), so we consider t > τ(A). By definition, we have µ A (T A > t) = p t+1 µ(T A > t).
Then, we again use the triangle inequality to write: As we saw before, the first modulus above is bounded by 2Cε(A). On the other hand, we apply (25) to obtain for τ(A) < t ≤ f A : This is bounded, applying Lemma 5, by: where the last inequality follows from (26) and (28). Finally, notice that tµ(A) ≤ f A µ(A) = 1/2 and τ(A)µ(A) ≤ 2ε(A) (use Lemma 2 for ψ). Therefore, we obtain from (30): This concludes the statement of Theorem 1 for the return time at small t's (with either φ or ψ).

Proof of the Statements for Large t's
The proof for the return time for t > f A was given in [6] under φ-mixing, a finite alphabet, and a complete grammar. The proof still holds if one only assumes a countable alphabet and an incomplete grammar (recall Remark 2 for the uniform convergence to zero of the error term ε φ ). Thus, we focus on the hitting time under each mixing assumption, and on the return time only under ψ-mixing.
Proof of Theorem 1 for hitting times, for t > f A . Write t = k f A + r with integer k ≥ 1 and 0 ≤ r < f A . Thus, we have: In order to get an upper bound for the sum of (32) and (33), we analyse the ψ and φ cases separately, starting with ψ-mixing. Applying Items (a)-1 and (b)-1 of Proposition 3, that sum is bounded by: where the last two inequalities are justified by µ( On the other hand, applying (29) with t = f A − 2n, we get: where we used τ(A)µ(A) ≤ 2ε ψ (A). Furthermore, by the Mean Value Theorem (MVT): since for n ≥ n 0 , we have 2nµ(A) ≤ 2 sup µ(A)τ(A) ≤ 1. Thus, it follows that: Therefore: Since e x − 1 ≥ x for all x ∈ R, setting K = (5C + 27/4)e 1/2 , we get: Now, using that k = 2µ(A)(t − r), we have: where the last inequality follows from e µ(A)r ≤ e µ(A) f A . Therefore, it follows from (36) that the sum of (32) and (33) is bounded by: We now turn to the case of φ-mixing. We apply Items (a)-2 and (b)-2 of Proposition 3 to get an upper bound for the sum of (32) and (33): Similarly to the ψ-mixing case, one obtains: which implies that, in the φ-mixing case, the sum of (32) and (33) is bounded by: Now, we treat the cases ψ and φ together to obtain upper bounds for (34) and (35). In order to get an upper bound for (34), we apply (29) with t = f A : Thus, applying Lemma 5, we have: The max is bounded using (39) by: Naturally, the absolute value is also bounded using (39), and we get that the above sum is bounded above by: Recalling that k = 2µ(A)(t − r) and proceeding as we did for (37) and (38), we get the following upper bound for (34): To conclude the proof for the hitting time, we apply (29) with t = r to bound (35) as follows: where the term µ(A)t follows from 1 = 2µ(A) f A ≤ 2µ(A)t.
Proof of Theorem 1 for the return time, for t > f A and under ψ-mixing. We again use the triangle inequality to write: Applying Items (a)-3 and (b)-3 of Proposition 3, the sum of (41) and (42) is bounded by: Replacing k by k − 1 in (38), the last term is bounded above by: On the other hand, Lemma 5 gives us: The last sum is bounded by (5C + 5/4)(k − 1)ε ψ (A) using (39). On the other hand, applying (31) with t = f A and the MVT, we obtain: and by (39), we get: Finally, setting t = r in (29) and applying the MVT once again, we get:
Therefore, by (49) and (50), the summability of φ concludes the proof for the φ-mixing case. If µ is ψ-mixing, we separate the sum in (48) into three parts. First, recall the definition of g 0 in Section 3.2. For 1 ≤ j ≤ g 0 , we bound the sum as follows: For g 0 + 1 ≤ j ≤ g 0 + n − 1, we have by ψ-mixing: where we denoted ℓ = j − g 0 . Thus:
