Exchangeably Weighted Bootstraps of General Markov U-Processes

We explore an exchangeably weighted bootstrap of the general function-indexed empirical U-processes in the Markov setting, which is a natural higher-order generalization of the weighted bootstrap empirical processes. As a result of our findings, a considerable variety of bootstrap resampling strategies arise. This paper aims to provide theoretical justifications for the exchangeably weighted bootstrap consistency in the Markov setup. General structural conditions on the classes of functions (possibly unbounded) and the underlying distributions are required to establish our results. This paper provides the first general theoretical study of the bootstrap of the empirical U-processes in the Markov setting. Potential applications include the symmetry test, Kendall's tau and the test of independence.


Introduction
U-statistics are a class of estimators, initially explored in connection with unbiased estimation by [1] and officially introduced by [2], and are defined as follows: let $\{X_i\}_{i=1}^{\infty}$ be a sequence of random variables defined on a measurable space $(E, \mathcal{E})$, and let $h : E^m \to \mathbb{R}$ be a measurable function. The U-statistic of order $m$ and kernel $h$ based on the sequence $\{X_i\}$ is
$$U_n(h) = \frac{(n-m)!}{n!} \sum_{(i_1, \ldots, i_m) \in I_n^m} h(X_{i_1}, \ldots, X_{i_m}),$$
where $I_n^m = \{(i_1, \ldots, i_m) : i_j \in \mathbb{N},\ 1 \le i_j \le n,\ i_j \neq i_k \text{ if } j \neq k\}$. The empirical variance, Gini's mean difference and Kendall's rank correlation coefficient are common examples of U-statistics, while a classical test based on a U-statistic is Wilcoxon's signed-rank test for the hypothesis of location at zero (see, e.g., [3], Example 12.4). The authors in [1,2,4] provided, among others, the first asymptotic results for the case in which the underlying random variables are independent and identically distributed. An extensive literature treats the theory of U-statistics; see, for instance, [5-8]. Complex statistical problems are also amenable to solution using U-processes; examples include tests for goodness-of-fit, nonparametric regression and density estimation. U-processes are collections of U-statistics indexed by a family of kernels. They may be viewed as infinite-dimensional variants of U-statistics with a single kernel function, or as nonlinear stochastic extensions of empirical processes. Both viewpoints have advantages: first, considering a large family of statistics rather than a single statistic is statistically more interesting; second, ideas from the theory of empirical processes may be used to construct limit or approximation theorems for U-processes. Nevertheless, achieving results for U-processes is not easy.
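To make the definition above concrete, the following minimal Python sketch evaluates an order-2 U-statistic by brute force over all tuples of distinct indices; the function name `u_statistic` and the choice of Gini's mean difference as the kernel are illustrative, not taken from the paper.

```python
import itertools
import numpy as np

def u_statistic(x, h, m=2):
    """Order-m U-statistic: average of h over all m-tuples of
    distinct indices (in every order, matching I_n^m)."""
    n = len(x)
    tuples = itertools.permutations(range(n), m)
    vals = [h(*(x[i] for i in idx)) for idx in tuples]
    return float(np.mean(vals))

# Gini's mean difference uses the symmetric kernel h(x1, x2) = |x1 - x2|.
x = np.array([1.0, 3.0, 4.0, 7.0])
gini = u_statistic(x, lambda a, b: abs(a - b))  # average over 12 ordered pairs
```

For a symmetric kernel the average over ordered tuples coincides with the average over unordered ones, so this matches the usual binomial-coefficient normalization.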
Extending U-statistics to U-processes requires significant effort and distinct methodologies; generalizing empirical processes to U-processes is quite challenging, especially when the U-processes are considered in the stationary setting. We highlight that U-processes are used often in statistics, for example, when higher-order terms appear in von Mises expansions. In particular, the study of estimators (including function estimators) with various degrees of smoothness involves U-statistics. For instance, Ref. [9] applied almost-sure uniform bounds for P-canonical U-processes to analyze the product-limit estimator for truncated data. Two new tests for normality based on U-processes were presented in [10]; inspired by [11-13], the authors developed further tests for normality that employ, as test statistics, weighted L_1-distances between the standard normal density and local U-statistics based on standardized observations. Estimating the mean of multivariate functions in the case of possibly heavy-tailed distributions was explored in [14], where a median-of-means approach was also presented; both explorations were based on U-statistics. Other researchers have also emphasized the importance of U-processes: [15-17] used them for testing qualitative features of functions in nonparametric statistics, [18] represented cross-validation for density estimation using U-statistics, and [6,7,19] established limiting distributions of M-estimators. Since then, this discipline has made significant advances, and the results have been broadly extended. Asymptotic behavior has been demonstrated under weak dependence assumptions, for example, in the works of [20-22], more recently in [23], and more generally in [24,25]. However, in practice, explicit computation is not always possible due to the complexity of the U-processes' limiting distributions or their functionals.
We suggest a general bootstrap of the U-processes in the Markov setting to solve this issue, which is a challenging problem. The idea of the bootstrap, introduced by [26] for independent and identically distributed (i.i.d.) random variables, is to resample from an original sample X_1, ..., X_n of observations with unknown marginal distribution function F(x) a new i.i.d. sample X_1^*, ..., X_n^* with marginal distribution function F_n(x), the empirical distribution function constructed from the original sample. Moreover, it is commonly known that the bootstrap approach gives a better approximation to the statistic's distribution, mainly when the sample size is small [27]. Bootstraps for U-statistics of independent observations were studied by [28-31]. However, the bootstrap technique differs for dependent variables because the dependence structure cannot be preserved in the new sample. For this reason, blockwise bootstrap methods were introduced, aiming to preserve the dependence structure. Among these methods, we can cite the circular block bootstrap introduced by [32] and the nonoverlapping block bootstrap introduced by [33]. In [34], the authors proposed a bootstrap method for weakly dependent stationary observations, the stationary bootstrap. The latter can be seen as an extension of the circular block bootstrap in which the block length is random, for example, a geometric random variable. It is important to note that Efron's initial bootstrap formulation (see [26]) has a few drawbacks. To be more precise, certain observations may be sampled several times while others may not be sampled at all. A more general version of the bootstrap, the weighted bootstrap, was developed to get around this issue and has also been demonstrated to be computationally more appealing in some applications.
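The stationary bootstrap just mentioned can be sketched as follows; this is a minimal illustration of the scheme of [34] (blocks with geometric lengths, wrapping around the sample circularly), with all function names and parameter values chosen for the example only.

```python
import numpy as np

def stationary_bootstrap(x, p, rng=None):
    """One stationary-bootstrap resample: blocks start at uniform
    positions and have Geometric(p) lengths (mean 1/p), wrapping
    around the observed sample circularly."""
    rng = np.random.default_rng(rng)
    n = len(x)
    out = np.empty(n, dtype=x.dtype)
    i = 0
    while i < n:
        start = rng.integers(n)          # uniform block start
        length = rng.geometric(p)        # random block length
        for k in range(min(length, n - i)):
            out[i + k] = x[(start + k) % n]  # circular wrap
        i += length
    return out

sample = np.arange(20, dtype=float)
resample = stationary_bootstrap(sample, p=0.2, rng=0)
```

Because the block length is random, the resampled series is itself stationary conditionally on the data, which is the motivation for the scheme.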
This resampling strategy was initially described in [35] and thoroughly investigated by [28], who coined the name "weighted bootstrap". For example, the Bayesian bootstrap arises when the weight vector (ξ_{n1}, ..., ξ_{nn}) = (M_{n1}, ..., M_{nn}) is equal in distribution to the vector of n spacings of n − 1 ordered Uniform(0, 1) random variables, that is, (M_{n1}, ..., M_{nn}) follows a Dirichlet distribution with parameters (n; 1, ..., 1); for more details, see [36]. This diversity of resampling approaches calls for a unified treatment, commonly known as general weighted resampling, which was first described by [37] and has since been developed by [38,39]. In [40], the authors investigated the almost-sure rate of convergence of the strong approximation of the weighted bootstrap process by a sequence of Brownian bridge processes; refer to [41] for the multivariate setting and [42] for recent references. The concept of the generalized bootstrap, introduced by [37], was extended to the class of nondegenerate U-statistics of degree two and the corresponding Studentized U-statistics by [43]; refer to [44,45]. In [46], the author generalized this theory to higher orders, developing a multiplier inequality for U-processes of i.i.d. random variables. We mention that the theory of multiplier processes is directly and strongly related to the symmetrization inequalities investigated by [6,7]. This paper aims to investigate the exchangeable bootstrap for U-processes in the same spirit as [46], but without the restriction to the independence setting: the previous reference treated U-processes in an independent framework, whereas this paper considers U-processes in the dependent setting of Markov chains. We believe we are the first to present a successful treatment in this general context. We combine the techniques of the renewal bootstrap with the randomly weighted bootstrap in a nontrivial way.
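The Bayesian bootstrap weights described above can be generated as below; this sketch also checks the spacings representation of Dirichlet(1, ..., 1) weights and is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Bayesian bootstrap weights: the n spacings of n - 1 ordered
# Uniform(0, 1) points, i.e. a Dirichlet(1, ..., 1) vector.
u = np.sort(rng.uniform(size=n - 1))
weights = np.diff(np.concatenate(([0.0], u, [1.0])))

# Equivalent draw directly from the Dirichlet distribution.
weights_dir = rng.dirichlet(np.ones(n))

x = rng.normal(size=n)
bayes_boot_mean = weights @ x   # one weighted-bootstrap replicate of the mean
```

Repeating the last two lines with fresh weights gives the weighted-bootstrap distribution of the sample mean, with every observation receiving a strictly positive (random) weight.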
We mention at this point a connection between the moving-blocks bootstrap and its modification, the matched-block bootstrap. Instead of artificially splitting a sample into fixed-size blocks and then resampling them, the latter seeks to match the blocks to create a smoother transition; for more information, see [47]. The main difficulties in proving Theorem 3 are due to the random size of the resampled blocks. This randomness creates a problem with random stopping times, which cannot be removed by replacing a random stopping time with its expectation. In the present setting, the bootstrap random variables are generated by resampling from a random number of blocks. One might think that conditioning arguments can overcome the problem, but they cannot. Our proof uses some arguments from [46,47], verifying bootstrap stochastic equicontinuity by comparison with the original process, in a similar way to [48]. However, as we shall see later, integrating concepts from these papers is not enough to solve the problem: sophisticated mathematical derivations are necessary to deal with U-processes in the Markov framework. We present the first complete theoretical justification of the bootstrap consistency. This justification requires the efficient use of large-sample theoretical tools established for U-empirical processes.
The rest of this paper is organized as follows. Section 2 is devoted to the introduction of the Markov framework, the U-process, the bootstrap weights and the definitions needed in our work. In Section 3, we recall the necessary ingredients for U-statistics and U-processes in the Markov setting and provide some asymptotic results, including the weak convergence of U-processes in Theorem 1. In Section 4, we derive the main results concerning the bootstrap of the U-processes. In Section 5, we collect some examples of weighted U-statistics. Some concluding remarks and possible future developments are relegated to Section 6. To prevent interrupting the flow of the presentation, all proofs are gathered in Section 7. Appendix A contains a few pertinent technical results and proofs.

Notation and Definitions
In what follows, we aim to properly define our settings. For this reason, we have collected the definitions and notation needed.

Markov Chain
Let X = (X_n)_{n∈N} be a homogeneous ψ-irreducible Markov chain, meaning that the chain has stationary transition probabilities, defined on a measurable space (E, E), where E is a separable σ-algebra. Let π(x, dy) be the transition probability and ν the initial probability. We denote by P_ν, or simply P, the probability measure of the chain with parameters P = (π, ν). Likewise, E_ν denotes integration with respect to P_ν. In our framework, let P_x be the probability measure of the chain started at X_0 = x, for x ∈ E, and let E_x(·) denote the P_x-expectation. We further assume that the Markov chain is Harris positive recurrent with an atom A.

Definition 1 (Harris recurrent).
A Markov chain X = (X_n)_{n∈N} is said to be Harris recurrent if there exists a σ-finite positive measure ψ on the countably generated measurable space (E, E), with ψ(E) > 0, such that, for all B ∈ E with ψ(B) > 0,
$$P_x\big(X_n \in B \text{ infinitely often}\big) = 1, \quad \text{for all } x \in E.$$
Recall that a chain is positive Harris recurrent and aperiodic if and only if it is ergodic ([49], Proposition 6.3), i.e., there exists a probability measure π, called the stationary distribution, such that, in total variation distance, lim_{n→+∞} ‖P^n(x, ·) − π‖_{tv} = 0. Definition 2 (Small sets). A set S ∈ E is said to be Ψ-small if there exist δ > 0, a positive probability measure Ψ supported by S and an integer m ∈ N* such that
$$P^m(x, B) \ge \delta \, \Psi(B), \quad \text{for all } x \in S, \; B \in \mathcal{E}, \tag{1}$$
where T_0 is the hitting time of A by the m-step chain, roughly speaking, T_0 = inf{n ≥ 1 : X_{nm} ∈ A}.

Definition 4.
A ψ-irreducible aperiodic chain X is called regenerative or atomic if there exists a measurable set A, called an atom, such that ψ(A) > 0 and, for all (x, y) ∈ A², we have P(x, ·) = P(y, ·). Roughly speaking, an atom is a set on which the transition probabilities are all the same. If the chain visits only a finite number of states or subsets, then any such state, or any such subset of states, is actually an atom.

Definition 5 (Aperiodicity).
Assuming ψ-irreducibility, suppose there exist d ∈ N* and disjoint sets D_1, ..., D_d (set D_{d+1} = D_1), positively weighted by ψ, such that P(x, D_{i+1}) = 1 for all x ∈ D_i, i = 1, ..., d. The period of the chain is the greatest common divisor d of such integers; the chain is said to be aperiodic if d = 1.
Definition 6 (Irreducibility). The chain is ψ-irreducible if there exists a σ-finite measure ψ such that, for all sets B ∈ E with ψ(B) > 0 and any x ∈ E, there exists n > 0 such that P^n(x, B) > 0.
One of the most important properties of Harris recurrent Markov chains is the existence of an invariant distribution, which we call µ (a limiting probability distribution, also called the occupation measure). Furthermore, Harris recurrent Markov chains can always be embedded in a certain Markov chain on an extended sample space with a recurrent atom. The existence of a recurrent atom A immediately enables the construction of a regenerative extension of the chain: the time at which the chain hits a given atom (recurrent state) is viewed as a regeneration time. In [50,51], the authors give the construction of such a regenerative extension, which makes regenerative techniques available for studying this type of Markov chain. As mentioned above, we assume in this work that the Harris recurrent chain is atomic, i.e., there is a well-defined accessible set that is visited infinitely often almost surely; this set A is called an atom. By definition, an atom A is a set in E with µ(A) > 0 such that, for all x, y ∈ A, π(x, ·) = π(y, ·). Let P_A (respectively, E_A) be the probability measure on the underlying space such that X_0 ∈ A (respectively, the P_A-expectation).
The conditions imposed on the Markov chain ensure that the defined atom A (or the constructed one in the case of a nonatomic chain) is a recurrent class. Let us now define the following objects.

Hitting Times
The successive hitting times of the atom A are defined by T_0 = inf{n ≥ 1 : X_n ∈ A} and, for j ≥ 1, T_j = inf{n > T_{j−1} : X_n ∈ A}. A well-known property of the hitting times is that, for all j ∈ N, T_j < ∞, P_ν-a.s. ([52], Chapter I.14).

Renewal Times
Using the hitting times, we can define the renewal times as τ(j) = T_j − T_{j−1}, for j ≥ 1 (with T_0 the first hitting time). As for a regenerative process, the sequence of renewal times {τ(j)}_{j=1}^∞ is i.i.d. and is independent of the choice of the initial probability. Throughout this work, we set τ = τ(1) and α = E_A(τ). Definition 7 (Strong Markov property). Let (X_n)_{n≥0} be a Markov chain and T a stopping time of (X_n)_{n≥0}. Then, conditionally on T < ∞ and X_T = i, (X_{T+n})_{n≥0} is again a Markov chain and is independent of X_0, ..., X_T.

Regenerative Blocks
Let l_n := max{j : T_j ≤ n} be the number of complete visits to the atom A up to time n. Using the strong Markov property, the given sample (X_1, ..., X_n) can be divided into a sequence of blocks {B_j}_{j=0}^{l_n},
$$B_j = (X_{T_{j-1}+1}, \ldots, X_{T_j}) \in \mathbb{T} := \bigcup_{k \ge 1} E^k, \quad \text{for all } j = 1, \cdots, l_n,$$
where l_n is the total number of complete blocks and B_0 collects the observations up to the first visit to A. The length of each block is denoted by ℓ(B_j) = τ(j).
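A minimal sketch of the regenerative block construction, assuming a toy finite-state chain whose atom A is a single state; the helper `regenerative_blocks` and its return convention (initial segment, complete blocks, trailing segment) are our own illustrative choices, not the paper's notation.

```python
import numpy as np

def regenerative_blocks(x, atom):
    """Split a trajectory into blocks between successive visits to
    the atom; `atom` is a boolean membership test for A. Returns
    (initial incomplete block, complete blocks, trailing segment)."""
    hits = [i for i, xi in enumerate(x) if atom(xi)]
    if len(hits) < 2:
        return list(x), [], []
    head = list(x[: hits[0] + 1])            # up to first visit to A
    blocks = [list(x[hits[j] + 1 : hits[j + 1] + 1])
              for j in range(len(hits) - 1)]  # each block ends in A
    tail = list(x[hits[-1] + 1 :])            # after the last visit
    return head, blocks, tail

# A toy chain on {0, 1, 2} with atom A = {0}.
rng = np.random.default_rng(1)
chain = rng.integers(0, 3, size=200)
head, blocks, tail = regenerative_blocks(chain, lambda s: s == 0)
```

By the strong Markov property, the complete blocks produced this way are i.i.d. random elements, which is the key fact exploited by the regenerative bootstrap.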

The U-Process Framework
Let (X_n)_{n∈N} be a sequence of random variables with values in a measurable space (E, E). Let h : E^m → R be a measurable function, symmetric in its arguments. The U-statistic of order (or degree) m and kernel h(·) is defined as
$$U_n(h) = \frac{(n-m)!}{n!} \sum_{(i_1, \ldots, i_m) \in I_n^m} h(X_{i_1}, \ldots, X_{i_m}).$$
Accordingly, a U-process is the collection {U_n(h) : h ∈ F}, where F is a class of kernels h(·) of m variables. The decoupling inequality for U-statistics and U-processes plays a central role in the latest developments of the asymptotic theory. The decoupling inequality relates the quantities
$$\mathbb{E}\,\Phi\Big(\Big|\sum_{I_n^m} h(X_{i_1}, \ldots, X_{i_m})\Big|\Big) \quad \text{and} \quad \mathbb{E}\,\Phi\Big(\Big|\sum_{I_n^m} h(X_{i_1}^1, \ldots, X_{i_m}^m)\Big|\Big),$$
where Φ(·) is a non-negative function and {X_i^k}, k = 1, ..., m, are independent copies of the original sequence {X_i}. One of the useful consequences of decoupling is randomization, which is frequently used in the study of the asymptotic theory of U-statistics and was studied by [6,7]. The main idea of randomization is to compare the tail probabilities or moments of the original U-statistic or process, ∑_{I_n^m} h(X_{i_1}, ..., X_{i_m}), with the tail probabilities or moments of the randomized statistic
$$\sum_{I_n^m} \varepsilon_{i_1} \cdots \varepsilon_{i_r}\, h(X_{i_1}, \ldots, X_{i_m}),$$
where the ε_i are independent Rademacher variables, independent of the X_i, 1 ≤ r ≤ m, and r depends on the degree of degeneracy (centering) of the kernel h(·).

Definition 9 ([6]).
A symmetric P^m-integrable kernel h : E^m → R is P-degenerate of order r − 1 if and only if
$$\int h(x_1, \ldots, x_m)\, dP^{m-r+1}(x_r, \ldots, x_m) = \int h \, dP^m$$
holds for all x_1, ..., x_{r−1}, whereas ∫ h(x_1, ..., x_m) dP^{m−r}(x_{r+1}, ..., x_m) is not a constant function. If h is furthermore P^m-centered, that is, P^m h = 0, we write h ∈ L_2^{c,r}(P^m). For notational simplicity, we usually write L_2^{c,m}(P^m) = L_2^{c,m}(P).
Moreover, h(·) is said to be canonical or completely degenerate if the integral with respect to any one variable is equal to zero, i.e.,
$$\int h(x_1, \ldots, x_m)\, dP(x_i) = 0, \quad \text{for each } 1 \le i \le m \text{ and all } x_j, \; j \neq i.$$
The complete degeneracy of the kernel, together with the condition P^m h² < ∞, yields the orthogonality of the different terms of the Hoeffding decomposition of the U-statistic.
Definition 10 (Covering number). The covering number N p (ε, Q, F ) is defined as the minimal number of balls with radius ε that are needed to cover a class of functions F in the norm L p (Q), where Q is the measure on E with finite support.
We can associate with the covering numbers the distances e_{n,p}, where e_{n,p}(f, g) = (U_n(|f − g|^p))^{1/p}.
In this work, we use the two distances defined hereafter. For decoupled statistics, we also associate covering numbers, denoted N(ε, F, ẽ_{n,p}), with a distance ẽ_{n,p}, which can be defined for p = 2 as follows:

Definition 11.
A class F of measurable functions E → R is said to be of VC type (or Vapnik-Chervonenkis type) for an envelope F and admissible characteristics (C, v) (positive constants such that C ≥ (3√e)^v and v ≥ 1) if, for all probability measures Q on (E, E) with 0 < ‖F‖_{L_2(Q)} < ∞ and every 0 < ε < 1,
$$N\big(\varepsilon \|F\|_{L_2(Q)}, \mathcal{F}, L_2(Q)\big) \le \Big(\frac{C}{\varepsilon}\Big)^{v}.$$
We assume that the class is countable to avoid measurability issues (the noncountable case may be handled similarly by using an outer probability and additional measurability assumptions; see [53]).
Definition 12 (Stochastic equicontinuity, ([54])). Let {Z_n} be a sequence of stochastic processes. Call {Z_n} stochastically equicontinuous at t_0 if, for each η > 0 and ε > 0, there exists a neighborhood D of t_0 such that
$$\limsup_{n} \; \mathbb{P}^*\Big\{\sup_{t \in D} |Z_n(t) - Z_n(t_0)| > \eta\Big\} < \varepsilon.$$
In the context of the U-process {U_n}, stochastic equicontinuity at a function g ∈ F means, roughly, that |U_n(h) − U_n(g)| is uniformly small over all h(·) close enough to g(·), with high probability, for all n large enough.

Gaussian Chaos Process
Definition 13. Let H denote a real separable Hilbert space with scalar product ·, · H . We say that a stochastic process G = {G P (h), h ∈ H} defined in a complete probability space (E, E , P) is an isonormal Gaussian process (or a Gaussian process on H ) if G P is a centered Gaussian family of random variables such that E(G P (h)G P (g)) = h, g H for all h, g ∈ H.
Consider the mapping h → G_P(h). Under the assumption mentioned above, this map is linear and provides a linear isometry of H onto a closed subspace of L_2(E, E, P) whose elements are zero-mean Gaussian random variables. Let K_P be the isonormal Gaussian chaos process associated with G_P, determined through a polynomial R_m defined as a sum of monomials of degree m; [6] gives a simple expression of this polynomial, extracted from Newton's identities. Hence, by the continuous mapping theorem, the CLT and the LLN yield the corresponding limits, and under linearity of the kernel the weak convergence reduces to a single convergence statement. The limit K_P is useful in the case of degenerate U-statistics, and it provides convergence of all moments, which in turn plays a crucial role: by hypercontractivity, it improves the uniform integrability. For a good account of K_P, readers are invited to see ([6], Chapter 4, Section 4.2).

Technical Assumptions
For our results, we need the following assumptions.

Preliminary Results
A significant issue arises in recovering the estimation of our parameter of interest using the U-process. This parameter has the form
$$\mu(h) = \int_{E^m} h(x_1, \ldots, x_m)\, \mu(dx_1) \cdots \mu(dx_m),$$
where h : E^m → R is a kernel function, and it can be estimated by a U-statistic of the form U_n(h). Based on Kac's theorem for the occupation measure, µ(h) in the regeneration setup can be rewritten in terms of block sums over regeneration cycles. In the Markovian context, since the variables are not independent, the approximation based on the i.i.d. blocks of the regenerative setting is introduced: we define the regenerative kernel ω_h : T^m → R by applying h to the observations within m distinct regeneration blocks and summing over them. The kernel ω_h(·) need not be symmetric even when h(·) is. In fact, we can use the symmetrization
$$S_m \omega_h(B_1, \ldots, B_m) = \frac{1}{m!} \sum_{\sigma} \omega_h(B_{i_1}, \ldots, B_{i_m}),$$
where the sum is over all permutations σ = {i_1, ..., i_m} of {1, ..., m}. Next, we consider the U-statistic formed by the regenerative data.
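The symmetrization S_m ω_h above can be illustrated as follows: the code averages an arbitrary (possibly asymmetric) kernel over all argument permutations. The names and the toy kernel are ours.

```python
import itertools
import math

def symmetrize(h, m):
    """Symmetrized version S_m h of an m-variate kernel:
    the average of h over all m! argument permutations."""
    def h_sym(*args):
        perms = itertools.permutations(args)
        return sum(h(*p) for p in perms) / math.factorial(m)
    return h_sym

# A deliberately asymmetric 2-variate kernel.
h = lambda x, y: x * y**2
h_sym = symmetrize(h, 2)
val = h_sym(2.0, 3.0)   # (h(2,3) + h(3,2)) / 2 = (18 + 12) / 2 = 15.0
```

The symmetrized kernel is invariant under any reordering of its arguments, which is what allows the standard U-statistic machinery to be applied to ω_h.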
The regenerative U-statistic associated with the sequence of regenerative blocks {B_j}_{j=1}^{L} generated by the Markov chain is the U-statistic built on the symmetrized regenerative kernel. Hence, R_{l_n}(h) is a standard U-statistic with mean zero.

Proposition 1. Let us define
Then, under conditions (C.1), (C.2), (C.3) and (C.4), we have the following stochastic convergences. Before stating the weak convergence in the next theorem, we define the U-processes corresponding to the U-statistic U_n and the regenerative U-statistic R_L, respectively. Theorem 1. Let (X_n)_n be a positive recurrent Harris Markov chain with an accessible atom A satisfying the moment conditions (C.1) and (C.2), as well as (C.3), (C.4), (C.5), and, for a fixed γ > 0, E(τ)^{2+γ} < ∞. Let F be a uniformly bounded class of functions with a square-integrable envelope H such that the stated entropy condition holds. Then, the process Z_n converges weakly in probability under P_ν to a Gaussian process G_P indexed by F whose sample paths are bounded and uniformly continuous with respect to the metric L_2(µ).

The Bootstrapped U-Processes
To facilitate the presentation of the bootstrap technique, we write out the steps of the regenerative block construction and the weighted bootstrap method in Algorithm 1: Algorithm 1 Regenerative block and weighted bootstrap construction.

1.
Identify the number of visits l_n = ∑_{i=0}^{n} 1_{{X_i ∈ A}} to the atom A.

2.
Divide the sample into the regenerative blocks B_0, B_1, ..., B_{l_n} according to the successive visits to the atom A.

3.

Drop the first and the last (incomplete) blocks if T_{l_n} < n to avoid bias.

4.
Let ξ = (ξ_{i,l_n}, i = 1, ..., n) be a triangular array of random variables, and define the weighted bootstrap empirical measure from the data. In what follows, we denote by P* and E*, respectively, the conditional probability and the conditional expectation given the sample {X_1, ..., X_n}; the same notation is used for the sample {B_1, ..., B_{L_n}}. Define the bootstrapped U-statistic U_n^*, the regenerative bootstrapped U-statistic R*_{l_n} and the associated U-processes accordingly, with T*_{l_n} the corresponding normalized process. Given a real-valued function ∆_n defined on the product probability space, we say that ∆_n is of the appropriate order o in probability when the stated bound holds; the bootstrap works in probability under the displayed condition, where the envelope of the process T*_{l_n} is measurable. In addition, for any measurable random elements Y_n and Y, the convergence in law of Y_n to Y is in the sense of Hoffmann-Jørgensen, which is defined through E*g(Y_n) → Eg(Y) for all bounded and continuous g. This weak convergence is metrizable by Theorem A1 in Appendix A.
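A rough numerical sketch of a degree-2 weighted bootstrap U-statistic over regenerative blocks follows. The normalization, the toy kernel `omega` and the multinomial weights are illustrative stand-ins for the paper's R*_{l_n}, not its exact definition; multinomial weights recover Efron's bootstrap as a special case of the exchangeable scheme.

```python
import itertools
import numpy as np

def weighted_u2(blocks, omega_h, weights):
    """Degree-2 weighted bootstrap U-statistic over blocks:
    sum of w_i * w_j * omega_h(B_i, B_j) over distinct ordered
    pairs, divided by the number of such pairs (an illustrative
    normalization; the paper's exact scaling may differ)."""
    idx = range(len(blocks))
    total = sum(weights[i] * weights[j] * omega_h(blocks[i], blocks[j])
                for i, j in itertools.permutations(idx, 2))
    n = len(blocks)
    return total / (n * (n - 1))

rng = np.random.default_rng(2)
# Blocks of random length, mimicking regeneration cycles.
blocks = [rng.normal(size=rng.integers(1, 6)) for _ in range(10)]
# Exchangeable multinomial weights on the blocks.
w = rng.multinomial(10, np.ones(10) / 10).astype(float)
omega = lambda b1, b2: b1.sum() * b2.sum()   # toy regenerative kernel
stat = weighted_u2(blocks, omega, w)
```

Setting all weights to one recovers the unweighted regenerative U-statistic, which is the comparison underlying the consistency argument.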

Proposition 2. Suppose that the bootstrap weights
Then, the weighted-bootstrap U-statistic U*_n is approximated by the regenerative weighted-bootstrap U-statistic R*_{l_n}. The proof of Proposition 2 is postponed to Section 7. The following lemma collects some instrumental results needed later.

Lemma 1.
Let (X_n)_n be a Markov chain as defined in Section 2.1. Define p := P(X_0 ∈ A) = α^{−1}. Then, for any initial probability ν, we have: (i) for some η > 0 and C > 0, the stated exponential bound holds; (ii) n*/n → 1 in P_ν × P_ξ-probability; (iii) let (X_i) be a sequence of random variables satisfying the stated condition; then, for any integer-valued sequence of random variables (t_n), the corresponding convergence holds. The proof of Lemma 1 is postponed to Section 7.

Weighted Bootstrap Weak Convergence
In this section, we extend some existing results concerning the multiplier U-process to prove the bootstrap uniform weak convergence. Most of these results can be found in [46], generalizing the empirical-process work of [38] in the i.i.d. setting. The weak convergence is proved for degenerate U-processes, as mentioned before, under the weighted regenerative bootstrap scheme described in Algorithm 1. Before stating the weak convergence theorem, we recall the following important results. The next theorem, proved in [46], is a sharp multiplier inequality, which is essential in the study of the multiplier U-process. These results are based on the decoupled symmetrized U-process, a basic tool of U-statistics. In [47], the author solved these problems for the empirical process in the Markov setting (multinomial bootstrap), which we generalize to the U-process by considering more general weights, i.e., the exchangeably weighted bootstrap. Theorem 2 ([46]). Let (ξ_1, ..., ξ_n) be a random vector independent of (Y_1, ..., Y_n). Then, there exists some measurable function ψ_n : R^m_{≥0} → R_{≥0} bounding the expected supremum of the decoupled multiplier U-process. (Here, "decoupled" refers to the fact that the {Y_i^k} are independent copies across coordinates.) Furthermore, if there exists a concave and nondecreasing function ψ̃_n : R → R such that ψ_n(ε_1, ..., ε_m) = ψ̃_n(∏_{k=1}^m ε_k), then the bound simplifies accordingly. Here, K > 0 is a constant depending on m only. Lemma 2 ([46]). Let {F_{(ℓ_1,...,ℓ_m),n} : 1 ≤ ℓ_1, ..., ℓ_m ≤ n, n ∈ N} be function classes such that F_{(ℓ_1,...,ℓ_m),n} ⊃ F_{(n,...,n),n} for all 1 ≤ ℓ_1, ..., ℓ_m ≤ n. Suppose that the ξ_i's have the same marginal distributions with ‖ξ_1‖_{2m,1} < ∞, and that there exists some bounded measurable function a(·) satisfying the required domination condition. The main result of this paper is presented in the following theorem. It is worth noting that proving the stochastic equicontinuity in the present setting is not easy, as explained in the introduction.
Definition 16 (Permissible classes of functions). Let (E, E, P) be a measurable space (E a Borel σ-field on E). Let F be a class of functions indexed by a parameter x belonging to a set E. F is called permissible if it can be indexed in such a way that:

•
There exists a function g(x, f) = f(x) defined from E × F to R that is E ⊗ B(F)-measurable, where B(F) is the Borel σ-algebra generated by the metric on F. • The index set is a Suslin measurable space, meaning it is an analytic subset of a compact metric space from which it inherits its metric and Borel σ-field.
Theorem 3. Suppose Assumptions (A1) to (A4) and Conditions (C.1)-(C.5) hold. Let F ⊂ L_2^{c,m}(P) be permissible and admit a P^m-square-integrable envelope F satisfying the uniform entropy condition, where the supremum is taken over all discrete probability measures. Then, the bootstrapped U-process converges weakly conditionally in probability, where c is the constant in (A3), and the convergence in probability →_{P_ν} is with respect to the outer probability of P^∞ defined on (E^∞, E^∞).
The proof of Theorem 3 is postponed until Section 7.

The Delete h-Jackknife
In [62], the authors permute deterministic weights w_n = (w_{n1}, ..., w_{nn}), with n − h coordinates equal to n/(n − h) and the remaining h coordinates equal to zero, in order to build new bootstrap weights, defining ξ_{nj} := w_{n R_n(j)}, where R_n(·) is a random permutation uniformly distributed over {1, ..., n}. These weights are called the delete-h jackknife weights. In order to satisfy Assumption (A3), we must assume that h/n → α ∈ (0, 1), so that c² = h/(n − h) stays bounded away from zero, i.e., c > 0.

The Multivariate Hypergeometric Resampling Scheme
As its name indicates, the bootstrap weights of this scheme follow the multivariate hypergeometric distribution (sampling n balls without replacement from an urn containing K copies of each of the n indices), where K is a positive integer. Assumption (A3) is satisfied with c² = (K − 1)/K.
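For illustration, the following sketch generates the three weight schemes discussed in this section (multinomial, delete-h jackknife, multivariate hypergeometric) by elementary sampling. The constructions follow the verbal descriptions above; the variable names and the values n = 100, h = 50, K = 2 are ours.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100

# Multinomial (Efron) weights: counts of each index in an i.i.d. resample.
xi_multinomial = rng.multinomial(n, np.ones(n) / n)

# Delete-h jackknife weights: a random permutation of a deterministic
# vector with n - h entries equal to n / (n - h) and h zeros.
h = n // 2
w = np.concatenate((np.full(n - h, n / (n - h)), np.zeros(h)))
xi_jackknife = rng.permutation(w)

# Multivariate hypergeometric weights: draw n balls without replacement
# from an urn holding K copies of each index, then count occurrences.
K = 2
urn = np.repeat(np.arange(n), K)
draw = rng.choice(urn, size=n, replace=False)
xi_hypergeom = np.bincount(draw, minlength=n)
```

All three weight vectors are exchangeable and sum to n, which is the structural property the general theory exploits.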

Remark 2.
As was pointed out in [38], the previously mentioned bootstraps are "smoother" in some sense than the multinomial bootstrap because they place some (random) weight on all elements of the sample, whereas the multinomial bootstrap puts positive weight on a proportion of only about 1 − (1 − n^{−1})^n → 1 − e^{−1} ≈ 0.6321 of the sample elements, on average. Notice that when ω_i ∼ Gamma(4, 1), the ξ_{ni}/n are equivalent to four spacings from a sample of 4n − 1 Uniform(0, 1) random variables. In [63,64], it was noticed that, in addition to being four times more expensive to implement, the choice of four spacings depends on the functional of interest and is not universal.

Remark 3.
It is noteworthy that choosing the bootstrap weights ξ_{ni} appropriately yields a smaller limit variance, that is, c² smaller than 1. A thorough treatment of the weight selection is undoubtedly outside the scope of the current work; for a review, we refer the readers to [66].

Remark 4.
In the present paper, we considered a renewal type of bootstrap for atomic Markov chains under minimal moment conditions on the renewal times. The atomic assumption can be dropped by mimicking the ideas of [50,51], introducing an artificial atom and deriving a bootstrap procedure that applies to nonatomic Markov chains. Precisely, in the case of a general irreducible chain X with a transition kernel Π(x, dy) satisfying a minorization condition
$$\Pi(x, B) \ge \delta\, \mathbf{1}_S(x)\, \psi(B), \quad \text{for all } x \in E, \; B \in \mathcal{E},$$
for an accessible measurable set S, a probability measure ψ and δ ∈ ]0, 1[ (note that such a minorization condition always holds for Π, or some iterate of it, when the chain is irreducible), an atomic extension (X, Y) of the chain may be explicitly constructed by the Nummelin splitting technique (see [49]) from the parameters (S, δ, ψ) and the transition probability Π; see, for instance, [47,67]. From a practical viewpoint, the size of the first block may be large compared to the size n of the whole trajectory, for instance, when the expected return time to the (pseudo-)atom starting from the initial probability distribution is large. The effective sample size for constructing the data blocks and the corresponding statistic is then dramatically reduced. However, in [68], some simulations were given, together with examples including content-dependent storage systems and general AR models, supporting the method discussed in this work.

Applications
Example 1 (Symmetry test). This example gives an application of the bootstrap U-statistics, inspired by the goodness-of-fit tests in [69], where the symmetry test for the distribution of X_t was considered. Let (X_t)_{t∈N} be a stationary mixing process with Lebesgue density f_X(·). We test the hypothesis
$$H_0 : f_X(u) = f_X(-u), \quad \text{for almost every } u.$$
The estimator of f_X(u) is the kernel density estimator
$$\hat{f}_X(u) = \frac{1}{n h_n} \sum_{i=1}^{n} K\Big(\frac{u - X_i}{h_n}\Big),$$
where K(·) is a kernel function and h_n > 0 is a smoothing parameter, or bandwidth. An appropriate estimator of the integrated squared difference I = ∫ (f_X(u) − f_X(−u))² du represents the symmetry test. According to [69], I can be estimated by Î_n, a pairwise statistic over Y_j ∈ {X_j, −X_j}. Clearly, Î_n is a degenerate U-statistic with a kernel varying with the sample size n. Thus, the stationary bootstrap version Î* can be shown to have the same limit as Î_n.
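A rough numerical sketch of a kernel-based symmetry statistic in the spirit of Î_n follows. The exact form and constants in [69] may differ, so this pairwise contrast between K((X_i − X_j)/h) and K((X_i + X_j)/h), which targets ∫ f² − ∫ f(u)f(−u) du up to smoothing, should be read as an illustration only.

```python
import numpy as np

def gauss_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)

def symmetry_stat(x, h):
    """Pairwise symmetry statistic: under a symmetric law the two
    kernel terms have identical distributions, so the statistic is
    exactly mean-zero; under asymmetry it is positive in the limit.
    (Illustrative constants; the paper's exact form may differ.)"""
    n = len(x)
    d = (x[:, None] - x[None, :]) / h
    s = (x[:, None] + x[None, :]) / h
    core = gauss_kernel(d) - gauss_kernel(s)
    np.fill_diagonal(core, 0.0)        # keep only the i != j terms
    return core.sum() / (n * (n - 1) * h)

rng = np.random.default_rng(4)
t_sym = symmetry_stat(rng.normal(size=400), h=0.5)        # symmetric law
t_asym = symmetry_stat(rng.exponential(size=400), h=0.5)  # asymmetric law
```

In practice one would calibrate the test by recomputing the statistic on resampled (e.g., stationary-bootstrap) series rather than relying on the asymptotic law directly.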
Example 2 (Kendall's tau). The covariance matrix quantifies the linear dependence in a random vector; rank correlation is another measure, capturing nonlinear (monotonic) dependence. Two generic vectors y = (y_1, y_2) and z = (z_1, z_2) are said to be concordant if (y_1 − z_1)(y_2 − z_2) > 0. Kendall's tau rank correlation coefficient matrix T = {τ_{mk}}_{m,k=1}^{p} is a matrix-valued U-statistic with a bounded kernel. It is clear that τ_{mk} quantifies the monotonic dependence between (X_{1m}, X_{1k}) and (X_{2m}, X_{2k}), and it is an unbiased estimator of
$$\mathbb{P}\big((X_{1m} - X_{2m})(X_{1k} - X_{2k}) > 0\big),$$
that is, the probability that (X_{1m}, X_{1k}) and (X_{2m}, X_{2k}) are concordant.
The corresponding U-statistics may be used to test independence.
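For illustration, Kendall's tau matrix can be computed as a degree-2 U-statistic with the usual bounded sign kernel, which estimates the probability of concordance minus that of discordance (an affine transform of the concordance probability in the example above). The implementation below is a naive pairwise sketch with names of our choosing.

```python
import itertools
import numpy as np

def kendall_tau_matrix(X):
    """Kendall's tau matrix as a degree-2 U-statistic with kernel
    sign(y_m - z_m) * sign(y_k - z_k), averaged over all distinct
    pairs of observations (rows of X)."""
    n, p = X.shape
    T = np.zeros((p, p))
    for i, j in itertools.combinations(range(n), 2):
        s = np.sign(X[i] - X[j])
        T += np.outer(s, s)          # all (m, k) entries at once
    return 2.0 * T / (n * (n - 1))

rng = np.random.default_rng(5)
Z = rng.normal(size=(200, 1))
# Columns 0 and 1 are strongly monotonically related; column 2 is independent.
X = np.hstack((Z, Z + 0.1 * rng.normal(size=(200, 1)),
               rng.normal(size=(200, 1))))
T = kendall_tau_matrix(X)
```

Off-diagonal entries near zero are consistent with independence of the corresponding coordinates, which is how the test of independence mentioned above is built.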

Conclusions
The present paper was concerned with the randomly weighted bootstrap of the U-process in a Markov framework. A large number of bootstrap resampling schemes emerge as special cases of our setting, in particular the multinomial bootstrap, the best-known bootstrap scheme, introduced by [26]. One of the main tools was the approximation of the Markov U-process by the corresponding regenerative one. We mimicked this result in Proposition 2 in order to approximate the weighted-bootstrap U-process U*_n by the regenerative weighted-bootstrap U-process R*_{l_n}. Other technical arguments were given in Lemma 1, extended from the work of [47]. These intricate tools were used to reach the full independence of the regenerative block variables, by proving that a deterministic quantity could substitute for the random number of blocks, which was the main obstacle to extending the bootstrap results to the Markov framework. After a lengthy proof establishing this independence, we applied the results of [46]. All the above steps led us to prove the weak convergence of the regenerative-block weighted-bootstrap U-process, which implied the weak convergence of the weighted-bootstrap U-process. It would be of interest to extend the paper to the semi-Markov setting. A more delicate problem is the setting of incomplete data, such as censored or missing data; to the best of our knowledge, this problem has not been considered, even for the original sample (without bootstrap), in the Markov framework. It would also be interesting to extend our work to locally stationary processes, which requires nontrivial mathematics and would go well beyond the scope of the present paper.
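To make the multinomial special case concrete, here is a sketch of an exchangeably weighted bootstrap of an order-two U-statistic. The weights (ξ_1, . . . , ξ_n) ~ Mult(n; 1/n, . . . , 1/n) are exchangeable and sum to n; weighting each pair (i, j) by ξ_i ξ_j is one standard choice, taken here as an illustrative assumption rather than as the paper's exact scheme:

```python
import numpy as np

def weighted_bootstrap_U(x, h, rng, B=200):
    """Exchangeably weighted bootstrap of an order-2 U-statistic.
    Multinomial weights are exchangeable and sum to n; each pair
    (i, j), i != j, is weighted by xi_i * xi_j (our assumption)."""
    n = len(x)
    hx = np.array([[h(a, b) for b in x] for a in x])
    np.fill_diagonal(hx, 0.0)  # U-statistics exclude diagonal terms
    stats = []
    for _ in range(B):
        xi = rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)
        w = np.outer(xi, xi)
        np.fill_diagonal(w, 0.0)
        stats.append((w * hx).sum() / (n * (n - 1)))
    return np.array(stats)

rng = np.random.default_rng(3)
x = rng.standard_normal(60)
# Kernel of the sample variance: h(a, b) = (a - b)^2 / 2.
boot = weighted_bootstrap_U(x, lambda a, b: (a - b) ** 2 / 2, rng)
```

The empirical distribution of `boot` around the original U-statistic is what the consistency results of the paper justify using for inference.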

Mathematical Development
This section is devoted to the proof of our results. The previously defined notations continue to be used in what follows.

Proof of Proposition 2. We have
In a similar way, we have Making use of Proposition 1 and the law of large numbers, we infer that Hence, the proof is complete.
Proof of Lemma 1. The proofs of part i) and part iii) follow from ([47], Lemma 3.1 and Lemma 3.2). In order to prove ii), we need to show that, for every ε > 0, which follows if, conditionally on the sample, we have: We denote by E* the expectation conditionally on X_1, . . . , X_n. By the fact that the τ_i are i.i.d., and using Chebyshev's inequality, we have: The last inequality follows using i), which implies that l_n/n converges in probability, and iii), since E(ξ²_{1,l_n}) < ∞. For II, we have: The last expression converges to zero by the fact that n/l_n → α = E(τ) and by iii). This proves Lemma 1.

Proof of Theorem 3.
For the weak convergence, we need to show the finite-dimensional convergence and the asymptotic equicontinuity. According to Proposition 2 and [6], the finite-dimensional convergence is established if, for every fixed finite collection of functions {f_1, . . . , f_k} ⊂ F, where K_P is the Gaussian chaos process. According to the Cramér–Wold device and the countability of F, we only need to show that, for any f ∈ L^{c,m}_2(P), for some bounded ψ_q ∈ L^{c,1}_2(P). Fix ε > 0. Then, there exists Q ∈ N such that, with the above notation, the left-hand side of (25) can be further bounded by a sum that we denote (I) + (II) + (III).

Under Conditions (C.1) and (C.3), we have, up to a constant depending only on m and ξ, lim sup_{n→∞} (I) ≤ ε, a.s. (27) Now, for the second term, we have: where R_m is a polynomial of degree m (see [6], p. 175): As mentioned before, this polynomial follows from Newton's inequality and allows us to express a polynomial function as a sum of monomials. All we need now is to check each argument of this polynomial function. For ℓ = 1: We first recall the following lemma from [53].

Lemma 3 ([53]). Let (a_1, . . . , a_n) be a vector of constants and (ξ_1, . . . , ξ_n) a vector of exchangeable random variables, and suppose that the conditions stated in [53] hold. Applying Lemma 3 with a_i ≡ ψ_q(B_i) − P_n ψ_q and with ξ_i replaced by ξ^R_i − 1, we can see that A^{(1)} converges, where G_P is a Gaussian process defined on L^{c,1}_2(P) with covariance E G_P(f) G_P(g) = P(fg), for f, g ∈ L^{c,1}_2(P). Furthermore, the first inequality in the above display follows from the preceding bound. This shows that A^{(2)}_{ϕ(n),q} → c² E ψ²_q in P_ξ-probability, a.s.

This shows the desired convergence. Then, we have, where K_P is the Gaussian chaos process defined on the orthogonal sum ⊕_{k≤m} L^{c,k}_2(P). Hence, by the linearity of K_P, it follows that the last term in (26) is given by the definition of K_P. All these final results yield the finite-dimensional convergence.
Now, we take a step-by-step approach to establish the stochastic equicontinuity. We assume that the class of functions is bounded, so we suppose that |h| ≤ H, for an envelope H. Throughout the following, we keep the notation introduced above. Step 1. In this step, we must prove that the stochastic equicontinuity of the U-process implies that of the regenerative U-process. This is a consequence of Proposition 1 and, for the weighted bootstrap, of Proposition 2 and part ii) of Lemma 1.
Step 2. Define T̃*_{l_n}. Hypothesis: the stochastic equicontinuity of T̃*_{l_n} implies the stochastic equicontinuity of T*_{l_n}.
Proof. In order to prove this implication, we only need to show that: Suppose that l_n ≤ E(l_n); the opposite case can be treated in a similar way. We have However, |E(l_n) − l_n| = O_P(√n) by Lemma 1, part i). Then, there exists a constant K > 0 such that, for every ε > 0, P*(|E(l_n) − l_n| > n/4) < ε, and the first term in the previous display is bounded by: The last bound follows from the Montgomery–Smith inequality, and the resulting expression matches the stochastic equicontinuity condition for T̃*_{l_n}. This proves the step.
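For the reader's convenience, the Montgomery–Smith maximal inequality invoked above can be stated as follows (the absolute constants 9 and 30 are those usually quoted; only the existence of universal constants matters for the argument): for i.i.d. random variables X_1, . . . , X_n with partial sums S_k = X_1 + · · · + X_k and any t > 0,

```latex
\[
  \mathbb{P}\Bigl(\max_{1\le k\le n}|S_k| > t\Bigr)
  \;\le\; 9\,\mathbb{P}\Bigl(|S_n| > \tfrac{t}{30}\Bigr).
\]
```

It is precisely this reduction of a maximum over partial sums to the full sum that allows the random block count to be compared with its deterministic counterpart.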
Before passing to the next step, we introduce a new bootstrap sample. Define B̂_i := (X_{T_{i−1}+1}, . . . , X_{T_i}) for i = 1, . . . , E(l_n), and apply the weighted bootstrap procedure of Algorithm 1 to these blocks. This new procedure is the same as the old one for the B_i, but we aim here to replace the random quantity l_n with the deterministic one E(l_n).
Step 3. Define: Hypothesis: the stochastic equicontinuity of T̂*_{l_n}, built from the blocks B̂_i, implies the stochastic equicontinuity of T̃*_{l_n}.
Proof. First case: l_n ≤ E(l_n). In this case, all of the terms in the following computation should be multiplied by the indicator 1{l_n ≤ E(l_n)}; we omit it to keep the already complex notation simple. Define a process that has the same distribution as T̂*_{l_n}, with (i_1, . . . , i_m) ∈ I^m_{E(l_n)}. Hence, if we show that: then the stochastic equicontinuity of T̂*_{l_n} is established. However, we aim to approximate that of T̃*_{l_n}. In order to achieve our goal, it is sufficient to estimate: Hence, where For n large enough, we need to show that there exists K > 0 such that As the variables 1{B̂_i ∈ A^c_n} are i.i.d. and bounded, we can find M > 0 such that However, by Lemma 1 i), P*(S*_n > K√n) → 0.
Then, we only need to estimate the first part in (33). Define the following bootstrap procedure: let →B_i := (X_{T_{i−1}+1}, . . . , X_{T_i}, 0, 0, . . .) and let →F be a class of functions, related to the class F, such that, for every →ω_h ∈ →F: It is classical that the {→B_i} are i.i.d., applying the same bootstrap method as in Algorithm 1. This new sample allows us to enlarge and bound (33) by and the bracketing entropy number by N_1(γ, F, P), which denotes the minimal number N ≥ 1 for which there exist functions f_1, . . . , f_N and f^u_1, . . . , f^u_N such that: Hence, we have the following inequalities. Treating each term, and keeping in mind Condition (A.1), i.e., ∑_{i=1}^n ξ_i = n, we have Using the same argument as in part iii) of Lemma 1, we can prove that Then, it remains to show that, for every fixed γ > 0, N*(γ) is bounded in probability, as the last expression in (38) does not depend on k. It is interesting to note that N_1(γ, →H, P) is finite, due to the boundedness of →H by 2F, with E→F(→B) < ∞, and the fact that the →B_i are i.i.d. and discrete random variables. Under the norm L_1(P), define γ/2-brackets h_1, . . . , h_{N(γ/2)} and h^u_1, . . . , h^u_{N(γ/2)}. Observe that max_{j≤N(γ/2)} converges to zero in probability, and that N(γ/2) does not depend on n. This implies that N*(γ) ≤ N(γ/2) < ∞ in probability. Replacing h by h^u, I_A is identical to I_B, i.e., I_A also converges to zero in probability. This proves the convergence of I_n to zero in probability.
For II_n: in the same manner, define a new bootstrap sample {B**_i} for i = l_n + 1, . . . , E(l_n). Clearly, the new sample is well defined, since we assumed at the beginning that l_n ≤ E(l_n), and it is defined independently of B*_i and B̂*_i. In this case: Hence, as in (33), we have: where Using the same bootstrap procedure defined previously for I_n, let for i = l_n + 1, . . . , E(l_n), and let →F be a class of functions such that, for every →ω_h ∈ →F: It is classical that the {→B_i} are i.i.d., applying the same bootstrap method as in Algorithm 1. This new sample allows us to enlarge and bound (33) by Following the same arguments from Equations (37) through (38), we find that (42) is Here, we must pay attention to the randomness of N**, which depends on n. According to Lemma 1 i), we can see that |E(l_n) − l_n| → ∞ in probability under the assumption that l_n < E(l_n). Now, using the same treatment as for I_n, with M_n := n^{1/3} (so that M_n → ∞), as in [47], we obtain the convergence of (43) to zero in probability. Estimating now N** by considering the same γ/2-brackets h_1, . . . , h_{N(γ/2)} and h^u_1, . . . , h^u_{N(γ/2)}, we have N**(γ) ≤ N(γ/2), which does not depend on n. This proves the claim for II_n. Following the same steps, we can treat the case where l_n > E(l_n). This proves Step 3.
The end of the previous step shows that we only need to establish the stochastic equicontinuity of T*_{l_n} with the number of blocks replaced by the deterministic quantity E(l_n). In order to achieve the equicontinuity of this statistic, Lemma 2 shows that it is sufficient to prove that: for all 1 ≤ ℓ_1, . . . , ℓ_m ≤ n. We begin by defining the distance: Take ‖f‖² ≡ e²(f, 0). Using Corollary A1, we have Assuming that F ≥ 1, the upper bound in the integral can be replaced by r_ℓ(δ). The following proposition is needed in the sequel.
holds for any ℓ_1 ∧ · · · ∧ ℓ_k → ∞. Here, for ℓ = (ℓ_1, . . . , ℓ_k), and where H_k is an envelope for π_k H. Then, the above equation can be replaced by its decoupled version.
By this proposition, ‖F_ℓ‖ → ‖F‖_{L_2(P)} as ℓ_1 ∧ . . . ∧ ℓ_m → ∞; therefore, it suffices to show that r_ℓ(δ) → 0 in probability as ℓ_1 ∧ . . . ∧ ℓ_m → ∞ and δ → 0. It is obvious that all that is left to do now is to demonstrate that Verifying condition (45), max_{1≤j≤k} The passage from the second to the third line holds because As the condition is verified, and since ℓ_1 ∧ · · · ∧ ℓ_m → ∞, (46) follows directly from the previous proposition. Hence, there exists some sequence {a_ℓ}, with a_ℓ → 0 for any sequence {δ_ℓ} such that δ_ℓ → 0, both as ℓ_1 ∧ · · · ∧ ℓ_m → ∞, such that: An application of Lemma 2 proves that This completes the proof of the asymptotic equicontinuity.
Author Contributions: I.S. and S.B.: conceptualization, methodology, investigation, writing-original draft, writing-review and editing. All authors contributed equally to the writing of this paper. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.
Data Availability Statement: Not applicable.

Acknowledgments:
The authors would like to thank the Editor of the Special Issue on "Current Developments in Theoretical and Applied Statistics", Christophe Chesneau, for the invitation. The authors are indebted to the Editor-in-Chief and the three referees for their very generous comments and suggestions on the first version of our article, which helped us to improve the content, presentation and layout of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
This appendix contains supplementary material that is essential for a more comprehensive understanding of the paper. We refer to [46] for further details.
Otherwise, for l_n > 2, we can write W_n(h) as follows: To prove the convergence of W_n(h) to zero in probability, we must establish the convergence of (I) and (II) to zero in probability.
where 1 ≤ k ≤ m and 1 ≤ u ≤ k. We apply the SLLN for Harris Markov chains to obtain the convergence of Using the conditions, all terms in A and B are finite, and we can prove the convergence of (II) to zero. Now, for (I), applying the SLLN and part i) of Lemma 3.2 in [47], we can see that We have We obtain, in turn, that Hence, (I) also converges to zero a.s. under P_ν as n → ∞.

Proof of Theorem 1.
In what follows, let L = l_n − 1 denote the number of complete blocks observed. We find that where h^{(c)}(·) represents the conditional expectation of ω_h(·) given c of the coordinates, for all B_c ∈ T. The U-statistic D_L(h) is obtained by truncating the Hoeffding decomposition after the first term S_L(h). Then, we just need to show that: 1. L^{1/2} S_L(h) converges weakly to a Gaussian process G_P on ℓ^∞(F), where S_L(h) = (P_L P^{m−1}(h)) − P^m(h); 2. the remaining terms of the Hoeffding decomposition are asymptotically negligible.
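As a reminder, the Hoeffding decomposition being truncated here has the standard textbook form (stated in the block variables B_i, with π_k h denoting the degenerate k-th projection; see, e.g., [6]):

```latex
\[
  U_L(h) \;=\; P^m h \;+\; \sum_{k=1}^{m} \binom{m}{k}\, U_L^{(k)}(\pi_k h),
  \qquad
  U_L^{(k)}(\pi_k h) \;=\; \binom{L}{k}^{-1}\!\sum_{1\le i_1<\cdots<i_k\le L} \pi_k h\bigl(B_{i_1},\ldots,B_{i_k}\bigr),
\]
```

the first-order term satisfying S_L(h) = P_L P^{m−1}(h) − P^m(h) = L^{−1} ∑_{i=1}^L π_1 h(B_i), which is the Hájek projection retained after truncation.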
Using (A1), we can replace the random variable L = l_n − 1 with the deterministic quantity L̄, and we write Z_L̄(h) accordingly. In order to establish the weak convergence of the empirical process Z_L̄(h), it is necessary and sufficient to prove the finite-dimensional convergence and the stochastic equicontinuity. For the finite-dimensional convergence, we have to prove that (Z_L̄(h_{i_1}), . . . , Z_L̄(h_{i_k})) converges weakly to (G(h_{i_1}), . . . , G(h_{i_k})), for every fixed finite collection of functions {h_{i_1}, . . . , h_{i_k}} ⊂ F.
To this end, it is enough to show that, for every fixed a_1, . . . , a_k ∈ R, ∑_{j=1}^k a_j Z_L̄(h_{i_j}) → N(0, σ²) in distribution, where σ² = ∑_{j=1}^k a²_j Var(Z_L̄(h_{i_j})) + ∑_{s≠r} a_s a_r Cov(Z_L̄(h_{i_s}), Z_L̄(h_{i_r})).
By linearity, and following the arguments of ([57], Chapter 17), we can prove that where, under Condition (C5), We readily infer that where d(·, ·) is a pseudo-distance for which the class F is totally bounded, and f, g belong to F. According to [72], we have where a = min(L, n/E(τ)) and b = max(L, n/E(τ)). For the left-hand part of the last inequality, we have Dividing the last inequality by L^{1/2} and using the convergence result of ([72], Lemma 2.11) together with Condition (C1), we obtain the desired result. The right-hand part of the inequality is treated using ([72], Lemma 4.2), provided that E_A(τ)^{2+α} < ∞ for some α > 0 and that the uniform entropy integral is finite. To complete the weak convergence of the regenerative U-statistic, we must treat the remaining terms of its Hoeffding decomposition. For ζ ∈ F, let us introduce One can see that ζ is centered, that is, ∫ ζ(B_1, . . . , B_m) dP(B_1) . . . dP(B_i) . . . dP(B_m) = 0.