Risks 2018, 6(3), 96; https://doi.org/10.3390/risks6030096

Article
Bootstrapping Average Value at Risk of Single and Collective Risks
1 Department of Quantitative Economics, Maastricht University, 6200 MD Maastricht, The Netherlands
2 Department of Mathematics, Saarland University, 66123 Saarbrücken, Germany
* Author to whom correspondence should be addressed.
Received: 1 August 2018 / Accepted: 7 September 2018 / Published: 12 September 2018

## Abstract

Almost sure bootstrap consistency of the blockwise bootstrap for the Average Value at Risk of single risks is established for strictly stationary $β$-mixing observations. Moreover, almost sure bootstrap consistency of a multiplier bootstrap for the Average Value at Risk of collective risks is established for independent observations. The main results rely on a new functional delta-method for the almost sure bootstrap of uniformly quasi-Hadamard differentiable statistical functionals, to be presented here. The latter seems to be interesting in its own right.
Keywords:
Average Value at Risk; compound distribution; nonparametric estimation; multiplier bootstrap; blockwise bootstrap; functional delta-method; uniform quasi-Hadamard differentiability; chain rule

## 1. Introduction

One of the most popular risk measures in practice is the so-called Average Value at Risk, which is also referred to as Expected Shortfall (see Acerbi and Szekely (2014); Acerbi and Tasche (2002a, 2002b); Emmer et al. (2015) and references therein). For a fixed level $\alpha \in (0,1)$, the corresponding Average Value at Risk is the map $\mathrm{AV@R}_\alpha : L^1 \to \mathbb{R}$ defined by $\mathrm{AV@R}_\alpha(X) := \mathcal{R}_\alpha(F_X)$, where $F_X$ refers to the distribution function of $X$, $L^1$ is the usual $L^1$-space associated with some atomless probability space, and
$\mathcal{R}_\alpha(F) := \int_0^1 F^\leftarrow(s)\,\mathrm{d}g_\alpha(s) = -\int_{-\infty}^0 g_\alpha(F(x))\,\mathrm{d}x + \int_0^\infty \big(1 - g_\alpha(F(x))\big)\,\mathrm{d}x \qquad (1)$
for any $F \in \mathcal{F}_1$, with $\mathcal{F}_1$ the set of the distribution functions $F_X$ of all $X \in L^1$. Here, $g_\alpha(t) := \frac{1}{1-\alpha}\max\{t-\alpha;0\}$ and $F^\leftarrow(s) := \inf\{x \in \mathbb{R} : F(x) \ge s\}$ denotes the left-continuous inverse of $F$. The statistical functional $\mathcal{R}_\alpha : \mathcal{F}_1 \to \mathbb{R}$ is sometimes referred to as the risk functional associated with $\mathrm{AV@R}_\alpha$. Note that $\mathrm{AV@R}_\alpha(X) = \mathbb{E}[X \mid X \ge F_X^\leftarrow(\alpha)]$ when $F_X$ is continuous at $F_X^\leftarrow(\alpha)$.
In this article, we mainly focus on bootstrap methods for the Average Value at Risk. Before doing so, we briefly review nonparametric estimation techniques and asymptotic results for the Average Value at Risk. Given identically distributed observations $X_1,\dots,X_n$ (, $X_{n+1},\dots$) on some probability space $(\Omega,\mathcal{F},\mathbb{P})$ with unknown marginal distribution $F \in \mathcal{F}_1$, a natural estimator for $\mathcal{R}_\alpha(F)$ is the empirical plug-in estimator
$\mathcal{R}_\alpha(\hat F_n) = \int_0^1 \hat F_n^\leftarrow(s)\,\mathrm{d}g_\alpha(s) = \sum_{i=1}^n \Big( g_\alpha\big(\tfrac{i}{n}\big) - g_\alpha\big(\tfrac{i-1}{n}\big) \Big)\, X_{i:n}, \qquad (2)$
where $\hat F_n := \frac{1}{n}\sum_{i=1}^n \mathbb{1}_{[X_i,\infty)}$ is the empirical distribution function of $X_1,\dots,X_n$ and $X_{1:n},\dots,X_{n:n}$ refer to the order statistics of $X_1,\dots,X_n$. The second representation in Equation (2) shows that $\mathcal{R}_\alpha(\hat F_n)$ is a specific L-statistic, which was already mentioned in Acerbi (2002); Acerbi and Tasche (2002a); Jones and Zitikis (2003).
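The L-statistic representation in Equation (2) translates directly into code. The following sketch (function names are ours, not from the article) computes the plug-in estimator $\mathcal{R}_\alpha(\hat F_n)$ from a sample:

```python
import numpy as np

def g_alpha(t, alpha):
    # g_alpha(t) = max(t - alpha, 0) / (1 - alpha)
    return np.maximum(t - alpha, 0.0) / (1.0 - alpha)

def avar_plugin(x, alpha):
    """Plug-in estimator R_alpha(F_hat_n) as the L-statistic
    sum_i [g_alpha(i/n) - g_alpha((i-1)/n)] * X_{i:n}."""
    x = np.sort(np.asarray(x, dtype=float))  # order statistics X_{1:n} <= ... <= X_{n:n}
    n = len(x)
    i = np.arange(1, n + 1)
    weights = g_alpha(i / n, alpha) - g_alpha((i - 1) / n, alpha)
    return float(np.sum(weights * x))
```

For $\alpha = 0.75$ and the sample $\{1,2,3,4\}$, only the largest order statistic receives positive weight, so the estimate is 4; for large samples the estimate converges almost surely to $\mathcal{R}_\alpha(F)$ under the strong-consistency results cited above.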
In particular, if the underlying sequence $(X_i)_{i\in\mathbb{N}}$ is strictly stationary and ergodic, classical results of van Zwet (1980) and Gilat and Helmers (1997) show that $\mathcal{R}_\alpha(\hat F_n)$ converges $\mathbb{P}$-almost surely to $\mathcal{R}_\alpha(F)$ as $n \to \infty$, i.e., that strong consistency holds. If $X_1, X_2, \dots$ are i.i.d. and $F$ has a finite second moment and takes the value $\alpha$ only once, then a result of Stigler (Stigler 1974, Theorems 1–2) yields the asymptotic distribution of the estimation error:
$\sqrt{n}\,\big(\mathcal{R}_\alpha(\hat F_n) - \mathcal{R}_\alpha(F)\big) \rightsquigarrow Z \sim \mathcal{N}\big(0, \sigma_F^2\big), \qquad (3)$
where $\sigma_F^2 := \int\!\!\int g_\alpha'(F(x_0))\,\Gamma(x_0,x_1)\,g_\alpha'(F(x_1))\,\mathrm{d}x_0\,\mathrm{d}x_1$ with $\Gamma(x_0,x_1) := F(x_0 \wedge x_1)\big(1 - F(x_0 \vee x_1)\big) + \sum_{i=0}^{1}\sum_{k=2}^{\infty} \mathrm{Cov}\big(\mathbb{1}_{\{X_1 \le x_i\}}, \mathbb{1}_{\{X_k \le x_{1-i}\}}\big)$, $g_\alpha' := \frac{1}{1-\alpha}\mathbb{1}_{(\alpha,1]}$, and $\rightsquigarrow$ refers to convergence in distribution (see also Shorack 1972; Shorack and Wellner 1986). In fact, for independent $X_1, X_2, \dots$ the second summand in the definition of $\Gamma(x_0,x_1)$ vanishes. Results of Beutner and Zähle (2010) show that Equation (3) still holds if $(X_i)_{i\in\mathbb{N}}$ is strictly stationary and $\alpha$-mixing with mixing coefficients $\alpha(i) = O(i^{-\theta})$ and $\lim_{x\to\infty}(1 - F(x))\,x^{2\theta/(\theta-1)} < \infty$ for some $\theta > 1+\sqrt{2}$. Tsukahara (2013) obtained the same result. A similar result can also be derived from an earlier work by Mehra and Rao (1975), but under a faster decay of the mixing coefficients and under an additional assumption on the dependence structure. We emphasize that the method of proof proposed by Beutner and Zähle is rather flexible, because it easily extends to other weak and strong dependence concepts and to other risk measures (see Beutner et al. 2012; Beutner and Zähle 2010, 2016; Krätschmer et al. 2013; Krätschmer and Zähle 2017).
Even in the i.i.d. case, the asymptotic variance $\sigma_F^2$ depends on $F$ in a fairly complex way. For the approximation of the distribution of $\sqrt{n}\,(\mathcal{R}_\alpha(\hat F_n) - \mathcal{R}_\alpha(F))$, bootstrap methods should thus be superior to the method of estimating $\sigma_F^2$. However, to the best of our knowledge, theoretical investigations of the bootstrap for the Average Value at Risk seem to be rare. According to Gribkova (2016), a result of Gribkova (2002) yields bootstrap consistency for Efron's bootstrap when $X_1, X_2, \dots$ are i.i.d., while Theorem 3 of Helmers et al. (1990) seems not to cover the Average Value at Risk, because there the function $J$ (which plays the role of $g_\alpha'$) is assumed to be Lipschitz continuous. In these articles, bootstrap consistency is typically proved by first establishing consistency of the bootstrap variance and then showing that upper bounds for the difference between the sampling distribution and the bootstrap distribution converge to zero. Employing different techniques, Beutner and Zähle (2016) established bootstrap consistency in probability for the multiplier bootstrap when $X_1, X_2, \dots$ are i.i.d., as well as bootstrap consistency in probability for the circular bootstrap when $X_1, X_2, \dots$ are strictly stationary and $\beta$-mixing with mixing coefficients $\beta(i) = O(i^{-b})$ and $\int |x|^p\,\mathrm{d}F(x) < \infty$ for some $p > 2$ and $b > p/(p-2)$. Recently, Sun and Cheng (2018) established bootstrap consistency in probability for the moving blocks bootstrap when $X_1, X_2, \dots$ are strictly stationary and $\alpha$-mixing with mixing coefficients $\alpha(i) \le c\,\delta^i$ and $\int |x|^p\,\mathrm{d}F(x) < \infty$ for some $p > 4$, $c > 0$ and $\delta \in (0,1)$. Strictly speaking, Sun and Cheng did not consider the Average Value at Risk (Expected Shortfall) but the Tail Conditional Expectation in the sense of Acerbi and Tasche (2002a, 2002b).
The contribution of the article at hand is twofold. First, we extend the results of Beutner and Zähle (2016) on the Average Value at Risk from bootstrap consistency in probability to bootstrap consistency almost surely. Second, we establish bootstrap consistency for the Average Value at Risk of collective risks, i.e., for $\mathcal{R}_\alpha(F^{*m})$ and more general expressions.
The rest of the article is organized as follows. In Section 2, we present and illustrate our main results which are proved in Section 3. Section 3 is followed by the conclusions. The proofs of Section 3 rely on a new functional delta-method for the almost sure bootstrap which seems to be interesting in its own right and which is presented in Appendix B. Roughly speaking, the (functional) delta method studies properties of particular estimators for quantities of the form $H ( θ )$. Here, H is a known functional, such as the Average Value at Risk functional, and $θ$ is a possibly infinite dimensional parameter, such as an unknown distribution function. The particular estimators covered by the (functional) delta method are of the form $H ( T ^ n )$ where $T ^ n$ is an estimator for $θ$. In general and in the particular application considered here, the appeal of the (functional) delta method lies in the fact that, once “differentiability” of H (here, the Average Value at Risk functional) is established, the asymptotic error distribution of $H ( T ^ n )$ can immediately be derived from the asymptotic error distribution of $T ^ n$ (here $F ^ n$). This also applies to the (functional) delta method for the bootstrap where bootstrap consistency of the bootstrapped version of $H ( T ^ n )$ will follow from the respective property of the bootstrapped version of $T ^ n$ (here $F ^ n$). Thus, if in financial or actuarial applications the data show dependencies for which the asymptotic error distribution and/or bootstrap consistency of plug-in estimators for the Average Value at Risk have not been established yet, it would be enough to check if for these dependencies the asymptotic error distribution and/or bootstrap consistency of $F ^ n$ is known; thanks to the (functional) delta method the Average Value at Risk functional would inherit these properties. 
In Appendix A.1, we give results on convergence in distribution for the open-ball $σ$-algebra which are needed for the main results, and in Appendix A.2 we prove a delta-method for uniformly quasi-Hadamard differentiable maps that is the basis for the method of Appendix B. Readers interested in these methods used to prove the main results might wish to first work through Appendix A and Appendix B before reading Section 2 and Section 3.

## 2. Main Results

#### 2.1. The Case of i.i.d. Observations

Keep the notation of Section 1. Assume that $(X_i)_{i\in\mathbb{N}}$ is a sequence of i.i.d. real-valued random variables on some probability space $(\Omega,\mathcal{F},\mathbb{P})$ with distribution function $F$. Let $\hat F_n := \frac{1}{n}\sum_{i=1}^n \mathbb{1}_{[X_i,\infty)}$ and let $(W_{ni})$ be a triangular array of nonnegative real-valued random variables on another probability space $(\Omega',\mathcal{F}',\mathbb{P}')$ such that one of the following two settings is met.
S1.
The random vector $(W_{n1},\dots,W_{nn})$ is multinomially distributed according to the parameters $n$ and $p_1 = \cdots = p_n := 1/n$ for every $n \in \mathbb{N}$.
S2.
$W_{ni} = Y_i/\bar Y_n$ for every $i = 1,\dots,n$ and $n \in \mathbb{N}$, where $\bar Y_n := \frac{1}{n}\sum_{j=1}^n Y_j$ and $(Y_j)$ is any sequence of nonnegative i.i.d. random variables on $(\Omega',\mathcal{F}',\mathbb{P}')$ with $\int_0^\infty \mathbb{P}'[Y_1 > t]^{1/2}\,\mathrm{d}t < \infty$ and $\mathrm{Var}'[Y_1]^{1/2} = \mathbb{E}'[Y_1] > 0$.
Let $(\bar\Omega, \bar{\mathcal{F}}, \bar{\mathbb{P}}) := (\Omega \times \Omega', \mathcal{F} \otimes \mathcal{F}', \mathbb{P} \otimes \mathbb{P}')$ and $\hat F_n^*(\omega,\omega') := \frac{1}{n}\sum_{i=1}^n W_{ni}(\omega')\,\mathbb{1}_{[X_i(\omega),\infty)}$. Setting S1. is nothing but Efron's bootstrap (Efron 1979). If in Setting S2. the distribution of $Y_1$ is the exponential distribution with parameter 1, then the resulting scheme is in line with the Bayesian bootstrap of Rubin (1981). Let $\sigma_F^2 := \int\!\!\int g_\alpha'(F(x_0))\,\Gamma(x_0,x_1)\,g_\alpha'(F(x_1))\,\mathrm{d}x_0\,\mathrm{d}x_1$ with $\Gamma(x_0,x_1) := F(x_0 \wedge x_1)(1 - F(x_0 \vee x_1))$.
Theorem 1.
In the setting above assume that $\int \phi^2\,\mathrm{d}F < \infty$ for some continuous function $\phi : \mathbb{R} \to [1,\infty)$ with $\int 1/\phi(x)\,\mathrm{d}x < \infty$ (in particular $F \in \mathcal{F}_1$), and that $F$ takes the value $\alpha$ only once. Then
$\sqrt{n}\,\big(\mathcal{R}_\alpha(\hat F_n) - \mathcal{R}_\alpha(F)\big) \rightsquigarrow Z \sim \mathcal{N}\big(0, \sigma_F^2\big) \qquad (4)$
and
$\sqrt{n}\,\big(\mathcal{R}_\alpha(\hat F_n^*(\omega,\cdot)) - \mathcal{R}_\alpha(\hat F_n(\omega))\big) \rightsquigarrow Z \sim \mathcal{N}\big(0, \sigma_F^2\big), \quad \mathbb{P}\text{-a.e. } \omega. \qquad (5)$
Theorem 1 is a special case of Corollary 1 below. For the bootstrap Scheme S1., the result of Theorem 1 can also be deduced from Theorem 7 in Gribkova (2002). According to Gribkova (2016), Condition (1) of that theorem is satisfied if there are $0 = a_0 < a_1 < \cdots < a_k = 1$ for some $k \in \mathbb{N}$ such that $J$ is Hölder continuous on each interval $(a_{i-1}, a_i)$, $1 \le i \le k$, and the measure $\mathrm{d}F^{-1}$ has no mass at the points $a_1, \dots, a_{k-1}$. For the bootstrap Scheme S2., the result seems to be new.
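For illustration, the two weighting schemes of Settings S1. and S2. and the bootstrapped estimator $\mathcal{R}_\alpha(\hat F_n^*)$ of Theorem 1 can be simulated as follows (a sketch; all function names are ours, and the Exp(1) multipliers in S2. correspond to the Bayesian bootstrap mentioned above):

```python
import numpy as np

def g_alpha(t, alpha):
    # g_alpha(t) = max(t - alpha, 0) / (1 - alpha)
    return np.maximum(t - alpha, 0.0) / (1.0 - alpha)

def avar_weighted(x, w, alpha):
    """R_alpha of the discrete distribution with atoms x and (unnormalized) weights w."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float) / np.sum(w)
    order = np.argsort(x)
    x, w = x[order], w[order]
    c = np.cumsum(w)                      # weighted empirical df evaluated at the atoms
    c_prev = np.concatenate(([0.0], c[:-1]))
    return float(np.sum((g_alpha(c, alpha) - g_alpha(c_prev, alpha)) * x))

def bootstrap_weights(n, scheme, rng):
    if scheme == "S1":                    # Efron's bootstrap: multinomial(n; 1/n, ..., 1/n)
        return rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)
    y = rng.exponential(size=n)           # S2 with Exp(1) multipliers: Bayesian bootstrap
    return y / y.mean()

# bootstrap distribution of R_alpha(F_n^*) for a simulated i.i.d. sample
rng = np.random.default_rng(0)
x = rng.normal(size=500)
alpha = 0.95
boot = [avar_weighted(x, bootstrap_weights(len(x), "S2", rng), alpha)
        for _ in range(200)]
```

By Theorem 1, the empirical distribution of $\sqrt{n}\,(\mathcal{R}_\alpha(\hat F_n^*) - \mathcal{R}_\alpha(\hat F_n))$ over such replications approximates $\mathcal{N}(0,\sigma_F^2)$ for $\mathbb{P}$-a.e. sample path.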
We now consider the collective risk model. Let $(X_i)_{i\in\mathbb{N}}$ and $\hat F_n$ be as above, and let $p = (p_k)_{k\in\mathbb{N}_0}$ be the counting density of a distribution on $\mathbb{N}_0$. Let $\mathcal{F}$ denote the set of all distribution functions on $\mathbb{R}$, and consider the functional $C_p : \mathcal{F} \to \mathcal{F}$ defined by $C_p(F) := \sum_{k=0}^\infty p_k F^{*k}$, where $F^{*k}$ refers to the $k$-fold convolution of $F$, i.e., $F^{*0} := \mathbb{1}_{[0,\infty)}$ and $F^{*k}(x) := \int F(x - x_{k-1})\,\mathrm{d}F^{*(k-1)}(x_{k-1}) = \int \cdots \int F(x - x_{k-1} - \cdots - x_1)\,\mathrm{d}F(x_1)\cdots\mathrm{d}F(x_{k-1})$ for $k \in \mathbb{N}$. If $p_m = 1$ for some $m \in \mathbb{N}_0$, then $C_p(F) = F^{*m}$. Let $\sigma_{p,F}^2 := \int\!\!\int\!\!\int\!\!\int g_\alpha'(F(x_0))\,\Gamma(x_0 - y_0, x_1 - y_1)\,g_\alpha'(F(x_1))\,\mathrm{d}H_{p,F}(y_0)\,\mathrm{d}H_{p,F}(y_1)\,\mathrm{d}x_0\,\mathrm{d}x_1$ with $\Gamma(z_0,z_1) := F(z_0 \wedge z_1)(1 - F(z_0 \vee z_1))$ and $H_{p,F} := \sum_{k=1}^\infty k\,p_k\,F^{*(k-1)}$.
Theorem 2.
In the setting above assume that $\int |x|^{2\lambda}\,\mathrm{d}F(x) < \infty$ for some $\lambda > 1$ (in particular $F \in \mathcal{F}_1$) and $\sum_{k=1}^\infty p_k k^{1+\lambda} < \infty$, and that $C_p(F)$ takes the value $\alpha$ only once. Then,
$\sqrt{n}\,\big(\mathcal{R}_\alpha(C_p(\hat F_n)) - \mathcal{R}_\alpha(C_p(F))\big) \rightsquigarrow Z \sim \mathcal{N}\big(0, \sigma_{p,F}^2\big) \qquad (6)$
and
$\sqrt{n}\,\big(\mathcal{R}_\alpha(C_p(\hat F_n^*(\omega,\cdot))) - \mathcal{R}_\alpha(C_p(\hat F_n(\omega)))\big) \rightsquigarrow Z \sim \mathcal{N}\big(0, \sigma_{p,F}^2\big), \quad \mathbb{P}\text{-a.e. } \omega. \qquad (7)$
Theorem 2 is a special case of Corollary 4 below. Lauer and Zähle (2015, 2017) derive the asymptotic distribution as well as almost sure bootstrap consistency for the Average Value at Risk (and more general risk measures) of $F^{*m_n}$ when $m_n/n$ is asymptotically constant, but we do not know of any result in the existing literature which is comparable to that of Theorem 2.
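At the level of computation, $C_p(\hat F_n)$ is the distribution of a compound sum whose summands are drawn from the empirical distribution of the observations, so $\mathcal{R}_\alpha(C_p(\hat F_n))$ can be approximated by Monte Carlo simulation of such compound sums (a sketch under that approximation; the function names and the Poisson/log-normal example are ours):

```python
import numpy as np

def g_alpha(t, alpha):
    return np.maximum(t - alpha, 0.0) / (1.0 - alpha)

def avar_sample(z, alpha):
    # plug-in AV@R of a sample via the L-statistic representation
    z = np.sort(np.asarray(z, dtype=float))
    n = len(z)
    i = np.arange(1, n + 1)
    return float(np.sum((g_alpha(i / n, alpha) - g_alpha((i - 1) / n, alpha)) * z))

def avar_compound_mc(x, claim_counts, alpha, n_sim=20_000, rng=None):
    """Monte Carlo approximation of R_alpha(C_p(F_hat_n)): draw N ~ p,
    then sum N draws from the empirical distribution of x."""
    rng = rng or np.random.default_rng()
    counts = claim_counts(n_sim, rng)          # N_1, ..., N_{n_sim} distributed per p
    totals = np.zeros(n_sim)
    for j, n_claims in enumerate(counts):
        if n_claims > 0:
            totals[j] = rng.choice(x, size=n_claims).sum()
    return avar_sample(totals, alpha)

# example: Poisson(2) claim numbers, log-normal "observed" claims
rng = np.random.default_rng(0)
x = rng.lognormal(mean=0.0, sigma=0.5, size=1000)
est = avar_compound_mc(x, lambda m, r: r.poisson(2.0, size=m), alpha=0.99, rng=rng)
```

Note that if $p_m = 1$ for some fixed $m$, the count sampler degenerates to the constant $m$ and the routine approximates $\mathcal{R}_\alpha(F^{*m})$ as in the special case $C_p(F) = F^{*m}$.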

#### 2.2. The Case of $β$-Mixing Observations

Keep the notation of Section 1. Assume that $(X_i)_{i\in\mathbb{N}}$ is a strictly stationary sequence of $\beta$-mixing random variables on $(\Omega,\mathcal{F},\mathbb{P})$ with distribution function $F$. As before, let $\hat F_n := \frac{1}{n}\sum_{i=1}^n \mathbb{1}_{[X_i,\infty)}$. Let $(\ell_n)$ be a sequence of integers such that $\ell_n \nearrow \infty$ as $n \to \infty$ and $\ell_n < n$ for all $n \in \mathbb{N}$. Set $k_n := \lceil n/\ell_n \rceil$ for all $n \in \mathbb{N}$. Let $(I_{nj})_{n\in\mathbb{N},\,1\le j\le k_n}$ be a triangular array of random variables on $(\Omega',\mathcal{F}',\mathbb{P}')$ such that $I_{n1},\dots,I_{nk_n}$ are i.i.d. according to the uniform distribution on $\{1,\dots,n-\ell_n+1\}$ for every $n \in \mathbb{N}$. Let $(\bar\Omega, \bar{\mathcal{F}}, \bar{\mathbb{P}}) := (\Omega \times \Omega', \mathcal{F} \otimes \mathcal{F}', \mathbb{P} \otimes \mathbb{P}')$ and $\hat F_n^*(\omega,\omega') := \frac{1}{n}\sum_{i=1}^n W_{ni}(\omega')\,\mathbb{1}_{[X_i(\omega),\infty)}$ with
$W_{ni}(\omega') := \sum_{j=1}^{k_n-1} \mathbb{1}_{\{I_{nj} \le i \le I_{nj}+\ell_n-1\}}(\omega') + \mathbb{1}_{\{I_{nk_n} \le i \le I_{nk_n}+(n-(k_n-1)\ell_n)-1\}}(\omega').$
Note that the sequence $(X_i)$ and the triangular array $(W_{ni})$, regarded as families of random variables on the product space $(\bar\Omega, \bar{\mathcal{F}}, \bar{\mathbb{P}})$, are independent. At an informal level, this means that, given a sample $X_1,\dots,X_n$, we pick $k_n - 1$ blocks of length $\ell_n$ and one block of length $n - (k_n-1)\ell_n$ in the sample $X_1,\dots,X_n$, where the start indices $I_{n1}, I_{n2}, \dots, I_{nk_n}$ are chosen independently and uniformly in the set of indices $\{1,\dots,n-\ell_n+1\}$:
block 1: $X_{I_{n1}}, X_{I_{n1}+1}, \dots, X_{I_{n1}+\ell_n-1}$
block 2: $X_{I_{n2}}, X_{I_{n2}+1}, \dots, X_{I_{n2}+\ell_n-1}$
⋮
block $k_n-1$: $X_{I_{n(k_n-1)}}, X_{I_{n(k_n-1)}+1}, \dots, X_{I_{n(k_n-1)}+\ell_n-1}$
block $k_n$: $X_{I_{nk_n}}, X_{I_{nk_n}+1}, \dots, X_{I_{nk_n}+(n-(k_n-1)\ell_n)-1}$.
The bootstrapped empirical distribution function $\hat F_n^*$ is then defined to be the distribution function of the discrete probability measure with atoms $X_1,\dots,X_n$ carrying masses $\frac{1}{n}W_{n1},\dots,\frac{1}{n}W_{nn}$, respectively, where $W_{ni}$ specifies the number of blocks which contain $X_i$. This is known as the blockwise bootstrap (see, e.g., Bühlmann (1994, 1995) and references therein). Assume that the following assertions hold:
A1.
$∫ ϕ p d F < ∞$ for some $p > 4$ (in particular $F ∈ F 1$).
A2.
The sequence of random variables $( X i )$ is strictly stationary and $β$-mixing with mixing coefficients $( β i )$ satisfying $β i ≤ c δ i$ for some constants $c > 0$ and $δ ∈ ( 0 , 1 )$.
A3.
The block length $ℓ n$ satisfies $ℓ n = O ( n γ )$ for some $γ ∈ ( 0 , 1 / 2 )$.
Let $\hat C_n := \mathbb{E}'[\hat F_n^*] = \frac{1}{n}\sum_{i=1}^n w_{ni}\,\mathbb{1}_{[X_i,\infty)}$ with $w_{ni} := \mathbb{E}'[W_{ni}]$, and note that
$w_{ni} = \begin{cases} \frac{k_n\, i}{n-\ell_n+1}, & i = 1,\dots,n-(k_n-1)\ell_n \\ \frac{(k_n-1)\,i}{n-\ell_n+1} + \frac{n-(k_n-1)\ell_n}{n-\ell_n+1}, & i = n-(k_n-1)\ell_n+1,\dots,\ell_n \\ \frac{(k_n-1)\ell_n}{n-\ell_n+1} + \frac{n-(k_n-1)\ell_n}{n-\ell_n+1} = \frac{n}{n-\ell_n+1}, & i = \ell_n+1,\dots,n-\ell_n \\ \frac{(k_n-1)(n-i+1)}{n-\ell_n+1} + \frac{2n-k_n\ell_n-i+1}{n-\ell_n+1}, & i = n-\ell_n+1,\dots,n-(k_n\ell_n-n) \\ \frac{(k_n-1)(n-i+1)}{n-\ell_n+1}, & i = n-(k_n\ell_n-n)+1,\dots,n \end{cases}$
which can be verified easily. Let $\sigma_F^2 := \int\!\!\int g_\alpha'(F(x_0))\,\Gamma(x_0,x_1)\,g_\alpha'(F(x_1))\,\mathrm{d}x_0\,\mathrm{d}x_1$ with $\Gamma(x_0,x_1) := F(x_0 \wedge x_1)(1 - F(x_0 \vee x_1)) + \sum_{i=0}^{1}\sum_{k=2}^{\infty} \mathrm{Cov}\big(\mathbb{1}_{\{X_1 \le x_i\}}, \mathbb{1}_{\{X_k \le x_{1-i}\}}\big)$.
Theorem 3.
In the setting above (in particular under A1.–A3.) assume that $F$ takes the value $\alpha$ only once. Then, we have
$\sqrt{n}\,\big(\mathcal{R}_\alpha(\hat F_n) - \mathcal{R}_\alpha(F)\big) \rightsquigarrow Z \sim \mathcal{N}\big(0, \sigma_F^2\big)$
and
$\sqrt{n}\,\big(\mathcal{R}_\alpha(\hat F_n^*(\omega,\cdot)) - \mathcal{R}_\alpha(\hat C_n(\omega))\big) \rightsquigarrow Z \sim \mathcal{N}\big(0, \sigma_F^2\big), \quad \mathbb{P}\text{-a.e. } \omega.$
Theorem 3 is a special case of Corollary 1 below. To the best of our knowledge, there does not yet exist any result on almost sure bootstrap consistency for the Average Value at Risk when the underlying data are dependent.
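The block-drawing and block-counting step that defines the weights $W_{ni}$ above can be sketched directly (a sketch; the function name is ours):

```python
import numpy as np

def blockwise_weights(n, block_len, rng):
    """Blockwise-bootstrap weights: W_ni = number of sampled blocks covering index i.
    Draws k_n - 1 blocks of length block_len and one final block of length
    n - (k_n - 1) * block_len, with start indices uniform on {1, ..., n - block_len + 1}."""
    k_n = -(-n // block_len)                               # ceil(n / block_len)
    starts = rng.integers(1, n - block_len + 2, size=k_n)  # uniform on {1,...,n-block_len+1}
    lengths = [block_len] * (k_n - 1) + [n - (k_n - 1) * block_len]
    w = np.zeros(n)
    for s, length in zip(starts, lengths):
        w[s - 1 : s - 1 + length] += 1.0                   # covers indices s, ..., s+length-1
    return w

rng = np.random.default_rng(0)
w = blockwise_weights(100, 7, rng)
```

Since the total block length is $(k_n-1)\ell_n + (n - (k_n-1)\ell_n) = n$, the weights always sum to $n$, so $\hat F_n^* = \frac{1}{n}\sum_i W_{ni}\mathbb{1}_{[X_i,\infty)}$ is again a distribution function; averaging `w` over many draws approximates the expected weights $w_{ni}$ displayed above.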

#### 2.3.1. Bootstrapping the Down Side Risk of an Asset Price

Let $(A_i)_{i\in\mathbb{N}_0}$ be the price process of an asset. Let us assume that it is induced by an initial state $A_0 \in \mathbb{R}_+$ and a sequence of $\mathbb{R}_+$-valued i.i.d. random variables $(R_i)_{i\in\mathbb{N}}$ via $A_i := R_i A_{i-1}$, $i \in \mathbb{N}$. Here, $R_i$ is the return of the asset between time $i-1$ and time $i$. For instance, if $A_0, A_1, A_2, \dots$ are the observations of a time-continuous Black–Scholes–Merton model with drift $\mu$ and volatility $\sigma$ at the points of the time grid $\{0, h, 2h, \dots\}$, then the distribution of $R_1$ is the log-normal distribution with parameters $(\mu - \sigma^2/2)h$ and $\sigma^2 h$. However, the adequacy of a specific parametric model is usually hard to verify. For this reason, we do not restrict ourselves to any particular parametric structure for the dynamics of $(R_i)_{i\in\mathbb{N}}$.
Let us assume that we can observe the asset prices $A_0, \dots, A_n$ up to time $n$, and that we are interested in the Average Value at Risk at level $\alpha$ of the negative price change $A_n - A_{n+1}$ (which specifies the down side risk of the asset) between time $n$ and $n+1$. That is, since for any $a_0, \dots, a_n \in \mathbb{R}_+$ the unconditional distribution of $(1 - R_{n+1})a_n$ coincides with the factorized conditional distribution of $A_n - A_{n+1} = (1 - R_{n+1})A_n$ given $(A_0, \dots, A_n) = (a_0, \dots, a_n)$, we are in fact interested in $\mathcal{R}_\alpha(F) = \mathrm{AV@R}_\alpha(X)$ for the distribution function $F$ of $X := (1 - R_{n+1})a$ for any fixed $a \in \mathbb{R}_+$. As the random variables $X_1 := (1 - R_1)a, \dots, X_n := (1 - R_n)a$ are i.i.d. copies of $X$, we can use $\mathcal{R}_\alpha(\hat F_n)$ as an estimator for $\mathcal{R}_\alpha(F)$ and derive from Equation (4) an asymptotic confidence interval at a given level $\tau \in (0,1)$ for $\mathcal{R}_\alpha(F)$, where one has to estimate $\sigma_F^2$ by $\int\!\!\int g_\alpha'(\hat F_n(x_0))\,\hat F_n(x_0 \wedge x_1)(1 - \hat F_n(x_0 \vee x_1))\,g_\alpha'(\hat F_n(x_1))\,\mathrm{d}x_0\,\mathrm{d}x_1$. As this estimator for $\sigma_F^2$ depends on $\hat F_n$ in a somewhat complex way, the bootstrap confidence interval
$\Big[\, \mathcal{R}_\alpha\big(\hat F_n(\omega)\big) - \tfrac{1}{\sqrt{n}}\,\hat q_{1-\tau/2}^*(\omega)\,,\; \mathcal{R}_\alpha\big(\hat F_n(\omega)\big) - \tfrac{1}{\sqrt{n}}\,\hat q_{\tau/2}^*(\omega) \,\Big]$
at level $\tau$ derived from Equations (4) and (5) is expected to have a slightly better performance. Here, $\hat q_t^*(\omega)$ denotes a $t$-quantile of (a Monte Carlo approximation of) the distribution of the left-hand side in Equation (5) for fixed $\omega$. For Equations (4) and (5), it suffices to assume that $\mathbb{E}[|R_1|^{2+\varepsilon}] < \infty$ for some arbitrarily small $\varepsilon > 0$.
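The steps above, from observed prices to the bootstrap confidence interval, can be sketched as follows (names are ours; the bootstrap roots use the multinomial weights of Scheme S1., and the quantiles are the Monte Carlo approximations mentioned in the text):

```python
import numpy as np

def g_alpha(t, alpha):
    return np.maximum(t - alpha, 0.0) / (1.0 - alpha)

def avar(x, alpha, w=None):
    # plug-in AV@R of a (possibly weighted) sample
    x = np.asarray(x, dtype=float)
    w = np.ones(len(x)) if w is None else np.asarray(w, dtype=float)
    order = np.argsort(x)
    x, w = x[order], w[order] / w.sum()
    c = np.cumsum(w)
    c_prev = np.concatenate(([0.0], c[:-1]))
    return float(np.sum((g_alpha(c, alpha) - g_alpha(c_prev, alpha)) * x))

def downside_risk_ci(prices, a, alpha=0.95, tau=0.05, n_boot=999, rng=None):
    """Bootstrap CI for AV@R_alpha of X = (1 - R) * a from observed prices,
    using roots sqrt(n) * (R_alpha(F_n^*) - R_alpha(F_n)) with Efron weights (S1)."""
    rng = rng or np.random.default_rng()
    prices = np.asarray(prices, dtype=float)
    returns = prices[1:] / prices[:-1]
    x = (1.0 - returns) * a               # i.i.d. copies of X = (1 - R) * a
    n = len(x)
    point = avar(x, alpha)
    roots = np.empty(n_boot)
    for b in range(n_boot):
        w = rng.multinomial(n, np.full(n, 1.0 / n))
        roots[b] = np.sqrt(n) * (avar(x, alpha, w) - point)
    lo, hi = np.quantile(roots, [tau / 2, 1.0 - tau / 2])
    return point - hi / np.sqrt(n), point - lo / np.sqrt(n)
```

The returned pair reproduces the interval displayed above, with the roots' empirical quantiles playing the role of $\hat q_{\tau/2}^*$ and $\hat q_{1-\tau/2}^*$.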

#### 2.3.2. Bootstrapping the Total Risk Premium in Insurance Models

In actuarial mathematics, the collective risk model is frequently used for modeling the total claim distribution of an insurance collective. If the counting density $p = (p_k)_{k\in\mathbb{N}_0}$ corresponds to the distribution of the random number $N$ of claims caused by the whole collective within one insurance period, and if $X_1, \dots, X_N$ (, $X_{N+1}, \dots$) denote the i.i.d. sizes of the corresponding claims with marginal distribution $F$, then $C_p(F)$ is the distribution of the total claim $\sum_{i=1}^N X_i$ (the latter sum is set to 0 if $N = 0$). Now, $\mathcal{R}_\alpha(C_p(F))$ is a suitable insurance premium for the whole collective when the Average Value at Risk at level $\alpha$ is considered to be a suitable premium principle.
Assume that $p$ is known, for instance $p_m = 1$ for some fixed $m \in \mathbb{N}$, and let $X_1, \dots, X_n$ be observed historical (i.i.d.) claims with $n$ large. On the one hand, the construction of an exact confidence interval for $\mathcal{R}_\alpha(C_p(F))$ at level $\tau \in (0,1)$ based on $X_1, \dots, X_n$ is hardly possible. Likewise, the performance of an asymptotic confidence interval at level $\tau$ derived from Equation (6) with (nonparametrically) estimated $\sigma_{p,F}^2$ is typically only moderate, since $\sigma_{p,F}^2$ depends on the unknown $F$ in a fairly complex way. On the other hand, the bootstrap confidence interval
$\Big[\, \mathcal{R}_\alpha\big(C_p(\hat F_n(\omega))\big) - \tfrac{1}{\sqrt{n}}\,\hat q_{1-\tau/2}^*(\omega)\,,\; \mathcal{R}_\alpha\big(C_p(\hat F_n(\omega))\big) - \tfrac{1}{\sqrt{n}}\,\hat q_{\tau/2}^*(\omega) \,\Big]$
at level $\tau$ derived from Equation (7) should have a better performance. Here, $\hat q_t^*(\omega)$ denotes a $t$-quantile of (a Monte Carlo approximation of) the distribution of the left-hand side in Equation (7) for fixed $\omega$.
Note that Theorem 2 ensures that Equations (6) and (7) hold true when the marginal distribution $F$ of the $X_i$ is any log-normal distribution, any Gamma distribution, any Pareto distribution with tail index greater than 2, or any convex combination of one of these distributions with the Dirac measure $\delta_0$, and the counting density $p$ corresponds to any Dirac measure with atom in $\mathbb{N}$, any binomial distribution, any Poisson distribution, or any geometric distribution. The former distributions are classical examples of single claim distributions and the latter are classical examples of claim number distributions.
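A compact sketch of this premium bootstrap (names are ours; $\mathcal{R}_\alpha(C_p(\cdot))$ is approximated by compound-sum simulation as in Section 2.1, Efron resampling plays the role of $\hat F_n^*$, and the quantiles are Monte Carlo approximations as in the text):

```python
import numpy as np

def g_alpha(t, alpha):
    return np.maximum(t - alpha, 0.0) / (1.0 - alpha)

def avar_sample(z, alpha):
    # plug-in AV@R of a sample via the L-statistic representation
    z = np.sort(np.asarray(z, dtype=float))
    n = len(z)
    i = np.arange(1, n + 1)
    return float(np.sum((g_alpha(i / n, alpha) - g_alpha((i - 1) / n, alpha)) * z))

def premium(claims, alpha, lam, n_sim, rng):
    # R_alpha(C_p(F_hat)) with Poisson(lam) claim numbers, by compound-sum simulation
    counts = rng.poisson(lam, size=n_sim)
    totals = np.array([rng.choice(claims, size=c).sum() if c else 0.0 for c in counts])
    return avar_sample(totals, alpha)

def premium_bootstrap_ci(claims, alpha, lam, tau=0.05, n_boot=200, n_sim=20_000, rng=None):
    rng = rng or np.random.default_rng()
    n = len(claims)
    point = premium(claims, alpha, lam, n_sim, rng)
    roots = np.empty(n_boot)
    for b in range(n_boot):
        resample = rng.choice(claims, size=n)     # Efron bootstrap of the claim sample
        roots[b] = np.sqrt(n) * (premium(resample, alpha, lam, n_sim, rng) - point)
    lo, hi = np.quantile(roots, [tau / 2, 1.0 - tau / 2])
    return point - hi / np.sqrt(n), point - lo / np.sqrt(n)
```

Both the outer bootstrap and the inner compound-sum simulation are Monte Carlo approximations, so `n_boot` and `n_sim` have to be chosen large in practice; the endpoints correspond to the interval displayed above.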

## 3. Proofs of Main Results

Here, we prove the results of Section 2. In fact, Theorems 1–3 are special cases of Corollaries 1 and 4. The latter corollaries are proved with the help of the technique introduced in Appendix B.2, which in turn relies on the concept of uniform quasi-Hadamard differentiability (see Definition A1 in Appendix B.1).
Keep the notation introduced in Section 1. Let $D$ be the space of all càdlàg functions $v$ on $\mathbb{R}$ with finite sup-norm $\|v\|_\infty := \sup_{t\in\mathbb{R}} |v(t)|$, and $\mathcal{D}$ be the $\sigma$-algebra on $D$ generated by the one-dimensional coordinate projections $\pi_t$, $t \in \mathbb{R}$, given by $\pi_t(v) := v(t)$. Let $\phi : \mathbb{R} \to [1,\infty)$ be a weight function, i.e., a continuous function which is non-increasing on $(-\infty,0]$ and non-decreasing on $[0,\infty)$. Let $D_\phi$ be the subspace of $D$ consisting of all $v \in D$ satisfying $\|v\|_\phi := \|v\phi\|_\infty < \infty$ and $\lim_{|t|\to\infty} |v(t)| = 0$; the latter condition automatically holds when $\lim_{|t|\to\infty} \phi(t) = \infty$. We equip $D_\phi$ with the trace $\sigma$-algebra of $\mathcal{D}$, and note that this $\sigma$-algebra coincides with the $\sigma$-algebra $\mathcal{B}_\phi^\circ$ on $D_\phi$ generated by the $\|\cdot\|_\phi$-open balls (see Lemma 4.1 in Beutner and Zähle (2016)).

#### 3.1. Average Value at Risk Functional

Using the terminology of Part (i) of Definition A1, we obtain the following result.
Proposition 1.
Let $F \in \mathcal{F}_1$ and assume that $F$ takes the value $\alpha$ only once. Let $\mathcal{S}$ be the set of all sequences $(G_n) \subseteq \mathcal{F}_1$ with $G_n \to F$ pointwise. Moreover, assume that $\int 1/\phi(x)\,\mathrm{d}x < \infty$. Then, the map $\mathcal{R}_\alpha : \mathcal{F}_1\,(\subseteq D) \to \mathbb{R}$ is uniformly quasi-Hadamard differentiable with respect to $\mathcal{S}$ tangentially to $D_\phi \langle D_\phi \rangle$, and the uniform quasi-Hadamard derivative $\dot{\mathcal{R}}_{\alpha;F} : D_\phi \to \mathbb{R}$ is given by
$\dot{\mathcal{R}}_{\alpha;F}(v) := -\int g_\alpha'(F(x))\,v(x)\,\mathrm{d}x, \qquad (13)$
where as before $g_\alpha' := \frac{1}{1-\alpha}\mathbb{1}_{(\alpha,1]}$.
Proposition 1 shows in particular that for any $F ∈ F 1$ which takes the value $α$ only once, the map $R α : F 1 ( ⊆ D ) → R$ is uniformly quasi-Hadamard differentiable at F tangentially to $D ϕ 〈 D ϕ 〉$ (in the sense of Part (ii) of Definition A1) with uniform quasi-Hadamard derivative given by Equation (13).
Proof (of Proposition 1).
First, note that the map $\dot{\mathcal{R}}_{\alpha;F}$ defined in Equation (13) is continuous with respect to $\|\cdot\|_\phi$, because
$|\dot{\mathcal{R}}_{\alpha;F}(v_1) - \dot{\mathcal{R}}_{\alpha;F}(v_2)| \le \int \frac{1}{1-\alpha}\,|v_1(x) - v_2(x)|\,\mathrm{d}x \le \frac{1}{1-\alpha}\Big(\int 1/\phi(x)\,\mathrm{d}x\Big)\,\|v_1 - v_2\|_\phi$
holds for every $v_1, v_2 \in D_\phi$.
Now, let $((F_n), v, (v_n), (\varepsilon_n))$ be a quadruple with $(F_n) \subseteq \mathcal{F}_1$ satisfying $F_n \to F$ pointwise, $v \in D_\phi$, $(v_n) \subseteq D_\phi$ satisfying $\|v_n - v\|_\phi \to 0$ and $(F_n + \varepsilon_n v_n) \subseteq \mathcal{F}_1$, and $(\varepsilon_n) \subseteq (0,\infty)$ satisfying $\varepsilon_n \to 0$. It remains to show that
$\lim_{n\to\infty} \Big| \frac{\mathcal{R}_\alpha(F_n + \varepsilon_n v_n) - \mathcal{R}_\alpha(F_n)}{\varepsilon_n} - \dot{\mathcal{R}}_{\alpha;F}(v) \Big| = 0,$
that is,
$\lim_{n\to\infty} \Big| \int \Big( \frac{g_\alpha(F_n(x)) - g_\alpha((F_n + \varepsilon_n v_n)(x))}{\varepsilon_n} - \big( -\,g_\alpha'(F(x))\,v(x) \big) \Big)\,\mathrm{d}x \Big| = 0. \qquad (14)$
Let us denote the integrand of the integral in Equation (14) by $I_n(x)$. In virtue of $F_n \to F$ pointwise, $\|v_n - v\|_\phi \to 0$, $\varepsilon_n \to 0$, and
$|(F_n + \varepsilon_n v_n)(x) - F(x)| \le |F_n(x) - F(x)| + \varepsilon_n |v_n(x) - v(x)| + \varepsilon_n |v(x)|,$
we have $\lim_{n\to\infty} F_n(x) = F(x)$ and $\lim_{n\to\infty} (F_n(x) + \varepsilon_n v_n(x)) = F(x)$ for every $x \in \mathbb{R}$. Thus, for every $x \in \mathbb{R}$ with $F(x) < \alpha$, we obtain $g_\alpha'(F(x))v(x) = 0$ and, since $g_\alpha$ vanishes on $[0,\alpha]$ while both $F_n(x)$ and $(F_n + \varepsilon_n v_n)(x)$ eventually lie below $\alpha$, the difference quotient in $I_n(x)$ eventually equals zero; i.e., $\lim_{n\to\infty} I_n(x) = 0$. Moreover, for every $x \in \mathbb{R}$ with $F(x) > \alpha$, we obtain $g_\alpha'(F(x))v(x) = \frac{1}{1-\alpha}v(x)$ and, since $g_\alpha$ is affine with slope $\frac{1}{1-\alpha}$ on $(\alpha,1]$ while both $F_n(x)$ and $(F_n + \varepsilon_n v_n)(x)$ eventually exceed $\alpha$, the difference quotient eventually equals $-\frac{1}{1-\alpha}v_n(x) \to -\frac{1}{1-\alpha}v(x)$; i.e., $\lim_{n\to\infty} I_n(x) = 0$. Since we assumed that $F$ takes the value $\alpha$ only once, we can conclude that $\lim_{n\to\infty} I_n(x) = 0$ for Lebesgue-a.e. $x \in \mathbb{R}$. Moreover, by the Lipschitz continuity of $g_\alpha$ with Lipschitz constant $\frac{1}{1-\alpha}$, we have
$|I_n(x)| = |I_n(x)|\,\phi(x)\,\phi(x)^{-1} = \Big| \frac{g_\alpha(F_n(x)) - g_\alpha((F_n + \varepsilon_n v_n)(x))}{\varepsilon_n} + g_\alpha'(F(x))\,v(x) \Big|\,\phi(x)\,\phi(x)^{-1} \le \frac{1}{1-\alpha}\big( \|v_n\|_\phi + \|v\|_\phi \big)\,\phi(x)^{-1} \le \frac{1}{1-\alpha}\big( \sup_{n\in\mathbb{N}} \|v_n\|_\phi + \|v\|_\phi \big)\,\phi(x)^{-1}.$
Since $\sup_{n\in\mathbb{N}} \|v_n\|_\phi < \infty$ (recall $\|v_n - v\|_\phi \to 0$), the assumption $\int 1/\phi(x)\,\mathrm{d}x < \infty$ ensures that the latter expression provides a Borel measurable integrable majorant of $I_n$ which is independent of $n$. Now, the Dominated Convergence Theorem implies Equation (14). ☐
As an immediate consequence of Corollary A4, Examples A1 and A2, and Proposition 1, we obtain the following corollary.
Corollary 1.
Let F, $F ^ n$, $F ^ n ∗$, $C ^ n$, and $B F$ be as in Example A1 (S1. or S2.) or as in Example A2 respectively, and assume that the assumptions discussed in Example A1 or in Example A2 respectively are fulfilled for some weight function ϕ with $∫ 1 / ϕ ( x ) d x < ∞$ (in particular $F ∈ F 1$). Moreover, assume that F takes the value α only once. Then,
and

#### 3.2. Compound Distribution Functional

Let $C_p : \mathcal{F} \to \mathcal{F}$ be the compound distribution functional introduced in Section 2.1. For any $\lambda \ge 0$, let the function $\phi_\lambda : \mathbb{R} \to [1,\infty)$ be defined by $\phi_\lambda(x) := (1 + |x|)^\lambda$, and denote by $\mathcal{F}_{\phi_\lambda}$ the set of all distribution functions $F$ that satisfy $\int \phi_\lambda(x)\,\mathrm{d}F(x) < \infty$. Using the terminology of Part (ii) of Definition A1, we obtain the following Proposition 2. In the proposition, the functional $C_p$ is restricted to the domain $\mathcal{F}_{\phi_\lambda}$ in order to obtain $D_{\phi_{\lambda'}}$ as the corresponding trace. The latter will be important for Corollary 3.
Proposition 2.
Let $\lambda > \lambda' \ge 0$ and $F \in \mathcal{F}_{\phi_\lambda}$. Assume that $\sum_{k=1}^\infty p_k k^{(1+\lambda)\vee 2} < \infty$. Then, the map $C_p : \mathcal{F}_{\phi_\lambda}(\subseteq D) \to \mathcal{F}(\subseteq D)$ is uniformly quasi-Hadamard differentiable at $F$ tangentially to $D_{\phi_\lambda}\langle D_{\phi_\lambda}\rangle$ with trace $D_{\phi_{\lambda'}}$. Moreover, the uniform quasi-Hadamard derivative $\dot C_{p;F} : D_{\phi_\lambda} \to D_{\phi_{\lambda'}}$ is given by
$\dot C_{p;F}(v)(\cdot) := v * H_{p,F}(\cdot) := \int v(\cdot - x)\,\mathrm{d}H_{p,F}(x),$
where as before $H_{p,F} := \sum_{k=1}^\infty k\,p_k\,F^{*(k-1)}$. In particular, if $p_m = 1$ for some $m \in \mathbb{N}$, then
$\dot C_{p;F}(v)(\cdot) = m \int v(\cdot - x)\,\mathrm{d}F^{*(m-1)}(x).$
Proposition 2 extends Proposition 4.1 of Pitts (1994). Before we prove the proposition, we note that the proposition together with Corollary A4 and Examples A1 and A2 yields the following corollary.
Corollary 2.
Let F, $F ^ n$, $F ^ n ∗$, $C ^ n$, and $B F$ be as in Example A1 (S1. or S2.) or as in Example A2 respectively, and assume that the assumptions discussed in Example A1 or in Example A2 respectively are fulfilled for $ϕ = ϕ λ$ for some $λ > 0$. Then, for $λ ′ ∈ [ 0 , λ )$
and
To ease the exposition of the proof of Proposition 2, we first state a lemma that follows from results given in Pitts (1994). In the sequel, we use $f * H$ to denote the function defined by $f * H(\cdot) := \int f(\cdot - x)\,\mathrm{d}H(x)$ for any measurable function $f$ and any distribution function $H$ of a finite (not necessarily probability) Borel measure on $\mathbb{R}$ for which $f * H(\cdot)$ is well defined on $\mathbb{R}$.
Lemma 1.
Let $λ > λ ′ ≥ 0$, and $( F n ) ⊆ F ϕ λ$ and $( G n ) ⊆ F ϕ λ$ be any sequences such that $∥ F n − F ∥ ϕ λ → 0$ and $∥ G n − G ∥ ϕ λ → 0$ for some $F , G ∈ F ϕ λ$. Then, the following two assertions hold.
(i)
There exists a constant $C 1 > 0$ such that for every $k , n ∈ N$
$∥ 𝟙 [ 0 , ∞ ) − F n ∗ k ∥ ϕ λ ′ ≤ ( 2 λ ′ − 1 ∨ 1 ) ( 1 + k λ ′ ∨ 1 C 1 ) .$
(ii)
For every $v ∈ D ϕ λ ′$ there exists a constant $C 2 > 0$ such that for every $k , ℓ , n ∈ N$
$∥ v ∗ ( F n ∗ k ∗ G n ∗ ℓ ) ∥ ϕ λ ′ ≤ 2 λ ′ 1 + 2 λ ′ ( 2 λ ′ − 1 ∨ 1 ) ( 2 + ( k + ℓ ) λ ′ ∨ 1 C 2 ) ∥ v ∥ ϕ λ ′ .$
Proof.
(i): From Equation (2.4) in Pitts (1994) we have
$∥ 𝟙 [ 0 , ∞ ) − F n ∗ k ∥ ϕ λ ′ ≤ ( 2 λ ′ − 1 ∨ 1 ) 1 + k λ ′ ∨ 1 ∫ | x | λ ′ d F n ( x ) ,$
so that it remains to show that $∫ | x | λ ′ d F n ( x )$ is bounded above uniformly in $n ∈ N$. The functions $𝟙 [ 0 , ∞ ) − F n$ and $𝟙 [ 0 , ∞ ) − F$ both lie in $D ϕ λ$, because $F n , F ∈ F ϕ λ$. Along with $∥ F n − F ∥ ϕ λ → 0$, this implies $∫ | x | λ ′ d F n ( x ) → ∫ | x | λ ′ d F ( x )$ (see Lemma 2.1 in Pitts (1994)). Therefore, $∫ | x | λ ′ d F n ( x ) ≤ C 1$ for some suitable finite constant $C 1 > 0$ and all $n ∈ N$.
(ii): With the help of Lemma 2.3 of Pitts (1994) (along with $∥ F n ∗ k ∗ G n ∗ ℓ ∥ ∞ = 1$), Lemma 2.4 of Pitts (1994), and Equation (2.4) in Pitts (1994), we obtain
$∥ v ∗ ( F n ∗ k ∗ G n ∗ ℓ ) ∥ ϕ λ ′ ≤ 2 λ ′ ∥ v ∥ ϕ λ ′ 1 + ∥ 𝟙 [ 0 , ∞ ) − F n ∗ k ∗ G n ∗ ℓ ∥ ϕ λ ′ ≤ 2 λ ′ ∥ v ∥ ϕ λ ′ 1 + 2 λ ′ ∥ 𝟙 [ 0 , ∞ ) − F n ∗ k ∥ ϕ λ ′ + ∥ 𝟙 ( 0 , ∞ ) − G n ∗ ℓ ∥ ϕ λ ′ ≤ 2 λ ′ ∥ v ∥ ϕ λ ′ 1 + 2 λ ′ ( 2 λ ′ − 1 ∨ 1 ) 1 + k λ ′ ∨ 1 ∫ | x | λ ′ d F n ( x ) + 1 + ℓ λ ′ ∨ 1 ∫ | x | λ ′ d G n ( x ) .$
It hence remains to show that $∫ | x | λ ′ d F n ( x )$ and $∫ | x | λ ′ d G n ( x )$ are bounded above uniformly in $n ∈ N$. However, this was already done in the proof of Part (i). ☐
Proof (of Proposition 2).
First, note that for $G 1 , G 2 ∈ F ϕ λ$, we have
$∥ C p ( G 1 ) − C p ( G 2 ) ∥ ϕ λ ′ ≤ ∥ C p ( G 1 ) − 𝟙 [ 0 , ∞ ) ∥ ϕ λ ′ + ∥ 𝟙 [ 0 , ∞ ) − C p ( G 2 ) ∥ ϕ λ ′ ≤ ∫ ( 1 + | x | ) λ ′ d C p ( G 1 ) ( x ) + ∫ ( 1 + | x | ) λ ′ d C p ( G 2 ) ( x )$
by Equation (2.1) in Pitts (1994). Moreover, according to Lemma 2.2 in Pitts (1994), we have that the integrals $∫ | x | λ ′ d C p ( F ) ( x )$ and $∫ | x | λ ′ d C p ( G ) ( x )$ are finite under the assumptions of the proposition. Hence, $D ϕ λ ′$ can indeed be seen as the trace.
Second, we show $( ∥ · ∥ ϕ λ , ∥ · ∥ ϕ λ ′ )$-continuity of the map $C ˙ p ; F : D ϕ λ → D ϕ λ ′$. To this end let $v ∈ D ϕ λ$ and $( v n ) ⊆ D ϕ λ$ such that $∥ v n − v ∥ ϕ λ → 0$. For every $k ∈ N$, we have
$∥ p k k ( v n − v ) ∗ F ∗ ( k − 1 ) ∥ ϕ λ ′ ≤ 2 λ ′ ∥ v n − v ∥ ϕ λ ′ p k k ∥ 𝟙 [ 0 , ∞ ) ∥ F ∗ ( k − 1 ) ∥ ∞ − F ∗ ( k − 1 ) ∥ ϕ λ ′ + ∥ F ∗ ( k − 1 ) ∥ ∞ = 2 λ ′ ∥ v n − v ∥ ϕ λ ′ p k k ∥ 𝟙 [ 0 , ∞ ) − F ∗ ( k − 1 ) ∥ ϕ λ ′ + 1 ≤ 2 λ ′ ∥ v n − v ∥ ϕ λ ′ p k k ( 2 λ ′ − 1 ∨ 1 ) 1 + ( k − 1 ) λ ′ ∨ 1 ∫ | x | λ ′ d F ( x ) + 1 ,$
where the first and the second inequality follow from Lemma 2.3 and Equation (2.4) in Pitts (1994) respectively. Hence,
$∥ C ˙ p ; F ( v n ) − C ˙ p ; F ( v ) ∥ ϕ λ ′ = ∥ v n ∗ H p , F − v ∗ H p , F ∥ ϕ λ ′ ≤ 2 λ ′ ∥ v n − v ∥ ϕ λ ′ ∑ k = 1 ∞ p k k ( 2 λ ′ − 1 ∨ 1 ) 1 + ( k − 1 ) λ ′ ∨ 1 ∫ | x | λ ′ d F ( x ) + 1 .$
Now, the series converges due to the assumptions, and $∥ v n − v ∥ ϕ λ → 0$ implies $∥ v n − v ∥ ϕ λ ′ → 0$. Thus, $∥ C ˙ p ; F ( v n ) − C ˙ p ; F ( v ) ∥ ϕ λ ′ → 0$, which proves continuity.
Third, let $( ( F n ) , v , ( v n ) , ( ε n ) )$ be a quadruple with $( F n ) ⊆ F ϕ λ$ satisfying $∥ F n − F ∥ ϕ λ → 0$, $v ∈ D ϕ λ$, $( v n ) ⊆ D ϕ λ$ satisfying $∥ v n − v ∥ ϕ λ → 0$ and $( F n + ε n v n ) ⊆ F ϕ λ$, and $( ε n ) ⊆ ( 0 , ∞ )$ satisfying $ε n → 0$. It remains to show that
$lim n → ∞ ∥ C p ( F n + ε n v n ) − C p ( F n ) ε n − C ˙ p ; F ( v ) ∥ ϕ λ ′ = 0 .$
To do so, define for $k ∈ N 0$ a map $H k : F × F → F$ by
$H k ( G 1 , G 2 ) : = ∑ j = 0 k − 1 G 1 ∗ ( k − 1 − j ) ∗ G 2 ∗ j$
with the usual convention that the sum over the empty set equals zero. We find that for every $M ∈ N$
$∥ C p ( F n + ε n v n ) − C p ( F n ) ε n − C ˙ p ; F ( v ) ∥ ϕ λ ′ = ∥ 1 ε n ∑ k = 0 ∞ p k ( F n + ε n v n ) ∗ k − ∑ k = 0 ∞ p k F n ∗ k − C ˙ p ; F ( v ) ∥ ϕ λ ′ = ∥ 1 ε n ∑ k = 1 ∞ p k ( F n + ε n v n ) ∗ k − p k F n ∗ k − C ˙ p ; F ( v ) ∥ ϕ λ ′ = ∥ ∑ k = 1 ∞ p k v n ∗ H k ( F n + ε n v n , F n ) − C ˙ p ; F ( v ) ∥ ϕ λ ′ ≤ ∥ ∑ k = M + 1 ∞ p k v n ∗ H k ( F n + ε n v n , F n ) ∥ ϕ λ ′ + ∥ ∑ k = 1 M p k ( v n − v ) ∗ H k ( F n + ε n v n , F n ) ∥ ϕ λ ′ + ∥ v ∗ ∑ k = M + 1 ∞ k p k F ∗ ( k − 1 ) ∥ ϕ λ ′ + ∥ ∑ k = 1 M p k v ∗ H k ( F n + ε n v n , F n ) − k p k v ∗ F ∗ ( k − 1 ) ∥ ϕ λ ′ = : S 1 ( n , M ) + S 2 ( n , M ) + S 3 ( M ) + S 4 ( n , M ) ,$
where for the third “=” we use the fact that for $G 1 , G 2 ∈ F$
$( G 1 − G 2 ) ∗ H k ( G 1 , G 2 ) = G 1 ∗ k − G 2 ∗ k .$
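As a quick sanity check of this telescoping identity (a worked case we add for intuition, not part of the original argument), specialize to $k = 2$: since $G ∗ 0 = 𝟙 [ 0 , ∞ )$ is the unit for convolution, $H 2 ( G 1 , G 2 ) = G 1 + G 2$, and bilinearity together with commutativity of convolution yield

```latex
(G_1 - G_2) * H_2(G_1, G_2)
  = (G_1 - G_2) * (G_1 + G_2)
  = G_1^{*2} + G_1 * G_2 - G_2 * G_1 - G_2^{*2}
  = G_1^{*2} - G_2^{*2}.
```

For general k, the cross terms $G 1 ∗ ( k − j ) ∗ G 2 ∗ j$ cancel telescopically in the same way.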
By Part (ii) of Lemma 1 (which can be applied here since $∥ F n + ε n v n − F ∥ ϕ λ → 0$), there exists a constant $C 2 > 0$ such that for all $n ∈ N$
$S 1 ( n , M ) = ∥ ∑ k = M + 1 ∞ p k v n ∗ H k ( F n + ε n v n , F n ) ∥ ϕ λ ′ ≤ 2 λ ′ ∥ v n ∥ ϕ λ ′ ∑ k = M + 1 ∞ p k k 1 + 2 λ ′ ( 2 λ ′ − 1 ∨ 1 ) 2 + ( k − 1 ) λ ′ ∨ 1 C 2 .$
Since $λ ′ < λ$ and $∥ v n − v ∥ ϕ λ → 0$, we have $∥ v n ∥ ϕ λ ′ ≤ K 1$ for some finite constant $K 1 > 0$ and all $n ∈ N$. Hence, the right-hand side of Equation (17) can be made arbitrarily small by choosing M large enough. That is, $S 1 ( n , M )$ can be made arbitrarily small uniformly in $n ∈ N$ by choosing M large enough.
Furthermore, it is demonstrated in the proof of Proposition 4.1 of Pitts (1994) that $S 3 ( M )$ can be made arbitrarily small by choosing M large enough.
Next, applying again Part (ii) of Lemma 1, we obtain
$S 2 ( n , M ) = ∥ ∑ k = 1 M p k ( v n − v ) ∗ H k ( F n + ε n v n , F n ) ∥ ϕ λ ′ ≤ 2 λ ′ ∑ k = 1 M p k k ∥ v n − v ∥ ϕ λ ′ 1 + 2 λ ′ ( 2 λ ′ − 1 ∨ 1 ) 2 + ( k − 1 ) λ ′ ∨ 1 C 2 .$
Using $∥ v n − v ∥ ϕ λ ′ ≤ ∥ v n − v ∥ ϕ λ → 0$, this term tends to zero as $n → ∞$ for a given M.
It remains to consider the summand
$S 4 ( n , M ) = ∥ ∑ k = 1 M p k v ∗ H k ( F n + ε n v n , F n ) − k p k v ∗ F ∗ ( k − 1 ) ∥ ϕ λ ′ = ∥ ∑ k = 1 M p k ∑ ℓ = 0 k − 1 v ∗ ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ − v ∗ F ∗ ( k − 1 ) ∥ ϕ λ ′ .$
We show that for M fixed this term can be made arbitrarily small by letting $n → ∞$. This would follow if for every given $k ∈ { 1 , … , M }$ and $ℓ ∈ { 0 , … , k − 1 }$ the expression
$∥ v ∗ ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ − v ∗ F ∗ ( k − 1 ) ∥ ϕ λ ′$
could be made arbitrarily small by letting $n → ∞$. For every such k and $ℓ$, we can find a linear combination of indicator functions of the form $𝟙 [ a , b )$, $− ∞ < a < b < ∞$, which we denote by $v ˜$, such that
$∥ v ∗ ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ − v ∗ F ∗ ( k − 1 ) ∥ ϕ λ ′ ≤ ∥ v ∗ ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ − v ˜ ∗ ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ ∥ ϕ λ ′ + ∥ v ˜ ∗ ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ − v ˜ ∗ F ∗ ( k − 1 ) ∥ ϕ λ ′ + ∥ v ˜ ∗ F ∗ ( k − 1 ) − v ∗ F ∗ ( k − 1 ) ∥ ϕ λ ′ ≤ 2 λ ′ ∥ v ˜ − v ∥ ϕ λ ′ ∥ 𝟙 [ 0 , ∞ ) − ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ ∥ ϕ λ ′ + 1 + c ( λ ′ , v ˜ ) ∥ ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ − F ∗ ( k − 1 ) ∥ ϕ λ ′ + 2 λ ′ ∥ v ˜ − v ∥ ϕ λ ′ ∥ 𝟙 [ 0 , ∞ ) − F ∗ ( k − 1 ) ∥ ϕ λ ′ + 1$
for some suitable finite constant $c ( λ ′ , v ˜ ) > 0$ depending only on $λ ′$ and $v ˜$. The first inequality in Equation (18) is obvious (and holds for any $v ˜ ∈ D ϕ λ ′$). The second inequality in Equation (18) is obtained by applying Lemma 2.3 of Pitts (1994) to the first summand (noting that $∥ ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ ∥ ∞ = 1$; recall $F n + ε n v n ∈ F$), by applying Lemma 4.3 of Pitts (1994) to the second summand (which requires that $v ˜$ is as described above), and by applying Lemma 2.3 of Pitts (1994) to the third summand.
We now consider the three summands on the right-hand side of Equation (18) separately. We start with the third term. Since $v ∈ D ϕ λ$, Lemma 4.2 of Pitts (1994) ensures that we may assume that $v ˜$ is chosen such that $∥ v ˜ − v ∥ ϕ λ ′$ is arbitrarily small. Hence, for fixed M the third summand in Equation (18) can be made arbitrarily small.
We next consider the second summand in Equation (18). Obviously,
$∥ ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ − F ∗ ( k − 1 ) ∥ ϕ λ ′ = ∥ ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ − F n ∗ ( k − 1 ) + F n ∗ ( k − 1 ) − F ∗ ( k − 1 ) ∥ ϕ λ ′ ≤ ∥ ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) − F n ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ ∥ ϕ λ ′ + ∥ F n ∗ ( k − 1 ) − F ∗ ( k − 1 ) ∥ ϕ λ ′ .$
We start by considering the first summand in Equation (19). In view of Equation (16), it can be written as
$∥ ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) − F n ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ ∥ ϕ λ ′ = ∥ ( F n + ε n v n − F n ) ∗ H k − 1 − ℓ ( F n + ε n v n , F n ) ∗ F n ∗ ℓ ∥ ϕ λ ′ = ∥ ε n v n ∗ H k − 1 − ℓ ( F n + ε n v n , F n ) ∗ F n ∗ ℓ ∥ ϕ λ ′ .$
Applying Lemma 2.3 of Pitts (1994) with $f = ε n v n ∗ H k − ℓ − 1 ( F n + ε n v n , F n )$ and $H = F n ∗ ℓ$ we obtain
$∥ ε n v n ∗ H k − 1 − ℓ ( F n + ε n v n , F n ) ∗ F n ∗ ℓ ∥ ϕ λ ′ ≤ 2 λ ′ ∥ ε n v n ∗ H k − ℓ − 1 ( F n + ε n v n , F n ) ∥ ϕ λ ′ ∥ 𝟙 [ 0 , ∞ ) ∥ F n ∗ ℓ ∥ ∞ − F n ∗ ℓ ∥ ϕ λ ′ + ∥ F n ∗ ℓ ∥ ∞ = 2 λ ′ ∥ ε n v n ∗ H k − ℓ − 1 ( F n + ε n v n , F n ) ∥ ϕ λ ′ ∥ 𝟙 [ 0 , ∞ ) − F n ∗ ℓ ∥ ϕ λ ′ + 1 ≤ 2 λ ′ ∥ ε n v n ∗ H k − ℓ − 1 ( F n + ε n v n , F n ) ∥ ϕ λ ′ ( 2 λ ′ − 1 ∨ 1 ) 1 + ℓ λ ′ ∨ 1 C 1 + 1 ,$
where we applied Part (i) of Lemma 1 to $∥ 𝟙 [ 0 , ∞ ) − F n ∗ ℓ ∥ ϕ λ ′$ to obtain the last inequality. Hence, for the left-hand side of Equation (20) to go to zero as $n → ∞$ it suffices to show that $∥ ( ε n v n ∗ H k − ℓ − 1 ( F n + ε n v n , F n ) ) ∥ ϕ λ ′ → 0$ as $n → ∞$. The latter follows from
$∥ ε n v n ∗ H k − ℓ − 1 ( F n + ε n v n , F n ) ∥ ϕ λ ′ ≤ 2 λ ′ ( k − ℓ − 1 ) ε n ∥ v n ∥ ϕ λ ′ 1 + 2 λ ′ ( 2 λ ′ − 1 ∨ 1 ) 2 + ( ( k − ℓ − 2 ) ) λ ′ ∨ 1 C 2 ,$
where we applied Part (ii) of Lemma 1 with $v = ε n v n$ to all summands in $H k − ℓ − 1 ( F n + ε n v n , F n )$. For every k and $ℓ ∈ { 0 , … , k − 1 }$ this expression goes indeed to zero as $n → ∞$, because, as mentioned before, $∥ v n ∥ ϕ λ ′$ is uniformly bounded in $n ∈ N$, and we have $ε n → 0$. Next, we consider the second summand in Equation (19). Applying Equation (16) to $F n ∗ ( k − 1 )$ and $F ∗ ( k − 1 )$ and subsequently Part (ii) of Lemma 1 to the summands in $H k − 1 ( F n , F )$, we have
$∥ F n ∗ ( k − 1 ) − F ∗ ( k − 1 ) ∥ ϕ λ ′ ≤ 2 λ ′ ( k − 1 ) ∥ F n − F ∥ ϕ λ ′ 1 + 2 λ ′ ( 2 λ ′ − 1 ∨ 1 ) ( 2 + ( ( k − 2 ) ) λ ′ ∨ 1 C 2 ) .$
Clearly, for every k, this term goes to zero as $n → ∞$, because $∥ F n − F ∥ ϕ λ ′ ≤ ∥ F n − F ∥ ϕ λ → 0$ as $n → ∞$ by assumption. This, together with the fact that Equation (20) goes to zero as $n → ∞$, shows that Equation (19) goes to zero in $∥ · ∥ ϕ λ ′$ as $n → ∞$. Therefore, the second summand in Equation (18) goes to zero as $n → ∞$.
It remains to consider the first term in Equation (18). We find
$2 λ ′ ∥ v ˜ − v ∥ ϕ λ ∥ 𝟙 [ 0 , ∞ ) − ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ ∥ ϕ λ ′ + 1 ≤ 2 λ ′ ∥ v ˜ − v ∥ ϕ λ ′ ∥ 𝟙 [ 0 , ∞ ) − ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ ∥ ϕ λ ′ + 1 ≤ 2 λ ′ ∥ v ˜ − v ∥ ϕ λ ′ ∥ 𝟙 [ 0 , ∞ ) − F ∗ ( k − 1 ) + F ∗ ( k − 1 ) − ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ ∥ ϕ λ ′ + 1 ≤ 2 λ ′ ∥ v ˜ − v ∥ ϕ λ ′ ∥ 𝟙 [ 0 , ∞ ) − F ∗ ( k − 1 ) ∥ ϕ λ ′ + ∥ F ∗ ( k − 1 ) − ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ ∥ ϕ λ ′ + 1 ≤ 2 λ ′ ∥ v ˜ − v ∥ ϕ λ ′ ( 2 λ ′ − 1 ∨ 1 ) 1 + k λ ∨ 1 ∫ | x | λ ′ d F ( x ) + 2 λ ′ ∥ v ˜ − v ∥ ϕ λ ′ ∥ F ∗ ( k − 1 ) − ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ ∥ ϕ λ ′ + 1 ,$
where for the last inequality we used Formula (2.4) of Pitts (1994). In the discussion following Equation (19), we showed that $∥ F ∗ ( k − 1 ) − ( F n + ε n v n ) ∗ ( k − 1 − ℓ ) ∗ F n ∗ ℓ ∥ ϕ λ ′$ goes to zero as $n → ∞$ for every k and $ℓ ∈ { 0 , … , k − 1 }$. Hence, for every such k and $ℓ$, it is uniformly bounded in $n ∈ N$. Therefore, we can make Equation (22) arbitrarily small by making $∥ v ˜ − v ∥ ϕ λ ′$ small, which, as mentioned above, is possible according to Lemma 4.2 of Pitts (1994). This finishes the proof. ☐

#### 3.3. Composition of Average Value at Risk Functional and Compound Distribution Functional

Here, we consider the composition of the Average Value at Risk functional $R α$ defined in Equation (1) and the compound distribution functional $C p$ introduced in Section 2.1. As a consequence of Propositions 1 and 2, we obtain the following Corollary 3. Note that, for any $λ > 1$, Lemma 2.2 in Pitts (1994) yields $C p ( F ϕ λ ) ⊆ F 1$ so that the composition $R α ∘ C p$ is well defined on $F ϕ λ$.
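For intuition, the composition $R α ∘ C p$ can be explored by simulation. The following Python sketch is an illustration only, not the paper's estimator: it draws from a compound distribution $C p ( F )$ with Poisson claim-count weights $p k$ and exponential claim-size distribution F, and evaluates an empirical Average Value at Risk of the aggregate loss, using the common convention that AV@R at level α is the average of the worst α-fraction of outcomes (conventions differ; the paper's exact definition of $R α$ is given in Equation (1)). All function names are ours.

```python
import random

def simulate_compound(n_paths, claim_rate=2.0, claim_mean=1.0, seed=0):
    """Draw n_paths realizations of S = X_1 + ... + X_N with
    N ~ Poisson(claim_rate) and i.i.d. X_i ~ Exp(mean=claim_mean),
    i.e., samples from the compound distribution C_p(F)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_paths):
        # Poisson count via exponential inter-arrival times on [0, 1)
        n_claims, t = 0, rng.expovariate(claim_rate)
        while t < 1.0:
            n_claims += 1
            t += rng.expovariate(claim_rate)
        # sum over an empty range is 0.0 (no claims => no loss)
        samples.append(sum(rng.expovariate(1.0 / claim_mean)
                           for _ in range(n_claims)))
    return samples

def empirical_avar(samples, alpha=0.05):
    """Empirical AV@R: mean of the worst alpha-fraction of outcomes."""
    xs = sorted(samples, reverse=True)
    k = max(1, int(alpha * len(xs)))
    return sum(xs[:k]) / k

losses = simulate_compound(10_000)
avar_05 = empirical_avar(losses, alpha=0.05)
```

Replacing the exponential claim-size sampler by draws from an empirical distribution function $F ^ n$ turns this into a Monte Carlo version of the plug-in estimation studied in this section.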
Corollary 3.
Let $λ > 1$ and assume $∑ k = 1 ∞ p k k 1 + λ < ∞$. Let $F ∈ F ϕ λ$, and assume that $C p ( F )$ takes the value α only once. Then, the map $T α , p : = R α ∘ C p : F ϕ λ ( ⊆ D ) → R$ is uniformly quasi-Hadamard differentiable at F tangentially to $D ϕ λ 〈 D ϕ λ 〉$, and the uniform quasi-Hadamard derivative $T ˙ α , p ; F : D ϕ λ → R$ is given by $T ˙ α , p ; F = R ˙ α ; C p ( F ) ∘ C ˙ p ; F$, i.e.,
with $g α ′$ and $v ∗ H p , F$ as in Propositions 1 and 2, respectively.
Proof.
We intend to apply Lemma A1 to $H = C p : F ϕ λ → F 1$ and $H ˜ = R α : F 1 → R$. To verify that the assumptions of the lemma are fulfilled, we first recall from the comment directly before Corollary 3 that $C p ( F ϕ λ ) ⊆ F 1$. It remains to show that the Assumptions (a)–(c) of Lemma A1 are fulfilled. According to Proposition 2, we have that for every $λ ′ ∈ ( 1 , λ )$ the functional $C p$ is uniformly quasi-Hadamard differentiable at F tangentially to $D ϕ λ 〈 D ϕ λ 〉$ with trace $D ϕ λ ′$, which is the first part of Assumption (b). The second part of Assumption (b) means $C ˙ p ; F ( D ϕ λ ) ⊆ D ϕ λ ′$ and follows from
$∥ C ˙ p ; F ( v ) ∥ ϕ λ ′ = ∥ v ∗ ∑ k = 1 ∞ p k k F ∗ ( k − 1 ) ∥ ϕ λ ′ ≤ 2 λ ′ ∥ v ∥ ϕ λ ′ ∑ k = 1 ∞ p k k 1 + ( 2 λ ′ − 1 ∨ 1 ) 1 + k λ ′ ∨ 1 ∫ | x | λ ′ d F ( x )$
(for which we applied Lemma 2.3 and Inequality (2.4) in Pitts (1994)), the convergence of the latter series (which holds by assumption), and $∥ v ∥ ϕ λ ′ ≤ ∥ v ∥ ϕ λ < ∞$. Further, it follows from Proposition 1 that the map $R α$ is uniformly quasi-Hadamard differentiable tangentially to $D ϕ λ ′ 〈 D ϕ λ ′ 〉$ at every distribution function of $F ϕ λ ′$ that takes the value $1 − α$ only once. This is Assumption (c) of Lemma A1.
It remains to show that Assumption (a) of Lemma A1 also holds true. In the present setting, Assumption (a) means that for every sequence $( F n ) ⊆ F ϕ λ$ with $∥ F n − F ∥ ϕ λ → 0$ we have $C p ( F n ) → C p ( F )$ pointwise. We show that we even have $∥ C p ( F n ) − C p ( F ) ∥ ϕ λ ′ → 0$. Thus, let $( F n ) ⊆ F ϕ λ$. Then,
$∥ C p ( F n ) − C p ( F ) ∥ ϕ λ ′ = ∥ ∑ k = 1 ∞ p k ( F n ∗ k − F ∗ k ) ∥ ϕ λ ′ = ∥ ( F n − F ) ∗ ∑ k = 1 ∞ p k H k ( F n , F ) ∥ ϕ λ ′ ≤ 2 λ ′ ∥ F n − F ∥ ϕ λ ′ ∑ k = 1 ∞ p k k 1 + 2 λ ′ ( 2 λ ′ − 1 ∨ 1 ) 2 + ( k − 1 ) λ ′ ∨ 1 C 2 ,$
where we used Equation (16) for the second “=” and applied Part (ii) of Lemma 1 to the summands of $H k$ to obtain the latter inequality. Since the series converges, we obtain $∥ C p ( F n ) − C p ( F ) ∥ ϕ λ ′ → 0$ when assuming $∥ F n − F ∥ ϕ λ → 0$. ☐
As an immediate consequence of Corollary A4, Examples A1 and A2, and Corollary 3, we obtain the following corollary.
Corollary 4.
Let F, $F ^ n$, $F ^ n ∗$, $C ^ n$, and $B F$ be as in Example A1 (S1. or S2.) or as in Example A2, respectively, and assume that the assumptions discussed in Example A1 or in Example A2 respectively are fulfilled for $ϕ = ϕ λ$ for some $λ > 1$ (in particular $F ∈ F 1$). Moreover, assume $∑ k = 1 ∞ p k k 1 + λ < ∞$ and that $C p ( F )$ takes the value α only once. Then,
and

## 4. Conclusions

In this paper, we considered the sub-additive risk measure Average Value at Risk and presented, in Section 2.1 and Section 2.2, results on almost sure bootstrap consistency for the corresponding empirical plug-in estimator based on i.i.d. or strictly stationary, geometrically $β$-mixing observations. Our results supplement those by Beutner and Zähle (2016) on bootstrap consistency in probability and those by Sun and Cheng (2018) on bootstrap consistency in probability for the Tail Conditional Expectation (which is not sub-additive). In Section 2.1, we also looked at the case where one is interested in the Average Value at Risk in the collective risk model. Note that one might interpret the collective risk model as a pooling of independent risks. In the context of Solvency II, pooling of risks has received increased attention (see, for example, Bølviken and Guillen 2017). However, one should keep in mind that our results of Section 2.1 typically cannot be applied in the Solvency II context. In Solvency II applications, risks are usually dependent, whereas in the collective risk model the different risks (claims) are assumed to be independent.

## Author Contributions

Both authors contributed equally to all sections of the article.

## Funding

This research received no external funding.

## Conflicts of Interest

The authors declare no conflict of interest.

## Appendix A. Convergence in Distribution∘

Let $( E , d )$ be a metric space and $B ∘$ be the $σ$-algebra on $E$ generated by the open balls $B r ( x ) : = { y ∈ E : d ( x , y ) < r }$, $x ∈ E$, $r > 0$. We refer to $B ∘$ as open-ball σ-algebra. If $( E , d )$ is separable, then $B ∘$ coincides with the Borel $σ$-algebra $B$. If $( E , d )$ is not separable, then $B ∘$ might be strictly smaller than $B$ and thus a continuous real-valued function on $E$ is not necessarily $( B ∘ , B ( R ) )$-measurable. Let $C b ∘$ be the set of all bounded, continuous and $( B ∘ , B ( R ) )$-measurable real-valued functions on $E$, and $M 1 ∘$ be the set of all probability measures on $( E , B ∘ )$.
Let $X n$ be an $( E , B ∘ )$-valued random variable on some probability space $( Ω n , F n , P n )$ for every $n ∈ N 0$. Then, referring to Billingsley (1999, sct. 1.6), the sequence $( X n ) = ( X n ) n ∈ N$ is said to converge in distribution$∘$ to $X 0$ if $lim n → ∞ ∫ f ( X n ) d P n = ∫ f ( X 0 ) d P 0$ for every $f ∈ C b ∘$.
In this case, we write $X n ⇝ ∘ X 0$. This is the same as saying that the sequence $( P n ∘ X n − 1 )$ converges to $P 0 ∘ X 0 − 1$ in the weak$∘$ topology on $M 1 ∘$; for details see Appendix A of Beutner and Zähle (2016). It is worth mentioning that two probability measures $μ , ν ∈ M 1 ∘$ coincide if $μ [ E 0 ] = ν [ E 0 ] = 1$ for some separable $E 0 ∈ B ∘$ and $∫ f d μ = ∫ f d ν$ for all uniformly continuous $f ∈ C b ∘$ (see, for instance, (Billingsley 1999, Theorem 6.2)).
In Appendices A–C in Beutner and Zähle (2016), several properties of convergence in distribution$∘$ (and weak$∘$ convergence) have been discussed. The following two subsections complement this discussion.

#### Appendix A.1. Slutsky-Type Results for the Open-Ball σ-Algebra

For a sequence $( X n )$ of $( E , B ∘ )$-valued random variables that are all defined on the same probability space $( Ω , F , P )$, the sequence $( X n )$ is said to converge in probability$∘$ to $X 0$ if the mappings $ω ↦ d ( X n ( ω ) , X 0 ( ω ) )$, $n ∈ N$, are $( F , B ( R + ) )$-measurable and satisfy $lim n → ∞ P [ d ( X n , X 0 ) ≥ ε ] = 0$ for every $ε > 0$.
In this case, we write $X n → p , ∘ X 0$. The superscript $∘$ points to the fact that measurability of the mapping $ω ↦ d ( X n ( ω ) , X 0 ( ω ) )$ is a requirement of the definition (and not automatically valid). Note, however, that in the specific situation where $X 0 ≡ x 0$ for some $x 0 ∈ E$, measurability of the mapping $ω ↦ d ( X n ( ω ) , X 0 ( ω ) )$ does hold (see Lemma B.3 in Beutner and Zähle (2016)). In addition, note that the measurability always holds when $( E , d )$ is separable; in this case, we also write $→ p$ instead of $→ p , ∘$.
Theorem A1.
Let $( X n )$ and $( Y n )$ be two sequences of $( E , B ∘ )$-valued random variables on a common probability space $( Ω , F , P )$, and assume that the mapping $ω ↦ d ( X n ( ω ) , Y n ( ω ) )$ is $( F , B ( R + ) )$-measurable for every $n ∈ N$. Let $X 0$ be an $( E , B ∘ )$-valued random variable on some probability space $( Ω 0 , F 0 , P 0 )$ with $P 0 [ X 0 ∈ E 0 ] = 1$ for some separable $E 0 ∈ B ∘$. Then, $X n ⇝ ∘ X 0$ and $d ( X n , Y n ) → p 0$ together imply $Y n ⇝ ∘ X 0$.
Proof.
In view of $X n ⇝ ∘ X 0$, we obtain for every fixed $f ∈ BL 1 ∘$
$lim sup n → ∞ | ∫ f d P Y n − ∫ f d P X 0 | ≤ lim sup n → ∞ | ∫ f d P Y n − ∫ f d P X n | + lim sup n → ∞ | ∫ f d P X n − ∫ f d P X 0 | ≤ lim sup n → ∞ ∫ | f ( Y n ) − f ( X n ) | d P .$
Since f lies in $BL 1 ∘$ and we assumed $d ( X n , Y n ) → p 0$, we also have
$lim sup n → ∞ ∫ | f ( Y n ) − f ( X n ) | d P ≤ lim sup n → ∞ ∫ | f ( Y n ) − f ( X n ) | 𝟙 { d ( X n , Y n ) ≥ ε } d P + 2 ε ≤ 2 lim sup n → ∞ P [ d ( X n , Y n ) ≥ ε ] + 2 ε$
for every $ε > 0$. Thus, $lim sup n → ∞ ∫ | f ( Y n ) − f ( X n ) | d P = 0$ which together with the Portmanteau theorem (in the form of (Beutner and Zähle 2016, Theorem A.4)) implies the claim. ☐
Set $E ¯ : = E × E$ and let $B ¯ ∘$ be the $σ$-algebra on $E ¯$ generated by the open balls with respect to the metric
$d ¯ ( ( x 1 , x 2 ) , ( y 1 , y 2 ) ) : = max { d ( x 1 , y 1 ) ; d ( x 2 , y 2 ) } .$
Recall that $B ¯ ∘ ⊆ B ∘ ⊗ B ∘$, where the inclusion may be strict.
Corollary A1.
Let $( X n )$ and $( Y n )$ be two sequences of $( E , B ∘ )$-valued random variables on a common probability space $( Ω , F , P )$. Let $X 0$ be an $( E , B ∘ )$-valued random variable on some probability space $( Ω 0 , F 0 , P 0 )$ with $P 0 [ X 0 ∈ E 0 ] = 1$ for some separable $E 0 ∈ B ∘$. Let $y 0 ∈ E 0$. Let $( E ˜ , d ˜ )$ be a metric space equipped with the corresponding open-ball σ-algebra $B ˜ ∘$. Then, $X n ⇝ ∘ X 0$ and $Y n → p , ∘ y 0$ together imply:
(i)
$( X n , Y n ) ⇝ ∘ ( X 0 , y 0 )$.
(ii)
$h ( X n , Y n ) ⇝ ∘ h ( X 0 , y 0 )$ for every continuous and $( B ¯ ∘ , B ˜ ∘ )$-measurable $h : E ¯ → E ˜$.
Proof.
Assertion (ii) is an immediate consequence of Assertion (i) and the Continuous Mapping theorem in the form of (Billingsley 1999, Theorem 6.4); take into account that $( X 0 , y 0 )$ takes values only in $E ¯ 0 : = E 0 × E 0$ and that $E 0 × E 0$ is separable with respect to $d ¯$. Thus, it suffices to show Assertion (i). First note that we have
$( X n , y 0 ) ⇝ ∘ ( X 0 , y 0 ) .$
Indeed, for every $f ∈ C ¯ b ∘$ (with $C ¯ b ∘$ the set of all bounded, continuous and $( B ¯ ∘ , B ( R ) )$-measurable real-valued functions on $E ¯$) we have $lim n → ∞ ∫ f ( X n , y 0 ) d P = ∫ f ( X 0 , y 0 ) d P 0$ by the assumption $X n ⇝ ∘ X 0$ and the fact that the mapping $x ↦ f ( x , y 0 )$ lies in $C b ∘$ (the latter was shown in the proof of Theorem 3.1 in Beutner and Zähle (2016)).
Second, the distance $d ¯ ( ( X n , Y n ) , ( X n , y 0 ) ) = d ( Y n , y 0 )$ is $( F , B ( R + ) )$-measurable for every $n ∈ N$, because $Y n$ is $( F , B ∘ )$-measurable and $x ↦ d ( x , y 0 )$ is $( B ∘ , B ( R ) )$-measurable (due to Lemma B.3 in Beutner and Zähle (2016)). Along with $Y n → p , ∘ y 0$, we obtain in particular that $d ¯ ( ( X n , Y n ) , ( X n , y 0 ) ) → p 0$. Together with Equation (A2) and Theorem A1 (applied to $X n ′ : = ( X n , y 0 )$, $X 0 ′ : = ( X 0 , y 0 )$, $Y n ′ : = ( X n , Y n )$), this implies $( X n , Y n ) ⇝ ∘ ( X 0 , y 0 )$; take into account again that $( X 0 , y 0 )$ takes values only in $E ¯ 0 : = E 0 × E 0$ and that $E 0 × E 0$ is separable with respect to $d ¯$. ☐
Corollary A2.
Let $( E , ∥ · ∥ E )$ be a normed vector space and d be the induced metric defined by $d ( x 1 , x 2 ) : = ∥ x 1 − x 2 ∥ E$. Let $( X n )$ and $( Y n )$ be two sequences of $( E , B ∘ )$-valued random variables on a common probability space $( Ω , F , P )$. Let $X 0$ be an $( E , B ∘ )$-valued random variable on some probability space $( Ω 0 , F 0 , P 0 )$ with $P 0 [ X 0 ∈ E 0 ] = 1$ for some separable $E 0 ∈ B ∘$. Let $y 0 ∈ E 0$. Assume that the map $h : E ¯ → E$ defined by $h ( x 1 , x 2 ) : = x 1 + x 2$ is $( B ¯ ∘ , B ∘ )$-measurable. Then, $X n ⇝ ∘ X 0$ and $Y n → p , ∘ y 0$ together imply $X n + Y n ⇝ ∘ X 0 + y 0$.
Proof.
The assertion is an immediate consequence of Corollary A1 and the fact that h is clearly continuous. ☐

#### Appendix A.2. Delta-Method and Chain Rule for Uniformly Quasi-Hadamard Differentiable Maps

Now, assume that $E$ is a subspace of a vector space $V$. Let $∥ · ∥ E$ be a norm on $E$ and assume that the metric d is induced by $∥ · ∥ E$. Let $V ˜$ be another vector space and $E ˜ ⊆ V ˜$ be any subspace. Let $∥ · ∥ E ˜$ be a norm on $E ˜$ and $B ˜ ∘$ be the corresponding open-ball $σ$-algebra on $E ˜$. Let $0 E ˜$ denote the null in $E ˜$. Moreover, let $E ˜ ¯ : = E ˜ × E ˜$ and $B ˜ ∘ ¯$ be the $σ$-algebra on $E ˜ ¯$ generated by the open balls with respect to the metric $d ˜ ¯ ( ( x ˜ 1 , x ˜ 2 ) , ( y ˜ 1 , y ˜ 2 ) ) : = max { ∥ x ˜ 1 − y ˜ 1 ∥ E ˜ ; ∥ x ˜ 2 − y ˜ 2 ∥ E ˜ }$.
Let $( Ω n , F n , P n )$ be a probability space and $T ^ n : Ω n → V$ be any map for every $n ∈ N$. Recall that $⇝ ∘$ and $→ p , ∘$ refer to convergence in distribution$∘$ and convergence in probability$∘$, respectively. Moreover, recall Definition A1 of quasi-Hadamard differentiability.
Theorem A2.
Let $H : V H → E ˜$ be a map defined on some $V H ⊆ V$. Let $E 0 ∈ B ∘$ be some $∥ · ∥ E$-separable subset of $E$. Let $( θ n ) ⊆ V H$ and define the singleton set $S : = { ( θ n ) }$. Let $( a n )$ be a sequence of positive real numbers tending to ∞, and consider the following conditions:
(a)
$T ^ n$ takes values only in $V H$.
(b)
$a n ( T ^ n − θ n )$ takes values only in $E$, is $( F n , B ∘ )$-measurable and satisfies $a n ( T ^ n − θ n ) ⇝ ∘ ξ$
for some $( E , B ∘ )$-valued random variable ξ on some probability space $( Ω 0 , F 0 , P 0 )$ with $ξ ( Ω 0 ) ⊆ E 0$.
(c)
$a n ( H ( T ^ n ) − H ( θ n ) )$ takes values only in $E ˜$ and is $( F n , B ˜ ∘ )$-measurable.
(d)
The map H is uniformly quasi-Hadamard differentiable with respect to $S$ tangentially to $E 0 〈 E 〉$ with trace $E ˜$ and uniform quasi-Hadamard derivative $H ˙ S : E 0 → E ˜$.
(e)
$( Ω n , F n , P n ) = ( Ω , F , P )$ for all $n ∈ N$.
(f)
The uniform quasi-Hadamard derivative $H ˙ S$ can be extended to $E$ such that the extension $H ˙ S : E → E ˜$ is continuous at every point of $E 0$ and $( B ∘ , B ˜ ∘ )$-measurable.
(g)
The map $h : E ˜ ¯ → E ˜$ defined by $h ( x ˜ 1 , x ˜ 2 ) : = x ˜ 1 − x ˜ 2$ is $( B ˜ ∘ ¯ , B ˜ ∘ )$-measurable.
Then, the following two assertions hold:
(i)
If Conditions (a)–(d) hold true, then $H ˙ S ( ξ )$ is $( F 0 , B ˜ ∘ )$-measurable and $a n ( H ( T ^ n ) − H ( θ n ) ) ⇝ ∘ H ˙ S ( ξ )$.
(ii)
If Conditions (a)–(g) hold true, then $a n ( H ( T ^ n ) − H ( θ n ) ) − H ˙ S ( a n ( T ^ n − θ n ) ) → p , ∘ 0 E ˜$.
Proof.
The proof is very similar to the proof of Theorem C.4 in Beutner and Zähle (2016).
(i): For every $n ∈ N$, let $E n : = { x n ∈ E : θ n + a n − 1 x n ∈ V H }$ and define the map $h n : E n → E ˜$ by
$h n ( x n ) : = H ( θ n + a n − 1 x n ) − H ( θ n ) a n − 1 .$
Moreover, define the map $h 0 : E 0 → E ˜$ by
$h 0 ( x ) : = H ˙ S ( x ) .$
Now, the claim would follow by the extended Continuous Mapping theorem in the form of Theorem C.1 in Beutner and Zähle (2016) applied to the functions $h n$, $n ∈ N 0$, and the random variables $ξ n : = a n ( T ^ n − θ n )$, $n ∈ N$, and $ξ 0 : = ξ$ if we can show that the assumptions of Theorem C.1 in Beutner and Zähle (2016) are satisfied. First, by Assumption (a) and the last part of Assumption (b), we have $ξ n ( Ω n ) ⊆ E n$ and $ξ 0 ( Ω 0 ) ⊆ E 0$. Second, by Assumption (c), we have that $h n ( ξ n ) = a n ( H ( T ^ n ) − H ( θ n ) )$ is $( F n , B ˜ ∘ )$-measurable. Third, the map $h 0$ is continuous by the definition of the quasi-Hadamard derivative. Thus, $h 0$ is $( B 0 ∘ , B ˜ ∘ )$-measurable, because the trace $σ$-algebra $B 0 ∘ : = B ∘ ∩ E 0$ coincides with the Borel $σ$-algebra on $E 0$ (recall that $E 0$ is separable). In particular, $H ˙ S ( ξ )$ is $( F 0 , B ˜ ∘ )$-measurable. Fourth, Condition (a) of Theorem C.1 in Beutner and Zähle (2016) holds by Assumption (b). Fifth, Condition (b) of Theorem C.1 in Beutner and Zähle (2016) is ensured by Assumption (d).
(ii): For every $n ∈ N$, let $E n$ and $h n$ be as above and define the map $h ¯ n : E n → E ˜ ¯$ by
$h ¯ n ( x n ) : = ( h n ( x n ) , H ˙ S ( x n ) ) .$
Moreover, define the map $h ¯ 0 : E 0 → E ˜ ¯$ by
$h ¯ 0 ( x ) : = ( h 0 ( x ) , H ˙ S ( x ) ) = ( H ˙ S ( x ) , H ˙ S ( x ) ) .$
We first show that
For Equation (A5), it suffices to show that the assumption of the extended Continuous Mapping theorem in the form of Theorem C.1 in Beutner and Zähle (2016) applied to the functions $h ¯ n$ and $ξ n$ (as defined above) are satisfied. The claim then follows by Theorem C.1 in Beutner and Zähle (2016). First, we have already observed that $ξ n ( Ω n ) ⊆ E n$ and $ξ 0 ( Ω 0 ) ⊆ E 0$. Second, we have seen in the proof of Part (i) that $h n ( ξ n )$ is $( F n , B ˜ ∘ )$-measurable, $n ∈ N$. By Assumption (f), the extended map $H ˙ S : E → E ˜$ is $( B ∘ , B ˜ ∘ )$-measurable, which implies that $H ˙ S ( ξ n )$ is $( F n , B ˜ ∘ )$-measurable. Thus, $h ¯ n ( ξ n ) = ( h n ( ξ n ) , H ˙ S ( ξ n ) )$ is $( F n , B ˜ ∘ ⊗ B ˜ ∘ )$-measurable (to see this note that, in view of $B ˜ ∘ ⊗ B ˜ ∘ = σ ( π 1 , π 2 )$ for the coordinate projections $π 1 , π 2$ on $E ˜ ¯ = E ˜ × E ˜$, Theorem 7.4 of Bauer (2001) shows that the map $( h n ( ξ n ) , H ˙ S ( ξ n ) )$ is $( F n , B ˜ ∘ ⊗ B ˜ ∘ )$-measurable if and only if the maps $h n ( ξ n ) = π 1 ∘ ( h n ( ξ n ) , H ˙ S ( ξ n ) )$ and $H ˙ S ( ξ n ) = π 2 ∘ ( h n ( ξ n ) , H ˙ S ( ξ n ) )$ are $( F n , B ˜ ∘ )$-measurable). In particular, the map $h ¯ n ( ξ n ) = ( h n ( ξ n ) , H ˙ S ( ξ n ) )$ is $( F n , B ˜ ∘ ¯ )$-measurable, $n ∈ N$. Third, we have seen in the proof of Part (i) that the map $h 0 = H ˙ S$ is $( B 0 ∘ , B ˜ ∘ )$-measurable. Thus, the map $h ¯ 0$ is $( B 0 ∘ , B ˜ ∘ ⊗ B ˜ ∘ )$-measurable (one can argue as above) and in particular $( B 0 ∘ , B ˜ ∘ ¯ )$-measurable. Fourth, Condition (a) of Theorem C.1 in Beutner and Zähle (2016) holds by Assumption (b). Fifth, Condition (b) of Theorem C.1 in Beutner and Zähle (2016) is ensured by Assumption (d) and the continuity of the extended map $H ˙ S$ at every point of $E 0$ (recall Assumption (f)). Hence, Equation (A5) holds.
By Assumption (g) and the ordinary Continuous Mapping theorem (see (Billingsley 1999, Theorem 6.4)) applied to Equation (A5) and the map $h : E ˜ ¯ → E ˜$, $( x ˜ 1 , x ˜ 2 ) ↦ x ˜ 1 − x ˜ 2$, we now have
$h n ( a n ( T ^ n − θ n ) ) − H ˙ S ( a n ( T ^ n − θ n ) ) ⇝ ∘ H ˙ S ( ξ ) − H ˙ S ( ξ ) ,$
i.e.,
$a n H ( T ^ n ) − H ( θ n ) − H ˙ S a n ( T ^ n − θ n ) ⇝ ∘ 0 E ˜ .$
By Proposition B.4 in Beutner and Zähle (2016), we can conclude Equation (A4). ☐
The following lemma provides a chain rule for uniformly quasi-Hadamard differentiable maps (a similar chain rule with different $S$ was found in Varron (2015)). To formulate the chain rule, let $V ˜ ˜$ be a further vector space and $E ˜ ˜ ⊆ V ˜ ˜$ be a subspace equipped with a norm $∥ · ∥ E ˜ ˜$.
Lemma A1.
Let $H : V H → V ˜ H ˜$ and $H ˜ : V ˜ H ˜ → V ˜ ˜$ be maps defined on subsets $V H ⊆ V$ and $V ˜ H ˜ ⊆ V ˜$ such that $H ( V H ) ⊆ V ˜ H ˜$. Let $E 0$ and $E ˜ 0$ be subsets of $E$ and $E ˜$, respectively. Let $S$ and $S ˜$ be sets of sequences in $V H$ and $V ˜ H ˜$, respectively, and assume that the following three assertions hold.
(a)
For every $( θ n ) ∈ S$, we have $( H ( θ n ) ) ∈ S ˜$.
(b)
H is uniformly quasi-Hadamard differentiable with respect to $S$ tangentially to $E 0 〈 E 〉$ with trace $E ˜$ and uniform quasi-Hadamard derivative $H ˙ S : E 0 → E ˜$, and we have $H ˙ S ( E 0 ) ⊆ E ˜ 0$.
(c)
$H ˜$ is uniformly quasi-Hadamard differentiable with respect to $S ˜$ tangentially to $E ˜ 0 〈 E ˜ 〉$ with trace $E ˜ ˜$ and uniform quasi-Hadamard derivative $H ˜ ˙ S ˜ : E ˜ 0 → E ˜ ˜$.
Then, the map $T : = H ˜ ∘ H : V H → V ˜ ˜$ is uniformly quasi-Hadamard differentiable with respect to $S$ tangentially to $E 0 〈 E 〉$ with trace $E ˜ ˜$, and the uniform quasi-Hadamard derivative $T ˙ S$ is given by $T ˙ S : = H ˜ ˙ S ˜ ∘ H ˙ S$.
Proof.
Obviously, since $H ( V H ) ⊆ V ˜ H ˜$ and $H ˜$ is associated with trace $E ˜ ˜$, the map $H ˜ ∘ H$ can also be associated with trace $E ˜ ˜$.
Now, let $( ( θ n ) , x , ( x n ) , ( ε n ) )$ be a quadruple with $( θ n ) ∈ S$, $x ∈ E 0$, $( x n ) ⊆ E$ satisfying $∥ x n − x ∥ E → 0$ as well as $( θ n + ε n x n ) ⊆ V H$, and $( ε n ) ⊆ ( 0 , ∞ )$ satisfying $ε n → 0$. Then,
$∥ H ˜ ˙ S ˜ ( H ˙ S ( x ) ) − H ˜ ( H ( θ n + ε n x n ) ) − H ˜ ( H ( θ n ) ) ε n ∥ E ˜ ˜ = ∥ H ˜ ˙ S ˜ ( H ˙ S ( x ) ) − H ˜ H ( θ n ) + ε n H ( θ n + ε n x n ) − H ( θ n ) ε n − H ˜ ( H ( θ n ) ) ε n ∥ E ˜ ˜ .$
Note that by assumption, $H ( θ n ) ∈ V ˜ H ˜$ and in particular $( H ( θ n ) ) ∈ S ˜$. By the uniform quasi-Hadamard differentiability of H with respect to $S$ tangentially to $E 0 〈 E 〉$ with trace $E ˜$,
$lim n → ∞ ∥ H ( θ n + ε n x n ) − H ( θ n ) ε n − H ˙ S ( x ) ∥ E ˜ = 0 .$
Moreover, $( H ( θ n + ε n x n ) − H ( θ n ) ) / ε n ∈ E ˜$ and $H ˙ S ( x ) ∈ E ˜ 0$, because H is associated with trace $E ˜$ and $H ˙ S ( E 0 ) ⊆ E ˜ 0$. Hence, by the uniform quasi-Hadamard differentiability of $H ˜$ with respect to $S ˜$ tangentially to $E ˜ 0 〈 E ˜ 〉$, we obtain
$lim n → ∞ ∥ H ˜ ˙ S ˜ ( H ˙ S ( x ) ) − H ˜ H ( θ n ) + ε n H ( θ n + ε n x n ) − H ( θ n ) ε n − H ˜ ( H ( θ n ) ) ε n ∥ E ˜ ˜ = 0 .$
This completes the proof. ☐

## Appendix B. Delta-Method for the Bootstrap

The functional delta-method is a widely used technique to derive bootstrap consistency for a sequence of plug-in estimators with respect to a map H from bootstrap consistency of the underlying sequence of estimators. An essential limitation of the classical functional delta-method for proving bootstrap consistency in probability (or outer probability) is the condition of Hadamard differentiability of H (see Theorem 3.9.11 of van der Vaart and Wellner (1996)). It is commonly acknowledged that Hadamard differentiability fails for many relevant maps H. Recently, it was demonstrated in Beutner and Zähle (2016) that a functional delta-method for the bootstrap in probability can also be proved for quasi-Hadamard differentiable maps H. Quasi-Hadamard differentiability is a weaker notion of “differentiability” than Hadamard differentiability and can be obtained for many relevant statistical functionals H (see, e.g., Beutner et al. 2012; Beutner and Zähle 2010, 2012; Krätschmer et al. 2013; Krätschmer and Zähle 2017). Using the classical functional delta-method to prove almost sure (or outer almost sure) bootstrap consistency for a sequence of plug-in estimators with respect to a map H from almost sure (or outer almost sure) bootstrap consistency of the underlying sequence of estimators requires uniform Hadamard differentiability of H (see Theorem 3.9.11 of van der Vaart and Wellner (1996)). In this section, we introduce the notion of uniform quasi-Hadamard differentiability and demonstrate that a functional delta-method for the almost sure bootstrap can even be obtained for uniformly quasi-Hadamard differentiable maps H.
To explain the background and the contribution of this section more precisely, assume that we are given an estimator $T ^ n$ for a parameter $θ$ in a vector space, with n denoting the sample size, and that we are actually interested in the aspect $H ( θ )$ of $θ$. Here, H is any map taking values in a vector space. Then, $H ( T ^ n )$ is often a reasonable estimator for $H ( θ )$. One of the main objects in statistical inference is the distribution of the error $H ( T ^ n ) − H ( θ )$, because the error distribution can theoretically be used to derive confidence regions for $H ( θ )$. In applications, however, the exact specification of the error distribution is often difficult or even impossible. A widely used way out is to derive the asymptotic error distribution, i.e., the weak limit $μ$ of $μ n : = law { a n ( H ( T ^ n ) − H ( θ ) ) }$ for suitable normalizing constants $a n$ tending to infinity, and to use $μ$ as an approximation for $μ n$ for large n. Since $μ$ usually still depends on the unknown parameter $θ$, one should use the notation $μ θ$ instead of $μ$. In particular, one actually uses $μ T ^ n : = μ θ | θ = T ^ n$ as an approximation for $μ n$ for large n.
Not least because the parameter θ of $μ θ$ has to be estimated, the approximation of $μ n$ by $μ T ^ n$ is typically only moderate. An often more efficient alternative technique to approximate $μ n$ is the bootstrap. The bootstrap was introduced by Efron (1979), and many variants of his method have been introduced since then. One may refer to Davison and Hinkley (1997); Efron (1994); Lahiri (2003); Shao and Tu (1995) for general accounts of this topic. The basic idea of the bootstrap is the following. By re-sampling the original sample according to a certain re-sampling mechanism (depending on the particular bootstrap method), one can sometimes construct a so-called bootstrap version $T ^ n ∗$ of $T ^ n$ for which the conditional law of $a n ( H ( T ^ n ∗ ) − H ( T ^ n ) )$ “given the sample” has the same weak limit $μ θ$ as the law of $a n ( H ( T ^ n ) − H ( θ ) )$. The latter property is referred to as bootstrap consistency. Since $T ^ n ∗$ depends only on the sample and the re-sampling mechanism, one can at least numerically determine the conditional law of $a n ( H ( T ^ n ∗ ) − H ( T ^ n ) )$ “given the sample” by means of a Monte Carlo simulation based on $L ≫ n$ repetitions. The resulting law $μ L ∗$ can then be used as an approximation of $μ n$, at least for large n.
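The Monte Carlo scheme just described can be sketched in a few lines. This is a minimal illustration, not part of the paper: Efron's resampling and the sample mean as H are assumptions made only for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_error_law(sample, H, a_n, L=2000):
    """Monte Carlo approximation of the conditional law of
    a_n * (H(T_n^*) - H(T_n)) 'given the sample', based on L repetitions."""
    n = len(sample)
    h_hat = H(sample)                                        # H(T_n)
    draws = np.empty(L)
    for k in range(L):
        resample = rng.choice(sample, size=n, replace=True)  # Efron's T_n^*
        draws[k] = a_n * (H(resample) - h_hat)
    return draws                                             # sample from mu_L^*

n = 500
sample = rng.exponential(scale=2.0, size=n)
draws = bootstrap_error_law(sample, H=np.mean, a_n=np.sqrt(n))
```

For H = mean and i.i.d. data, the empirical law of `draws` approximates the usual CLT limit, which is the consistency property discussed above.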
In applications, the roles of $θ$ and $T ^ n$ are often played by a distribution function F and the empirical distribution function $F ^ n$ of n random variables that are identically distributed according to F, respectively. Several results on bootstrap consistency for $T ^ n$ are known, not least for this particular setting (see also Appendix B.2). The functional delta-method then ensures that bootstrap consistency also holds for $H ( T ^ n )$ when H is suitably differentiable at $θ$. Technically speaking, as indicated above, one has to distinguish between two types of bootstrap consistency. First, bootstrap consistency in probability for $H ( T ^ n )$ can be associated with
$d BL ∘ ( P n ( ω , · ) , μ θ ) ⟶ P out 0 ,$
where $ω$ represents the sample, $P n ( ω , · )$ denotes the conditional law of $a n ( H ( T ^ n ∗ ) − H ( T ^ n ) )$ given the sample $ω$, $d BL ∘$ is the bounded Lipschitz distance, and the superscript $out$ refers to outer probability. At this point, it is worth pointing out that we consider weak convergence (respectively, convergence in distribution) with respect to the open-ball $σ$-algebra, in symbols $⇒ ∘$ (respectively, $⇝ ∘$), as defined in (Billingsley 1999, sct. 6) (see also Dudley 1966, 1967; Pollard 1984; Shorack and Wellner 1986), and that, by the Portmanteau theorem A.3 in Beutner and Zähle (2016), weak convergence $μ n ⇒ ∘ μ$ holds if and only if $d BL ∘ ( μ n , μ ) → 0$. Second, almost sure bootstrap consistency for $H ( T ^ n )$ means that
$law a n H ( T ^ n ∗ ( ω , · ) ) − H ( T ^ n ( ω ) ) ⇒ ∘ μ θ P − a . e . ω .$
In Beutner and Zähle (2016), it has been shown that Equation (A6) follows from the respective analogue for $T ^ n$ when H is suitably quasi-Hadamard differentiable at $θ$. This extends Theorem 3.9.11 of van der Vaart and Wellner (1996), which covers only Hadamard differentiable maps. In this section, we show that Equation (A7) follows from the respective analogue for $T ^ n$ when H is suitably uniformly quasi-Hadamard differentiable at $θ$; the notion of uniform quasi-Hadamard differentiability is introduced in Definition A1 below. This extends Theorem 3.9.13 of van der Vaart and Wellner (1996), which covers only uniformly Hadamard differentiable maps.

#### Appendix B.1. Abstract Delta-Method for the Bootstrap

Theorem A4 provides an abstract delta-method for the almost sure bootstrap. It is based on the notion of uniform quasi-Hadamard differentiability which we introduce first. This sort of differentiability extends the notion of quasi-Hadamard differentiability as introduced in Beutner and Zähle (2010, 2016). The latter corresponds to the differentiability concept in (i) of Definition A1 ahead with $S$ and $E ˜$ as in (iii) and (v) of this definition. Let $V$ and $V ˜$ be vector spaces. Let $E ⊆ V$ and $E ˜ ⊆ V ˜$ be subspaces equipped with norms $∥ · ∥ E$ and $∥ · ∥ E ˜$, respectively. Let
$H : V H ⟶ V ˜$
be any map defined on some subset $V H ⊆ V$.
Definition A1.
Let $E 0$ be a subset of $E$, and $S$ be a set of sequences in $V H$.
(i) The map H is said to be uniformly quasi-Hadamard differentiable with respect to $S$ tangentially to $E 0 〈 E 〉$ with trace $E ˜$ if $H ( y 1 ) − H ( y 2 ) ∈ E ˜$ for all $y 1 , y 2 ∈ V H$ and there is some continuous map $H ˙ S : E 0 → E ˜$ such that
$lim n → ∞ ∥ H ˙ S ( x ) − H ( θ n + ε n x n ) − H ( θ n ) ε n ∥ E ˜ = 0$
holds for each quadruple $( ( θ n ) , x , ( x n ) , ( ε n ) )$, with $( θ n ) ∈ S$, $x ∈ E 0$, $( x n ) ⊆ E$ satisfying $∥ x n − x ∥ E → 0$ as well as $( θ n + ε n x n ) ⊆ V H$, and $( ε n ) ⊆ ( 0 , ∞ )$ satisfying $ε n → 0$. In this case, the map $H ˙ S$ is called uniform quasi-Hadamard derivative of H with respect to $S$ tangentially to $E 0 〈 E 〉$.
(ii) If $S$ consists of all sequences $( θ n ) ⊆ V H$ with $θ n − θ ∈ E$, $n ∈ N$, and $∥ θ n − θ ∥ E → 0$ for some fixed $θ ∈ V H$, then we replace the phrase “ with respect to $S$” by “at θ” and “$H ˙ S$” by “$H ˙ θ$”.
(iii) If $S$ consists only of the constant sequence $θ n = θ$, $n ∈ N$, then we skip the phrase “uniformly” and replace the phrase “ with respect to $S$” by “at θ” and “$H ˙ S$” by “$H ˙ θ$”. In this case, we may also replace “$H ( y 1 ) − H ( y 2 ) ∈ E ˜$ for all $y 1 , y 2 ∈ V H$” by “$H ( y ) − H ( θ ) ∈ E ˜$ for all $y ∈ V H$”.
(iv) If $E = V$, then we skip the phrase “quasi-”.
(v) If $E ˜ = V ˜$, then we skip the phrase “with trace $E ˜$”.
The conventional notion of uniform Hadamard differentiability as used in Theorem 3.9.13 of van der Vaart and Wellner (1996) corresponds to the differentiability concept in (i) with $S$ as in (ii), $E$ as in (iv), and $E ˜$ as in (v). Proposition 1 shows that it is beneficial to refrain from insisting on $E = V$ as in (iv). It was recently discussed in Belloni et al. (2017) that it can also be beneficial to refrain from insisting on the assumption of (ii). For $E = V$ (“non-quasi” case), uniform Hadamard differentiability in the sense of Definition B.1 in Belloni et al. (2017) corresponds to uniform Hadamard differentiability in the sense of our Definition A1 (Parts (i) and (iv)) when $S$ is chosen as the set of all sequences $( θ n )$ in a compact metric space $( K θ , d K )$ with $θ ∈ K θ ⊆ V H$ for which $d K ( θ n , θ ) → 0$. In Comment B.3 of Belloni et al. (2017), it is illustrated by means of the quantile functional that this notion of differentiability (subject to a suitable choice of $( K θ , d K )$) is strictly weaker than the notion of uniform Hadamard differentiability that was used in the classical delta-method for the almost sure bootstrap, Theorem 3.9.13 in van der Vaart and Wellner (1996). Although this shows that the flexibility with respect to $S$ in our Definition A1 can be beneficial, it is arguably even more important that we allow for the “quasi” case.
Of course, the smaller the family $S$, the weaker the condition of uniform quasi-Hadamard differentiability with respect to $S$. On the other hand, if the set $S$ is too small, then Condition (e) in Theorem A4 ahead may fail. That is, for an application of the functional delta-method in the form of Theorem A4, the set $S$ should be large enough for Condition (e) to be fulfilled and small enough for uniform quasi-Hadamard differentiability of the map H with respect to $S$ to be established.
We now turn to the abstract delta-method. As mentioned in Section 1, convergence in distribution will always be considered for the open-ball $σ$-algebra. We use the terminology convergence in distribution$∘$ (symbolically $⇝ ∘$) for this sort of convergence; for details see Appendix A and Appendices A–C of Beutner and Zähle (2016). In a separable metric space the notion of convergence in distribution$∘$ boils down to the conventional notion of convergence in distribution for the Borel $σ$-algebra. In this case, we use the symbol ⇝ instead of $⇝ ∘$.
Let $( Ω , F , P )$ be a probability space, and $( T ^ n )$ be a sequence of maps
$T ^ n : Ω ⟶ V .$
Regard $ω ∈ Ω$ as a sample drawn from $P$, and $T ^ n ( ω )$ as a statistic derived from $ω$. Somewhat unconventionally, we do not (need to) require at this point that $T ^ n$ is measurable with respect to any $σ$-algebra on $V$. Let $( Ω ′ , F ′ , P ′ )$ be another probability space and set
$( Ω ¯ , F ¯ , P ¯ ) : = ( Ω × Ω ′ , F ⊗ F ′ , P ⊗ P ′ ) .$
The probability measure $P ′$ represents a random experiment that is run independently of the random sample mechanism $P$. In the sequel, $T ^ n$ will frequently be regarded as a map defined on the extension $Ω ¯$ of $Ω$. Let
$T ^ n ∗ : Ω ¯ ⟶ V$
be any map. Since $T ^ n ∗ ( ω , ω ′ )$ depends on both the original sample $ω$ and the outcome $ω ′$ of the additional independent random experiment, we may regard $T ^ n ∗$ as a bootstrapped version of $T ^ n$. Moreover, let
$C ^ n : Ω ⟶ V$
be any map. As with $T ^ n$, we often regard $C ^ n$ as a map defined on the extension $Ω ¯$ of $Ω$. We use $C ^ n$ together with a scaling sequence to get weak convergence results for $T ^ n ∗$. The role of $C ^ n$ is often played by $T ^ n$ itself (see Example A1), but sometimes also by a different map (see Example A2). Assume that $T ^ n$, $T ^ n ∗$, and $C ^ n$ take values only in $V H$.
Let $B ∘$ and $B ˜ ∘$ be the open-ball $σ$-algebras on $E$ and $E ˜$ with respect to the norms $∥ · ∥ E$ and $∥ · ∥ E ˜$, respectively. Note that $B ∘$ coincides with the Borel $σ$-algebra on $E$ when $( E , ∥ · ∥ E )$ is separable. The same is true for $B ˜ ∘$. Set $E ˜ ¯ : = E ˜ × E ˜$ and let $B ˜ ∘ ¯$ be the $σ$-algebra on $E ˜ ¯$ generated by the open balls with respect to the metric $d ˜ ¯ ( ( x ˜ 1 , x ˜ 2 ) , ( y ˜ 1 , y ˜ 2 ) ) : = max { ∥ x ˜ 1 − y ˜ 1 ∥ E ˜ ; ∥ x ˜ 2 − y ˜ 2 ∥ E ˜ }$. Recall that $B ˜ ∘ ¯ ⊆ B ˜ ∘ ⊗ B ˜ ∘$, because any $d ˜ ¯$-open ball in $E ˜ ¯$ is the product of two $∥ · ∥ E ˜$-open balls in $E ˜$.
Theorem A3 is a consequence of Theorem A2 in Appendix A.2 as we assume that $T ^ n$ takes values only in $V H$. The proof of the measurability statement of Theorem A3 is given in the proof of Theorem A4. Theorem A3 is stated here because, together with Theorem A4, it implies almost sure bootstrap consistency whenever the limit $ξ$ is the same in Theorem A3 and Theorem A4.
Theorem A3.
Let $( θ n )$ be a sequence in $V H$ and $S : = { ( θ n ) }$. Let $E 0 ⊆ E$ be a separable subspace and assume that $E 0 ∈ B ∘$. Let $( a n )$ be a sequence of positive real numbers with $a n → ∞$, and assume that the following assertions hold:
(a)
$a n ( T ^ n − θ n )$ takes values only in $E$, is $( F , B ∘ )$-measurable, and satisfies
$a n ( T ^ n − θ n ) ⇝ ∘ ξ$
for some $( E , B ∘ )$-valued random variable ξ on some probability space $( Ω 0 , F 0 , P 0 )$ with $ξ ( Ω 0 ) ⊆ E 0$.
(b)
$a n ( H ( T ^ n ) − H ( θ n ) )$ takes values only in $E ˜$ and is $( F , B ˜ ∘ )$-measurable.
(c)
H is uniformly quasi-Hadamard differentiable with respect to $S$ tangentially to $E 0 〈 E 〉$ with trace $E ˜$ and uniform quasi-Hadamard derivative $H ˙ S$.
Then, $H ˙ S ( ξ )$ is $( F 0 , B ˜ ∘ )$-measurable and
$a n ( H ( T ^ n ) − H ( θ n ) ) ⇝ ∘ H ˙ S ( ξ ) .$
Theorem A4.
Let $S$ be any set of sequences in $V H$. Let $E 0 ⊆ E$ be a separable subspace and assume that $E 0 ∈ B ∘$. Let $( a n )$ be a sequence of positive real numbers with $a n → ∞$, and assume that the following assertions hold:
(a)
$a n ( T ^ n ∗ − C ^ n )$ takes values only in $E$, is $( F ¯ , B ∘ )$-measurable, and satisfies
$a n ( T ^ n ∗ ( ω , · ) − C ^ n ( ω ) ) ⇝ ∘ ξ P − a . e . ω$
for some $( E , B ∘ )$-valued random variable ξ on some probability space $( Ω 0 , F 0 , P 0 )$ with $ξ ( Ω 0 ) ⊆ E 0$.
(b)
$a n ( H ( T ^ n ∗ ) − H ( C ^ n ) )$ takes values only in $E ˜$ and is $( F ¯ , B ˜ ∘ )$-measurable.
(c)
H is uniformly quasi-Hadamard differentiable with respect to $S$ tangentially to $E 0 〈 E 〉$ with trace $E ˜$ and uniform quasi-Hadamard derivative $H ˙ S$.
(d)
The uniform quasi-Hadamard derivative $H ˙ S$ can be extended from $E 0$ to $E$ such that the extension $H ˙ S : E → E ˜$ is $( B ∘ , B ˜ ∘ )$-measurable and continuous at every point of $E 0$.
(e)
$( C ^ n ( ω ) ) ∈ S$ for $P$-a.e. ω.
(f)
The map $h : E ˜ ¯ → E ˜$ defined by $h ( x ˜ 1 , x ˜ 2 ) : = x ˜ 1 − x ˜ 2$ is $( B ˜ ∘ ¯ , B ˜ ∘ )$-measurable.
Then, $H ˙ S ( ξ )$ is $( F 0 , B ˜ ∘ )$-measurable and
$a n ( H ( T ^ n ∗ ( ω , · ) ) − H ( C ^ n ( ω ) ) ) ⇝ ∘ H ˙ S ( ξ ) P − a . e . ω .$
Remark A1.
In Condition (a) of Theorem A4, it is assumed that $a n ( T ^ n ∗ − C ^ n )$ is $( F ¯ , B ∘ )$-measurable for $F ¯ : = F ⊗ F ′$. Thus, the mapping $ω ′ ↦ a n ( T ^ n ∗ ( ω , ω ′ ) − C ^ n ( ω ) )$ is $( F ′ , B ∘ )$-measurable for every fixed $ω ∈ Ω$. That is, $a n ( T ^ n ∗ ( ω , · ) − C ^ n ( ω ) )$ can be seen as an $( E , B ∘ )$-valued random variable on $( Ω ′ , F ′ , P ′ )$ for every fixed $ω ∈ Ω$, so that assertion (A10) makes sense. By the same line of reasoning one can regard $a n ( H ( T ^ n ∗ ( ω , · ) ) − H ( C ^ n ( ω ) ) )$ as an $( E ˜ , B ˜ ∘ )$-valued random variable on $( Ω ′ , F ′ , P ′ )$ for every fixed $ω ∈ Ω$, so that also assertion (A11) makes sense.
Remark A2.
Condition (c) in Theorem A3 (respectively, Theorem A4) assumes that the trace is given by $E ˜$, which implies that the first part of Condition (b) in Theorem A3 (respectively, Theorem A4) is automatically satisfied.
Remark A3.
Condition (f) of Theorem A4 is automatically fulfilled when $( E ˜ , ∥ · ∥ E ˜ )$ is separable. Indeed, in this case we have $B ˜ ∘ ¯ = B ˜ ∘ ⊗ B ˜ ∘$ so that every continuous map $h : E ˜ ¯ → E ˜$ (such as $h ( x ˜ 1 , x ˜ 2 ) : = x ˜ 1 − x ˜ 2$) is $( B ˜ ∘ ¯ , B ˜ ∘ )$-measurable.
Proof of Theorem A4.
First note that by the assumption imposed on $ξ$ (see Assumption (a)) and Assumption (c) the map $H ˙ S ( ξ )$ is $( F 0 , B ˜ ∘ )$-measurable. Next, note that
$a n H ( T ^ n ∗ ( ω , ω ′ ) ) − H ( C ^ n ( ω ) ) = a n H ( T ^ n ∗ ( ω , ω ′ ) ) − H ( C ^ n ( ω ) ) − H ˙ S a n ( T ^ n ∗ ( ω , ω ′ ) − C ^ n ( ω ) ) + H ˙ S a n ( T ^ n ∗ ( ω , ω ′ ) − C ^ n ( ω ) ) = : S 1 ( n , ω , ω ′ ) + S 2 ( n , ω , ω ′ ) .$
By Equation (A10) in Assumption (a) and the Continuous Mapping theorem in the form of (Billingsley 1999, Theorem 6.4) (along with $P 0 ∘ ξ − 1 [ E 0 ] = 1$ and the continuity of $H ˙ S$), we have that $S 2 ( n , ω , · ) ⇝ ∘ H ˙ S ( ξ )$ for $P$-a.e. $ω$. Moreover, for every fixed $ω$ we have that $ω ′ ↦ S 1 ( n , ω , ω ′ )$ is $( F ′ , B ˜ ∘ )$-measurable by Assumption (f), and for $P$-a.e. $ω$ we have
$a n ( H ( T ^ n ∗ ( ω , · ) ) − H ( C ^ n ( ω ) ) ) − H ˙ S ( a n ( T ^ n ∗ ( ω , · ) − C ^ n ( ω ) ) ) → p , ∘ 0 E ˜$
by Part (ii) of Theorem A2 (recall that $T ^ n ∗$ was assumed to take values only in $V H$), where $→ p , ∘$ refers to convergence in probability$∘$ (see Appendix A.1) and $T ^ n ∗ ( ω , · )$, $C ^ n ( ω )$, ${ ( C ^ n ( ω ) ) }$ play the roles of $T ^ n ( · )$, $θ n$, $S$, respectively. Hence, from Corollary A2, we get that Equation (A11) holds. ☐

#### Appendix B.2. Application to Plug-In Estimators of Statistical Functionals

Let $D$, $D ϕ$, $B ϕ ∘$ be as introduced at the beginning of Section 3. Let $C ϕ ⊆ D ϕ$ be a $∥ · ∥ ϕ$-separable subspace and assume $C ϕ ∈ B ϕ ∘$. Moreover, let $H : D ( H ) → V ˜$ be a map defined on a set $D ( H )$ of distribution functions of finite (not necessarily probability) Borel measures on $R$, where $V ˜$ is any vector space. In particular, $D ( H ) ⊆ D$. In the following, $D$, $( D ϕ , B ϕ ∘ , ∥ · ∥ ϕ )$, $C ϕ$, and $D ( H )$ play the roles of $V$, $( E , B ∘ , ∥ · ∥ E )$, $E 0$, and $V H$, respectively. As before, we let $( E ˜ , ∥ · ∥ E ˜ )$ be a normed subspace of $V ˜$ equipped with the corresponding open-ball $σ$-algebra $B ˜ ∘$.
Let $( Ω , F , P )$ be a probability space. Let $( F n ) ⊆ D ( H )$ be any sequence and $( X i )$ be a sequence of real-valued random variables on $( Ω , F , P )$. Moreover, let $F ^ n : Ω → D$ be the empirical distribution function of $X 1 , … , X n$, which will play the role of $T ^ n$. It is defined by
$F ^ n : = 1 n ∑ i = 1 n 𝟙 [ X i , ∞ ) .$
Assume that $F ^ n$ takes values only in $D ( H )$. Let $( Ω ′ , F ′ , P ′ )$ be another probability space and set $( Ω ¯ , F ¯ , P ¯ ) : = ( Ω × Ω ′ , F ⊗ F ′ , P ⊗ P ′ )$. Moreover, let $F ^ n ∗ : Ω ¯ → D$ be any map. Assume that $F ^ n ∗$ takes values only in $D ( H )$. Furthermore, let $C ^ n : Ω ¯ → D$ be any map that takes values only in $D ( H )$. In the present setting, Theorems A3 and A4 can be reformulated as follows, where we recall from Remark A3 that Condition (f) of Theorem A4 is automatically fulfilled when $( E ˜ , ∥ · ∥ E ˜ )$ is separable.
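For concreteness, the empirical distribution function of Equation (A12) can be evaluated as follows; the data and evaluation points below are, of course, illustrative.

```python
import numpy as np

def empirical_df(x_data, t):
    """F_hat_n(t) = (1/n) sum_i 1_[X_i, inf)(t) = (1/n) * #{i : X_i <= t},
    vectorized over the evaluation points t."""
    x_data = np.asarray(x_data)
    t = np.atleast_1d(t)
    return (x_data[None, :] <= t[:, None]).mean(axis=1)

x = np.array([0.3, -1.2, 0.3, 2.5])
vals = empirical_df(x, np.array([-2.0, 0.0, 0.3, 3.0]))
# vals == [0.0, 0.25, 0.75, 1.0]
```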
Corollary A3.
Let $( F n )$ be a sequence in $D ( H )$ and $S : = { ( F n ) }$. Let $( a n )$ be a sequence of positive real numbers with $a n → ∞$, and assume that the following assertions hold:
(a)
$a n ( F ^ n − F n )$ takes values only in $D ϕ$ and satisfies
$a n ( F ^ n − F n ) ⇝ ∘ B$
for some $( D ϕ , B ϕ ∘ )$-valued random variable B on some probability space $( Ω 0 , F 0 , P 0 )$ with $B ( Ω 0 ) ⊆ C ϕ$.
(b)
$a n ( H ( F ^ n ) − H ( F n ) )$ takes values only in $E ˜$ and is $( F , B ˜ ∘ )$-measurable.
(c)
H is uniformly quasi-Hadamard differentiable with respect to $S$ tangentially to $C ϕ 〈 D ϕ 〉$ with trace $E ˜$ and uniform quasi-Hadamard derivative $H ˙ S$.
Then, $H ˙ S ( B )$ is $( F 0 , B ˜ ∘ )$-measurable and
$a n ( H ( F ^ n ) − H ( F n ) ) ⇝ ∘ H ˙ S ( B ) .$
Note that the measurability assumption in Condition (a) of Theorem A3 is automatically satisfied in the present setting (and is therefore omitted in Condition (a) of Corollary A3). Indeed, $a n ( F ^ n − F n )$ is easily seen to be $( F , B ϕ ∘ )$-measurable, because $B ϕ ∘$ coincides with the trace $σ$-algebra of $D$.
Corollary A4.
Let $S$ be any set of sequences in $D ( H )$. Let $( a n )$ be a sequence of positive real numbers with $a n → ∞$, and assume that the following assertions hold:
(a)
$a n ( F ^ n ∗ − C ^ n )$ takes values only in $D ϕ$, is $( F ¯ , B ϕ ∘ )$-measurable, and
$a n ( F ^ n ∗ ( ω , · ) − C ^ n ( ω ) ) ⇝ ∘ B P − a . e . ω$
for some $( D ϕ , B ϕ ∘ )$-valued random variable B on some probability space $( Ω 0 , F 0 , P 0 )$ with $B ( Ω 0 ) ⊆ C ϕ$.
(b)
$a n ( H ( F ^ n ∗ ) − H ( C ^ n ) )$ takes values only in $E ˜$ and is $( F ¯ , B ˜ ∘ )$-measurable.
(c)
H is uniformly quasi-Hadamard differentiable with respect to $S$ tangentially to $C ϕ 〈 D ϕ 〉$ with trace $E ˜$ and uniform quasi-Hadamard derivative $H ˙ S$.
(d)
The uniform quasi-Hadamard derivative $H ˙ S$ can be extended from $C ϕ$ to $D ϕ$ such that the extension $H ˙ S : D ϕ → E ˜$ is $( B ϕ ∘ , B ˜ ∘ )$-measurable, and continuous at every point of $C ϕ$.
(e)
$( C ^ n ( ω ) ) ∈ S$ for $P$-a.e. ω.
(f)
The map $h : E ˜ ¯ → E ˜$ defined by $h ( x ˜ 1 , x ˜ 2 ) : = x ˜ 1 − x ˜ 2$ is $( B ˜ ∘ ¯ , B ˜ ∘ )$-measurable.
Then, $H ˙ S ( B )$ is $( F 0 , B ˜ ∘ )$-measurable and
$a n ( H ( F ^ n ∗ ( ω , · ) ) − H ( C ^ n ( ω ) ) ) ⇝ ∘ H ˙ S ( B ) P − a . e . ω .$
The following examples illustrate $F ^ n ∗$ and $C ^ n$. In Example A1, we have $C ^ n = F ^ n$, whereas in Example A2 $C ^ n$ may differ from $F ^ n$. Examples of uniformly quasi-Hadamard differentiable functionals H can be found in Section 3. In the examples in Section 3.1 and Section 3.3, we have $V ˜ = E ˜ = R$, and in the example in Section 3.2, we have $V ˜ = D$ and $E ˜ = D ϕ$ for some $ϕ$.
Example A1.
Let $( X i )$ be a sequence of i.i.d. real-valued random variables on $( Ω , F , P )$ with distribution function F satisfying $∫ ϕ 2 d F < ∞$, and $F ^ n$ be given by Equation (A12). Let $( W n i )$ be a triangular array of nonnegative real-valued random variables on $( Ω ′ , F ′ , P ′ )$ such that Setting S1. or Setting S2. of Section 2.1 is met. Define the map $F ^ n ∗ : Ω ¯ → D$ by $F ^ n ∗ ( ω , ω ′ ) : = 1 n ∑ i = 1 n W n i ( ω ′ ) 𝟙 [ X i ( ω ) , ∞ )$. Recall that Setting S1. is nothing but Efron’s bootstrap (Efron (1979)), and that Setting S2. is in line with the Bayesian bootstrap of Rubin (1981) if $Y 1$ is exponentially distributed with parameter 1.
In Section 5.1 in Beutner and Zähle (2016), it was proved with the help of results of Shorack and Wellner (1986) and van der Vaart and Wellner (1996) that Condition (a) of Corollary A3 (with $F n : = F$) and Condition (a) of Corollary A4 (with $C ^ n : = F ^ n$), respectively, hold for $a n : = \sqrt n$ and $B : = B F$, where $B F$ is an F-Brownian bridge. Here, $C ϕ$ can be chosen to be the set $C ϕ , F$ of all $v ∈ D ϕ$ whose discontinuities are also discontinuities of F. In addition, note that, in view of $C ^ n = F ^ n$, Condition (e) holds if $S$ is (any subset of) the set of all sequences $( G n )$ of distribution functions on $R$ satisfying $G n − F ∈ D ϕ$, $n ∈ N$, and $∥ G n − F ∥ ϕ → 0$ (see, for instance, Theorem 2.1 in Zähle (2014)).
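The two weight schemes of Example A1 can be sketched as follows. The precise form of Settings S1. and S2. is given in Section 2.1 of the paper and is not reproduced here, so the multinomial resampling counts (Efron) and the Exp(1) weights normalized to average one (Bayesian bootstrap) below are stated as assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

def weighted_edf(x_data, w, t):
    """F_hat_n^*(t) = (1/n) sum_i W_ni 1_[X_i, inf)(t)."""
    x_data, w, t = (np.asarray(a) for a in (x_data, w, t))
    return ((x_data[None, :] <= t[:, None]) * w[None, :]).mean(axis=1)

n = 1000
x = rng.normal(size=n)

# Setting S1. (Efron's bootstrap): multinomial resampling counts, summing to n.
w_s1 = rng.multinomial(n, np.full(n, 1.0 / n)).astype(float)

# Setting S2. (Bayesian bootstrap, Rubin 1981): i.i.d. Exp(1) weights,
# normalized so that their average equals one.
y = rng.exponential(size=n)
w_s2 = y / y.mean()

f_star = weighted_edf(x, w_s1, np.array([-1.0, 0.0, 1.0]))  # one replicate
```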
Example A2.
Let $( X i )$ be a strictly stationary sequence of β-mixing random variables on $( Ω , F , P )$ with distribution function F, and $F ^ n$ be given by Equation (A12). Let $( ℓ n )$ be a sequence of integers such that $ℓ n ↗ ∞$ as $n → ∞$, and $ℓ n < n$ for all $n ∈ N$. Set $k n : = ⌈ n / ℓ n ⌉$ for all $n ∈ N$. Let $( I n j ) n ∈ N , 1 ≤ j ≤ k n$ be a triangular array of random variables on $( Ω ′ , F ′ , P ′ )$ such that $I n 1 , … , I n k n$ are i.i.d. according to the uniform distribution on ${ 1 , … , n − ℓ n + 1 }$ for every $n ∈ N$. Define the map $F ^ n ∗ : Ω ¯ → D$ by $F ^ n ∗ ( ω , ω ′ ) : = 1 n ∑ i = 1 n W n i ( ω ′ ) 𝟙 [ X i ( ω ) , ∞ )$ with $W n i$ given by Equation (8), and recall from Section 2.2 that this is the blockwise bootstrap. Similarly to Lemma 5.3 in Beutner and Zähle (2016), it follows that $a n ( F ^ n ∗ − C ^ n )$, with $C ^ n : = E ′ [ F ^ n ∗ ]$, takes values only in $D ϕ$ and is $( F ¯ , B ϕ ∘ )$-measurable. That is, the first part of Condition (a) of Corollary A4 holds true for $C ^ n : = E ′ [ F ^ n ∗ ]$. Now, assume that Assumptions A1.–A3. of Section 2.2 hold true. Then, as discussed in Example 4.4 and Section 5.2 of Beutner and Zähle (2016), it can be derived from a result in Arcones and Yu (1994) that under Assumptions A1. and A2. we have that Condition (a) of Corollary A3 holds for $a n : = \sqrt n$, $B : = B F$, and $F n : = F$, where $B F$ is a centered Gaussian process with covariance function $Γ ( t 0 , t 1 ) = F ( t 0 ∧ t 1 ) ( 1 − F ( t 0 ∨ t 1 ) ) + ∑ i = 0 1 ∑ k = 2 ∞ C ov ( 𝟙 { X 1 ≤ t i } , 𝟙 { X k ≤ t 1 − i } )$. Here, $C ϕ$ can be chosen to be the set $C ϕ , F$ of all $v ∈ D ϕ$ whose discontinuities are also discontinuities of F. Moreover, Theorem A5 below shows that under Assumptions A1.–A3.
the second part of Condition (a) (i.e., Equation (A14)) and Condition (e) of Corollary A4 hold for $C ^ n : = E ′ [ F ^ n ∗ ] = 1 n ∑ i = 1 n w n i 𝟙 [ X i , ∞ )$ with $w n i : = E ′ [ W n i ]$ (see also Equation (9)) and the same choice of $a n$, B, and $F n$, when $S$ is the set of all sequences $( G n ) ⊆ D ( H )$ with $G n − F ∈ D ϕ$, $n ∈ N$, and $∥ G n − F ∥ ϕ → 0$.
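The resampling mechanism of Example A2 can be sketched as follows. Since Equation (8) for the weights $W n i$ is not reproduced here, the sketch works directly with the resampled blocks, which should be read as an assumption-laden formulation of the moving blocks idea rather than a literal transcription of that equation; the AR(1) data are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(7)

def blockwise_bootstrap(x_data, block_len):
    """Moving blocks bootstrap: draw k_n = ceil(n / l_n) i.i.d. block start
    indices, uniform on {0, ..., n - l_n}, concatenate the blocks of length
    l_n, and truncate the result to length n."""
    x_data = np.asarray(x_data)
    n = len(x_data)
    k = -(-n // block_len)                               # ceil(n / l_n)
    starts = rng.integers(0, n - block_len + 1, size=k)  # I_n1, ..., I_nk
    return np.concatenate([x_data[s:s + block_len] for s in starts])[:n]

# Toy stationary, mixing data: a Gaussian AR(1) path.
n, phi = 500, 0.5
x = np.empty(n)
x[0] = rng.normal()
for i in range(1, n):
    x[i] = phi * x[i - 1] + rng.normal()

x_star = blockwise_bootstrap(x, block_len=25)            # one bootstrap series
```

Keeping whole blocks, rather than resampling single observations, preserves the short-range dependence of the original series within each block.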
Further examples verifying Condition (a) of Corollary A4 for dependent observations can be found, for instance, in Bühlmann (1994); Naik-Nimbalkar and Rajarshi (1994); Peligrad (1998).
Theorem A5.
In the setting of Example A2 assume that assertions A1.–A3. of Section 2.2 hold, and let $S$ be the set of all sequences $( G n ) ⊆ D ( H )$ with $G n − F ∈ D ϕ$, $n ∈ N$, and $∥ G n − F ∥ ϕ → 0$. Then, the second part of assertion (a) (i.e., Equation (A14)) and assertion (e) in Corollary A4 hold.
Proof.
Proof of second part of (a): It is enough to show that under assumptions A1.–A3. the Assumptions (A1)–(A4) of Theorem 1 in Bühlmann (1995) hold when the class of functions is $F ϕ : = F ϕ − ∪ F ϕ +$. Here, $F ϕ − : = { f x : x ≤ 0 }$ and $F ϕ + : = { f x : x > 0 }$ with $f x ( · ) : = ϕ ( x ) 𝟙 ( − ∞ , x ] ( · )$ for $x ≤ 0$ and $f x ( · ) : = − ϕ ( x ) 𝟙 ( x , ∞ ) ( · )$ for $x > 0$. Due to A2. and A3. we only have to verify Assumptions (A3) and (A4) of Theorem 1 in Bühlmann (1995). That is, we show that the following two assertions hold.
(1)
There exist constants $b , c > 0$ such that $N [ ] ( ε , F ϕ , ∥ · ∥ p ) ≤ c ε − b$ for all $ε > 0$.
(2)
$∫ f ¯ p d F < ∞$ for the envelope function $f ¯ ( z ) : = sup x ∈ R | f x ( z ) |$.
Here, the bracketing number $N [ ] ( ε , F ϕ , ∥ · ∥ p )$ is the minimal number of $ε$-brackets with respect to $∥ · ∥ p$ ($L p$-norm with respect to $d F$) to cover $F ϕ$, where an $ε$-bracket with respect to $∥ · ∥ p$ is the set, $[ ℓ , u ]$, of all functions f with $ℓ ≤ f ≤ u$ for some Borel measurable functions $ℓ , u : R → R +$ with $ℓ ≤ u$ pointwise and $∥ u − ℓ ∥ p ≤ ε$.
(1) We only show that (1) with $F ϕ$ replaced by $F ϕ −$ holds true. Analogously, one can show that the same holds true for $F ϕ +$ (and therefore for $F ϕ$). On the one hand, since $I p − : = ∫ ( − ∞ , 0 ] ϕ p d F < ∞$ by Assumption (a), we can find for every $ε > 0$ a finite partition $− ∞ = y 0 ε < y 1 ε < ⋯ < y k ε ε = 0$ such that
$max i = 1 , … , k ε ∫ ( y i − 1 ε , y i ε ] ϕ p d F ≤ ( ε / 2 ) p$
and $k ε ≤ ⌈ I p − / ( ε / 2 ) p ⌉$. On the other hand, using integration by parts we obtain
$∫ ( − ∞ , 0 ] F d ( − ϕ p ) = ϕ ( 0 ) p F ( 0 ) − ∫ ( − ∞ , 0 ] ( − ϕ p ) d F = ϕ ( 0 ) p F ( 0 ) + I p − ,$
so that we can find a finite partition $− ∞ = z 0 ε < z 1 ε < ⋯ < z m ε ε = 0$ such that
$max i = 1 , … , m ε ∫ ( z i − 1 ε , z i ε ] F d ( − ϕ p ) ≤ ( ε / 2 ) p$
and $m ε ≤ ⌈ ( ϕ ( 0 ) p F ( 0 ) + I p − ) / ( ε / 2 ) p ⌉$.
Now, let $− ∞ = x 0 ε < x 1 ε < ⋯ < x k ε + m ε ε = 0$ be the partition consisting of all points $y i ε$ and $z i ε$, and set
$ℓ i ε ( · ) : = ϕ ( x i ε ) 𝟙 ( − ∞ , x i − 1 ε ] ( · ) , u i ε ( · ) : = ϕ ( x i − 1 ε ) 𝟙 ( − ∞ , x i − 1 ε ] ( · ) + ϕ ( · ) 𝟙 ( x i − 1 ε , x i ε ] ( · ) .$
Then, $ℓ i ε ≤ u i ε$. Moreover,
$∥ u i ε − ℓ i ε ∥ p = ∫ u i ε − ℓ i ε p d F 1 / p ≤ ∫ ( − ∞ , x i − 1 ε ] ϕ ( x i − 1 ε ) − ϕ ( x i ε ) p d F 1 / p + ∫ ( x i − 1 ε , x i ε ] ϕ p d F 1 / p ≤ ∫ ( − ∞ , x i − 1 ε ] ϕ ( x i − 1 ε ) p − ϕ ( x i ε ) p d F 1 / p + ε / 2 ≤ ϕ ( x i − 1 ε ) p − ϕ ( x i ε ) p F ( x i − 1 ε ) 1 / p + ε / 2$
where we used Minkowski’s inequality and Equation (A15), and that $ϕ$ is non-increasing on $( − ∞ , 0 ]$ and $x i − 1 ε ≤ x i ε$. Since F is at least $F ( x i − 1 ε )$ on $( x i − 1 ε , x i ε ]$, we have
$ϕ ( x i − 1 ε ) p − ϕ ( x i ε ) p F ( x i − 1 ε ) ≤ ∫ ( x i − 1 ε , x i ε ] F d ( − ϕ p ) ≤ ( ε / 2 ) p$
due to Equation (A16). Thus, $∥ u i ε − ℓ i ε ∥ p ≤ ε$, so that $[ ℓ i ε , u i ε ]$ is an $ε$-bracket with respect to $∥ · ∥ p$. Moreover, the $ε$-brackets $[ ℓ i ε , u i ε ]$, $i = 1 , … , k ε + m ε$, obviously cover $F ϕ −$. Thus, $N [ ] ( ε , F ϕ − , ∥ · ∥ p ) ≤ c ε − p$ for a suitable constant $c > 0$ and all $ε > 0$; that is, assertion (1) holds with $b = p$.
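The partition underlying Equation (A15) can be constructed numerically. The weight function $ϕ ( x ) = 1 + | x |$, the exponent $p = 2$, and the standard normal F below are illustrative assumptions (not from the paper), and the half-line is truncated at −10, where the remaining $ϕ p d F$-mass is negligible.

```python
import numpy as np

p, eps = 2, 0.3
grid = np.linspace(-10.0, 0.0, 200001)
pdf = np.exp(-grid ** 2 / 2) / np.sqrt(2 * np.pi)    # standard normal density
phi = 1.0 + np.abs(grid)

# Mass of the measure phi^p dF on each grid cell (trapezoidal rule).
dm = 0.5 * (phi[:-1] ** p * pdf[:-1] + phi[1:] ** p * pdf[1:]) * np.diff(grid)

# Greedy construction of the partition y_0 < y_1 < ... < y_k = 0:
# close a cell just before its phi^p dF mass would exceed (eps/2)^p.
budget = (eps / 2) ** p
cuts, acc = [0], 0.0
for j, m in enumerate(dm):
    if acc + m > budget:
        cuts.append(j)
        acc = 0.0
    acc += m
cuts.append(len(grid) - 1)
y_partition = grid[cuts]
cell_mass = [dm[a:b].sum() for a, b in zip(cuts[:-1], cuts[1:])]
```

Each resulting cell carries $ϕ p d F$-mass at most $( ε / 2 ) p$, as required by Equation (A15), and the number of cells is of order $ε − p$, matching the bound on $k ε$.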
(2) The envelope function $f ¯$ is given by $f ¯ ( y ) = ϕ ( y )$ for $y ≤ 0$ and by $f ¯ ( y ) = ϕ ( y − ) = ϕ ( y )$ (recall that $ϕ$ is continuous) for $y > 0$. Then, under Assumption (a), the integrability condition (2) holds.
Proof of (e): We have to show that $∥ C ^ n − F ∥ ϕ = sup x ∈ R | C ^ n ( x ) − F ( x ) | ϕ ( x ) → 0$ $P$-a.s. We only show that
$sup x ∈ ( − ∞ , 0 ] | C ^ n ( x ) − F ( x ) | ϕ ( x ) ⟶ 0 P − a . s . ,$
because the analogue for the positive real line can be shown in the same way. Let $ℓ i ε$ and $u i ε$ be as defined in Equation (A17). By assumption A1. we have $∫ ϕ d F < ∞$, so that, similarly as above, we can find a finite partition $− ∞ = x 0 ε < x 1 ε < ⋯ < x k ε + m ε ε = 0$ such that $[ ℓ i ε , u i ε ]$, $i = 1 , … , k ε + m ε$, are $ε$-brackets with respect to $∥ · ∥ 1$ ($L 1$-norm with respect to F) covering the class $F ϕ : = { f x : x ∈ R }$ introduced above. We proceed in two steps.
Step 1. First we show that
$sup x ≤ 0 | C ^ n ( x ) − F ( x ) | ϕ ( x ) ≤ max i = 1 , … , k ε + m ε max ∫ u i ε d ( C ^ n − F ) ; ∫ ℓ i ε d ( F − C ^ n ) + ε$
holds true for every $ε > 0$. Since $( C ^ n ( x ) − F ( x ) ) ϕ ( x ) = ∫ f x d C ^ n − ∫ f x d F$, for Equation (A19) it suffices to show
$sup x ≤ 0 | ∫ f x d C ^ n − ∫ f x d F | ≤ max i = 1 , … , k ε + m ε max ∫ u i ε d ( C ^ n − F ) ; ∫ ℓ i ε d ( F − C ^ n ) + ε .$
To prove Equation (A20), we note that for every $x ∈ ( − ∞ , 0 ]$ there is some $i x ∈ { 1 , … , k ε + m ε }$ such that $f x ∈ [ ℓ i x ε , u i x ε ]$ (see above). Therefore, since $[ ℓ i x ε , u i x ε ]$ is an $ε$-bracket with respect to $∥ · ∥ 1$,
$∫ f x d C ^ n − ∫ f x d F ≤ ∫ u i x ε d C ^ n − ∫ f x d F = ∫ u i x ε d ( C ^ n − F ) + ∫ ( u i x ε − f x ) d F ≤ ∫ u i x ε d ( C ^ n − F ) + ∫ ( u i x ε − ℓ i x ε ) d F ≤ max i = 1 , … , k ε + m ε ∫ u i ε d ( C ^ n − F ) + ε .$
Analogously, we obtain
$∫ f x d C ^ n − ∫ f x d F ≥ − max i = 1 , … , k ε + m ε ∫ ℓ i ε d ( F − C ^ n ) + ε .$
That is, Equation (A19) holds true.
Step 2. Because of Equation (A19), for Equation (A18) to be true, it suffices to show that
$∫ ℓ i ε d ( F − C ^ n ) ⟶ 0 and ∫ u i ε d ( C ^ n − F ) ⟶ 0 P − a . s .$
for every $i = 1 , … , k ε + m ε$. We only show the second convergence in Equation (A21), the first convergence can be shown even easier. We have
$∫ u i ε d ( C ^ n − F ) = 1 n ∑ j = 1 n w n j ϕ ( x i − 1 ε ) 𝟙 ( − ∞ , x i − 1 ε ] ( X j ) − E ϕ ( x i − 1 ε ) 𝟙 ( − ∞ , x i − 1 ε ] ( X 1 ) + 1 n ∑ j = 1 n w n j ϕ ( X j ) 𝟙 ( x i − 1 ε , x i ε ] ( X j ) − E ϕ ( X 1 ) 𝟙 ( x i − 1 ε , x i ε ] ( X 1 ) = : S 1 ( n ) + S 2 ( n ) .$
The first summand on the right-hand side of
$S 2 ( n ) = 1 n ∑ j = 1 n ϕ ( X j ) 𝟙 ( x i − 1 ε , x i ε ] ( X j ) − E ϕ ( X 1 ) 𝟙 ( x i − 1 ε , x i ε ] ( X 1 ) + 1 n ∑ j = 1 n ( w n j − 1 ) ϕ ( X j ) 𝟙 ( x i − 1 ε , x i ε ] ( X j )$
converges $P$-a.s. to 0 by Theorem 1 (ii) (and Application 5, p. 924) in Rio (1995) and our assumption A1. The second summand converges $P$-a.s. to 0 too, which can be seen as follows. From Equation (9), we obtain for n sufficiently large
$| w n i − 1 | ≤ 2 , i = 1 , … , ℓ n ℓ n − 1 n − ℓ n + 1 , i = ℓ n + 1 , … , n − ℓ n 2 , i = n − ℓ n + 1 , … , n ,$
so that for n sufficiently large
$| 1 n ∑ j = 1 n ( w n j − 1 ) ϕ ( X j ) 𝟙 ( x i − 1 ε , x i ε ] ( X j ) | ≤ ℓ n − 1 n − ℓ n + 1 1 n ∑ j = ℓ n + 1 n − ℓ n ϕ ( X j ) 𝟙 ( x i − 1 ε , x i ε ] ( X j ) + 2 2 ℓ n n 1 2 ℓ n ∑ j = 1 ℓ n ϕ ( X j ) 𝟙 ( x i − 1 ε , x i ε ] ( X j ) + ∑ j = n − ℓ n + 1 n ϕ ( X j ) 𝟙 ( x i − 1 ε , x i ε ] ( X j ) = : S 2 , 1 ( n ) + S 2 , 2 ( n ) .$
We have seen above that $1 n ∑ j = 1 n ϕ ( X j ) 𝟙 ( x i − 1 ε , x i ε ] ( X j )$ converges $P$-a.s. to the constant $E [ ϕ ( X 1 ) 𝟙 ( x i − 1 ε , x i ε ] ( X 1 ) ]$. Since $ℓ n$ converges to ∞ at a slower rate than n (by assumption A3.), it follows that $S 2 , 1 ( n )$ converges $P$-a.s. to 0. Using the same arguments, we obtain that $S 2 , 2 ( n )$ converges $P$-a.s. to 0. Hence, $S 2 ( n )$ converges $P$-a.s. to 0. Analogously, one can show that $S 1 ( n )$ converges $P$-a.s. to 0. ☐

## References

1. Acerbi, Carlo. 2002. Spectral measures of risk: A coherent representation of subjective risk aversion. Journal of Banking & Finance 26: 1505–18. [Google Scholar]
2. Acerbi, Carlo, and Balazs Szekely. 2014. Backtesting Expected Shortfall. New York: Morgan Stanley Capital International. [Google Scholar]
3. Acerbi, Carlo, and Dirk Tasche. 2002a. On the coherence of expected shortfall. Journal of Banking & Finance 26: 1487–503. [Google Scholar]
4. Acerbi, Carlo, and Dirk Tasche. 2002b. Expected Shortfall: A natural coherent alternative to Value at Risk. Economic Notes 31: 379–88. [Google Scholar] [CrossRef]
5. Arcones, Miguel Angel, and Bin Yu. 1994. Central limit theorems for empirical and U-processes of stationary mixing sequences. Journal of Theoretical Probability 7: 47–71. [Google Scholar] [CrossRef]
6. Bauer, Heinz. 2001. Measure and Integration Theory. Berlin: De Gruyter. [Google Scholar]
7. Belloni, Alexandre, Victor Chernozhukov, Ivan Fernández-Val, and Christian B. Hansen. 2017. Program evaluation and causal inference with high-dimensional data. Econometrica 85: 233–98. [Google Scholar] [CrossRef]
8. Beutner, Eric, Wei Biao Wu, and Henryk Zähle. 2012. Asymptotics for statistical functionals of long-memory sequences. Stochastic Processes and their Applications 122: 910–29. [Google Scholar] [CrossRef]
9. Beutner, Eric, and Henryk Zähle. 2010. A modified functional delta method and its application to the estimation of risk functionals. Journal of Multivariate Analysis 101: 2452–63. [Google Scholar] [CrossRef]
10. Beutner, Eric, and Henryk Zähle. 2012. Deriving the asymptotic distribution of U- and V-statistics of dependent data using weighted empirical processes. Bernoulli 18: 803–22. [Google Scholar] [CrossRef]
11. Beutner, Eric, and Henryk Zähle. 2016. Functional delta-method for the bootstrap of quasi-Hadamard differentiable functionals. Electronic Journal of Statistics 10: 1181–222. [Google Scholar] [CrossRef]
12. Billingsley, Patrick. 1999. Convergence of Probability Measures. New York: Wiley. [Google Scholar]
13. Bølviken, Eric, and Montserrat Guillen. 2017. Risk aggregation in Solvency II through recursive log-normals. Insurance: Mathematics and Economics 73: 20–26. [Google Scholar] [CrossRef]
14. Bühlmann, Peter. 1994. Blockwise bootstrapped empirical process for stationary sequences. Annals of Statistics 22: 995–1012. [Google Scholar] [CrossRef]
15. Bühlmann, Peter. 1995. The blockwise bootstrap for general empirical processes of stationary sequences. Stochastic Processes and their Applications 58: 247–65. [Google Scholar] [CrossRef]
16. Davison, Anthony C., and David Victor Hinkley. 1997. Bootstrap Methods and Their Application. Cambridge: Cambridge University Press. [Google Scholar]
17. Dudley, Richard Mansfield. 1966. Weak convergence of probabilities on nonseparable metric spaces and empirical measures on Euclidean spaces. Illinois Journal of Mathematics 10: 109–26. [Google Scholar]
18. Dudley, Richard Mansfield. 1967. Measures on non-separable metric spaces. Illinois Journal of Mathematics 11: 449–53. [Google Scholar]
19. Efron, Bradley. 1979. Bootstrap methods: Another look at the jackknife. Annals of Statistics 7: 1–26. [Google Scholar] [CrossRef]
20. Efron, Bradley, and Robert Tibshirani. 1994. An introduction to the Bootstrap. New York: Chapman & Hall. [Google Scholar]
21. Emmer, Susanne, Marie Kratz, and Dirk Tasche. 2015. What is the best risk measure in practice? A comparison of standard measures. Journal of Risk 18: 31–60. [Google Scholar] [CrossRef]
22. Gilat, David, and Roelof Helmers. 1997. On strong laws for generalized L-statistics with dependent data. Commentationes Mathematicae Universitatis Carolinae 38: 187–92. [Google Scholar]
23. Gribkova, Nadezhda. 2002. Bootstrap approximation of distributions of the L-statistics. Journal of Mathematical Sciences 109: 2088–102. [Google Scholar] [CrossRef]
24. Gribkova, Nadezhda, and Department of Theory of Probability and Mathematical Statistics, Saint Petersburg State University, Saint Petersburg, Russia. 2016. Personal communication.
25. Helmers, Roelof, Paul Janssen, and Robert Serfling. 1990. Berry-Esséen and bootstrap results for generalized L-statistics. Scandinavian Journal of Statistics 17: 65–77. [Google Scholar]
26. Jones, Bruce L., and Ričardas Zitikis. 2003. Empirical estimation of risk measures and related quantities. North American Actuarial Journal 7: 44–54. [Google Scholar] [CrossRef]
27. Krätschmer, Volker, Alexander Schied, and Henryk Zähle. 2013. Quasi-Hadamard differentiability of general risk functionals and its application. Statistics and Risk Modeling 32: 25–47. [Google Scholar] [CrossRef]
28. Krätschmer, Volker, and Henryk Zähle. 2017. Statistical inference for expectile-based risk measures. Scandinavian Journal of Statistics 44: 425–54. [Google Scholar]
29. Lahiri, Soumendra Nath. 2003. Resampling Methods for Dependent Data. New York: Springer. [Google Scholar]
30. Lauer, Alexandra, and Henryk Zähle. 2015. Nonparametric estimation of risk measures of collective risks. Statistics and Risk Modeling 32: 89–102. [Google Scholar] [CrossRef]
31. Lauer, Alexandra, and Henryk Zähle. 2017. Bootstrap consistency and bias correction in the nonparametric estimation of risk measures of collective risks. Insurance: Mathematics and Economics 74: 99–108. [Google Scholar] [CrossRef]
32. Mehra, K. L., and Sudhakara Rao. 1975. On functions of order statistics for mixing processes. Annals of Statistics 3: 874–83. [Google Scholar] [CrossRef]
33. Naik-Nimbalkar, Uttara V., and M.B. Rajarshi. 1994. Validity of blockwise bootstrap for empirical processes with stationary observations. Annals of Statistics 22: 980–94. [Google Scholar] [CrossRef]
34. Peligrad, Magda. 1998. On the blockwise bootstrap for empirical processes for stationary sequences. Annals of Probability 26: 877–901. [Google Scholar] [CrossRef]
35. Pitts, Susan M. 1994. Nonparametric estimation of compound distributions with applications in insurance. Annals of the Institute of Statistical Mathematics 46: 537–55. [Google Scholar]
36. Pollard, David. 1984. Convergence of Stochastic Processes. New York: Springer. [Google Scholar]
37. Rio, Emmanuel. 1995. A maximal inequality and dependent Marcinkiewicz-Zygmund strong laws. Annals of Probability 23: 918–37. [Google Scholar] [CrossRef]
38. Rubin, Donald. 1981. The Bayesian bootstrap. Annals of Statistics 9: 130–34. [Google Scholar] [CrossRef]
39. Shao, Jun, and Dongsheng Tu. 1995. The Jackknife and Bootstrap. New York: Springer. [Google Scholar]
40. Shorack, Galen R. 1972. Linear functions of order statistics. Annals of Mathematical Statistics 43: 412–27. [Google Scholar] [CrossRef]
41. Shorack, Galen R., and Jon A. Wellner. 1986. Empirical Processes with Applications to Statistics. New York: Wiley. [Google Scholar]
42. Stigler, Stephen M. 1974. Linear functions of order statistics with smooth weight functions. Annals of Statistics 2: 676–93. [Google Scholar] [CrossRef]
43. Sun, Shuxia, and Fuxia Cheng. 2018. Bootstrapping the Expected Shortfall. Theoretical Economics Letters 8: 685–98. [Google Scholar] [CrossRef]
44. Tsukahara, Hideatsu. 2013. Estimation of distortion risk measures. Journal of Financial Econometrics 12: 213–35. [Google Scholar] [CrossRef]
45. Van der Vaart, Aad W., and Jon A. Wellner. 1996. Weak Convergence and Empirical Processes. New York: Springer. [Google Scholar]
46. Van Zwet, Willem R. 1980. A strong law for linear functionals of order statistics. Annals of Probability 8: 986–90. [Google Scholar] [CrossRef]
47. Varron, Davit, and Laboratoire de Mathématiques de Besançon, University of Franche-Comté, Besançon, France. 2015. Personal communication.
48. Zähle, Henryk. 2014. Marcinkiewicz–Zygmund and ordinary strong laws for empirical distribution functions and plug-in estimators. Statistics 48: 951–64. [Google Scholar] [CrossRef]

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).