A study of seven asymmetric kernels for the estimation of cumulative distribution functions

In Mombeni et al. (2019), Birnbaum-Saunders and Weibull kernel estimators were introduced for the estimation of cumulative distribution functions (c.d.f.s) supported on the half-line $[0,\infty)$. They were the first authors to use asymmetric kernels in the context of c.d.f. estimation. Their estimators were shown to perform better numerically than traditional methods such as the basic kernel method and the boundary-modified version from Tenreiro (2013). In the present paper, we complement their study by introducing five new asymmetric kernel c.d.f. estimators, namely the Gamma, inverse Gamma, lognormal, inverse Gaussian and reciprocal inverse Gaussian kernel c.d.f. estimators. For these five new estimators, we prove the asymptotic normality and find asymptotic expressions for the following quantities: bias, variance, mean squared error and mean integrated squared error. A numerical study then compares the performance of the five new c.d.f. estimators against traditional methods and against the Birnbaum-Saunders and Weibull kernel c.d.f. estimators from Mombeni et al. (2019). Using the same experimental design, we show that the lognormal and Birnbaum-Saunders kernel c.d.f. estimators perform the best overall, while the other asymmetric kernel estimators are sometimes better but always at least competitive with the boundary kernel method.

The interested reader is referred to Hirukawa (2018) and Section 2 in Ouimet (2020c) for a review of some of these papers and an extensive list of papers dealing with asymmetric kernels in other settings.
In contrast, there are almost no papers dealing with the estimation of cumulative distribution functions (c.d.f.s) in the literature on asymmetric kernels. In fact, to the best of our knowledge, Mombeni et al. (2019) seems to be the first (and only) paper in this direction if we exclude the closely related theory of Bernstein estimators.¹ In the present paper, we complement the study in Mombeni et al. (2019) by introducing five new asymmetric kernel c.d.f. estimators, namely the Gamma, inverse Gamma, lognormal, inverse Gaussian and reciprocal inverse Gaussian kernel c.d.f. estimators. Our goal is to prove several asymptotic properties for these five new estimators (bias, variance, mean squared error, mean integrated squared error and asymptotic normality) and to compare their numerical performance against traditional methods and against the Birnbaum-Saunders and Weibull kernel c.d.f. estimators from Mombeni et al. (2019). As we will see in the discussion of the results (Section 10), the lognormal and Birnbaum-Saunders kernel c.d.f. estimators perform the best overall, while the other asymmetric kernel estimators are sometimes better but always at least competitive with the boundary kernel method from Tenreiro (2013).

¹ In the setting of Bernstein estimators, c.d.f. estimation on compact sets was tackled, for example, by Babu et al. (2002), Leblanc (2009), Leblanc (2012a), Leblanc (2012b), Dutta (2016), Jmaei et al. (2017), Erdogan et al. (2019) and Wang et al. (2019) in the univariate setting, and by Babu & Chaubey (2006), Belalia (2016), Dib et al. (2020) and Ouimet (2020a,b) in the multivariate setting. In Hanebeck & Klar (2020), the authors introduced Bernstein estimators with Poisson weights (also called Szasz estimators) for the estimation of c.d.f.s supported on $[0, \infty)$; see also Ouimet (2020d).
The function $\Gamma(\alpha, z) := \int_z^\infty t^{\alpha - 1} e^{-t} \, \mathrm{d}t$ denotes the upper incomplete gamma function (where $\Gamma(\alpha) := \Gamma(\alpha, 0)$), and $\Phi$ denotes the c.d.f. of the standard normal distribution. The parametrizations are chosen so that:
• The mode of the kernel function in (2.1) is $x$;
• The median of the kernel function in (2.3) and (2.6) is $x$;
• The mean of the kernel function in (2.2), (2.4), (2.5) and (2.7) is $x$.
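To illustrate the first condition, a common parametrization in the asymmetric kernel literature (which we assume here matches (2.1), since the paper's display is not reproduced in this excerpt) takes the kernel at the point $x$ to be the $\mathrm{Gamma}(x/b + 1, b)$ density; a $\mathrm{Gamma}(\alpha, \theta)$ density with $\alpha \geq 1$ has its mode at $(\alpha - 1)\theta$, so that here the mode is
$$\Big(\frac{x}{b} + 1 - 1\Big) b = x.$$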
In this paper, we will compare the numerical performance of the above seven asymmetric kernel c.d.f. estimators against the following three traditional estimators ($K$ here is the c.d.f. of a kernel function), defined in (2.8), (2.9) and (2.10): the ordinary kernel (OK) c.d.f. estimator (from Tiago de Oliveira (1963), Nadaraja (1964) or Watson & Leadbetter (1964)), the boundary kernel (BK) c.d.f. estimator (from Example 2.3 in Tenreiro (2013)) and the empirical c.d.f. (EDF) estimator.
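For reference, displays (2.8) and (2.10) presumably take the standard forms below (a sketch only; the boundary-corrected form (2.9) is omitted here since its exact expression is specific to Example 2.3 in Tenreiro (2013)):
$$\hat F_{\mathrm{OK}}(x) = \frac{1}{n} \sum_{i=1}^{n} K\Big(\frac{x - X_i}{b}\Big), \qquad \hat F_{\mathrm{EDF}}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{\{X_i \leq x\}},$$
where $X_1, \dots, X_n$ denotes the sample and $b > 0$ the bandwidth.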

Outline
In Sections 4, 5, 6, 7 and 8, the asymptotic normality, and the asymptotic expressions for the bias, variance, mean squared error (MSE) and mean integrated squared error (MISE), are stated for the Gam, IGam, LN, IGau and RIG kernel c.d.f. estimators, respectively. The proofs can be found in Appendices A, B, C, D and E, respectively. Aside from the asymptotic normality (which can easily be deduced), these results were obtained for the Birnbaum-Saunders and Weibull kernel c.d.f. estimators in Mombeni et al. (2019). In Section 9, we compare the performance of all seven asymmetric kernel estimators above with the three traditional estimators OK, BK and EDF, defined in (2.8), (2.9) and (2.10). A discussion of the results and our conclusion follow in Sections 10 and 11. Technical calculations for the proofs of the asymptotic results are gathered in Appendix F.

Assumptions
Throughout the paper, we make the following two basic assumptions:
1. The target c.d.f. $F$ has two continuous and bounded derivatives;
2. The smoothing (or bandwidth) parameter $b = b_n > 0$ satisfies $b \to 0$ as $n \to \infty$.

Notation
Throughout the paper, the notation $u = O(v)$ means that $\limsup_{n \to \infty} |u/v| < C < \infty$. The positive constant $C$ can depend on the target c.d.f. $F$, but on no other variable unless explicitly written as a subscript. For example, if $C$ depends on a given point $x \in (0, \infty)$, we would write $u = O_x(v)$. Similarly, the notation $u = o(v)$ means that $\lim_{n \to \infty} |u/v| = 0$, and the same rule applies for the subscript. The expression '$\stackrel{\mathscr{D}}{\longrightarrow}$' will denote convergence in law (or distribution).

Asymptotic properties of the c.d.f. estimator with Gam kernel
In this section, we find the asymptotic properties of the Gamma (Gam) kernel estimator defined in (2.1).
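For concreteness, here is a minimal R sketch of a Gamma kernel c.d.f. estimator of this type. It assumes (consistently with the mode condition stated in Section 2, but not taken verbatim from display (2.1)) that the kernel at the point $x$ is the $\mathrm{Gamma}(x/b + 1, b)$ distribution and that the estimator averages its survival function over the sample; the function name Fhat_gam is our own.

    # Minimal sketch of a Gamma kernel c.d.f. estimator (our own illustration,
    # not necessarily identical to display (2.1)). At each evaluation point xx,
    # the kernel is the Gamma(xx/b + 1, b) distribution (mode = xx), and the
    # estimate averages its survival function over the sample X_1, ..., X_n.
    Fhat_gam <- function(x, X, b) {
      sapply(x, function(xx) {
        mean(pgamma(X, shape = xx / b + 1, scale = b, lower.tail = FALSE))
      })
    }

    # Quick check against the true c.d.f. of a Gamma(4, 2) target
    set.seed(42)
    X <- rgamma(256, shape = 4, scale = 2)
    rbind(estimate = Fhat_gam(c(2, 5, 10), X, b = 0.2),
          truth    = pgamma(c(2, 5, 10), shape = 4, scale = 2))

As $b \to 0$, the kernel concentrates at the evaluation point, so the averaged survival function approaches the empirical proportion of observations below that point.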

Asymptotic properties of the c.d.f. estimator with IGam kernel
In this section, we find the asymptotic properties of the inverse Gamma (IGam) kernel estimator defined in (2.2).
Asymptotic properties of the c.d.f. estimator with LN kernel

In this section, we find the asymptotic properties of the lognormal (LN) kernel estimator defined in (2.3).

Asymptotic properties of the c.d.f. estimator with IGau kernel

In this section, we find the asymptotic properties of the inverse Gaussian (IGau) kernel estimator defined in (2.4).

Asymptotic properties of the c.d.f. estimator with RIG kernel
In this section, we find the asymptotic properties of the reciprocal inverse Gaussian (RIG) kernel estimator defined in (2.5).

Numerical study
As in Mombeni et al. (2019), we generated M = 1000 samples of size n = 256 and n = 1000 from eight distributions, with the parametrizations of the density functions given in (9.1) to (9.8):
1. Burr(1, 3, 1);
2. Gamma(0.6, 2);
3. Gamma(4, 2);
4. GeneralizedPareto(0.4, 1, 0);
5. HalfNormal(1);
6. LogNormal(0, 0.75);
7. Weibull(1.5, 1.5);
8. Weibull(3, 2).

For each of the eight distributions ($i = 1, 2, \dots, 8$), each of the ten estimators ($j = 1, 2, \dots, 10$), each sample size ($n = 256, 1000$) and each sample ($k = 1, 2, \dots, M$), we calculated the integrated squared errors
$$\mathrm{ISE}_k := \int_0^\infty \big(\hat F_{j, b_{\mathrm{opt}}}(x) - F_i(x)\big)^2 \, \mathrm{d}x,$$
where, for the asymmetric kernel estimators ($j = 1, 2, \dots, 7$), the bandwidth $b_{\mathrm{opt}}$ is optimal with respect to the MISE (see (4.8), (5.8), (6.8), (7.8) and (8.8)) and approximated under the assumption that the target distribution is $\mathrm{Gamma}(\hat\alpha_n, \hat\beta_n)$ (where $\hat\alpha_n, \hat\beta_n$ are the maximum likelihood estimates for the $k$-th sample). Everywhere in our R code, we approximated the integrals on $(0, \infty)$ using the integral function from the R package pracma (the base function integrate had serious precision issues). Table 9.1 below shows the mean and standard deviation of the ISEs, see (9.10), for the eight distributions ($i = 1, 2, \dots, 8$), the ten estimators ($j = 1, 2, \dots, 10$) and the two sample sizes ($n = 256, 1000$). All the values presented in the table have been multiplied by $10^4$. In Table 9.2, we computed, for each distribution and each sample size, the difference between the ISE means and the lowest ISE mean for the corresponding distribution and sample size (i.e., the ISE means minus the ISE mean of the best estimator on the corresponding line). The totals of those differences are also calculated for each sample size on the two "total" lines. Figure 9.3 gives a better idea of the distribution of the ISEs by displaying boxplots of the ISEs for every distribution and every estimator when the sample size is n = 1000. Finally, Figures 9.4 to 9.11 (one figure for each of the eight distributions) show a collection of ten c.d.f. estimates from each of the ten estimators.
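As an illustration of the integral approximation described above, here is a minimal R sketch (our own, not the authors' code) computing one ISE for the EDF estimator against a Gamma(4, 2) target with pracma::integral; the finite upper integration limit is a truncation of the $(0, \infty)$ integral assumed for this sketch.

    # Minimal sketch (not the authors' code) of one ISE computation, here for
    # the empirical c.d.f. estimator against a Gamma(4, 2) target; the upper
    # integration limit of 100 truncates the (0, Inf) integral for this sketch.
    library(pracma)

    set.seed(1)
    n <- 256
    X <- rgamma(n, shape = 4, scale = 2)   # one simulated sample

    Fhat   <- ecdf(X)                                      # estimator
    F_true <- function(x) pgamma(x, shape = 4, scale = 2)  # target c.d.f.

    ise <- integral(function(x) (Fhat(x) - F_true(x))^2, 0, 100)
    ise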
Here are the results, which we discuss in Section 10:

[Table 9.1: Mean and standard deviation of the ISEs, $k = 1, 2, \dots, M$, for the eight distributions ($i = 1, 2, \dots, 8$), the ten estimators ($j = 1, 2, \dots, 10$) and the two sample sizes ($n = 256, 1000$). All the values presented in the table have been multiplied by $10^4$. The ordinary kernel estimator $\hat F_8$ is denoted by OK, the boundary kernel estimator $\hat F_9$ is denoted by BK, and the empirical c.d.f. $\hat F_{10}$ is denoted by EDF. For each line in the table, the lowest ISE means are highlighted in blue.]

[Table 9.2: ISE means minus the lowest ISE mean for that line (i.e., minus the ISE mean of the best estimator for that specific distribution and sample size). For each estimator ($j = 1, 2, \dots, 10$) and each sample size, the total of those differences to the best ISE mean is calculated on the line called "total". For each sample size, the lowest totals are highlighted in blue.]

[Figure 9.3: Boxplots of the ISEs, $k = 1, 2, \dots, M$, for the eight distributions and the ten estimators, when the sample size is n = 1000.]

[Figures 9.4 to 9.11, one per distribution: the density function appears on the top-left, and the true c.d.f. is depicted in red everywhere else; each plot shows ten estimates in blue of the target c.d.f. using one of the ten estimators. For example, Figure 9.11 shows this for the Weibull(3, 2) distribution with n = 256.]

Discussion of the simulation results
In Table 9.1, we see that the LN and B-S kernel c.d.f. estimators performed the best (had the lowest ISE means) for the majority of the distributions considered (i = 1, 2, 3, 4, 6, 7 for n = 256 and i = 1, 2, 4, 6, 7 for n = 1000). They also always did so as a pair, with ISE means equal up to the second decimal. For the remaining cases, the boundary kernel estimator (BK) from Tenreiro (2013) had the lowest ISE means. As expected, the ordinary kernel estimator and the empirical c.d.f. performed the worst. In Mombeni et al. (2019), the authors reported that the empirical c.d.f. performed better than the BK estimator, but this has to be a programming error (especially since the bandwidth was optimized with a plug-in method). Overall, our means and standard deviations in Table 9.1 seem to be lower than the ones reported in Mombeni et al. (2019), at least in part because we used a more precise option (the pracma::integral function in R) to approximate the integrals involved in the bandwidth selection procedures and the computation of the ISEs.
In all cases, the asymmetric kernel estimators were at least competitive with the BK estimator in Table 9.1. Table 9.2, which shows the difference between each estimator's ISE mean and the lowest ISE mean for the corresponding distribution and sample size, paints a clearer picture of the asymmetric kernel c.d.f. estimators' performance. It shows that, for each sample size (n = 256, 1000), the total of those differences to the best ISE mean is significantly lower for the LN and B-S kernel estimators than for all the other alternatives. Also, the totals for all the asymmetric kernel estimators are lower than for the BK estimator when n = 256, and are in the same range (or better, in the case of LN and B-S) when n = 1000. This means that all the asymmetric kernel estimators are overall better alternatives to the BK estimator (or at least always remain competitive), although the advantage seems to dissipate (except for LN and B-S) as n increases.

Conclusion
In this paper, we considered five new asymmetric kernel c.d.f. estimators, namely the Gamma (Gam), inverse Gamma (IGam), lognormal (LN), inverse Gaussian (IGau) and reciprocal inverse Gaussian (RIG) kernel c.d.f. estimators. We proved the asymptotic normality of these estimators and also found asymptotic expressions for their bias, variance, mean squared error and mean integrated squared error. The expression for the optimal bandwidth under the mean integrated squared error was used in each case to implement a bandwidth selection procedure in our simulation study. With the same experimental design as in Mombeni et al. (2019) (but with an improved approximation of the integrals involved in the bandwidth selection procedures and the computation of the ISEs), our results show that the lognormal and Birnbaum-Saunders kernel c.d.f. estimators perform the best overall. The results also show that all seven asymmetric kernel c.d.f. estimators are better in some cases and always at least competitive against the boundary kernel alternative presented in Tenreiro (2013). In that sense, all seven asymmetric kernel c.d.f. estimators are safe to use in place of more traditional methods. We recommend using the lognormal and Birnbaum-Saunders kernel c.d.f. estimators in the future.

F. Technical lemmas
The lemma below computes the first two moments of the minimum of two i.i.d. random variables with a Gamma distribution. The proof is a slight generalization of the answer provided by Felix Marin in the following MathStackExchange post.² Here, $\Gamma(\alpha) := \int_0^\infty t^{\alpha - 1} e^{-t} \, \mathrm{d}t$ denotes the gamma function.

Proof. Assume throughout the proof that $j \in \{1, 2\}$. By the simple change of variables $(u, v) = (x/\theta, y/\theta)$, we can reduce to the unit-scale case; in the resulting integral representation, $\delta$ denotes the Dirac delta function. The second term in the last brace is $1/2$, and the principal value is
$$\int (1 + \tau^2)^{-\alpha - j} \, \mathrm{d}\tau, \tag{F.7}$$
where we crucially used the fact that $j \in \{1, 2\}$ to obtain the last equality. Putting all the work back in (F.6), the computation reduces to a single remaining integral, which can be evaluated using Ramanujan's master theorem. Indeed, note that
$$(1 + t)^{-\alpha - j} = \sum_{k=0}^{\infty} \frac{\Gamma(\alpha + j + k)}{\Gamma(\alpha + j)} \cdot \frac{(-t)^k}{k!} =: \sum_{k=0}^{\infty} \varphi(k) \, \frac{(-t)^k}{k!}.$$
Therefore,
$$\int_0^\infty t^{1/2 - 1} (1 + t)^{-\alpha - j} \, \mathrm{d}t = \Gamma(1/2) \, \varphi(-1/2) = \sqrt{\pi} \, \frac{\Gamma(\alpha + j - 1/2)}{\Gamma(\alpha + j)}. \tag{F.10}$$
By putting this result in (F.8), we obtain the claimed moment formulas. This ends the proof.
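Since the last step hinges on it, we recall the statement of Ramanujan's master theorem (a classical identity; the regularity conditions on $\varphi$ are omitted here): if
$$f(t) = \sum_{k=0}^{\infty} \varphi(k) \, \frac{(-t)^k}{k!},$$
then, for suitable $s > 0$,
$$\int_0^\infty t^{s - 1} f(t) \, \mathrm{d}t = \Gamma(s) \, \varphi(-s).$$
Taking $s = 1/2$ and $\varphi(k) = \Gamma(\alpha + j + k)/\Gamma(\alpha + j)$ gives exactly (F.10).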
The lemma below computes the first two moments for the minimum of two i.i.d. random variables with an inverse Gamma distribution.
If $X, Y \overset{\mathrm{i.i.d.}}{\sim} \mathrm{InverseGamma}(b^{-1} + 1, x^{-1} b)$ for some $x \in (0, \infty)$ and $b \in (0, 1)$, then the first two moments of $\min(X, Y)$ admit analogous closed-form expressions.

The lemma below computes the first two moments of the minimum of two i.i.d. random variables with a lognormal distribution.
Here, $\Phi$ denotes the c.d.f. of the standard normal distribution.

Proof. The result follows from a change of variables.

Corollary F.6. Let $X, Y$ be i.i.d.
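Although the displayed formulas are not reproduced above, the following minimal R sketch (our own illustration, not the paper's code) numerically checks the survival-function identity that underlies moment computations of this type, here for the first moment of the minimum of two i.i.d. lognormal variables:

    # Sanity check of E[min(X, Y)^j] = int_0^Inf j * t^(j-1) * (1 - F(t))^2 dt
    # for X, Y i.i.d. LogNormal(0, 0.75); the finite upper integration limit
    # is a truncation assumed for this sketch.
    library(pracma)

    mu <- 0; sigma <- 0.75; j <- 1

    set.seed(1)
    X <- rlnorm(1e6, mu, sigma)
    Y <- rlnorm(1e6, mu, sigma)
    mc <- mean(pmin(X, Y)^j)  # Monte Carlo estimate

    num <- integral(function(t) {
      j * t^(j - 1) * plnorm(t, mu, sigma, lower.tail = FALSE)^2
    }, 0, 100)                # survival-function identity, truncated at t = 100

    c(monte_carlo = mc, integral = num)

The two numbers agree to Monte Carlo accuracy, which is the kind of consistency check used throughout these lemmas.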

G. R code
The R code for the simulations in Section 9 is available online.