On Underdispersed Count Kernels for Smoothing Probability Mass Functions

Abstract: Only a few count smoothers are available for the widespread use of discrete associated kernel estimators, and their constructions lack systematic approaches. This paper proposes the mean dispersion technique for building count kernels. It applies only to count distributions that exhibit the underdispersion property, which ensures the convergence of the corresponding estimators. In addition to the well-known binomial and recent CoM-Poisson kernels, we introduce two new ones: the double Poisson and gamma-count kernels. Despite the challenging problem of obtaining explicit expressions, these kernels effectively smooth densities. Their good performance is demonstrated through both numerical and comparative analyses, particularly for small and moderate sample sizes. The optimal tuning parameter is investigated here by integrated squared errors. The new kernels also offer the added advantage of faster computation times. Overall, the accuracy of the two newly suggested kernels falls between that of the two existing ones. Finally, an application including a tail probability estimation on real count data and some concluding remarks are given.


Introduction
Nonparametric statistics use (discrete) asymmetric kernel methods to capture and visually represent complex relationships between variables that cannot be effectively captured by symmetric kernels. A common practice now is to employ kernels whose support coincides with the nature or support of the dataset, whether it is count, categorical, bounded, unbounded, continuous, left or right skewed, and so on. See, for example, [1][2][3] for the smoothing of probability mass, probability density and regression functions, with environmental, econometric and financial applications, among others. Discrete smoothing of probability mass functions for count and categorical data has not been studied as extensively as its continuous counterpart, primarily due to the limited options for suitable kernels. There are two main classes of discrete associated kernels. The first, called "second order" (with δ = 0 in Definition 1), includes kernels whose estimators are consistent and asymptotically tend towards the true function, as for the Dirac kernel. Such discrete (asymmetric and symmetric) kernels are, for instance, the triangular [4], Aitchison-Aitken or Dirac Uniform [5], Wang and Van Ryzin [6], and the recently proposed CoM-Poisson [7,8]; see also [3]. The second class, of the so-called "first order" (with δ ∈ (0, 1) in Definition 1), notably contains the binomial kernel [4], appropriate for small and moderate sample sizes, for which the corresponding estimators do not converge. The previous discrete kernels are suitable for categorical data, with the exception of the binomial and CoM-Poisson kernels, which are appropriate for count data. Additionally, all these associated kernels are typically underdispersed, i.e., their variance is less than their expectation; on the contrary, the equidispersed Poisson and overdispersed negative binomial kernels are not recommended for count smoothing; see [4], whose authors provide intensive simulations highlighting the superiority of underdispersed discrete smoothers over equi- and overdispersed ones. The reader can also refer to [9,10] for applications of discrete kernels to survey sampling and model specification testing, respectively, and more generally to Li and Racine [11]. Here is the precise definition of a discrete associated kernel; see, e.g., Esstafa et al. [8].

Definition 1.
Let T ⊆ R be the discrete support of the probability mass function (pmf) f to be estimated, x ∈ T a target point and h > 0 a bandwidth. A parameterized pmf K_{x,h}(·) on the discrete support S_x ⊆ R is called a "discrete associated kernel" if the following conditions are satisfied:

(i) x ∈ S_x;
(ii) E(Z_{x,h}) → x as h → 0;
(iii) Var(Z_{x,h}) → δ ∈ [0, 1) as h → 0;

where Z_{x,h} denotes the discrete random variable with pmf K_{x,h}(·).
We suppose X_1, X_2, . . ., X_n is a sample of independent and identically distributed (iid) discrete random variables with pmf f defined on T ⊆ R. The usual discrete associated kernel estimator of f is generally not a pmf. This is particularly true for some discrete associated kernels, such as the binomial, triangular and CoM-Poisson, for which the total mass of the corresponding estimator does not necessarily equal one; see, e.g., [12]. Specifically, one can express both estimators as follows:

f̂_n(x) = (1/C_n) f̃_n(x), x ∈ T,   (1)

with

f̃_n(x) = (1/n) Σ_{i=1}^n K_{x,h_n}(X_i) and C_n = Σ_{x∈T} f̃_n(x),   (2)

where (h_n)_{n≥1} is an arbitrary sequence of positive smoothing (or tuning) parameters satisfying lim_{n→∞} h_n = 0, while K_{x,h_n}(·) is a suitably chosen discrete kernel function. For the three kernels of Dirac, Aitchison and Aitken [5], and Wang and Van Ryzin [6], it is easy to check that C_n = 1, and therefore f̂_n = f̃_n. Esstafa et al. [8] recently demonstrated the effectiveness of the normalized version (1) compared to the unnormalized one (2), with illustrations using the existing count (convergent or non-convergent) smoothers: binomial and CoM-Poisson, respectively. We here use the normalized version (1) throughout. In nonparametric (discrete or continuous) kernel estimation, the tuning (smoothing) parameter plays a crucial role in preventing overfitting and underfitting. Bandwidth selection methods can be categorized into three families: global bandwidths for all smoothers, adaptive bandwidths for continuous kernels, and local bandwidths for discrete (count and categorical) estimators. For example, Chu et al. [13] proposed a rule-of-thumb approach, Harfouche et al. [1] utilized cross-validation, and Somé et al. [2] employed a local Bayesian method. It is important to note that smoothers using local bandwidths, which vary with each estimation point, are referred to as "balloon estimators", while those using adaptive bandwidths, which vary with each data point, are known as "sample-point estimators".
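To make the normalization step concrete, here is a minimal Python sketch (the paper's own computations use R) of the unnormalized estimator (2) and its normalized version (1), using the binomial kernel recalled later in Example 4; the sample, support truncation and bandwidth are illustrative assumptions.

```python
import math

def binom_kernel(y, x, h):
    """Binomial kernel B_{x,h}: Binomial(x + 1, (x + h)/(x + 1)) pmf at y."""
    n, p = x + 1, (x + h) / (x + 1)
    if y < 0 or y > n:
        return 0.0
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

def f_tilde(data, h, support):
    """Unnormalized estimator (2): f~_n(x) = (1/n) sum_i K_{x,h}(X_i)."""
    n = len(data)
    return {x: sum(binom_kernel(xi, x, h) for xi in data) / n for x in support}

def f_hat(data, h, support):
    """Normalized estimator (1): f^_n = f~_n / C_n with C_n = sum_x f~_n(x)."""
    ft = f_tilde(data, h, support)
    C = sum(ft.values())
    return {x: v / C for x, v in ft.items()}

data = [0, 1, 1, 2, 2, 2, 3, 5]   # illustrative iid count sample
support = range(0, 12)            # truncation of T = N wide enough for the data
C_n = sum(f_tilde(data, 0.2, support).values())
print(C_n, round(sum(f_hat(data, 0.2, support).values()), 10))
```

For the binomial kernel the total mass C_n generally differs from one, which is why the normalized version (1) is used throughout.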
In this paper, we propose two new count kernels, namely the double Poisson and the gamma-count, derived from the underdispersed parts of their respective distributions. These additions enrich the list of existing count kernels, such as the binomial and CoM-Poisson. Specifically, they reinforce the roster of "second-order" kernels, which previously contained only the CoM-Poisson, and whose smoothers are consistent and asymptotically tend towards the Dirac kernel. Their construction is performed through the mean dispersion technique, which appears as a variant of the mode dispersion method [12] for continuous cases. The rest of the paper is organized as follows. In Section 2, we recall some underdispersed count distributions, including the double Poisson and gamma-count, which have some specific properties in their expressions. Section 3 is devoted to building the double Poisson and gamma-count kernels, after introducing the mean dispersion method of construction and despite some approximations in their properties. Section 4 presents the main results from our simulation studies and then an application to a count dataset on development days of insect pests on Hura trees. Final remarks are made in Section 5. Some other underdispersed count distributions and local Bayesian bandwidth selection are discussed in Appendices A and B, in relation to their feasibility.

Some Properties of Underdispersed Count Distributions
In this section we recall three count distributions, namely the double Poisson, the gamma-count and the CoM-Poisson, which are underdispersed for part of their parameter ranges. Before building their corresponding associated kernels satisfying Definition 1, we point out their main properties (pmf, mean and variance), even though these are not generally available in closed form. Thus, approximation and computation approaches are used for a better understanding of the parameters.

• The double Poisson pmf is defined by

f(y; λ, γ) = c(λ, γ) γ^{1/2} e^{−γλ} (e^{−y} y^y / y!) (eλ/y)^{γy}, y ∈ N,   (3)

where c(λ, γ) is the normalizing constant, γ > 0 is the dispersion parameter and λ > 0. The mean and variance do not have closed-form expressions, but they can be approximated, respectively, by E(Y) ≈ λ and Var(Y) ≈ λ/γ. We note that the values 0 < γ < 1, γ = 1 and γ > 1 correspond to overdispersion, equidispersion and underdispersion, respectively. See Efron [14] and Toledo et al. [15] for further details.
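Since the normalizing constant c(λ, γ) has no closed form, it can be handled numerically by truncating the series. The following Python sketch (an illustration, not the paper's R code; the truncation point and parameter values are assumptions) evaluates Efron's pmf on a truncated support and checks the moment approximations E(Y) ≈ λ and Var(Y) ≈ λ/γ.

```python
import math

def double_poisson_pmf(y_max, lam, gam):
    """Truncated double Poisson pmf, normalized over {0, ..., y_max}."""
    def term(y):
        # unnormalized term gam^{1/2} e^{-gam*lam} (e^{-y} y^y / y!) (e*lam/y)^{gam*y}
        if y == 0:
            return math.sqrt(gam) * math.exp(-gam * lam)
        # work in logs for numerical stability
        logt = (0.5 * math.log(gam) - gam * lam
                - y + y * math.log(y) - math.lgamma(y + 1)
                + gam * y * (1 + math.log(lam) - math.log(y)))
        return math.exp(logt)
    w = [term(y) for y in range(y_max + 1)]
    c = sum(w)                      # numerical normalizing constant
    return [t / c for t in w]

lam, gam = 5.0, 2.0                 # gam > 1: underdispersed case
p = double_poisson_pmf(60, lam, gam)
mean = sum(y * py for y, py in enumerate(p))
var = sum((y - mean) ** 2 * py for y, py in enumerate(p))
print(round(mean, 2), round(var, 2))  # mean close to lam, variance close to lam/gam
```

The printed moments illustrate the underdispersion: the variance is roughly λ/γ, well below the mean.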

• The gamma-count pmf for the number of events within the time interval (0, T) is given, with α, β > 0, through

P(Y = y; α, β) = G(αy, αβT) − G(α(y + 1), αβT), y ∈ N,   (4)

with the cumulative distribution function

G(αy, αβT) = {1/Γ(αy)} ∫_0^{αβT} u^{αy−1} e^{−u} du, y = 1, 2, . . .;

for y = 0, G(0, αβT) = 1, and T can be set to one without loss of generality. The parameter α > 0 is such that α > 1 and 0 < α < 1 refer to underdispersion and overdispersion, respectively. Here, the mean and variance are not available in closed form, but they can be computed through

E(Y) = Σ_{y≥1} G(αy, αβT) and Var(Y) = Σ_{y≥1} (2y − 1) G(αy, αβT) − {E(Y)}².

See Winkelmann [16] for further details, Zeviani et al. [17] for an application to regression models, and also [15]. Numerically, and from Figure 1, we can observe that the mean E(Y) of the gamma-count distribution is almost always a constant around β; specifically, by zooming in, we notice that the shape of the curve is logarithmic or approximately linear in α > 0 for fixed β > 0. The same holds for its mode, as shown in Figure 2. We also note that Figure 2 highlights the role of β > 0 as a shape or location parameter and of α > 0 as a scale or dispersion parameter of the gamma-count distribution. Hence, the variance of the gamma-count distribution can be seen as a function of α > 0.
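The gamma-count pmf reduces to differences of gamma cumulative distribution functions, so it is straightforward to evaluate numerically. Below is a Python sketch (illustrative only; the parameter values and truncation are assumptions) using the regularized incomplete gamma function from SciPy, under the parameterization above in which the mean stays around β.

```python
from scipy.special import gammainc   # regularized lower incomplete gamma = Gamma cdf

def G(a, t):
    # G(a, t) = P(Gamma(a, 1) <= t), with the convention G(0, t) = 1
    return 1.0 if a == 0 else float(gammainc(a, t))

def gamma_count_pmf(y, alpha, beta, T=1.0):
    # P(Y = y) = G(alpha*y, alpha*beta*T) - G(alpha*(y + 1), alpha*beta*T)
    u = alpha * beta * T
    return G(alpha * y, u) - G(alpha * (y + 1), u)

alpha, beta = 3.0, 4.0   # alpha > 1: underdispersion; mean stays around beta
p = [gamma_count_pmf(y, alpha, beta) for y in range(60)]
mean = sum(y * py for y, py in enumerate(p))
var = sum((y - mean) ** 2 * py for y, py in enumerate(p))
print(round(sum(p), 8), round(mean, 3), round(var, 3))
```

Because the probabilities telescope through G, the truncated pmf sums to one up to numerical error, and the computed variance is visibly smaller than the mean for α > 1.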

• The CoM-Poisson distribution with location parameter µ ≥ 0 and dispersion parameter ν > 0 (ν > 1 for underdispersion) has pmf p(·; µ, ν) defined by

p(y; µ, ν) = λ^y / {(y!)^ν Z(λ, ν)}, y ∈ N,   (5)

where the function λ := λ(µ, ν) is the solution to the equation

Σ_{y=0}^∞ (y − µ) λ^y / (y!)^ν = 0,

and it is used to define the normalizing constant

Z(λ, ν) = Σ_{y=0}^∞ λ^y / (y!)^ν.

See, for example, [8] for some references.
Also, we refer to Appendix A for some other underdispersed count distributions, such as the BerPoi, generalized Poisson, an underdispersed Poisson, the BerG and the hyper-Poisson, for which the construction of corresponding count associated kernels is inconclusive.

Associated Kernel Versions
We introduce in this section the notion of a mean dispersion-ready pmf, a new method inspired by the mode dispersion technique (see, for example, [12]) and adapted to the discrete setting. This method allows the construction of discrete associated kernels and is applicable to underdispersed count distributions.

Definition 2. A mean dispersion-ready pmf K_θ is an underdispersed parameterized pmf with discrete support S_θ ⊆ R, θ ∈ Θ ⊆ R², such that K_θ has moments of second order, mode m ∈ R and dispersion parameter D.

Remark 1. Let K_θ be a mean dispersion-ready pmf on S_θ ⊆ R. The following two assertions hold: (i) the mode m of K_θ always belongs to S_θ; (ii) if µ is the mean of K_θ, then K_θ(m) ≥ K_θ(⌊µ⌋), where ⌊·⌋ denotes the integer part.
In order to create discrete associated kernels from an underdispersed unimodal mean dispersion-ready pmf K_θ defined on S_θ, the mean dispersion method requires, if it exists, an explicit solution θ = θ(x, h) of the following system of equations:

E(Z_θ) = x + h and D(θ) = 1/h,   (6)

where Z_θ denotes the random variable with pmf K_θ. It should be noted that this construction may not always be possible, and alternative methods can be found in [8,12,18]. We now illustrate the use of (6) through four examples: the two new double Poisson and gamma-count kernels, as well as the older CoM-Poisson and binomial kernels.
Example 1. The double Poisson kernel, of the second order and underdispersed for any h ∈ (0, 1), is defined on S_x = T = N for each x ∈ N by (3) with the reparameterization (λ, γ) = (x + h, 1/h), where k(x + h, 1/h) is the corresponding normalizing constant. This implies

E(Z^{DP}_{x,h}) ≈ x + h → x and Var(Z^{DP}_{x,h}) ≈ (x + h)h → 0 as h → 0,

where Z^{DP}_{x,h} is the count random variable associated to this double Poisson kernel.
Example 2. The gamma-count kernel, which exhibits the underdispersion phenomenon for any h ∈ (0, 1), is derived from (4) with the parameterization (α, β) = (1/h, x + h). It is defined on S_x = T = N for each x ∈ N and any h ∈ (0, 1). From the analyses of Figures 1 and 2, the mean and mode of the associated gamma-count random variable Z^{GC}_{x,h} are around x + h, and therefore tend to a neighborhood of x as h → 0. One can also observe that Var(Z^{GC}_{x,h}) → 0 as h → 0.
Example 3. The CoM-Poisson kernel, of the second order and underdispersed for any h ∈ (0, 1), is defined by (5) with S_x = T = N for each x ∈ N, where Z(λ, 1/h) is the normalizing constant and λ := λ(x, 1/h) represents the function of x and 1/h given by the solution of the mean equation in (5) with (µ, ν) = (x, 1/h). One can refer to [7,8] for further details. This construction implies that E(Z^{CMP}_{x,h}) = x and Var(Z^{CMP}_{x,h}) → 0 as h → 0.

Example 4. The first-order and underdispersed binomial kernel was introduced by Kokonendji and Senga Kiessé [4] as follows: for each x ∈ N = T and h ∈ (0, 1), it is the binomial pmf with parameters x + 1 and (x + h)/(x + 1) on S_x = {0, 1, . . ., x + 1}, i.e.,

B_{x,h}(y) = [(x + 1)! / {y!(x + 1 − y)!}] {(x + h)/(x + 1)}^y {(1 − h)/(x + 1)}^{x+1−y}, y ∈ S_x,

so that E(Z^B_{x,h}) = x + h and Var(Z^B_{x,h}) = (x + h)(1 − h)/(x + 1).

Figures 3 and 4 show the different behaviours of these four underdispersed count kernels at the origin x = 0 and at x = 5, respectively, for three values of the bandwidth h > 0. Hence, the two newly suggested count kernels appear to be closer competitors to the second-order CoM-Poisson kernel than to the binomial one.
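As a numerical check on the first-order behaviour in Example 4, the following Python sketch evaluates the binomial kernel's exact moments; unlike the second-order kernels, its variance tends to x/(x + 1) ∈ [0, 1) rather than to 0 as h → 0 (the values of x and h are illustrative).

```python
import math

def binomial_kernel(y, x, h):
    # B_{x,h}: Binomial(x + 1, (x + h)/(x + 1)) pmf at y in {0, ..., x + 1}
    n, p = x + 1, (x + h) / (x + 1)
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

def kernel_moments(x, h):
    """Exact mean and variance of the binomial kernel at target x with bandwidth h."""
    pmf = [binomial_kernel(y, x, h) for y in range(x + 2)]
    mean = sum(y * py for y, py in enumerate(pmf))
    var = sum((y - mean) ** 2 * py for y, py in enumerate(pmf))
    return mean, var

x = 5
for h in (0.5, 0.1, 0.01):
    mean, var = kernel_moments(x, h)
    # mean = x + h exactly; var = (x + h)(1 - h)/(x + 1), tending to x/(x + 1)
    print(h, round(mean, 6), round(var, 6))
```

The shrinking bandwidth drives the mean to x but leaves a nonzero limiting variance, which is exactly why the binomial kernel estimator is not consistent.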

Simulation Studies and an Application to Real Data
The purpose of all numerical studies conducted here is to investigate the performances of the two new double Poisson and gamma-count kernels alongside the classical binomial and CoM-Poisson smoothers derived from (1) and (2). Computations are conducted on a 2.30 GHz PC using the R software [19]. The smoothers are fitted using the rmutil, Ake and mpcmp packages [20][21][22], respectively. The corresponding four underdispersed count kernel estimators are assessed by employing the integrated squared error (ISE) method to determine the optimal bandwidth parameter

h_{ise} = arg min_{h>0} Σ_{x∈N} { f̂_n(x; h) − f_0(x) }²,   (7)

where f_0(·) is the empirical or naive estimator. In fact, the usual cross-validation (data-driven) technique does not converge, in simulations or on real data, for the proposed kernel estimators: double Poisson and gamma-count. For other methods, the reader can refer to Chu et al. [13] for the plug-in method, Harfouche et al. [1] for cross-validation, and Kokonendji and Senga Kiessé [4] for mean integrated squared errors.
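The ISE selection in (7) reduces to a one-dimensional search over h. A minimal Python sketch with the binomial kernel on simulated Poisson data (the grid, seed, sample size and Poisson mean are illustrative assumptions; the paper's actual computations rely on R packages):

```python
import math, random

def binom_kernel(y, x, h):
    n, p = x + 1, (x + h) / (x + 1)
    if y < 0 or y > n:
        return 0.0
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

def f_hat(data, h, support):
    # normalized associated kernel estimator (1)
    raw = {x: sum(binom_kernel(xi, x, h) for xi in data) / len(data) for x in support}
    C = sum(raw.values())
    return {x: v / C for x, v in raw.items()}

def h_ise(data, support, grid):
    # (7): minimize sum_x {f_hat(x; h) - f0(x)}^2 over a grid, f0 = empirical frequencies
    n = len(data)
    f0 = {x: data.count(x) / n for x in support}
    def ise(h):
        fh = f_hat(data, h, support)
        return sum((fh[x] - f0[x]) ** 2 for x in support)
    return min(grid, key=ise)

def rpois(mu):
    # Knuth's Poisson sampler (product of uniforms)
    L, k, p = math.exp(-mu), 0, 1.0
    while True:
        p *= random.random()
        if p <= L:
            return k
        k += 1

random.seed(42)
data = [rpois(2.5) for _ in range(80)]
support = list(range(0, 20))
best_h = h_ise(data, support, [j / 100 for j in range(1, 100)])
print(best_h)
```

A finer grid or a golden-section search can replace the coarse grid; the criterion is cheap because each evaluation only re-weights the same kernel terms.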
In this study, we examine the performances of the four count smoothers using simulated count datasets under four different scenarios, denoted by A, B, C and D. These scenarios are chosen to assess how well the estimators handle zero inflation, unimodality and multimodality. We evaluate the effectiveness of the smoothers by analyzing the empirical estimates of the ISE, specifically

ISE_n = Σ_{x∈N} { f̂_n(x) − f(x) }²,

averaged over replications, where N_{sim} denotes the number of replications and n the sample size.
• Scenario A is generated by using the Poisson distribution f_A(x) = e^{−µ} µ^x / x!, x ∈ N;
• Scenario B comes from the zero-inflated Poisson distribution f_B(x) = ρ 1_{{0}}(x) + (1 − ρ) e^{−µ} µ^x / x!, x ∈ N;

Table 1 presents the computation times required to perform the ISE bandwidth selection (7) for the gamma-count, double Poisson, binomial and CoM-Poisson smoothers, based on a single replication with sample sizes ranging from n = 20 to 500 for target function C. For all sample sizes, the results show that the CoM-Poisson is the most time consuming, followed by the double Poisson smoother, mainly due to the normalizing constants in their expressions, (5) and (3), respectively. As the sample sizes increase, the binomial kernel performs best in terms of CPU time thanks to its finite support S_x = {0, 1, . . ., x + 1}, while the gamma-count kernel is the second quickest despite the integrals in its expression. Table 2 exhibits some empirical values of ISE_n, obtained through the ISE bandwidth selection (7) using N_{sim} = 500 replications, for the four Scenarios A, B, C and D and sample sizes n = 10, 25, 50, 100, 250, 500. Several behaviours emerge. As the sample sizes increase, the smoothing improves for all estimators. As expected, the binomial kernel is the least efficient since it is of the first order. The three others have comparable performances. The two new count kernels, namely the double Poisson and gamma-count, are slightly more precise than the CoM-Poisson one, notably for small and medium sample sizes (i.e., n ≤ 100), while the latter is the best for large sample sizes. Additionally, the approximations made for the moments of the gamma-count distribution (4) may help clarify the performance discrepancy between the two new kernels for larger sample sizes. Finally, from a purely practical perspective, Tables 1 and 2 highlight the following overall ranking in performance: double Poisson, gamma-count and CoM-Poisson. Now, we apply these four underdispersed count kernels to smoothing the real count dataset on development days of insect pests on Hura trees, with moderate sample size n = 51; see also
[8]. Practical performances are here examined via the empirical ISE method (7) and the empirical criterion ISE_0 = Σ_{x∈N} { f̂_n(x) − f_0(x) }². Figure 6 offers their graphical representations. We also evaluate the practical upper tail probability P(X ≥ 32), which is of interest to applied statisticians. These tail probabilities are estimated to be 0.1569, 0.1949, 0.1568, 0.1510 and 0.1572 for the empirical frequency f_0, gamma-count, double Poisson, binomial and CoM-Poisson kernel estimations, respectively. Although the double Poisson and the CoM-Poisson have similar performances, we recommend, again, the former, which is more flexible and much faster; see Table 1.

Summary and Final Remarks
We introduced two novel underdispersed count kernels, namely the double Poisson and gamma-count ones, developed through the proposed mean dispersion method. We also considered the integrated squared error method (7) to select, as quickly and efficiently as possible, the bandwidth of the corresponding estimators. Through simulation experiments and real count data analysis, we demonstrated that the overall accuracy of these kernels falls between that of the CoM-Poisson kernel (which performs best) and the binomial kernel (which performs worst). Although the CoM-Poisson and double Poisson kernels have similar performances, we strongly recommend the latter due to its significantly lower computation time and the flexibility afforded by some closed-form expressions.
We note that not every underdispersed count distribution leads to a corresponding associated kernel; see Appendix A and also [23,24]. It would also be desirable to improve the bandwidth selection with data-driven methods; Appendix B outlines the direction of local Bayesian bandwidth selection. In addition, an important point for smoothing a pmf on T = {k, k + 1, . . .} with k ≥ 1 is to consider, for instance, the k-shifted version of any underdispersed count kernel. Indeed, the two main properties of an associated kernel, as recalled in Definition 1, are first to adapt the support S_x of the kernel to T and, second, to maintain the variance property, which tends to δ ∈ [0, 1) as h → 0.
Since f is unknown, we use f̂_n in Equation (1) as the natural estimator of f, and we can then estimate the posterior of h by the so-called posterior density

π(h | x, X_1, X_2, . . ., X_n) = f̂_n(x; h) π(h) / ∫_0^1 f̂_n(x; t) π(t) dt, ∀x ∈ N.
Since the smoothing parameter h here belongs to (0, 1], a natural univariate prior distribution π(h) is the beta distribution with positive parameters α and β:

π(h) = h^{α−1} (1 − h)^{β−1} / B(α, β), h ∈ (0, 1],

where B(α, β) = ∫_0^1 t^{α−1} (1 − t)^{β−1} dt is the Euler beta function. The posterior is then proportional to π(h) Σ_{i=1}^n K_{x,h}(X_i), where the specific kernel terms N(h) of the double Poisson, gamma-count, CoM-Poisson and binomial kernels are obtained by plugging each kernel expression into f̂_n; for the gamma-count kernel, for instance, these terms involve differences of incomplete gamma functions. Only the local bandwidths of the binomial kernel estimator have exact expressions, ∀x ∈ N with X_i ≤ x + 1; see, e.g., Somé et al. [2] for more details in univariate and multivariate setups.
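As a feasibility illustration of this local Bayesian approach, the posterior mean bandwidth can be approximated by midpoint-rule integration against the beta prior. The Python sketch below uses the binomial kernel as a stand-in for any of the four kernels; the prior hyperparameters, grid resolution and sample are illustrative assumptions.

```python
import math

def binom_kernel(y, x, h):
    n, p = x + 1, (x + h) / (x + 1)
    if y < 0 or y > n:
        return 0.0
    return math.comb(n, y) * p ** y * (1 - p) ** (n - y)

def beta_pdf(h, a, b):
    # beta prior pi(h) on (0, 1)
    logB = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(h) + (b - 1) * math.log(1 - h) - logB)

def bayes_local_bandwidth(data, x, a=2.0, b=2.0, m=400):
    """Posterior mean h_n(x) = ∫ h f~(x;h) pi(h) dh / ∫ f~(x;h) pi(h) dh (midpoint rule)."""
    num = den = 0.0
    for j in range(m):
        h = (j + 0.5) / m                     # midpoint node in (0, 1)
        lik = sum(binom_kernel(xi, x, h) for xi in data)   # proportional to f~_n(x; h)
        w = lik * beta_pdf(h, a, b)
        num += h * w
        den += w
    return num / den

data = [0, 1, 1, 2, 2, 2, 3, 4]
hx = bayes_local_bandwidth(data, x=2)
print(round(hx, 4))
```

Repeating the computation for each target point x yields a local ("balloon") bandwidth profile without any iterative optimization.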

• Scenario D comes from a mixture of three Poisson distributions.

where f_0(·) is the empirical or naive estimator. The double Poisson and the CoM-Poisson kernels are comparable and appear to be the best, with h^{DP}_{ise_0} = 0.01232621, followed finally by the gamma-count smoother.

Figure 6. Empirical frequency with its corresponding gamma-count (GC), double Poisson (DP), binomial (B) and CoM-Poisson (CMP) kernel estimates for the count dataset of insect pests on Hura trees with n = 51.

Under the squared error loss function, the Bayes estimator of the smoothing (tuning) parameter h is the mean of the previous posterior density, given by h_n(x) = ∫_0^1 h π(h | x, X_1, . . ., X_n) dh.

Table 1. Comparison of execution times (in seconds) for one replication of Scenario C using gamma-count (gc), double Poisson (dp), binomial (b) and CoM-Poisson (cmp) kernel estimates.

Figure 5 depicts the true pmf and its smoothed versions using the gamma-count, double Poisson, binomial and CoM-Poisson kernels under Scenario C, for one replication. The graphs show that, in general, the two new underdispersed count kernel estimators are accurate.

Table 2 .
Empirical mean values (×10³) of ISE_n, with their standard deviations in parentheses, over N_{sim} = 500 replications and with different sample sizes n = 10, 25, 50, 100, 250, 500 under the four Scenarios A, B, C and D, using the gamma-count (gc), double Poisson (dp), binomial (b) and CoM-Poisson (cmp) kernel estimators with the ISE bandwidth selection.