On the Discretization of Continuous Probability Distributions Using a Probabilistic Rounding Mechanism

Abstract: Most existing flexible count distributions allow only approximate inference when used in a regression context. This work proposes a new framework to provide an exact and flexible alternative for modeling and simulating count data with various types of dispersion (equi-, under-, and over-dispersion). The new method, referred to as "balanced discretization", consists of discretizing continuous probability distributions while preserving expectations. It is easy to generate pseudo random variates from the resulting balanced discrete distribution since it has a simple stochastic representation (probabilistic rounding) in terms of the continuous distribution. For illustrative purposes, we develop the family of balanced discrete gamma distributions that can model equi-, under-, and over-dispersed count data. This family of count distributions is appropriate for building flexible count regression models because the expectation of the distribution has a simple expression in terms of the parameters of the distribution. Using the Jensen–Shannon divergence measure, we show that, under the equidispersion restriction, the family of balanced discrete gamma distributions is similar to the Poisson distribution. Based on this, we conjecture that while covering all types of dispersion, a count regression model based on the balanced discrete gamma distribution will allow recovering a near Poisson distribution model fit when the data are Poisson distributed.


Introduction
The regression analysis of count responses mostly relies on the Poisson model. However, the equidispersion (variance equals mean) assumption of the Poisson distribution makes Poisson regression inappropriate in many situations where data show overdispersion (variance greater than mean) or underdispersion (variance less than mean). Moreover, it has been observed that many datasets analyzed using overdispersion models (e.g., the negative binomial [1]), which are as popular as the Poisson regression model, may be mixtures of overdispersed and underdispersed or equidispersed counts [2]. The implication is that appropriate alternatives to the Poisson model should allow variable dispersion, i.e., full dispersion flexibility [3]. Existing count regression models associated with variable dispersion exhibit some drawbacks. The first is improperly normalized probability mass functions in underdispersion situations (quasi-Poisson [4], Consul's generalized Poisson [5], and extended Poisson-Tweedie regressions [6]), which restricts inference to approximate quasi-likelihood methods. Another drawback is the lack of a simple expression for the model mean value (Conway-Maxwell-Poisson [7], double Poisson [8,9], gamma count [10], semi-nonparametric Poisson polynomial [11], and discrete Weibull [12] models). The latter drawback motivated research works where quantities other than the mean were modeled, leading to hardly interpretable fits [11,13].
The development of discrete analogues of continuous probability distributions, which has received great attention in the last two decades, provides an attractive route for building count regression models with variable dispersion. The review in [14] surveys the different methods along with their specific application fields. An appealing characteristic of discrete analogues of continuous distributions is the generation of discrete pseudo random values, which only requires basic operations once the continuous distribution can be simulated. This ease of simulation is especially interesting for simulation-based model evaluation [15] or parametric-bootstrap-based inference [16].
Despite the various existing discretization methods, mean parameterizable approaches necessary to build easily interpretable regression models are rare. For reliability evaluation, the discretizing approach of [17] attempts to match the mean and the variance of the discrete and the related continuous variable, but it provides only an approximate solution at the cost of a tuning parameter. The proposals in [3,18,19] offer solutions for constructing count variables with a fixed mean value and variable dispersion, but they lack a physical basis, i.e., a generating mechanism to motivate their use in practice.
This work proposes a discretization procedure to start from continuous probability distributions and construct count models with (i) properly normalized probability mass functions for underdispersion, equidispersion, as well as overdispersion situations and (ii) simple expressions for the model mean values. As a result, the proposal allows full likelihood inference (as opposed to quasi-likelihood inference) for any dispersion level in observed data and is thus suited for regression analysis where the estimation of covariate effects on the mean count is of great interest.
The proposed discretization approach modifies the "discrete concentration" method, i.e., "Methodology IV" in [14], to preserve the expectation of the continuous distribution. Our proposal, referred to as "balanced discretization", is based on a probabilistic rounding mechanism, which provides a generating mechanism with a simple interpretation. Interestingly, the probabilistic rounding mechanism, expressed as a simple stochastic representation in terms of the continuous distribution, makes it easy to generate pseudo random variates from the resulting balanced discrete distribution.
The rest of the paper is organized as follows. Section 2 motivates and presents the balanced discretization method. The general expressions of the distribution functions and moments of balanced discrete distributions are given. The method is applied to the gamma distributions in Section 3 to produce the balanced discrete gamma distribution, which is compared to the discrete concentration of the gamma distribution and to the Poisson distribution. Concluding remarks are given in Section 4.

The Balanced Discretization Method
This section motivates and describes the balanced discretization method. The general expressions of the probability mass, the cumulative distribution, the survival and quantile functions, the moments, and the index of dispersion are presented. The link between balanced discretization and the mean-preserving discretization approach of [3] is also established. The proofs of the results are routine and given for completeness in Appendix A.

Reminders
First, we recall the discrete concentration method and the mean-preserving approach of [3]. Let CD(θ) be a continuous probability distribution of interest, with pdf f_X(·|θ), cdf F_X(·|θ), suf S_X(·|θ), and quf Q_X(·|θ). The discrete concentration DC(θ) of X ∼ CD(θ) is the count variable Y with pmf and suf given for y ∈ Z by:

f_Y(y|θ) = F_X(y + 1|θ) − F_X(y|θ),   (1)

S_Y(y|θ) = S_X(y + 1|θ),   (2)

i.e., Y has cdf F_Y(y|θ) = F_X(y + 1|θ) and quf Q_Y(u|θ) = ⌈Q_X(u|θ)⌉ − 1 for u ∈ (0, 1). Accordingly, the nth moment about zero of Y is:

µ_Y^(n)(θ) = Σ_{y∈Z} y^n [F_X(y + 1|θ) − F_X(y|θ)].   (3)

Clearly, the discrete concentration of X is simply Y = ⌊X⌋ [20].
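As an illustration (a minimal Python sketch assuming NumPy and SciPy; the Gamma(5, 1) choice is purely for demonstration), the discrete concentration Y = ⌊X⌋ can be simulated directly and the cdf identity F_Y(y|θ) = F_X(y + 1|θ) checked empirically:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Continuous variable X ~ Gamma(shape=5); its discrete concentration is Y = floor(X)
x = stats.gamma.rvs(a=5.0, size=100_000, random_state=rng)
y = np.floor(x)

# Empirical check of the cdf identity F_Y(y) = F_X(y + 1) at y = 4
emp = np.mean(y <= 4)
theo = stats.gamma.cdf(5.0, a=5.0)
assert abs(emp - theo) < 0.01
```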
The mean-preserved discrete version Y of X is the variable with the cdf:

F_Y(y|θ) = ∫_y^{y+1} F_X(x|θ) dx,  y ∈ Z,   (4)

and expectation E[Y] = E[X] [3].

Motivating Example and Definition
Example 1 (Measuring tree diameter). Discretization mechanisms arise when measuring any continuous quantity. Indeed, no sample can cover a whole continuum since the latter has an infinite number of points, and only a finite number of decimal places are reported in practice [14,21]. Assume for instance an operator measuring tree diameters X in a forest inventory frame, using a measurement device scaled to millimeters (mm). Since X is a continuous variable, the probability of observing X = x mm is zero. When the true value x of the diameter of a tree actually falls between two consecutive graduations z and z + 1, the operator reports either y = z mm or y = z + 1 mm, i.e., only a discretized version Y of X is observed. Beyond this example, when direct measures are taken, only the number of an arbitrary unit is actually counted. Clearly, the closer x is to z, the higher the probability of reporting y = z, and conversely, the closer x is to z + 1, the higher the probability of reporting y = z + 1. Balanced discretization results from assuming that given z ≤ x < z + 1, the probability for reporting y = z + 1 is exactly x − z.
Definition 1 (Balanced discretization). Let us consider an absolutely continuous probability distribution CD(θ) of interest. A count random variable Y is said to be distributed as the balanced discrete counterpart, denoted BD(θ), of the continuous distribution CD(θ) if it has the stochastic representation:

Y = Z + U,  with X ∼ CD(θ) and U | X = x ∼ Bernoulli(r),   (5)

where z = ⌊x⌋ and r = x − z.
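The stochastic representation in Equation (5) translates directly into a simulator: draw X, keep its floor, and round up with probability equal to the fractional part. A minimal Python sketch (assuming NumPy; `balanced_round` is an illustrative helper name):

```python
import numpy as np

def balanced_round(x, rng):
    """Probabilistic rounding (Equation (5)): with z = floor(x) and r = x - z,
    return z + 1 with probability r and z otherwise."""
    z = np.floor(x)
    r = x - z
    return (z + (rng.random(np.shape(x)) < r)).astype(int)

rng = np.random.default_rng(1)
x = rng.gamma(shape=4.0, size=200_000)   # any continuous draw works here
y = balanced_round(x, rng)

# The rounding mechanism preserves the expectation: E[Y] = E[X]
assert abs(y.mean() - x.mean()) < 0.01
```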
Let E_X(n, y|θ) denote the nth partial moment:

E_X(n, y|θ) = ∫_y^{y+1} x^n f_X(x|θ) dx   (6)

of X over (y, y + 1), and set:

∆F_X(y|θ) = F_X(y + 1|θ) − F_X(y|θ).   (7)

The balanced discretization mechanism in Equation (5) preserves the partial expectations E_X(1, y|θ) of the continuous variable, as shown by Equation (10) of the following lemma.
Lemma 1. Let X and Y be defined as in Equation (5). Then, for any y ∈ Z,

P(Y = y and y ≤ X < y + 1) = (y + 1)[F_X(y + 1|θ) − F_X(y|θ)] − E_X(1, y|θ),   (8)

P(Y = y + 1 and y ≤ X < y + 1) = E_X(1, y|θ) − y[F_X(y + 1|θ) − F_X(y|θ)],   (9)

E_Y[Y | y ≤ X < y + 1] = E_X(1, y|θ),   (10)

where E_Y[Y | X ∈ A] is the partial mean of Y for X ∈ A.

Probability Mass and Distribution Functions
We derive in this section some general distributional properties of balanced discrete distributions.
Proposition 1 (Distribution function). Let Y ∼ BD(θ). The pmf, the cdf, the suf, and the quf of Y are given for y ∈ Z and 0 < u < 1 by:

f_Y(y|θ) = (y + 1)F_X(y + 1|θ) − 2yF_X(y|θ) + (y − 1)F_X(y − 1|θ) − E_X(1, y|θ) + E_X(1, y − 1|θ),   (11)

F_Y(y|θ) = F_X(y|θ) + (y + 1)[F_X(y + 1|θ) − F_X(y|θ)] − E_X(1, y|θ),   (12)

S_Y(y|θ) = S_X(y|θ) − (y + 1)[S_X(y|θ) − S_X(y + 1|θ)] + E_X(1, y|θ),   (13)

Q_Y(u|θ) = min{y ∈ Z : F_Y(y|θ) ≥ u}.   (14)

Note from Equation (11) that BD(θ) assigns less probability mass to zero than the discrete concentration of X ∼ CD(θ) if X has support R_+ or (0, M) for M ∈ R_+. Equation (13) emphasizes that the balanced discretization method does not preserve the suf of the continuous distribution, unlike the discrete concentration (see Equation (2)). Nevertheless, the balanced discrete cdf and suf satisfy the inequalities F_X(y|θ) ≤ F_Y(y|θ) ≤ F_X(y + 1|θ) (with equalities when the support of X is upper bounded by y) and S_X(y|θ) ≤ S_Y(y − 1|θ) ≤ S_X(y − 1|θ) (with equalities when the support of X is lower bounded by y).
By Equation (14), balanced discretization somewhat preserves the median of the continuous distribution. Indeed, if X has an integral median m_X, then m_X is also a median of Y, since F_Y(m_X − 1) ≤ F_X(m_X) = 1/2 ≤ F_Y(m_X).
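The pmf in Equation (11) can be evaluated for any continuous distribution whose cdf is available, with the partial means E_X(1, y|θ) obtained by numerical quadrature. A sketch (assuming SciPy; the Gamma(3) choice is illustrative) that also verifies proper normalization and mean preservation:

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

dist = stats.gamma(a=3.0)  # an illustrative continuous distribution CD(theta)

def partial_mean(y):
    """E_X(1, y) = integral of x f_X(x) over (y, y + 1) (Equation (6))."""
    val, _ = quad(lambda x: x * dist.pdf(x), y, y + 1)
    return val

def bd_pmf(y):
    """Balanced discrete pmf from Equation (11)."""
    return ((y + 1) * dist.cdf(y + 1) - 2 * y * dist.cdf(y)
            + (y - 1) * dist.cdf(y - 1)
            - partial_mean(y) + partial_mean(y - 1))

ys = np.arange(0, 40)
p = np.array([bd_pmf(t) for t in ys])
assert abs(p.sum() - 1.0) < 1e-6          # properly normalized pmf
assert abs((ys * p).sum() - 3.0) < 1e-6   # mean preserved: E[Y] = E[X] = 3
```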

Moments and Index of Dispersion
This section presents expressions for the moments of balanced discrete distributions. We start with the first two moments since they are the most important in a count regression context.

Proposition 2 (Mean and variance). Let X ∼ CD(θ) with mean µ_X(θ) and variance σ²_X(θ). The balanced discrete counterpart Y ∼ BD(θ) of X has mean µ_Y(θ) = µ_X(θ) and variance:

σ²_Y(θ) = σ²_X(θ) + ζ_0(θ),  with ζ_0(θ) = E[(X − ⌊X⌋)(1 − X + ⌊X⌋)].   (15)

Since 0 ≤ r(1 − r) ≤ 1/4 for r ∈ [0, 1], the added term satisfies 0 ≤ ζ_0(θ) ≤ 1/4. From Proposition 2, it appears that when the variance of a balanced discrete variable Y exists, it satisfies σ²_Y(θ) ≥ (µ − ⌊µ⌋)(1 − µ + ⌊µ⌋), µ being the mean of Y (exact or estimate). The following corollary infers the ID (index of dispersion) of a balanced discrete distribution from Proposition 2.

Corollary 1 (Index of dispersion). Let Y ∼ BD(θ). Then:

ID_Y(θ) = σ²_Y(θ)/µ_Y(θ) = ID_X(θ) + ζ_0(θ)/µ_X(θ),   (16)

where ID_X(θ) = σ²_X(θ)/µ_X(θ) is the index of dispersion of the continuous distribution.
Furthermore, ζ_0(θ) can be expanded as the series:

ζ_0(θ) = Σ_{y∈Z} [(2y + 1)E_X(1, y|θ) − E_X(2, y|θ) − y(y + 1)(F_X(y + 1|θ) − F_X(y|θ))],

whose terms are nonnegative and bounded by [F_X(y + 1|θ) − F_X(y|θ)]/4, so that ζ_0(θ) can be approximated with a tolerance α ∈ (0, 1) by the truncated sum:

ζ_0(θ) ≈ Σ_{y=⌊Q_X(α/2|θ)⌋}^{⌈Q_X(1−α/2|θ)⌉} [(2y + 1)E_X(1, y|θ) − E_X(2, y|θ) − y(y + 1)(F_X(y + 1|θ) − F_X(y|θ))].   (19)

The next proposition shows the relation between the moments of balanced discrete distributions and those of discrete concentrations.

Proposition 3. Let Y ∼ BD(θ) be the balanced discrete counterpart of X ∼ CD(θ). The nth moment of Y satisfies for n ∈ N_+:

µ_Y^(n)(θ) = µ_Z^(n)(θ) + Σ_{i=0}^{n−1} C(n, i) µ_ZU^(i)(θ),   (20)

where C(n, i) denotes the binomial coefficient, µ_Z^(n)(θ) is the nth moment of the discrete concentration Z = ⌊X⌋ (Equation (3)), and µ_ZU^(i)(θ) is the expectation of the product of Z^i and U with U | X ∼ Ber(X − Z), given by:

µ_ZU^(i)(θ) = Σ_{y∈Z} y^i (E_X(1, y|θ) − y[F_X(y + 1|θ) − F_X(y|θ)]).   (21)
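The variance correction ζ_0(θ) = E[(X − ⌊X⌋)(1 − X + ⌊X⌋)] from Equation (15) and its truncated-sum approximation can be computed and checked by simulation. A sketch (assuming NumPy and SciPy; the Gamma(2) choice and the one-sided truncation at the (1 − α) quantile are illustrative, the gamma support being nonnegative):

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

dist = stats.gamma(a=2.0)   # illustrative: mu_X = 2, sigma2_X = 2

# Truncated sum for zeta_0(theta): terms summed over integers up to the
# (1 - alpha) quantile, in the spirit of Equation (19)
alpha = 1e-10
y_max = int(np.ceil(dist.ppf(1 - alpha)))
zeta0 = sum(quad(lambda x, z=y: (x - z) * (1 + z - x) * dist.pdf(x), y, y + 1)[0]
            for y in range(0, y_max))

var_y = dist.var() + zeta0   # Equation (15)

# Monte Carlo check of the variance formula via probabilistic rounding
rng = np.random.default_rng(2)
x = dist.rvs(size=400_000, random_state=rng)
y = np.floor(x) + (rng.random(x.size) < x - np.floor(x))
assert abs(y.var() - var_y) < 0.05
```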

Conditional Distributions of Latent Continuous and Binary Outcomes
Although the balanced discrete distribution was motivated by the need for mean-parameterizable flexible discrete probability distributions, it may also be used to model any continuous outcome measured to a limited number of decimal places. In such instances, the conditional distribution, and in particular the conditional mean, of the underlying continuous distribution may be useful for predicting the continuous variable given an observed discrete value. In addition, since a balanced discrete variable is the observable feature of an underlying continuous outcome, a useful tool for maximum likelihood inference in complex models is the expectation-maximization algorithm [22], which handles any latent class-like model. In a Bayesian inference framework, the stochastic representation of the balanced discrete distribution can also be useful for sampling the posterior distribution of model parameters when draws from the truncated form of the continuous distribution are inexpensive. The following result provides expressions for these purposes.

Proposition 4. Let X, U, and Y be defined as in Equation (5). Then, for y ∈ Z with probability mass f_Y(y|θ) > 0:

f_{Y|X}(y|x) = (1 − x + ⌊x⌋) I_{{⌊x⌋}}(y) + (x − ⌊x⌋) I_{{⌊x⌋+1}}(y),   (22)

f_{X|Y}(x|y, θ) = [(x − y + 1) I_{(y−1,y)}(x) + (y + 1 − x) I_{(y,y+1)}(x)] f_X(x|θ)/f_Y(y|θ),   (23)

and for n ∈ R such that X^n is well defined in both (y − 1, y) and (y, y + 1):

E[X^n | Y = y] = [E_X(n + 1, y − 1|θ) − (y − 1)E_X(n, y − 1|θ) + (y + 1)E_X(n, y|θ) − E_X(n + 1, y|θ)]/f_Y(y|θ),   (24)

where

τ(y|θ) = P(U = 1 | Y = y) = [E_X(1, y − 1|θ) − (y − 1)(F_X(y|θ) − F_X(y − 1|θ))]/f_Y(y|θ)   (25)

is the conditional mean (success probability) of the Bernoulli variable U given Y = y and I_A(x) is the indicator function, which equals one if x ∈ A and 0 otherwise.
Note from Equation (22) that, given the continuous variable, the distribution of the discrete variable does not depend on the parameter vector θ: f_{Y|X}(y|x, θ) = f_{Y|X}(y|x). Therefore, in the expectation-maximization algorithm framework, the maximization of the joint likelihood:

f_{Y,X}(y, x|θ) = f_{Y|X}(y|x) f_X(x|θ)   (26)

of Y and X reduces to the maximization of the likelihood f_X(x|θ) of the continuous variable X. Hence, the expectation-maximization algorithm will be appropriate for fitting a balanced discrete distribution whenever fitting the underlying continuous distribution is easy.
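The factorization in Equation (26) can be illustrated on complete data: since f_{Y|X}(y|x) is free of θ, the complete-data maximum likelihood estimate of θ uses the latent continuous draws alone. A sketch (assuming SciPy; the Gamma(5) choice is illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Complete data: latent X ~ Gamma(shape=5) and its balanced rounding Y
x = stats.gamma.rvs(a=5.0, size=50_000, random_state=rng)
y = np.floor(x) + (rng.random(x.size) < x - np.floor(x))

# f_{Y|X}(y | x) does not involve theta, so the complete-data MLE of the
# gamma parameters is obtained from x alone (here with scipy's fitter)
shape_hat, loc_hat, scale_hat = stats.gamma.fit(x, floc=0)
assert abs(shape_hat - 5.0) < 0.2

# The rounded sample keeps the fitted mean: E[Y] = E[X] = shape * scale
assert abs(y.mean() - shape_hat * scale_hat) < 0.05
```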

Link with Mean-Preserving Discretization
Recall from Equation (4) that the cdf of the mean-preserved count variable has the form:

F_Y(y|θ) = ∫_y^{y+1} F_X(x|θ) dx.

From the identity ∫ F_X(x|θ) dx = xF_X(x|θ) − ∫ x f_X(x|θ) dx, we have:

F_Y(y|θ) = (y + 1)F_X(y + 1|θ) − yF_X(y|θ) − E_X(1, y|θ).

Then, using the identity F_X(y + 1|θ) = F_X(y|θ) + [F_X(y + 1|θ) − F_X(y|θ)] straightforwardly results in:

F_Y(y|θ) = F_X(y|θ) + (y + 1)[F_X(y + 1|θ) − F_X(y|θ)] − E_X(1, y|θ),

i.e., the cdf in Equation (12). Therefore, balanced discretization as defined in Equation (5) provides a generating mechanism for the mean-preserving method of [3].

The Balanced Discrete Gamma Family
The class of gamma distributions is a flexible class of distributions encountered in various statistical applications. It includes as special cases the exponential, the one-parameter gamma, and, up to rescaling, the chi-squared distributions [23]. Chakraborty and Chakravarty [24] studied the discrete concentration of the gamma distribution with applications in biology and socio-economics. In order to allow exact inference in flexible count regression models, we apply in this section the balanced discretization method to gamma distributions. We present the balanced discrete gamma distribution under a mean parametrization convenient for regression purposes and compare the distribution to the discrete concentration of the gamma distribution and to the Poisson distribution.
Let X ∼ G(b, a) denote the gamma distribution with shape parameter b > 0 and rate parameter a > 0, with pdf f_g(x|b, a) = a^b x^{b−1} e^{−ax}/Γ(b) for x > 0, cdf F_g(x|b, a) = γ(b, ax), where γ(·, ·) denotes the incomplete gamma ratio, and nth order partial moment E_g(n, y|b, a) = ∫_y^{y+1} x^n f_g(x|b, a) dx (as in Equation (6)) given by:

E_g(n, y|b, a) = [Γ(b + n)/(a^n Γ(b))] [γ(b + n, a(y + 1)) − γ(b + n, ay)].   (27)

The one-parameter gamma distribution is obtained for a = 1 (equidispersion) and is denoted G(b).
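The closed form in Equation (27) is easy to cross-check numerically. A sketch (assuming SciPy, whose `gammainc` is the regularized incomplete gamma ratio and `poch` gives Γ(b + n)/Γ(b); the parameter values are arbitrary):

```python
from scipy import special, stats
from scipy.integrate import quad

def partial_moment(n, y, b, a):
    """E_g(n, y | b, a) via Equation (27):
    Gamma(b + n) / (a**n Gamma(b)) * [gamma(b + n, a(y + 1)) - gamma(b + n, a y)]."""
    return (special.poch(b, n) / a**n
            * (special.gammainc(b + n, a * (y + 1)) - special.gammainc(b + n, a * y)))

# Cross-check against direct integration of x**n f_g(x | b, a) over (y, y + 1)
b, a, y = 4.5, 1.5, 2
num, _ = quad(lambda x: x * stats.gamma.pdf(x, b, scale=1 / a), y, y + 1)
assert abs(partial_moment(1, y, b, a) - num) < 1e-7
```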

The Balanced Discrete Gamma Distribution
A count random variable Y with support N is said to follow a balanced discrete gamma distribution, denoted BG(µ, a) for (µ, a) ∈ R_+^2, if it is generated by the discretization mechanism in Equation (5) with X ∼ G(aµ, a). By Proposition 2, a BG(µ, a) variable has expectation µ. Using Equation (27), some properties of BG(µ, a) follow, as stated in Corollary 2 hereafter.
Corollary 2 (Balanced discrete gamma distribution). Let Y ∼ BG(µ, a), and set b = aµ. Then, the pmf and the cdf of Y are respectively given for y ∈ N by:

f_Y(y|µ, a) = (y + 1)γ(b, a(y + 1)) − 2yγ(b, ay) + (y − 1)γ(b, a(y − 1)) − µ[γ(b + 1, a(y + 1)) − 2γ(b + 1, ay) + γ(b + 1, a(y − 1))],   (28)

F_Y(y|µ, a) = (y + 1)γ(b, a(y + 1)) − yγ(b, ay) − µ[γ(b + 1, a(y + 1)) − γ(b + 1, ay)],   (29)

with the convention γ(·, x) = 0 for x ≤ 0, and the variance and the index of dispersion of Y are respectively given by:

σ²_dg(µ, a) = µ/a + ζ_g0(µ, a),   (30)

ID_dg(µ, a) = 1/a + ζ_g0(µ, a)/µ,   (31)

where:

ζ_g0(µ, a) = Σ_{y=0}^∞ [(2y + 1)E_g(1, y|b, a) − E_g(2, y|b, a) − y(y + 1)(γ(b, a(y + 1)) − γ(b, ay))].   (32)

The pmf (28) and the cdf (29) follow by Equations (11) and (12), respectively, along with Equation (27). Note that the computation of the pmf and the cdf of a balanced discrete gamma distribution only requires a routine for the incomplete gamma ratio γ(·, ·), which is available in most statistical software as the cdf of the continuous gamma distribution (e.g., pgamma in the R freeware [25] and gamcdf in MATLAB [26]). The variance (30) and the index of dispersion (31) follow from Equation (15) along with Equation (27). Note that the variance term ζ_g0(µ, a) in Equation (32) can be approximated via the truncation mechanism in Equation (19) with a tolerance α ∈ (0, 1) as:

ζ_g0(µ, a) ≈ Σ_{y=0}^{y_α} [(2y + 1)E_g(1, y|b, a) − E_g(2, y|b, a) − y(y + 1)(γ(b, a(y + 1)) − γ(b, ay))],  with y_α = ⌈γ^{−1}(1 − α, b)/a⌉,   (33)

where γ^{−1}(·, b) is the inverse function of the incomplete gamma ratio γ(·, b) (available, e.g., as qgamma in R and gaminv in MATLAB). The one-parameter BDG (balanced discrete gamma) distribution, denoted BG(µ) and obtained by setting a = 1, corresponds to a latent equidispersion mechanism and is marginally slightly overdispersed, as indicated by Equation (31) with a = 1. Setting a = µ^{−1} produces the balanced discrete exponential distribution BE(µ), which is close to the geometric distribution since the latter corresponds to the discrete concentration of the exponential distribution [27]. Figure 1 displays the probability mass functions of BDG distributions with mean values µ = 2.5 and µ = 5 (computed using Equation (28) in R). It appears that the parameter a controls the shape of the distribution, allowing both unimodal and reverse J shapes. It can be observed that the spread of a BDG distribution BG(µ, a) decreases with a for fixed µ.
The boxplots depicted in Figure 2 show that the skewness of the distribution (as measured by the difference between the mean and the median values) increases with its spread and thus decreases with a. As expected, both the length of the right tail of the distribution (Figure 1) and the probability of unusual (far from the mean and the median) observations ( Figure 2) increase with the spread.
The index of dispersion, which can assume any positive value, is computed using Equation (31) in R and depicted in Figure 3 for a ≥ 1. It can be observed, in accordance with Equation (31), that for large mean values (µ ≥ 10), the variance-to-mean ratio is not very sensitive to µ for fixed, but not too large, scale parameter values (a < 50). This also holds for very low scales (a < 0.5) for any mean value. Very large scales (a ≥ 50) induce, in addition to severe underdispersion (ID < 0.2), an oscillating index of dispersion with an amplitude approaching zero as µ increases. Indeed, for large scale values (as a → ∞) and finite µ, the continuous gamma distribution converges to a point mass at µ. The balanced discrete gamma distribution thus converges to a binary variable, which takes the value y = ⌊µ⌋ with probability 1 − (µ − ⌊µ⌋) and y = ⌊µ⌋ + 1 with probability µ − ⌊µ⌋ and has variance σ²_dg(µ, ∞) = (µ − ⌊µ⌋)(1 − µ + ⌊µ⌋). The oscillating pattern in Figure 3 accordingly shows that the variance is minimal when µ is an integer and maximal when µ is a half integer. When both µ and a are large, the amplitude of the oscillations of ID decreases, and ID approaches zero, in accordance with Equation (31).
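To make the computations concrete, the pmf in Equation (28) only needs the continuous gamma cdf (the incomplete gamma ratio), exactly as with pgamma in R. A Python sketch (assuming SciPy; `bdg_pmf` is an illustrative helper name) that also verifies the normalization and the mean parametrization:

```python
import numpy as np
from scipy import stats

def bdg_pmf(y, mu, a):
    """pmf of BG(mu, a) via Equation (28), with b = a mu and
    gamma(b, a x) evaluated as the cdf of the continuous gamma."""
    b = a * mu
    G = lambda x: stats.gamma.cdf(x, b, scale=1 / a)        # gamma(b, a x)
    G1 = lambda x: stats.gamma.cdf(x, b + 1, scale=1 / a)   # gamma(b + 1, a x)
    return ((y + 1) * G(y + 1) - 2 * y * G(y) + (y - 1) * G(y - 1)
            - mu * (G1(y + 1) - 2 * G1(y) + G1(y - 1)))

mu, a = 2.5, 1.0
ys = np.arange(0, 60)
p = bdg_pmf(ys, mu, a)
assert abs(p.sum() - 1.0) < 1e-8        # properly normalized pmf
assert abs((ys * p).sum() - mu) < 1e-8  # mean parametrization: E[Y] = mu
```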

Comparison with Some Alternatives
The balanced discretization approach results from a light modification of the discrete concentration method. This section assesses on the one hand to what extent the two discretization approaches differ, considering the balanced discrete gamma (BDG) distribution case. On the other hand, the difference between the Poisson and the BDG distributions is evaluated under both latent and marginal equidispersion restrictions.
Among the miscellaneous measures proposed to assess the similarity between probability distributions, the Jensen-Shannon divergence (JSD) [28] has many desirable properties that support its use in statistics [29]. The JSD is an information theory measure given for two pmfs q_1(·) and q_2(·) by [28]:

JSD(q_1, q_2) = Σ_{y∈Z} q_1(y) log_2[q_1(y)/(0.5q_1(y) + 0.5q_2(y))] + Σ_{y∈Z} q_2(y) log_2[q_2(y)/(0.5q_1(y) + 0.5q_2(y))],   (35)

with the convention q_i(y) log_2[q_i(y)/(0.5q_1(y) + 0.5q_2(y))] = 0 if q_i(y) = 0 (i = 1, 2). The JSD measures the discrepancy between q_1(·) and q_2(·) in bits. It is bounded as 0 ≤ JSD(q_1, q_2) ≤ 2 and is zero only if q_1(y) = q_2(y) for all y ∈ Z. The JSD values presented in this section are computed from Equation (35) in the R freeware. Figure 4 illustrates the balanced discretization method using the continuous gamma distribution with parameters a = 1, b = 5. Unlike the discrete concentration, whose cdf lies above the continuous cdf, the balanced discrete distribution is constructed so that the continuous cdf interpolates the cdf of the balanced discrete distribution. The curves in Figure 5A show the JSD measure between the balanced discrete gamma distributions and the corresponding discrete concentrations for mean count values µ ≤ 30. The selected scale values allow a wide range for the index of dispersion (ID), which roughly runs from 0.04 to 10. It can be observed that the JSD measure is relatively low (JSD < 0.80 bit) and decreases overall with the mean count (but not monotonically). In other words, the discrete concentration and balanced discretization methods produce similar discrete analogues of the considered continuous gamma distributions for large mean values. The JSD measure is especially low (JSD ≤ 0.10 bit) for equidispersed and overdispersed balanced discrete gamma distributions (a ≤ 1). High discrepancy (JSD ≥ 0.5 bit) actually appears between the discrete analogues from the two discretization methods generally in underdispersion situations with a very low mean count (µ < 1) or a large scale parameter (a > 5, implying ID < 0.45).
For very large scale values (a ≥ 30), JSD becomes erratic, oscillating between minima right before the integer values of µ and maxima right after the integer values of µ.
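The JSD in Equation (35) is straightforward to implement for two pmfs on a common support. A Python sketch (assuming NumPy):

```python
import numpy as np

def jsd(q1, q2):
    """Jensen-Shannon divergence in bits (Equation (35)):
    KL(q1 || m) + KL(q2 || m) with m = 0.5 q1 + 0.5 q2."""
    q1, q2 = np.asarray(q1, float), np.asarray(q2, float)
    m = 0.5 * (q1 + q2)

    def kl(p):  # with the convention 0 * log2(0 / m) = 0
        mask = p > 0
        return np.sum(p[mask] * np.log2(p[mask] / m[mask]))

    return kl(q1) + kl(q2)

# Sanity checks: zero for identical pmfs; the upper bound of 2 bits
# is attained for pmfs with disjoint supports
u = np.array([0.5, 0.5, 0.0, 0.0])
v = np.array([0.0, 0.0, 0.5, 0.5])
assert jsd(u, u) == 0.0
assert abs(jsd(u, v) - 2.0) < 1e-12
```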

Distance to the Poisson Distribution under Equidispersion
Poisson regression, which is appropriate for equidispersed count data, is the most common count regression model. Although the BDG distribution does not include the Poisson distribution as a special case, it can be restricted to allow equidispersion. This can be achieved by solving the nonlinear equation σ²_dg(µ, a) = µ for a using Equation (30) (marginal equidispersion). However, the one-parameter BDG distribution BG(µ) offers a conceptually insightful alternative (latent equidispersion), which is analytically tractable (a = 1).
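The marginal equidispersion restriction σ²_dg(µ, a) = µ can be imposed numerically by root finding. A sketch (assuming NumPy and SciPy; `bdg_var` is an illustrative helper computing the variance directly from the pmf in Equation (28), and the bracket [0.5, 2] and µ = 4 are arbitrary choices):

```python
import numpy as np
from scipy import stats
from scipy.optimize import brentq

def bdg_var(mu, a, y_max=200):
    """Variance of BG(mu, a), computed from the pmf in Equation (28)."""
    b = a * mu
    ys = np.arange(0, y_max)
    F = lambda x: stats.gamma.cdf(x, b, scale=1 / a)
    F1 = lambda x: stats.gamma.cdf(x, b + 1, scale=1 / a)
    p = ((ys + 1) * F(ys + 1) - 2 * ys * F(ys) + (ys - 1) * F(ys - 1)
         - mu * (F1(ys + 1) - 2 * F1(ys) + F1(ys - 1)))
    return np.sum(ys**2 * p) - mu**2

# Solve sigma2_dg(mu, a) = mu for a (marginal equidispersion)
mu = 4.0
a_star = brentq(lambda a: bdg_var(mu, a) - mu, 0.5, 2.0)
assert 1.0 < a_star < 2.0   # slightly above the latent-equidispersion value a = 1
assert abs(bdg_var(mu, a_star) - mu) < 1e-6
```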
In order to determine which equidispersion balanced discrete gamma model (marginal vs. latent equidispersion) is the most appropriate when seeking the parsimonious flexible count regression model, the JSD measure was computed for fixed mean count µ between the Poisson distribution and the BDG distribution under marginal, as well as latent equidispersion restrictions. The results displayed in Figure 5B against the mean count indicate that when restricted to be equidispersed, the BDG distribution becomes similar to the Poisson distribution as per the low JSD values (JSD < 0.015 bit). It can be observed that the marginally equidispersed BDG distribution is the closest to the Poisson distribution only for a very low mean count (µ ≤ 1.1176). For a larger mean count (µ > 1.1176), the one-parameter BDG distribution is closer to the Poisson distribution than the marginally equidispersed BDG distribution, although the difference becomes unnoticeable for µ > 10.
It appears that the one-parameter BDG distribution based count regression model will be an effective parsimonious (few parameters and more tractable) model [30] that can be fit to observed data to check the appropriateness of an equidispersion model. Therefore, while a BDG regression model will allow exact inference in flexible count modeling, testing for latent equidispersion will allow recovering a near Poisson regression model when supported by observed data.

Conclusions
With a view toward allowing exact inference in flexible count regression models, this work describes balanced discretization, a method for simulating and modeling integer-valued data starting from a continuous random variable, through the use of a probabilistic rounding mechanism. Most of the existing alternatives were built to preserve a specific characteristic of the continuous variable, e.g., the failure rate [31] or the survival [14] function for modeling reliability data. Our proposal preserves the expectation and is thus appropriate for count regression. The method is very close to the discretizing approach of [17], which also preserves the mean value but requires an a priori double truncation of the continuous variable and introduces a tuning parameter. Physical interpretation is an important criterion for choosing an appropriate discretization method [32]. As such, our proposal is motivated by a real-world generating mechanism and provides a physical interpretation for the mean-preserving method of [3]. Although balanced discrete distributions can model any count data, they may not be appropriate for aging data, for which the floor or the ceiling is generally used [14], so that discrete concentrations are a better choice.
The flexibility of the balanced discrete gamma family developed from the continuous gamma distribution illustrates the potential of the balanced discretization method for capturing any level of dispersion in observed count data. In addition to flexibility, the balanced discrete gamma family turns out to be similar to the Poisson distribution when restricted to be equidispersed (marginal equidispersion) and when constructed using an equidispersed continuous gamma distribution (latent equidispersion). Based on this, we conjecture that while covering all types of dispersion, a flexible count regression model based on the balanced discrete gamma distribution will allow recovering a near Poisson distribution model when the data are Poisson distributed. Future research will target the use of the balanced discrete distribution in count regression analysis. The extension of balanced discretization to a multivariate setting is also considered to handle count data grouped by some sampling units and mixtures of count and continuous responses.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

ID   Index of dispersion
JSD  Jensen–Shannon divergence
pdf  probability density function
pmf  probability mass function
quf  quantile function
suf  survival function

Appendix A. Proofs of Lemmas and Propositions
Appendix A.1. Proof of Lemma 1

Proof of Lemma 1. By the definition in Equation (5), the probability of observing Y = y given that X = x with y ≤ x < y + 1 is 1 − r = 1 − x + y, with r = x − y. Thus, the probability of observing Y = y and y ≤ X < y + 1 is the integral of [(y + 1) − x] f_X(x|θ) with respect to x over (y, y + 1), i.e.,

P(Y = y and y ≤ X < y + 1) = (y + 1)[F_X(y + 1|θ) − F_X(y|θ)] − E_X(1, y|θ),

which proves Equation (8). Using the same argument on the probability of observing Y = y + 1 given that y ≤ X < y + 1, P(Y = y + 1 and y ≤ X < y + 1) equals the integral of (x − y) f_X(x|θ) with respect to x over (y, y + 1), which yields Equation (9). Next, since Y is discrete and takes one of the two values y and y + 1 when y ≤ X < y + 1, the partial expectation of Y is:

E_Y[Y | y ≤ X < y + 1] = yP(Y = y and y ≤ X < y + 1) + (y + 1)P(Y = y + 1 and y ≤ X < y + 1)
= P(Y = y + 1 and y ≤ X < y + 1) + y[P(Y = y and y ≤ X < y + 1) + P(Y = y + 1 and y ≤ X < y + 1)]
= [E_X(1, y|θ) − y(F_X(y + 1|θ) − F_X(y|θ))] + y[F_X(y + 1|θ) − F_X(y|θ)]
= E_X(1, y|θ),

which proves Equation (10).