Modeling Count Distributions via Skewness–Kurtosis Orthogonal Expansions

Lee, Won-Woo; Lee, Ji-Hun; Lee, Jong-Seung; Ha, Hyung-Tae

doi:10.3390/math14091422


¹

Department of Applied Statistics, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si 13557, Gyeonggi-do, Republic of Korea

²

Department of Next Generation Smart Energy System Convergence, Gachon University, 1342 Seongnam-daero, Sujeong-gu, Seongnam-si 13557, Gyeonggi-do, Republic of Korea

^*

Author to whom correspondence should be addressed.

Mathematics 2026, 14(9), 1422; https://doi.org/10.3390/math14091422

Submission received: 24 March 2026 / Revised: 14 April 2026 / Accepted: 16 April 2026 / Published: 23 April 2026

(This article belongs to the Special Issue Advances in Flexible Parametric Distributions for Modeling Skewness and Kurtosis)

Abstract

We develop a semi-parametric framework for representing discrete probability mass functions through orthogonal polynomial representations. Classical count models, such as the Poisson and negative binomial distributions, impose restrictive structural assumptions that often fail to accommodate empirical features including heavy overdispersion, multimodality, and nonstandard tail behavior. To address these limitations, we introduce a linear-tilt model constructed from orthonormal polynomial systems associated with Poisson and negative binomial baselines, namely the Charlier and Meixner families. The proposed representation improves the baseline distribution using additional information from empirical moments. This allows the distribution to flexibly adjust its shape, capturing differences in skewness and kurtosis. We establish theoretical properties of the expansion within a weighted Hilbert space formulation, where the coefficients arise as orthogonal projections that can be expressed as expectations of the corresponding polynomial basis functions. In addition, we analyze the approximation behavior and provide bounds on the resulting approximation error, together with convergence properties of the truncated expansions. The practical relevance of the proposed methodology is illustrated through applications to several empirical datasets, demonstrating its ability to capture complex distributional structures while preserving a tractable semi-parametric form.

Keywords:

overdispersion; count data analysis; semi-parametric modeling; discrete orthogonal polynomials; Charlier expansion; Meixner expansion; linear-tilt models

MSC:

62E10; 62E15; 62E17

1. Introduction

1.1. Motivation

The statistical modeling of discrete events, commonly referred to as count data analysis, serves as a fundamental component for quantitative research in actuarial science, financial econometrics, population biology, and industrial quality control. At the core of these applications is the need to describe probability mass functions (pmfs) that differ from the standard Poisson model. The Poisson distribution assumes that the mean and variance are equal, a property called equidispersion (

σ^{2} = μ

). In practice, real-world phenomena often exhibit overdispersion, where the variance exceeds the mean (

σ^{2} > μ

). The negative binomial distribution arises when the Poisson rate parameter is treated as a gamma-distributed random variable, and this hierarchical setup produces count data with variance exceeding the mean, unlike the standard Poisson model.

Classical Poisson and negative binomial models place restrictive assumptions on the relationship between the mean and variance. In practice, count data often depart from these assumptions, showing heavy overdispersion, underdispersion (

σ^{2} < μ

), multimodality, or unusual tail behavior. Overdispersion occurs when the variance exceeds the mean, leading to greater variability than the Poisson model allows, whereas underdispersion refers to the opposite case in which the variance is smaller than the mean and the counts are more tightly concentrated. Multimodality typically occurs when observations are generated by a mixture of distinct subpopulations or mechanisms, leading to multiple modes in the empirical distribution, whereas standard Poisson and negative binomial distributions are unimodal and thus unable to accommodate more than one dominant peak. Tail behavior concerns the frequency of very large (or very small) counts; the empirical distribution may exhibit much heavier tails than the negative binomial model implies, with more extreme values than expected, or much lighter or truncated tails, where extreme counts are virtually absent, and such discrepancies in the tails can lead to substantial model misfit even when the mean and moderate-range counts are reasonably well captured.

1.2. Literature Review

The Poisson and negative binomial distributions form the canonical modeling framework for count data and have been widely studied and applied in both theoretical and empirical research. In particular, the negative binomial distribution has become the standard baseline model for handling moderate overdispersion relative to the Poisson model [1,2,3]. While the Poisson and negative binomial distributions provide canonical models for count data, they impose relatively rigid mean–variance structures. In practice, observed counts often deviate from these assumptions through phenomena such as pronounced overdispersion, underdispersion, multimodality, and atypical tail behavior. As a result, various extensions—including generalization, contagion, mixture, and weighting schemes—have been developed.

Generalized Poisson models introduce a dispersion parameter to allow the variance to differ from the mean, accommodating overdispersion [4,5]. The Conway–Maxwell–Poisson (COM-Poisson) distribution provides additional flexibility by modeling both overdispersion and underdispersion [5,6]. Other extensions, such as the hyper-Poisson and certain truncated Poisson models, have also been proposed to capture different dispersion patterns and tail behaviors [4,7]. Weighted Poisson distributions and generalized fractional Poisson models further address situations with extreme overdispersion and pronounced skewness [7]. In particular, COM-Poisson and truncated variants can also model lighter or truncated tails, offering more precise control over distributional shape [7]. A flexible nonparametric approach to modeling heterogeneous count data represents the distribution as a mixture of simpler component distributions. Finite mixtures of Poisson and negative binomial distributions explicitly model multiple modes together with overdispersion [8]. Contagious distributions, such as the Thomas and Pólya–Aeppli models, can naturally generate bimodality [9,10]. These mixture and contagious models are particularly useful for capturing heavy tails and multimodal patterns observed in empirical count data [8,10].

In addition, there have been recent developments in semi-parametric approaches for discrete distributions based on orthogonal polynomial expansions. Ha [11] applied Krawtchouk polynomial expansions to discrete distributions, demonstrating their ability to approximate overdispersed and skewed data with improved tail behavior. Ha [12] further proposed a Charlier-series-based adjustment for modeling nonhomogeneous Poisson processes. Lee and Ha [13] extended this framework by developing a maximum-likelihood approach for the zero-inflated polynomial-adjusted Poisson distribution, enabling efficient parameter estimation while accommodating excess zeros and dispersion beyond the classical Poisson model.

Recent reviews discuss unified frameworks and extensions of Poisson models. For example, weighted Poisson distributions provide a general framework for modeling both overdispersion and underdispersion [14]. Surveys by Azevedo et al. [15], Sellers and Shmueli [6], and Cahoy et al. [7] review major Poisson alternatives such as the negative binomial, generalized Poisson, and Conway–Maxwell–Poisson models, along with diagnostic methods. Contagious distributions for overdispersion, bimodality, and tail behavior are discussed by Coly et al. [10], while Inouye et al. [16] reviews multivariate mixture models. While existing count models often introduce new parametric families to address specific dispersion patterns or distributional shapes, the present work takes a different route by developing a semi-parametric orthogonal-expansion framework that adjusts a chosen baseline distribution through higher-order moments.

1.3. Research Proposition

To address these complexities, we adopt discrete orthogonal polynomial expansions, which provide a rigorous mechanism for systematically adjusting a classical baseline distribution, such as the negative binomial, to match higher-order moment structures such as skewness and kurtosis. We develop a theoretical framework for discrete orthogonal expansions and examine the negative binomial–Meixner system in detail. The study of discrete orthogonal polynomials is deeply rooted in the search for flexible frequency curves that can fit empirical data. Historically, the Pearson system of distributions provided a continuous framework for such modeling, while the Meixner systems emerged as the discrete analogues [17,18]. These polynomials are defined by their orthogonality with respect to a specific weight function; for the Meixner polynomials, this weight is the negative binomial distribution. Real-world phenomena often exhibit overdispersion or complex multimodal structures that simpler parametric models cannot adequately capture. While traditional remedies such as the negative binomial model introduce additional parameters to address dispersion, they often remain limited in capturing tail risk and higher-order moment deviations. This motivates a semi-parametric framework based on orthogonal polynomial expansions, which extends a baseline distribution by incorporating information from skewness, kurtosis, and other higher-order moments as needed to capture the complexity of the target distribution.

In this paper, we develop a rigorous formulation of discrete orthogonal expansions based on the following mathematical proposition. To fix notation, let w denote a baseline probability mass function on

N = {0, 1, 2, \dots}

, let

ψ (x) = {(ψ_{1} (x), \dots, ψ_{K} (x))}^{⊤}

be a vector of orthonormal polynomial basis functions, and let

θ = {(θ_{1}, \dots, θ_{K})}^{⊤}

be the corresponding coefficient vector; formal definitions are given in Section 2.

Semi-parametric expansion. For any target pmf p satisfying the $ℓ^{2} (w)$ condition, the semi-parametric expansion can be represented as a truncated linear-tilting model

$p_{θ} (x) = w (x) (1 + θ^{⊤} ψ (x)) .$

This representation defines a valid pmf if and only if the coefficient vector $θ$ belongs to the closed convex feasible set $C_{K}$, which guarantees the non-negativity condition

$\forall x \in N, 1 + θ^{⊤} ψ (x) \geq 0 .$

Parameter estimation via orthogonal projection. For any target pmf p satisfying the $ℓ^{2} (w)$ condition, a unique orthogonal expansion exists with respect to a complete orthonormal polynomial basis $\{ψ_{n}\}_{n \geq 0}$. The expansion coefficients are determined by the orthogonal projection of p onto the basis functions, given by

${\hat{θ}}_{n} = E_{p} [ψ_{n} (X)] .$

$L_{1}$ computational error and optimal truncation degree. Given a truncated expansion of degree K, the discrepancy between the target pmf p and the approximating model $p_{θ}$ can be quantified using the $L_{1}$ distance

$∥ p - p_{θ} ∥_{L_{1}} = \sum_{x \in N} | p (x) - p_{θ} (x) | .$

The optimal truncation degree K may then be selected by minimizing this error.

2. Semi-Parametric Count Distributions via Orthogonal Polynomial Expansions

2.1. Discrete Hilbert-Space Expansions

Let

N = {0, 1, 2, \dots}

. Fix a baseline probability mass function (pmf)

w : N \to (0, 1)

satisfying

\sum_{x = 0}^{\infty} w (x) = 1 and w (x) > 0 \forall x \in N .

(1)

We write

Y \sim w

and denote expectation with respect to w by

E_{w} [f (Y)] : = \sum_{x = 0}^{\infty} f (x) w (x),

(2)

whenever the series is absolutely convergent. The full-support condition in (1) implies that any other pmf p on

N

is absolutely continuous with respect to (that is, dominated by) w, so the likelihood ratio

p (x) / w (x)

is well-defined for every

x \in N

.

The weighted square-integrable function space consists of all functions

f : N \to R

such that

\sum_{x \in N} w (x) f {(x)}^{2} < \infty

, where the positive baseline pmf w serves as the weighting measure. Formally, we define

ℓ^{2} (w) : = \{f : N \to R : {∥ f ∥}_{ℓ^{2} (w)}^{2} < \infty\}, {∥ f ∥}_{ℓ^{2} (w)}^{2} : = \sum_{x = 0}^{\infty} f {(x)}^{2} w (x) = E_{w} [f {(Y)}^{2}],

(3)

and the inner product in the space

{〈 f, g 〉}_{w} : = \sum_{x = 0}^{\infty} f (x) g (x) w (x) = E_{w} [f (Y) g (Y)] .

(4)

ℓ^{2} (w)

is the space of all functions whose weighted average squared value is finite, where the baseline distribution w serves as the weighting measure. This condition ensures that the orthogonal expansion coefficients are well-defined and that the resulting series converges in a meaningful sense. Because w is a probability measure on a countable set,

ℓ^{2} (w)

is precisely the usual discrete Hilbert space

L^{2} (N, w)

, and is therefore complete under the norm induced by (4). In this work, we specifically focus on the Charlier–Meixner systems corresponding to Poisson and negative binomial baseline distributions.
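As a numerical illustration of (2)–(4), the following minimal sketch evaluates the weighted inner product on a truncated support. The Poisson baseline with mean 2.5, the truncation point `x_max`, and the helper names `poisson_pmf` and `inner_w` are illustrative assumptions rather than part of the original analysis.

```python
import math

def poisson_pmf(x, mu):
    """Poisson(mu) mass at integer x >= 0, evaluated on the log scale."""
    return math.exp(-mu + x * math.log(mu) - math.lgamma(x + 1))

def inner_w(f, g, w, x_max=200):
    """Truncated version of <f, g>_w = sum_x f(x) g(x) w(x) from (4)."""
    return sum(f(x) * g(x) * w(x) for x in range(x_max + 1))

mu = 2.5  # assumed baseline mean, for illustration only
w = lambda x: poisson_pmf(x, mu)
one = lambda x: 1.0
ident = lambda x: float(x)

norm_one_sq = inner_w(one, one, w)        # ||1||^2 = total mass of w
mean_w = inner_w(ident, one, w)           # <x, 1>_w = E_w[Y]
second_moment = inner_w(ident, ident, w)  # ||x||^2 = E_w[Y^2]
print(norm_one_sq, mean_w, second_moment)
```

For a Poisson baseline the three printed values should be close to 1, μ, and μ + μ², up to the truncation of the infinite sum.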

2.2. Discrete Orthonormal Polynomial Systems

An orthonormal polynomial system with respect to w is a sequence of polynomials, the nth of which has degree n, denoted by

{ψ_{n}}_{n \geq 0} \subset ℓ^{2} (w)

satisfying:

ψ_{n} \in P, deg (ψ_{n}) = n, and {〈 ψ_{n}, ψ_{m} 〉}_{w} = δ_{n m} for all m, n \geq 0,

(5)

where P denotes the space of real polynomials,

deg (\cdot)

is the degree of a polynomial, and

δ_{n m}

is the Kronecker delta. We adopt the normalization

ψ_{0} (x) \equiv 1

, which has unit norm because w is a probability mass function, that is,

∥ ψ_{0} ∥_{ℓ^{2} (w)}^{2} = \sum_{x} w (x) = 1

. Then, for every

n \geq 1

, orthogonality to the constant function implies the zero-mean property

E_{w} [ψ_{n} (Y)] = {〈 ψ_{n}, ψ_{0} 〉}_{w} = 0, Y \sim w .

(6)

The requirement

deg (ψ_{n}) = n

ensures that the system is aligned with polynomial order and that

span {ψ_{0}, \dots, ψ_{K}}

corresponds to the space of polynomial corrections up to order K relative to w.

We require the following two assumptions on existence and completeness. For the existence assumption, every monomial belongs to

ℓ^{2} (w)

:

x^{n} \in ℓ^{2} (w) \forall n \geq 0 ⟺ \sum_{x = 0}^{\infty} x^{2 n} w (x) < \infty \forall n \geq 0 .

(7)

Under (7), the sequence

{1, x, x^{2}, \dots}

lies in

ℓ^{2} (w)

, and applying the Gram–Schmidt orthogonalization procedure yields an orthonormal polynomial system

{ψ_{n}}_{n \geq 0}

. For the completeness assumption, the orthonormal polynomial system

{ψ_{n}}_{n \geq 0}

is complete in

ℓ^{2} (w)

, i.e., its closed linear span coincides with

ℓ^{2} (w)

. Classical baseline distributions such as the Poisson and negative binomial satisfy these two assumptions of existence and completeness.
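The Gram–Schmidt construction under the existence assumption (7) can be sketched numerically as follows. This is a minimal illustration on a truncated grid, assuming a Poisson baseline with mean 2; the function name `gram_schmidt_polys` and the truncation point are hypothetical choices, and polynomials are stored as value tables rather than coefficient vectors.

```python
import math

def poisson_pmf(x, mu):
    """Poisson(mu) mass at integer x >= 0."""
    return math.exp(-mu + x * math.log(mu) - math.lgamma(x + 1))

def gram_schmidt_polys(w_vals, degree):
    """Orthonormalize 1, x, ..., x^degree in l2(w) on the grid 0..len(w_vals)-1.

    Returns tables psi[n][x] with the values of the n-th orthonormal polynomial.
    """
    xs = range(len(w_vals))
    basis = []
    for n in range(degree + 1):
        v = [float(x) ** n for x in xs]
        for q in basis:
            # Subtract the projection of v onto an earlier basis function.
            proj = sum(a * b * wx for a, b, wx in zip(v, q, w_vals))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a * wx for a, wx in zip(v, w_vals)))
        basis.append([a / norm for a in v])
    return basis

mu, x_max = 2.0, 80  # assumed values; x_max truncates the infinite support
w_vals = [poisson_pmf(x, mu) for x in range(x_max + 1)]
psi = gram_schmidt_polys(w_vals, degree=3)

# Gram matrix <psi_n, psi_m>_w, which should be close to the identity.
gram = [[sum(a * b * wx for a, b, wx in zip(p, q, w_vals)) for q in psi]
        for p in psi]
```

Gram–Schmidt produces polynomials with a positive leading coefficient, so the result can differ from a classical family such as the orthonormal Charlier functions by a sign in each degree.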

2.3. Expansion Model and Coefficient Identification

Proposition 1.

Let w be a baseline pmf on

N

and let

{ψ_{n}}_{n \geq 0}

be an orthonormal polynomial system in

ℓ^{2} (w)

with

ψ_{0} (x) \equiv 1

. For a fixed order

K \geq 1

, define the truncated linear-tilt model

p_{θ} (x) : = w (x) (1 + \sum_{n = 1}^{K} θ_{n} ψ_{n} (x)) = w (x) (1 + θ^{⊤} ψ (x)),

(8)

where

θ = {(θ_{1}, \dots, θ_{K})}^{⊤}

and

ψ (x) = {(ψ_{1} (x), \dots, ψ_{K} (x))}^{⊤}

. Then

p_{θ}

is a probability mass function on

N

if and only if

1 + θ^{⊤} ψ (x) \geq 0, \forall x \in N .

Equivalently, θ must lie in the feasible parameter set

C_{K} : = \{θ \in R^{K} : 1 + θ^{⊤} ψ (x) \geq 0, \forall x \in N\} .

(9)

C_{K}

is the set of coefficient vectors for which the truncated linear-tilt model remains a valid probability mass function. Moreover,

C_{K}

is a closed and convex subset of

R^{K}

, and

0 \in C_{K}

.

Proof.

p_{θ}

must satisfy non-negativity and normalization to be a valid pmf. First, the normalization constraint is automatically satisfied since

\sum_{x = 0}^{\infty} p_{θ} (x) = \sum_{x} w (x) + \sum_{n = 1}^{K} θ_{n} \sum_{x} w (x) ψ_{n} (x) = 1,

where, by orthonormality,

ψ_{0} (x) \equiv 1

and

\sum_{x = 0}^{\infty} w (x) ψ_{n} (x) = {〈 ψ_{n}, ψ_{0} 〉}_{w} = \begin{cases} 1, & n = 0, \\ 0, & n \geq 1 . \end{cases}

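For the non-negativity property, note that w is strictly positive on its support by (1), so each value of the model (8) is non-negative if and only if the corresponding tilt factor is non-negative; this gives the stated equivalence. The feasible parameter set

C_{K} = \{θ \in R^{K} : 1 + θ^{⊤} ψ (x) \geq 0, \forall x \in N\}

is defined by linear inequality constraints, so that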

C_{K} = ⋂_{x \in N} \{θ \in R^{K} : θ^{⊤} ψ (x) \geq - 1\} .

Each set in the above intersection is a closed convex half-space. Since arbitrary intersections of closed convex sets are closed and convex, it follows that

C_{K} \subseteq R^{K}

is closed and convex. Moreover, the vector

0

satisfies

1 + 0^{⊤} ψ (x) = 1 \geq 0, \forall x \in N,

so

0 \in C_{K}

. Therefore,

C_{K}

is nonempty. These properties follow from standard results in convex analysis and linear inequality systems [19,20]. □
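A finite-grid check of Proposition 1 can be sketched as follows, using the orthonormal Charlier functions of a Poisson baseline (built from the falling-factorial formula (10) given later in this section). The baseline mean, the two coefficient vectors, and the helper names are illustrative assumptions. Because only finitely many support points are inspected, the check is a proxy for membership in the feasible set, not a proof of it.

```python
import math

def poisson_pmf(x, mu):
    """Poisson(mu) mass at integer x >= 0."""
    return math.exp(-mu + x * math.log(mu) - math.lgamma(x + 1))

def falling(x, k):
    """Falling factorial (x)_k = x (x - 1) ... (x - k + 1), with (x)_0 = 1."""
    out = 1.0
    for j in range(k):
        out *= x - j
    return out

def charlier_orthonormal(n, x, mu):
    """Orthonormal Charlier function sqrt(mu^n / n!) * C_n(x; mu)."""
    c_n = sum(math.comb(n, k) * (-1) ** k * falling(x, k) / mu ** k
              for k in range(n + 1))
    return math.sqrt(mu ** n / math.factorial(n)) * c_n

def tilt(theta, x, mu):
    """Tilt factor 1 + theta^T psi(x) with psi = (psi_1, ..., psi_K)."""
    return 1.0 + sum(t * charlier_orthonormal(n, x, mu)
                     for n, t in enumerate(theta, start=1))

def is_feasible_on_grid(theta, mu, x_max=100):
    """Check the tilt factor is >= 0 for x = 0..x_max (finite proxy for C_K)."""
    return all(tilt(theta, x, mu) >= 0.0 for x in range(x_max + 1))

mu = 2.0                 # assumed baseline mean
theta_ok = [0.0, 0.2]    # tilt factor stays positive on the whole grid
theta_bad = [0.0, -0.2]  # tilt factor turns negative for larger counts
total_mass = sum(poisson_pmf(x, mu) * tilt(theta_ok, x, mu) for x in range(101))
print(is_feasible_on_grid(theta_ok, mu), is_feasible_on_grid(theta_bad, mu), total_mass)
```

The total mass stays at one for any coefficient vector, in line with the normalization argument in the proof, whereas non-negativity depends on the sign and size of the coefficients.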

Theorem 1.

Suppose the two assumptions of existence and completeness hold. Then for any

f \in ℓ^{2} (w)

there exists a unique orthogonal expansion

f (x) = \sum_{n = 0}^{\infty} θ_{n} ψ_{n} (x)

with coefficients given by

θ_{n} = {〈 f, ψ_{n} 〉}_{w} = \sum_{x = 0}^{\infty} f (x) ψ_{n} (x) w (x) .

Each coefficient

θ_{n}

measures how strongly f projects onto the corresponding basis pattern

ψ_{n}

; coefficients near zero indicate little contribution from that component. Moreover, the partial sums

S_{K} (x) = \sum_{n = 0}^{K} θ_{n} ψ_{n} (x)

converge to f in

ℓ^{2} (w)

as

K \to \infty

.

Since the

{ψ_{n}}

are also orthonormal in

ℓ^{2} (w)

(i.e.,

{〈 ψ_{m}, ψ_{n} 〉}_{w} = \sum_{x} w (x) ψ_{m} (x) ψ_{n} (x) = δ_{m n}

), this completeness is equivalent to

{ψ_{n}}_{n \geq 0}

forming an orthonormal basis for the Hilbert space

ℓ^{2} (w)

. In this case, every

f \in ℓ^{2} (w)

admits a unique series expansion

f = \sum_{n = 0}^{\infty} {〈 f, ψ_{n} 〉}_{w} ψ_{n}

that converges in the

ℓ^{2} (w)

-norm, with Parseval’s identity holding:

{∥ f ∥}_{ℓ^{2} (w)}^{2} = \sum_{n = 0}^{\infty} {〈 f, ψ_{n} 〉}_{w}^{2}

. The completeness assumption is standard for the classical discrete orthogonal polynomial families associated with Poisson (Charlier) and negative binomial (Meixner) baselines.

Orthogonal polynomial expansion provides a spectral decomposition of Pearson’s

χ^{2}

divergence, and the truncation degree directly controls the

L_{1}

approximation error. The orthogonal expansion admits an energy interpretation, where the squared

ℓ^{2} (w)

norm of the centered likelihood ratio

g (x) = p (x) / w (x) - 1

represents the total expansion energy. This quantity coincides with Pearson’s

χ^{2}

divergence and yields a bound for the

L_{1}

distance.

Corollary 1.

Let p be a pmf satisfying the

ℓ^{2} (w)

condition and denote

g (x) = \frac{p (x) - w (x)}{w (x)}

. Then the total orthogonal expansion energy coincides with Pearson’s

χ^{2}

divergence between p and the reference distribution w:

{∥ g ∥}_{ℓ^{2} (w)}^{2} = \sum_{x = 0}^{\infty} {(\frac{p (x) - w (x)}{w (x)})}^{2} w (x) = \sum_{x = 0}^{\infty} \frac{{(p (x) - w (x))}^{2}}{w (x)} = χ^{2} (p ∥ w) .

Moreover, this Pearson’s

χ^{2}

divergence admits the spectral representation under the orthonormal system

{ψ_{n}}_{n \geq 0}

χ^{2} (p ∥ w) = \sum_{n = 1}^{\infty} {({\hat{θ}}_{n})}^{2},

where

{\hat{θ}}_{n} = E_{p} [ψ_{n} (X)]

are the orthogonal projection coefficients. If q is any probability mass function (in particular the truncated approximation

{\hat{p}}_{K}

when it remains nonnegative), then the

L_{1}

distance admits the bound

L_{1} (p, q) = \sum_{x} | p (x) - q (x) | \leq {∥ p - q ∥}_{ℓ^{2} (w^{- 1})} .

Consequently, whenever the truncated expansion

{\hat{p}}_{K}

defines a valid pmf, the approximation error in total variation is controlled by the tail energy of the orthogonal expansion:

L_{1} (p, {\hat{p}}_{K}) \leq {(\sum_{n = K + 1}^{\infty} {({\hat{θ}}_{n})}^{2})}^{1 / 2} .

This shows that the truncation error is governed by the residual

χ^{2}

energy contained in the higher-order orthogonal components.
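The identities in Corollary 1 can be checked numerically. The sketch below is an illustration under assumed settings: the target is Poisson with mean 3, the baseline is Poisson with mean 2, and the Charlier functions come from the falling-factorial formula (10). For this pair the Pearson divergence has the known closed form exp((λ − μ)²/μ) − 1, which provides an independent check. All variable names are hypothetical.

```python
import math

def poisson_pmf(x, mu):
    """Poisson(mu) mass at integer x >= 0."""
    return math.exp(-mu + x * math.log(mu) - math.lgamma(x + 1))

def falling(x, k):
    """Falling factorial (x)_k."""
    out = 1.0
    for j in range(k):
        out *= x - j
    return out

def phi(n, x, mu):
    """Orthonormal Charlier function for the Poisson(mu) baseline."""
    c_n = sum(math.comb(n, k) * (-1) ** k * falling(x, k) / mu ** k
              for k in range(n + 1))
    return math.sqrt(mu ** n / math.factorial(n)) * c_n

mu, lam, x_max, n_max = 2.0, 3.0, 80, 10  # illustrative choices
xs = range(x_max + 1)
w = [poisson_pmf(x, mu) for x in xs]   # baseline
p = [poisson_pmf(x, lam) for x in xs]  # target

# Projection coefficients theta_n = E_p[phi_n(X)] for n = 1..n_max.
theta = [sum(px * phi(n, x, mu) for x, px in zip(xs, p))
         for n in range(1, n_max + 1)]

chi2_direct = sum((px - wx) ** 2 / wx for px, wx in zip(p, w))
chi2_spectral = sum(t * t for t in theta)  # partial sum of the spectral series

# Degree-2 truncation: L1 error versus the tail-energy bound.
K = 2
p_hat = [wx * (1.0 + sum(theta[n - 1] * phi(n, x, mu) for n in range(1, K + 1)))
         for x, wx in zip(xs, w)]
l1_error = sum(abs(px - qx) for px, qx in zip(p, p_hat))
tail_bound = math.sqrt(chi2_direct - sum(t * t for t in theta[:K]))
print(chi2_direct, chi2_spectral, l1_error, tail_bound)
```

The direct and spectral values of the divergence should agree up to the neglected terms beyond `n_max`, and the L1 error should not exceed the tail bound.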

Now we derive the expansions more specifically for two important baseline distributions. In particular, we consider the Poisson and negative binomial baselines by employing the Charlier and Meixner orthogonal polynomial systems, respectively.

For the specification of the Poisson–Charlier (PC) expansion, let the baseline distribution be Poisson with mean

μ > 0

:

w_{P} (x) = e^{- μ} \frac{μ^{x}}{x!}, x \in {0, 1, 2, \dots} .

Define the nth-degree Charlier polynomials

C_{n} (x; μ)

via the exponential generating function

\sum_{n = 0}^{\infty} C_{n} (x; μ) \frac{t^{n}}{n!} = e^{t} {(1 - \frac{t}{μ})}^{x},

or equivalently through the falling-factorial expansion

C_{n} (x; μ) = \sum_{k = 0}^{n} \binom{n}{k} {(- 1)}^{k} \frac{{(x)}_{k}}{μ^{k}}, {(x)}_{k} : = x (x - 1) \dots (x - k + 1), {(x)}_{0} : = 1 .

(10)

These polynomials satisfy the orthogonality relation

\sum_{x = 0}^{\infty} w_{P} (x) C_{n} (x; μ) C_{m} (x; μ) = \frac{n!}{μ^{n}} δ_{n m} .

Accordingly, the degree-n orthonormal Charlier basis function is given by

ϕ_{n} (x) : = \sqrt{\frac{μ^{n}}{n!}} C_{n} (x; μ), n \geq 0,

which satisfies

\sum_{x = 0}^{\infty} w_{P} (x) ϕ_{n} (x) ϕ_{m} (x) = δ_{n m} .
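As a cross-check of (10) and of the orthonormality relation, the sketch below evaluates the Charlier polynomials in two ways: through the explicit falling-factorial sum and through the standard three-term recurrence μ C_{n+1} = (n + μ − x) C_n − n C_{n−1}. The recurrence is a classical identity that does not appear in the text above; it is used here as an assumption-free alternative that avoids alternating sums. The parameter values and function names are illustrative.

```python
import math

def poisson_pmf(x, mu):
    """Poisson(mu) mass at integer x >= 0."""
    return math.exp(-mu + x * math.log(mu) - math.lgamma(x + 1))

def charlier_table(n_max, x, mu):
    """Values C_0(x; mu), ..., C_{n_max}(x; mu) via the three-term recurrence
    mu * C_{n+1} = (n + mu - x) * C_n - n * C_{n-1}."""
    vals = [1.0]
    if n_max >= 1:
        vals.append(1.0 - x / mu)
    for n in range(1, n_max):
        vals.append(((n + mu - x) * vals[n] - n * vals[n - 1]) / mu)
    return vals

def charlier_explicit(n, x, mu):
    """Falling-factorial formula (10)."""
    total, fall = 0.0, 1.0
    for k in range(n + 1):
        total += math.comb(n, k) * (-1) ** k * fall / mu ** k
        fall *= x - k  # (x)_{k+1} = (x)_k * (x - k)
    return total

mu, n_max, x_max = 2.0, 4, 80  # assumed values for illustration
tables = [charlier_table(n_max, x, mu) for x in range(x_max + 1)]
w = [poisson_pmf(x, mu) for x in range(x_max + 1)]

# Gram matrix of the orthonormal functions sqrt(mu^n / n!) * C_n.
scale = [math.sqrt(mu ** n / math.factorial(n)) for n in range(n_max + 1)]
gram = [[sum(wx * scale[n] * t[n] * scale[m] * t[m] for wx, t in zip(w, tables))
         for m in range(n_max + 1)] for n in range(n_max + 1)]
```

The Gram matrix should be close to the identity, up to truncation of the support and floating-point error.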

For a target pmf p and

X \sim p

, define the factorial moments

μ_{(k), p} : = E_{p} [{(X)}_{k}], k \geq 0 .

In particular, for the Poisson baseline

w_{P}

, the factorial moments are given by

μ_{(k), w_{P}} = μ^{k}, k \geq 0 .

Proposition 2.

Let p be a target pmf with

p / w_{P} \in ℓ^{2} (w_{P})

. Then the coefficients of the Poisson–Charlier semi-parametric expansion

{\hat{θ}}_{n} = E_{p} [ϕ_{n} (X)]

admit the explicit expression, for all

n \geq 0

,

{\hat{θ}}_{n} = \sqrt{\frac{μ^{n}}{n!}} E_{p} [C_{n} (X; μ)] = \sqrt{\frac{μ^{n}}{n!}} \sum_{k = 0}^{n} \binom{n}{k} {(- 1)}^{k} \frac{μ_{(k), p}}{μ^{k}} .

(11)

If μ is mean-matched, i.e.,

μ = E_{p} [X]

, and we define the factorial-moment deviations from the Poisson baseline by

Δ_{k} : = μ_{(k), p} - μ^{k},

then

{\hat{θ}}_{1} = 0

and

\begin{matrix} {\hat{θ}}_{2} & = \sqrt{\frac{μ^{2}}{2!}} (\frac{Δ_{2}}{μ^{2}}), \\ {\hat{θ}}_{3} & = \sqrt{\frac{μ^{3}}{3!}} (\frac{3 Δ_{2}}{μ^{2}} - \frac{Δ_{3}}{μ^{3}}), \\ {\hat{θ}}_{4} & = \sqrt{\frac{μ^{4}}{4!}} (\frac{6 Δ_{2}}{μ^{2}} - \frac{4 Δ_{3}}{μ^{3}} + \frac{Δ_{4}}{μ^{4}}) . \end{matrix}

Proof.

Equation (11) follows by substituting (10) into

{\hat{θ}}_{n} = \sqrt{\frac{μ^{n}}{n!}} E_{p} [C_{n} (X; μ)]

and applying linearity of expectation. Under mean matching

μ = μ_{(1), p}

, we have

Δ_{1} = 0

, so the

k = 1

term vanishes, yielding the simplified expressions. □

We can rewrite the coefficients in terms of central moments, which provides a direct decomposition in terms of dispersion, skewness, and kurtosis relative to the Poisson benchmark. Let

μ : = E_{p} [X], σ^{2} : = {Var}_{p} (X),

where, under mean matching,

μ

denotes the common mean of the Poisson baseline and the target distribution p, while

σ^{2}

denotes the variance of p. We further define the higher-order central moments

μ_{3} : = E_{p} [{(X - μ)}^{3}], μ_{4} : = E_{p} [{(X - μ)}^{4}] .

To facilitate interpretation, introduce the rescaled (orthogonal but not normalized) coordinates

{\tilde{ψ}}_{n} (x) : = \frac{μ^{n}}{n!} C_{n} (x; μ), {\tilde{θ}}_{n} : = E_{p} [{\tilde{ψ}}_{n} (X)] .

Then the relation between normalized and rescaled coefficients is

{\hat{θ}}_{n} = \sqrt{\frac{n!}{μ^{n}}} {\tilde{θ}}_{n} .

Under mean matching (

μ = E_{p} [X]

), the first four coefficients are given by

{\tilde{θ}}_{1} = 0,

{\tilde{θ}}_{2} = \frac{σ^{2} - μ}{2},

(12)

{\tilde{θ}}_{3} = \frac{3 σ^{2} - 2 μ - μ_{3}}{6},

(13)

{\tilde{θ}}_{4} = \frac{μ_{4}}{24} - \frac{μ_{3}}{4} + \frac{(11 - 6 μ) σ^{2}}{24} + \frac{μ^{2}}{8} - \frac{μ}{4} .

(14)

The representation (12) shows that

{\tilde{θ}}_{2}

measures deviation from Poisson equidispersion (

σ^{2} = μ

), while (13) and (14) capture skewness and kurtosis effects, respectively. Moreover, all coefficients vanish when

p = Poi (μ)

, confirming consistency with the Poisson baseline.
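The link between the factorial-moment form (11) and the central-moment forms (12) and (13) can be verified on a small sample. The sketch below is a minimal illustration: the eight counts are made up, population (divide-by-n) moments are assumed, and the helper names `factorial_moment` and `theta_tilde` are hypothetical.

```python
import math

def factorial_moment(sample, k):
    """Empirical factorial moment E[(X)_k] with (X)_k = X (X-1) ... (X-k+1)."""
    total = 0.0
    for x in sample:
        term = 1.0
        for j in range(k):
            term *= x - j
        total += term
    return total / len(sample)

def theta_tilde(sample, n):
    """Rescaled coefficient (mu^n / n!) * E_p[C_n(X; mu)] under mean matching."""
    mu = sum(sample) / len(sample)
    expect_c_n = sum(math.comb(n, k) * (-1) ** k
                     * factorial_moment(sample, k) / mu ** k
                     for k in range(n + 1))
    return mu ** n / math.factorial(n) * expect_c_n

sample = [0, 0, 1, 1, 2, 3, 3, 6]  # hypothetical counts, for illustration only
mu = sum(sample) / len(sample)
var = sum((x - mu) ** 2 for x in sample) / len(sample)  # population variance
mu3 = sum((x - mu) ** 3 for x in sample) / len(sample)  # third central moment

t2, t3 = theta_tilde(sample, 2), theta_tilde(sample, 3)
print(t2, (var - mu) / 2)                # both sides of (12)
print(t3, (3 * var - 2 * mu - mu3) / 6)  # both sides of (13)
```

For this sample the mean is 2 and the variance is 3.5, so the second coefficient is positive, which signals overdispersion relative to the Poisson baseline.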

Now, we construct the negative binomial–Meixner (NBM) expansion. Let the baseline be negative binomial (NB) in the

(β, c)

parameterization:

w_{M} (x) = NB (x; β, c) = \frac{{(β)}^{(x)}}{x!} {(1 - c)}^{β} c^{x}, β > 0, c \in (0, 1),

where

{(β)}^{(x)} : = β (β + 1) \dots (β + x - 1)

is the rising factorial. The mean and variance are

μ_{M} = \frac{β c}{1 - c}, σ_{M}^{2} = \frac{β c}{{(1 - c)}^{2}} = \frac{μ_{M}}{1 - c} .

The Meixner polynomials

M_{n} (x; β, c)

admit the falling-factorial representation

M_{n} (x; β, c) = \sum_{k = 0}^{n} \binom{n}{k} {(- 1)}^{k} \frac{{(1 - c)}^{k}}{{(β)}^{(k)} c^{k}} {(x)}_{k} .

(15)

They satisfy the orthogonality relation

\sum_{x = 0}^{\infty} w_{M} (x) M_{n} (x; β, c) M_{m} (x; β, c) = h_{n} δ_{n m}, h_{n} = \frac{n!}{{(β)}^{(n)} c^{n}} .

Accordingly, the orthonormal basis is

φ_{n} (x) : = \frac{M_{n} (x; β, c)}{\sqrt{h_{n}}} = \sqrt{\frac{{(β)}^{(n)} c^{n}}{n!}} M_{n} (x; β, c) .

Given a target pmf p with mean

μ_{p}

and variance

σ_{p}^{2}

, an NB baseline can match both provided

σ_{p}^{2} > μ_{p}

(overdispersion). Solving

μ_{M} = μ_{p}

and

σ_{M}^{2} = σ_{p}^{2}

yields

c = 1 - \frac{μ_{p}}{σ_{p}^{2}}, β = \frac{μ_{p}^{2}}{σ_{p}^{2} - μ_{p}} .

(16)

When (16) holds, the first two Meixner coefficients vanish, so the leading terms represent deviations in third- and fourth-order structure beyond what the NB baseline captures. The factorial moments are, for Y∼

NB (β, c)

,

E [{(Y)}_{k}] = {(β)}^{(k)} {(\frac{c}{1 - c})}^{k}, k \geq 0 .
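The moment-matching step (16) can be sketched as follows. The target mean and variance are hypothetical, the helper names `nb_match` and `nb_pmf` are illustrative, and the infinite support is truncated at 400 for the numerical checks.

```python
import math

def nb_match(mean, var):
    """Solve (16): NB(beta, c) with the given mean and variance (needs var > mean)."""
    if var <= mean:
        raise ValueError("moment matching needs overdispersion: var > mean")
    c = 1.0 - mean / var
    beta = mean ** 2 / (var - mean)
    return beta, c

def nb_pmf(x, beta, c):
    """NB mass (beta)^(x) / x! * (1 - c)^beta * c^x, evaluated via log-gamma."""
    log_p = (math.lgamma(beta + x) - math.lgamma(beta) - math.lgamma(x + 1)
             + beta * math.log(1.0 - c) + x * math.log(c))
    return math.exp(log_p)

beta, c = nb_match(mean=2.0, var=3.5)  # hypothetical target moments
xs = range(400)
w = [nb_pmf(x, beta, c) for x in xs]
mean_w = sum(x * wx for x, wx in zip(xs, w))
var_w = sum((x - mean_w) ** 2 * wx for x, wx in zip(xs, w))
fact2_w = sum(x * (x - 1) * wx for x, wx in zip(xs, w))  # E[(Y)_2]
print(beta, c, mean_w, var_w, fact2_w)
```

The function raises an error when the target is not overdispersed, since (16) then yields no admissible pair of parameters. The second factorial moment can be compared with the closed form stated above for k = 2.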

Proposition 3.

Let p be a target pmf with

p / w_{M} \in ℓ^{2} (w_{M})

, and define

{\hat{θ}}_{n} : = E_{p} [φ_{n} (X)] .

Then, for all

n \geq 0

,

{\hat{θ}}_{n} = \frac{1}{\sqrt{h_{n}}} E_{p} [M_{n} (X; β, c)] = \frac{1}{\sqrt{h_{n}}} \sum_{k = 0}^{n} \binom{n}{k} {(- 1)}^{k} \frac{{(1 - c)}^{k}}{{(β)}^{(k)} c^{k}} μ_{(k), p} .

(17)

If

(β, c)

are chosen by moment matching (16), then

{\hat{θ}}_{1} = {\hat{θ}}_{2} = 0,

and with

Δ_{(k)} : = μ_{(k), p} - μ_{(k), w_{M}}

,

\begin{matrix} {\hat{θ}}_{3} & = - \frac{1}{\sqrt{h_{3}}} \frac{{(1 - c)}^{3}}{{(β)}^{(3)} c^{3}} Δ_{(3)}, \\ {\hat{θ}}_{4} & = \frac{1}{\sqrt{h_{4}}} (- 4 \frac{{(1 - c)}^{3}}{{(β)}^{(3)} c^{3}} Δ_{(3)} + \frac{{(1 - c)}^{4}}{{(β)}^{(4)} c^{4}} Δ_{(4)}) . \end{matrix}

Proof.

Equation (17) follows by substituting (15) and applying linearity of expectation. Since

E_{w_{M}} [M_{n} (Y; β, c)] = 0

for

n \geq 1

, one may equivalently write

E_{p} [M_{n}] = \sum_{k = 1}^{n} \binom{n}{k} {(- 1)}^{k} \frac{{(1 - c)}^{k}}{{(β)}^{(k)} c^{k}} Δ_{(k)} .

Moment matching implies

Δ_{(1)} = Δ_{(2)} = 0

, leaving only the stated terms for

n = 3, 4

. □
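The vanishing of the first two Meixner coefficients under moment matching can be checked on a small sample. The sketch below computes the unnormalized expectations of the Meixner polynomials from empirical factorial moments via (15); the normalizing constant only rescales each value and is omitted. The sample, the use of population moments, and the helper names are assumptions made for illustration.

```python
import math

def rising(b, k):
    """Rising factorial (b)^(k) = b (b + 1) ... (b + k - 1), with (b)^(0) = 1."""
    out = 1.0
    for j in range(k):
        out *= b + j
    return out

def factorial_moment(sample, k):
    """Empirical factorial moment E[(X)_k]."""
    total = 0.0
    for x in sample:
        term = 1.0
        for j in range(k):
            term *= x - j
        total += term
    return total / len(sample)

def expected_meixner(sample, n, beta, c):
    """E_p[M_n(X; beta, c)] from factorial moments, following (15) and (17)
    without the 1 / sqrt(h_n) normalization."""
    return sum(math.comb(n, k) * (-1) ** k * (1.0 - c) ** k
               / (rising(beta, k) * c ** k) * factorial_moment(sample, k)
               for k in range(n + 1))

sample = [0, 0, 1, 1, 2, 3, 3, 6]  # hypothetical counts, for illustration only
mu = sum(sample) / len(sample)
var = sum((x - mu) ** 2 for x in sample) / len(sample)  # population variance
c = 1.0 - mu / var                 # moment matching (16)
beta = mu ** 2 / (var - mu)
e_m = [expected_meixner(sample, n, beta, c) for n in range(5)]
print(e_m)
```

Entries 1 and 2 of the printed list should be zero up to rounding, while entries 3 and 4 carry the third- and fourth-order deviations from the matched negative binomial baseline.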

3. Data Analysis

The theoretical framework developed above is now evaluated through empirical applications to datasets with increasingly complex count structures. To empirically validate the performance of the proposed linear-tilt expansions, we conduct experiments across three distinct datasets. These datasets also differ substantially in sample size, consisting of FIFA World Cup match scores (

n = 964

), Medical Insurance records (

n = 1338

), and Sepsis Lab Count observations (

n = 13,987

). These three cases were selected to provide a step-wise challenge to the robustness of our models, as detailed in the following rationale and the metadata in Table 1:

Baseline Validation (FIFA): A stable, slightly over-dispersed case that serves as a near-Poisson benchmark.
Shape Flexibility Test (Insurance): A moderate over-dispersion case with a bimodal structure to evaluate the model’s ability to capture complex shapes.
Extreme Stress Test (Sepsis): A high-volatility scenario with extreme kurtosis to test the structural limits of the framework under massive over-dispersion.

Table 2 presents the fundamental statistical properties of the chosen datasets, highlighting the transition from Poisson-like stability to complex states with heavy tail characteristics.

The analysis is designed to evaluate the robustness of the framework through increasingly complex stochastic scenarios. Beyond final fit accuracy, the study is structured to assess the convergence behavior of the models as the expansion complexity (degree K) increases. Our experimental design consists of the following components:

Data Selection Criteria: To provide a step-wise challenge, we utilize the Variance-to-Mean Ratio (V/M) as the primary criterion for selecting datasets. By progressing from near-equidispersion ( $V / M \approx 1$ ) to extreme over-dispersion ( $V / M > 30$ ), we define the structural difficulty of each estimation task.
Performance and Convergence Analysis: The efficacy of the models is measured using the $L_{1}$ discrepancy ( $L_{1}$ error). Crucially, we track the evolution of the $L_{1}$ error relative to the expansion degree (K) to verify numerical stability and identify the optimal degree for high-fidelity shape recovery. The optimal truncation degree is selected as $K^{*} = arg {min}_{K} L_{1} (K)$ , where $L_{1} (K)$ denotes the $L_{1}$ discrepancy at degree K.
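The degree-selection rule above can be sketched on a synthetic example. The code below is an illustration only: the bimodal target (an equal mixture of two Poisson laws), the mean-matched Poisson baseline, the maximum degree, and all names are assumptions, not the datasets or settings of this study. The L1 discrepancy is computed even when a truncated expansion takes negative values.

```python
import math

def poisson_pmf(x, mu):
    """Poisson(mu) mass at integer x >= 0."""
    return math.exp(-mu + x * math.log(mu) - math.lgamma(x + 1))

def falling(x, k):
    """Falling factorial (x)_k."""
    out = 1.0
    for j in range(k):
        out *= x - j
    return out

def phi(n, x, mu):
    """Orthonormal Charlier function for the Poisson(mu) baseline."""
    c_n = sum(math.comb(n, k) * (-1) ** k * falling(x, k) / mu ** k
              for k in range(n + 1))
    return math.sqrt(mu ** n / math.factorial(n)) * c_n

x_max, k_max = 60, 8
xs = range(x_max + 1)
# Hypothetical bimodal target: equal mixture of Poisson(1) and Poisson(4).
p = [0.5 * poisson_pmf(x, 1.0) + 0.5 * poisson_pmf(x, 4.0) for x in xs]
mu = sum(x * px for x, px in zip(xs, p))  # mean-matched Poisson baseline
w = [poisson_pmf(x, mu) for x in xs]

basis = [[phi(n, x, mu) for x in xs] for n in range(k_max + 1)]
# theta[0] is the total mass (about 1); theta[n] = E_p[phi_n(X)] for n >= 1.
theta = [sum(px * b for px, b in zip(p, basis[n])) for n in range(k_max + 1)]

def l1_error(K):
    """L1 discrepancy between p and the degree-K truncated expansion."""
    p_hat = [wx * sum(theta[n] * basis[n][x] for n in range(K + 1))
             for x, wx in zip(xs, w)]
    return sum(abs(px - qx) for px, qx in zip(p, p_hat))

errors = [l1_error(K) for K in range(k_max + 1)]
best_K = min(range(k_max + 1), key=errors.__getitem__)  # K* = argmin_K L1(K)
print(errors, best_K)
```

In this synthetic case the odd-order coefficients vanish by symmetry of the mixture around the baseline mean in the Charlier coordinates, so only even degrees change the fit.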

Throughout this study, all computations, parameter estimation, and figure generation were performed in Python (v3.14) using Matplotlib (v3.10.8) for visualization, NumPy for numerical operations, and SciPy for statistical computations.

3.1. Case 1: FIFA World Cup Goal Counts

The FIFA dataset represents a low over-dispersion scenario, serving as a benchmark where standard count models such as the Poisson, COM-Poisson, or Negative Binomial (NB) distributions are typically considered sufficient. However, as shown in Table 3 and Table 4, our proposed expansions demonstrate superior performance even in this case. By applying the linear-tilt mechanism, both the PC and NBM models successfully improve their respective baselines, achieving significantly lower

L_{1}

errors as illustrated in Figure 1. For reference, the COM-Poisson model also improves substantially over the Poisson baseline, but it remains less accurate than the PC and NBM expansions in this benchmark setting.

The coefficient profiles in Table 3 confirm the efficiency of this estimation. For the PC model, the active use of

{\hat{θ}}_{2} = 0.2229

corrects the mild over-dispersion that the Poisson baseline fails to capture. Similarly, the NBM model refines the NB baseline by adjusting for higher-order moments through

{\hat{θ}}_{3}

and

{\hat{θ}}_{4}

. This improvement suggests that our approach is robust, providing high-fidelity shape recovery even in scenarios where standard models are already expected to perform well.

Beyond the initial comparison, we further analyzed the convergence properties of the NBM framework across increasing degrees (K). The transition of

L_{1}

accuracy relative to the expansion order is presented in Figure 2 and Table 5.

In this stable and low-noise regime, we identify

K = 3

as the optimal degree for practical approximation. Although the global minimum of the

L_{1}

error is reached at

K = 14

, the third-order expansion is sufficient to correct the dominant asymmetry (skewness) in the dataset. Since the Poisson-based baseline already effectively captures the central tendency of FIFA goals,

K = 3

achieves a high-fidelity fit with minimal complexity. Selecting this concise degree reduces the risk of over-fitting empirical noise while keeping the model structurally stable. Furthermore, the steady reduction in error as K rises alleviates concerns regarding numerical divergence, indicating that our linear-tilt framework maintains reliable convergence even in stable count data regimes.

3.2. Case 2: Medical Insurance Charges

To evaluate the model’s flexibility in capturing multimodal structures, we utilize the Medical Insurance dataset. Unlike the previous case, the original data consist of continuous expenditure records. To transform them into a format suitable for count data analysis, we discretized the continuous variable “charges” into intervals of USD 1500. Specifically, each observation x was transformed into a discrete count value $y = \lfloor x / 1500 \rfloor$, where $\lfloor \cdot \rfloor$ denotes the floor function. This discretization places the insurance data within a count-data framework while preserving the overall distributional shape of the original charges. The chosen interval width provides a practically interpretable scale and avoids the excessively sparse support that would arise from an overly fine discretization. The resulting count distribution exhibits moderate over-dispersion ($V / M \approx 7.8$) and a distinct bimodal structure, providing a rigorous test of the model’s ability to capture complex empirical shapes.
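
The discretization step and the variance-to-mean ratio can be sketched in a few lines of Python. This is an illustration under stated assumptions: the sample variance with `ddof=1` is an assumed convention (the paper does not say which variance estimator was used), the function names are hypothetical, and the example values are made up rather than drawn from the insurance data.

```python
import numpy as np

def discretize_charges(charges, width=1500.0):
    """Map continuous charges x to counts y = floor(x / width)."""
    return np.floor(np.asarray(charges, dtype=float) / width).astype(int)

def dispersion_index(counts):
    """Variance-to-mean ratio V / M (sample variance, ddof=1 assumed)."""
    counts = np.asarray(counts, dtype=float)
    return counts.var(ddof=1) / counts.mean()

# Made-up charges for illustration.
charges = [0.0, 1499.99, 1500.0, 16884.92]
y = discretize_charges(charges)
print(y.tolist())             # [0, 0, 1, 11]
print(dispersion_index(y))
```

A ratio well above 1 indicates over-dispersion relative to the Poisson baseline, which is the situation described for this dataset.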

Table 6 and Table 7 report how well our proposed expansions capture such multimodal shapes, and the resulting PMF estimates are illustrated in Figure 3.

In this dataset, Table 6 reflects the effort of the Charlier expansion to move the Poisson baseline toward the bimodal structure shown in Figure 3. The large magnitude of ${\hat{θ}}_{2} = 4.8114$ reflects the model's attempt to widen the unimodal Poisson shape to match the empirical distribution, but it is accompanied by large higher-order terms such as ${\hat{θ}}_{4} = 36.3246$, and the $L_{1}$ error rises above that of the Poisson baseline, as shown in Table 7. In contrast, the NBM expansion performs better, maintaining stable coefficients and providing a more refined fit to the bimodal contours. To resolve the bimodal complexity more effectively, we examine the convergence profile of the NBM framework relative to the expansion degree (K), as presented in Figure 4 and Table 8.

In this bimodal case, the degree of expansion plays a crucial role in capturing complex features. As shown in the convergence results, the $L_{1}$ error drops markedly by the 7th degree, where it reaches its lowest value (0.2431). These higher-order terms are essential for capturing the secondary peak of the distribution, which lower-order expansions fail to resolve. Beyond $K = 7$, the error stays within a narrow band (0.2592 at $K = 10$ and 0.2562 at $K = 14$) rather than diverging, which suggests that the NBM framework remains robust and converges reliably even as complexity increases.

3.3. Case 3: Sepsis Lab Test Counts

The Sepsis dataset serves as an extreme stress test, representing a scenario with massive over-dispersion ($V / M \approx 37.17$). To construct a robust count variable for clinical analysis, we aggregated the total frequency of 26 different laboratory test variables per patient. Additionally, to mitigate the influence of extreme outliers, we applied a 99.5th percentile cutoff, removing the top 0.5% of the data. This preprocessing results in a distribution with extreme kurtosis and long-tail complexity, testing the structural limits of our framework. As shown in Table 9 and Table 10, the performance gap between the Poisson and NB baselines is significant, and the resulting NBM estimation is illustrated in Figure 5.
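
The aggregation and trimming described above can be sketched as follows. The input layout (one row per patient, one column per laboratory variable) and the function name `total_lab_counts` are assumptions made for illustration. The sketch removes totals above the 99.5th percentile, following the description of the preprocessing, and uses simulated counts rather than the Sepsis data.

```python
import numpy as np

def total_lab_counts(lab_counts, upper_pct=99.5):
    """Sum per-patient lab frequencies and drop totals above the given percentile."""
    lab_counts = np.asarray(lab_counts)
    totals = lab_counts.sum(axis=1)           # one total count per patient
    cutoff = np.percentile(totals, upper_pct)
    return totals[totals <= cutoff], cutoff

# Simulated input: 1000 patients x 26 lab variables (hypothetical values).
rng = np.random.default_rng(1)
lab = rng.poisson(2.0, size=(1000, 26))
kept, cutoff = total_lab_counts(lab)
print(kept.size, cutoff)
```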

The advantage of the NBM framework is most pronounced in this extreme regime. To adjust for massive over-dispersion, the Charlier system undergoes significant coefficient strain, with ${\hat{θ}}_{4}$ growing to 2246.8712, as shown in Table 9. This suggests that the adjustment has moved beyond the boundaries of the feasible set $C_{K}$. In contrast, the NBM expansion achieves a better fit (Table 10) while maintaining stable coefficients, capturing the complex clinical variability depicted in Figure 5.
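
Whether a coefficient vector stays inside the feasible set can be checked numerically. The sketch below assumes the truncated linear-tilt form $p_{K}(x) = w(x) [1 + \sum_{n=1}^{K} θ_{n} ψ_{n}(x)]$ and tests non-negativity of the tilt factor on a finite grid only, so it is a necessary check rather than a proof of feasibility on the whole support. The helper names are hypothetical.

```python
import numpy as np

def tilted_pmf(w, psi, theta):
    """Return w(x) * (1 + sum_n theta_n * psi_n(x)) and the tilt factor itself.

    w     : baseline PMF on a grid x = 0..N
    psi   : array of shape (K + 1, N + 1), rows psi_0..psi_K on the same grid
    theta : coefficients theta_1..theta_K
    """
    tilt = 1.0 + np.asarray(theta, dtype=float) @ np.asarray(psi)[1:]
    return w * tilt, tilt

def is_feasible(tilt, tol=1e-12):
    """True when the tilt factor is non-negative at every grid point."""
    return bool(np.all(tilt >= -tol))

# Toy check with a Poisson(4) baseline and only the first-order Charlier term.
lam, N = 4.0, 60
x = np.arange(N + 1)
w = np.exp(-lam) * np.cumprod(np.concatenate(([1.0], lam / x[1:])))
psi = np.vstack([np.ones(N + 1), (x - lam) / np.sqrt(lam)])

_, mild = tilted_pmf(w, psi, [0.25])    # tilt at x = 0 is 1 - 0.25 * 2 = 0.5
_, strong = tilted_pmf(w, psi, [1.0])   # tilt at x = 0 is 1 - 1.0 * 2 = -1
print(is_feasible(mild), is_feasible(strong))   # True False
```

In this toy case a large coefficient pushes the tilt factor below zero at $x = 0$, which is the kind of violation the text associates with large PC coefficients.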

To further investigate the robustness of this fit, we analyze the convergence of the NBM framework relative to the expansion degree (K). The results are presented in Figure 6 and Table 11.

In this high-volatility case, the degree of expansion plays a crucial role in capturing the extreme features of the long-tail distribution. Specifically, the transition from skewness ($K = 3$) to kurtosis ($K = 4$) marks a critical threshold for a precise fit, reducing the $L_{1}$ error from 0.2071 to 0.1931. We observed that the $L_{1}$ error reaches its lowest value at $K = 9$ (0.1722), effectively recovering the complex shape of the sepsis data. Crucially, the error remains essentially flat, without any signs of numerical divergence, as K increases up to 14 (0.1725), suggesting that the NBM framework remains stable and reliable even under extreme volatility.

3.4. Discussion

The empirical evidence across the three cases supports the linear-tilt mechanism as a robust semi-parametric tool for capturing complex variability. The performance of the Poisson–Charlier (PC) system indicates its suitability for low-variance data, such as the FIFA dataset. In these stable regimes, a low-order expansion ($K = 3$) is sufficient to correct mild asymmetry without losing numerical stability. However, the PC expansion faces clear limitations as the $V / M$ ratio increases. In high-variance scenarios, the Charlier polynomials attempt to compensate for the fundamental lack of dispersion in the Poisson baseline by stretching toward extreme empirical moments. The truncated PC expansion requires higher-order corrections to approximate the empirical shape, making it increasingly difficult to preserve the non-negativity required for a valid probability mass function. This places strain on the approximation near the boundary of the feasible set $C_{K}$ and can lead to numerical instability.

In contrast, the Negative Binomial–Meixner (NBM) system provides a structurally better-suited framework for high-volatility processes. Our convergence analysis shows that the NBM expansion maintains reliable convergence even as K increases up to 14. This stability allows the model to capture complex features that lower-order models fail to recognize. For instance, the transition from $K = 4$ to $K = 7$ in the Insurance dataset was essential for resolving multimodality, while the shift to $K = 9$ in the Sepsis dataset was critical for capturing extreme long-tail behavior. These computational findings show that, on these datasets, the NBM framework avoids the numerical divergence often feared in higher-order expansions and that the model remains within a stable feasible region. They suggest that the choice of K should be driven by the structural complexity of the data, such as bimodality or heavy tails.

From a practical standpoint, the choice of baseline distribution may be guided by the variance-to-mean ratio of the data: the Poisson–Charlier expansion is suitable when dispersion is mild, whereas the Negative Binomial–Meixner expansion is preferable for more strongly overdispersed settings. In addition, the truncation degree K should be increased gradually while monitoring both improvement in $L_{1}$ fit and numerical stability.
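
The advice to monitor the $L_{1}$ fit while raising K can be illustrated with the NBM errors reported in Tables 5, 8 and 11. The two helper functions below are a hypothetical monitoring rule written for this illustration, not a selection procedure proposed in the paper.

```python
# L1 errors by degree K for the NBM expansion, copied from Tables 5, 8 and 11.
l1_by_degree = {
    "FIFA": {0: 0.0714, 3: 0.0650, 4: 0.0670, 8: 0.0666, 12: 0.0428, 14: 0.0305},
    "Insurance": {0: 0.3653, 3: 0.3626, 5: 0.3296, 6: 0.2680, 7: 0.2431,
                  10: 0.2592, 14: 0.2562},
    "Sepsis": {0: 0.2203, 3: 0.2071, 4: 0.1931, 9: 0.1722, 10: 0.1722, 14: 0.1725},
}

def smallest_best_degree(errors):
    """Smallest K attaining the minimum reported L1 error."""
    best = min(errors.values())
    return min(k for k, e in errors.items() if e == best)

def first_increase(errors):
    """First reported degree whose error exceeds the previous one, or None."""
    ks = sorted(errors)
    for prev, cur in zip(ks, ks[1:]):
        if errors[cur] > errors[prev]:
            return cur
    return None

for name, errors in l1_by_degree.items():
    print(name, smallest_best_degree(errors), first_increase(errors))
```

On the reported values, the lowest error is attained at K = 14, 7 and 9 for the FIFA, Insurance and Sepsis datasets, respectively, which matches the degrees quoted in the text. The first reported increase in error occurs at K = 4, 10 and 14, which is the kind of signal a practitioner would look at when deciding whether a further increase in degree is worthwhile.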

4. Conclusions

This research establishes a rigorous mathematical framework for the expansion of discrete probability mass functions using orthonormal polynomial systems associated with Poisson and Negative Binomial baselines and proposes a flexible semi-parametric modeling approach that extends these classical families to address well-known limitations such as heavy overdispersion, underdispersion, multimodality, and complex tail behavior. By framing these expansions through the lens of a “linear-tilt” model, we demonstrate that a target distribution can be systematically estimated to match higher-order empirical moments through the simple expectation of basis polynomials, ${\hat{θ}}_{n} = E_{p} [ψ_{n} (X)]$. This methodology bridges the gap between traditional parametric modeling and fully nonparametric density estimation, providing a semi-parametric toolset that isolates specific distributional anomalies like skewness and kurtosis. The expansion of a target probability mass function into a series of discrete orthogonal polynomials offers a rigorous path toward high-fidelity statistical modeling. By leveraging the properties of Charlier and Meixner bases, practitioners can systematically account for the deviations in dispersion, skewness, and kurtosis that plague simpler Poisson-based models.
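
The reason the coefficients reduce to expectations can be written out in one step. The derivation below is a sketch that assumes the linear-tilt form $p(x) = w(x) [1 + \sum_{m \ge 1} θ_{m} ψ_{m}(x)]$ with baseline PMF $w$ and a basis that is orthonormal under $w$ with $ψ_{0} \equiv 1$; the notation may differ from that of Section 2.

```latex
\begin{align*}
\sum_{x} w(x)\, \psi_m(x)\, \psi_n(x) &= \delta_{mn}, \qquad \psi_0 \equiv 1, \\
E_p[\psi_n(X)]
  &= \sum_{x} w(x)\, \psi_n(x) \Bigl[ 1 + \sum_{m \ge 1} \theta_m \psi_m(x) \Bigr] \\
  &= \underbrace{\sum_{x} w(x)\, \psi_n(x)\, \psi_0(x)}_{=\,0 \ \text{for } n \ge 1}
     + \sum_{m \ge 1} \theta_m \sum_{x} w(x)\, \psi_m(x)\, \psi_n(x) \\
  &= \theta_n , \qquad n \ge 1 .
\end{align*}
```

Replacing the expectation under $p$ by a sample average gives the plug-in estimate ${\hat{θ}}_{n}$.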

The computational experiments on the three datasets demonstrate that the proposed linear-tilt expansion performs effectively across heterogeneous count-data settings. In particular, the empirical results show that orthogonal polynomial corrections can systematically recover important distributional features while retaining numerical tractability. A central lesson from the experiments is the importance of balancing baseline simplicity with numerical stability, a balance that is further governed by the choice of the truncation degree K.

For low-variance data, such as the FIFA dataset, the Poisson–Charlier (PC) system performs well. In this relatively stable regime, a low-order expansion ($K = 3$) is sufficient to correct mild skewness and small departures from the Poisson baseline while preserving numerical stability. The PC system therefore serves as an efficient correction mechanism when the empirical variance is close to the mean. However, the experiments also reveal an inherent limitation of the PC expansion when the variance-to-mean ratio increases. As the dispersion grows, the Charlier basis attempts to compensate for the insufficient variability of the Poisson baseline by amplifying higher-order coefficients. This process drives the parameter vector toward the boundary of the feasible set $C_{K}$, creating boundary strain and eventually leading to numerical instability. In contrast, the Negative Binomial–Meixner (NBM) expansion exhibits substantially greater robustness for highly dispersed data. The numerical results indicate that the NBM system maintains stable convergence even when the expansion degree increases. This stability allows the model to recover structural features that cannot be captured by low-order approximations. For example, in the Insurance dataset the increase from $K = 4$ to $K = 7$ was essential for resolving the secondary peak in the empirical distribution. Similarly, in the Sepsis dataset the step to $K = 4$ substantially reduced the $L_{1}$ error, and the expansion at $K = 9$ reproduced the pronounced long-tail behavior characteristic of the data.

Future work will focus on further theoretical analysis of the geometric structure underlying the orthogonal expansion framework, as well as principled strategies for selecting the truncation degree while preserving numerical stability. A central remaining challenge, however, lies in ensuring that the truncated expansions remain within the feasible set $C_{K}$, thereby preserving the non-negativity required for valid probability mass functions. Future work may also investigate estimation procedures within this framework, such as maximum likelihood estimation, least squares estimation, and the generalized method of moments, in order to provide systematic inference tools for the proposed semi-parametric expansion models. Finally, the computational complexity and numerical efficiency of higher-order expansions, for which coefficient estimation and feasibility enforcement may become demanding, should also be examined.

Author Contributions

Conceptualization, W.-W.L. and H.-T.H.; Methodology, W.-W.L. and H.-T.H.; Software, W.-W.L., J.-H.L. and J.-S.L.; Validation, W.-W.L., J.-H.L. and J.-S.L.; Formal analysis, W.-W.L.; Investigation, W.-W.L. and J.-S.L.; Data curation, W.-W.L. and J.-H.L.; Writing—original draft, W.-W.L.; Writing—review and editing, J.-S.L. and H.-T.H.; Visualization, W.-W.L. and J.-H.L.; Supervision, H.-T.H.; Project administration, H.-T.H.; Funding acquisition, H.-T.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the IITP (Institute of Information & Communications Technology Planning & Evaluation)-ITRC (Information Technology Research Center) grant funded by the Korea government (Ministry of Science and ICT) (IITP-2026-00259004).

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.


Figure 1. Empirical PMF and model estimation for the FIFA dataset (Standard Comparison).

Figure 2. NBM $L_{1}$ error convergence across expansion degrees (K) for the FIFA dataset.

Figure 3. Empirical PMF and model estimation for the Insurance dataset (Standard Comparison).

Figure 4. NBM $L_{1}$ error convergence across expansion degrees (K) for the Insurance dataset.

Figure 5. Empirical PMF and NBM model estimation for the Sepsis dataset (Standard Comparison). Poisson and PC models are omitted due to extreme numerical instability.

Figure 6. NBM $L_{1}$ error convergence across expansion degrees (K) for the Sepsis dataset.

Table 1. Metadata and preprocessing specifications for the evaluated datasets.

Dataset	Format/Unit	Description	Source
FIFA World Cup	Goal Counts	Total scores from 964 World Cup matches ( $n = 964$ ). Low-variance benchmark for stable count data.	https://www.kaggle.com/datasets/martj42/international-football-results-from-1872-to-2017 (accessed on 10 February 2026)
Medical Insurance	Discretized Charges	Discretized expenditure records from 1338 insurance observations ( $n = 1338$ ), using $1500 intervals. Moderate over-dispersion with bimodal complexity.	https://www.kaggle.com/datasets/mirichoi0218/insurance (accessed on 10 February 2026)
Sepsis Lab Count	Total Counts	Combined lab frequencies across 26 variables for 13,987 patient-level observations ( $n = 13{,}987$ ). Extreme over-dispersion and long-tail complexity.	https://www.kaggle.com/datasets/tea340yashjoshi/sepsis-prediction-dataset (accessed on 11 February 2026)

Table 2. Descriptive statistics and dispersion characteristics of the evaluated datasets.

Dataset	Mean ( $μ$ )	Variance ( $σ^{2}$ )	V/M Ratio	Skewness	Kurtosis
FIFA World Cup	2.8216	3.7109	1.3152	0.9624	1.4421
Medical Insurance	8.3520	65.1818	7.8043	1.5212	1.6148
Sepsis Lab Count	63.4402	2358.0073	37.1690	6.0820	5.3578

Table 3. Estimated linear-tilt coefficients (${\hat{θ}}_{n}$) for the FIFA dataset ($K = 4$).

Model	$μ$	${\hat{θ}}_{1}$	${\hat{θ}}_{2}$	${\hat{θ}}_{3}$	${\hat{θ}}_{4}$
PC	2.822	0.0000	0.2229	−0.1198	0.1242
NBM	2.822	0.0000	0.0000	−0.0406	−0.0132

Table 4. $L_{1}$ discrepancy results for FIFA goal counts.

Poisson-Based	$L_{1}$ Error	NB-Based	$L_{1}$ Error
Poisson (Baseline)	0.1336	NB (Baseline)	0.0714
PC Expansion ( $K = 3$ )	0.0754	NBM Expansion ( $K = 3$ )	0.0650
PC Expansion ( $K = 4$ )	0.0729	NBM Expansion ( $K = 4$ )	0.0670
COM-Poisson	0.0844

Table 5. NBM convergence analysis: $L_{1}$ error by degree (K) for the FIFA dataset.

Order (K)	0 (NB)	3	4	8	12	14
$L_{1}$ Error	0.0714	0.0650	0.0670	0.0666	0.0428	0.0305

Table 6. Estimated linear-tilt coefficients (${\hat{θ}}_{n}$) for the Insurance dataset ($K = 4$).

Model	$μ$	${\hat{θ}}_{1}$	${\hat{θ}}_{2}$	${\hat{θ}}_{3}$	${\hat{θ}}_{4}$
PC	8.352	0.0000	4.8114	−10.5150	36.3246
NBM	8.352	0.0000	0.0000	0.0539	−0.0080

Table 7. $L_{1}$ discrepancy results for Insurance charges.

Poisson-Based	$L_{1}$ Error	NB-Based	$L_{1}$ Error
Poisson (Baseline)	0.9513	NB (Baseline)	0.3653
PC Expansion ( $K = 3$ )	1.0542	NBM Expansion ( $K = 3$ )	0.3626
PC Expansion ( $K = 4$ )	1.2668	NBM Expansion ( $K = 4$ )	0.3614
COM-Poisson	0.3981

Table 8. NBM convergence analysis: $L_{1}$ error by degree (K) for the Insurance dataset.

Order (K)	0 (NB)	3	5	6	7	10	14
$L_{1}$ Error	0.3653	0.3626	0.3296	0.2680	0.2431	0.2592	0.2562

Table 9. Estimated linear-tilt coefficients (${\hat{θ}}_{n}$) for the Sepsis dataset ($K = 4$).

Model	$μ$	${\hat{θ}}_{1}$	${\hat{θ}}_{2}$	${\hat{θ}}_{3}$	${\hat{θ}}_{4}$
PC	63.440	0.0000	25.5753	−170.7936	2246.8712
NBM	63.440	0.0000	0.0000	−0.0883	−0.1121

Table 10. $L_{1}$ discrepancy results for Sepsis lab counts.

Poisson-Based	$L_{1}$ Error	NB-Based	$L_{1}$ Error
Poisson (Baseline)	1.3892	NB (Baseline)	0.2203
PC Expansion ( $K = 3$ )	1.4052	NBM Expansion ( $K = 3$ )	0.2071
PC Expansion ( $K = 4$ )	1.4586	NBM Expansion ( $K = 4$ )	0.1931
COM-Poisson	0.2703

Table 11. NBM convergence analysis: $L_{1}$ error by degree (K) for the Sepsis dataset.

Order (K)	0 (NB)	3	4	9	10	14
$L_{1}$ Error	0.2203	0.2071	0.1931	0.1722	0.1722	0.1725

