2.1. Discrete Hilbert-Space Expansions
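Let $\mathbb{N}_0 = \{0, 1, 2, \dots\}$. Fix a baseline probability mass function (pmf) $w$ on $\mathbb{N}_0$ satisfying
\[
w(x) > 0 \quad \text{for all } x \in \mathbb{N}_0, \qquad \sum_{x \in \mathbb{N}_0} w(x) = 1. \tag{1}
\]
We write $Y \sim w$ and denote expectation with respect to $w$ by
\[
\mathbb{E}_w[g(Y)] = \sum_{x \in \mathbb{N}_0} g(x)\, w(x),
\]
whenever the series is absolutely convergent. The full-support condition in (1) implies that any other pmf $p$ on $\mathbb{N}_0$ is absolutely continuous with respect to $w$, so the likelihood ratio
\[
r(x) = \frac{p(x)}{w(x)}
\]
is well-defined for every $x \in \mathbb{N}_0$.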
The weighted square-integrable function space consists of all functions $f : \mathbb{N}_0 \to \mathbb{R}$ whose weighted mean square is finite, with the positive baseline pmf $w$ serving as the weighting measure:
\[
L^2(w) = \Big\{ f : \mathbb{N}_0 \to \mathbb{R} \;:\; \sum_{x \in \mathbb{N}_0} f(x)^2\, w(x) < \infty \Big\}.
\]
The inner product in this space is
\[
\langle f, g \rangle_w = \sum_{x \in \mathbb{N}_0} f(x)\, g(x)\, w(x) = \mathbb{E}_w[f(Y)\, g(Y)], \tag{4}
\]
with induced norm $\|f\|_w = \langle f, f \rangle_w^{1/2}$. Square integrability ensures that the orthogonal expansion coefficients are well-defined and that the resulting series converges in the $L^2(w)$ norm. Because $w$ is a probability measure on a countable set, $L^2(w)$ is precisely the usual weighted discrete Hilbert space $\ell^2(\mathbb{N}_0, w)$, and is therefore complete under the norm induced by (4). In this work, we focus on the Charlier and Meixner systems, which correspond to Poisson and negative binomial baseline distributions, respectively.
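As a numerical illustration of the inner product in (4), the following minimal sketch evaluates weighted sums for a Poisson baseline on a truncated support. The helper names `poisson_pmf` and `inner`, the mean 2, and the truncation point 80 are illustrative assumptions and do not come from the text above.

```python
import math

def poisson_pmf(mu, n_max):
    """Poisson(mu) masses w(0), ..., w(n_max), via w(x) = w(x-1) * mu / x."""
    w = [math.exp(-mu)]
    for x in range(1, n_max + 1):
        w.append(w[-1] * mu / x)
    return w

def inner(f, g, w):
    """Weighted inner product <f, g>_w = sum_x f(x) g(x) w(x) on the truncated support."""
    return sum(f(x) * g(x) * wx for x, wx in enumerate(w))

# Assumed baseline: Poisson with mean 2, truncated at x = 80 (the neglected tail is tiny).
w = poisson_pmf(2.0, 80)
one = lambda x: 1.0
ident = lambda x: float(x)

norm_sq_one = inner(one, one, w)   # total mass, approximately 1
mean = inner(ident, one, w)        # E_w[Y], approximately mu
second = inner(ident, ident, w)    # E_w[Y^2], approximately mu + mu^2
```

Truncating the support is only a numerical convenience; the space $L^2(w)$ itself is defined over all of $\mathbb{N}_0$.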
2.2. Discrete Orthonormal Polynomial Systems
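An orthogonal polynomial system with respect to $w$ is a sequence of polynomials $\{\phi_n\}_{n \ge 0}$, with $\phi_n$ of degree $n$, satisfying:
\[
\langle \phi_m, \phi_n \rangle_w = h_n\, \delta_{mn}, \qquad \deg \phi_n = n, \qquad m, n \in \mathbb{N}_0,
\]
where $\deg \phi_n$ is the degree of the polynomial $\phi_n$, $\delta_{mn}$ is the Kronecker delta, and $h_n > 0$ is the orthogonality factor. We adopt the normalization $\phi_0 \equiv 1$ and $h_n = 1$ for all $n$, so that every basis function has unit norm; for the Poisson and negative binomial baselines used here this amounts to rescaling the classical polynomials, that is,
\[
\langle \phi_m, \phi_n \rangle_w = \delta_{mn}.
\]
Then, for every $n \ge 1$, orthogonality to the constant function $\phi_0$ implies the zero-mean property
\[
\mathbb{E}_w[\phi_n(Y)] = \langle \phi_n, \phi_0 \rangle_w = 0.
\]
The requirement $\deg \phi_n = n$ ensures that the system is aligned with polynomial order and that $\mathrm{span}\{\phi_0, \dots, \phi_K\}$ is the space of polynomials of degree at most $K$, i.e., the space of polynomial corrections up to order $K$ relative to $w$.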
We require two assumptions, on existence and on completeness. For the existence assumption, every monomial belongs to $L^2(w)$:
\[
\sum_{x \in \mathbb{N}_0} x^{2n}\, w(x) < \infty \quad \text{for all } n \in \mathbb{N}_0. \tag{7}
\]
Under (7), the sequence $\{1, x, x^2, \dots\}$ lies in $L^2(w)$. These monomials are linearly independent in $L^2(w)$ because $w$ has infinite support, so applying the Gram–Schmidt orthogonalization procedure yields an orthonormal polynomial system $\{\phi_n\}_{n \ge 0}$. For the completeness assumption, the orthonormal polynomial system $\{\phi_n\}_{n \ge 0}$ is complete in $L^2(w)$, i.e., its closed linear span coincides with $L^2(w)$. Classical baseline distributions such as the Poisson and negative binomial satisfy both assumptions.
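The Gram–Schmidt construction described above can be sketched numerically as follows. This is a minimal illustration under stated assumptions: the Poisson mean 2, the truncation of the support at 80, and the helper names `poisson_pmf` and `gram_schmidt` are all hypothetical choices, and functions are represented by their values on the truncated support.

```python
import math

def poisson_pmf(mu, n_max):
    """Poisson(mu) masses w(0), ..., w(n_max), via w(x) = w(x-1) * mu / x."""
    w = [math.exp(-mu)]
    for x in range(1, n_max + 1):
        w.append(w[-1] * mu / x)
    return w

def gram_schmidt(w, degree):
    """Orthonormalise 1, x, ..., x^degree with respect to the weights w.

    Returns a list of value vectors [phi_n(0), ..., phi_n(len(w) - 1)].
    """
    dot = lambda u, v: sum(a * b * c for a, b, c in zip(u, v, w))
    basis = []
    for n in range(degree + 1):
        v = [float(x) ** n for x in range(len(w))]
        for _ in range(2):  # a second pass improves numerical orthogonality
            for phi in basis:
                coef = dot(v, phi)
                v = [a - coef * b for a, b in zip(v, phi)]
        norm = math.sqrt(dot(v, v))
        basis.append([a / norm for a in v])
    return basis

w = poisson_pmf(2.0, 80)      # assumed baseline, truncated support
basis = gram_schmidt(w, 3)    # phi_0, ..., phi_3 as value vectors
```

Gram–Schmidt produces polynomials with positive leading coefficients. An orthonormal polynomial system is unique only up to the sign of each element, so a classical family with a different sign convention can differ from this output by a factor of $\pm 1$ per degree.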
2.3. Expansion Model and Coefficient Identification
Proposition 1.
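Let $w$ be a baseline pmf on $\mathbb{N}_0$ and let $\{\phi_n\}_{n \ge 0}$ be an orthonormal polynomial system in $L^2(w)$ with $\phi_0 \equiv 1$. For a fixed order $K \ge 1$, define the truncated linear-tilt model
\[
p_\theta(x) = w(x) \Big[ 1 + \sum_{n=1}^{K} \theta_n\, \phi_n(x) \Big],
\]
where $\theta = (\theta_1, \dots, \theta_K) \in \mathbb{R}^K$ and $x \in \mathbb{N}_0$. Then $p_\theta$ is a probability mass function on $\mathbb{N}_0$ if and only if
\[
1 + \sum_{n=1}^{K} \theta_n\, \phi_n(x) \ge 0 \quad \text{for all } x \in \mathbb{N}_0.
\]
Equivalently, $\theta$ must lie in the feasible parameter set
\[
\Theta_K = \Big\{ \theta \in \mathbb{R}^K : 1 + \sum_{n=1}^{K} \theta_n\, \phi_n(x) \ge 0 \ \text{for all } x \in \mathbb{N}_0 \Big\},
\]
which is the set of coefficient vectors for which the truncated linear-tilt model remains a valid probability mass function. Moreover, $\Theta_K$ is a closed and convex subset of $\mathbb{R}^K$, and $0 \in \Theta_K$.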
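Proof. To be a valid pmf, $p_\theta$ must satisfy non-negativity and normalization. First, the normalization constraint is automatically satisfied since
\[
\sum_{x \in \mathbb{N}_0} p_\theta(x) = \mathbb{E}_w[\phi_0(Y)] + \sum_{n=1}^{K} \theta_n\, \mathbb{E}_w[\phi_n(Y)] = 1,
\]
where, by orthonormality,
\[
\mathbb{E}_w[\phi_0(Y)] = \langle \phi_0, \phi_0 \rangle_w = 1
\]
and
\[
\mathbb{E}_w[\phi_n(Y)] = \langle \phi_n, \phi_0 \rangle_w = 0, \qquad n = 1, \dots, K.
\]
For the non-negativity property, since $w(x) > 0$ for every $x$, $p_\theta(x) \ge 0$ holds if and only if the tilt factor is non-negative. Hence the feasible parameter set
\[
\Theta_K = \bigcap_{x \in \mathbb{N}_0} \Big\{ \theta \in \mathbb{R}^K : 1 + \sum_{n=1}^{K} \theta_n\, \phi_n(x) \ge 0 \Big\}
\]
is defined by linear inequality constraints.
Each set in the above intersection is a closed half-space, and is therefore closed and convex. Since arbitrary intersections of closed convex sets are closed and convex, it follows that $\Theta_K$ is closed and convex. Moreover, the vector $\theta = 0$ satisfies
\[
1 + \sum_{n=1}^{K} 0 \cdot \phi_n(x) = 1 \ge 0 \quad \text{for all } x \in \mathbb{N}_0,
\]
so $0 \in \Theta_K$. Therefore, $\Theta_K$ is nonempty. These properties follow from standard results in convex analysis and linear inequality systems [19, 20]. □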
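The two requirements in the proof can be illustrated with a small numerical sketch for order $K = 1$. Everything named here is an assumption made for illustration: the Poisson baseline with mean `MU = 2`, the first orthonormal polynomial written as $(x - \mu)/\sqrt{\mu}$ (one of the two possible sign choices), and the helper names. The grid check is only a necessary condition, because feasibility must hold for every $x \in \mathbb{N}_0$.

```python
import math

MU = 2.0  # illustrative Poisson mean (assumption)

def phi1(x):
    """First orthonormal polynomial for a Poisson(MU) baseline, up to sign."""
    return (x - MU) / math.sqrt(MU)

def tilt_factor(theta, basis, x):
    """Tilt factor 1 + sum_n theta_n * phi_n(x)."""
    return 1.0 + sum(t * phi(x) for t, phi in zip(theta, basis))

def is_feasible_on_grid(theta, basis, n_max):
    """Necessary check only: the tilt factor is non-negative on x = 0..n_max."""
    return all(tilt_factor(theta, basis, x) >= 0.0 for x in range(n_max + 1))

def total_mass(theta, basis, n_max):
    """Sum of p_theta(x) = w(x) * tilt factor over the truncated support."""
    w, total = math.exp(-MU), 0.0
    for x in range(n_max + 1):
        total += w * tilt_factor(theta, basis, x)
        w *= MU / (x + 1)  # Poisson recursion w(x+1) = w(x) * MU / (x+1)
    return total
```

For this one-dimensional case the feasible set is the interval $0 \le \theta_1 \le 1/\sqrt{\mu}$: a negative $\theta_1$ fails for large $x$, and a value above $1/\sqrt{\mu}$ fails at $x = 0$. The total mass equals one for any $\theta$, feasible or not, which mirrors the statement that normalization is automatic.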
Theorem 1.
Suppose the existence and completeness assumptions hold. Then for any $f \in L^2(w)$ there exists a unique orthogonal expansion
\[
f = \sum_{n=0}^{\infty} a_n\, \phi_n
\]
with coefficients given by
\[
a_n = \langle f, \phi_n \rangle_w = \mathbb{E}_w[f(Y)\, \phi_n(Y)].
\]
Each coefficient $a_n$ measures how strongly the target distribution projects onto the corresponding basis pattern $\phi_n$; coefficients near zero indicate little contribution from that component. Moreover, the partial sums
\[
S_K f = \sum_{n=0}^{K} a_n\, \phi_n
\]
converge to $f$ in $L^2(w)$ as $K \to \infty$. Since the $\phi_n$ are also orthonormal in $L^2(w)$ (i.e., $\langle \phi_m, \phi_n \rangle_w = \delta_{mn}$), this completeness is equivalent to $\{\phi_n\}_{n \ge 0}$ forming an orthonormal basis for the Hilbert space $L^2(w)$. In this case, every $f \in L^2(w)$ admits a unique series expansion that converges in the $L^2(w)$-norm, with Parseval’s identity holding: $\|f\|_w^2 = \sum_{n=0}^{\infty} a_n^2$. The completeness assumption is standard for the classical discrete orthogonal polynomial families associated with Poisson (Charlier) and negative binomial (Meixner) baselines.
The orthogonal polynomial expansion provides a spectral decomposition of Pearson’s $\chi^2$ divergence, and the truncation degree $K$ directly controls the approximation error. The expansion admits an energy interpretation, where the squared norm of the centered likelihood ratio $r - 1$ represents the total expansion energy beyond the baseline. This quantity coincides with Pearson’s $\chi^2$ divergence and yields a bound for the total variation distance.
Corollary 1.
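Let $p$ be a pmf satisfying the condition $r = p/w \in L^2(w)$ and denote $a_n = \langle r, \phi_n \rangle_w = \mathbb{E}_p[\phi_n(Y)]$, so that $a_0 = 1$. Then the total orthogonal expansion energy coincides with Pearson’s $\chi^2$ divergence between $p$ and the reference distribution $w$:
\[
\chi^2(p \,\|\, w) = \sum_{x \in \mathbb{N}_0} \frac{(p(x) - w(x))^2}{w(x)} = \| r - 1 \|_w^2.
\]
Moreover, this Pearson’s divergence admits the spectral representation under the orthonormal system $\{\phi_n\}_{n \ge 0}$
\[
\chi^2(p \,\|\, w) = \sum_{n=1}^{\infty} a_n^2,
\]
where $a_n$ are the orthogonal projection coefficients. If $q$ is any probability mass function (in particular the truncated approximation when it remains nonnegative), then the total variation distance admits the bound
\[
d_{\mathrm{TV}}(p, q) = \frac{1}{2} \sum_{x \in \mathbb{N}_0} |p(x) - q(x)| \le \frac{1}{2} \Big\| \frac{p - q}{w} \Big\|_w.
\]
Consequently, whenever the truncated expansion $p_K(x) = w(x) \sum_{n=0}^{K} a_n \phi_n(x)$ defines a valid pmf, the approximation error in total variation is controlled by the tail energy of the orthogonal expansion:
\[
d_{\mathrm{TV}}(p, p_K) \le \frac{1}{2} \Big( \sum_{n=K+1}^{\infty} a_n^2 \Big)^{1/2}.
\]
This shows that the truncation error is governed by the residual energy contained in the higher-order orthogonal components.

Now we derive the expansions more specifically for two important baseline distributions. In particular, we consider the Poisson and negative binomial baselines by employing the Charlier and Meixner orthogonal polynomial systems, respectively.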
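Before specializing to the two baselines, the energy identity and the total variation bound of Corollary 1 can be checked numerically. The sketch below is illustrative only: the baseline Poisson(2), the target Poisson(2.5), the truncation at 60, the maximum degree 6, and the helper names are all assumptions, and the orthonormal basis is built by Gram–Schmidt on the truncated support rather than from closed-form polynomials.

```python
import math

def poisson_pmf(mu, n_max):
    """Poisson(mu) masses on 0..n_max, via w(x) = w(x-1) * mu / x."""
    w = [math.exp(-mu)]
    for x in range(1, n_max + 1):
        w.append(w[-1] * mu / x)
    return w

def orthonormal_basis(w, degree):
    """Gram-Schmidt on 1, x, ..., x^degree under weights w (value vectors)."""
    dot = lambda u, v: sum(a * b * c for a, b, c in zip(u, v, w))
    basis = []
    for n in range(degree + 1):
        v = [float(x) ** n for x in range(len(w))]
        for _ in range(2):  # second pass for numerical stability
            for phi_n in basis:
                coef = dot(v, phi_n)
                v = [a - coef * b for a, b in zip(v, phi_n)]
        norm = math.sqrt(dot(v, v))
        basis.append([a / norm for a in v])
    return basis

w = poisson_pmf(2.0, 60)   # assumed baseline
p = poisson_pmf(2.5, 60)   # assumed target
K = 3                      # truncation degree of the approximation
phi = orthonormal_basis(w, 6)

# Projection coefficients a_n = E_p[phi_n(Y)] = <p/w, phi_n>_w.
a = [sum(px * f for px, f in zip(p, phi_n)) for phi_n in phi]

chi2 = sum((px - wx) ** 2 / wx for px, wx in zip(p, w))   # Pearson divergence
energy = sum(c * c for c in a[1:])                        # spectral sum, n = 1..6

# Truncated expansion p_K and its total variation distance to p.
p_K = [wx * sum(a[n] * phi[n][x] for n in range(K + 1)) for x, wx in enumerate(w)]
tv = 0.5 * sum(abs(px - qx) for px, qx in zip(p, p_K))
tail_energy = max(chi2 - sum(c * c for c in a[1:K + 1]), 0.0)
tail_bound = 0.5 * math.sqrt(tail_energy)
```

In this example the spectral sum up to degree 6 already accounts for nearly all of the divergence, and the total variation error of the degree-3 truncation stays below the tail-energy bound. The sketch does not check whether $p_K$ is nonnegative; the inequality itself follows from the Cauchy–Schwarz inequality and does not need that.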
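For the specification of the Poisson–Charlier (PC) expansion, let the baseline distribution be Poisson with mean $\mu > 0$:
\[
w(x) = e^{-\mu} \frac{\mu^x}{x!}, \qquad x \in \mathbb{N}_0.
\]
Define the $n$th-degree Charlier polynomials $C_n(x; \mu)$ via the exponential generating function
\[
\sum_{n=0}^{\infty} C_n(x; \mu) \frac{t^n}{n!} = e^{t} \Big( 1 - \frac{t}{\mu} \Big)^{x},
\]
or equivalently through the falling-factorial expansion
\[
C_n(x; \mu) = \sum_{k=0}^{n} \binom{n}{k} (-1)^k \mu^{-k}\, x^{\underline{k}}, \qquad x^{\underline{k}} = x (x-1) \cdots (x-k+1). \tag{10}
\]
These polynomials satisfy the orthogonality relation
\[
\sum_{x \in \mathbb{N}_0} w(x)\, C_m(x; \mu)\, C_n(x; \mu) = \frac{n!}{\mu^n}\, \delta_{mn}.
\]
Accordingly, the degree-$n$ orthonormal Charlier basis function is given by
\[
\phi_n(x) = \sqrt{\frac{\mu^n}{n!}}\; C_n(x; \mu),
\]
which satisfies
\[
\langle \phi_m, \phi_n \rangle_w = \delta_{mn}.
\]
For a target pmf $p$ and $k \in \mathbb{N}_0$, define the factorial moments
\[
m_{(k)} = \mathbb{E}_p\big[ Y^{\underline{k}} \big].
\]
In particular, for the Poisson baseline $Y \sim \mathrm{Poisson}(\mu)$, the factorial moments are given by
\[
\mathbb{E}_w\big[ Y^{\underline{k}} \big] = \mu^k.
\]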
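The orthonormal Charlier functions can be evaluated directly from a falling-factorial expansion. The sketch below assumes the widely used normalization $C_n(x;\mu) = \sum_{k} \binom{n}{k} (-1)^k \mu^{-k} x^{\underline{k}}$, for which $\mathbb{E}_w[C_n^2] = n!/\mu^n$; other sign and scale conventions exist in the literature, so this choice, the mean 2, the truncation at 80, and the helper names are assumptions.

```python
import math

def falling(x, k):
    """Falling factorial x (x-1) ... (x-k+1); the empty product is 1."""
    out = 1
    for j in range(k):
        out *= (x - j)
    return out

def charlier(n, x, mu):
    """C_n(x; mu) under the assumed normalization (see lead-in)."""
    return sum(math.comb(n, k) * (-1) ** k * falling(x, k) / mu ** k
               for k in range(n + 1))

def phi(n, x, mu):
    """Orthonormal Charlier function sqrt(mu^n / n!) * C_n(x; mu)."""
    return math.sqrt(mu ** n / math.factorial(n)) * charlier(n, x, mu)

def poisson_pmf(mu, n_max):
    """Poisson(mu) masses on 0..n_max, via w(x) = w(x-1) * mu / x."""
    w = [math.exp(-mu)]
    for x in range(1, n_max + 1):
        w.append(w[-1] * mu / x)
    return w

def gram(m, n, mu, n_max=80):
    """Numerical inner product <phi_m, phi_n>_w on a truncated support."""
    w = poisson_pmf(mu, n_max)
    return sum(phi(m, x, mu) * phi(n, x, mu) * wx for x, wx in enumerate(w))
```

Evaluating `gram(m, n, 2.0)` for small degrees should return values close to the Kronecker delta, which is a quick way to confirm that the normalization constant matches the chosen polynomial convention.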
Proposition 2.
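Let $p$ be a target pmf with $r = p/w \in L^2(w)$, and let $m_{(k)} = \mathbb{E}_p[Y^{\underline{k}}]$ denote its factorial moments. Then the coefficients of the Poisson–Charlier semi-parametric expansion
\[
p(x) = w(x) \sum_{n=0}^{\infty} a_n\, \phi_n(x), \qquad a_n = \mathbb{E}_p[\phi_n(Y)],
\]
admit the explicit expression, for all $n \ge 0$,
\[
a_n = \sqrt{\frac{\mu^n}{n!}} \sum_{k=0}^{n} \binom{n}{k} (-1)^k \mu^{-k}\, m_{(k)}. \tag{11}
\]
If $\mu$ is mean-matched, i.e., $\mu = \mathbb{E}_p[Y]$, and we define the factorial-moment deviations from the Poisson baseline by
\[
\Delta_k = m_{(k)} - \mu^k,
\]
then $a_0 = 1$, $a_1 = 0$, and
\[
a_n = \sqrt{\frac{\mu^n}{n!}} \sum_{k=2}^{n} \binom{n}{k} (-1)^k \mu^{-k}\, \Delta_k, \qquad n \ge 2.
\]
Proof. Equation (11) follows by substituting (10) into $a_n = \mathbb{E}_p[\phi_n(Y)]$ and applying linearity of expectation. Because $\sum_{k=0}^{n} \binom{n}{k} (-1)^k = 0$ for $n \ge 1$, each $m_{(k)}$ in (11) may be replaced by $\Delta_k$, and $\Delta_0 = 0$ by definition. Under mean matching $\mu = m_{(1)}$, we have $\Delta_1 = 0$, so the $k = 1$ term vanishes, yielding the simplified expressions. □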
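The two coefficient formulas of Proposition 2 can be compared on a small example. The sketch assumes the Charlier normalization with $\mathbb{E}_w[C_n^2] = n!/\mu^n$, and uses a hypothetical target, the mixture $0.75\,\mathrm{Poisson}(1) + 0.25\,\mathrm{Poisson}(5)$, chosen because its mean is 2 and its factorial moments are available in closed form as $0.75 \cdot 1^k + 0.25 \cdot 5^k$. The function names are illustrative.

```python
import math

def pc_coefficient(n, mu, m):
    """a_n from factorial moments m[k] = E_p[Y^(falling k)], general formula."""
    s = sum(math.comb(n, k) * (-1) ** k * m[k] / mu ** k for k in range(n + 1))
    return math.sqrt(mu ** n / math.factorial(n)) * s

def pc_coefficient_matched(n, mu, delta):
    """a_n for n >= 2 under mean matching, from deviations delta[k] = m[k] - mu^k."""
    s = sum(math.comb(n, k) * (-1) ** k * delta[k] / mu ** k
            for k in range(2, n + 1))
    return math.sqrt(mu ** n / math.factorial(n)) * s

mu = 2.0  # Poisson baseline mean, matched to the target mean

# Assumed target: 0.75 * Poisson(1) + 0.25 * Poisson(5); mean = 0.75 + 1.25 = 2.
m = [0.75 * 1.0 ** k + 0.25 * 5.0 ** k for k in range(5)]
delta = [m_k - mu ** k for k, m_k in enumerate(m)]

a_general = [pc_coefficient(n, mu, m) for n in range(5)]
a_matched = [pc_coefficient_matched(n, mu, delta) for n in range(2, 5)]
```

Both routes give the same values for $n = 2, 3, 4$, and the general formula returns $a_0 = 1$ and $a_1 = 0$, as the proposition states for a mean-matched baseline.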
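We can rewrite the coefficients in terms of central moments, which provides a direct decomposition in terms of dispersion, skewness, and kurtosis relative to the Poisson benchmark. Let
\[
\mu = \mathbb{E}_p[Y], \qquad \sigma^2 = \mathrm{Var}_p(Y),
\]
where, under mean matching, $\mu$ denotes the common mean of the Poisson baseline and the target distribution $p$, while $\sigma^2$ denotes the variance of $p$. We further define the higher-order central moments
\[
\mu_3 = \mathbb{E}_p\big[ (Y - \mu)^3 \big], \qquad \mu_4 = \mathbb{E}_p\big[ (Y - \mu)^4 \big].
\]
To facilitate interpretation, introduce the rescaled (orthogonal but not normalized) coordinates
\[
\tilde{a}_n = \mathbb{E}_p[C_n(Y; \mu)].
\]
Then the relation between normalized and rescaled coefficients is
\[
a_n = \sqrt{\frac{\mu^n}{n!}}\; \tilde{a}_n.
\]
Under mean matching ($\mu = \mathbb{E}_p[Y]$), the first four coefficients are given by
\[
\tilde{a}_1 = 0,
\]
\[
\tilde{a}_2 = \frac{\sigma^2 - \mu}{\mu^2}, \tag{12}
\]
\[
\tilde{a}_3 = -\frac{\mu_3 - 3\sigma^2 + 2\mu}{\mu^3}, \tag{13}
\]
\[
\tilde{a}_4 = \frac{\mu_4 - 6\mu_3 + (11 - 6\mu)\sigma^2 + 3\mu^2 - 6\mu}{\mu^4}. \tag{14}
\]
The representation (12) shows that $\tilde{a}_2$ measures deviation from Poisson equidispersion ($\sigma^2 = \mu$), while (13) and (14) capture skewness and kurtosis effects, respectively. Moreover, all coefficients with $n \ge 1$ vanish when $p = w$, since then $\sigma^2 = \mu$, $\mu_3 = \mu$, and $\mu_4 = \mu + 3\mu^2$, confirming consistency with the Poisson baseline.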
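The central-moment form of the low-order coefficients can be cross-checked against direct expectations of the Charlier polynomials. The sketch below assumes the normalization $C_n(x;\mu) = \sum_k \binom{n}{k}(-1)^k \mu^{-k} x^{\underline{k}}$; under that convention the unnormalized coefficients are $(\sigma^2-\mu)/\mu^2$, $-(\mu_3 - 3\sigma^2 + 2\mu)/\mu^3$, and $(\mu_4 - 6\mu_3 + (11-6\mu)\sigma^2 + 3\mu^2 - 6\mu)/\mu^4$. The mixture target, the truncation at 150, and all names are illustrative assumptions.

```python
import math

def poisson_pmf(lam, n_max):
    """Poisson(lam) masses on 0..n_max, via the recursion w(x) = w(x-1) * lam / x."""
    w = [math.exp(-lam)]
    for x in range(1, n_max + 1):
        w.append(w[-1] * lam / x)
    return w

def falling(x, k):
    """Falling factorial x (x-1) ... (x-k+1)."""
    out = 1
    for j in range(k):
        out *= (x - j)
    return out

def charlier(n, x, mu):
    """C_n(x; mu) under the assumed normalization (see lead-in)."""
    return sum(math.comb(n, k) * (-1) ** k * falling(x, k) / mu ** k
               for k in range(n + 1))

# Assumed target: 0.75 * Poisson(1) + 0.25 * Poisson(5), truncated at 150.
n_max = 150
p = [0.75 * u + 0.25 * v
     for u, v in zip(poisson_pmf(1.0, n_max), poisson_pmf(5.0, n_max))]

mu = sum(x * px for x, px in enumerate(p))   # mean-matched baseline mean

def central(k):
    """k-th central moment of the target p."""
    return sum((x - mu) ** k * px for x, px in enumerate(p))

sigma2, mu3, mu4 = central(2), central(3), central(4)

by_moments = {
    2: (sigma2 - mu) / mu ** 2,
    3: -(mu3 - 3 * sigma2 + 2 * mu) / mu ** 3,
    4: (mu4 - 6 * mu3 + (11 - 6 * mu) * sigma2 + 3 * mu ** 2 - 6 * mu) / mu ** 4,
}
direct = {n: sum(charlier(n, x, mu) * px for x, px in enumerate(p))
          for n in (2, 3, 4)}
```

The dictionaries `by_moments` and `direct` agree to numerical precision. With a different sign convention for the polynomials, the odd-order expressions change sign accordingly.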
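Now, we construct the negative binomial–Meixner (NBM) expansion. Let the baseline be negative binomial (NB) in the $(\beta, c)$ parameterization, with $\beta > 0$ and $0 < c < 1$:
\[
w(x) = \frac{(\beta)_x}{x!}\, c^x (1 - c)^{\beta}, \qquad x \in \mathbb{N}_0,
\]
where $(\beta)_x = \beta (\beta + 1) \cdots (\beta + x - 1)$ is the rising factorial, with $(\beta)_0 = 1$. The mean and variance are
\[
\mathbb{E}_w[Y] = \frac{\beta c}{1 - c}, \qquad \mathrm{Var}_w(Y) = \frac{\beta c}{(1 - c)^2}.
\]
The Meixner polynomials $M_n(x; \beta, c)$ admit the falling-factorial representation
\[
M_n(x; \beta, c) = \sum_{k=0}^{n} \binom{n}{k} (-1)^k \Big( \frac{1 - c}{c} \Big)^{k} \frac{x^{\underline{k}}}{(\beta)_k}. \tag{15}
\]
They satisfy the orthogonality relation
\[
\sum_{x \in \mathbb{N}_0} w(x)\, M_m(x; \beta, c)\, M_n(x; \beta, c) = \frac{n!}{c^n (\beta)_n}\, \delta_{mn}.
\]
Accordingly, the orthonormal basis is
\[
\phi_n(x) = \sqrt{\frac{c^n (\beta)_n}{n!}}\; M_n(x; \beta, c).
\]
Given a target pmf $p$ with mean $\mu$ and variance $\sigma^2$, an NB baseline can match both provided $\sigma^2 > \mu$ (overdispersion). Solving $\mu = \beta c / (1 - c)$ and $\sigma^2 = \beta c / (1 - c)^2$ yields
\[
c = 1 - \frac{\mu}{\sigma^2}, \qquad \beta = \frac{\mu^2}{\sigma^2 - \mu}. \tag{16}
\]
When (16) holds, the coefficients $a_1$ and $a_2$ vanish, so the leading terms represent deviations in third- and fourth-order structure beyond what the NB baseline captures. The factorial moments are, for $Y \sim \mathrm{NB}(\beta, c)$,
\[
\mathbb{E}_w\big[ Y^{\underline{k}} \big] = (\beta)_k \Big( \frac{c}{1 - c} \Big)^{k}, \qquad k \in \mathbb{N}_0.
\]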
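The NB moment matching and the orthonormal Meixner functions can be sketched as follows. The code assumes the parameterization $w(x) = \frac{(\beta)_x}{x!} c^x (1-c)^\beta$ and the common Meixner normalization $M_n(x;\beta,c) = \sum_k \binom{n}{k} (-1)^k \big(\tfrac{1-c}{c}\big)^k x^{\underline{k}}/(\beta)_k$, for which $\mathbb{E}_w[M_n^2] = n!/(c^n (\beta)_n)$. The target mean 2 and variance 5, the truncation at 400, and the helper names are illustrative assumptions.

```python
import math

def rising(b, k):
    """Rising factorial (b)_k = b (b+1) ... (b+k-1); (b)_0 = 1."""
    out = 1.0
    for j in range(k):
        out *= (b + j)
    return out

def falling(x, k):
    """Falling factorial x (x-1) ... (x-k+1)."""
    out = 1
    for j in range(k):
        out *= (x - j)
    return out

def nb_pmf(beta, c, n_max):
    """NB masses on 0..n_max via w(x) = w(x-1) * (beta + x - 1) * c / x."""
    w = [(1.0 - c) ** beta]
    for x in range(1, n_max + 1):
        w.append(w[-1] * (beta + x - 1) * c / x)
    return w

def meixner(n, x, beta, c):
    """M_n(x; beta, c) under the assumed normalization (see lead-in)."""
    rho = (1.0 - c) / c
    return sum(math.comb(n, k) * (-rho) ** k * falling(x, k) / rising(beta, k)
               for k in range(n + 1))

def phi(n, x, beta, c):
    """Orthonormal Meixner function sqrt(c^n (beta)_n / n!) * M_n(x; beta, c)."""
    return math.sqrt(c ** n * rising(beta, n) / math.factorial(n)) * meixner(n, x, beta, c)

def match_nb(mean, var):
    """Moment matching: returns (beta, c); requires overdispersion var > mean."""
    if var <= mean:
        raise ValueError("NB matching requires var > mean")
    return mean ** 2 / (var - mean), 1.0 - mean / var

beta, c = match_nb(2.0, 5.0)   # assumed target mean and variance
w = nb_pmf(beta, c, 400)
```

For the assumed inputs the matched parameters are $\beta = 4/3$ and $c = 0.6$, and the baseline then reproduces the mean 2 and variance 5 of the target.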
Proposition 3.
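Let $p$ be a target pmf with $r = p/w \in L^2(w)$ and factorial moments $m_{(k)} = \mathbb{E}_p[Y^{\underline{k}}]$, and define
\[
a_n = \mathbb{E}_p[\phi_n(Y)] = \sqrt{\frac{c^n (\beta)_n}{n!}} \sum_{k=0}^{n} \binom{n}{k} (-1)^k \Big( \frac{1 - c}{c} \Big)^{k} \frac{m_{(k)}}{(\beta)_k}. \tag{17}
\]
If $(\beta, c)$ are chosen by moment matching (16), then
\[
a_1 = a_2 = 0,
\]
and with $\Delta_k = m_{(k)} - (\beta)_k \big( \tfrac{c}{1 - c} \big)^{k}$,
\[
a_n = \sqrt{\frac{c^n (\beta)_n}{n!}} \sum_{k=3}^{n} \binom{n}{k} (-1)^k \Big( \frac{1 - c}{c} \Big)^{k} \frac{\Delta_k}{(\beta)_k}, \qquad n \ge 3.
\]
Proof. Equation (17) follows by substituting (15) and applying linearity of expectation. Since
\[
\mathbb{E}_w[M_n(Y; \beta, c)] = \sum_{k=0}^{n} \binom{n}{k} (-1)^k = 0
\]
for $n \ge 1$, one may equivalently write
\[
a_n = \sqrt{\frac{c^n (\beta)_n}{n!}} \sum_{k=1}^{n} \binom{n}{k} (-1)^k \Big( \frac{1 - c}{c} \Big)^{k} \frac{\Delta_k}{(\beta)_k}.
\]
Moment matching implies $\Delta_1 = \Delta_2 = 0$, leaving only the stated terms for $k \ge 3$. □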
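The vanishing of the first two coefficients under moment matching can be illustrated numerically. The sketch assumes the Meixner normalization with $\mathbb{E}_w[M_n^2] = n!/(c^n(\beta)_n)$ and a hypothetical target, the mixture $0.75\,\mathrm{Poisson}(1) + 0.25\,\mathrm{Poisson}(5)$, whose mean is 2, whose variance is 5, and whose factorial moments are $0.75 \cdot 1^k + 0.25 \cdot 5^k$. All function names are illustrative.

```python
import math

def rising(b, k):
    """Rising factorial (b)_k = b (b+1) ... (b+k-1); (b)_0 = 1."""
    out = 1.0
    for j in range(k):
        out *= (b + j)
    return out

def nbm_coefficient(n, beta, c, m, k_min=0):
    """a_n from the sequence m[k], summing over k = k_min..n.

    With m = factorial moments and k_min = 0 this is the general formula;
    with m = deviations Delta_k and k_min = 3 it is the moment-matched form.
    """
    rho = (1.0 - c) / c
    s = sum(math.comb(n, k) * (-rho) ** k * m[k] / rising(beta, k)
            for k in range(k_min, n + 1))
    return math.sqrt(c ** n * rising(beta, n) / math.factorial(n)) * s

# Moment matching for the assumed target (mean 2, variance 5).
mean, var = 2.0, 5.0
beta, c = mean ** 2 / (var - mean), 1.0 - mean / var

# Factorial moments of the target and of the matched NB baseline.
m = [0.75 * 1.0 ** k + 0.25 * 5.0 ** k for k in range(5)]
m_w = [rising(beta, k) * (c / (1.0 - c)) ** k for k in range(5)]
delta = [u - v for u, v in zip(m, m_w)]

a_general = [nbm_coefficient(n, beta, c, m) for n in range(5)]
a_matched = [nbm_coefficient(n, beta, c, delta, k_min=3) for n in (3, 4)]
```

Here `delta[1]` and `delta[2]` are zero up to rounding, `a_general[1]` and `a_general[2]` vanish, and the general and moment-matched formulas agree for $n = 3$ and $n = 4$.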