Symmetric Discrete Distributions on the Integer Line: A Versatile Family and Applications

Lamia Alyami; Hugo S. Salinas; Hassan S. Bakouch; Maher Kachour; Amira F. Daghestani; Sudeep R. Bapat

doi:10.3390/sym17122148

¹

Department of Mathematics, College of Sciences and Arts, Najran University, P.O. Box 1988, Najran 11001, Saudi Arabia

²

Departamento de Matemática, Facultad de Ingeniería, Universidad de Atacama, Copiapó 1531772, Chile

³

Department of Mathematics, College of Science, Qassim University, Buraydah 51452, Saudi Arabia

⁴

Department of Mathematics and Natural Sciences, Gulf University for Science and Technology, P.O. Box 7207, Hawally 32093, Kuwait

Symmetry 2025, 17(12), 2148; https://doi.org/10.3390/sym17122148

This article belongs to the Special Issue Skewed (Asymmetrical) Probability Distributions and Applications Across Disciplines, Fourth Edition

Abstract

We introduce the Symmetric-$\mathbb{Z}$ (Sy-$\mathbb{Z}$) family, a unified class of symmetric discrete distributions on the integers obtained by multiplying a three-point symmetric sign variable by an independent non-negative integer-valued magnitude. This sign-magnitude construction yields interpretable, zero-centered models with tunable mass at zero and dispersion balanced across signs, making them suitable for outcomes, such as differences of counts or discretized return increments. We derive general distributional properties, including closed-form expressions for the probability mass and cumulative distribution functions, bilateral generating functions, and even moments, and show that the tail behavior is inherited from the magnitude component. A characterization by symmetry and sign–magnitude independence is established and a distinctive operational feature is proved: for independent members of the family, the sum and the difference have the same distribution. As a central example, we study the symmetric Poisson model, providing measures of skewness, kurtosis, and entropy, together with estimation via the method of moments and maximum likelihood. Simulation studies assess finite-sample performance of the estimators, and applications to datasets from finance and education show improved goodness-of-fit relative to established integer-valued competitors. Overall, the Sy-$\mathbb{Z}$ framework offers a mathematically tractable and interpretable basis for modeling symmetric integer-valued outcomes across diverse domains.

Keywords:

integer-valued models; zero-inflated counts; sign-magnitude factorization; entropy; symmetric Poisson distribution; probability generating function; maximum likelihood estimation

1. Introduction

Many discrete datasets encountered in practice take values on the non-negative integers and are routinely modeled using standard families, such as the Poisson or geometric distributions. In contrast, there are important situations where the natural support is the whole set of integers $\mathbb{Z}$, most notably when observations are signed differences of counts or other zero-centered measurements. Canonical examples include score differences in sports, day-to-day changes in transaction counts or sales, inter-rater differences in clinical tallies, and discretized (symmetric) return increments in finance. For such problems, modeling directly on $\mathbb{Z}$ with distributions that respect symmetry around zero is both natural and desirable.

Several integer-valued distributions on

Z

have been proposed. Prominent instances include the Skellam distribution [1], obtained as the difference of two independent Poisson variables; the discrete Laplace distribution [2], along with related skew/asymmetric variants [3,4]; the discrete normal distribution [5]; and, recently, perturbed Laplace–type models [6]. Applications of signed count differences appear in medical and reliability studies [7,8] and in sports analytics, such as goal differences [9]. For general background on count modeling and discrete distributions, see refs. [10,11]. In addition, substantial probability mass at zero frequently arises in practice, creating links with the zero-inflated literature [12].

Beyond these classical constructions, there has been a notable increase in recent work on flexible integer-valued distributions supported on

Z

, including several models explicitly designed to capture symmetry and tunable dispersion. For example, ref. [13] introduced the discrete skew logistic distribution, which can accommodate symmetric and asymmetric count data and provides a useful reference for tail-shape control. Two recent contributions by refs. [14,15] developed new symmetric and perturbation-based distributions on the integers, with applications to stock exchange and hydrological data. In parallel, ref. [6] proposed a general perturbation of the discrete Laplace distribution, demonstrating improvements in financial and health datasets. More broadly, ref. [16] reviewed Skellam-type models and related integer-supported families, while ref. [17] provided an up-to-date survey on models for integer-valued data, highlighting the importance of distributions supported on

Z

in modern applications. These recent works underscore the need for simple, interpretable, and analytically tractable symmetric models on $\mathbb{Z}$, built on a sign–magnitude decomposition with explicit identifiability; this is the gap that the Sy-$\mathbb{Z}$ family and its Sy-Poisson special case proposed in this work aim to fill.

This paper introduces a unified and tractable framework for symmetric integer-valued data on

Z

, named the Sy-

Z

family. The construction separates a three-point symmetric sign component from a non-negative magnitude: a data-generating sign takes values in

{- 1, 0, 1}

with a tunable mass at zero and is multiplied by an independent, non-negative integer-valued variable. This sign–magnitude representation yields zero-centered, exactly symmetric models with interpretable control of the atom at zero while allowing the analyst to inherit tail behavior, dispersion, and computational convenience from the chosen baseline magnitude distribution.

We develop a coherent set of distributional results for the family: closed-form probability mass functions and cumulative distribution functions, bilateral probability generating functions, and moment identities. Symmetry implies vanishing odd moments, whereas even moments factor through the baseline magnitude. We also establish a characterization by symmetry and independence: an integer-valued distribution belongs to the proposed family if and only if it is symmetric and its sign is independent of the magnitude with a three-point symmetric distribution. Beyond these foundations, we study general consequences of the product structure, including tail transfer from the magnitude, conditions for unimodality or bimodality, and simple obstructions to infinite divisibility.

A distinctive feature of this framework is a strong symmetry property for operations on independent variables, where for two independent members of the family, the sum and the difference have the same distribution. This identity is a direct consequence of the bilateral generating function symmetry and does not generally hold for standard two-sided competitors, such as Skellam [1], discrete Laplace [2], perturbed Laplace [6], or extended Poisson models [18].

As a central example, we particularize the family with a Poisson magnitude, thereby obtaining the symmetric Poisson model. We derive distributional formulas (including entropy), discuss the induced zero mass in relation to zero-inflated counts [12], and develop estimation via the method of moments and maximum likelihood. Simulation studies assess finite-sample behavior, and applications to datasets from finance and education illustrate a competitive or improved fit relative to established alternatives supported on

Z

.

The remainder of the paper is organized as follows. Section 2 introduces the Sy-$\mathbb{Z}$ family, detailing the construction from a symmetric modified Bernoulli sign and an independent non-negative magnitude. Section 3 develops core distributional results for the family, including closed-form probability mass function (PMF) and cumulative distribution function (CDF) identities, bilateral generating functions, characterization by symmetry and sign–magnitude independence, tail transfer from the magnitude, modality conditions, criteria precluding infinite divisibility, the quantile function, and the median, as well as a discussion of first-order stochastic dominance for $|Z|$. Section 4 specializes to the Sy–Poisson model, deriving the moment generating function (MGF) and probability generating function (PGF), closed-form even moments (via Touchard polynomials), skewness, kurtosis, and Shannon entropy. Section 5 presents inference for Sy–Poisson: method-of-moments estimation (with asymptotic variance via the delta method), likelihood-based estimation, and both observed and expected Fisher information. Section 6 reports a simulation study evaluating the finite-sample bias and mean squared error of the maximum likelihood estimators. Section 7 provides two empirical applications (finance and education) comparing Sy–Poisson with established competitors on $\mathbb{Z}$. A concluding section summarizes implications and outlines directions for future research.

2. The Sy-$\mathbb{Z}$ Family: Construction and Basic Setup

A key building block of the Sy-$\mathbb{Z}$ family is a three-point symmetric distribution on $\{-1, 0, 1\}$ that combines a random sign with a controllable mass at zero. This distribution will serve as the canonical sign mechanism in our sign–magnitude representation of Z. We refer to it as the symmetric modified Bernoulli distribution.

Definition 1.

Let $θ \in (0, 1/2)$. A discrete random variable X is said to follow the symmetric modified Bernoulli distribution with parameter θ, denoted $\mathrm{SMB}(θ)$, if its PMF is

$P(X = k) = (1 - 2θ)^{1 - |k|} \, θ^{|k|}, \quad k \in \{-1, 0, 1\}.$

(1)

Proposition 1.

Let $θ \in (0, 1/2)$, and let $V_{1} \sim \mathrm{Bernoulli}(2θ)$ and $V_{2} \sim \mathrm{Bernoulli}(1/2)$ be independent random variables. Define

$X = V_{1} (2 V_{2} - 1).$

Then $X \sim \mathrm{SMB}(θ)$, with support $\{-1, 0, 1\}$.

Proof.

Write $W = 2 V_{2} - 1$, so $P(W = 1) = P(W = -1) = 1/2$ and $W \perp V_{1}$, where $\perp$ denotes independence between random variables. Then, we obtain

$P(X = 0) = P(V_{1} = 0) = 1 - 2θ, \quad P(X = \pm 1) = P(V_{1} = 1) \, P(W = \pm 1) = (2θ) \cdot \frac{1}{2} = θ,$

which matches the symmetric modified Bernoulli distribution with parameter θ. □

To facilitate simulation and highlight the structural symmetry of $\mathrm{SMB}(θ)$, we present in Appendix A two equivalent stochastic constructions for generating $X \sim \mathrm{SMB}(θ)$. These representations are convenient for both algorithmic sampling and concise derivations of basic properties.
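The construction in Proposition 1 translates directly into a sampler. The following is a minimal sketch using only the Python standard library; the function name `sample_smb`, the seed, and the sample size are illustrative choices and do not come from the paper or its Appendix A.

```python
import random

def sample_smb(theta, rng):
    """Draw one value from SMB(theta) via X = V1 * (2 * V2 - 1)."""
    if not 0.0 < theta < 0.5:
        raise ValueError("theta must lie in (0, 1/2)")
    v1 = 1 if rng.random() < 2.0 * theta else 0  # V1 ~ Bernoulli(2 * theta)
    v2 = 1 if rng.random() < 0.5 else 0          # V2 ~ Bernoulli(1/2)
    return v1 * (2 * v2 - 1)

rng = random.Random(12345)  # arbitrary fixed seed, for reproducibility only
theta = 0.3
draws = [sample_smb(theta, rng) for _ in range(200_000)]

# Empirical frequencies; they should be close to theta, 1 - 2*theta, theta.
freq = {k: draws.count(k) / len(draws) for k in (-1, 0, 1)}
print(freq)
```

With a large sample, the three frequencies should approach the masses in Equation (1), which gives a quick sanity check on the representation.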

2.1. Stochastic Representation

Definition 2.

Let $X \sim \mathrm{SMB}(θ)$ be as defined in Definition 1, and let Y be a discrete random variable with support $\mathbb{N}_{0}$. Assume X and Y are independent. We say that a discrete random variable Z belongs to the Sy-$\mathbb{Z}$ family if it admits the stochastic representation

$Z \overset{d}{=} X Y.$

(2)

Proposition 2.

Let

X \sim SMB (θ)

with

θ \in (0, 1 / 2)

, let Y be an independent

N_{0}

-valued random variable, and set

Z = X Y

. Then:

(i): Moments of X. For every odd integer r, $E (X^{r}) = 0$ . For every even integer $q \geq 2$ , $E (X^{q}) = 2 θ$ . In particular, $E (X) = 0$ , $Var (X) = 2 θ$ , and $\sqrt{Var (X)} = \sqrt{2 θ}$ .
(ii): Moments of $Z = X Y$ . If $E (Y^{2 m}) < \infty$ , then for all $m \geq 0$ , $E (Z^{2 m + 1}) = 0$ , and for all $m \geq 1$ ,

$E (Z^{2 m}) = E (X^{2 m}) E (Y^{2 m}) = 2 θ E (Y^{2 m}) .$

In particular, $E (Z) = 0$ and $Var (Z) = E (Z^{2}) = 2 θ E (Y^{2})$ .

Proof.

For

X \sim SMB (θ)

,

P (X = 0) = 1 - 2 θ

, and

P (X = \pm 1) = θ

. Hence

E (X^{r}) = θ (1^{r} + {(- 1)}^{r})

for

r \geq 1

, which is 0 for odd r and

2 θ

for even r; the variance follows since

E (X) = 0

and

E (X^{2}) = 2 θ

. For

Z = X Y

with

X ⊥ Y

, we have

E (Z^{k}) = E (X^{k}) E (Y^{k})

whenever the moments exist; the claims for odd and even powers follow by substituting the moments of X obtained above. □

2.2. Characterization by Symmetry and Independence

We show that the Sy-

Z

family is exactly the class of symmetric integer-valued distributions for which the sign is independent of the magnitude and is represented by the symmetric modified Bernoulli distribution.

Theorem 1.

Let Z be an integer-valued random variable with

P (Z = 0) < 1

. The following statements are equivalent:

(i): Z belongs to the Sy-$\mathbb{Z}$ family; that is, there exist $X \sim \mathrm{SMB}(θ)$ with $θ \in (0, 1/2)$ and an independent $\mathbb{N}_{0}$-valued random variable Y such that $Z \overset{d}{=} X Y$ .
(ii): Z is symmetric about zero, i.e., $P (Z = k) = P (Z = - k)$ for all $k \in \mathbb{Z}$ , and there exist independent random variables S and W such that $S \sim \mathrm{SMB}(θ)$ for some $θ \in (0, 1/2)$ , W is $\mathbb{N}_{0}$-valued, and $Z \overset{d}{=} S W$ .

Proof.

(i) ⇒ (ii). Suppose

Z \overset{d}{=} X Y

with

X \sim SMB (θ)

,

θ \in (0, 1 / 2)

, and

Y \in N_{0}

independent of X. Then Z is symmetric because

P (Z = k) = P (X Y = k) = P (X = 1, Y = k) = P (X = - 1, Y = k) = P (Z = - k), k > 0 .

Set

S : = X

and

W : = Y

. By construction,

S \sim SMB (θ)

,

W \in N_{0}

,

S ⊥ W

, and

Z \overset{d}{=} S W

. Thus (ii) holds. (ii) ⇒ (i). Conversely, suppose (ii) holds. Take

X : = S

and

Y : = W

. Then

X \sim SMB (θ)

,

Y \in N_{0}

, and

X ⊥ Y

, with

Z \overset{d}{=} X Y

. Hence Z belongs to the Sy-

Z

family, so (i) holds. □

Remark 1.

Theorem 1 establishes that Definition 2 is not merely a constructive scheme but a complete characterization of the class: symmetry, together with sign–magnitude independence and a three-point symmetric sign distribution, is equivalent to membership in Sy-

Z

. This validates the use of

SMB (θ)

(Definition 1) as the canonical mechanism for the sign component.

Further analytical properties of the Sy-

Z

distribution, including its moment generating function and characteristic function, are provided in Section 3.

3. Main Properties of the Sy-$\mathbb{Z}$ Family Distributions

Proposition 3.

Let $X \sim \mathrm{SMB}(θ)$ with $θ \in (0, 1/2)$ and support $\{-1, 0, 1\}$, and let $Y \in \mathbb{N}_{0}$ be independent of X. Define $Z = X Y$. Then Z belongs to the Sy-$\mathbb{Z}$ family, and its PMF is

$P(Z = k) = \begin{cases} θ \, P(Y = |k|), & \text{if } k \in \mathbb{Z} \setminus \{0\}, \\ (1 - 2θ) + 2θ \, P(Y = 0), & \text{if } k = 0. \end{cases}$

(3)

Proof.

Since X and Y are independent and $Y \geq 0$, consider three cases.

For $k > 0$:

$P(Z = k) = P(X Y = k) = P(X = 1, Y = k) = P(X = 1) \, P(Y = k) = θ \, P(Y = k).$

For $k < 0$, writing $k = -m$ with $m > 0$:

$P(Z = k) = P(X Y = -m) = P(X = -1, Y = m) = P(X = -1) \, P(Y = m) = θ \, P(Y = |k|).$

For $k = 0$: the event $\{Z = 0\}$ occurs if $X = 0$ (regardless of Y) or if $Y = 0$ with $X \in \{\pm 1\}$. By independence,

$\begin{aligned} P(Z = 0) &= P(X = 0) + P(Y = 0, X \neq 0) \\ &= (1 - 2θ) + P(X \neq 0) \, P(Y = 0) \\ &= (1 - 2θ) + 2θ \, P(Y = 0). \end{aligned}$

Combining the three cases yields the result. Normalization follows since

$\begin{aligned} \sum_{k \in \mathbb{Z}} P(Z = k) &= P(Z = 0) + \sum_{k \geq 1} \big( P(Z = k) + P(Z = -k) \big) \\ &= (1 - 2θ) + 2θ \, P(Y = 0) + 2θ \sum_{k \geq 1} P(Y = k) \\ &= (1 - 2θ) + 2θ \underbrace{\sum_{k \geq 0} P(Y = k)}_{= 1} = 1. \end{aligned}$

□
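As a numerical illustration of Equation (3), the sketch below evaluates the Sy-$\mathbb{Z}$ PMF for an arbitrary magnitude PMF and checks symmetry and normalization. The names `sy_pmf` and `poisson_pmf`, the Poisson choice for Y, the parameter values, and the truncation at $|k| \leq 60$ are all assumptions made for the example.

```python
import math

def poisson_pmf(k, lam):
    """Poisson PMF, used here only as an example magnitude distribution."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def sy_pmf(k, theta, pmf_y):
    """P(Z = k) from Eq. (3) for a magnitude PMF pmf_y on {0, 1, 2, ...}."""
    if k == 0:
        return (1.0 - 2.0 * theta) + 2.0 * theta * pmf_y(0)
    return theta * pmf_y(abs(k))

theta, lam = 0.35, 2.0
pmf_y = lambda j: poisson_pmf(j, lam)

# Total mass over a range wide enough that the Poisson(2) tail is negligible.
total = sum(sy_pmf(k, theta, pmf_y) for k in range(-60, 61))
print(total, sy_pmf(3, theta, pmf_y), sy_pmf(-3, theta, pmf_y))
```

Passing the magnitude PMF as a function keeps the sign mechanism separate from the baseline, mirroring the sign–magnitude factorization.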

Corollary 1.

Let Z belong to the Sy-$\mathbb{Z}$ family as in Proposition 3, and define $W = |Z|$. Then W takes values in $\mathbb{N}_{0}$, and its PMF is

$P(W = k) = \begin{cases} P(Z = k) + P(Z = -k) = 2θ \, P(Y = k), & \text{if } k \geq 1, \\ P(Z = 0) = (1 - 2θ) + 2θ \, P(Y = 0), & \text{if } k = 0. \end{cases}$

Moreover, provided $E(Y^{2}) < \infty$, we have

$E(|Z|) = 2θ \, E(Y) \quad \text{and} \quad \mathrm{Var}(|Z|) = 2θ \, \mathrm{Var}(Y) + 2θ (1 - 2θ) \, (E(Y))^{2}.$

(4)

3.1. Identifiability Within Sy-$\mathbb{Z}$

Let

Z = X Y

, where

X \sim SMB (θ)

with

θ \in (0, 1 / 2)

and

Y \in N_{0}

is independent of X. Denote

p_{Z} (k) = P (Z = k)

and

p_{Y} (k) = P (Y = k)

for

k \in N_{0}

. The PMF of Z is

p_{Z} (0) = (1 - 2 θ) + 2 θ p_{Y} (0), p_{Z} (k) = θ p_{Y} (| k |) for k \neq 0 .

(5)

Proposition 4.

From the marginal distribution $p_{Z}(\cdot)$ alone, the pair $(θ, p_{Y}(\cdot))$ is not identifiable in the model class

$\mathcal{M} = \{ (θ, p_{Y}(\cdot)) : θ \in (0, 1/2), \ p_{Y}(\cdot) \text{ a PMF on } \mathbb{N}_{0} \}.$

More precisely, for any fixed $p_{Z}(\cdot)$, there exist infinitely many pairs $(θ, p_{Y}(\cdot)) \in \mathcal{M}$ satisfying (5).

Proof.

Fix $p_{Z}(\cdot)$. From (5) and the constraint $0 < θ < 1/2$, we have for $k \geq 1$

$θ \, p_{Y}(k) = p_{Z}(k).$

Thus, whenever θ is chosen, the off-zero masses of Y must satisfy $p_{Y}(k) = p_{Z}(k) / θ$ for $k \geq 1$. By the symmetry of $p_{Z}$, $\sum_{k \geq 1} p_{Z}(k) = (1 - p_{Z}(0)) / 2$, so normalization gives

$1 = \sum_{k \geq 0} p_{Y}(k) = p_{Y}(0) + \sum_{k \geq 1} \frac{p_{Z}(k)}{θ} = p_{Y}(0) + \frac{1 - p_{Z}(0)}{2θ},$

which yields

$p_{Y}(0) = 1 - \frac{1 - p_{Z}(0)}{2θ}.$

(6)

On the other hand, substituting (6) into the zero-mass identity in (5) gives

$(1 - 2θ) + 2θ \, p_{Y}(0) = (1 - 2θ) + 2θ \left( 1 - \frac{1 - p_{Z}(0)}{2θ} \right) = p_{Z}(0),$

so (6) is consistent for any θ that makes $p_{Y}(0) \in [0, 1]$, that is, for any $θ \in [(1 - p_{Z}(0))/2, \, 1/2)$; the masses $p_{Y}(k) = p_{Z}(k)/θ \geq 0$ for $k \geq 1$ are then automatically valid. For all such θ, we obtain a valid PMF $p_{Y}(\cdot)$ producing the same $p_{Z}(\cdot)$. Hence, the mapping $(θ, p_{Y}(\cdot)) \mapsto p_{Z}(\cdot)$ is not one-to-one on $\mathcal{M}$, proving non-identifiability. □
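A small numerical instance makes the non-identifiability concrete. The sketch below builds $p_{Z}$ from one pair $(θ, p_{Y})$ and then solves the two identities in (5) for a different θ, obtaining a second valid magnitude PMF with the same $p_{Z}$. The three-point magnitude distribution, the values of θ, and the helper names are hypothetical and chosen only for illustration.

```python
def sy_pmf_from(theta, p_y):
    """Map (theta, p_Y) to p_Z via Eq. (5); p_y is a list indexed by 0..K."""
    k_max = len(p_y) - 1
    p_z = {0: (1 - 2 * theta) + 2 * theta * p_y[0]}
    for k in range(1, k_max + 1):
        p_z[k] = p_z[-k] = theta * p_y[k]
    return p_z

def magnitude_for(theta, p_z):
    """Solve the two identities in Eq. (5) for p_Y, given a candidate theta."""
    k_max = max(p_z)
    p_y = [(p_z[0] - (1 - 2 * theta)) / (2 * theta)]
    p_y += [p_z[k] / theta for k in range(1, k_max + 1)]
    return p_y

# First representation: theta = 0.2 with a three-point magnitude.
p_z = sy_pmf_from(0.2, [0.5, 0.3, 0.2])

# Second representation of the same p_Z with theta = 0.4.
alt_p_y = magnitude_for(0.4, p_z)
p_z_alt = sy_pmf_from(0.4, alt_p_y)
print(alt_p_y)
```

Here $p_{Z}(0) = 0.8$, so any candidate θ in $[0.1, 0.5)$ yields a non-negative $p_{Y}(0)$ and hence another admissible pair.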

Proposition 5.

Suppose Y belongs to a parametric family $\{ p_{Y}(\cdot; λ) : λ \in Λ \}$ with $p_{Y}(0; λ)$ known as a function of λ, and such that the family satisfies the following injectivity condition off zero: for $λ_{1} \neq λ_{2}$, the ratio $p_{Y}(k; λ_{1}) / p_{Y}(k; λ_{2})$ is not constant in k over the indices $k \geq 1$ with nonzero $p_{Y}(\cdot; λ)$. Then the parameter pair $(θ, λ)$ is identifiable from $p_{Z}(\cdot)$.

Proof.

From (5), for any

k \geq 1

,

p_{Z} (k) = θ p_{Y} (k; λ) .

If

(θ_{1}, λ_{1})

and

(θ_{2}, λ_{2})

produce the same

p_{Z}

, then

p_{Z} (k)

agrees for all

k \geq 1

, and hence

θ_{1} p_{Y} (k; λ_{1}) = θ_{2} p_{Y} (k; λ_{2}) for all k \geq 1 .

If

λ_{1} \neq λ_{2}

, the injectivity of

k \mapsto p_{Y} (k; λ)

in

λ

implies that the ratio

p_{Y} (k; λ_{1}) / p_{Y} (k; λ_{2})

cannot be constant in k on any set of indices with positive mass; thus, the equality above cannot hold for all

k \geq 1

unless

λ_{1} = λ_{2}

. Therefore

λ_{1} = λ_{2}

, and then

θ_{1} = θ_{2}

follows from any single

k \geq 1

. Finally,

p_{Z} (0) = (1 - 2 θ) + 2 θ p_{Y} (0; λ)

is automatically satisfied, so

(θ, λ)

is identifiable. □

Corollary 2.

(i): If $Y \sim Poisson (λ)$ , then $(θ, λ)$ is identifiable.
(ii): If $Y \sim Geometric (q)$ , then $(θ, q)$ is identifiable.

Proof.

For Poisson,

p_{Y} (k; λ) = e^{- λ} λ^{k} / k!

and

p_{Y} (0; λ) = e^{- λ}

; distinct

λ

produce non-proportional sequences

{p_{Y} (k; λ)}_{k \geq 1}

, so the injectivity condition holds. For geometric distributions with success probabilities q,

p_{Y} (k; q) = {(1 - q)}^{k} q

, and

p_{Y} (0; q) = q

; distinct q once again yield non-proportional sequences on

k \geq 1

. Proposition 5 applies in both cases. □

Corollary 3.

If θ is known (for example, from external calibration), then $p_{Y}(\cdot)$ is recovered from $p_{Z}(\cdot)$ via

$p_{Y}(0) = \frac{p_{Z}(0) - (1 - 2θ)}{2θ}, \quad p_{Y}(k) = \frac{p_{Z}(k)}{θ} \ \text{ for } k \geq 1,$

(7)

and this defines a valid PMF provided $p_{Z}(\cdot)$ arises from a Sy-$\mathbb{Z}$ model with the given θ.

Proof.

Equations (7) are a direct rearrangement of (5). Nonnegativity and normalization follow from the fact that

p_{Z} (\cdot)

is generated by the Sy-

Z

structure with

θ

. □

Proposition 4 clarifies that, without additional structure on the magnitude Y, the sign parameter θ and the zero mass $p_{Y}(0)$ are confounded through $p_{Z}(0) = (1 - 2θ) + 2θ \, p_{Y}(0)$, while only the products $θ \, p_{Y}(k)$ for $k \geq 1$ are determined by $\{ p_{Z}(k) \}_{k \neq 0}$. Imposing a parametric family for Y restores identifiability (Proposition 5), as illustrated by Sy-Poisson and Sy-Geometric (Corollary 2). When θ is externally known, $p_{Y}$ is uniquely reconstructed from $p_{Z}$ (Corollary 3).

3.2. Tail Behaviour

Proposition 6.

Under the assumptions of Proposition 3, for all

k \geq 0

,

P (| Z | > k) = 2 θ P (Y > k) .

Consequently,

| Z |

is tail-equivalent to Y up to factor

2 θ

: if

P (Y > k) \sim c g (k)

as

k \to \infty

for some reference function g, then

P (| Z | > k) \sim (2 θ c) g (k)

.

Proof.

By (3), for

k \geq 0

,

P (| Z | > k) = \sum_{j > k} [P (Z = j) + P (Z = - j)] = \sum_{j > k} 2 θ P (Y = j) = 2 θ P (Y > k) .

The asymptotic equivalence follows immediately. □

Remark 2.

If Y has exponential (light) tails (for example, Poisson, binomial, geometric), then so does $|Z|$; if Y has regularly varying (power-law) tails, then so does $|Z|$ with the same index. The parameter θ scales tail probabilities but does not change their rate.

3.3. Unimodality and Number of Modes

Proposition 7.

Let

Z = X Y

with

X \sim SMB (θ)

,

θ \in (0, 1 / 2)

, and

Y \in N_{0}

be independent. Define the off-zero mode of Y by

m_{+} \in arg max_{k \geq 1} P (Y = k) (choose the smallest such index) .

Then:

(i): For $k \geq 1$ , $P (Z = k) = P (Z = - k) = θ P (Y = k)$ . Hence, the positive (and negative) sides of Z are proportional copies of the PMF of Y restricted to ${1, 2, \dots}$ , and $m_{+}$ is the (unique smallest) mode on the positive side of Z as well as the mode of $| Z |$ on ${1, 2, \dots}$ .
(ii): The global modes of Z are determined by comparing the mass at zero with the peak on the positive side:

$\{\begin{matrix} single mode at 0 & if P (Z = 0) > θ P (Y = m_{+}), \\ three modes at 0, \pm m_{+} & if P (Z = 0) = θ P (Y = m_{+}), \\ two symmetric modes at \pm m_{+} & if P (Z = 0) < θ P (Y = m_{+}) . \end{matrix}$

In particular, using $P (Z = 0) = (1 - 2 θ) + 2 θ P (Y = 0)$ ,

$(1 - 2 θ) + 2 θ P (Y = 0) \geq θ max_{k \geq 1} P (Y = k) ⟹ Z is unimodal at 0 .$

Proof.

Part (i) is immediate from the Sy-

Z

PMF: for

k \geq 1

,

P (Z = \pm k) = θ P (Y = k)

, so the positive and negative sides inherit the shape of Y on

{1, 2, \dots}

. For (ii), because both sides are scaled copies of

{P (Y = k)}_{k \geq 1}

, the only competitors for the global maximum are

k = 0

and

k = \pm m_{+}

. Comparing $P(Z = 0)$ with $θ \, P(Y = m_{+})$ yields the three cases. □

Corollary 4.

If Y is unimodal with a mode at $m = 0$ (for example, geometric, or binomial with n trials and success probability p satisfying $(n + 1) p \leq 1$), then Z is unimodal at 0 for all $θ \in (0, 1/2)$. More generally, if Y is log-concave on $\mathbb{N}_{0}$, then Z is either unimodal at 0 or bimodal at $\pm m_{+}$, depending on the inequality in Proposition 7 (ii), with the three-mode configuration arising only in the boundary case of equality; no additional modes can appear.
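The comparison in Proposition 7 (ii) is easy to automate. The following sketch classifies the modes of Z for a Poisson magnitude; the function `classify_modes`, the search bound `k_max`, the tolerance used to detect ties, and the parameter values are assumptions made for this illustration.

```python
import math

def poisson_pmf(k, lam):
    """Poisson PMF on the log scale, which avoids overflow for large k."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def classify_modes(theta, lam, k_max=200):
    """Compare P(Z = 0) with theta * P(Y = m_plus), as in Proposition 7 (ii)."""
    p_zero = (1 - 2 * theta) + 2 * theta * poisson_pmf(0, lam)
    side = [theta * poisson_pmf(k, lam) for k in range(1, k_max + 1)]
    peak = max(side)
    m_plus = side.index(peak) + 1  # smallest maximizer on k >= 1
    if math.isclose(p_zero, peak, rel_tol=1e-12):
        return "three modes", m_plus
    if p_zero > peak:
        return "unimodal at 0", m_plus
    return "bimodal", m_plus

print(classify_modes(0.20, 4.5))  # moderate theta: large atom at zero
print(classify_modes(0.49, 4.5))  # theta near 1/2: little mass left at zero
```

For a fixed magnitude, increasing θ toward 1/2 shrinks the atom at zero, so the classification can switch from unimodal to bimodal.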

3.4. Cumulative Distribution Function

Proposition 8.

Let $X \sim \mathrm{SMB}(θ)$ with $θ \in (0, 1/2)$, let $Y \in \mathbb{N}_{0}$ be a random variable with CDF $F_{Y}(\cdot)$, independent of X, and let $Z = X Y$. The CDF of Z is

$F_{Z}(k) = \begin{cases} θ \, (1 - F_{Y}(|k| - 1)), & \text{if } k \leq -1, \\ (1 - θ) + θ \, P(Y = 0), & \text{if } k = 0, \\ (1 - θ) + θ \, F_{Y}(k), & \text{if } k \geq 1. \end{cases}$

Proof.

From the given PMF,

P (Z = m) = θ P (Y = | m |) for m \neq 0, m \in Z

, and

P (Z = 0) = (1 - 2 θ) + 2 θ P (Y = 0)

.

For

k \leq - 1

,

F_{Z} (k) = \sum_{m = - \infty}^{k} P (Z = m) = \sum_{j = | k |}^{\infty} θ P (Y = j) = θ P (Y \geq | k |) = θ (1 - F_{Y} (| k | - 1)) .

For

k = 0

,

\begin{matrix} F_{Z} (0) & = \sum_{m \leq - 1} P (Z = m) + P (Z = 0) \\ = θ P (Y \geq 1) + (1 - 2 θ) + 2 θ P (Y = 0) \\ = (1 - θ) + θ P (Y = 0) . \end{matrix}

For

k \geq 1

,

\begin{matrix} F_{Z} (k) & = \sum_{m \leq - 1} P (Z = m) + P (Z = 0) + \sum_{m = 1}^{k} P (Z = m) \\ = (1 - θ) + θ P (Y = 0) + θ \sum_{m = 1}^{k} P (Y = m) \\ = (1 - θ) + θ F_{Y} (k) . \end{matrix}

This completes the proof. □
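The piecewise CDF can be checked against a brute-force cumulative sum of the PMF. The sketch below does this for a Poisson magnitude; the helper names, the parameter values, and the lower truncation point of the brute-force sum are assumptions for the example.

```python
import math

def poisson_pmf(k, lam):
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def poisson_cdf(k, lam):
    return sum(poisson_pmf(j, lam) for j in range(k + 1))

def sy_pmf(k, theta, lam):
    """PMF of Z = XY with a Poisson magnitude."""
    if k == 0:
        return (1 - 2 * theta) + 2 * theta * poisson_pmf(0, lam)
    return theta * poisson_pmf(abs(k), lam)

def sy_cdf(k, theta, lam):
    """Piecewise CDF of Proposition 8; the k = 0 and k >= 1 cases coincide in form."""
    if k <= -1:
        return theta * (1.0 - poisson_cdf(abs(k) - 1, lam))
    return (1.0 - theta) + theta * poisson_cdf(k, lam)

theta, lam = 0.3, 2.0
# Largest discrepancy between the closed form and a direct sum of the PMF.
max_gap = max(
    abs(sy_cdf(k, theta, lam) - sum(sy_pmf(m, theta, lam) for m in range(-80, k + 1)))
    for k in range(-10, 11)
)
print(max_gap)
```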

Corollary 5.

Under the assumptions of Proposition 8, let $S_{Z}(k) = P(Z > k) = 1 - F_{Z}(k)$ denote the survival function of Z. Then

$S_{Z}(k) = \begin{cases} 1 - θ \, S_{Y}(|k| - 1), & \text{if } k \leq -1, \\ θ \, S_{Y}(k), & \text{if } k \geq 0, \end{cases}$

where $S_{Y}(k) = 1 - F_{Y}(k)$ denotes the survival function of Y.

3.5. First-Order Stochastic Dominance

We say that A dominates B in the first-order sense if

F_{A} (z) \leq F_{B} (z)

holds for all

z \in Z

. For Sy-

Z

distributions

Z = X Y

with

X \sim SMB (θ)

and

Y \in N_{0}

being independent, write

T_{Y} (k) : = P (Y \geq k), k \in N .

From Proposition 8, for any

k \geq 1

,

F_{Z} (- k) = θ T_{Y} (k), F_{Z} (k) = 1 - θ T_{Y} (k + 1) .

(8)

Theorem 2.

Let

Z_{i} = X_{i} Y_{i}

with

X_{i} \sim SMB (θ_{i})

,

Y_{i} \in N_{0}

, and

X_{i} ⊥ Y_{i}

(

i = 1, 2

). If

F_{Z_{1}} (z) \leq F_{Z_{2}} (z)

for all

z \in Z

, then

F_{Z_{1}} (z) = F_{Z_{2}} (z)

for all

z \in Z

. In particular, strict first-order dominance cannot occur between two distinct members of the Sy-

Z

family.

Proof.

Assume

F_{Z_{1}} (\cdot) \leq F_{Z_{2}} (\cdot)

pointwise. For every

k \geq 1

,

F_{Z_{1}} (- k) \leq F_{Z_{2}} (- k) \Rightarrow θ_{1} T_{Y_{1}} (k) \leq θ_{2} T_{Y_{2}} (k),

and

F_{Z_{1}} (k - 1) \leq F_{Z_{2}} (k - 1) \Rightarrow 1 - θ_{1} T_{Y_{1}} (k) \leq 1 - θ_{2} T_{Y_{2}} (k) \Rightarrow θ_{1} T_{Y_{1}} (k) \geq θ_{2} T_{Y_{2}} (k) .

Hence

θ_{1} T_{Y_{1}} (k) = θ_{2} T_{Y_{2}} (k)

for all

k \geq 1

. Using (8) this yields

F_{Z_{1}} (\cdot) = F_{Z_{2}} (\cdot)

on

Z

. □

Remark 3.

Classical first-order dominance is too restrictive on symmetric two-sided supports. A more informative comparison works on magnitudes. Since $P(|Z| > k) = 2θ \, P(Y > k)$, the CDF of $|Z|$ is $1 - 2θ \, P(Y > k)$, and with the convention $F_{A} \leq F_{B}$ used above,

$|Z_{1}| \text{ dominates } |Z_{2}| \text{ in the first-order sense} \iff θ_{1} P(Y_{1} > k) \geq θ_{2} P(Y_{2} > k) \ \text{ for all } k \geq 0.$

This provides a practical dominance notion for dispersion comparisons and for tail–probability assessments based on $|Z|$.

Remark 4.

Other stochastic orders can still distinguish members of the family. In particular, the convex order and increasing–convex order separate distributions with the same mean (zero by symmetry) but different tail weights. For Sy-

Z

, many such comparisons reduce to corresponding orders on

| Z |

(or on Y) via the product representation; a systematic treatment is left for future work.

3.6. Generating Function

Proposition 9.

Under the assumptions of Proposition 8, let $M_{Y}(t) = E(e^{t Y})$ denote the MGF of Y, and $G_{Y}(s) = E(s^{Y})$ its PGF. Then, for every t at which $M_{Y}(t)$ and $M_{Y}(-t)$ are both finite,

$M_{Z}(t) = E(e^{t Z}) = (1 - 2θ) + θ \, (M_{Y}(t) + M_{Y}(-t)) = (1 - 2θ) + θ \, (G_{Y}(e^{t}) + G_{Y}(e^{-t})).$

(9)

Proof.

By independence and conditioning on X,

\begin{matrix} M_{Z} (t) & = & E (E (e^{t X Y} ∣ X)) = E (M_{Y} (t X)) \\ = & P (X = 0) M_{Y} (0) + P (X = 1) M_{Y} (t) + P (X = - 1) M_{Y} (- t) . \end{matrix}

Since

P (X = 0) = 1 - 2 θ

and

P (X = \pm 1) = θ

, we obtain

M_{Z} (t) = (1 - 2 θ) + θ M_{Y} (t) + θ M_{Y} (- t),

which proves the first identity. The second follows from

M_{Y} (t) = G_{Y} (e^{t})

for integer

Y \geq 0

. The stated domain requires simultaneous finiteness of

M_{Y} (t)

and

M_{Y} (- t)

. □

Corollary 6.

If

M_{Y}^{″} (0) < \infty

, then

E (Z) = 0

and

Var (Z) = 2 θ E (Y^{2})

.

Proof.

Differentiating at

t = 0

:

M_{Z}^{'} (0) = θ M_{Y}^{'} (0) - θ M_{Y}^{'} (- 0) = 0

, and

M_{Z}^{″} (0) = θ M_{Y}^{″} (0) + θ M_{Y}^{″} (0) = 2 θ E (Y^{2})

. □
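For a concrete check of Equation (9) and Corollary 6, the sketch below takes Y to be Poisson, whose MGF is $\exp(λ(e^{t} - 1))$, and compares the closed form with a direct sum over the PMF. It also approximates $M_{Z}''(0)$ by a central finite difference. The function names, parameter values, step size, and truncation bound are illustrative assumptions.

```python
import math

def sy_poisson_mgf(t, theta, lam):
    """Closed form from Eq. (9) with the Poisson MGF exp(lam * (e^t - 1))."""
    m_plus = math.exp(lam * (math.exp(t) - 1.0))    # M_Y(t)
    m_minus = math.exp(lam * (math.exp(-t) - 1.0))  # M_Y(-t)
    return (1 - 2 * theta) + theta * (m_plus + m_minus)

def sy_poisson_mgf_by_sum(t, theta, lam, k_max=80):
    """Direct evaluation of E[exp(tZ)] from the PMF, truncated at |k| <= k_max."""
    total = (1 - 2 * theta) + 2 * theta * math.exp(-lam)
    for k in range(1, k_max + 1):
        mass = theta * math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
        total += mass * (math.exp(t * k) + math.exp(-t * k))
    return total

theta, lam, h = 0.3, 2.0, 1e-3
gap = abs(sy_poisson_mgf(0.3, theta, lam) - sy_poisson_mgf_by_sum(0.3, theta, lam))

# Central finite difference for M_Z''(0); compare with 2*theta*E(Y^2).
second_derivative = (
    sy_poisson_mgf(h, theta, lam) - 2 * sy_poisson_mgf(0.0, theta, lam)
    + sy_poisson_mgf(-h, theta, lam)
) / h**2
variance_formula = 2 * theta * (lam + lam**2)  # E(Y^2) = lam + lam^2 for Poisson
print(gap, second_derivative, variance_formula)
```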

3.7. Quantile Function and Median

Let

p_{k} = P (Y = k)

,

p_{0} = P (Y = 0)

,

F_{Y} (k) = P (Y \leq k)

, and

T_{Y} (k) = P (Y \geq k)

be such that

Y \in N_{0}

. From Proposition 8, for integers

k \geq 1

,

F_{Z} (- k) = θ T_{Y} (k), F_{Z} (0) = 1 - θ + θ p_{0}, F_{Z} (k) = 1 - θ + θ F_{Y} (k) .

Equivalently, for

k \geq 0

,

P (| Z | > k) = 2 θ P (Y > k), P (| Z | \leq k) = 1 - 2 θ P (Y > k) .

Proposition 10.

Let $Q_{Z}(u) := \inf \{ z \in \mathbb{Z} : F_{Z}(z) \geq u \}$ denote the (left-continuous) quantile function. Set

$a_{-} := F_{Z}(-1) = θ (1 - p_{0}), \quad a_{0} := F_{Z}(0) = 1 - θ + θ p_{0}.$

Then, for $u \in (0, 1)$,

$Q_{Z}(u) = \begin{cases} -k^{-}(u), & 0 < u \leq a_{-}, \\ 0, & a_{-} < u \leq a_{0}, \\ k^{+}(u), & a_{0} < u < 1, \end{cases}$

where $k^{-}(u), k^{+}(u) \in \{1, 2, \dots\}$ are obtained from Y via

$k^{-}(u) = \max \{ k \geq 1 : θ \, T_{Y}(k) \geq u \}, \quad k^{+}(u) = \min \{ k \geq 1 : 1 - θ + θ \, F_{Y}(k) \geq u \}.$

Proof.

The result follows by direct inversion of the three pieces above, using that $T_{Y}$ is non-increasing and $F_{Y}(\cdot)$ is non-decreasing on $\mathbb{N}_{0}$. On the negative side, $F_{Z}(-k) = θ \, T_{Y}(k)$ is non-increasing in k, so the smallest z with $F_{Z}(z) \geq u$ corresponds to the largest k with $θ \, T_{Y}(k) \geq u$; this maximum exists because $T_{Y}(k) \to 0$ as $k \to \infty$. □
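The quantile function can also be computed directly from its definition $Q_{Z}(u) = \inf\{z : F_{Z}(z) \geq u\}$ by stepping through the integers. The sketch below does so for a Poisson magnitude; the search strategy, the safety bound `z_max`, the helper names, and the parameter values are assumptions for this example rather than a procedure taken from the paper.

```python
import math

def poisson_cdf(k, lam):
    return sum(math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))
               for j in range(k + 1))

def sy_cdf(z, theta, lam):
    """CDF of Z = XY with a Poisson magnitude (Proposition 8)."""
    if z <= -1:
        return theta * (1.0 - poisson_cdf(abs(z) - 1, lam))
    return (1.0 - theta) + theta * poisson_cdf(z, lam)

def sy_quantile(u, theta, lam, z_max=1000):
    """Smallest integer z with F_Z(z) >= u, searched outward from zero."""
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie in (0, 1)")
    if sy_cdf(-1, theta, lam) >= u:
        z = -1
        # Move left while the next point down still satisfies F_Z >= u.
        while z > -z_max and sy_cdf(z - 1, theta, lam) >= u:
            z -= 1
        return z
    z = 0
    # Move right until the CDF first reaches u (no steps if F_Z(0) >= u).
    while z < z_max and sy_cdf(z, theta, lam) < u:
        z += 1
    return z

theta, lam = 0.3, 2.0
quantiles = {u: sy_quantile(u, theta, lam) for u in (0.1, 0.5, 0.9)}
print(quantiles)
```

The bound `z_max` only guards against floating-point stagnation for u extremely close to 0 or 1; for moderate u the search stops after a few steps.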

3.8. Median

Corollary 7.

Every Sy-

Z

distribution has 0 as a median. Moreover, if

θ \in (0, 1 / 2)

, then 0 is the unique median.

Proof.

Since

F_{Z} (- 1) = θ (1 - p_{0}) \leq θ < 1 / 2

and

F_{Z} (0) = 1 - θ + θ p_{0} > 1 / 2

, we have

F_{Z} (- 1) < 1 / 2 \leq F_{Z} (0)

, so 0 is the unique integer m with

F_{Z} (m - 1) \leq 1 / 2 \leq F_{Z} (m)

. □

3.9. Distribution of Sums and Differences

We now characterize the distribution of sums and differences of independent Sy-

Z

variables, showing that they share the same law and admit explicit convolution formulas for their PMFs.

Proposition 11.

For $i = 1, 2$, let $Z_{i} = X_{i} Y_{i}$ with $X_{i} \sim \mathrm{SMB}(θ_{i})$, $θ_{i} \in (0, 1/2)$, and $Y_{i} \in \mathbb{N}_{0}$ independent of $X_{i}$, and let $Z_{1}$ and $Z_{2}$ be independent. Write $p_{i}(k) = P(Y_{i} = k)$ and $G_{Y_{i}}(s) = \sum_{k \geq 0} p_{i}(k) s^{k}$ for the one-sided PGF of $Y_{i}$, and recall that, for $k \geq 1$,

$P(Z_{i} = \pm k) = θ_{i} \, p_{i}(k), \quad P(Z_{i} = 0) = (1 - 2θ_{i}) + 2θ_{i} \, p_{i}(0).$

Then the following hold:

(i): The bilateral PGF of $Z_{i}$ is

$G_{Z_{i}} (s) = (1 - 2 θ_{i}) + θ_{i} (G_{Y_{i}} (s) + G_{Y_{i}} (1 / s)), s > 0,$

so $G_{Z_{i}} (s) = G_{Z_{i}} (1 / s)$ for all $s > 0$ . Consequently, for the sum $S : = Z_{1} + Z_{2}$ and the difference $D : = Z_{1} - Z_{2}$ ,

$G_{S} (s) = G_{Z_{1}} (s) G_{Z_{2}} (s), G_{D} (s) = G_{Z_{1}} (s) G_{Z_{2}} (1 / s) = G_{Z_{1}} (s) G_{Z_{2}} (s) = G_{S} (s),$

and therefore $D \overset{d}{=} S$ .
(ii): Let $S : = Z_{1} + Z_{2}$ . For $k > 0$ ,

$\begin{aligned} P(S = k) ={}& \underbrace{P(Z_{1} = 0) \, θ_{2} p_{2}(k)}_{Z_{1} = 0, \ Z_{2} = k} + \underbrace{P(Z_{2} = 0) \, θ_{1} p_{1}(k)}_{Z_{2} = 0, \ Z_{1} = k} + \underbrace{θ_{1} θ_{2} \sum_{j = 1}^{k - 1} p_{1}(j) \, p_{2}(k - j)}_{Z_{1} > 0, \ Z_{2} > 0} \\ &+ \underbrace{θ_{1} θ_{2} \sum_{u = 1}^{\infty} p_{1}(u) \, p_{2}(k + u)}_{Z_{1} = -u, \ Z_{2} = k + u} + \underbrace{θ_{1} θ_{2} \sum_{u = 1}^{\infty} p_{2}(u) \, p_{1}(k + u)}_{Z_{2} = -u, \ Z_{1} = k + u}. \end{aligned}$

(10)

For $k = 0$ ,

$P (S = 0) = P (Z_{1} = 0) P (Z_{2} = 0) + 2 θ_{1} θ_{2} \sum_{u = 1}^{\infty} p_{1} (u) p_{2} (u),$

(11)

and by symmetry $P (S = - k) = P (S = k)$ for all $k \geq 1$ . By part (i), the same formulas hold for D.

Proof.

For

Z = X Y

with

X \sim SMB (θ)

and

Y \in N_{0}

independent of X, we have

s^{Z} = \{\begin{matrix} 1, & X = 0, \\ s^{Y}, & X = 1, \\ s^{- Y}, & X = - 1, \end{matrix}

thus, using independence and

P (X = 0) = 1 - 2 θ

,

P (X = \pm 1) = θ

,

G_{Z} (s) = (1 - 2 θ) + θ G_{Y} (s) + θ G_{Y} (1 / s),

which yields

G_{Z} (s) = G_{Z} (1 / s)

for all

s > 0

. For independent

Z_{1}, Z_{2}

,

G_{S} (s) = G_{Z_{1}} (s) G_{Z_{2}} (s), G_{D} (s) = G_{Z_{1}} (s) G_{Z_{2}} (1 / s) .

Since

G_{Z_{2}} (1 / s) = G_{Z_{2}} (s)

, we obtain

G_{D} (s) = G_{S} (s)

, and hence

D \overset{d}{=} S

.

On the other hand, let $S = Z_{1} + Z_{2}$ and $k > 0$. By independence,

$P (S = k) = \sum_{z \in \mathbb{Z}} P (Z_{1} = z) P (Z_{2} = k - z) .$

Splitting the sum into the disjoint cases $z = 0$, $z = k$, $z = j \in \{1, \dots, k - 1\}$, $z = - u$, and $z = k + u$ with $u \geq 1$, and using $P (Z_{i} = 0)$ and $P (Z_{i} = \pm m) = θ_{i} p_{i} (m)$ for $m \geq 1$ yields

$\begin{aligned} P (S = k) = {} & P (Z_{1} = 0) θ_{2} p_{2} (k) + P (Z_{2} = 0) θ_{1} p_{1} (k) + θ_{1} θ_{2} \sum_{j = 1}^{k - 1} p_{1} (j) p_{2} (k - j) \\ & + θ_{1} θ_{2} \sum_{u \geq 1} p_{1} (u) p_{2} (k + u) + θ_{1} θ_{2} \sum_{u \geq 1} p_{2} (u) p_{1} (k + u), \end{aligned}$

which is (10).

For $k = 0$,

$P (S = 0) = P (Z_{1} = 0, Z_{2} = 0) + 2 \sum_{u \geq 1} P (Z_{1} = u, Z_{2} = - u),$

and substituting the same probabilities gives (11). The symmetry of each $Z_{i}$ implies $P (S = - k) = P (S = k)$ for $k \geq 1$, and by part (i) the same formulas hold for D. □

Remark 5.

Equations (10) and (11) decompose the mass of the sum S into three types of contributions: first, cases where one addend is zero; second, positive–positive convolution over $\{1, \dots, k - 1\}$; and third, positive–negative cross terms with unbalanced magnitudes (tails). This partition is useful both for theoretical bounds (for example, tail comparisons driven by the $Y_{i}$) and for stable numerical evaluation. If some $Y_{i}$ has finite support, the infinite sums truncate automatically. By Proposition 11, the same decomposition applies to the difference D.
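The decomposition in Equations (10) and (11) can be checked numerically against a brute-force convolution. The following is a minimal sketch under stated assumptions: both magnitudes are taken to be Poisson purely for illustration, the infinite sums are truncated at a hypothetical `cutoff`, and all function names are illustrative rather than part of any library.

```python
import math

def poisson_pmf(lam):
    """Return m -> P(Y = m) for Y ~ Poisson(lam), computed on the log scale."""
    return lambda m: math.exp(-lam + m * math.log(lam) - math.lgamma(m + 1))

def sy_pmf(k, theta, p):
    """P(Z = k) for Z = X*Y with sign probability theta and magnitude PMF p."""
    if k == 0:
        return (1.0 - 2.0 * theta) + 2.0 * theta * p(0)
    return theta * p(abs(k))

def sum_pmf_direct(k, th1, p1, th2, p2, cutoff=60):
    """P(Z1 + Z2 = k) by brute-force convolution over |z| <= cutoff."""
    return sum(sy_pmf(z, th1, p1) * sy_pmf(k - z, th2, p2)
               for z in range(-cutoff, cutoff + 1))

def sum_pmf_decomposed(k, th1, p1, th2, p2, cutoff=60):
    """P(Z1 + Z2 = k) from the decomposition in Equations (10) and (11)."""
    k = abs(k)  # symmetry: P(S = -k) = P(S = k)
    z1_zero = sy_pmf(0, th1, p1)
    z2_zero = sy_pmf(0, th2, p2)
    us = range(1, cutoff + 1)
    if k == 0:
        return z1_zero * z2_zero + 2.0 * th1 * th2 * sum(p1(u) * p2(u) for u in us)
    one_zero = z1_zero * th2 * p2(k) + z2_zero * th1 * p1(k)           # one addend is zero
    same_sign = th1 * th2 * sum(p1(j) * p2(k - j) for j in range(1, k))  # both positive
    cross = th1 * th2 * sum(p1(u) * p2(k + u) + p2(u) * p1(k + u) for u in us)
    return one_zero + same_sign + cross
```

Because each term of the decomposition is a sum of non-negative quantities, the truncation error is bounded by the neglected tail mass of the magnitude distributions.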

4. Special Case: The Sy-Poisson Distribution

In this section, we particularize the Sy-$Z$ family by taking the mixing variable Y to be Poisson. Recall the set-up in Definition 2 (and Proposition 8): $X \sim SMB (θ)$ with $θ \in (0, 1 / 2)$, $Y ⊥ X$ takes values in $\mathbb{N}_{0}$, and $Z = X Y$.

Definition 3.

Let $Y \sim P (λ)$ with $λ > 0$ and $X \sim SMB (θ)$, $θ \in (0, 1 / 2)$, be independent. The random variable $Z = X Y$ is said to follow a Sy-Poisson distribution, denoted $Z \sim \text{Sy-}P (θ, λ)$. Its PMF, inherited from the Sy-$Z$ construction, is

$P (Z = k) = \begin{cases} θ e^{- λ} \frac{λ^{| k |}}{| k |!}, & k \in \mathbb{Z} ∖ \{0\}, \\ (1 - 2 θ) + 2 θ e^{- λ}, & k = 0 . \end{cases}$

(12)

Consequently, the CDF of Z is obtained from Proposition 8 by replacing $F_{Y} (\cdot)$ with the Poisson CDF.

It is useful to contrast the proposed Sy-Poisson specification with classical symmetric models on $\mathbb{Z}$. The symmetric Skellam distribution, obtained as the difference of two independent Poisson variables with a common mean, couples the zero mass and tail decay through a single intensity parameter. Similarly, symmetric negative binomial variants provide heavier tails but still link dispersion and central mass through a common shape parameter. Zero-inflated symmetric count models allow for additional mass at zero but do not preserve exact symmetry unless extra constraints are imposed. In contrast, the Sy-Poisson model separates the sign and magnitude mechanisms, offering exact bilateral symmetry, independent control of the zero mass via $θ$, and inherited Poisson-type tail behavior through $λ$. This decomposition makes the model both flexible and analytically tractable, and it ensures identifiability under mild parametric assumptions, providing advantages over the alternatives above.

Proposition 12.

Let $Z = X Y$ with $X \sim SMB (θ)$, $θ \in (0, 1 / 2)$, and $Y \sim P (λ)$, $λ > 0$, independent. Then the CDF of $Z \sim \text{Sy-}P (θ, λ)$ is

$F_{Z} (k) = P (Z \leq k) = \begin{cases} θ (1 - H (| k |, λ)), & k \leq - 1, \\ (1 - θ) + θ e^{- λ}, & k = 0, \\ (1 - θ) + θ H (k + 1, λ), & k \geq 1, \end{cases}$

(13)

where $H (m, λ) = P (Y \leq m - 1)$ for $m \geq 1$, $H (a, x) = Γ (a, x) / Γ (a)$, and $Γ (a, z)$ denotes the upper incomplete gamma function defined by $\int_{z}^{\infty} t^{a - 1} e^{- t} d t$.

Proof.

Apply Proposition 8 with $F_{Y} (\cdot)$ equal to the $P (λ)$ CDF: $F_{Y} (k) = e^{- λ} \sum_{j = 0}^{k} λ^{j} / j! = H (k + 1, λ)$ for $k \geq 0$, and $1 - F_{Y} (| k | - 1) = e^{- λ} \sum_{j = | k |}^{\infty} λ^{j} / j! = 1 - H (| k |, λ)$ for $k \leq - 1$. □
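Equations (12) and (13) translate directly into code. The sketch below is illustrative only: the function names are hypothetical, and the Poisson CDF is evaluated by direct summation (equivalent to $H (k + 1, λ)$) rather than through an incomplete gamma routine.

```python
import math

def poisson_cdf(m, lam):
    """P(Y <= m) for Y ~ Poisson(lam); equals H(m + 1, lam) and is 0 for m < 0."""
    if m < 0:
        return 0.0
    term = total = math.exp(-lam)
    for j in range(1, m + 1):
        term *= lam / j
        total += term
    return total

def sy_poisson_pmf(k, theta, lam):
    """PMF of Sy-P(theta, lam) at integer k, following Equation (12)."""
    if k == 0:
        return (1.0 - 2.0 * theta) + 2.0 * theta * math.exp(-lam)
    m = abs(k)
    return theta * math.exp(-lam + m * math.log(lam) - math.lgamma(m + 1))

def sy_poisson_cdf(k, theta, lam):
    """CDF of Sy-P(theta, lam) at integer k, following Equation (13)."""
    if k <= -1:
        return theta * (1.0 - poisson_cdf(abs(k) - 1, lam))
    # For k = 0 this reduces to (1 - theta) + theta * exp(-lam).
    return (1.0 - theta) + theta * poisson_cdf(k, lam)
```

A quick consistency check is that `sy_poisson_cdf(-1, ...) + sy_poisson_cdf(0, ...)` equals 1, which is the symmetry statement $P (Z \leq - 1) = P (Z \geq 1)$.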

Remark 6.

The distribution $Z \sim \text{Sy-}P (θ, λ)$ is symmetric about 0 and zero-inflated, with $P (Z = 0) = (1 - 2 θ) + 2 θ e^{- λ}$. For small λ, most of the mass is concentrated at 0, so the PMF is sharply unimodal at the origin. As λ increases, the Poisson component spreads out, and two symmetric shoulders emerge on the positive and negative sides; for moderate and large λ, the central mode at 0 is flanked by two lighter side peaks, yielding an overall three-bump shape. In the limit $θ \to 1 / 2$, the point mass at 0 tends to $e^{- λ}$, and the side peaks become more pronounced, whereas when $θ \to 0$, the concentration at 0 dominates for any fixed λ. These behaviors are illustrated by the PMF in Figure 1 and Figure 2, and the corresponding CDF in Figure 3 and Figure 4.

Figure 1. Probability mass function of $Z \sim \text{Sy-}P (θ, λ)$. Each panel fixes $0 < λ < 1$.

Figure 2. Probability mass function of $Z \sim \text{Sy-}P (θ, λ)$. Each panel fixes $λ \geq 1$.

Figure 3. Cumulative distribution function of $Z \sim \text{Sy-}P (θ, λ)$. Each panel fixes $0 < λ < 1$.

Figure 4. Cumulative distribution function of $Z \sim \text{Sy-}P (θ, λ)$. Each panel fixes $λ \geq 1$.

4.1. Generating Functions

Proposition 13.

Let $Z \sim \text{Sy-}P (θ, λ)$ with $θ \in (0, 1 / 2)$ and $λ > 0$, constructed as $Z = X Y$ where $X \sim SMB (θ)$ and $Y \sim P (λ)$ are independent. Then:

$M_{Z} (t) = (1 - 2 θ) + θ \exp \{λ (e^{t} - 1)\} + θ \exp \{λ (e^{- t} - 1)\}, \quad t \in \mathbb{R},$

(14)

$G_{Z} (s) = (1 - 2 θ) + θ \exp \{λ (s - 1)\} + θ \exp \{λ (1 / s - 1)\}, \quad s > 0 .$

(15)

Moreover, $G_{Z} (s) = G_{Z} (1 / s)$ for all $s > 0$, reflecting the exact symmetry of Z.

Proof.

Condition on X. Since $Y \sim P (λ)$, $E (e^{t Y}) = \exp \{λ (e^{t} - 1)\}$, and $E (s^{Y}) = \exp \{λ (s - 1)\}$. Use $P (X = 0) = 1 - 2 θ$, $P (X = \pm 1) = θ$, and the independence of X and Y. □

Corollary 8.

All odd raw moments vanish, $E (Z^{2 m + 1}) = 0$ for $m \geq 0$. For $m \geq 1$,

$E (Z^{2 m}) = E (X^{2 m} Y^{2 m}) = 2 θ E (Y^{2 m}) = 2 θ T_{2 m} (λ),$

where $T_{r} (λ)$ is the r-th Touchard polynomial [10] (the r-th raw moment of a Poisson$(λ)$ variable). In particular,

$E (Z^{2}) = 2 θ λ (1 + λ), \quad E (Z^{4}) = 2 θ λ (1 + 7 λ + 6 λ^{2} + λ^{3}) .$

Hence $\operatorname{Var} (Z) = 2 θ λ (1 + λ)$.

Corollary 9.

Skewness is 0 due to symmetry, and the (non-excess) kurtosis is

$κ = \frac{E (Z^{4})}{(\operatorname{Var} (Z))^{2}} = \frac{1 + 7 λ + 6 λ^{2} + λ^{3}}{2 θ λ (1 + λ)^{2}},$

and the excess kurtosis equals $κ - 3$.
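As a sanity check on Corollaries 8 and 9, the closed-form moments can be compared with moments obtained by summing the PMF (12) directly. This is a minimal sketch; the function names and the truncation point `cutoff` are illustrative assumptions.

```python
import math

def sy_poisson_pmf(k, theta, lam):
    """PMF of Sy-P(theta, lam) at integer k, following Equation (12)."""
    if k == 0:
        return (1.0 - 2.0 * theta) + 2.0 * theta * math.exp(-lam)
    m = abs(k)
    return theta * math.exp(-lam + m * math.log(lam) - math.lgamma(m + 1))

def raw_moment(r, theta, lam, cutoff=200):
    """E(Z^r) by direct summation over |k| <= cutoff."""
    return sum(k ** r * sy_poisson_pmf(k, theta, lam)
               for k in range(-cutoff, cutoff + 1))

def variance_closed_form(theta, lam):
    """Var(Z) = 2 * theta * lam * (1 + lam)."""
    return 2.0 * theta * lam * (1.0 + lam)

def kurtosis_closed_form(theta, lam):
    """Non-excess kurtosis from Corollary 9."""
    num = 1.0 + 7.0 * lam + 6.0 * lam ** 2 + lam ** 3
    return num / (2.0 * theta * lam * (1.0 + lam) ** 2)
```

For instance, with $θ = 0.3$ and $λ = 3$ the variance is $2 \cdot 0.3 \cdot 3 \cdot 4 = 7.2$, and the summed second moment agrees with this value up to truncation error.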

Remark 7.

(i) The identities (14) and (15) yield derivatives at $t = 0$ (or $s = 1$) that recover the even moments without resorting to series expansions. (ii) The symmetry $G_{Z} (s) = G_{Z} (1 / s)$ implies that the distributions of $Z_{1} + Z_{2}$ and $Z_{1} - Z_{2}$ coincide for independent Sy-Poisson variables; see Section 3.9.

4.2. The Total Time on Test Transform

The total time on test (TTT) transform is a standard tool in reliability analysis and quality-control methodology for assessing distributional shape and aging properties. For a non-negative random variable X with distribution function F and survival function $S (x) = 1 - F (x)$, the scaled TTT transform is defined by

$T (u) = \frac{1}{E (X)} \int_{0}^{F^{- 1} (u)} S (x) d x, \quad 0 \leq u \leq 1,$

where $F^{- 1}$ denotes the (generalized) quantile function of F. The function $T (u)$ is increasing in u, and its curvature reveals information about the underlying failure rate behavior: concave curves indicate an increasing failure rate (IFR), convex curves indicate a decreasing failure rate (DFR), and curves close to the diagonal $T (u) = u$ correspond to approximately exponential or memoryless behavior.

In practice, the TTT transform is implemented through its discrete empirical version. For ordered non-negative observations $X_{(1)} \leq \dots \leq X_{(n)}$, the empirical TTT values are computed as

$T_{i} = \frac{1}{\sum_{j = 1}^{n} X_{(j)}} \left(\sum_{j = 1}^{i} X_{(j)} + (n - i) X_{(i)}\right), \quad i = 1, \dots, n,$

and the TTT plot is obtained by graphing $T_{i}$ against $u_{i} = i / n$. The diagonal line $T (u) = u$ serves as a natural reference: empirical curves lying above the diagonal suggest IFR behavior, while those below the diagonal suggest DFR behavior.
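The empirical TTT values can be computed in a single pass over the sorted data. The following sketch assumes a hypothetical helper name `empirical_ttt` and returns the plotting coordinates $(u_{i}, T_{i})$; it is meant as an illustration of the formula, not as the implementation used for the figures.

```python
def empirical_ttt(data):
    """Return the points (i/n, T_i) of the empirical scaled TTT plot.

    `data` must contain non-negative observations with a positive total.
    """
    x = sorted(data)
    n = len(x)
    total = sum(x)
    if n == 0 or x[0] < 0 or total <= 0:
        raise ValueError("need non-negative data with a positive sum")
    points = []
    running = 0.0
    for i, xi in enumerate(x, start=1):
        running += xi                      # sum of the i smallest observations
        t_i = (running + (n - i) * xi) / total
        points.append((i / n, t_i))
    return points
```

For the toy input `[1, 2, 3]` the values are $T_{1} = 3 / 6$, $T_{2} = 5 / 6$, and $T_{3} = 1$; the last value always equals 1 by construction.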

In our setting, we apply the TTT transform to the non-negative magnitudes $| Z |$ (equivalently, to the Y component in the Sy-$Z$ representation), and we compare the empirical TTT plot with the TTT curve implied by the fitted $\text{Sy-}P (θ, λ)$ model as a diagnostic tool; see Section 7.

4.3. Shannon Entropy

Proposition 14.

Under the assumptions of Proposition 13, the Shannon entropy $H (Z) = - \sum_{k \in \mathbb{Z}} P (Z = k) \log (P (Z = k))$ (natural logarithm) admits the exact representation

$H (Z) = - p_{0} \log (p_{0}) - 2 θ ((\log (θ) - λ) (1 - e^{- λ}) + λ \log (λ) - E_{Y} (\log (Y!))),$

(16)

where $p_{0} = (1 - 2 θ) + 2 θ e^{- λ}$ and $Y \sim P (λ)$. In base-2 units, replace $H (Z)$ with $H (Z) / \log (2)$.

Proof.

By symmetry, write $p_{0} = P (Z = 0)$ and, for $n \geq 1$, $p_{\pm n} = P (Z = \pm n) = θ e^{- λ} λ^{n} / n!$. Then

$H (Z) = - p_{0} \log (p_{0}) - 2 \sum_{n = 1}^{\infty} p_{n} \log (p_{n}), \quad p_{n} = θ e^{- λ} \frac{λ^{n}}{n!} .$

Using $\log (p_{n}) = \log (θ) - λ + n \log (λ) - \log (n!)$ and factoring out $2 θ e^{- λ}$,

$\begin{aligned} H (Z) & = - p_{0} \log (p_{0}) - 2 θ e^{- λ} \sum_{n = 1}^{\infty} \frac{λ^{n}}{n!} (\log (θ) - λ + n \log (λ) - \log (n!)) \\ & = - p_{0} \log (p_{0}) - 2 θ \Big[(\log (θ) - λ) \underbrace{e^{- λ} \sum_{n = 1}^{\infty} \frac{λ^{n}}{n!}}_{1 - e^{- λ}} + \log (λ) \underbrace{e^{- λ} \sum_{n = 1}^{\infty} \frac{n λ^{n}}{n!}}_{λ} - \underbrace{e^{- λ} \sum_{n = 1}^{\infty} \frac{λ^{n}}{n!} \log (n!)}_{E_{Y} (\log (Y!))}\Big] . \end{aligned}$

This yields (16). □
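Representation (16) can be verified numerically by comparing it with the entropy computed term by term from the PMF. The sketch below is illustrative: the function names are hypothetical, and both series are truncated at an assumed `cutoff` that is far in the Poisson tail for moderate λ.

```python
import math

def entropy_direct(theta, lam, cutoff=300):
    """Shannon entropy (nats) by summing -p log p over |k| <= cutoff."""
    p0 = (1.0 - 2.0 * theta) + 2.0 * theta * math.exp(-lam)
    h = -p0 * math.log(p0)
    for n in range(1, cutoff + 1):
        log_p = math.log(theta) - lam + n * math.log(lam) - math.lgamma(n + 1)
        h -= 2.0 * math.exp(log_p) * log_p   # factor 2 covers k = +n and k = -n
    return h

def entropy_formula(theta, lam, cutoff=300):
    """Shannon entropy (nats) from Equation (16)."""
    p0 = (1.0 - 2.0 * theta) + 2.0 * theta * math.exp(-lam)
    # E[log(Y!)] for Y ~ Poisson(lam); the n = 0 and n = 1 terms vanish.
    e_log_fact = sum(
        math.exp(-lam + n * math.log(lam) - math.lgamma(n + 1)) * math.lgamma(n + 1)
        for n in range(2, cutoff + 1)
    )
    inner = ((math.log(theta) - lam) * (1.0 - math.exp(-lam))
             + lam * math.log(lam) - e_log_fact)
    return -p0 * math.log(p0) - 2.0 * theta * inner
```

Dividing either result by `math.log(2)` converts the entropy to bits.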

Corollary 10.

Using Stirling's expansion [19] $\log (n!) = n \log (n) - n + \frac{1}{2} \log (2 π n) + O (1 / n)$ and taking expectations for $Y \sim P (λ)$ with λ large, where a second-order expansion about λ gives $E (Y \log (Y)) = λ \log (λ) + \frac{1}{2} + O (λ^{- 1})$,

$E (\log (Y!)) = λ (\log (λ) - 1) + \frac{1}{2} \log (2 π e λ) + O (λ^{- 1}),$

so that

$H (Z) = - p_{0} \log (p_{0}) - 2 θ ((\log (θ) - λ) (1 - e^{- λ}) + λ - \frac{1}{2} \log (2 π e λ)) + O (λ^{- 1}) .$

(17)

5. The Statistical Inference of the Model

5.1. Method of Moments Estimation (MoM)

Consider an i.i.d. sample $z = (z_{1}, \dots, z_{n})$ from $Z \sim \text{Sy-}P (θ, λ)$ with $θ \in (0, 1 / 2)$ and $λ > 0$. From Section 4.1, together with $E (| Z |) = 2 θ E (Y)$, we have

$E (| Z |) = 2 θ λ, \quad E (Z^{2}) = 2 θ λ (1 + λ) .$

Let the empirical moments be

$\bar{m}_{1} = \frac{1}{n} \sum_{i = 1}^{n} | z_{i} |, \quad \bar{m}_{2} = \frac{1}{n} \sum_{i = 1}^{n} z_{i}^{2} .$

Matching $\bar{m}_{1}$ and $\bar{m}_{2}$ to their population counterparts yields a closed-form solution. If $\bar{m}_{1} > 0$ and $\bar{m}_{2} > \bar{m}_{1}$, the MoM estimators are

$\hat{λ}_{M} = \frac{\bar{m}_{2}}{\bar{m}_{1}} - 1, \quad \hat{θ}_{M} = \frac{\bar{m}_{1}}{2 \hat{λ}_{M}} = \frac{\bar{m}_{1}^{2}}{2 (\bar{m}_{2} - \bar{m}_{1})} .$

(18)

The feasibility conditions $\bar{m}_{1} > 0$ and $\bar{m}_{2} > \bar{m}_{1}$ ensure $\hat{λ}_{M} > 0$ and $\hat{θ}_{M} > 0$. If $\hat{θ}_{M} \geq 1 / 2$, the estimate lies outside the parameter space; in practice, one may either declare the MoM fit infeasible or project to $1 / 2 - ε$ for a small $ε > 0$ and use the projected value as initialization for maximum likelihood.

5.2. Asymptotic Distribution and Standard Errors (Delta Method)

Let $μ_{1} = E (| Z |) = 2 θ λ$ and $μ_{2} = E (Z^{2}) = 2 θ λ (1 + λ)$. From the closed-form moments,

$\operatorname{Var} (| Z |) = 2 θ λ + 2 θ (1 - 2 θ) λ^{2}, \quad E (Z^{4}) = 2 θ λ (1 + 7 λ + 6 λ^{2} + λ^{3}),$

so

$\operatorname{Var} (Z^{2}) = E (Z^{4}) - μ_{2}^{2} = 2 θ λ (1 + 7 λ + 6 λ^{2} + λ^{3}) - (2 θ λ (1 + λ))^{2} .$

Using $E (| Z |^{3}) = 2 θ E (Y^{3}) = 2 θ λ (1 + 3 λ + λ^{2})$ and $μ_{1} μ_{2} = 4 θ^{2} λ^{2} (1 + λ)$,

$\operatorname{Cov} (| Z |, Z^{2}) = E (| Z |^{3}) - μ_{1} μ_{2} = 2 θ λ [1 + 3 λ + λ^{2} - 2 θ λ (1 + λ)] .$

A multivariate central limit theorem [20] yields

$\sqrt{n} ((\bar{m}_{1}, \bar{m}_{2})^{\top} - (μ_{1}, μ_{2})^{\top}) \overset{d}{\to} N (0, Σ), \quad Σ = \begin{pmatrix} \operatorname{Var} (| Z |) & \operatorname{Cov} (| Z |, Z^{2}) \\ \operatorname{Cov} (| Z |, Z^{2}) & \operatorname{Var} (Z^{2}) \end{pmatrix} .$

Define the transformation $g (\bar{m}_{1}, \bar{m}_{2}) = (λ, θ)$ by $λ = \bar{m}_{2} / \bar{m}_{1} - 1$ and $θ = \bar{m}_{1}^{2} / (2 (\bar{m}_{2} - \bar{m}_{1}))$. The Jacobian at $(μ_{1}, μ_{2})$ is

$K = \begin{pmatrix} - \frac{μ_{2}}{μ_{1}^{2}} & \frac{1}{μ_{1}} \\ \frac{μ_{1} (2 μ_{2} - μ_{1})}{2 (μ_{2} - μ_{1})^{2}} & - \frac{μ_{1}^{2}}{2 (μ_{2} - μ_{1})^{2}} \end{pmatrix} = \begin{pmatrix} - \frac{1 + λ}{2 θ λ} & \frac{1}{2 θ λ} \\ \frac{1 + 2 λ}{2 λ^{2}} & - \frac{1}{2 λ^{2}} \end{pmatrix} .$

By the delta method,

$\sqrt{n} ((\hat{λ}_{M}, \hat{θ}_{M})^{\top} - (λ, θ)^{\top}) \overset{d}{\to} N (0, V), \quad V = K Σ K^{\top},$

with a practical plug-in estimator of V obtained by replacing $(θ, λ)$ with $(\hat{θ}_{M}, \hat{λ}_{M})$.
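The delta-method covariance can be assembled numerically from the raw moments. The sketch below uses hypothetical function names and builds $Σ$ from $E (| Z |^{r})$, $r = 1, \dots, 4$, so that the covariance entry is computed as $E (| Z |^{3}) - μ_{1} μ_{2}$ rather than from a simplified closed form; plugging in estimates in place of $(θ, λ)$ gives approximate standard errors.

```python
import math

def mom_delta_covariance(theta, lam):
    """Return (V, K) with V = K Sigma K^T for sqrt(n) * (lambda_hat_M, theta_hat_M)."""
    mu1 = 2.0 * theta * lam                                   # E|Z|
    mu2 = 2.0 * theta * lam * (1.0 + lam)                     # E Z^2
    abs3 = 2.0 * theta * lam * (1.0 + 3.0 * lam + lam ** 2)   # E|Z|^3
    z4 = 2.0 * theta * lam * (1.0 + 7.0 * lam + 6.0 * lam ** 2 + lam ** 3)
    sigma = [[mu2 - mu1 ** 2, abs3 - mu1 * mu2],
             [abs3 - mu1 * mu2, z4 - mu2 ** 2]]
    d = mu2 - mu1
    k = [[-mu2 / mu1 ** 2, 1.0 / mu1],
         [mu1 * (2.0 * mu2 - mu1) / (2.0 * d ** 2), -mu1 ** 2 / (2.0 * d ** 2)]]
    ks = [[sum(k[i][a] * sigma[a][j] for a in range(2)) for j in range(2)]
          for i in range(2)]
    v = [[sum(ks[i][a] * k[j][a] for a in range(2)) for j in range(2)]
         for i in range(2)]
    return v, k

def mom_standard_errors(theta, lam, n):
    """Approximate standard errors (se_lambda, se_theta) for sample size n."""
    v, _ = mom_delta_covariance(theta, lam)
    return math.sqrt(v[0][0] / n), math.sqrt(v[1][1] / n)
```

Note the ordering: the first coordinate corresponds to λ and the second to θ, matching the definition of g.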

5.3. Algorithm for MoM Estimation

To facilitate practical implementation, Algorithm 1 summarizes the step-by-step computation of the moment-based estimates.

Remark 8.

The closed-form MoM estimates, computed from $(\bar{m}_{1}, \bar{m}_{2})$, provide a stable initialization for maximum likelihood, typically improving convergence and reducing sensitivity to local optima. In very small samples, the method of moments may yield $\hat{θ}_{M}$ close to $1 / 2$; in that case, it is advisable to compare the implied modality (see Proposition 7) with the empirical shape as a diagnostic check.

Algorithm 1 Computation of MoM estimates

1: Compute ${\bar{m}}_{1} = \sum_{i} | z_{i} | / n$ and ${\bar{m}}_{2} = \sum_{i} z_{i}^{2} / n$ .
2: If ${\bar{m}}_{1} \leq 0$ or ${\bar{m}}_{2} \leq {\bar{m}}_{1}$ , declare the MoM fit infeasible and proceed with likelihood-based estimation.
3: Else, compute $({\hat{λ}}_{M}, {\hat{θ}}_{M})$ via (18).
4: Optionally report delta-method standard errors using the plug-in estimate of V.
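Steps 1 to 3 of Algorithm 1 can be sketched as follows. The function name `mom_estimates` and the choice to signal infeasibility by returning `None` are illustrative assumptions; the check $\hat{θ}_{M} < 1 / 2$ discussed in Section 5.1 is left to the caller.

```python
def mom_estimates(z):
    """Method-of-moments estimates (lambda_hat, theta_hat) from Equation (18).

    Returns None when the feasibility conditions m1 > 0 and m2 > m1 fail.
    """
    n = len(z)
    if n == 0:
        return None
    m1 = sum(abs(v) for v in z) / n      # first absolute empirical moment
    m2 = sum(v * v for v in z) / n       # second empirical moment
    if m1 <= 0 or m2 <= m1:
        return None                      # MoM fit infeasible
    lam_hat = m2 / m1 - 1.0
    theta_hat = m1 / (2.0 * lam_hat)
    return lam_hat, theta_hat
```

For the toy sample `[0, 0, 2, -2, 3, -3, 1, -1, 0, 4]` the empirical moments are 1.6 and 4.4, giving $\hat{λ}_{M} = 1.75$ and $\hat{θ}_{M} = 1.6 / 3.5$.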

5.4. Likelihood-Based Inference

Given an i.i.d. sample $z = (z_{1}, \dots, z_{n})$ from $Z \sim \text{Sy-}P (θ, λ)$ with $θ \in (0, 1 / 2)$ and $λ > 0$, let $n_{0} = \# \{i : z_{i} = 0\}$ and $n_{1} = n - n_{0}$ be the numbers of zeros and nonzeros, respectively. Writing $p_{0} = P (Z = 0) = (1 - 2 θ) + 2 θ e^{- λ} = 1 - 2 θ (1 - e^{- λ})$, the likelihood factorizes as

$L (θ, λ \mid z) = p_{0}^{n_{0}} θ^{n_{1}} e^{- λ n_{1}} \frac{λ^{\sum_{i : z_{i} \neq 0} | z_{i} |}}{\prod_{i : z_{i} \neq 0} | z_{i} |!},$

(19)

so that the log-likelihood is

$ℓ (θ, λ) = n_{0} \log (p_{0}) + n_{1} \log (θ) - λ n_{1} + \log (λ) \sum_{i : z_{i} \neq 0} | z_{i} | - \sum_{i : z_{i} \neq 0} \log (| z_{i} |!) .$

The score functions are

$\frac{\partial ℓ}{\partial θ} = - \frac{2 n_{0} (1 - e^{- λ})}{p_{0}} + \frac{n_{1}}{θ}, \quad \frac{\partial ℓ}{\partial λ} = - \frac{2 θ n_{0} e^{- λ}}{p_{0}} - n_{1} + \frac{\sum_{i : z_{i} \neq 0} | z_{i} |}{λ} .$

The maximum likelihood estimators (MLEs) solve $\partial ℓ / \partial θ = 0$ and $\partial ℓ / \partial λ = 0$ numerically (no closed form in general). For numerical maximization of the log-likelihood, we used the quasi-Newton BFGS algorithm [21], which provides stable performance for smooth two-parameter models such as $\text{Sy-}P$. The optimization was initialized at $(θ^{(0)}, λ^{(0)}) = (\bar{p}_{0} / 2, \max \{\overline{Z^{2}}, 10^{- 3}\})$, where $\bar{p}_{0}$ is the empirical proportion of zeros and $\overline{Z^{2}}$ is the empirical second moment. A convergence tolerance of $10^{- 8}$ on both the absolute and relative change in the objective value was imposed. To guard against potential local maxima, the optimization was repeated from five random starting points generated uniformly over $(0, 0.5) \times (0, 10)$. All runs converged to the same estimates, indicating that the likelihood is well behaved for the $\text{Sy-}P$ model. These implementation details support reproducibility and the observed numerical stability of the MLE.
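The log-likelihood and score functions are simple to code and can be handed to any gradient-based optimizer. The sketch below is not the authors' implementation: the function names are hypothetical, only the standard library is used, and the BFGS step itself is omitted. Checking the analytic score against finite differences is a cheap safeguard before running an optimizer.

```python
import math

def _summaries(z):
    """Return (n0, n1, t): zero count, nonzero count, and sum of |z_i|."""
    nonzero = [abs(v) for v in z if v != 0]
    return len(z) - len(nonzero), len(nonzero), sum(nonzero)

def loglik(theta, lam, z):
    """Log-likelihood of Sy-P(theta, lam) for an integer sample z."""
    n0, n1, t = _summaries(z)
    p0 = 1.0 - 2.0 * theta * (1.0 - math.exp(-lam))
    log_fact = sum(math.lgamma(abs(v) + 1) for v in z if v != 0)
    return (n0 * math.log(p0) + n1 * math.log(theta) - lam * n1
            + math.log(lam) * t - log_fact)

def score(theta, lam, z):
    """Analytic score vector (d/d theta, d/d lambda) of the log-likelihood."""
    n0, n1, t = _summaries(z)
    e = math.exp(-lam)
    p0 = 1.0 - 2.0 * theta * (1.0 - e)
    d_theta = -2.0 * n0 * (1.0 - e) / p0 + n1 / theta
    d_lam = -2.0 * theta * n0 * e / p0 - n1 + t / lam
    return d_theta, d_lam

def numeric_score(theta, lam, z, h=1e-6):
    """Central finite-difference approximation of the score, for checking."""
    d_theta = (loglik(theta + h, lam, z) - loglik(theta - h, lam, z)) / (2 * h)
    d_lam = (loglik(theta, lam + h, z) - loglik(theta, lam - h, z)) / (2 * h)
    return d_theta, d_lam
```

In practice one would minimize `-loglik` over $(θ, λ) \in (0, 1 / 2) \times (0, \infty)$, for example after reparameterizing to an unconstrained scale.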

5.5. Score Derivatives and Information Matrices

Hessian (second derivatives).

$\begin{aligned} \frac{\partial^{2} ℓ}{\partial θ^{2}} & = - \frac{4 n_{0} (1 - e^{- λ})^{2}}{p_{0}^{2}} - \frac{n_{1}}{θ^{2}}, \\ \frac{\partial^{2} ℓ}{\partial θ \partial λ} & = \frac{\partial^{2} ℓ}{\partial λ \partial θ} = - \frac{2 n_{0} e^{- λ}}{p_{0}^{2}}, \\ \frac{\partial^{2} ℓ}{\partial λ^{2}} & = \frac{2 θ n_{0} e^{- λ} (1 - 2 θ)}{p_{0}^{2}} - \frac{\sum_{i : z_{i} \neq 0} | z_{i} |}{λ^{2}} . \end{aligned}$

5.6. Observed Information

The observed information matrix is $J (θ, λ) = - \nabla^{2} ℓ (θ, λ)$:

$J (θ, λ) = \begin{bmatrix} \frac{4 n_{0} (1 - e^{- λ})^{2}}{p_{0}^{2}} + \frac{n_{1}}{θ^{2}} & \frac{2 n_{0} e^{- λ}}{p_{0}^{2}} \\ \frac{2 n_{0} e^{- λ}}{p_{0}^{2}} & \frac{\sum_{i : z_{i} \neq 0} | z_{i} |}{λ^{2}} - \frac{2 θ n_{0} e^{- λ} (1 - 2 θ)}{p_{0}^{2}} \end{bmatrix} .$

A consistent estimator of the asymptotic covariance of $(\hat{θ}, \hat{λ})$ is $J (\hat{θ}, \hat{λ})^{- 1}$.

5.7. Expected Fisher Information

Let $q = 1 - p_{0} = 2 θ (1 - e^{- λ})$. Using $E (N_{0}) = n p_{0}$, $E (N_{1}) = n q$, and $E (\sum_{i : Z_{i} \neq 0} | Z_{i} |) = E (\sum_{i = 1}^{n} | Z_{i} |) = n E (| Z |) = n (2 θ λ)$, the expected (per-observation) Fisher information is

$i (θ, λ) = \frac{1}{n} E [J (θ, λ)] = \begin{bmatrix} \frac{4 (1 - e^{- λ})^{2}}{p_{0}} + \frac{q}{θ^{2}} & \frac{2 e^{- λ}}{p_{0}} \\ \frac{2 e^{- λ}}{p_{0}} & \frac{2 θ}{λ} - \frac{2 θ e^{- λ} (1 - 2 θ)}{p_{0}} \end{bmatrix}, \quad I (θ, λ) = n i (θ, λ) .$

Remark 9.

The observed Fisher information matrix for the $\text{Sy-}P$ model is positive definite whenever $(θ, λ)$ belongs to the interior of the parameter space $(0 < θ < 0.5, λ > 0)$. This follows from the strict concavity of the log-likelihood in both parameters: the sign component induces a strictly negative second derivative in θ, while the Poisson magnitude contributes a negative curvature in λ for all $λ > 0$. Consequently, the Hessian matrix is negative definite, and the observed information, being its negative, remains positive definite away from the boundary. Loss of positive definiteness can only occur near the boundary limits $θ \to 0$, $θ \to 0.5$, or $λ \to 0$, where the model collapses to a degenerate or nearly degenerate form, and classical asymptotic theory is no longer applicable. Thus, for all practical estimation scenarios in the interior region, the observed information matrix provides a valid and reliable approximation to the asymptotic covariance.

Hence, under standard regularity conditions,

$\sqrt{n} ((\hat{θ}, \hat{λ}) - (θ, λ)) \overset{d}{\to} N (0, i (θ, λ)^{- 1}),$

and a large-sample covariance estimator is $I (\hat{θ}, \hat{λ})^{- 1} = (n i (\hat{θ}, \hat{λ}))^{- 1}$.
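The per-observation Fisher information and the resulting large-sample standard errors can be computed directly from the closed-form entries. The following is a minimal sketch with hypothetical function names; it inverts the $2 \times 2$ matrix by hand and raises an error if the matrix is not positive definite at the supplied point.

```python
import math

def expected_information(theta, lam):
    """Per-observation expected Fisher information i(theta, lam) as a 2x2 list."""
    e = math.exp(-lam)
    p0 = 1.0 - 2.0 * theta * (1.0 - e)
    q = 1.0 - p0
    a = 4.0 * (1.0 - e) ** 2 / p0 + q / theta ** 2             # (theta, theta)
    b = 2.0 * e / p0                                           # (theta, lambda)
    c = 2.0 * theta / lam - 2.0 * theta * e * (1.0 - 2.0 * theta) / p0  # (lambda, lambda)
    return [[a, b], [b, c]]

def asymptotic_standard_errors(theta, lam, n):
    """Standard errors (se_theta, se_lambda) from the inverse of n * i(theta, lam)."""
    (a, b), (_, c) = expected_information(theta, lam)
    det = a * c - b * b
    if a <= 0 or det <= 0:
        raise ValueError("information matrix is not positive definite")
    # Inverse of [[a, b], [b, c]] is (1/det) * [[c, -b], [-b, a]].
    return math.sqrt(c / (det * n)), math.sqrt(a / (det * n))
```

Evaluating these functions at the MLE gives the plug-in covariance estimator described above.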

5.8. Percentile Estimators

Percentile estimators provide a robust and intuitive alternative to likelihood- and moment-based procedures. Because the $\text{Sy-}P (θ, λ)$ distribution is defined through a symmetric, zero-centered mixture structure, its shape is largely determined by its quantiles, particularly those away from the median. Percentile estimators, therefore, capture the distributional form through direct CDF inversion, without relying on the smoothness of the likelihood or on tail-sensitive moments. They are obtained by matching the empirical and model-based percentiles at two symmetric levels, namely the 25th and 75th percentiles. Let $F (z; θ, λ)$ be the $\text{Sy-}P$ CDF, and let $Q (p)$ denote the population p-th percentile, given as $Q (p; θ, λ) = F^{- 1} (p; θ, λ)$.

Given the empirical percentiles $\tilde{Q} (p_{1})$ and $\tilde{Q} (p_{2})$, that is, the sample percentiles at levels $p_{1}$ and $p_{2}$, the goal of percentile estimation is to find parameter values $(θ, λ)$ such that the model percentiles match the data percentiles. We thus define the percentile estimators $(\hat{θ}, \hat{λ})$ as the solution of

$Q (p_{1}; θ, λ) = \tilde{Q} (p_{1}) \quad \text{and} \quad Q (p_{2}; θ, λ) = \tilde{Q} (p_{2}) .$

These two non-linear equations can be solved numerically to obtain the estimators. The practical utility of the proposed percentile estimators is illustrated in Section 7, where they are applied to both empirical datasets alongside the MLEs. The comparison highlights the robustness of percentile-based estimation, especially in settings with moderate tails or irregular zero concentrations.

6. Simulation Study

To evaluate the finite-sample properties of the proposed estimators for $Z \sim \text{Sy-}P (θ, λ)$, we performed a comprehensive Monte Carlo experiment. The simulation mechanism follows directly from Proposition 1, which characterizes any Sy-$Z$ random variable as the product of an independent symmetric sign and a non-negative magnitude. Specifically, we use the representation

$Z = V_{1} (2 V_{2} - 1) Y,$

with $V_{1} \sim Bernoulli (2 θ)$, $V_{2} \sim Bernoulli (1 / 2)$, and $Y \sim P (λ)$ mutually independent. This constructive form reproduces the probability mass function given in Equation (12): $V_{1}$ determines whether the sign component is zero or nonzero, $V_{2}$ selects the sign when it is nonzero, and Y supplies the magnitude. Thus, this generative scheme reproduces exactly the $\text{Sy-}P$ distribution implied by the theoretical results. For each sample size $n \in \{10, 20, \dots, 200\}$, 10,000 independent samples were drawn under the true parameters $(θ, λ) = (0.3, 3)$. This parameter configuration is representative of many practical scenarios: $θ = 0.3$ yields a moderate zero mass in the $\text{Sy-}P$ model, while $λ = 3$ produces a magnitude distribution with moderate dispersion. Together, these values generate datasets whose symmetry, central concentration, and tail behavior closely resemble those encountered in empirical applications, making them well suited for assessing estimation accuracy in realistic settings. The log-likelihood $ℓ (θ, λ)$ was maximized numerically to obtain MLEs, and MoM estimators were computed from the first two empirical moments. Algorithm 2 below provides a concise summary of the exact data-generation procedure used in all Monte Carlo experiments. This step-by-step formulation clarifies how independent draws from the $\text{Sy-}P$ model are produced and supports reproducibility of the simulation design.

Algorithm 2 Generation of an i.i.d. sample $\{Z_{i}\}_{i = 1}^{n}$ from $\text{Sy-}P (θ, λ)$

1: for $i = 1, \dots, n$ do
2: Draw $V_{1 i} \sim Bernoulli (2 θ)$ . If $V_{1 i} = 0$ , set $Z_{i} \leftarrow 0$ and continue.
3: Draw $V_{2 i} \sim Bernoulli (1 / 2)$ and set $S_{i} \leftarrow 2 V_{2 i} - 1 \in \{- 1, 1\}$ .
4: Draw $Y_{i} \sim P (λ)$ and set $Z_{i} \leftarrow S_{i} Y_{i}$ .
5: end for
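Algorithm 2 can be sketched with the standard library alone. Since Python's `random` module has no Poisson generator, the sketch below assumes Knuth's multiplication method for the Poisson draw, which is adequate for moderate λ; this choice, the function names, and the `seed` argument are illustrative assumptions rather than the generator used in the study.

```python
import math
import random

def poisson_draw(lam, rng):
    """One Poisson(lam) draw via Knuth's multiplication method (moderate lam only)."""
    limit = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def sample_sy_poisson(n, theta, lam, seed=None):
    """Draw n i.i.d. values from Sy-P(theta, lam) following Algorithm 2."""
    rng = random.Random(seed)
    sample = []
    for _ in range(n):
        if rng.random() >= 2.0 * theta:           # V1 = 0 with probability 1 - 2*theta
            sample.append(0)
            continue
        sign = 1 if rng.random() < 0.5 else -1    # S = 2*V2 - 1
        sample.append(sign * poisson_draw(lam, rng))
    return sample
```

With $(θ, λ) = (0.3, 3)$, a large sample should show a zero proportion near $p_{0} = 0.4 + 0.6 e^{- 3}$ and a mean absolute value near $2 θ λ = 1.8$.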

6.1. Comparison of MLE and MoM Estimators

For each n, bias and mean-squared error (MSE) were calculated as

$\mathrm{Bias}_{n} (\hat{η}) = \frac{1}{R} \sum_{r = 1}^{R} (\hat{η}_{r} - η), \quad \mathrm{MSE}_{n} (\hat{η}) = \frac{1}{R} \sum_{r = 1}^{R} (\hat{η}_{r} - η)^{2},$

with $η \in \{θ, λ\}$, where $R = 10{,}000$ is the number of Monte Carlo replications and $\hat{η}_{r}$ is the estimate obtained in replication r. Figure 5 plots Monte Carlo estimates of $\mathrm{Bias}_{n}$ against n, whereas Figure 6 plots Monte Carlo estimates of $\mathrm{MSE}_{n}$ against n.
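The two summary measures are straightforward to compute from a list of replicated estimates. The helper below is a small illustrative sketch with a hypothetical name.

```python
def bias_and_mse(estimates, true_value):
    """Monte Carlo bias and MSE of replicated estimates of a scalar parameter."""
    r = len(estimates)
    if r == 0:
        raise ValueError("need at least one replication")
    errors = [est - true_value for est in estimates]
    bias = sum(errors) / r
    mse = sum(err * err for err in errors) / r
    return bias, mse
```

Since the MSE equals the squared bias plus the Monte Carlo variance of the estimates, reporting both quantities also reveals how much of the error is due to sampling variability.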

Figure 5. Bias of $\hat{λ}$ and $\hat{θ}$ under $\text{Sy-}P (θ, λ)$. Curves compare MLE and MoM estimators across sample sizes.

Figure 6. MSE of $\hat{λ}$ and $\hat{θ}$ under $\text{Sy-}P (θ, λ)$. Curves compare MLE and MoM estimators across sample sizes.

Both MLEs and MoM estimators show rapidly vanishing bias and MSE as the sample size increases, consistent with standard asymptotic theory. The MoM estimator displays a slightly higher MSE for small n compared to the MLE, but its performance approaches that of the MLE as n approaches 200.

Across the grid of n, $\hat{θ}$ shows a mild positive bias, while $\hat{λ}$ tends to underestimate the true value. Moreover, $\mathrm{MSE}_{n} (\hat{λ})$ remains larger than $\mathrm{MSE}_{n} (\hat{θ})$, reflecting the higher sampling variability of the rate parameter. These bias patterns are consistent with the structure of the log-likelihood and the Fisher information described in Section 5.7. The information for $θ$ is primarily driven by the zero and near-zero observations, where the curvature of the likelihood is pronounced; this yields relatively strong identifiability for $θ$ and explains the small positive finite-sample bias. In contrast, the information for $λ$ depends on the dispersion of the magnitude component: when many observations fall at or near zero, the effective information about $λ$ is reduced, leading to a mild tendency toward underestimation. As the sample size increases, both information components scale linearly with n, causing the biases in $\hat{θ}$ and $\hat{λ}$ to diminish, in agreement with the asymptotic theory.

6.2. Standard-Error Accuracy and Confidence-Interval Coverage

To assess standard-error accuracy and validate asymptotic normality, we formed observed Wald confidence intervals using the inverse of the observed information (the negative Hessian) at the MLE, $\widehat{\operatorname{Var}} (\hat{θ}, \hat{λ}) = J (\hat{θ}, \hat{λ})^{- 1}$. For each replication and parameter, we constructed two-sided Wald intervals at the 90%, 95%, and 99% nominal levels and recorded empirical coverage along with the average interval length. Figure 7 reports the CI coverage, whereas Figure 8 shows the average CI length as functions of n.

Figure 7. Empirical coverage probabilities of the observed Wald confidence intervals for $\hat{θ}$ and $\hat{λ}$ as a function of n.

Figure 8. Average lengths of the observed Wald confidence intervals for $\hat{θ}$ and $\hat{λ}$ as a function of n.

The observed Wald intervals show coverage approaching nominal levels as n increases, with noticeable gains between small and moderate sample sizes. In cases of small $λ$, the intervals can be slightly conservative for very small n, but accuracy improves quickly with n, and the average lengths decrease at the expected $n^{- 1 / 2}$ rate, consistent with the asymptotic theory for the MLE.

A further point of reassurance concerns the use of Wald confidence intervals in a discrete setting. Although discrete models sometimes induce irregular likelihood shapes and poor Wald performance, the $\text{Sy-}P$ model benefits from a smooth and strictly concave log-likelihood in both parameters. The independence between the symmetric sign and the Poisson magnitude yields well-behaved score functions and a Fisher information matrix that remains finite and positive definite for all admissible $(θ, λ)$. These properties support the standard regularity conditions when the true parameters lie in the interior of the parameter space, so Wald intervals inherit the usual asymptotic validity, and the simulations suggest that the normal approximation is already accurate in moderate samples. This explains why the simulation results show accurate coverage levels despite the inherent discreteness of the data-generating process.

The observed coverage behavior can be directly linked to the analytical form of the Fisher information derived in Section 5.7. When the zero mass $p_{0} = (1 - 2 θ) + 2 θ e^{- λ}$ is large, the term $q / θ^{2}$ in the information matrix (where $q = 2 θ (1 - e^{- λ})$) dominates, yielding high curvature of the log-likelihood with respect to $θ$ and, therefore, tighter confidence intervals. Conversely, the information component associated with $λ$ depends on both $θ$ and $λ$ through $2 θ / λ - 2 θ e^{- λ} (1 - 2 θ) / p_{0}$, which can be relatively flat for small $λ$. This explains why the empirical coverage for $λ$ tends to be slightly conservative in small samples, whereas the coverage for $θ$ rapidly approaches nominal levels. As the sample size increases, both components of the Fisher information scale linearly with n, leading to asymptotic normality and the near-exact coverage observed for $n \geq 100$.

7. Practical Data Analysis

We illustrate the applicability of the proposed $\text{Sy-}P (θ, λ)$ model using two real-life datasets. After first-differencing (day-to-day or session-to-session changes), the outcomes lie on $\mathbb{Z}$, matching the model's support.

7.1. PTT Stock Price Increments (Thailand, 2014)

The first dataset, previously analyzed in ref. [4], consists of daily closing prices for the Petroleum Authority of Thailand (PTT), recorded from 1 April 2014 to 20 October 2014 (Stock Exchange of Thailand). For completeness, the data are as follows:

-12, -11, -9, -8, -7, -6, -6, -5, -5, -5, -5, -5, -5, -5, -5, -4, -4, -4,

-4, -4, -4, -4, -4, -4, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -2, -2,

-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -1, -1, -1, -1, -1, -1, -1, -1,

-1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1,

1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 3,

3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 4, 5, 5, 5, 5, 6, 6, 6, 6, 6, 6, 7, 7,

7, 7, 8, 8, 9, 9, 9, 10, 14

Following ref. [4], we model price increments; that is, the day-to-day change in the closing price, measured in integer Baht. A Mann–Kendall test against a monotonic trend yields a p-value of $0.9467$, providing no evidence of such a trend and supporting the i.i.d. working assumption for the increment series.

We fit $\text{Sy-}P (θ, λ)$ by maximum likelihood, obtaining $(\hat{θ}, \hat{λ}) = (0.45, 3.71)$. A Kolmogorov–Smirnov (K–S) test gives $p = 0.47$, indicating an adequate fit. We also computed the percentile estimators of the parameters with $p_{1} = 0.25$ and $p_{2} = 0.75$, obtaining $(\hat{θ}, \hat{λ}) = (0.37, 3.43)$. Figure 9 overlays the fitted probability mass function on the empirical frequencies. Figure 10 contains the total time on test (TTT) plot, as described in Section 4.2. The empirical TTT curve was compared to the TTT curve implied by the fitted $\text{Sy-}P$ model. A close alignment of the curves indicates an adequate fit.

Figure 9. Empirical PMF and fitted $\text{Sy-}P (\hat{θ}, \hat{λ})$ for PTT price increments.

Figure 10. TTT plot for PTT price increments comparing the empirical curve with the fitted $\text{Sy-}P$ model. The diagonal line represents the exponential distribution reference.

For benchmarking, we also fit the perturbed discrete Laplace $PDL (p, α)$ [6], the discrete Laplace $DL (p)$, the discrete normal $DN (μ, σ)$ [5], and the discrete asymmetric Laplace $DALD (μ, β, λ)$ [4]. Table 1 reports log-likelihood, AIC, and BIC. The $\text{Sy-}P$ model attains the smallest AIC/BIC, and, following ref. [22], the BIC differences relative to competitors exceed 2, providing positive evidence in favor of $\text{Sy-}P$. This good empirical fit is consistent with the theoretical properties of the $\text{Sy-}P$ model. The financial return increments are nearly symmetric and exhibit a pronounced central mass at zero; the $\text{Sy-}P$ structure accommodates both features naturally through its symmetric sign component and the tunable zero probability governed by $θ$. Moreover, the inherited Poisson tail behavior aligns well with the moderate dispersion observed in the positive and negative increments, explaining the improved information-criteria performance over classical competitors.

Table 1. Model comparison for PTT price increments.

Remark 10.

The $\text{Sy-}P$ fit closely tracks competitors while imposing exact symmetry around zero and supporting closed-form manipulations for sums and differences (Section 3.9).

7.2. Attendance Increments in a Marketing Course (Lyon, 2012–2013)

We revisit a dataset previously analyzed in ref. [18], where the extended Poisson model was introduced. The data record attendance counts for 60 consecutive marketing sessions in the Bachelor program at IDRAC International Management School (Lyon, France), between 1 September 2012 and 1 April 2013. For completeness, the exact dataset is as given below:

-5, -5, -5, -4, -4, -4, -3, -3, -3, -3, -3, -3, -2, -2, -2, -2, -2, -2,

-2, -2, -2, -2, -2, -2, -2, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0,

0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 3,

3, 3, 3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 6, 7, 8

As documented in ref. [18], we analyze first differences between consecutive sessions, yielding $n = 59$ integer-valued observations ranging from $- 5$ to 7. This transformation centers the process and produces signed counts, providing a natural benchmark for two-sided discrete models.

As a preliminary diagnostic, the runs test reported in ref. [18] indicates that while the raw series is not random (p-value $= 0$), the differenced series behaves as an approximately random sample (p-value $= 0.7077$), supporting independent-sample modeling at the differenced scale. We fit the symmetric Poisson model Sy-$P(θ, λ)$ by maximum likelihood and compare it with standard competitors: discrete Laplace $DL(p)$, perturbed discrete Laplace $PDL(p, α)$ [6], discrete normal $DN(μ, σ)$ [5], and the extended Poisson E-$P(p, λ)$ introduced in ref. [18]. Model fit is assessed via log-likelihood and information criteria (AIC/BIC), using grouped frequencies consistent with ref. [18] for the goodness-of-fit evaluation.
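To make the likelihood-based comparison concrete, the following Python sketch evaluates the Sy-$P$ PMF implied by the sign-magnitude construction $Z = XY$ with $X \sim \mathrm{SMB}(θ)$ and an independent Poisson magnitude $Y$, together with the log-likelihood and the standard AIC/BIC formulas. The function names (`syp_pmf`, `syp_loglik`, `aic`, `bic`) are illustrative and are not taken from the authors' software; the parameter range $θ \in (0, 1/2)$, $λ > 0$ is assumed.

```python
import math


def syp_pmf(k: int, theta: float, lam: float) -> float:
    """P(Z = k) for Z = X * Y, X ~ SMB(theta), Y ~ Poisson(lam), independent."""
    if not (0.0 < theta < 0.5) or lam <= 0.0:
        raise ValueError("assumed range: 0 < theta < 1/2 and lam > 0")
    if k == 0:
        # Z = 0 if the sign is 0, or the sign is +/-1 and the magnitude is 0:
        # (1 - 2*theta) + 2*theta*exp(-lam) = 1 - 2*theta*(1 - exp(-lam)).
        return 1.0 - 2.0 * theta * (1.0 - math.exp(-lam))
    m = abs(k)
    # theta * exp(-lam) * lam**m / m!, computed on the log scale for stability.
    return math.exp(math.log(theta) - lam + m * math.log(lam) - math.lgamma(m + 1))


def syp_loglik(data, theta: float, lam: float) -> float:
    """Log-likelihood of an i.i.d. integer sample under Sy-P(theta, lam)."""
    return sum(math.log(syp_pmf(z, theta, lam)) for z in data)


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2*loglik."""
    return 2.0 * n_params - 2.0 * loglik


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k*log(n) - 2*loglik."""
    return n_params * math.log(n_obs) - 2.0 * loglik


# Sanity checks: the PMF is symmetric and sums to one (tail beyond 60 is negligible).
total = sum(syp_pmf(k, 0.3, 2.0) for k in range(-60, 61))
print(total, syp_pmf(3, 0.3, 2.0) == syp_pmf(-3, 0.3, 2.0))
```

As a quick arithmetic check, `aic(-193.62, 2)` evaluates to 391.24, which agrees with the Sy-$P$ row of Table 2 up to rounding of the log-likelihood.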

Figure 11 displays the fitted probability mass functions, whereas Figure 12 contains the Total Time on Test (TTT) plot, in which the empirical TTT curve is compared with the TTT curve implied by the fitted Sy-$P$ model. A close alignment of the curves indicates an adequate fit. Table 2 reports the numerical comparisons. The symmetric Poisson achieves the best (smallest) AIC and BIC, marginally improving upon the extended Poisson while preserving explicit symmetry around zero. The estimated parameters for Sy-$P$ are $\hat{θ} = 0.49$ and $\hat{λ} = 2.47$, consistent with a zero-centered, moderately dispersed pattern with a high concentration of probability near the origin. We also compute the percentile estimators of the parameters, fixing $p_{1} = 0.25$ and $p_{2} = 0.75$, which give $(\hat{θ}, \hat{λ}) = (0.41, 2.87)$. The favorable fit obtained for attendance differences directly reflects the strengths of the Sy-$P$ specification. These data are balanced around zero and moderately dispersed; the Sy-$P$ model captures these patterns through exact symmetry, flexible control of the zero mass via $θ$, and tail heredity from the Poisson magnitude. Such theoretical features translate into practical performance, as confirmed by the reduced AIC/BIC values relative to other symmetric alternatives.
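As a cross-check on the reported maximum likelihood estimates, the sketch below uses a reparametrization that is not described in the text and is offered only as an assumption-laden illustration. Writing $φ = P(Z \neq 0) = 2θ(1 - e^{-λ})$, the Sy-$P$ likelihood separates into a binomial part for the zero/non-zero split and a zero-truncated Poisson part for $|Z|$ given $Z \neq 0$, so $\hat{φ}$ is the non-zero proportion and $\hat{λ}$ solves $λ / (1 - e^{-λ}) = \overline{|z|}$ over the non-zero observations. The function name `syp_mle` and the symbol $φ$ are hypothetical, and the sketch covers only the interior case $\hat{θ} < 1/2$.

```python
import math

# Frequencies (value: count) tallied from the data listing in Section 7.2.
counts = {-5: 3, -4: 3, -3: 6, -2: 13, -1: 10, 0: 7, 1: 9,
          2: 5, 3: 10, 4: 2, 5: 6, 6: 1, 7: 1, 8: 1}


def syp_mle(freq):
    """Interior-case MLE of (theta, lam) via the zero / non-zero factorization."""
    n = sum(freq.values())
    n_nonzero = n - freq.get(0, 0)
    if n_nonzero == 0:
        raise ValueError("all observations are zero; lam is not identifiable")
    mean_abs = sum(abs(z) * c for z, c in freq.items()) / n_nonzero
    if mean_abs <= 1.0:
        raise ValueError("zero-truncated Poisson MLE needs mean |z| > 1")

    # Bisection for lam / (1 - exp(-lam)) = mean_abs. The left side is
    # increasing in lam and exceeds lam, so the root lies in (0, mean_abs).
    lo, hi = 1e-9, mean_abs
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid / -math.expm1(-mid) < mean_abs:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)

    phi = n_nonzero / n                       # estimate of P(Z != 0)
    theta = phi / (2.0 * -math.expm1(-lam))   # back-transform to theta
    if theta >= 0.5:
        raise ValueError("solution violates theta < 1/2; boundary case not handled")
    return theta, lam


theta_hat, lam_hat = syp_mle(counts)
print(round(theta_hat, 3), round(lam_hat, 3))
```

On these frequencies the sketch returns values close to the reported $\hat{θ} = 0.49$ and $\hat{λ} = 2.47$; a general-purpose optimizer would be needed when the solution falls on the boundary $θ = 1/2$.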

Figure 11. Empirical PMF and fitted Sy-$P(\hat{θ}, \hat{λ})$ for attendance increments.

Figure 12. TTT plot for attendance increments comparing the empirical curve with the fitted Sy-$P$ model. The diagonal line represents the exponential distribution reference.

Table 2. Model comparison for attendance increments (IDRAC, Lyon, 2012–2013).

Remark 11.

(i) The Sy-$P$ fit closely tracks the extended Poisson while enforcing exact symmetry and offering closed-form tools for sums/differences (Section 3.9). (ii) The estimate $\hat{θ} = 0.49$ lies near the upper boundary $1/2$, aligning with pronounced concentration at zero and balanced tails; convergence diagnostics were stable under maximum likelihood with MoM initialization.

8. Concluding Remarks

In this paper, we introduced the Sy-$Z$ family, a unified and tractable framework for symmetric integer-valued modeling based on a simple sign-magnitude decomposition. Writing $Z = XY$ with $X \sim \mathrm{SMB}(θ)$ and $Y \in \mathbb{N}_{0}$ as independent variables yields models that are exactly symmetric around zero, allow interpretable control of the atom at zero, and inherit tail behavior and dispersion from the chosen magnitude distribution. Within this framework, we derived closed-form expressions for the PMF and CDF, bilateral generating functions, and even-order moments; established a characterization by symmetry and sign-magnitude independence; and studied tail transfer, modality, and the equality in law between sums and differences of independent Sy-$Z$ variables.

Specializing to a Poisson magnitude leads to the Sy-Poisson model, for which we obtained explicit generating functions, moment identities, and entropy, and developed both method-of-moments and likelihood-based inference. Monte Carlo simulations showed that the maximum likelihood estimators exhibit small finite-sample bias and accurate Wald confidence-interval coverage, while the TTT plots and empirical applications in finance and education confirmed that Sy-Poisson can match or improve upon classical two-sided competitors on $\mathbb{Z}$. Beyond these case studies, the sign-magnitude structure suggests further applications in quality-control contexts, where signed deviations from target defect levels or specification limits arise naturally. Promising directions for future work include regression extensions, dependence modeling and time-series formulations, multivariate constructions, and the development of Sy-$Z$-based monitoring tools for quality-control problems.

Author Contributions

Conceptualization, H.S.B. and M.K.; methodology, M.K., H.S.B., H.S.S., S.R.B. and L.A.; software, M.K. and S.R.B.; validation, M.K., H.S.B., H.S.S. and S.R.B.; writing—original draft preparation, M.K., H.S.B., H.S.S. and S.R.B.; writing—review and editing, M.K., H.S.B., H.S.S., S.R.B., L.A. and A.F.D.; visualization, M.K., S.R.B., L.A. and A.F.D.; funding acquisition, L.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Deanship of Graduate Studies and Scientific Research at Najran University under the Easy Funding Program, grant code (NU/EFP/SERC/13/239).

Data Availability Statement

This paper’s application section includes a list of the data that were used, along with their citations.

Acknowledgments

The authors are thankful to the Deanship of Graduate Studies and Scientific Research at Najran University for funding this work under the Easy Funding Program grant code (NU/EFP/SERC/13/239).

Conflicts of Interest

The authors declare no potential conflicts of interest.

Appendix A

Proposition A1.

Let $θ \in (0, 1/2)$. Consider independent random variables $S \sim \mathrm{Bernoulli}(1/2)$ and $T \sim \mathrm{Bernoulli}(2θ)$. Define the random variable

$$X = \begin{cases} 0 & \text{if } T = 0, \\ 1 & \text{if } S = 1 \text{ and } T = 1, \\ -1 & \text{if } S = 0 \text{ and } T = 1. \end{cases}$$

Then $X \sim \mathrm{SMB}(θ)$, and its PMF is given by

$$P(X = 0) = 1 - 2θ, \quad P(X = 1) = P(X = -1) = θ.$$

Proof.

By the definition of $X$, we have

$$\{X = 0\} = \{T = 0\}, \quad \{X = 1\} = \{S = 1, T = 1\}, \quad \{X = -1\} = \{S = 0, T = 1\}.$$

Hence $P(X = 0) = P(T = 0) = 1 - 2θ$. Using the independence of $S$ and $T$,

$$P(X = 1) = P(S = 1) P(T = 1) = \frac{1}{2} \cdot 2θ = θ,$$

and similarly

$$P(X = -1) = P(S = 0) P(T = 1) = \frac{1}{2} \cdot 2θ = θ.$$

Thus $X$ has PMF

$$P(X = 0) = 1 - 2θ, \quad P(X = 1) = P(X = -1) = θ,$$

which is exactly $\mathrm{SMB}(θ)$. □
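The two-Bernoulli construction of Proposition A1 can be checked by simulation. The following Python sketch is a minimal illustration under assumed settings ($θ = 0.3$, a fixed seed, and the hypothetical function name `smb_from_bernoullis`); it compares empirical frequencies with the stated PMF.

```python
import random
from collections import Counter


def smb_from_bernoullis(theta: float, rng: random.Random) -> int:
    """Draw X from independent S ~ Bernoulli(1/2) and T ~ Bernoulli(2*theta)."""
    s = rng.random() < 0.5           # S = 1 with probability 1/2
    t = rng.random() < 2.0 * theta   # T = 1 with probability 2*theta
    if not t:
        return 0                     # X = 0 whenever T = 0
    return 1 if s else -1            # the sign is decided by S when T = 1


rng = random.Random(12345)           # fixed seed for reproducibility
theta = 0.3                          # illustrative value in (0, 1/2)
n = 200_000
freq = Counter(smb_from_bernoullis(theta, rng) for _ in range(n))

# Empirical frequencies next to the target masses theta, 1 - 2*theta, theta.
for x, target in ((-1, theta), (0, 1.0 - 2.0 * theta), (1, theta)):
    print(x, freq[x] / n, target)
```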

Proposition A2.

To generate a random variable $X \sim \mathrm{SMB}(θ)$:

1. Generate $U \sim \mathrm{Uniform}(0, 1)$.
2. Set

$$X = \begin{cases} 0 & \text{if } 0 \leq U < 1 - 2θ, \\ 1 & \text{if } 1 - 2θ \leq U < 1 - θ, \\ -1 & \text{if } 1 - θ \leq U < 1. \end{cases}$$

Proof.

Since $U \sim \mathrm{Uniform}(0, 1)$, for any interval $[a, b) \subset [0, 1)$ we have $P(a \leq U < b) = b - a$. By the construction,

$$\begin{aligned} \{X = 0\} &= \{0 \leq U < 1 - 2θ\}, \\ \{X = 1\} &= \{1 - 2θ \leq U < 1 - θ\}, \\ \{X = -1\} &= \{1 - θ \leq U < 1\}. \end{aligned}$$

Hence

$$\begin{aligned} P(X = 0) &= (1 - 2θ) - 0 = 1 - 2θ, \\ P(X = 1) &= (1 - θ) - (1 - 2θ) = θ, \\ P(X = -1) &= 1 - (1 - θ) = θ. \end{aligned}$$

Thus $X$ has PMF

$$P(X = 0) = 1 - 2θ, \quad P(X = 1) = P(X = -1) = θ,$$

which is exactly $\mathrm{SMB}(θ)$. □
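The two steps of Proposition A2 translate directly into an inverse-transform sampler, and multiplying the sign by an independent Poisson draw gives a Sy-$P$ variate through $Z = XY$. The Python sketch below is illustrative only: the function names are hypothetical, the Poisson draw uses Knuth's multiplication method (an assumption chosen to keep the example free of external libraries, suitable for small $λ$), and the parameter values are arbitrary.

```python
import math
import random


def sample_smb(theta: float, rng: random.Random) -> int:
    """Proposition A2: partition [0, 1) into intervals of length 1-2θ, θ, θ."""
    u = rng.random()                 # U ~ Uniform[0, 1)
    if u < 1.0 - 2.0 * theta:
        return 0
    if u < 1.0 - theta:
        return 1
    return -1


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small lam."""
    limit = math.exp(-lam)
    k, prod = 0, rng.random()
    while prod > limit:              # count uniforms until the product drops below e^{-lam}
        k += 1
        prod *= rng.random()
    return k


def sample_syp(theta: float, lam: float, rng: random.Random) -> int:
    """Sy-P draw as sign times magnitude, with independent components."""
    return sample_smb(theta, rng) * sample_poisson(lam, rng)


rng = random.Random(2025)            # fixed seed for reproducibility
theta, lam, n = 0.4, 2.0, 100_000    # illustrative values
draws = [sample_syp(theta, lam, rng) for _ in range(n)]

zero_freq = sum(1 for z in draws if z == 0) / n
zero_prob = 1.0 - 2.0 * theta * (1.0 - math.exp(-lam))   # P(Z = 0) under Z = XY
print(zero_freq, zero_prob, sum(draws) / n)
```

The empirical zero frequency should be close to $1 - 2θ(1 - e^{-λ})$ and the sample mean close to zero, reflecting the symmetry of the construction.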

References

1. Skellam, J.G. The frequency distribution of the difference between two Poisson variates belonging to different populations. J. R. Stat. Soc. Ser. A 1946, 109, 296.
2. Inusah, S.; Kozubowski, T.J. A discrete analogue of the Laplace distribution. J. Stat. Plan. Inference 2006, 136, 1090–1102.
3. Barbiero, A. An alternative discrete skew Laplace distribution. Stat. Methodol. 2014, 16, 47–67.
4. Sangpoom, S.; Bodhisuwan, W. The discrete asymmetric Laplace distribution. J. Stat. Theory Pract. 2016, 10, 73–86.
5. Roy, D. The discrete normal distribution. Commun. Stat. Theory Methods 2003, 32, 1871–1883.
6. Bapat, S.R.; Bakouch, H.; Chesneau, C. A distribution on Z via perturbing the Laplace distribution with applications to finance and health data. STAT 2023, 12, e535.
7. Chakraborty, S.; Chakravarty, D. A new discrete probability distribution with integer support on (−∞,∞). Commun. Stat. Theory Methods 2016, 45, 492–505.
8. Ong, S.H.; Shimizu, K.; Choung, M.N. A class of distribution arising from difference of two random variables. Comput. Stat. Data Anal. 2008, 52, 1490–1499.
9. Karlis, D.; Ntzoufras, I. Analysis of sports data using bivariate Poisson models. Statistician 2003, 52, 381–393.
10. Johnson, N.L.; Kemp, A.W.; Kotz, S. Univariate Discrete Distributions, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2005.
11. Cameron, A.C.; Trivedi, P.K. Regression Analysis of Count Data, 2nd ed.; Cambridge University Press: Cambridge, UK, 2013.
12. Lambert, D. Zero-inflated Poisson regression, with an application to defects in manufacturing. Technometrics 1992, 34, 1–14.
13. Bhati, D.; Chakraborty, S.; Lateef, S.G. A discrete probability model suitable for both symmetric and asymmetric count data. Filomat 2020, 34, 2559–2572.
14. Chesneau, C.; Pakyari, R.; Kohansal, A.; Bakouch, H.S. Estimation and prediction under different schemes for a flexible symmetric distribution with applications. J. Math. 2024, 2024, 6517277.
15. Chesneau, C.; Bakouch, H.S.; Tomy, L.; Veena, G. A new discrete distribution on integers: Analytical and applied study on stock exchange and flood data. J. Stat. Manag. Syst. 2022, 25, 1899–1917.
16. Tomy, L.; Veena, G. A retrospective study on Skellam and related distributions. Austrian J. Stat. 2022, 51, 1102–1111.
17. Karlis, D.; Mamode Khan, N. Models for integer data. Annu. Rev. Stat. Its Appl. 2023, 10, 297–323.
18. Bakouch, H.S.; Kachour, M.; Nadarajah, S. An extended Poisson distribution. Commun. Stat. Theory Methods 2016, 45, 6746–6764.
19. Abramowitz, M.; Stegun, I.A. (Eds.) Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables; Dover Publications: New York, NY, USA, 1965.
20. Rohatgi, V.K.; Saleh, A.K. An Introduction to Probability and Statistics, 2nd ed.; John Wiley and Sons: Hoboken, NJ, USA, 2000.
21. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2021; Available online: https://www.R-project.org/ (accessed on 1 November 2025).
22. Raftery, A.E. Bayesian model selection in social research. Sociol. Methodol. 1995, 25, 111–163.

Table 1. Model comparison for PTT price increments.

Model	Fitted Parameters	Log-Likelihood	AIC	BIC
$PDL(p, α)$	$\hat{p} = 0.74$, $\hat{α} = 0.67$	−388.17	780.35	786.15
$DL(p)$	$\hat{p} = 0.59$	−391.52	785.04	792.85
$DN(μ, σ)$	$\hat{μ} = 0.32$, $\hat{σ} = 1.71$	−412.15	828.29	834.11
$DALD(μ, β, λ)$	$\hat{μ} = 1$, $\hat{β} = 0.73$, $\hat{λ} = 0.72$	−409.67	825.34	834.05
Sy-$P(θ, λ)$	$\hat{θ} = 0.45$, $\hat{λ} = 3.71$	−385.32	772.64	778.91

Table 2. Model comparison for attendance increments (IDRAC, Lyon, 2012–2013).

Model	Fitted Parameters	Log-Likelihood	AIC	BIC
$DL(p)$	$\hat{p} = 0.73$	−203.54	409.08	411.43
$PDL(p, α)$	$\hat{p} = 0.65$, $\hat{α} = 0.49$	−202.94	409.89	414.57
$DN(μ, σ)$	$\hat{μ} = 0.31$, $\hat{σ} = 1.73$	−195.10	394.21	398.88
E-$P(p, λ)$	$\hat{p} = 0.51$, $\hat{λ} = 2.45$	−193.64	391.28	395.96
Sy-$P(θ, λ)$	$\hat{θ} = 0.49$, $\hat{λ} = 2.47$	−193.62	391.25	395.93

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
