MDPI - Publisher of Open Access Journals

18 pages, 1357 KB

Open Access Article

Zero-Inflated Data Analysis Using Graph Neural Networks with Convolution

by Sunghae Jun

Computers 2026, 15(2), 104; https://doi.org/10.3390/computers15020104 - 2 Feb 2026

Viewed by 309

Zero-inflated count data are characterized by an excessive frequency of zeros that cannot be adequately analyzed by a single distribution, such as Poisson or negative binomial. This problem is pervasive in many practical applications, including document–keyword matrices derived from text corpora, where most keyword frequencies are zero. Conventional statistical approaches, such as the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models, explicitly separate a structural zero component from a count component, but they typically assume independent observations and can be unstable when covariates are high-dimensional and sparse. To address these limitations, this paper proposes a graph-based zero-inflated learning framework that combines simple graph convolution (SGC) with zero-inflated count regression heads such as ZIP and ZINB. We first construct an observation graph by connecting similar samples, and then apply SGC to propagate and smooth features over the graph, producing convolutional representations that incorporate neighborhood information while remaining computationally lightweight. The resulting representations are used as covariates in ZIP and ZINB heads, which preserve probabilistic interpretability through maximum likelihood learning. Our experiments on simulated zero-inflated datasets with controlled zero ratios demonstrate that the proposed ZIP+SGC and ZINB+SGC models consistently reduce prediction errors compared with their non-graph baselines, as measured by mean absolute error and root mean squared error. Overall, the proposed approach provides an efficient and interpretable way to integrate graph neural computation with zero-inflated modeling for sparse count prediction problems. Full article
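
The feature-propagation step mentioned in this abstract can be illustrated with a minimal sketch. It assumes the commonly used SGC operator S = D^(-1/2) (A + I) D^(-1/2), applied k times to the feature matrix. The function name `sgc_propagate`, the dense list-of-lists representation, and the toy two-node graph are illustrative assumptions and are not taken from the paper.

```python
import math


def sgc_propagate(adj, features, k=2):
    """Return S^k X, where S = D^(-1/2) (A + I) D^(-1/2).

    adj      : square 0/1 adjacency matrix with a zero diagonal (assumed)
    features : one feature row per node
    k        : number of propagation steps
    """
    n = len(adj)
    # Add self-loops so every node keeps part of its own signal.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    # Symmetric degree normalisation of the self-looped adjacency.
    s = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    h = [row[:] for row in features]
    n_cols = len(h[0])
    for _ in range(k):
        h = [[sum(s[i][m] * h[m][c] for m in range(n)) for c in range(n_cols)]
             for i in range(n)]
    return h


# Toy graph: two connected observations with one feature each.
smoothed = sgc_propagate([[0, 1], [1, 0]], [[1.0], [3.0]], k=1)
print(smoothed)
```

In a pipeline of the kind the abstract describes, the smoothed rows would then serve as covariates for a ZIP or ZINB regression head; that step is not shown here.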

16 pages, 336 KB

Open Access Article

Bayesian Neural Networks with Regularization for Sparse Zero-Inflated Data Modeling

by Sunghae Jun

Information 2026, 17(1), 81; https://doi.org/10.3390/info17010081 - 13 Jan 2026

Viewed by 401

Abstract

Zero inflation is pervasive across text mining, event-log, and sensor analytics, and it often degrades the predictive performance of analytical models. Classical approaches, most notably the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models, address excess zeros but rely on rigid parametric assumptions and fixed model structures, which can limit flexibility in high-dimensional, sparse settings. We propose a Bayesian neural network (BNN) with regularization for sparse zero-inflated data modeling. The method separately parameterizes the zero inflation probability and the count intensity under ZIP/ZINB likelihoods, while employing Bayesian regularization to induce sparsity and control overfitting. Posterior inference is performed using variational inference. We evaluate the approach through controlled simulations with varying zero ratios and a real-world dataset, and we compare it against Poisson generalized linear models, ZIP, and ZINB baselines. The present study focuses on predictive performance measured by mean squared error (MSE). Across all settings, the proposed method achieves consistently lower prediction error and improved handling of uncertainty, with ablation studies confirming the contribution of the regularization components. These results demonstrate that a regularized BNN provides a flexible and robust framework for sparse zero-inflated data analysis in information-rich environments. Full article
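
As a rough illustration of what "separately parameterizes the zero inflation probability and the count intensity" can mean under a ZIP likelihood, the sketch below evaluates the ZIP log-likelihood from two unconstrained parameters. The names `logit_pi` and `log_lam` are assumptions made for this example; in a neural model they would be outputs of two heads, which is not shown, and nothing here reproduces the paper's Bayesian regularization or variational inference.

```python
import math


def zip_log_likelihood(counts, logit_pi, log_lam):
    """Log-likelihood of counts under a zero-inflated Poisson model.

    logit_pi : unconstrained parameter for the structural-zero probability
    log_lam  : unconstrained parameter for the Poisson rate
    """
    pi = 1.0 / (1.0 + math.exp(-logit_pi))   # sigmoid keeps pi in (0, 1)
    lam = math.exp(log_lam)                  # exp keeps the rate positive
    total = 0.0
    for y in counts:
        if y == 0:
            # A zero is either structural or a Poisson zero.
            total += math.log(pi + (1.0 - pi) * math.exp(-lam))
        else:
            # A positive count must come from the Poisson component.
            total += (math.log(1.0 - pi) - lam + y * math.log(lam)
                      - math.lgamma(y + 1))
    return total


print(zip_log_likelihood([0, 0, 0, 1, 2, 0, 4], logit_pi=0.0, log_lam=0.5))
```

Working with a logit and a log rate keeps both parameters unconstrained, which is convenient for gradient-based fitting.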

(This article belongs to the Special Issue Feature Papers in Information in 2024–2025)

16 pages, 2700 KB

Open Access Article

Spatio-Temporal Distribution of Setipinna taty Resources Using a Zero-Inflated Model in the Offshore Waters of Southern Zhejiang, China

by Xiaoxue Liu, Wen Ma, Jin Ma, Chunxia Gao, Weifeng Chen and Jing Zhao

J. Mar. Sci. Eng. 2026, 14(1), 96; https://doi.org/10.3390/jmse14010096 - 3 Jan 2026

Viewed by 347

Abstract

Effective fishery management in coastal waters requires accurate assessments of species–environment relationships, particularly in data-rich but zero-inflated contexts (i.e., datasets with an excess of zero catches). Here, we used fishery-independent trawl survey data collected from 2018 to 2019 in the offshore waters of southern Zhejiang Province of China to investigate the spatio-temporal distribution of Setipinna taty (scaly hairfin anchovy) and its environmental determinants. Given the high frequency of zero catches, we fitted both zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models and selected the best-performing approach using the Akaike information criterion (AIC). Cross-validation indicated that the ZINB model (RMSE: 199.1, R²: 0.25) outperformed the ZIP model (RMSE: 239.4, R²: 0.23). Temperature, depth, and salinity were key predictors of S. taty abundance, which generally occurred at depths of 20–40 m and salinities of 26–34 psu. We then applied the optimal ZINB model to predict S. taty distributions in spring, summer, and autumn of 2020. The predictions indicated a summer peak in abundance and a nearshore-to-offshore decreasing gradient, and were broadly consistent with the spatial distribution trends observed in the 2020 survey data. The highest predicted densities were located in nearshore areas off Wenzhou and Taizhou, west of 122° E. By clarifying the key environmental factors shaping S. taty distribution and applying zero-inflated count models to account for an excess of zero catches, which occur more frequently than expected under standard negative binomial models, this study provides an improved basis for effective conservation and sustainable utilization of S. taty resources in the southern offshore waters of Zhejiang; nevertheless, predictive performance could be further improved by incorporating additional environmental and biotic covariates together with extended spatio-temporal data. Full article
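
The model-selection logic described here (AIC for choosing between ZIP and ZINB, RMSE for cross-validated error) can be sketched as follows. The log-likelihood values and parameter counts in `candidates` are made-up placeholders, not results from the study; only the formulas AIC = 2k - 2 log L and the usual RMSE definition are standard.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical (log-likelihood, number of parameters) pairs for illustration.
candidates = {"ZIP": (-1250.0, 8), "ZINB": (-1100.0, 9)}
scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

ZINB carries one extra dispersion parameter, so AIC favors it only when the gain in log-likelihood outweighs that penalty.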

(This article belongs to the Section Marine Ecology)

30 pages, 539 KB

Open Access Article

Symmetric Discrete Distributions on the Integer Line: A Versatile Family and Applications

by Lamia Alyami, Hugo S. Salinas, Hassan S. Bakouch, Maher Kachour, Amira F. Daghestani and Sudeep R. Bapat

Symmetry 2025, 17(12), 2148; https://doi.org/10.3390/sym17122148 - 13 Dec 2025

Viewed by 389

Abstract

We introduce the Symmetric-Z (Sy-Z) family, a unified class of symmetric discrete distributions on the integers obtained by multiplying a three-point symmetric sign variable by an independent non-negative integer-valued magnitude. This sign-magnitude construction yields interpretable, zero-centered models with tunable mass at zero and dispersion balanced across signs, making them suitable for outcomes, such as differences of counts or discretized return increments. We derive general distributional properties, including closed-form expressions for the probability mass and cumulative distribution functions, bilateral generating functions, and even moments, and show that the tail behavior is inherited from the magnitude component. A characterization by symmetry and sign–magnitude independence is established and a distinctive operational feature is proved: for independent members of the family, the sum and the difference have the same distribution. As a central example, we study the symmetric Poisson model, providing measures of skewness, kurtosis, and entropy, together with estimation via the method of moments and maximum likelihood. Simulation studies assess finite-sample performance of the estimators, and applications to datasets from finance and education show improved goodness-of-fit relative to established integer-valued competitors. Overall, the Sy-Z framework offers a mathematically tractable and interpretable basis for modeling symmetric integer-valued outcomes across diverse domains. Full article
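
The sign-magnitude construction and the sum/difference property can be checked numerically with a small sketch. It assumes a sign variable with P(S = 1) = P(S = -1) = q and P(S = 0) = 1 - 2q, and a Poisson magnitude; the parameter name `q`, the truncation at |k| <= 60, and the helper names are assumptions made for illustration and may differ from the paper's parameterization.

```python
import math


def poisson_pmf(m, lam):
    """Poisson probability of m events with rate lam > 0."""
    return math.exp(-lam + m * math.log(lam) - math.lgamma(m + 1))


def symmetric_poisson_pmf(k, q, lam):
    """P(X = k) for X = S * M with S in {-1, 0, 1} and M ~ Poisson(lam)."""
    if k == 0:
        # Zero arises when the sign is zero or the magnitude is zero.
        return (1.0 - 2.0 * q) + 2.0 * q * poisson_pmf(0, lam)
    return q * poisson_pmf(abs(k), lam)


def pmf_table(q, lam, n=60):
    """Tabulate the pmf on the truncated support -n..n."""
    return {k: symmetric_poisson_pmf(k, q, lam) for k in range(-n, n + 1)}


def combine(p1, p2, sign):
    """Distribution of X1 + sign * X2 for independent X1 ~ p1, X2 ~ p2."""
    out = {}
    for a, pa in p1.items():
        for b, pb in p2.items():
            key = a + sign * b
            out[key] = out.get(key, 0.0) + pa * pb
    return out


p1 = pmf_table(0.3, 2.0)
p2 = pmf_table(0.2, 3.5)
sum_dist = combine(p1, p2, +1)
diff_dist = combine(p1, p2, -1)
max_gap = max(abs(sum_dist[k] - diff_dist[k]) for k in sum_dist)
print(max_gap)
```

The agreement follows from symmetry: because X2 and -X2 have the same distribution, X1 - X2 and X1 + X2 do as well.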

(This article belongs to the Special Issue Skewed (Asymmetrical) Probability Distributions and Applications Across Disciplines, Fourth Edition)

25 pages, 2764 KB

Open Access Article

Integrated Quality Inspection and Production Run Optimization for Imperfect Production Systems with Zero-Inflated Non-Homogeneous Poisson Deterioration

by Chih-Chiang Fang and Ming-Nan Chen

Mathematics 2025, 13(24), 3901; https://doi.org/10.3390/math13243901 - 5 Dec 2025

Cited by 1 | Viewed by 450

Abstract

This study develops an integrated quality inspection and production optimization framework for an imperfect production system, where system deterioration follows a zero-inflated non-homogeneous Poisson process (ZI-NHPP) characterized by a power-law intensity function. Parameters are estimated from historical data using the Expectation-Maximization (EM) algorithm, with a zero-inflation parameter π modeling scenarios in which the system remains defect-free. Operating in either an in-control or out-of-control state, the system produces products with Weibull hazard rates, exhibiting higher failure rates in the out-of-control state. The proposed model integrates system status, defect rates, employee efficiency, and market demand to jointly optimize the number of conforming items inspected and the production run length, thereby minimizing total costs, including production, inspection, correction, inventory, and warranty expenses. Numerical analyses, supported by sensitivity studies, validate the effectiveness of this integrated approach in achieving cost-efficient quality control. This framework enhances quality assurance and production management, offering practical insights for manufacturing across diverse industries. Full article
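
A minimal sketch of a zero-inflated NHPP with power-law intensity is given below. It assumes the intensity λ(t) = a·b·t^(b-1), so that the cumulative intensity is m(t) = a·t^b, and a mixing probability `pi` for a defect-free system. The symbols `a` and `b` and the function names are assumptions for this example; the paper may use a different parameterization of the power law.

```python
import math


def power_law_mean(t, a, b):
    """Cumulative intensity m(t) = a * t**b for lambda(u) = a*b*u**(b-1)."""
    return a * t ** b


def zi_nhpp_pmf(n, t, pi, a, b):
    """P(N(t) = n): with probability pi the count is a structural zero."""
    m = power_law_mean(t, a, b)
    if m > 0:
        poisson = math.exp(-m + n * math.log(m) - math.lgamma(n + 1))
    else:
        poisson = 1.0 if n == 0 else 0.0
    return (pi if n == 0 else 0.0) + (1.0 - pi) * poisson


def zi_nhpp_expected(t, pi, a, b):
    """E[N(t)] = (1 - pi) * m(t)."""
    return (1.0 - pi) * power_law_mean(t, a, b)


print(zi_nhpp_pmf(0, 10.0, 0.3, 0.5, 1.2), zi_nhpp_expected(10.0, 0.3, 0.5, 1.2))
```

With b > 1 the intensity grows over time, which corresponds to a system that deteriorates as the production run lengthens.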

(This article belongs to the Section C: Mathematical Analysis)

16 pages, 522 KB

Open Access Article

Zero-Inflated Text Data Analysis Using Imbalanced Data Sampling and Statistical Models

by Sunghae Jun

Computers 2025, 14(12), 527; https://doi.org/10.3390/computers14120527 - 2 Dec 2025

Viewed by 528

Abstract

Text data often exhibit high sparsity and zero inflation, where a substantial proportion of entries in the document–keyword matrix are zeros. This characteristic presents challenges to traditional count-based models, which may suffer from reduced predictive accuracy and interpretability in the presence of excessive zeros and overdispersion. To overcome this issue, we propose an effective analytical framework that integrates imbalanced data handling by undersampling with classical probabilistic count models. Specifically, we apply Poisson generalized linear models, zero-inflated Poisson, and zero-inflated negative binomial models to analyze zero-inflated text data while preserving the statistical interpretability of term-level counts. The framework is evaluated using both real-world patent documents and simulated datasets. Empirical results demonstrate that our undersampling-based approach improves the model fit without modifying the downstream models. This study contributes a practical preprocessing strategy for enhancing zero-inflated text analysis and offers insights into model selection and data balancing techniques for sparse count data. Full article

30 pages, 1354 KB

Open Access Article

Driving Behavior and Insurance Pricing: A Framework for Analysis and Some Evidence from Italian Data Using Zero-Inflated Poisson (ZIP) Models

by Paola Fersini, Michele Longo and Giuseppe Melisi

Risks 2025, 13(11), 214; https://doi.org/10.3390/risks13110214 - 3 Nov 2025

Viewed by 2737

Abstract

Usage-Based Insurance (UBI), also referred to as telematics-based insurance, has been experiencing a growing global diffusion. In addition to being well established in countries such as Italy, the United States, and the United Kingdom, UBI adoption is also accelerating in emerging markets such as Japan, South Africa, and Brazil. In Japan, telematics insurance has shown significant growth in recent years, with a steadily increasing subscription rate. In South Africa, UBI adoption ranks among the highest worldwide, with market penetration placing the country among the top three globally, just after the United States and Italy. In Brazil, UBI adoption is expanding, supported by government initiatives promoting road safety and innovation in the insurance sector. According to a MarketsandMarkets report of February 2025, the global UBI market is expected to grow from USD 43.38 billion in 2023 to USD 70.46 billion by 2030, with a compound annual growth rate (CAGR) of 7.2% over the forecast period. This growth is driven by the increasing adoption of both electric and internal combustion vehicles equipped with integrated telematics systems, which enable insurers to collect data on driving behavior and to tailor insurance premiums accordingly. In this paper, we analyze a large dataset consisting of trips recorded over five years from 100,000 policyholders across the Italian territory through the installation of black-box devices. Using univariate and multivariate statistical analyses, as well as Generalized Linear Models (GLMs) with Zero-Inflated Poisson distribution, we examine claims frequency and assess the relevance of various synthetic indicators of driving behavior, with the aim of identifying those that are most significant for insurance pricing. Full article

(This article belongs to the Special Issue Innovations in Non-Life Insurance Pricing and Reserving)

21 pages, 1332 KB

Open Access Article

The Ridge-Hurdle Negative Binomial Regression Model: A Novel Solution for Zero-Inflated Counts in the Presence of Multicollinearity

by HM Nayem and B. M. Golam Kibria

Stats 2025, 8(4), 102; https://doi.org/10.3390/stats8040102 - 1 Nov 2025

Viewed by 1590

Abstract

Datasets with many zero outcomes are common in real-world studies and often exhibit overdispersion and strong correlations among predictors, creating challenges for standard count models. Traditional approaches such as the Zero-Inflated Poisson (ZIP), Zero-Inflated Negative Binomial (ZINB), and Hurdle models can handle extra zeros and overdispersion but struggle when multicollinearity is present. This study introduces the Ridge-Hurdle Negative Binomial model, which incorporates L₂ regularization into the truncated count component of the hurdle framework to jointly address zero inflation, overdispersion, and multicollinearity. Monte Carlo simulations under varying sample sizes, predictor correlations, and levels of overdispersion and zero inflation show that Ridge-Hurdle NB consistently achieves the lowest mean squared error (MSE) compared to ZIP, ZINB, Hurdle Poisson, Hurdle Negative Binomial, Ridge ZIP, and Ridge ZINB models. Applications to the Wildlife Fish and Medical Care datasets further confirm its superior predictive performance, highlighting RHNB as a robust and efficient solution for complex count data modeling. Full article
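
To make the idea of "L₂ regularization in the truncated count component" concrete, the sketch below writes down a ridge-penalized negative log-likelihood for a zero-truncated negative binomial with a log link. The mean/size parameterization of the negative binomial, the choice to leave the intercept (first coefficient) unpenalized, and all function names are assumptions for this illustration, not the authors' estimator.

```python
import math


def nb_log_pmf(y, mu, r):
    """Log pmf of a negative binomial with mean mu and size (dispersion) r."""
    return (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
            + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))


def truncated_nb_log_pmf(y, mu, r):
    """Log pmf of the zero-truncated negative binomial, defined for y >= 1."""
    nb_zero = math.exp(nb_log_pmf(0, mu, r))
    return nb_log_pmf(y, mu, r) - math.log(1.0 - nb_zero)


def ridge_truncated_nb_objective(y_pos, x_pos, beta, r, ridge):
    """Penalized negative log-likelihood for the positive counts only.

    y_pos : positive counts (the hurdle's zero part is modeled separately)
    x_pos : covariate rows, with a leading 1.0 for the intercept
    beta  : regression coefficients under a log link
    ridge : L2 penalty weight, applied to all coefficients but the intercept
    """
    nll = 0.0
    for y, x in zip(y_pos, x_pos):
        mu = math.exp(sum(b * v for b, v in zip(beta, x)))
        nll -= truncated_nb_log_pmf(y, mu, r)
    return nll + ridge * sum(b * b for b in beta[1:])


rows = [[1.0, 0.5], [1.0, -1.0]]
print(ridge_truncated_nb_objective([1, 4], rows, [0.2, 0.3], 2.0, 1.0))
```

In a hurdle model the zero/non-zero split is fitted as a separate binary component, so the penalty above only shrinks the coefficients that govern the size of positive counts.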

(This article belongs to the Section Statistical Methods)

17 pages, 3465 KB

Open Access Article

Longitudinal Gut Microbiome Changes Associated with Transitions from C. difficile Negative to C. difficile Positive on Surveillance Tests

by L. Silvia Munoz-Price, Samantha N. Atkinson, Vy Lam, Blake Buchan, Nathan Ledeboer, Nita H. Salzman and Amy Y. Pan

Microorganisms 2025, 13(10), 2277; https://doi.org/10.3390/microorganisms13102277 - 29 Sep 2025

Viewed by 810

Abstract

Clostridioides difficile is an obligate anaerobe and is primarily transmitted via the fecal–oral route. Data characterizing the microbiome changes accompanying transitions from non-colonized to C. difficile colonized subjects are currently lacking. In this retrospective cohort study, we examined 16S rRNA gene sequencing data in a total of 481 fecal samples belonging to 107 patients. Based on C. difficile status over time, patients were categorized as Negative-to-Positive, Negative Control, and Positive Control. A linear mixed effects model was fitted to investigate the changes in the Shannon α-diversity index over time. Zero-inflated negative binomial/Poisson mixed effects models or generalized linear mixed models with negative binomial/Poisson distribution were used to investigate the changes in taxon counts over time among different groups. A total of 107 patients were eligible for the study. The median number of stool samples per patient was 3 (IQR 2–4). A total of 42 patients transitioned from C. difficile negative to positive (Negative-to-Positive), 47 patients remained negative throughout their tests (Negative Control) and 18 were always C. difficile positive (Positive Control). A significant difference in microbiome composition between the last negative samples and the first positive samples was shown in Negative-to-Positive patients, ANOSIM p = 0.022. In Negative-to-Positive patients, the phylum Pseudomonadota and family Enterobacteriaceae increased significantly in the first positive samples compared to the last negative samples, p = 0.0075 and p = 0.0094, respectively. Within the first 21 days, Actinomycetota decreased significantly over time in the Positive Control group compared to the other two groups (p < 0.001) while Bacillota decreased in both the Negative-to-Positive group and Positive Control. These results demonstrate that the transition from C. difficile negative to C. difficile positive is associated with alterations in gut microbial communities and their compositional patterns over time. Moreover, these changes play an important role in both the emergence and intensification of the gut microbiome dysbiosis in patients who transitioned from C. difficile negative to positive and those who always tested positive. Full article

(This article belongs to the Special Issue The Microbiome in Ecosystems)

32 pages, 1288 KB

Open Access Article

Random Forest Adaptation for High-Dimensional Count Regression

by Oyebayo Ridwan Olaniran, Saidat Fehintola Olaniran, Ali Rashash R. Alzahrani, Nada MohammedSaeed Alharbi and Asma Ahmad Alzahrani

Mathematics 2025, 13(18), 3041; https://doi.org/10.3390/math13183041 - 21 Sep 2025

Cited by 2 | Viewed by 1484

Abstract

The analysis of high-dimensional count data presents a unique set of challenges, including overdispersion, zero-inflation, and complex nonlinear relationships that traditional generalized linear models and standard machine learning approaches often fail to adequately address. This study introduces and validates a novel Random Forest framework specifically developed for high-dimensional Poisson and Negative Binomial regression, designed to overcome the limitations of existing methods. Through comprehensive simulations and a real-world genomic application to the Norwegian Mother and Child Cohort Study, we demonstrate that the proposed methods achieve superior predictive accuracy, quantified by lower root mean squared error and deviance, and, critically, produce exceptionally stable and interpretable feature selections. Our theoretical and empirical results show that these distribution-optimized ensembles significantly outperform both penalized-likelihood techniques and naive-transformation-based ensembles in balancing statistical robustness with biological interpretability. The study concludes that the proposed frameworks provide a crucial methodological advancement, offering a powerful and reliable tool for extracting meaningful insights from complex count data in fields ranging from genomics to public health. Full article

(This article belongs to the Special Issue Statistics for High-Dimensional Data)

23 pages, 575 KB

Open Access Article

A Comparison of the Robust Zero-Inflated and Hurdle Models with an Application to Maternal Mortality

by Phelo Pitsha, Raymond T. Chiruka and Chioneso S. Marange

Math. Comput. Appl. 2025, 30(5), 95; https://doi.org/10.3390/mca30050095 - 2 Sep 2025

Cited by 1 | Viewed by 2544

Abstract

This study evaluates the performance of count regression models in the presence of zero inflation, outliers, and overdispersion using both simulated data and a real-world maternal mortality dataset. Traditional Poisson and negative binomial regression models often struggle to account for the complexities introduced by excess zeros and outliers. To address these limitations, this study compares the performance of robust zero-inflated (RZI) and robust hurdle (RH) models against conventional models using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) to determine the best-fitting model. Results indicate that the robust zero-inflated Poisson (RZIP) model performs best overall. The simulation study considers various scenarios, including different levels of zero inflation (50%, 70%, and 80%), outlier proportions (0%, 5%, 10%, and 15%), dispersion values (1, 3, and 5), and sample sizes (50, 200, and 500). Based on AIC comparisons, the RZIP and robust hurdle Poisson (RHP) models demonstrate superior performance when outliers are absent or limited to 5%, particularly when dispersion is low (5). However, as outlier levels and dispersion increase, the robust zero-inflated negative binomial (RZINB) and robust hurdle negative binomial (RHNB) models outperform RZIP and RHP across all levels of zero inflation and sample sizes considered in the study. Full article

15 pages, 358 KB

Open Access Feature Paper Article

Multi-Task CNN-LSTM Modeling of Zero-Inflated Count and Time-to-Event Outcomes for Causal Inference with Functional Representation of Features

by Jong-Min Kim

Axioms 2025, 14(8), 626; https://doi.org/10.3390/axioms14080626 - 11 Aug 2025

Cited by 1 | Viewed by 1393

Abstract

We propose a novel deep learning framework for counterfactual inference on the COMPAS dataset, utilizing a multi-task CNN-LSTM architecture. The model jointly predicts multiple outcome types: (i) count outcomes with zero inflation, modeled using zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), and negative binomial (NB) distributions; (ii) time-to-event outcomes, modeled via the Cox proportional hazards model. To effectively leverage the structure in high-dimensional tabular data, we integrate functional data analysis (FDA) techniques by transforming covariates into smooth functional representations using B-spline basis expansions. Specifically, we construct a pseudo-temporal index over predictor variables and fit basis expansions to each subject’s feature vector, yielding a low-dimensional set of coefficients that preserve smooth variation while reducing noise. This functional representation enables the CNN-LSTM model to capture both local and global temporal patterns in the data, including treatment-covariate interactions. Our approach estimates both population-average and individual-level treatment effects (ATE and CATE) for each outcome and evaluates predictive performance using metrics such as Poisson deviance, root mean squared error (RMSE), and the concordance index (C-index). Statistical inference on treatment effects is supported via bootstrap-based confidence intervals and hypothesis testing. Overall, this comprehensive framework facilitates flexible modeling of heterogeneous treatment effects in structured, high-dimensional data, advancing causal inference methodologies in criminal justice and related domains. Full article

(This article belongs to the Special Issue Functional Data Analysis and Its Application)

19 pages, 539 KB

Open Access Feature Paper Article

Maximum-Likelihood Estimation for the Zero-Inflated Polynomial-Adjusted Poisson Distribution

by Jong-Seung Lee and Hyung-Tae Ha

Mathematics 2025, 13(15), 2383; https://doi.org/10.3390/math13152383 - 24 Jul 2025

Viewed by 879

Abstract

We propose the zero-inflated Polynomially Adjusted Poisson (zPAP) model. It extends the usual zero-inflated Poisson by multiplying the Poisson kernel with a nonnegative polynomial, enabling the model to handle extra zeros, overdispersion, skewness, and even multimodal counts. We derive the maximum-likelihood framework—including the log-likelihood and score equations under both general and regression settings—and fit zPAP to the zero-inflated, highly dispersed Fish Catch data as well as a synthetic bimodal mixture. In both cases, zPAP not only outperforms the standard zero-inflated Poisson model but also yields reliable inference via parametric bootstrap confidence intervals. Overall, zPAP is a clear and tractable tool for real-world count data with complex features. Full article
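
One simple reading of "multiplying the Poisson kernel with a nonnegative polynomial" is sketched below: the Poisson pmf is reweighted by a squared polynomial (squaring is an assumption used here only to guarantee nonnegativity), renormalized numerically over a truncated support, and then mixed with a point mass at zero. The function names, the coefficient convention, and the truncation at `y_max` are all illustrative assumptions; the paper's exact construction and normalization may differ.

```python
import math


def poly_weight(y, coeffs):
    """Nonnegative weight: square of c0 + c1*y + c2*y**2 + ... (assumed form)."""
    return sum(c * y ** j for j, c in enumerate(coeffs)) ** 2


def zpap_pmf_table(pi, lam, coeffs, y_max=200):
    """Tabulate a zero-inflated, polynomially reweighted Poisson pmf on 0..y_max."""
    kernel = [
        poly_weight(y, coeffs)
        * math.exp(-lam + y * math.log(lam) - math.lgamma(y + 1))
        for y in range(y_max + 1)
    ]
    norm = sum(kernel)                      # numerical normalizing constant
    table = [(1.0 - pi) * k / norm for k in kernel]
    table[0] += pi                          # extra point mass at zero
    return table


# A weight that vanishes at y = 4 splits the count part into two humps.
bimodal = zpap_pmf_table(pi=0.2, lam=4.0, coeffs=[-4.0, 1.0])
print([round(p, 4) for p in bimodal[:9]])
```

With a constant polynomial (for example `coeffs=[1.0]`) the table reduces to the ordinary zero-inflated Poisson, which gives a quick sanity check on the construction.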

(This article belongs to the Special Issue Statistical Theory and Application, 2nd Edition)

17 pages, 343 KB

Open Access Article

On the Conflation of Poisson and Logarithmic Distributions with Applications

by Abdulhamid A. Alzaid, Anfal A. Alqefari and Najla Qarmalah

Axioms 2025, 14(7), 518; https://doi.org/10.3390/axioms14070518 - 6 Jul 2025

Cited by 1 | Viewed by 915

Abstract

Real-life count data frequently show inflation at small values; however, most of the well-known count distributions cannot capture such a feature. The present paper introduces a new distribution for modeling count data inflated at small values, based on a conflation of distributions approach. The new distribution inherits some properties from the Poisson distribution (PD) and the logarithmic distribution (LD), making it a powerful modeling tool. It can serve as an alternative to PD, LD, and zero-truncated distributions. The new distribution is worth considering theoretically, as it belongs to the weighted PD family. With zero as a support point, two additional models are suggested for the new distribution. These modifications yield distributions that accommodate overdispersion in a manner comparable to the negative binomial distribution (NBD) while retaining essential PD properties, making them suitable for accurately representing count data with frequent events of low frequency and high variance. Furthermore, we discuss the superior performance of the three new distributions in modeling real count data compared to traditional count distributions such as PD and NBD, as well as other discrete distributions. This paper examines the key statistical properties of the proposed distributions. The new distributions are compared with others from the literature using real-life data from several domains. All of the computations shown in this study are generated using the R programming language. Full article

(This article belongs to the Special Issue Advances in the Theory and Applications of Statistical Distributions)

24 pages, 347 KB

Open Access Article

Estimating the Ratio of Means in a Zero-Inflated Poisson Mixture Model

by Michael Pearce and Michael D. Perlman

Stats 2025, 8(3), 55; https://doi.org/10.3390/stats8030055 - 5 Jul 2025

Viewed by 626

Abstract

The problem of estimating the ratio of the means of a two-component Poisson mixture model is considered, when each component is subject to zero-inflation, i.e., excess zero counts. The resulting zero-inflated Poisson mixture (ZIPM) model can be viewed as a three-component Poisson mixture model with one degenerate component. The EM algorithm is applied to obtain frequentist estimators and their standard errors, the latter determined via an explicit expression for the observed information matrix. As an intermediate step, we derive an explicit expression for standard errors in the two-component Poisson mixture model (without zero-inflation), a new result. The ZIPM model is applied to simulated data and real ecological count data of frigatebirds on the Coral Sea Islands off the coast of Northeast Australia. Full article
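
The EM idea can be illustrated on a deliberately simplified case: a single zero-inflated Poisson component rather than the two-component mixture studied in the paper. The sketch below alternates between computing the posterior probability that an observed zero is structural (E-step) and updating the mixing weight and rate in closed form (M-step). The function name `em_zip`, the starting values, the fixed iteration count, and the toy data are assumptions for this example.

```python
import math


def em_zip(counts, pi=0.5, lam=1.0, n_iter=200):
    """EM estimates (pi, lam) for a single zero-inflated Poisson sample."""
    n = len(counts)
    total = sum(counts)
    n_zeros = sum(1 for y in counts if y == 0)
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        z_zero = pi / (pi + (1.0 - pi) * math.exp(-lam))
        expected_structural = z_zero * n_zeros
        # M-step: closed-form updates given the expected assignments.
        pi = expected_structural / n
        lam = total / (n - expected_structural)
    return pi, lam


# Toy data: 60 zeros and 40 positive counts (made up for illustration).
data = [0] * 60 + [1] * 15 + [2] * 15 + [3] * 7 + [4] * 3
pi_hat, lam_hat = em_zip(data)
print(pi_hat, lam_hat)
```

After each M-step the fitted mean (1 - pi) * lam equals the sample mean, which provides a convenient check on an implementation.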

Search Results (86)
