Search Results (84)

Search Parameters:
Keywords = zero inflated negative binomial model

14 pages, 1656 KB  
Article
Nationwide Patterns and Predictors of Sick Leave Among Healthcare Workers in Kuwait, 2022
by Saleh Alsarhan, Lolwah Alzafiri and Eiman Alawadhi
Healthcare 2026, 14(6), 758; https://doi.org/10.3390/healthcare14060758 - 18 Mar 2026
Abstract
Background/Objectives: Healthcare worker (HCW) sickness absenteeism can disrupt healthcare service delivery and increase workload pressures, yet evidence from Kuwait remains limited. This study examined patterns of sick leave episodes and duration among HCWs in Kuwait and identified associated predictors. Methods: We conducted a nationwide retrospective analysis of sick leave utilization using Ministry of Health (MOH) administrative records for 2022, including 51,204 HCWs across all MOH healthcare facilities. Outcomes were sick leave episodes and sick leave duration. Independent variables included age, gender, nationality, place of residence, profession, managerial position, and influenza vaccination status. Zero-inflated negative binomial regression models were used to estimate adjusted incidence rate ratios (IRRs) with 95% confidence intervals (CIs). Results: In 2022, 196,840 sick leave episodes and 295,206 sick leave days were recorded, with 53% of HCWs experiencing at least one episode. Upper respiratory tract infection (URTI)-related sick leave exhibited seasonal variation, with higher proportions during winter months. Younger age, female sex, Kuwaiti nationality, non-managerial position, and medical technician professions were associated with higher sick leave episodes and duration, while physicians, dentists, and pharmacists had lower sick leave utilization compared with nurses. Influenza vaccination was associated with fewer sick leave episodes and shorter duration. Conclusions: Sick leave patterns among HCWs in Kuwait show noticeable seasonal, demographic, and occupational variation. Targeted preventive strategies and workforce policies may help reduce sick leave burden. Full article

21 pages, 855 KB  
Article
Global Market Shocks and Food Riots: The Impact of Energy Prices, Biofuels, and Financial Speculation in Africa
by Tetsuji Tanaka and Jin Guo
Sustainability 2026, 18(6), 2959; https://doi.org/10.3390/su18062959 - 17 Mar 2026
Abstract
Even though the existing literature has elucidated the domestic causes of riots and the links between global food prices and riots, the relationship between riots and various external factors, such as biofuel production, global crude oil prices, speculation, and the US dollar exchange rate, has yet to be fully analyzed. This study aimed to fill this research gap by examining the associations of these external factors with the occurrence of riots in Africa using the Poisson, negative binomial, and zero-inflated negative binomial models. Our key findings are as follows: (1) U.S. ethanol production and international crude oil prices are positively associated with riot frequency, whereas U.S. biodiesel production is not statistically significant. (2) A higher long share relative to open interest increases riot incidence, while a higher short share reduces it. (3) Both international food prices and African domestic food prices exhibit positive and statistically significant associations with riots. (4) Appreciation of the U.S. dollar is negatively correlated with food riots. Overall, the findings suggest that global energy, financial, monetary, and food price dynamics are systematically linked to food riots in Africa.
(This article belongs to the Special Issue Sustainable Development and Climate, Energy, and Food Security Nexus)

29 pages, 1593 KB  
Article
COVID-19 Mortality, Human Development, and Age Across the WHO Member States: A Longitudinal Multilevel Count Data Analysis
by José Clemente Jacinto Ferreira, Ana Paula Matias Gama, Luiz Paulo Fávero, Ricardo Goulart Serra, Patrícia Belfiore, Igor Pinheiro de Araújo Costa, Miguel Ângelo Lellis Moreira, Marcos dos Santos and Wilson Tarantin Junior
Computers 2026, 15(2), 136; https://doi.org/10.3390/computers15020136 - 22 Feb 2026
Viewed by 330
Abstract
This study aims to verify whether there is a statistically significant relationship between COVID-19 mortality rates, the Human Development Index (HDI), and population age across the World Health Organisation (WHO) member states. Despite the extensive literature on COVID-19 mortality and socio-demographic indicators, few studies explicitly integrate count data diagnostics, zero-inflation mechanisms, and multilevel longitudinal modelling to jointly capture cross-country heterogeneity and temporal dynamics. This study addresses this gap by applying a structured modelling framework that combines negative binomial, zero-inflated, and multilevel regression models to the WHO country-level data. For this purpose, two different statistical techniques were applied, namely: negative binomial regression modelling, zero-inflated negative binomial type for daily temporal exposure on 20 July 2020 and 20 July 2022, before and after the application of the first dose of the COVID-19 vaccine; and multilevel regression for two-level repeated measures data. Negative binomial regression estimates indicate statistically significant positive associations between HDI, age, and COVID-19 mortality rates before the application of the first dose of the vaccine. The variance decomposition from the definition of an unconditional model indicates significant variability in the occurrences of infection and death and between countries/states over time. Full article

18 pages, 1357 KB  
Article
Zero-Inflated Data Analysis Using Graph Neural Networks with Convolution
by Sunghae Jun
Computers 2026, 15(2), 104; https://doi.org/10.3390/computers15020104 - 2 Feb 2026
Viewed by 319
Abstract
Zero-inflated count data are characterized by an excessive frequency of zeros that cannot be adequately analyzed by a single distribution, such as Poisson or negative binomial. This problem is pervasive in many practical applications, including document–keyword matrix derived from text corpora, where most keyword frequencies are zero. Conventional statistical approaches, such as the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models, explicitly separate a structural zero component from a count component, but they typically assume independent observations and can be unstable when covariates are high-dimensional and sparse. To address these limitations, this paper proposes a graph-based zero-inflated learning framework that combines simple graph convolution (SGC) with zero-inflated count regression heads such as ZIP and ZINB. We first construct an observation graph by connecting similar samples, and then apply SGC to propagate and smooth features over the graph, producing convolutional representations that incorporate neighborhood information while remaining computationally lightweight. The resulting representations are used as covariates in ZIP and ZINB heads, which preserve probabilistic interpretability through maximum likelihood learning. Our experiments on simulated zero-inflated datasets with controlled zero ratios demonstrate that the proposed ZIP+SGC and ZINB+SGC consistently reduce prediction errors compared with their non-graph baselines, as measured by mean absolute error and root mean squared error. Overall, the proposed approach provides an efficient and interpretable way to integrate graph neural computation with zero-inflated modeling for sparse count prediction problems. Full article
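Note: the split into a structural-zero component and a count component described in this abstract can be illustrated with a minimal Python sketch of the ZINB probability mass function. The function names and the parameterization (mean `mu`, dispersion `r`, zero-inflation probability `pi`) are assumptions chosen for illustration; they are not taken from the article, and the sketch does not include the graph convolution step.

```python
import math


def nb_pmf(y, mu, r):
    """Negative binomial pmf with mean mu and dispersion (size) r.

    Illustrative parameterization: variance = mu + mu**2 / r.
    """
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def zinb_pmf(y, mu, r, pi):
    """ZINB pmf: a structural zero with probability pi, otherwise an NB count."""
    count_part = (1.0 - pi) * nb_pmf(y, mu, r)
    # Zeros can come from either component; positive counts only from the NB part.
    return pi + count_part if y == 0 else count_part


def zinb_log_likelihood(counts, mu, r, pi):
    """Log-likelihood of observed counts under one shared (mu, r, pi)."""
    return sum(math.log(zinb_pmf(y, mu, r, pi)) for y in counts)
```

In a regression setting `mu` and `pi` would typically depend on covariates through log and logit links; they are held constant here to keep the sketch short.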

16 pages, 336 KB  
Article
Bayesian Neural Networks with Regularization for Sparse Zero-Inflated Data Modeling
by Sunghae Jun
Information 2026, 17(1), 81; https://doi.org/10.3390/info17010081 - 13 Jan 2026
Viewed by 408
Abstract
Zero inflation is pervasive across text mining, event log, and sensor analytics, and it often degrades the predictive performance of analytical models. Classical approaches, most notably the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models, address excess zeros but rely on rigid parametric assumptions and fixed model structures, which can limit flexibility in high-dimensional, sparse settings. We propose a Bayesian neural network (BNN) with regularization for sparse zero-inflated data modeling. The method separately parameterizes the zero inflation probability and the count intensity under ZIP/ZINB likelihoods, while employing Bayesian regularization to induce sparsity and control overfitting. Posterior inference is performed using variational inference. We evaluate the approach through controlled simulations with varying zero ratios and a real-world dataset, and we compare it against Poisson generalized linear models, ZIP, and ZINB baselines. The present study focuses on predictive performance measured by mean squared error (MSE). Across all settings, the proposed method achieves consistently lower prediction error and improved uncertainty problems, with ablation studies confirming the contribution of the regularization components. These results demonstrate that a regularized BNN provides a flexible and robust framework for sparse zero-inflated data analysis in information-rich environments. Full article
(This article belongs to the Special Issue Feature Papers in Information in 2024–2025)

22 pages, 3994 KB  
Article
Sustainable Safety Planning on Two-Lane Highways: A Random Forest Approach for Crash Prediction and Resource Allocation
by Fahmida Rahman, Cidambi Srinivasan, Xu Zhang and Mei Chen
Sustainability 2026, 18(2), 635; https://doi.org/10.3390/su18020635 - 8 Jan 2026
Viewed by 256
Abstract
During the safety planning stage, accurate crash prediction tools are critical for prioritizing countermeasures and allocating resources effectively. Traditional statistical approaches, while long applied in this field, often depend on distributional assumptions that may introduce bias and limit model accuracy. To address these issues, studies have started exploring Machine Learning (ML)-based techniques for crash prediction, particularly for higher functional class roads. However, the application of ML models on two-lane highways remains relatively limited. This study aims to develop an approach to integrate traffic, geometric, and critically, speed-based factors in crash prediction using Random Forest (RF) and SHapley Additive exPlanations (SHAP) techniques. Comparative analysis shows that the RF model improves crash prediction accuracy by up to 25% over the traditional Zero-Inflated Negative Binomial model. SHAP analysis identified AADT, segment length, and average speed as the three most influential predictors of crash frequency, with speed emerging as a key operational factor alongside traditional exposure measures. The strong influence of speed in the RF–SHAP results depicts its critical role in the safety performance of two-lane highways and highlights the value of incorporating detailed operating characteristics into crash prediction models. Overall, the proposed RF–SHAP framework advances roadway safety assessment by offering both predictive accuracy and interpretability, allowing agencies to identify high-impact factors, prioritize countermeasures, and direct resources more efficiently. In doing so, the approach supports sustainable safety management by enabling evidence-based investments, promoting optimal use of limited transportation funds, and contributing to safer, more resilient mobility systems. Full article
(This article belongs to the Special Issue Sustainable Urban Mobility: Road Safety and Traffic Engineering)

16 pages, 2700 KB  
Article
Spatio-Temporal Distribution of Setipinna taty Resources Using a Zero-Inflated Model in the Offshore Waters of Southern Zhejiang, China
by Xiaoxue Liu, Wen Ma, Jin Ma, Chunxia Gao, Weifeng Chen and Jing Zhao
J. Mar. Sci. Eng. 2026, 14(1), 96; https://doi.org/10.3390/jmse14010096 - 3 Jan 2026
Viewed by 352
Abstract
Effective fishery management in coastal waters requires accurate assessments of species–environment relationships, particularly in data-rich but zero-inflated contexts (i.e., datasets with an excess of zero catches). Here, we used fishery-independent trawl survey data collected from 2018 to 2019 in the offshore waters of southern Zhejiang Province of China to investigate the spatio-temporal distribution of Setipinna taty (scaly hairfin anchovy) and its environmental determinants. Given the high frequency of zero catches, we fitted both zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models and selected the best-performing approach using the Akaike information criterion (AIC). Cross-validation indicated that the ZINB model (RMSE: 199.1, R²: 0.25) outperformed the ZIP model (RMSE: 239.4, R²: 0.23). Temperature, depth, and salinity were key predictors of S. taty abundance, which generally occurred at depths of 20–40 m and salinities of 26–34 psu. We then applied the optimal ZINB model to predict S. taty distributions in spring, summer, and autumn of 2020. The predictions indicated a summer peak in abundance and a nearshore-to-offshore decreasing gradient, and were broadly consistent with the spatial distribution trends observed in the 2020 survey data. The highest predicted densities were located in nearshore areas off Wenzhou and Taizhou, west of 122° E. By clarifying the key environmental factors shaping S. taty distribution and applying zero-inflated count models to account for an excess of zero catches, which occur more frequently than expected under standard negative binomial models, this study provides an improved basis for effective conservation and sustainable utilization of S. taty resources in the southern offshore waters of Zhejiang. Nevertheless, predictive performance could be further improved by incorporating additional environmental and biotic covariates together with extended spatio-temporal data.
(This article belongs to the Section Marine Ecology)

16 pages, 522 KB  
Article
Zero-Inflated Text Data Analysis Using Imbalanced Data Sampling and Statistical Models
by Sunghae Jun
Computers 2025, 14(12), 527; https://doi.org/10.3390/computers14120527 - 2 Dec 2025
Viewed by 536
Abstract
Text data often exhibit high sparsity and zero inflation, where a substantial proportion of entries in the document–keyword matrix are zeros. This characteristic presents challenges to traditional count-based models, which may suffer from reduced predictive accuracy and interpretability in the presence of excessive zeros and overdispersion. To overcome this issue, we propose an effective analytical framework that integrates imbalanced data handling by undersampling with classical probabilistic count models. Specifically, we apply Poisson generalized linear models, zero-inflated Poisson, and zero-inflated negative binomial models to analyze zero-inflated text data while preserving the statistical interpretability of term-level counts. The framework is evaluated using both real-world patent documents and simulated datasets. Empirical results demonstrate that our undersampling-based approach improves the model fit without modifying the downstream models. This study contributes a practical preprocessing strategy for enhancing zero-inflated text analysis and offers insights into model selection and data balancing techniques for sparse count data.

32 pages, 14199 KB  
Article
Gated vs. Non-Gated Estates: Spatial Factors Shaping Stationary and Social Activities in Chinese Housing Estates
by Yufeng Yang, Laura Vaughan and Matthew Carmona
Land 2025, 14(12), 2340; https://doi.org/10.3390/land14122340 - 28 Nov 2025
Cited by 2 | Viewed by 950
Abstract
Open spaces in housing estates are crucial for residents’ physical and mental well-being, especially when access to other public spaces is restricted (e.g., during a pandemic). While existing studies focus on public spaces, less is known about how residential landscapes, particularly in gated estates, influence outdoor activities. This study investigates the spatial logic behind the distribution of standing, sitting and social interaction within six pairs of gated and non-gated housing estates in Wuhan. Using space syntax analysis and zero-inflated negative binomial regression, we explore how the spatial configuration influences the incidence of outdoor activities in gated and non-gated estates. The findings suggest that spatial attributes not only significantly explain where activities occurred but also where they did not. More importantly, we found distinct differences between the two types: non-gated estates were more responsive to design, with more spatial factors significantly predicting activities simultaneously, whereas in gated compounds, only a few factors had a significant impact. Critical factors of outdoor activities include seating provision, convex area, perimeter enclosure, and spatial accessibility. These findings contribute to the theoretical understanding of spatial dynamics in residential environments and provide practical insights for urban design and residential planning. Full article

21 pages, 1332 KB  
Article
The Ridge-Hurdle Negative Binomial Regression Model: A Novel Solution for Zero-Inflated Counts in the Presence of Multicollinearity
by HM Nayem and B. M. Golam Kibria
Stats 2025, 8(4), 102; https://doi.org/10.3390/stats8040102 - 1 Nov 2025
Viewed by 1603
Abstract
Datasets with many zero outcomes are common in real-world studies and often exhibit overdispersion and strong correlations among predictors, creating challenges for standard count models. Traditional approaches such as the Zero-Inflated Poisson (ZIP), Zero-Inflated Negative Binomial (ZINB), and Hurdle models can handle extra zeros and overdispersion but struggle when multicollinearity is present. This study introduces the Ridge-Hurdle Negative Binomial model, which incorporates L2 regularization into the truncated count component of the hurdle framework to jointly address zero inflation, overdispersion, and multicollinearity. Monte Carlo simulations under varying sample sizes, predictor correlations, and levels of overdispersion and zero inflation show that Ridge-Hurdle NB consistently achieves the lowest mean squared error (MSE) compared to ZIP, ZINB, Hurdle Poisson, Hurdle Negative Binomial, Ridge ZIP, and Ridge ZINB models. Applications to the Wildlife Fish and Medical Care datasets further confirm its superior predictive performance, highlighting RHNB as a robust and efficient solution for complex count data modeling. Full article
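Note: as a rough illustration of the hurdle framework mentioned in this abstract, the Python sketch below contrasts it with zero inflation: all zeros come from a single binary component, and positive counts follow a zero-truncated negative binomial. The `ridge_penalty` helper shows the generic form of an L2 penalty that could be added to a negative log-likelihood. All names and the parameterization are assumptions for illustration; this is not the authors' estimator.

```python
import math


def nb_pmf(y, mu, r):
    """Negative binomial pmf with mean mu and dispersion (size) r."""
    log_p = (
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(r / (r + mu))
        + y * math.log(mu / (r + mu))
    )
    return math.exp(log_p)


def hurdle_nb_pmf(y, mu, r, p_zero):
    """Hurdle NB pmf: P(Y = 0) = p_zero; positives are zero-truncated NB."""
    if y == 0:
        return p_zero
    # Renormalize the NB over y >= 1 so the positive part sums to 1 - p_zero.
    truncated = nb_pmf(y, mu, r) / (1.0 - nb_pmf(0, mu, r))
    return (1.0 - p_zero) * truncated


def ridge_penalty(coefs, lam):
    """Generic L2 penalty lam * sum(beta_j ** 2) on count-component coefficients."""
    return lam * sum(b * b for b in coefs)
```

Unlike a zero-inflated model, the hurdle form fixes the zero probability directly through `p_zero`, so the count parameters affect only the distribution of positive outcomes.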
(This article belongs to the Section Statistical Methods)

17 pages, 2574 KB  
Article
Calling Phenology of Two Frog Species in South Korean Rice Paddies Using Automated Call Detection
by Soyeon Chae, Jinu Eo and Yikweon Jang
Animals 2025, 15(21), 3141; https://doi.org/10.3390/ani15213141 - 29 Oct 2025
Viewed by 792
Abstract
Amphibian breeding phenology provides key insights into species’ sensitivity to climatic and anthropogenic drivers. We used passive acoustic monitoring (PAM) with automated call detection to examine the calling activity of Dryophytes japonicus and Pelophylax nigromaculatus in South Korean rice paddies across five breeding seasons (2018–2022). Both species exhibited distinct seasonal patterns: D. japonicus showed a synchronous and concentrated calling peak in mid-June (GAM deviance explained = 34%), whereas P. nigromaculatus initiated calling earlier and maintained a longer, less synchronized calling period extending into July (GAM deviance explained = 19%). Zero-inflated negative binomial models demonstrated that temperature was the strongest predictor of calling activity in both species, though responses to humidity and wind differed. D. japonicus maintained a high calling rate under warm conditions, with only modest suppression at high humidity, whereas P. nigromaculatus was strongly inhibited by combined warm and humid conditions. These results provide detailed information on the calling phenology of D. japonicus and P. nigromaculatus in East Asian agroecosystems and highlight species-specific sensitivities to local weather variables. Our findings demonstrate that automated acoustic monitoring offers an efficient way to document ecological responses to weather variability and may serve as a long-term tool to track phenological shifts under climate change. Future advances in sound analysis, including the integration of deep-learning algorithms and cross-species detection frameworks, could further improve automated biodiversity monitoring in complex agricultural landscapes.

17 pages, 3465 KB  
Article
Longitudinal Gut Microbiome Changes Associated with Transitions from C. difficile Negative to C. difficile Positive on Surveillance Tests
by L. Silvia Munoz-Price, Samantha N. Atkinson, Vy Lam, Blake Buchan, Nathan Ledeboer, Nita H. Salzman and Amy Y. Pan
Microorganisms 2025, 13(10), 2277; https://doi.org/10.3390/microorganisms13102277 - 29 Sep 2025
Viewed by 817
Abstract
Clostridioides difficile is an obligate anaerobe and is primarily transmitted via the fecal–oral route. Data characterizing the microbiome changes accompanying transitions from non-colonized to C. difficile colonized subjects are currently lacking. In this retrospective cohort study, we examined 16S rRNA gene sequencing data in a total of 481 fecal samples belonging to 107 patients. Based on C. difficile status over time, patients were categorized as Negative-to-Positive, Negative Control, and Positive Control. A linear mixed effects model was fitted to investigate the changes in the Shannon α-diversity index over time. Zero-inflated negative binomial/Poisson mixed effects models or generalized linear mixed models with negative binomial/Poisson distribution were used to investigate the changes in taxon counts over time among different groups. A total of 107 patients were eligible for the study. The median number of stool samples per patient was 3 (IQR 2–4). A total of 42 patients transitioned from C. difficile negative to positive (Negative-to-Positive), 47 patients remained negative throughout their tests (Negative Control) and 18 were always C. difficile positive (Positive Control). A significant difference in microbiome composition between the last negative samples and the first positive samples was shown in Negative-to-Positive patients (ANOSIM p = 0.022). In Negative-to-Positive patients, the phylum Pseudomonadota and family Enterobacteriaceae increased significantly in the first positive samples compared to the last negative samples, p = 0.0075 and p = 0.0094, respectively. Within the first 21 days, Actinomycetota decreased significantly over time in the Positive Control group compared to the other two groups (p < 0.001) while Bacillota decreased in both the Negative-to-Positive group and Positive Control. These results demonstrate that the transition from C. difficile negative to C. difficile positive is associated with alterations in gut microbial communities and their compositional patterns over time. Moreover, these changes play an important role in both the emergence and intensification of the gut microbiome dysbiosis in patients who transitioned from C. difficile negative to positive and those who always tested positive.
(This article belongs to the Special Issue The Microbiome in Ecosystems)

32 pages, 1288 KB  
Article
Random Forest Adaptation for High-Dimensional Count Regression
by Oyebayo Ridwan Olaniran, Saidat Fehintola Olaniran, Ali Rashash R. Alzahrani, Nada MohammedSaeed Alharbi and Asma Ahmad Alzahrani
Mathematics 2025, 13(18), 3041; https://doi.org/10.3390/math13183041 - 21 Sep 2025
Cited by 2 | Viewed by 1488
Abstract
The analysis of high-dimensional count data presents a unique set of challenges, including overdispersion, zero-inflation, and complex nonlinear relationships that traditional generalized linear models and standard machine learning approaches often fail to adequately address. This study introduces and validates a novel Random Forest framework specifically developed for high-dimensional Poisson and Negative Binomial regression, designed to overcome the limitations of existing methods. Through comprehensive simulations and a real-world genomic application to the Norwegian Mother and Child Cohort Study, we demonstrate that the proposed methods achieve superior predictive accuracy, quantified by lower root mean squared error and deviance, and critically produced exceptionally stable and interpretable feature selections. Our theoretical and empirical results show that these distribution-optimized ensembles significantly outperform both penalized-likelihood techniques and naive-transformation-based ensembles in balancing statistical robustness with biological interpretability. The study concludes that the proposed frameworks provide a crucial methodological advancement, offering a powerful and reliable tool for extracting meaningful insights from complex count data in fields ranging from genomics to public health. Full article
(This article belongs to the Special Issue Statistics for High-Dimensional Data)

23 pages, 575 KB  
Article
A Comparison of the Robust Zero-Inflated and Hurdle Models with an Application to Maternal Mortality
by Phelo Pitsha, Raymond T. Chiruka and Chioneso S. Marange
Math. Comput. Appl. 2025, 30(5), 95; https://doi.org/10.3390/mca30050095 - 2 Sep 2025
Cited by 1 | Viewed by 2551
Abstract
This study evaluates the performance of count regression models in the presence of zero inflation, outliers, and overdispersion using both simulated data and a real-world maternal mortality dataset. Traditional Poisson and negative binomial regression models often struggle to account for the complexities introduced by excess zeros and outliers. To address these limitations, this study compares the performance of robust zero-inflated (RZI) and robust hurdle (RH) models against conventional models using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) to determine the best-fitting model. Results indicate that the robust zero-inflated Poisson (RZIP) model performs best overall. The simulation study considers various scenarios, including different levels of zero inflation (50%, 70%, and 80%), outlier proportions (0%, 5%, 10%, and 15%), dispersion values (1, 3, and 5), and sample sizes (50, 200, and 500). Based on AIC comparisons, the robust zero-inflated Poisson (RZIP) and robust hurdle Poisson (RHP) models demonstrate superior performance when outliers are absent or limited to 5%, particularly when dispersion is low (5). However, as outlier levels and dispersion increase, the robust zero-inflated negative binomial (RZINB) and robust hurdle negative binomial (RHNB) models outperform robust zero-inflated Poisson (RZIP) and robust hurdle Poisson (RHP) across all levels of zero inflation and sample sizes considered in the study.
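Note: the AIC/BIC comparison described in this abstract can be sketched in a few lines of Python. The log-likelihoods, parameter counts, and sample size below are made-up placeholder values chosen only to show the arithmetic; they are not results from the article.

```python
import math


def aic(log_lik, k):
    """Akaike Information Criterion: 2k - 2 * log-likelihood."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian Information Criterion: k * ln(n) - 2 * log-likelihood."""
    return k * math.log(n) - 2 * log_lik


# Hypothetical fitted models: (maximized log-likelihood, number of parameters).
fits = {"RZIP": (-412.3, 6), "RZINB": (-410.9, 7)}
n_obs = 200  # hypothetical sample size

scores = {name: (aic(ll, k), bic(ll, k, n_obs)) for name, (ll, k) in fits.items()}

# Lower is better for both criteria.
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

With these placeholder numbers the two criteria disagree, because BIC penalizes the extra dispersion parameter more heavily than AIC once the sample size exceeds about eight observations.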

15 pages, 358 KB  
Article
Multi-Task CNN-LSTM Modeling of Zero-Inflated Count and Time-to-Event Outcomes for Causal Inference with Functional Representation of Features
by Jong-Min Kim
Axioms 2025, 14(8), 626; https://doi.org/10.3390/axioms14080626 - 11 Aug 2025
Cited by 1 | Viewed by 1396
Abstract
We propose a novel deep learning framework for counterfactual inference on the COMPAS dataset, utilizing a multi-task CNN-LSTM architecture. The model jointly predicts multiple outcome types: (i) count outcomes with zero inflation, modeled using zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), and negative binomial (NB) distributions; (ii) time-to-event outcomes, modeled via the Cox proportional hazards model. To effectively leverage the structure in high-dimensional tabular data, we integrate functional data analysis (FDA) techniques by transforming covariates into smooth functional representations using B-spline basis expansions. Specifically, we construct a pseudo-temporal index over predictor variables and fit basis expansions to each subject’s feature vector, yielding a low-dimensional set of coefficients that preserve smooth variation while reducing noise. This functional representation enables the CNN-LSTM model to capture both local and global temporal patterns in the data, including treatment-covariate interactions. Our approach estimates both population-average and individual-level treatment effects (ATE and CATE) for each outcome and evaluates predictive performance using metrics such as Poisson deviance, root mean squared error (RMSE), and the concordance index (C-index). Statistical inference on treatment effects is supported via bootstrap-based confidence intervals and hypothesis testing. Overall, this comprehensive framework facilitates flexible modeling of heterogeneous treatment effects in structured, high-dimensional data, advancing causal inference methodologies in criminal justice and related domains. Full article
(This article belongs to the Special Issue Functional Data Analysis and Its Application)