MDPI - Publisher of Open Access Journals

23 pages, 1052 KB

Open Access Article

Technology Analysis of Extended Reality Using Machine Learning and Statistical Models

by Sunghae Jun

Virtual Worlds 2026, 5(2), 19; https://doi.org/10.3390/virtualworlds5020019 - 20 Apr 2026

Viewed by 400

Extended reality (XR), encompassing augmented reality (AR), virtual reality (VR), and mixed reality (MR), is a key enabling technology for virtual worlds, and XR-related patents continue to grow rapidly. However, patent-based XR technology analysis faces a fundamental challenge: document–keyword matrices (DKMs) built from patent titles and abstracts are typically high dimensional and sparse, and often exhibit excess zeros, which can distort inference when conventional text mining pipelines are applied without a generative count perspective. In this study, we propose a statistically grounded XR technology analysis framework that combines likelihood-based count modeling with interpretable structure mining to map XR sub-technologies from a patent DKM. Using an XR patent–keyword matrix, we fit Poisson regression (PR), negative binomial regression (NBR), and zero-inflated negative binomial regression (ZINBR) models via maximum likelihood estimation (MLE), controlling for document-length effects. Model selection by Akaike information criterion (AIC) consistently favored NBR for both target keywords, indicating substantial overdispersion in XR patent counts. We interpret exponentiated coefficients as incidence rate ratios (IRRs) and construct a technology relatedness network from significant IRR edges, revealing a dual-axis XR structure: "reality" is anchored in an AR or VR experience and content axis (for example, "virtual" and "augment"), whereas "extend" is embedded in a structure and integration axis (for example, surface, edge, layer, and connectivity-related terms). To show how the proposed method can be applied to real domains, we retrieved XR patent documents and analyzed them with the proposed framework. Full article
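The model-selection step in this abstract (Poisson versus negative binomial, compared by AIC, with coefficients read as incidence rate ratios) can be illustrated with a minimal sketch. It is not the paper's code: the counts are toy values, the models are intercept-only, the negative binomial size parameter is found by a crude grid search rather than a proper optimizer, and all function names are hypothetical.

```python
import math

def poisson_loglik(counts, mu):
    """Log-likelihood of i.i.d. Poisson(mu) counts."""
    return sum(y * math.log(mu) - mu - math.lgamma(y + 1) for y in counts)

def negbin_loglik(counts, mu, r):
    """Log-likelihood of negative binomial counts with mean mu and size r
    (variance mu + mu**2 / r)."""
    p = r / (r + mu)
    return sum(
        math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
        + r * math.log(p) + y * math.log(1.0 - p)
        for y in counts
    )

def aic(loglik, n_params):
    """Akaike information criterion: 2k - 2 log L (lower is better)."""
    return 2 * n_params - 2 * loglik

def irr(beta):
    """Incidence rate ratio for a log-link coefficient beta."""
    return math.exp(beta)

# Toy keyword counts (assumption): mostly zeros with a few large values.
counts = [0, 0, 0, 0, 0, 0, 1, 0, 2, 0, 0, 7, 0, 0, 3, 0, 0, 0, 12, 0]

# For intercept-only models the MLE of the mean is the sample mean in both cases.
mu_hat = sum(counts) / len(counts)

aic_pois = aic(poisson_loglik(counts, mu_hat), n_params=1)

# Profile the size parameter r over a log-spaced grid (a simplification).
r_grid = [0.01 * 1.1 ** k for k in range(120)]
r_hat = max(r_grid, key=lambda r: negbin_loglik(counts, mu_hat, r))
aic_nb = aic(negbin_loglik(counts, mu_hat, r_hat), n_params=2)

print(f"AIC Poisson: {aic_pois:.1f}, AIC NB: {aic_nb:.1f}")
```

On overdispersed counts like these, the negative binomial pays one extra parameter but gains far more in log-likelihood, which is the pattern the abstract reports. In a regression setting, `irr(beta)` would be applied to each fitted coefficient, so a coefficient of 0 corresponds to an IRR of 1 (no association).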

21 pages, 855 KB

Open Access Article

Global Market Shocks and Food Riots: The Impact of Energy Prices, Biofuels, and Financial Speculation in Africa

by Tetsuji Tanaka and Jin Guo

Sustainability 2026, 18(6), 2959; https://doi.org/10.3390/su18062959 - 17 Mar 2026

Viewed by 716

Abstract

Even though the existing literature has elucidated the domestic causes of riots and the links between global food prices and riots, the relationship between riots and various external factors, such as biofuel production, global crude oil prices, speculation, and the US dollar exchange rate, has yet to be fully analyzed. This study aimed to fill this research gap by examining the associations of these external factors with the occurrence of riots in Africa using the Poisson, negative binomial, and zero-inflated negative binomial models. Our key findings are as follows: (1) U.S. ethanol production and international crude oil prices are positively associated with riot frequency, whereas U.S. biodiesel production is not statistically significant. (2) A higher long share relative to open interest is associated with higher riot incidence, while a higher short share is associated with lower incidence. (3) Both international food prices and African domestic food prices exhibit positive and statistically significant associations with riots. (4) Appreciation of the U.S. dollar is negatively correlated with food riots. Overall, the findings suggest that global energy, financial, monetary, and food price dynamics are systematically linked to food riots in Africa. Full article

(This article belongs to the Special Issue Sustainable Development and Climate, Energy, and Food Security Nexus)

18 pages, 1357 KB

Open Access Article

Zero-Inflated Data Analysis Using Graph Neural Networks with Convolution

by Sunghae Jun

Computers 2026, 15(2), 104; https://doi.org/10.3390/computers15020104 - 2 Feb 2026

Viewed by 891

Abstract

Zero-inflated count data are characterized by an excessive frequency of zeros that cannot be adequately described by a single distribution, such as Poisson or negative binomial. This problem is pervasive in many practical applications, including document–keyword matrices derived from text corpora, where most keyword frequencies are zero. Conventional statistical approaches, such as the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models, explicitly separate a structural zero component from a count component, but they typically assume independent observations and can be unstable when covariates are high-dimensional and sparse. To address these limitations, this paper proposes a graph-based zero-inflated learning framework that combines simple graph convolution (SGC) with zero-inflated count regression heads such as ZIP and ZINB. We first construct an observation graph by connecting similar samples, and then apply SGC to propagate and smooth features over the graph, producing convolutional representations that incorporate neighborhood information while remaining computationally lightweight. The resulting representations are used as covariates in ZIP and ZINB heads, which preserve probabilistic interpretability through maximum likelihood learning. Our experiments on simulated zero-inflated datasets with controlled zero ratios demonstrate that the proposed ZIP+SGC and ZINB+SGC consistently reduce prediction errors compared with their non-graph baselines, as measured by mean absolute error and root mean squared error. Overall, the proposed approach provides an efficient and interpretable way to integrate graph neural computation with zero-inflated modeling for sparse count prediction problems. Full article
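The feature-propagation step described in this abstract can be sketched as follows. This is a minimal illustration of simple graph convolution as it is commonly defined (features multiplied K times by the symmetrically normalized adjacency matrix with self-loops), not the paper's implementation; the toy graph, the feature values, and the function name `sgc_propagate` are assumptions, and the ZIP/ZINB regression head that would consume the smoothed features is not shown.

```python
import math

def sgc_propagate(adj, features, k):
    """Return S^k X with S = D^{-1/2} (A + I) D^{-1/2}.

    adj      : n x n symmetric 0/1 adjacency matrix (list of lists)
    features : n x d feature matrix (list of lists)
    k        : number of propagation steps
    """
    n = len(adj)
    # Add self-loops so each node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric degree normalization.
    s = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
         for i in range(n)]
    x = [row[:] for row in features]
    d = len(x[0])
    for _ in range(k):
        x = [[sum(s[i][m] * x[m][c] for m in range(n)) for c in range(d)]
             for i in range(n)]
    return x

# Toy observation graph (assumption): a path 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[3.0], [0.0], [0.0]]

smoothed = sgc_propagate(adj, features, k=1)
print(smoothed)
```

After one step, the signal on node 0 has leaked to its neighbor (node 1) but not yet to node 2, which is two hops away. Because the propagation has no trainable weights, it can be precomputed once before fitting the count model, which is why the approach stays computationally light.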

16 pages, 336 KB

Open Access Article

Bayesian Neural Networks with Regularization for Sparse Zero-Inflated Data Modeling

by Sunghae Jun

Information 2026, 17(1), 81; https://doi.org/10.3390/info17010081 - 13 Jan 2026

Viewed by 902

Abstract

Zero inflation is pervasive across text mining, event log, and sensor analytics, and it often degrades the predictive performance of analytical models. Classical approaches, most notably the zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models, address excess zeros but rely on rigid parametric assumptions and fixed model structures, which can limit flexibility in high-dimensional, sparse settings. We propose a Bayesian neural network (BNN) with regularization for sparse zero-inflated data modeling. The method separately parameterizes the zero inflation probability and the count intensity under ZIP/ZINB likelihoods, while employing Bayesian regularization to induce sparsity and control overfitting. Posterior inference is performed using variational inference. We evaluate the approach through controlled simulations with varying zero ratios and a real-world dataset, and we compare it against Poisson generalized linear models, ZIP, and ZINB baselines. The present study focuses on predictive performance measured by mean squared error (MSE). Across all settings, the proposed method achieves consistently lower prediction error and improved uncertainty problems, with ablation studies confirming the contribution of the regularization components. These results demonstrate that a regularized BNN provides a flexible and robust framework for sparse zero-inflated data analysis in information-rich environments. Full article

(This article belongs to the Special Issue Feature Papers in Information in 2024–2025)

16 pages, 2700 KB

Open Access Article

Spatio-Temporal Distribution of Setipinna taty Resources Using a Zero-Inflated Model in the Offshore Waters of Southern Zhejiang, China

by Xiaoxue Liu, Wen Ma, Jin Ma, Chunxia Gao, Weifeng Chen and Jing Zhao

J. Mar. Sci. Eng. 2026, 14(1), 96; https://doi.org/10.3390/jmse14010096 - 3 Jan 2026

Viewed by 558

Abstract

Effective fishery management in coastal waters requires accurate assessments of species–environment relationships, particularly in data-rich but zero-inflated contexts (i.e., datasets with an excess of zero catches). Here, we used fishery-independent trawl survey data collected from 2018 to 2019 in the offshore waters of southern Zhejiang Province of China to investigate the spatio-temporal distribution of Setipinna taty (scaly hairfin anchovy) and its environmental determinants. Given the high frequency of zero catches, we fitted both zero-inflated Poisson (ZIP) and zero-inflated negative binomial (ZINB) models and selected the best-performing approach using the Akaike information criterion (AIC). Cross-validation indicated that the ZINB model (RMSE: 199.1, R²: 0.25) outperformed the ZIP model (RMSE: 239.4, R²: 0.23). Temperature, depth, and salinity were key predictors of S. taty abundance, which generally occurred at depths of 20–40 m and salinities of 26–34 psu. We then applied the optimal ZINB model to predict S. taty distributions in spring, summer, and autumn of 2020. The predictions indicated a summer peak in abundance and a nearshore-to-offshore decreasing gradient, and were broadly consistent with the spatial distribution trends observed in the 2020 survey data. The highest predicted densities were located in nearshore areas off Wenzhou and Taizhou, west of 122° E. By clarifying the key environmental factors shaping S. taty distribution and applying zero-inflated count models to account for an excess of zero catches, which occur more frequently than expected under standard negative binomial models, this study provides an improved basis for effective conservation and sustainable utilization of S. taty resources in the southern offshore waters of Zhejiang. Nevertheless, predictive performance could be further improved by incorporating additional environmental and biotic covariates together with extended spatio-temporal data. Full article

(This article belongs to the Section Marine Ecology)

25 pages, 2764 KB

Open Access Article

Integrated Quality Inspection and Production Run Optimization for Imperfect Production Systems with Zero-Inflated Non-Homogeneous Poisson Deterioration

by Chih-Chiang Fang and Ming-Nan Chen

Mathematics 2025, 13(24), 3901; https://doi.org/10.3390/math13243901 - 5 Dec 2025

Cited by 1 | Viewed by 667

Abstract

This study develops an integrated quality inspection and production optimization framework for an imperfect production system, where system deterioration follows a zero-inflated non-homogeneous Poisson process (ZI-NHPP) characterized by a power-law intensity function. Parameters are estimated from historical data using the Expectation-Maximization (EM) algorithm, with a zero-inflation parameter π modeling scenarios in which the system remains defect-free. Operating in either an in-control or out-of-control state, the system produces products with Weibull hazard rates, exhibiting higher failure rates in the out-of-control state. The proposed model integrates system status, defect rates, employee efficiency, and market demand to jointly optimize the number of conforming items inspected and the production run length, thereby minimizing total costs, including production, inspection, correction, inventory, and warranty expenses. Numerical analyses, supported by sensitivity studies, validate the effectiveness of this integrated approach in achieving cost-efficient quality control. This framework enhances quality assurance and production management, offering practical insights for manufacturing across diverse industries. Full article

(This article belongs to the Section C: Mathematical Analysis)

16 pages, 522 KB

Open Access Article

Zero-Inflated Text Data Analysis Using Imbalanced Data Sampling and Statistical Models

by Sunghae Jun

Computers 2025, 14(12), 527; https://doi.org/10.3390/computers14120527 - 2 Dec 2025

Viewed by 921

Abstract

Text data often exhibits high sparsity and zero inflation, where a substantial proportion of entries in the document–keyword matrix are zeros. This characteristic presents challenges to traditional count-based models, which may suffer from reduced predictive accuracy and interpretability in the presence of excessive zeros and overdispersion. To overcome this issue, we propose an analytical framework that integrates imbalanced data handling by undersampling with classical probabilistic count models. Specifically, we apply Poisson generalized linear models, zero-inflated Poisson models, and zero-inflated negative binomial models to analyze zero-inflated text data while preserving the statistical interpretability of term-level counts. The framework is evaluated using both real-world patent documents and simulated datasets. Empirical results demonstrate that our undersampling-based approach improves the model fit without modifying the downstream models. This study contributes a practical preprocessing strategy for enhancing zero-inflated text analysis and offers insights into model selection and data balancing techniques for sparse count data. Full article

21 pages, 1332 KB

Open Access Article

The Ridge-Hurdle Negative Binomial Regression Model: A Novel Solution for Zero-Inflated Counts in the Presence of Multicollinearity

by HM Nayem and B. M. Golam Kibria

Stats 2025, 8(4), 102; https://doi.org/10.3390/stats8040102 - 1 Nov 2025

Cited by 1 | Viewed by 2300

Abstract

Datasets with many zero outcomes are common in real-world studies and often exhibit overdispersion and strong correlations among predictors, creating challenges for standard count models. Traditional approaches such as the Zero-Inflated Poisson (ZIP), Zero-Inflated Negative Binomial (ZINB), and Hurdle models can handle extra zeros and overdispersion but struggle when multicollinearity is present. This study introduces the Ridge-Hurdle Negative Binomial model, which incorporates L₂ regularization into the truncated count component of the hurdle framework to jointly address zero inflation, overdispersion, and multicollinearity. Monte Carlo simulations under varying sample sizes, predictor correlations, and levels of overdispersion and zero inflation show that Ridge-Hurdle NB consistently achieves the lowest mean squared error (MSE) compared to ZIP, ZINB, Hurdle Poisson, Hurdle Negative Binomial, Ridge ZIP, and Ridge ZINB models. Applications to the Wildlife Fish and Medical Care datasets further confirm its superior predictive performance, highlighting RHNB as a robust and efficient solution for complex count data modeling. Full article
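For readers comparing the model families named in this abstract, the standard textbook distinction between zero-inflated and hurdle models can be written out as below. The notation is an assumption made here for illustration: f is the base count distribution (Poisson or negative binomial), π is the structural-zero probability, π₀ is the hurdle probability of a zero, and the penalized objective in the last line is only a generic sketch of an L₂ penalty with tuning parameter k on the truncated count component, not the paper's exact estimator.

```latex
% Zero-inflated model: zeros come from two sources
P(Y = 0) = \pi + (1 - \pi)\, f(0), \qquad
P(Y = y) = (1 - \pi)\, f(y), \quad y = 1, 2, \dots

% Hurdle model: all zeros come from the binary part;
% positive counts follow the zero-truncated base distribution
P(Y = 0) = \pi_0, \qquad
P(Y = y) = (1 - \pi_0)\, \frac{f(y)}{1 - f(0)}, \quad y = 1, 2, \dots

% Generic ridge-penalized fit of the truncated count component (sketch)
\hat{\beta}_k = \arg\max_{\beta}
  \left\{ \ell_{\text{trunc}}(\beta) - k \, \lVert \beta \rVert_2^2 \right\}
```

Because the hurdle likelihood factorizes into a binary part and a truncated count part, a penalty can be applied to the count coefficients alone without touching the zero model, which is consistent with the abstract's description of regularizing only the truncated component.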

(This article belongs to the Section Statistical Methods)

17 pages, 3465 KB

Open Access Article

Longitudinal Gut Microbiome Changes Associated with Transitions from C. difficile Negative to C. difficile Positive on Surveillance Tests

by L. Silvia Munoz-Price, Samantha N. Atkinson, Vy Lam, Blake Buchan, Nathan Ledeboer, Nita H. Salzman and Amy Y. Pan

Microorganisms 2025, 13(10), 2277; https://doi.org/10.3390/microorganisms13102277 - 29 Sep 2025

Viewed by 1093

Abstract

Clostridioides difficile is an obligate anaerobe and is primarily transmitted via the fecal–oral route. Data characterizing the microbiome changes accompanying transitions from non-colonized to C. difficile colonized subjects are currently lacking. In this retrospective cohort study, we examined 16S rRNA gene sequencing data in a total of 481 fecal samples belonging to 107 patients. Based on C. difficile status over time, patients were categorized as Negative-to-Positive, Negative Control, and Positive Control. A linear mixed effects model was fitted to investigate the changes in the Shannon α-diversity index over time. Zero-inflated negative binomial/Poisson mixed effects models or generalized linear mixed models with negative binomial/Poisson distribution were used to investigate the changes in taxon counts over time among different groups. A total of 107 patients were eligible for the study. The median number of stool samples per patient was 3 (IQR 2–4). A total of 42 patients transitioned from C. difficile negative to positive (Negative-to-Positive), 47 patients remained negative throughout their tests (Negative Control) and 18 were always C. difficile positive (Positive Control). A significant difference in microbiome composition between the last negative samples and the first positive samples was shown in Negative-to-Positive patients (ANOSIM p = 0.022). In Negative-to-Positive patients, the phylum Pseudomonadota and family Enterobacteriaceae increased significantly in the first positive samples compared to the last negative samples, p = 0.0075 and p = 0.0094, respectively. Within the first 21 days, Actinomycetota decreased significantly over time in the Positive Control group compared to the other two groups (p < 0.001) while Bacillota decreased in both the Negative-to-Positive group and Positive Control. These results demonstrate that the transition from C. difficile negative to C. difficile positive is associated with alterations in gut microbial communities and their compositional patterns over time. Moreover, these changes play an important role in both the emergence and intensification of the gut microbiome dysbiosis in patients who transitioned from C. difficile negative to positive and those who always tested positive. Full article

(This article belongs to the Special Issue The Microbiome in Ecosystems)

32 pages, 1288 KB

Open Access Article

Random Forest Adaptation for High-Dimensional Count Regression

by Oyebayo Ridwan Olaniran, Saidat Fehintola Olaniran, Ali Rashash R. Alzahrani, Nada MohammedSaeed Alharbi and Asma Ahmad Alzahrani

Mathematics 2025, 13(18), 3041; https://doi.org/10.3390/math13183041 - 21 Sep 2025

Cited by 3 | Viewed by 2017

Abstract

The analysis of high-dimensional count data presents a unique set of challenges, including overdispersion, zero-inflation, and complex nonlinear relationships that traditional generalized linear models and standard machine learning approaches often fail to adequately address. This study introduces and validates a novel Random Forest framework specifically developed for high-dimensional Poisson and Negative Binomial regression, designed to overcome the limitations of existing methods. Through comprehensive simulations and a real-world genomic application to the Norwegian Mother and Child Cohort Study, we demonstrate that the proposed methods achieve superior predictive accuracy, quantified by lower root mean squared error and deviance, and, critically, produce exceptionally stable and interpretable feature selections. Our theoretical and empirical results show that these distribution-optimized ensembles significantly outperform both penalized-likelihood techniques and naive-transformation-based ensembles in balancing statistical robustness with biological interpretability. The study concludes that the proposed frameworks provide a crucial methodological advancement, offering a powerful and reliable tool for extracting meaningful insights from complex count data in fields ranging from genomics to public health. Full article

(This article belongs to the Special Issue Statistics for High-Dimensional Data)

23 pages, 575 KB

Open Access Article

A Comparison of the Robust Zero-Inflated and Hurdle Models with an Application to Maternal Mortality

by Phelo Pitsha, Raymond T. Chiruka and Chioneso S. Marange

Math. Comput. Appl. 2025, 30(5), 95; https://doi.org/10.3390/mca30050095 - 2 Sep 2025

Cited by 2 | Viewed by 3321

Abstract

This study evaluates the performance of count regression models in the presence of zero inflation, outliers, and overdispersion using both simulated data and a real-world maternal mortality dataset. Traditional Poisson and negative binomial regression models often struggle to account for the complexities introduced by excess zeros and outliers. To address these limitations, this study compares the performance of robust zero-inflated (RZI) and robust hurdle (RH) models against conventional models using the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) to determine the best-fitting model. Results indicate that the robust zero-inflated Poisson (RZIP) model performs best overall. The simulation study considers various scenarios, including different levels of zero inflation (50%, 70%, and 80%), outlier proportions (0%, 5%, 10%, and 15%), dispersion values (1, 3, and 5), and sample sizes (50, 200, and 500). Based on AIC comparisons, the RZIP and robust hurdle Poisson (RHP) models demonstrate superior performance when outliers are absent or limited to 5%, particularly when dispersion is low (5). However, as outlier levels and dispersion increase, the robust zero-inflated negative binomial (RZINB) and robust hurdle negative binomial (RHNB) models outperform RZIP and RHP across all levels of zero inflation and sample sizes considered in the study. Full article

19 pages, 509 KB

Open Access Article

Zero-Inflated Distributions of Lifetime Reproductive Output

by Hal Caswell

Populations 2025, 1(3), 19; https://doi.org/10.3390/populations1030019 - 23 Aug 2025

Viewed by 1654

Abstract

Lifetime reproductive output (LRO), also called lifetime reproductive success (LRS), is often described by its mean (total fertility rate or net reproductive rate), but it is in fact highly variable among individuals and often positively skewed. Several approaches exist for calculating the variance and skewness of LRO. These studies have noted that a major factor contributing to skewness is the fraction of the population that dies before reaching a reproductive age or stage. The existence of that fraction means that LRO has a zero-inflated distribution. This paper shows how to calculate that fraction and to fit a zero-inflated Poisson or zero-inflated negative binomial distribution to the LRO. We present a series of applications to populations before and after demographic transitions, to populations with particularly high probabilities of death before reproduction, and to a couple of large mammal populations for good measure. The zero-inflated distribution also provides extinction probabilities from a Galton-Watson branching process. We compare the zero-inflated analysis with a recently developed analysis using convolution methods that provides exact distributions of LRO. The agreement is strikingly good. Full article
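The link between a zero-inflated offspring distribution and extinction probability mentioned in this abstract can be sketched with the classical Galton-Watson result: the extinction probability is the smallest non-negative fixed point of the offspring probability generating function. The sketch below assumes a zero-inflated Poisson offspring distribution with zero-inflation probability `pi` and Poisson mean `lam`; the parameter values and function names are illustrative and not taken from the paper.

```python
import math

def zip_pgf(s, pi, lam):
    """Probability generating function of a zero-inflated Poisson:
    G(s) = pi + (1 - pi) * exp(lam * (s - 1))."""
    return pi + (1.0 - pi) * math.exp(lam * (s - 1.0))

def extinction_probability(pi, lam, tol=1e-12, max_iter=10000):
    """Smallest fixed point of G on [0, 1], found by iterating q <- G(q) from 0."""
    q = 0.0
    for _ in range(max_iter):
        q_next = zip_pgf(q, pi, lam)
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

# Subcritical case: mean offspring (1 - pi) * lam = 0.5, so extinction is certain.
q_sub = extinction_probability(pi=0.5, lam=1.0)

# Supercritical case: mean offspring = 0.7 * 3.0 = 2.1, so extinction is not certain.
q_super = extinction_probability(pi=0.3, lam=3.0)

print(q_sub, q_super)
```

Starting the iteration at 0 matters: the sequence G(0), G(G(0)), ... increases monotonically to the smallest fixed point, whereas starting at 1 would stay at the trivial fixed point q = 1. Note also that the extinction probability can never fall below G(0), which already includes the fraction `pi` that never reproduces.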

15 pages, 358 KB

Open Access Feature Paper Article

Multi-Task CNN-LSTM Modeling of Zero-Inflated Count and Time-to-Event Outcomes for Causal Inference with Functional Representation of Features

by Jong-Min Kim

Axioms 2025, 14(8), 626; https://doi.org/10.3390/axioms14080626 - 11 Aug 2025

Cited by 1 | Viewed by 1758

Abstract

We propose a novel deep learning framework for counterfactual inference on the COMPAS dataset, utilizing a multi-task CNN-LSTM architecture. The model jointly predicts multiple outcome types: (i) count outcomes with zero inflation, modeled using zero-inflated Poisson (ZIP), zero-inflated negative binomial (ZINB), and negative binomial (NB) distributions; (ii) time-to-event outcomes, modeled via the Cox proportional hazards model. To effectively leverage the structure in high-dimensional tabular data, we integrate functional data analysis (FDA) techniques by transforming covariates into smooth functional representations using B-spline basis expansions. Specifically, we construct a pseudo-temporal index over predictor variables and fit basis expansions to each subject’s feature vector, yielding a low-dimensional set of coefficients that preserve smooth variation while reducing noise. This functional representation enables the CNN-LSTM model to capture both local and global temporal patterns in the data, including treatment-covariate interactions. Our approach estimates both population-average and individual-level treatment effects (ATE and CATE) for each outcome and evaluates predictive performance using metrics such as Poisson deviance, root mean squared error (RMSE), and the concordance index (C-index). Statistical inference on treatment effects is supported via bootstrap-based confidence intervals and hypothesis testing. Overall, this comprehensive framework facilitates flexible modeling of heterogeneous treatment effects in structured, high-dimensional data, advancing causal inference methodologies in criminal justice and related domains. Full article

(This article belongs to the Special Issue Functional Data Analysis and Its Application)

17 pages, 343 KB

Open Access Article

On the Conflation of Poisson and Logarithmic Distributions with Applications

by Abdulhamid A. Alzaid, Anfal A. Alqefari and Najla Qarmalah

Axioms 2025, 14(7), 518; https://doi.org/10.3390/axioms14070518 - 6 Jul 2025

Cited by 1 | Viewed by 1133

Abstract

Real-life count data frequently show inflation at low values; however, most of the well-known count distributions cannot capture such a feature. The present paper introduces a new distribution, based on a conflation of distributions approach, for modeling count data that are inflated at small values. The new distribution inherits some properties from the Poisson distribution (PD) and the logarithmic distribution (LD), making it a powerful modeling tool. It can serve as an alternative to PD, LD, and zero-truncated distributions. The new distribution is also of theoretical interest, as it belongs to the weighted PD family. To include zero as a support point, two additional models are suggested for the new distribution. These modifications yield distributions that exhibit overdispersion comparable to the negative binomial distribution (NBD) while retaining essential PD properties, making them suitable for accurately representing count data with frequent low values and high variance. Furthermore, we discuss the superior performance of the three new distributions in modeling real count data compared to traditional count distributions such as PD and NBD, as well as other discrete distributions. This paper examines the key statistical properties of the proposed distributions. A comparison of the new distributions with others in the literature is presented using real-life data from several domains. All of the computations shown in this study were performed using the R programming language. Full article

(This article belongs to the Special Issue Advances in the Theory and Applications of Statistical Distributions)

11 pages, 1524 KB

Open Access Article

scQTLtools: An R/Bioconductor Package for Comprehensive Identification and Visualization of Single-Cell eQTLs

by Xiaofeng Wu, Xin Huang, Pinjing Chen, Jingtong Kang, Jin Yang, Zhanpeng Huang and Siwen Xu

Biology 2025, 14(7), 743; https://doi.org/10.3390/biology14070743 - 23 Jun 2025

Viewed by 1605

Abstract

Single-cell RNA sequencing (scRNA-seq) enables expression quantitative trait locus (eQTL) analysis at cellular resolution, offering new opportunities to uncover regulatory variants with cell-type-specific effects. However, existing tools are often limited in functionality, input compatibility, or scalability for sparse single-cell data. To address these challenges, we developed scQTLtools, a comprehensive R/Bioconductor package that facilitates end-to-end single-cell eQTL analysis, from preprocessing to visualization. The toolkit supports flexible input formats, including Seurat and SingleCellExperiment objects, handles both binary and three-class genotype encodings, and provides dedicated functions for gene expression normalization, SNP and gene filtering, eQTL mapping, and versatile result visualization. To accommodate diverse data characteristics, scQTLtools implements three statistical models—linear regression, Poisson regression, and zero-inflated negative binomial regression. We applied scQTLtools to scRNA-seq data from human acute myeloid leukemia and identified eQTLs with regulatory effects that varied across cell types. Visualization of SNP–gene pairs revealed both positive and negative associations between genotype and gene expression. These results demonstrate the ability of scQTLtools to uncover cell-type-specific regulatory variation that is often missed by bulk eQTL analyses. Currently, scQTLtools supports cis-eQTL mapping; future development will extend to include trans-eQTL detection. Overall, scQTLtools offers a robust, flexible, and user-friendly framework for dissecting genotype–expression relationships in heterogeneous cellular populations. Full article

(This article belongs to the Special Issue Unraveling the Influence of Genetic Variants on Gene Regulation)

Search Results (35)
