3. Proposed Method for Zero-Inflated Data Analysis
We propose a new analysis method based on BNNR to solve the sparsity problem that occurs in zero-inflated data analysis. To check for the zero-inflated problem in the given data, we quantify zero inflation by the zero ratio in Equation (6), defined as the proportion of observations equal to zero:

$$\text{zero ratio} = \frac{1}{n} \sum_{i=1}^{n} I(y_i = 0), \quad (6)$$

where $I(\cdot)$ is the indicator function and $n$ is the number of observations. In this study, we operationally regard a zero ratio ≥ 10% as excessive zeros. Because the appropriate threshold can be domain-dependent, we also report results across multiple zero-ratio levels to assess robustness.
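As an illustration, the zero ratio in Equation (6) can be computed in R as follows (a minimal sketch; the vector `y` and the 10% threshold check are placeholders for the user's data):

```r
# Zero ratio: proportion of observations equal to zero (Equation (6))
zero_ratio <- function(y) mean(y == 0)

# Example with a hypothetical zero-inflated count vector
y <- c(0, 0, 0, 2, 0, 1, 0, 4, 0, 3)
zero_ratio(y)          # 0.6, i.e., a 60% zero ratio
zero_ratio(y) >= 0.10  # TRUE: flagged as excessive zeros under our threshold
```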
The proposed method is intended for settings in which the zero ratio is very large and analysis is difficult even with existing zero-inflated models. Since the BNNR method adds regularization to a BNN, we first describe the structure of the BNN. The proposed BNNR is implemented with an R package for Bayesian regularization in feed-forward neural networks [16]. The network model consists of input and hidden layers with a fixed number of nodes, and an output layer that predicts the conditional mean of a zero-inflated count response. For each observation $(\mathbf{x}_i, y_i)$, the model computes the predicted value $\hat{y}_i = f(\mathbf{x}_i; \mathbf{w})$, where the vector $\mathbf{w}$ contains all weights and biases. The BNNR model is trained by minimizing the squared error between $y_i$ and $\hat{y}_i$, and is regularized through Gaussian priors on $\mathbf{w}$ and data-based tuning of the regularization strength. Given data $D = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$, the input and output are represented as $\mathbf{x}_i$ and $y_i$, respectively.
$\mathbf{w}$ is the weight vector of the neural network. We model the zero-inflated count data using a neural network with hidden layers. We use sigmoid and exponential activation functions for the hidden and output layers, respectively. Using normalization, we unify the scale of all input variables. When $y_i$ is the target, we define $y_i$ as following a Gaussian distribution whose mean is given by the neural network output $f(\mathbf{x}_i; \mathbf{w})$ and whose inverse variance is the precision $\beta$. Following [27,37], we specify the prior distribution of $\mathbf{w}$ as follows:

$$p(\mathbf{w} \mid \alpha) = N(\mathbf{w} \mid \mathbf{0}, \alpha^{-1} \mathbf{I}) \quad (7)$$

In Equation (7), $\mathbf{0}$ and $\mathbf{I}$ are the zero vector and the identity matrix, and $\alpha$ is the precision of the prior. The likelihood function for $D$ is expressed as follows:

$$p(D \mid \mathbf{w}, \beta) = \prod_{i=1}^{n} N\!\left(y_i \mid f(\mathbf{x}_i; \mathbf{w}), \beta^{-1}\right) \quad (8)$$

Combining the prior in Equation (7) with the likelihood in Equation (8), we obtain the following posterior distribution:

$$p(\mathbf{w} \mid D, \alpha, \beta) \propto p(D \mid \mathbf{w}, \beta)\, p(\mathbf{w} \mid \alpha) \quad (9)$$
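For concreteness, the unnormalized log posterior in Equation (9) is simply the sum of the log likelihood in Equation (8) and the log prior in Equation (7). A minimal R sketch follows; `f_hat` (the network outputs), `alpha`, and `beta` are hypothetical placeholders:

```r
# Log prior (Equation (7)): w ~ N(0, alpha^{-1} I)
log_prior <- function(w, alpha) {
  sum(dnorm(w, mean = 0, sd = sqrt(1 / alpha), log = TRUE))
}

# Log likelihood (Equation (8)): y_i ~ N(f(x_i; w), beta^{-1})
log_likelihood <- function(y, f_hat, beta) {
  sum(dnorm(y, mean = f_hat, sd = sqrt(1 / beta), log = TRUE))
}

# Unnormalized log posterior (Equation (9))
log_posterior <- function(w, y, f_hat, alpha, beta) {
  log_likelihood(y, f_hat, beta) + log_prior(w, alpha)
}
```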
BNNs that use prior distributions for weights are less likely to rely too heavily on specific parameter values than ordinary neural networks, but they can still suffer from overfitting. Moreover, prior misspecification or overly concentrated or shrinking priors, together with restrictive approximate posteriors, can over-regularize Bayesian neural networks, leading to underfitting and biased uncertainty. See Foong et al. on the expressiveness limits of approximate inference in BNNs, Wenzel et al. on posterior multi-modality and sensitivity to priors and initialization, and Wilson and Izmailov on probabilistic generalization and the role of inductive bias in deep models [28,29,30,31]. Regularization is a machine learning technique that addresses the overfitting problem by adding a penalty term to the error function to prevent the coefficients from growing too large [
38]. Regularization limits the size of the network weights and suppresses model complexity to prevent overfitting; that is, we control the complexity of the model by including a regularization term in the error function. The numbers of input and output nodes are determined by the given data. Therefore, we can improve the performance of the model by optimally determining the number of hidden layers and the number of nodes in each hidden layer. The error function with regularization is defined as follows [16,21,25,26]:

$$\tilde{E}(\mathbf{w}) = E_D(\mathbf{w}) + \lambda E_W(\mathbf{w}) \quad (10)$$
In Equation (10), $\lambda$ is the regularization coefficient that determines model complexity. By combining BNNs and regularization, BNNR uses prior distributions to prevent excessive weight growth during data learning and posterior distributions to perform accurate predictions under uncertainty. BNNR learning is performed by minimizing the following error function $F(\mathbf{w})$ [38,39]:

$$F(\mathbf{w}) = \beta \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 + \alpha \sum_{j} w_j^2 \quad (11)$$
In Equation (11), the error function consists of the sum of squared differences between the actual and predicted values and a constraint on the size of the weights. $\alpha$ and $\beta$ are hyperparameters that control the relative importance of the two terms in the error function. From a Bayesian perspective, minimizing $F(\mathbf{w})$ corresponds to minimizing the negative logarithm of the posterior distribution in Equation (9). Therefore, the BNNR learning procedure is as follows (Algorithm 1).
| Algorithm 1 BNNR Learning Procedure |
| Input: data $D = \{(\mathbf{x}_i, y_i)\}_{i=1}^{n}$; network structure (number of hidden layers and nodes); hyperparameters $\alpha$ and $\beta$. |
| Output: posterior samples of the weights $\mathbf{w}$ and predicted values $\hat{y}_i$. |
| Procedure: |
| 1. Normalize all input variables to a common scale. |
| 2. Sample the initial weights from the prior distribution. |
| 3. Run the neural network model to calculate the predicted values. |
| 4. Evaluate the error function $F(\mathbf{w})$ in Equation (11). |
| 5. Automatically adjust the weight parameters at each epoch based on data fit and weight size. |
| 6. After the model converges, take samples of all weights and compute the predicted values. |
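A minimal R sketch of Algorithm 1, assuming the brnn package of [16] (the simulated data, the number of hidden neurons, and the epoch count are placeholders):

```r
library(brnn)

set.seed(1)
x <- matrix(rnorm(200), ncol = 2)            # two standardized input variables
y <- rpois(100, lambda = exp(0.5 * x[, 1]))  # hypothetical count response with many zeros

# Steps 1-5: brnn() normalizes the inputs, samples initial weights,
# and re-estimates the regularization hyperparameters at each epoch
fit <- brnn(x, y, neurons = 2, epochs = 1000)

# Step 6: predicted values after convergence
y_hat <- predict(fit, x)
```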
The BNNR model with two layers is represented as follows [16]:

$$y_i = \sum_{k=1}^{s} w_k\, g\!\left(b_k + \sum_{j=1}^{p} x_{ij} \beta_j^{[k]}\right) + e_i \quad (12)$$
In Equation (12), $e_i$ is assumed to follow a Gaussian distribution with mean $0$ and variance $\sigma_e^2$, $s$ is the number of hidden nodes, and $p$ is the number of input variables. In addition, $b_k$ and $g(\cdot)$ are the bias and activation function, respectively. Thus, we minimize the error function

$$F = \beta E_D + \alpha E_W \quad (13)$$
In Equation (13), $E_D$ and $E_W$ are the error sum of squares and the sum of squares of the model parameters, respectively. In this study, the BNNR is implemented using a Bayesian regularized neural network for regression. The model uses a single-output network that predicts the conditional mean $E(y_i \mid \mathbf{x}_i)$. While ZIP and ZINB explicitly parameterize the zero probability and the count intensity as $\pi$ and $\lambda$, respectively, the current BNNR implementation neither explicitly models $(\pi, \lambda)$ within the network nor optimizes an explicit variational evidence lower bound (ELBO) [25,38,39].
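To make Equations (12) and (13) concrete, the following R sketch implements the two-layer prediction with sigmoid hidden units and the regularized error function; all weight objects (`w`, `b`, `B`) are hypothetical placeholders:

```r
sigmoid <- function(z) 1 / (1 + exp(-z))

# Prediction under Equation (12) with a linear output layer:
# X is an n x p input matrix, w the s output weights, b the s hidden biases,
# and B a p x s matrix of hidden-layer weights beta_j^[k]
predict_two_layer <- function(X, w, b, B) {
  hidden <- sigmoid(sweep(X %*% B, 2, b, "+"))  # n x s hidden activations
  as.vector(hidden %*% w)
}

# Error function in Equation (13): F = beta * E_D + alpha * E_W
error_fn <- function(y, y_hat, params, alpha, beta) {
  beta * sum((y - y_hat)^2) + alpha * sum(params^2)
}
```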
To evaluate the explanatory performance of count and zero-inflated models, we consider the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) in this paper [21]. We use the AIC measure to verify the goodness of model fit as follows [21,40]:

$$\mathrm{AIC} = -2 \log L(\hat{\theta}) + 2k \quad (14)$$
In Equation (14), $\hat{\theta}$ is the maximum likelihood estimate (MLE) of the parameter vector $\theta$, $L(\hat{\theta})$ is the maximized likelihood, and $k$ is the number of model parameters. The better the fitting performance of the model, the smaller the AIC value. The BIC is an index for measuring the performance of a fitted model that strengthens the AIC penalty with the sample size $n$. Equation (15) represents the BIC [21,40]:

$$\mathrm{BIC} = -2 \log L(\hat{\theta}) + k \log n \quad (15)$$
In Equation (15), $\hat{\theta}$ is the maximum a posteriori (MAP) estimate of $\theta$, $k$ is the number of model parameters, and $n$ is the number of observations. As with the AIC, the better the fitting performance of the model, the smaller the BIC value.
In addition to AIC and BIC, we consider the Watanabe–Akaike information criterion (WAIC) as a model evaluation measure. WAIC estimates the expected out-of-sample predictive accuracy using the pointwise log-likelihood and a correction for effective model complexity based on the variance of the log-likelihood across posterior draws [21,22]. Let $D = \{y_1, \ldots, y_n\}$ denote the given data and $\theta$ represent the model parameters. Given posterior draws $\theta^{(s)}$, $s = 1, \ldots, S$, the log pointwise predictive density (lppd) is computed as follows [21,22,23,24]:

$$\mathrm{lppd} = \sum_{i=1}^{n} \log\!\left( \frac{1}{S} \sum_{s=1}^{S} p\!\left(y_i \mid \theta^{(s)}\right) \right) \quad (16)$$
In Equation (16), $S$ is the number of posterior draws. The effective number of parameters, $p_{\mathrm{WAIC}}$, is estimated by Equation (17) [21,22,23,24]:

$$p_{\mathrm{WAIC}} = \sum_{i=1}^{n} \operatorname{Var}_{s}\!\left( \log p\!\left(y_i \mid \theta^{(s)}\right) \right) \quad (17)$$
Equation (17) sums, over the observations, the posterior variance of the log predictive density. WAIC is then defined as follows [21,22,23,24]:

$$\mathrm{WAIC} = -2\left( \mathrm{lppd} - p_{\mathrm{WAIC}} \right) \quad (18)$$

In Equation (18), a smaller WAIC value represents better expected predictive performance.
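Given an $S \times n$ matrix of pointwise log-likelihood values across posterior draws, Equations (16)–(18) can be computed directly in base R (the matrix `log_lik_mat` is a placeholder for the user's posterior output):

```r
# log_lik_mat: S x n matrix with entries log p(y_i | theta^(s))
waic <- function(log_lik_mat) {
  lppd   <- sum(log(colMeans(exp(log_lik_mat))))  # Equation (16)
  p_waic <- sum(apply(log_lik_mat, 2, var))       # Equation (17)
  -2 * (lppd - p_waic)                            # Equation (18)
}
```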
As an index for comparing the predictive performance of all models, including the zero-inflated count models and BNNR, we use the mean squared error (MSE), which is defined as follows [40]:

$$\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2 \quad (19)$$

In Equation (19), $y_i$ and $\hat{y}_i$ are the actual and predicted values, respectively. To calculate the MSE, we divide the given data into training and test sets at a ratio of 7:3. We build the model using the training data and calculate the MSE using the test data. The smaller the MSE value of the model, the better its predictive performance. In the following sections, we compare the performance of four models, PGLM, ZIP, ZINB, and BNNR, using AIC, BIC, and MSE. For the experiments in this paper, we use the R Project for Statistical Computing as the data analysis tool [36]. R is free, open-source software that provides a variety of functions for statistical data analysis [36].
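Finally, a minimal R sketch of the 7:3 split and the MSE in Equation (19); the simulated data frame `dat` and the Poisson GLM stand in for the models compared in this paper:

```r
set.seed(1)
dat <- data.frame(x = rnorm(100))              # hypothetical data
dat$y <- rpois(100, lambda = exp(0.3 * dat$x))

idx   <- sample(nrow(dat), size = round(0.7 * nrow(dat)))  # 7:3 train/test split
train <- dat[idx, ]
test  <- dat[-idx, ]

fit   <- glm(y ~ x, family = poisson, data = train)        # model built on the training data
y_hat <- predict(fit, newdata = test, type = "response")

mean((test$y - y_hat)^2)  # MSE in Equation (19), computed on the test data
```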