Bayesian Feature Extraction for Two-Part Latent Variable Model with Polytomous Manifestations

Zhang, Qi; Zhang, Yihui; Xia, Yemao

doi:10.3390/math12050783


by Qi Zhang ¹, Yihui Zhang ² and Yemao Xia ¹,*

¹ School of Science, Nanjing Forestry University, Nanjing 210037, China

² School of Computer Science, China University of Geosciences, Wuhan 430074, China

* Author to whom correspondence should be addressed.

Mathematics 2024, 12(5), 783; https://doi.org/10.3390/math12050783

Submission received: 29 January 2024 / Revised: 1 March 2024 / Accepted: 5 March 2024 / Published: 6 March 2024

(This article belongs to the Special Issue Multivariate Statistical Analysis and Application)


Abstract:

Semi-continuous data are very common in social sciences and economics. In this paper, a Bayesian variable selection procedure is developed to assess the influence of observed and/or unobserved exogenous factors on semi-continuous data. Our formulation is based on a two-part latent variable model with polytomous responses. We consider two schemes for the penalties of regression coefficients and factor loadings: a Bayesian spike and slab bimodal prior and a Bayesian lasso prior. Within the Bayesian framework, we implement a Markov chain Monte Carlo sampling method to conduct posterior inference. To facilitate posterior sampling, we recast the logistic model from Part One as a norm-type mixture model. A Gibbs sampler is designed to draw observations from the posterior. Our empirical results show that with suitable values of hyperparameters, the spike and slab bimodal method slightly outperforms Bayesian lasso in the current analysis. Finally, a real example related to the Chinese Household Financial Survey is analyzed to illustrate application of the methodology.

Keywords:

two-part latent variable model; spike and slab prior; Bayesian lasso; MCMC sampling; CHFS

MSC:

62H12; 62F15

1. Introduction

Semi-continuous data, which are characterized by excessive zeros, are very common in the social sciences and economics. A typical example is given by [1] in an analysis of medical expenditures, in which the zeros correspond to a subpopulation of patients who do not use health services, while the positive values describe the actual levels of expenditures among users. For understanding this type of data structure, the two-part model [2] is a widely used statistical method. The basic assumption of a two-part model is that the overall model consists of two processes: one binary process (Part One) and one continuous positive-valued process (Part Two). The binary process, usually formulated within a logistic or probit regression model, indicates whether the items have been responded to or not, while the continuous process, conditional on the binary process, describes the actual levels of the responses (see, e.g., [3]). By combining the two processes into one, a two-part model provides a unified and flexible way to describe various relationships underlying semi-continuous data. Two-part models have been widely used for health services [4,5,6], medical expenditures [1,7,8,9,10], household finances [11], substance use studies [12,13], and genome analysis [14].

A traditional two-part model usually formulates exogenous explanatory factors as fixed and observed. However, in many real applications, especially in social surveys, many unobserved/latent and random factors also have important impacts on the outcome variable(s). This fact is revealed by [15] in a study of children's aggressive behavior. Ref. [15] noted that two factors, the propensity to engage in aggressive behavior and the propensity to have highly aggressive activity levels, had a significant influence on children's aggressive behavior. The authors incorporated these two latent factors into their analysis and established a two-component, two-part mixture model to identify the heterogeneity of the population. Ref. [16] noticed that in China, the financial literacy of a family had a non-ignorable effect on the desire to hold financial debt and also affected the amount of financial debt being held. They suggested conducting a joint analysis of latent factors and observed covariates in a two-part regression model, in which the latent factors are further manifested by multiple binary measurements via a factor analysis model. Ref. [17] incorporated a two-part regression model into a general latent variable model framework and analyzed the internal relationships between multiple factors longitudinally. These methods have brought significant attention to two-part models in behavioral science, economics, psychology, and medicine in recent years: see, for example, [14,18,19] and references therein for further developments of two-part models.

In an analysis of semi-continuous data, an important issue is to determine which explanatory factors are helpful for improving model fit. This issue is especially pressing when the number of exogenous factors is large, since the commonly used forward and backward regression procedures are extremely time-consuming. The lasso and its extensions [20,21,22,23,24,25,26,27] have become the most commonly used methods for feature extraction. A typical feature of these methods is that they put suitable penalties on the coefficients and shrink many coefficients to zero, thus performing variable selection. Recently, these penalization/regularization approaches have been applied widely for prediction and prognosis (see, for example, [28,29]). Though appealing, lasso-type regularization methods also suffer from some limitations. For example, most contributions are developed within the frequentist framework, and their performance depends heavily on large sample theory (see, for example, [20,21,26,27] and references therein). This also readily leads to computational difficulty in the analysis of mixed data. An alternative approach to variable selection is conducted within the Bayesian framework. Statisticians have introduced hierarchical models with mixed spike-and-slab priors that can adaptively determine the amount of shrinkage [30,31]. The spike and slab prior is the fundamental basis for most Bayesian variable selection approaches and has proved remarkably successful [30,31,32,33,34,35,36]. Recently, Bayesian spike and slab priors have been applied to predictive modeling and variable selection in large-scale genomic studies: see [37] for a simple review. Nevertheless, model selection has never been considered in a two-part regression model with latent variables. In this study, we incorporate a spike and slab prior and a Bayesian lasso prior into a two-part latent variable model, which is a first attempt for this model.

Our formulation follows the lines of the spike and slab bimodal prior in [34] and the Bayesian lasso in [38]. We formulate the problem by assigning a normal distribution with mean zero to the regression coefficient or factor loading of interest. The probability of a related variable being excluded or included is governed by the variance. To model the shrinkage of coefficients properly, we consider two schemes for the variance parameter. One is a two-point mixture model with one component located at a point close to zero and the other component situated at a point far away from zero. The mixing proportion is governed by a beta distribution with suitable hyperparameters. The other scheme uses a Bayesian lasso, for which the variance is specified via a gamma distribution that is scaled by the penalty parameters. The two schemes are unified into a hierarchical mixture model. Within the Bayesian paradigm, we develop a fully Bayesian selection procedure for the two-part latent variable model. We resort to the Markov chain Monte Carlo sampling method: a Gibbs sampler is used to draw observations from the posterior, and all full conditionals are obtained. Posterior analysis is carried out based on the simulated observations. We investigate the performance of the proposed methods via a simulation study and a real example. Our empirical results show that the two schemes give similar variable selection results, but the spike and slab bimodal prior with suitable hyperparameters slightly outperforms the Bayesian lasso in terms of the correct rate.

The remainder of this paper is organized as follows. Section 2 introduces the proposed model for semi-continuous data with latent variables. Section 3 develops an MCMC sampling algorithm for the proposed model. Bayesian inference procedures, including parameter estimation and model assessment, are also presented in this section. In Section 4, we present the results of a simulation study to assess the performance of the proposed methodology and illustrate the practical value of our proposed model by analyzing household finance debt data. Section 5 concludes the paper with a discussion. Some technical details are given in Appendix A.

2. Model Description

In Section 2.1, a basic formulation for analyzing semi-continuous data with latent variables is presented. Section 2.2 presents a Bayesian procedure for feature extraction.

2.1. Two-Part Latent Variable Model

Suppose that for $i = 1, \dots, n$, $s_{i}$ is a semi-continuous outcome variable that takes a value in $[0, \infty)$, and $x_{i}$ is a generic vector of $r$ fixed covariates representing the collection of observed explanatory factors of interest. We assume that each $x_{ij}$ in $x_{i}$ is standardized in the sense that $\sum_{i=1}^{n} x_{ij} = 0$ and $\sum_{i=1}^{n} x_{ij}^{2} = 1$ for $j = 1, \dots, r$. Moreover, we include $m$ latent/unobserved variables $ω_{i} = (ω_{i1}, \dots, ω_{im})^{T}$ in the analysis to account for the unobserved heterogeneity of responses. Conceptually, these latent variables can be covariates that are not directly observed or a synthesis of some highly correlated explanatory items measured with noise. Inclusion of latent variables can improve model fit and strengthen the power of model interpretation: see [39] for more discussion of latent variables in a general setting. To deal with the spike of $s_{i}$ at zero, we follow the common routine in the literature (see, for example, [10,12]) and identify $s_{i}$ with two surrogate variables: $u_{i} = I\{s_{i} > 0\}$ and $z_{i} = \log(s_{i} \mid s_{i} > 0)$, where $I(A)$ denotes the indicator function of set $A$. That is, we separate the whole dataset into two parts: one part is the binary dataset that corresponds to the response-to-nonresponse indicators of the subjects, and the other part is the set of logarithms of the positive values. Our interest focuses on exploring the effects of exogenous factors on the two parts.
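The split of $s_{i}$ into the two surrogate variables can be sketched in a few lines. This is a minimal illustration, not the authors' code; the function name `split_semicontinuous` and the toy data are hypothetical, and entries of $z$ with $s_{i} = 0$ are marked as NaN because $z_{i}$ is only defined when $s_{i} > 0$.

```python
import numpy as np

def split_semicontinuous(s):
    """Return (u, z) with u_i = I{s_i > 0} and z_i = log(s_i) when s_i > 0.

    Entries of z with s_i = 0 are set to NaN, since z_i is undefined there.
    """
    s = np.asarray(s, dtype=float)
    if np.any(s < 0):
        raise ValueError("semi-continuous outcomes must be non-negative")
    positive = s > 0
    u = positive.astype(int)
    z = np.full(s.shape, np.nan)
    z[positive] = np.log(s[positive])
    return u, z

# Toy data (hypothetical): two zeros and three positive outcomes.
s = np.array([0.0, 2.5, 0.0, 1.0, 7.2])
u, z = split_semicontinuous(s)
```

Only the observations with `u == 1` would enter the Part Two likelihood.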

We assume that $u_{i}$ and $z_{i}$ satisfy the following sampling models:

$$\log\left(\frac{P(u_{i} = 1)}{1 - P(u_{i} = 1)}\right) = η_{i}^{u} = α + β_{x}^{T} x_{i} + β_{ω}^{T} ω_{i}, \tag{1}$$

$$p(z_{i} \mid u_{i} = 1, ω_{i}) = N(η_{i}^{z}, σ^{2}), \qquad η_{i}^{z} = γ + ψ_{x}^{T} x_{i} + ψ_{ω}^{T} ω_{i}, \tag{2}$$

in which $α$ and $γ$ are the scalar intercept parameters, $β_{x}$ and $ψ_{x}$ are the vectors of regression coefficients, and $β_{ω}$ and $ψ_{ω}$ are the vectors of factor loadings; $σ^{2}$ is the scale parameter, and the superscript 'T' denotes the transpose of a vector or matrix. For compactness, we write $β = (β_{x}^{T}, β_{ω}^{T})^{T}$ and $ψ = (ψ_{x}^{T}, ψ_{ω}^{T})^{T}$ and treat $w_{i} = (x_{i}^{T}, ω_{i}^{T})^{T}$ as the complete vector of explanatory variables. Note that Equation (1) can be represented as

$$p(u_{i} \mid x_{i}, ω_{i}) = \frac{\exp(u_{i} η_{i}^{u})}{1 + \exp(η_{i}^{u})}, \qquad u_{i} = 0, 1, \tag{3}$$

and we refer to it as the logistic model.

The involvement of latent variables complicates the model and readily leads to model identification problems [40,41]. This is especially true when the dimension of $ω_{i}$ is high. In this case, some auxiliary information is required to manifest $ω_{i}$ further. Among the various constructs available, we consider a latent variable (LV) approach [40,41]. A basic assumption of the LV approach is that there exist, say, $p$ manifest variables $y_{i} = (y_{i1}, \dots, y_{ip})^{T}$, of which each $y_{ij}$ may be continuous, count, or categorical, and that they satisfy the following link equation:

$$F(y_{i}, ω_{i}, ϵ_{i}, φ) = 0, \tag{4}$$

where $F$ is a known and fixed link function, $ϵ_{i}$ is the vector of errors used to identify the idiosyncratic part of $y_{i}$ that cannot be explained by $ω_{i}$, and $φ$ is a vector of unknown parameters used to quantify the uncertainty of the model. The information about $ω_{i}$ is manifested by $y_{i}$ via $F$. In this paper, in view of real applications, we consider $p$ ordered categorical variables $y_{i} = (y_{i1}, \dots, y_{ip})^{T}$, for which $y_{ij}$ takes a value in $\{0, 1, \dots, c_{j}\}$ $(c_{j} > 1)$ and satisfies the following link model:

$$y_{ij} = ℓ_{j} \quad \text{if} \quad δ_{j,ℓ_{j}} < y_{ij}^{*} \leq δ_{j,ℓ_{j}+1}, \tag{5}$$

where $δ_{j,0} < δ_{j,1} < \dots < δ_{j,c_{j}} < δ_{j,c_{j}+1}$ are the threshold parameters satisfying $δ_{j,0} = -\infty$ and $δ_{j,c_{j}+1} = +\infty$, and $y_{i}^{*} = (y_{i1}^{*}, \dots, y_{ip}^{*})^{T}$ is the vector of latent responses satisfying the factor analysis model:

$$y_{i}^{*} = μ + Λ ω_{i} + ϵ_{i}, \tag{6}$$

$$ω_{i} \overset{iid}{\sim} N_{m}(0, Φ), \quad ϵ_{i} \sim N_{p}(0, I_{p}), \quad \text{and} \quad ω_{i} ⊥ ϵ_{i}, \tag{7}$$

where $μ$ is the $p$-dimensional intercept vector, $Λ$ is the $p \times m$ factor loading matrix, and $I_{p}$ is the identity matrix of order $p$. We assume that, conditional on $ω_{i}$, $s_{i}$ and $y_{i}$ are independent.
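The threshold link (5) applied to the factor model (6) and (7) can be sketched as follows. This is an illustrative simulation under assumed values: the dimensions, loadings, thresholds, and the helper name `discretize` are hypothetical and are not taken from the paper. The helper relies on `np.searchsorted` with `side="left"`, which returns the index $ℓ$ such that $δ_{ℓ} < y^{*} \leq δ_{ℓ+1}$ when the array holds the finite thresholds $δ_{1} < \dots < δ_{c}$.

```python
import numpy as np

def discretize(y_star, thresholds):
    """Map latent responses to categories 0..c via y = l iff delta_l < y* <= delta_{l+1}.

    `thresholds` holds the finite cut points delta_1 < ... < delta_c;
    delta_0 = -inf and delta_{c+1} = +inf are implicit.
    """
    thresholds = np.asarray(thresholds, dtype=float)
    return np.searchsorted(thresholds, y_star, side="left")

# Hypothetical small example: n = 5 subjects, p = 3 items, m = 2 factors.
rng = np.random.default_rng(0)
n, p, m = 5, 3, 2
mu = np.zeros(p)
Lam = np.array([[1.0, 0.0], [0.8, 0.0], [0.0, 1.0]])   # assumed loadings
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])               # assumed factor covariance
omega = rng.multivariate_normal(np.zeros(m), Phi, size=n)
eps = rng.normal(size=(n, p))                          # errors with identity covariance
y_star = mu + omega @ Lam.T + eps                      # latent responses, Eq. (6)
y = discretize(y_star, thresholds=[-1.0, 0.0, 1.0])    # categories 0..3, Eq. (5)

# Boundary behaviour on fixed inputs.
check = discretize(np.array([-2.0, -1.0, 0.3, 1.0, 5.0]), [-1.0, 0.0, 1.0])
```

For simplicity the sketch uses the same thresholds for every item; the model allows item-specific thresholds $δ_{j,ℓ}$.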

We refer to the model specified by (1), (2), and (5) associated with (6) as the two-part latent variable model with polytomous responses. It provides a unified framework to explore the dependence of binary, continuous, and categorical data simultaneously. The dependence between them results from the sharing of common factors or latent variables. If $ω_{i}$ is degenerate at zero or the factor loadings are all zero, the dependence among them disappears, and the overall model reduces to a traditional two-part model and an ordinal regression model.

To facilitate efficient calculation and motivated by the key identity in [42] (see Equation (2) in their seminal paper), we express model (3) as a mixture model of the form

$$\frac{\exp(u_{i}(α + β^{T} w_{i}))}{1 + \exp(α + β^{T} w_{i})} = 2^{-1} \exp(κ_{i}(α + β^{T} w_{i})) \int_{0}^{\infty} \exp\left\{-\frac{u_{i}^{*}}{2} (α + β^{T} w_{i})^{2}\right\} p_{PG}(u_{i}^{*}) \, d u_{i}^{*}, \tag{8}$$

where $κ_{i} = u_{i} - 0.5$, and $p_{PG}(u)$ is the standard Pólya–Gamma probability density function. If we introduce auxiliary variables $u_{i}^{*}$ and augment them with $u_{i}$, then Equation (3) can be considered to be the marginal density of the joint distribution

$$p(u_{i}, u_{i}^{*} \mid x_{i}, ω_{i}) = 2^{-1} \exp\left\{κ_{i} η_{i}^{u} - \frac{u_{i}^{*}}{2} (η_{i}^{u})^{2}\right\} p_{PG}(u_{i}^{*}). \tag{9}$$

Note that the exponential part is the kernel of a normal density function with respect to $η_{i}^{u}$. Hence, it admits conjugate full-conditional distributions for all regression coefficients, factor loadings, and factor variables, leading to a straightforward Bayesian computation.
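Identity (8) can be checked numerically. The sketch below is an illustration rather than part of the paper's algorithm: it approximates PG(1, 0) draws with the truncated infinite-sum representation $(2π^{2})^{-1} \sum_{k} g_{k} / (k - 1/2)^{2}$ with $g_{k} \sim \mathrm{Exp}(1)$, and compares a Monte Carlo estimate of the right-hand side with the logistic probability on the left. The truncation level, sample size, seed, and the test value of $η$ are arbitrary assumptions.

```python
import numpy as np

def sample_pg10(size, rng, n_terms=200):
    """Approximate PG(1, 0) draws via a truncated sum-of-exponentials series."""
    k = np.arange(1, n_terms + 1)
    g = rng.exponential(1.0, size=(size, n_terms))     # g_k ~ Gamma(1, 1)
    return (g / (k - 0.5) ** 2).sum(axis=1) / (2.0 * np.pi ** 2)

def logistic_lhs(u, eta):
    """Left-hand side of (8): exp(u * eta) / (1 + exp(eta))."""
    return np.exp(u * eta) / (1.0 + np.exp(eta))

def mixture_rhs(u, eta, omega):
    """Monte Carlo estimate of the right-hand side of (8)."""
    kappa = u - 0.5
    return 0.5 * np.exp(kappa * eta) * np.mean(np.exp(-0.5 * omega * eta ** 2))

rng = np.random.default_rng(1)
omega = sample_pg10(20000, rng)
eta = 1.3                                              # arbitrary linear predictor
lhs1, rhs1 = logistic_lhs(1, eta), mixture_rhs(1, eta, omega)
lhs0, rhs0 = logistic_lhs(0, eta), mixture_rhs(0, eta, omega)
```

The two sides should agree up to Monte Carlo and truncation error for both $u_{i} = 1$ and $u_{i} = 0$; in practice a dedicated Pólya–Gamma sampler would replace the truncated series.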

Let $U = \{u_{i}\}_{i=1}^{n}$, $Z = \{z_{i}\}_{i=1}^{n}$, and $Y = \{y_{i}\}_{i=1}^{n}$ be the sets of observed variables. We write $Ω = \{ω_{i}\}_{i=1}^{n}$ for the collection of factor variables and $U^{*} = \{u_{i}^{*}\}_{i=1}^{n}$, $Y^{*} = \{y_{i}^{*}\}_{i=1}^{n}$ for the sets of latent response variables. The complete data likelihood is given by

$$
\begin{aligned}
p(U, U^{*}, Z, Y, Y^{*}, Ω \mid θ)
&= p(U, U^{*} \mid Ω, α, β) \, p(Z \mid U, Ω, γ, ψ, σ^{2}) \, p(Y \mid Y^{*}, δ) \, p(Y^{*} \mid Ω, μ, Λ) \, p(Ω \mid Φ) \\
&= \prod_{i=1}^{n} 2^{-1} \exp\left\{κ_{i} η_{i}^{u} - \frac{1}{2} u_{i}^{*} (η_{i}^{u})^{2}\right\} p_{PG}(u_{i}^{*} \mid 1, 0) \\
&\quad \times \prod_{i \in I} \frac{1}{\sqrt{2π} \, σ} \exp\left\{-\frac{1}{2σ^{2}} (z_{i} - η_{i}^{z})^{2}\right\} \\
&\quad \times \prod_{i=1}^{n} \prod_{j=1}^{p} \sum_{ℓ=0}^{c_{j}} I\{δ_{jℓ} < y_{ij}^{*} \leq δ_{j,ℓ+1}, \, y_{ij} = ℓ\} \\
&\quad \times \prod_{i=1}^{n} \prod_{j=1}^{p} \frac{1}{\sqrt{2π}} \exp\left\{-\frac{1}{2} (y_{ij}^{*} - μ_{j} - Λ_{j}^{T} ω_{i})^{2}\right\} \\
&\quad \times \prod_{i=1}^{n} \frac{1}{(\sqrt{2π})^{m} |Φ|^{1/2}} \exp\left\{-\frac{1}{2} \mathrm{tr}[Φ^{-1} ω_{i} ω_{i}^{T}]\right\},
\end{aligned}
$$

where $I = \{i : u_{i} = 1\}$ is the set of indices of the positive responses, $δ = \{δ_{jℓ}\}$ is the set of threshold parameters, and $θ = \{α, β, γ, ψ, σ^{2}, μ, Λ, Φ, δ\}$ is the vector of unknown parameters. For the moment, we assume that all components $θ_{j}$ of $θ$ are free.

2.2. Bayesian Feature Selection

Generally speaking, the regression variables $x_{i}$ and factor variables $ω_{i}$ may not impact $u_{i}$ and $z_{i}$ simultaneously, and some redundant variables may exist. The presence of redundant variables not only decreases the model fit but also weakens the power of model interpretation. Therefore, it is necessary to determine which regression coefficients or factor loadings are significantly far from zero. In the context of frequentist inference, this issue is generally tackled via stepwise regression, in which each variable is excluded or included according to its effect on the model fit. However, the situation becomes complex when the number of independent variables is large. In this paper, we pursue a Bayesian variable selection procedure. To this end, we follow [38] and assume

$$β \sim N_{q}(0, \mathrm{diag}\{γ_{βk}^{2}\}), \qquad ψ \sim N_{q}(0, σ^{2} \, \mathrm{diag}\{γ_{ψk}^{2}\}), \tag{10}$$

in which $\mathrm{diag}\{a_{k}\}$ denotes a diagonal matrix with $k$th diagonal element $a_{k}$, and $q = r + m$. That is, we assume that each $β_{k}$ in $β$ (and similarly each $ψ_{k}$ in $ψ$) is centered at zero, which corresponds to excluding $w_{ik}$ from $w_{i}$, and the strength of this prior belief is governed by the variance $γ_{βk}^{2}$. If $γ_{βk}^{2}$ is close to zero, then $β_{k}$ is concentrated near zero with high probability and $w_{ik}$ tends to be excluded; conversely, if $γ_{βk}^{2}$ is large, then $β_{k}$ is likely to be far from zero and $w_{ik}$ tends to be retained. As a result, the value of $γ_{βk}^{2}$ plays a key role in determining whether $w_{ik}$ is selected in Part One. With this in mind, a reasonable assumption about $γ_{βk}^{2}$ and $γ_{ψk}^{2}$ is that

$$γ_{βk}^{2} \overset{ind.}{\sim} (1 - w_{β}) \, δ_{ν_{β0} η_{βk}^{2}}(\cdot) + w_{β} \, δ_{η_{βk}^{2}}(\cdot), \tag{11}$$

$$γ_{ψk}^{2} \overset{ind.}{\sim} (1 - w_{ψ}) \, δ_{ν_{ψ0} η_{ψk}^{2}}(\cdot) + w_{ψ} \, δ_{η_{ψk}^{2}}(\cdot), \tag{12}$$

where $δ_{a}(\cdot)$ is the Dirac measure concentrated at point $a$; $w_{β}$ is the random weight used to measure the degree of similarity between $γ_{βk}^{2}$ and $η_{βk}^{2}$; $η_{βk}^{2}$ is the hyperparameter representing the slab, i.e., how far $β_{k}$ is away from zero; and $ν_{β0}$ is a prespecified small positive value used to identify the 'spike' of $β_{k}$ at zero. In other words, every $γ_{βk}^{2}$ is assumed to be equal to $η_{βk}^{2}$ with probability $w_{β}$ and equal to $ν_{β0} η_{βk}^{2}$ with probability $1 - w_{β}$. The same holds for $w_{ψ}$, $η_{ψk}^{2}$, and $ν_{ψ0}$. To model $w_{β}$ and $w_{ψ}$ properly, we assign the following beta distributions to them:

$$p(w_{β} \mid a_{β}, b_{β}) = \mathrm{Beta}(a_{β}, b_{β}), \qquad p(w_{ψ} \mid a_{ψ}, b_{ψ}) = \mathrm{Beta}(a_{ψ}, b_{ψ}), \tag{13}$$

where $a_{β}$, $a_{ψ}$, $b_{β}$, and $b_{ψ}$ are the hyperparameters used to control the shape of the beta density: that is, to determine the magnitude of the weights in $(0, 1)$. For example, if $a_{β}$ in Equation (13) is small and $b_{β}$ is large, then Equation (13) encourages $w_{β}$ to take a small value with high probability. In contrast, since $1 - w_{β} \sim \mathrm{Beta}(b_{β}, a_{β})$ whenever $w_{β} \sim \mathrm{Beta}(a_{β}, b_{β})$, a large $a_{β}$ and small $b_{β}$ encourage $w_{β}$ to take a large value in $(0, 1)$. In the case that $a_{β} = b_{β} = 1.0$, Equation (13) reduces to the uniform distribution on $(0, 1)$, under which all values in $(0, 1)$ are equally likely for $w_{β}$. In real applications, if no prior information is available, one can assign these values so that the beta distribution is sufficiently dispersed.

Finally, to measure the magnitudes of the 'slab' in the distributions of $β_{k}$ and $ψ_{k}$, we specify gamma distributions for $η_{βk}^{-2}$ and $η_{ψk}^{-2}$, or equivalently,

$$η_{βk}^{2} \mid τ_{β0}, ζ_{β0} \overset{iid}{\sim} IG(τ_{β0}, ζ_{β0}), \qquad η_{ψk}^{2} \mid τ_{ψ0}, ζ_{ψ0} \overset{iid}{\sim} IG(τ_{ψ0}, ζ_{ψ0}), \tag{14}$$

where '$IG(τ, ζ)$' signifies the inverse gamma distribution with mean $ζ / (τ - 1)$ for $τ > 1$ and variance $ζ^{2} / ((τ - 1)^{2} (τ - 2))$ for $τ > 2$; $τ_{β0}$, $ζ_{β0}$, $τ_{ψ0}$, and $ζ_{ψ0}$ are the hyperparameters, which are treated as fixed and known. Similarly, one can assign values to them to ensure that (14) is dispersed enough. For example, we can follow the routine in [34] for ordinary regression analysis and set $τ_{β0} = τ_{ψ0} = 1.0$ and $ζ_{β0} = ζ_{ψ0} = 0.05$ to obtain dispersed priors.

Note that Equations (11) and (12) can be formulated as a hierarchy as follows: for $k = 1, \dots, q$,

$$γ_{βk}^{2} = f_{βk} \, η_{βk}^{2}, \qquad γ_{ψk}^{2} = f_{ψk} \, η_{ψk}^{2},$$

$$f_{βk} \mid ν_{β0}, w_{β} \overset{iid}{\sim} (1 - w_{β}) \, δ_{ν_{β0}}(\cdot) + w_{β} \, δ_{1}(\cdot), \tag{15}$$

$$f_{ψk} \mid ν_{ψ0}, w_{ψ} \overset{iid}{\sim} (1 - w_{ψ}) \, δ_{ν_{ψ0}}(\cdot) + w_{ψ} \, δ_{1}(\cdot), \tag{16}$$

where $f_{βk}$ and $f_{ψk}$ are the latent binary variables. Such a formulation aims to separate $η_{βk}^{2}$ and $η_{ψk}^{2}$ from the distributions (11) and (12) to facilitate posterior sampling.
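The hierarchy in (10) and (13) to (15) can be read as a generative recipe for the Part One coefficients. The following sketch draws once from that prior; it is illustrative only. The function name `draw_ss_prior` and the spike value `nu0 = 0.005` are assumptions (the text only requires a small positive value), while $a_{β} = b_{β} = 1$, $τ_{β0} = 1$, and $ζ_{β0} = 0.05$ are the example values mentioned above.

```python
import numpy as np

def draw_ss_prior(q, nu0, a, b, tau0, zeta0, rng):
    """One draw of (beta, f, eta2, gamma2, w) from the spike-and-slab hierarchy."""
    w = rng.beta(a, b)                                   # mixing weight, Eq. (13)
    # eta2 ~ IG(tau0, zeta0): reciprocal of a Gamma(shape=tau0, rate=zeta0) draw, Eq. (14)
    eta2 = 1.0 / rng.gamma(shape=tau0, scale=1.0 / zeta0, size=q)
    f = np.where(rng.random(q) < w, 1.0, nu0)            # slab (1) w.p. w, spike (nu0) otherwise, Eq. (15)
    gamma2 = f * eta2                                    # prior variances of the coefficients
    beta = rng.normal(0.0, np.sqrt(gamma2))              # beta_k ~ N(0, gamma2_k), Eq. (10)
    return beta, f, eta2, gamma2, w

rng = np.random.default_rng(7)
nu0 = 0.005                                              # assumed spike scale, not from the paper
beta, f, eta2, gamma2, w = draw_ss_prior(
    q=7, nu0=nu0, a=1.0, b=1.0, tau0=1.0, zeta0=0.05, rng=rng
)
```

The draw for $ψ$ would be analogous, with the additional factor $σ^{2}$ in the variance as in (10).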

It is instructive to compare the proposed method to the Bayesian lasso [38], in which the variance parameters $γ_{βk}^{2}$ and $γ_{ψk}^{2}$ in Equation (10) are specified via exponential distributions as follows:

$$p(γ_{β}^{2} \mid λ_{β}^{2}) = \prod_{k=1}^{q} \frac{λ_{βk}^{2}}{2} \exp(-λ_{βk}^{2} γ_{βk}^{2} / 2), \tag{17}$$

$$p(γ_{ψ}^{2} \mid λ_{ψ}^{2}) = \prod_{k=1}^{q} \frac{λ_{ψk}^{2}}{2} \exp(-λ_{ψk}^{2} γ_{ψk}^{2} / 2), \tag{18}$$

where $λ_{β}^{2} = (λ_{β1}^{2}, \dots, λ_{βq}^{2})^{T}$ and $λ_{ψ}^{2} = (λ_{ψ1}^{2}, \dots, λ_{ψq}^{2})^{T}$ are the vectors of shrinkage/penalty parameters used to control the amount of shrinkage of $β_{k}$ and $ψ_{k}$ toward zero.

Modeling $γ_{βk}^{2}$ and $γ_{ψk}^{2}$ as in Equations (17) and (18) leads to marginal distributions of $β_{k}$ and $ψ_{k}$ that are Laplace distributions with location zero and rates governed by $λ_{βk}$ and $λ_{ψk}$, respectively. The penalty parameters $λ_{βk}^{2}$ and $λ_{ψk}^{2}$ are crucial for determining the amount of shrinkage of the parameters. Figure 1 presents the densities of the Laplace distribution $LA(λ)$ $(λ > 0)$ across various choices of $λ$. It can be seen that the larger the value of $λ$ is, the more sharply the density is peaked at zero, indicating a heavier penalty on the regression coefficient.
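The scale-mixture claim for $β_{k}$ can be checked by simulation. In the sketch below (an illustration with an arbitrary value $λ = 2$, sample size, and seed), $γ^{2}$ is drawn from the exponential density in (17) and $β$ from $N(0, γ^{2})$; if the marginal is Laplace with density $(λ/2) \exp(-λ |β|)$, then $E|β| = 1/λ$ and $\mathrm{Var}(β) = 2/λ^{2}$.

```python
import numpy as np

rng = np.random.default_rng(2)
lam = 2.0                      # arbitrary penalty parameter for the check
n = 200_000

# gamma2 has density (lam^2 / 2) * exp(-lam^2 * g / 2): exponential with mean 2 / lam^2.
gamma2 = rng.exponential(scale=2.0 / lam ** 2, size=n)
beta = rng.normal(0.0, np.sqrt(gamma2))

mean_abs = np.abs(beta).mean()   # should be close to 1 / lam
var = beta.var()                 # should be close to 2 / lam^2
```

Increasing `lam` shrinks both moments, which matches the heavier penalty described in the text.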

Given the key roles of $λ_{β}^{2}$ and $λ_{ψ}^{2}$ in Equations (17) and (18), we assign the following gamma priors to them:

$$p(λ_{β}^{2}) = \prod_{k=1}^{q} p(λ_{βk}^{2}) = \prod_{k=1}^{q} Ga(a_{k0}, b_{k0}), \tag{19}$$

$$p(λ_{ψ}^{2}) = \prod_{k=1}^{q} p(λ_{ψk}^{2}) = \prod_{k=1}^{q} Ga(c_{k0}, d_{k0}), \tag{20}$$

where '$Ga(ν, λ)$' denotes a gamma distribution with mean $ν / λ$. As previously discussed, the values of $a_{k0}$, $b_{k0}$, $c_{k0}$, and $d_{k0}$ should be selected with care since they directly affect the amount of shrinkage. Similar to [43], one can set $a_{k0} = c_{k0} = 1$ and $b_{k0} = d_{k0} = 0.05$ to enhance the robustness of inference. This routine is followed in our empirical study.

Let $F_{β}^{*} = \{f_{βk}\}$, $F_{ψ}^{*} = \{f_{ψk}\}$, $γ_{β}^{2} = \{γ_{βk}^{2}\}$, $γ_{ψ}^{2} = \{γ_{ψk}^{2}\}$, $η_{β}^{2} = \{η_{βk}^{2}\}$, and $η_{ψ}^{2} = \{η_{ψk}^{2}\}$. We treat $ν_{β0}$ and $ν_{ψ0}$ as known hyperparameters. Note that $γ_{β}^{2}$ and $γ_{ψ}^{2}$ are completely determined by $(F_{β}^{*}, η_{β}^{2})$ and $(F_{ψ}^{*}, η_{ψ}^{2})$, respectively. In the following, we abbreviate the spike and slab bimodal prior as SS and the Bayesian lasso as BaLsso.

3. Bayesian Inference

3.1. Prior Specification

In view of the model complexity, we consider Bayesian inference. To complete the Bayesian model specification, priors must be specified for the unknown parameters. Following common practice, it is natural to assume that the parameters involved in the different component models are independent a priori.

Firstly, for $μ$, $Λ$, and $Φ$, we consider the following conjugate priors:

$$p(μ) = N_{p}(μ_{0}, Σ_{0}), \tag{21}$$

$$p(Λ) = \prod_{k=1}^{p} p(Λ_{k}) = \prod_{k=1}^{p} N_{m}(Λ_{0k}, H_{0k}), \tag{22}$$

$$p(Φ) = IW(ρ_{0}, R_{0}^{-1}), \tag{23}$$

where '$IW(ρ, R)$' denotes an inverse Wishart distribution with $ρ$ degrees of freedom and scale matrix $R$ [44]; $Λ_{k}^{T}$ is the $k$th row vector of $Λ$; and $μ_{0}$, $Σ_{0}$ $(p \times p) > 0$, $Λ_{0k}$, $H_{0k}$ $(m \times m) > 0$, $ρ_{0} > 0$, and $R_{0}$ $(m \times m) > 0$ are the hyperparameters, which are treated as fixed and known.

Secondly, for $α$, $γ$, and $σ^{2}$ in Parts One and Two, we assume they are mutually independent and satisfy

$$p(α) = N(α_{0}, σ_{α0}^{2}), \qquad p(γ) = N(γ_{0}, σ_{γ0}^{2}), \qquad p(σ^{-2}) = Ga(a_{0}, b_{0}), \tag{24}$$

where $α_{0}$, $σ_{α0}^{2}$, $γ_{0}$, $σ_{γ0}^{2}$, $a_{0}$, and $b_{0}$ are the fixed hyperparameters.

Lastly, for the threshold parameters $δ$, without loss of generality, we assume that $c_{j}$, which determines the number of categories of $y_{ij}$, is invariant across the subscript $j$ and equals $c$. Moreover, we assume that $p(δ) = \prod_{j=1}^{p} p(δ_{j})$, where $δ_{j} = (δ_{jk})$ is the $j$th row vector of $δ$. In the following, we suppress the subscript $j$ in $δ_{jk}$ for notational simplicity and write $δ$ for $δ_{j}$.

Let $F_{0}(\cdot)$ be any strictly increasing and differentiable function on $\mathbb{R}$ with $F_{0}(+\infty) = 1$ and $F_{0}(-\infty) = 0$. For example, one can take $F_{0} = Φ(\cdot / τ_{0})$ for some $τ_{0} > 0$, or the distribution function of Student's t-distribution with $ν_{0}$ degrees of freedom, where $Φ(\cdot)$ is the standard normal distribution function. To specify a prior for $δ$, we follow [45] and let $p_{j} = F_{0}(δ_{j}) - F_{0}(δ_{j-1})$ for $j = 1, \dots, c$. It is easily shown that the map from $(F_{0}(δ_{1}), \dots, F_{0}(δ_{c}))$ to $(p_{1}, \dots, p_{c})$ is invertible with unit Jacobian determinant. We first consider the following Dirichlet distribution for $p = (p_{1}, \dots, p_{c})^{T}$:

$$π(p) = \frac{1}{B(η_{1}, \dots, η_{c+1})} \, p_{1}^{η_{1}-1} \cdots p_{c}^{η_{c}-1} \left(1 - \sum_{ℓ=1}^{c} p_{ℓ}\right)^{η_{c+1}-1},$$

where $B(η_{1}, \dots, η_{c+1}) = \prod_{j=1}^{c+1} Γ(η_{j}) / Γ(\sum_{j=1}^{c+1} η_{j})$ is the multivariate beta function evaluated at $η_{1}, \dots, η_{c+1}$, and $η_{j} > 0$. Then, by the change-of-variables formula, the joint distribution of $δ$ is given by

$$π(δ) = \frac{1}{B(η_{1}, \dots, η_{c+1})} \, p_{1}^{η_{1}-1} \cdots p_{c}^{η_{c}-1} \left(1 - \sum_{ℓ=1}^{c} p_{ℓ}\right)^{η_{c+1}-1} \prod_{j=1}^{c} f_{0}(δ_{j}) \, I\{δ_{1} < \dots < δ_{c}\}, \tag{25}$$

where $f_{0}(x)$ is the derivative of $F_{0}(x)$ with respect to $x$. We call (25) the transformed Dirichlet prior and use it as the prior of $δ$. An advantage of working with (25) is that, conditional upon $δ_{j-1}$ and $δ_{j+1}$, the transformed $δ_{j}$ has the beta distribution given by

$$\frac{F_{0}(δ_{j}) - F_{0}(δ_{j-1})}{F_{0}(δ_{j+1}) - F_{0}(δ_{j-1})} \,\Big|\, (δ_{j-1}, δ_{j+1}) \sim \mathrm{Beta}(η_{j}, η_{j+1}), \qquad j = 1, \dots, c. \tag{26}$$
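Drawing thresholds from the transformed Dirichlet prior (25) amounts to drawing the cell probabilities from a Dirichlet distribution, accumulating them, and mapping back through $F_{0}^{-1}$. The sketch below assumes $F_{0} = Φ(\cdot / τ_{0})$, which is one of the choices mentioned above; the function name `draw_thresholds` and the hyperparameter values ($η_{j} = 1$, $τ_{0} = 1$, $c = 4$) are illustrative assumptions.

```python
import numpy as np
from statistics import NormalDist

def draw_thresholds(eta, tau0, rng):
    """Draw ordered thresholds delta_1 < ... < delta_c from the transformed Dirichlet prior.

    `eta` has length c + 1 (Dirichlet parameters); F0 is taken as Phi(. / tau0).
    """
    probs = rng.dirichlet(eta)                 # (p_1, ..., p_{c+1}), summing to one
    cum = np.cumsum(probs)[:-1]                # F0(delta_j) = p_1 + ... + p_j, j = 1..c
    f0 = NormalDist(mu=0.0, sigma=tau0)        # CDF is Phi(x / tau0)
    return np.array([f0.inv_cdf(float(v)) for v in cum])

rng = np.random.default_rng(3)
delta = draw_thresholds(eta=np.ones(5), tau0=1.0, rng=rng)   # c = 4 thresholds
```

Because the cumulative sums are strictly increasing and $F_{0}^{-1}$ is monotone, the order constraint $δ_{1} < \dots < δ_{c}$ holds automatically.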

3.2. MCMC Sampling

With the priors given above, the inference about $θ$ is based on the posterior $p^{o}(θ \mid U, Z, Y)$, which has no closed form. Motivated by the key idea in [46], we treat the latent quantities as missing data and augment them to the observed data to form the complete data. The statistical inference is carried out based on the complete data likelihood. To this end, apart from $Ω$, $U^{*}$, and $Y^{*}$ mentioned before, we further let $Q^{*}$ be the collection of latent quantities involved in the specifications of $β$ and $ψ$, i.e., $Q^{*} = \{F_{β}^{*}, F_{ψ}^{*}, η_{β}^{2}, η_{ψ}^{2}, w_{β}, w_{ψ}\}$ under SS and $Q^{*} = \{λ_{β}^{2}, λ_{ψ}^{2}\}$ under BaLsso. Rather than working with the posterior $p^{o}$ directly, we consider the following joint distribution

$$p^{joint}(Ω, U^{*}, Y^{*}, Q^{*}, θ \mid U, Z, Y), \tag{27}$$

where $p^{o}$ can be considered as the marginal of $p^{joint}$. We use the Markov chain Monte Carlo (MCMC) [47,48] sampling method to simulate observations from this target distribution. In particular, a Gibbs sampler is implemented to draw observations iteratively from the full conditional distributions as follows:

Draw $Ω$ from $p (Ω ∣ U^{*}, Q^{*}, Y^{*}, θ, U, Z, Y)$ ;
Draw $U^{*}$ from $p (U^{*} ∣ Ω, Y^{*}, Q^{*}, θ, U, Z, Y)$ ;
Draw $Y^{*}$ from $p (Y^{*} ∣ Ω, U^{*}, Q^{*}, θ, U, Z, Y)$ ;
Draw $Q^{*}$ from $p (Q^{*} ∣ Ω, U^{*}, Y^{*}, θ, U, Z, Y)$ ;
Draw $θ$ from $p (θ ∣ Ω, U^{*}, Y^{*}, Q^{*}, U, Z, Y)$ .

Upon convergence, the posterior is approximated by the empirical distribution of the simulated observations. The convergence of the algorithm can be monitored by plotting the traces of the estimates for different starting values or by inspecting the estimated potential scale reduction (EPSR) values [49] of the unknown parameters. The technical details for implementing MCMC sampling are given in Appendix A.
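As a rough illustration of the EPSR diagnostic, the sketch below computes the basic between-chain and within-chain variance ratio for a scalar parameter from several parallel chains. It is a generic textbook form, assumed here for illustration, and may differ in detail from the variant used in the paper; the function name `epsr` and the toy chains are hypothetical.

```python
import numpy as np

def epsr(chains):
    """Basic potential scale reduction for a scalar parameter.

    `chains` is a (J, T) array: J parallel chains, each with T retained draws.
    """
    chains = np.asarray(chains, dtype=float)
    J, T = chains.shape
    W = chains.var(axis=1, ddof=1).mean()          # mean within-chain variance
    B = T * chains.mean(axis=1).var(ddof=1)        # between-chain variance
    var_hat = (T - 1) / T * W + B / T              # pooled estimate of the posterior variance
    return np.sqrt(var_hat / W)

rng = np.random.default_rng(3)
mixed = rng.normal(size=(3, 2000))                 # three chains targeting the same distribution
stuck = mixed + np.array([[0.0], [3.0], [6.0]])    # three chains stuck at different locations
r_mixed, r_stuck = epsr(mixed), epsr(stuck)
```

Values close to one indicate that the chains started from different points have mixed; values well above one indicate that they have not.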

Simulated observations obtained from the blocked Gibbs sampler can be used for statistical inference via a straightforward analysis procedure. For example, the joint Bayesian estimates of the unknown parameters can be obtained via sample averaging as follows:

$$\hat{θ} = M^{-1} \sum_{m=1}^{M} θ^{(m)},$$

where $\{θ^{(m)} : m = 1, \dots, M\}$ are the simulated observations from the posterior. Consistent estimates of the covariance matrices of the estimates can be obtained via sample covariance matrices.

The main purpose of introducing SS and BaLsso is to screen the variables in

w_{i}

. Unlike that in the frequentist inference, Bayesian variable selection does not produce estimates

\hat{β}

and

\hat{ψ}

that are exactly equal to zero, and hence, it is necessary to determine which component can be treated as zero. This can accomplished via posterior confidence intervals (PCIs) of

β_{j}

and

ψ_{j}

, which are given by

\begin{matrix} P (| β_{j} | < c_{α / 2} | U, Z, Y) = 1 - α, P (| ψ_{j} | < d_{α / 2} | U, Z, Y) = 1 - α \end{matrix}

(28)

where

α

is any previously specified value in

(0, 1)

. Calculation of the PCI can be achieved via the Monte Carlo method. For example, let

{β_{j}^{(k)} : k = 1, \dots, K}

be the K observations generated from the posterior distribution. Then the PCI of

β_{j}

with confidence level

100 (1 - α) %

is given by

[β_{j, 100 (α / 2)}

,

β_{j, 100 (1 - α / 2)}]

, where

β_{j, k}

is the

k^{t h}

percentile of the K simulated observations (that is, an order statistic of the posterior sample).
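The Monte Carlo calculation of a PCI can be sketched as below. The helper names `pci` and `treated_as_zero` are hypothetical, and the decision rule (treat a coefficient as zero when its interval covers zero) is an assumption made for illustration; the text does not spell out the exact rule.

```python
import numpy as np

def pci(draws, alpha=0.05):
    """Equal-tailed 100(1 - alpha)% interval from posterior draws."""
    lower, upper = np.quantile(draws, [alpha / 2, 1 - alpha / 2])
    return lower, upper

def treated_as_zero(draws, alpha=0.05):
    """Assumed rule: a coefficient is treated as zero if its PCI covers zero."""
    lower, upper = pci(draws, alpha)
    return bool(lower <= 0.0 <= upper)

rng = np.random.default_rng(2)
null_draws = rng.normal(0.0, 0.05, size=3000)    # toy draws for a zero coefficient
active_draws = rng.normal(0.7, 0.05, size=3000)  # toy draws for a non-zero coefficient
null_flag = treated_as_zero(null_draws)
active_flag = treated_as_zero(active_draws)
```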

Another choice for variable determination in SS is based on the posterior probabilities of

f_{β j} = 1

and

f_{ψ j} = 1

, which can be approximated by

\begin{matrix} {\hat{f}}_{β j} = \frac{1}{K} \sum_{k = 1}^{K} I {f_{β j}^{(k)} = 1}, {\hat{f}}_{ψ j} = \frac{1}{K} \sum_{k = 1}^{K} I {f_{ψ j}^{(k)} = 1}, \end{matrix}

(29)

where

f_{β j}^{(k)}

and

f_{ψ j}^{(k)} (k = 1, \dots, K)

are the K observations drawn from the posterior distribution via the Gibbs sampler. The variable

w_{j}

is selected in Parts One and Two if

{\hat{f}}_{β j} > 0.5

and

{\hat{f}}_{ψ j} > 0.5

.
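Equation (29) and the 0.5 cutoff amount to a few lines of array arithmetic. In the sketch below, `f_draws` is a simulated placeholder for the K sampled indicators (each entry is either 1 or the spike value, taken here as 0.001 as in the simulation study), and `slab_prob` is a made-up quantity used only to generate the toy data.

```python
import numpy as np

nu0 = 0.001                                  # spike value of the indicator
rng = np.random.default_rng(3)
slab_prob = np.array([0.95, 0.10, 0.60])     # hypothetical, for toy data only

# K = 3000 draws of f for q = 3 coefficients; each entry is 1 or nu0.
f_draws = np.where(rng.random((3000, 3)) < slab_prob, 1.0, nu0)

f_hat = (f_draws == 1.0).mean(axis=0)        # Equation (29): share of draws with f = 1
selected = f_hat > 0.5                       # selection rule stated in the text
```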

4. Simulation Study

In this section, a simulation study is conducted to assess the performance of the proposed method. The main objective is to assess the accuracy of estimates and the correct rate of variable selection. We consider one semi-continuous variable

s_{i}

, two factor variables

ω_{i 1}

and

ω_{i 2}

, and six categorical variables

y_{i j} (j = 1, \dots, 6)

. We assume that

s_{i}

,

ω_{i j}

and

y_{i j}

satisfy Equations (1), (2), and (5) associated with (6), respectively, in which the number of fixed covariates is set to five. We generate

x_{i 1}

and

x_{i 2}

from the standard normal distribution,

x_{i 3}

and

x_{i 4}

from a binomial distribution with a probability of success of 0.3, and

x_{i 5}

from a uniform distribution on

(0, 1)

. All covariates were standardized to unify the scales. For ordered categorical variables, we take

c_{j} = c = 4

: that is, each

y_{i j}

belongs to

{0, 1, 2, 3, 4}

.

The true values of the population parameters are set as follows:

α = 0.7

,

β = {(0.7, 0.0, 0.7, 0.0, 0.7, 0.0, 0.8)}^{T}

,

γ = 0.7

,

ψ = {(0.7, 0.0, 0.7, 0.0, 0.7, 0.8, 0.0)}^{T}

,

σ^{2} = 1.0

,

μ = 0.7 \times 1_{6}

, where

1_{6}

is a

6 \times 1

vector whose elements are all equal to one. The factor loading matrix

Λ

and covariance matrix

Φ

are taken as

\begin{matrix} Λ^{T} = [\begin{matrix} 1.0 & 0.8 & 0.8 & 0.0 & 0.0 & 0.0 \\ 0.0 & 0.0 & 0.0 & 1.0 & 0.8 & 0.8 \end{matrix}], Φ = [\begin{matrix} 1.0 & 0.3 \\ 0.3 & 1.0 \end{matrix}], \end{matrix}

(30)

in which the ones and zeros in

Λ

are treated as fixed to identify the model; the thresholds are set as

δ_{k} = {(- {1.5}^{*}, 0.0, 1.2, {2.5}^{*})}^{T}

for

k = 1, \dots, 6

, where the elements with an asterisk are treated as fixed for model identification. Based on these setups, we generate data by first drawing latent factors from

N_{2} (0, Φ)

and then drawing latent responses

Y^{*}

from (6). Then the indicator responses

U

, the intensity responses

Z

and the ordered categorical responses

Y

, are sequentially generated from (1), (2) and (5). To investigate the effect of sample size on the estimates, we take

n = 400

and 1000, respectively, which represent small and large levels of sample size.
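The data-generating steps can be sketched as follows. Equations (1), (2), (5), and (6) are not reproduced in this section, so the sketch assumes a logistic model for the indicator with linear predictor α + β^T w_i, a normal model for the intensity when u_i = 1, w_i formed by stacking the five covariates and the two factors, and a unit-variance measurement equation. It also assumes that the binomial covariates use a single trial. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 400

# Covariates as described in the text (single-trial binomial assumed), standardized.
x = np.column_stack([
    rng.normal(size=n), rng.normal(size=n),
    rng.binomial(1, 0.3, size=n), rng.binomial(1, 0.3, size=n),
    rng.uniform(size=n),
]).astype(float)
x = (x - x.mean(axis=0)) / x.std(axis=0)

Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
Lam = np.array([[1.0, 0.0], [0.8, 0.0], [0.8, 0.0],
                [0.0, 1.0], [0.0, 0.8], [0.0, 0.8]])
mu = 0.7 * np.ones(6)
delta = np.array([-1.5, 0.0, 1.2, 2.5])          # thresholds shared by all items

omega = rng.multivariate_normal(np.zeros(2), Phi, size=n)   # latent factors
y_star = mu + omega @ Lam.T + rng.normal(size=(n, 6))       # latent responses
y = (y_star[:, :, None] > delta).sum(axis=2)                # categories 0..4

w = np.hstack([x, omega])                        # assumed stacking of x_i and omega_i
alpha, gamma, sigma2 = 0.7, 0.7, 1.0
beta = np.array([0.7, 0.0, 0.7, 0.0, 0.7, 0.0, 0.8])
psi = np.array([0.7, 0.0, 0.7, 0.0, 0.7, 0.8, 0.0])

p_pos = 1.0 / (1.0 + np.exp(-(alpha + w @ beta)))           # assumed logistic Part One
u = rng.binomial(1, p_pos)
noise = np.sqrt(sigma2) * rng.normal(size=n)
z = np.where(u == 1, gamma + w @ psi + noise, np.nan)       # intensity only if u = 1
```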

For Bayesian analysis, we consider the following inputs for the hyperparameters: for the parameters involved in the measurement model, we take

μ_{0} = 0_{6}

and

Σ_{0} = 100.0 \times I_{6}

; the elements in

Λ_{0}

corresponding to the free parameters in

Λ

are set at zero, and

H_{0 k} = I_{2}

for

k = 1, \dots, 6

;

ρ_{0} = 10.0

, and

R_{0}^{- 1} = 6.0 \times I_{3}

; for the threshold parameters

δ

, we take

η_{1} = \dots = η_{5} = 1.0

, which denotes the uniform distribution of p on the simplex in

R^{5}

; for the intercept parameters

α

,

γ

, and scale

σ^{2}

in the two-part model, we set

α_{0} = γ_{0} = 0

,

σ_{α 0}^{2} = σ_{γ 0}^{2} = 100

, and

a_{0} = b_{0} = 2.0

; the hyperparameters involved in the formulation of

β

and

ψ

are set as before. Note that these values make the corresponding priors sufficiently diffuse; hence, we can expect this to enhance the robustness of the inference. In addition, we set

ν_{β 0} = ν_{ψ 0} = 0.001

in Equations (11) and (12) to guarantee that

β_{k}

and

ψ_{k}

are concentrated sufficiently close to zero.

The MCMC algorithm described in Section 3 is implemented to obtain the estimates of unknown parameters

θ

. Before the formal implementation, a few test runs are conducted as a pilot to monitor the convergence of the Gibbs sampler. We plot the values of EPSR of unknown parameters against the number of iterations under three different starting values. For SS, Figure 2 presents the plots of EPSR of unknown parameters under three different starting values with sample size

n = 400

.

It can be found that the convergence of estimates is fast and all values of EPSR are less than 1.2 in about 300 iterations. To be conservative, we remove the first 2000 observations as the burn-in phase and further collect 3000 observations for calculating the bias (BIAS), root mean square (RMS), and standard deviation (SD) of the estimate across 100 replications. The BIAS and RMS of the j-th component

{\hat{θ}}_{j}

in

\hat{θ}

are respectively defined as follows:

\begin{matrix} BIAS ({\hat{θ}}_{j}) = {\bar{θ}}_{j} - θ_{j}^{0}, \quad {\bar{θ}}_{j} = \frac{1}{100} \sum_{κ = 1}^{100} {\hat{θ}}_{j}^{(κ)}, \quad RMS ({\hat{θ}}_{j}) = \sqrt{\frac{1}{100} \sum_{κ = 1}^{100} {({\hat{θ}}_{j}^{(κ)} - θ_{j}^{0})}^{2}}, \end{matrix}

(31)

where

θ_{j}^{0}

is the j-th element of the population parameters

θ^{0}

. The summaries of the estimates of the main parameters for the two scenarios are reported in Table 1 and Table 2, where the sums of the SDs and RMSs across the estimates are presented in the last rows.
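Equation (31) can be evaluated with a short helper such as the one below. The function name is hypothetical, and the SD is taken here as the empirical standard deviation of the estimates across replications, which is an assumption since the text does not define it explicitly.

```python
import numpy as np

def bias_rms_sd(estimates, theta0):
    """estimates: array of shape (replications, parameters); theta0: true values."""
    estimates = np.asarray(estimates, dtype=float)
    theta0 = np.asarray(theta0, dtype=float)
    bias = estimates.mean(axis=0) - theta0                    # BIAS in Equation (31)
    rms = np.sqrt(((estimates - theta0) ** 2).mean(axis=0))   # RMS in Equation (31)
    sd = estimates.std(axis=0, ddof=1)                        # assumed definition of SD
    return bias, rms, sd

# Toy example with two replications and two parameters.
bias, rms, sd = bias_rms_sd([[0.8, 0.1], [0.6, -0.1]], [0.7, 0.0])
```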

Examination of Table 1 and Table 2 gives the following findings: (i) Both methods produce satisfactory results. The performance of SS is slightly superior to that of BaLsso in terms of the sums of RMS and SD. For

n = 400

, the total RMS and SD are 1.870 and 1.975, respectively, under SS, compared with 2.016 and 2.035, respectively, under BaLsso. (ii) Except for

ψ_{2}

in Table 1 and

ψ_{4}

in Table 2, for the regression coefficients and factor loadings with true values at zero, the RMSs and SDs produced by SS are uniformly smaller than those produced by BaLsso. This indicates that SS imposes more penalties on the regression coefficients and factor loadings than BaLsso to shrink them toward zero. (iii) For non-zero

β_{j}

and

ψ_{j}

, the situation is less clear-cut. For example, the RMSs and SDs of the estimates of

β_{5}

,

β_{7}

,

ψ_{1}

,

ψ_{5}

, and

ψ_{6}

in Table 1 show that SS performs worse than BaLsso. This is also true for

β_{3}

and

ψ_{3}

in Table 2. A possible reason is that the penalties imposed on the non-zero regression coefficients and factor loadings by SS are similar to those imposed by BaLsso, so that neither method is overwhelmingly superior to the other. (iv) As expected, increasing the sample size improves the accuracy of the estimates for both SS and BaLsso.

Another simulation is conducted to assess the performance of the proposed method for variable selection when the covariates and latent variables are correlated. In this setting, we generate covariates and latent factors jointly from the multivariate normal distribution with mean zero and covariance matrix

Σ (7 \times 7)

with

Σ_{j k} = ρ^{| j - k |}

, where

Σ_{j k}

is the

{(j, k)}^{t h}

entry of

Σ

. We consider three scenarios for

ρ

: (i)

ρ = 0.1

, (ii)

ρ = 0.5

, and (iii)

ρ = 0.8

, which represent, respectively, weak, mild, and strong dependence among them. The values of

β

and

ψ

are taken as

{(1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0)}^{T}

and

{(1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0)}^{T}

, respectively, and the sample size is taken as

n = 1000

. The remainder of the model is set up the same as before. We implement MCMC sampling and collect 3000 observations after removing the first 2000 observations for posterior inference. We follow [43] and treat a regression coefficient as zero if the absolute value of its estimate is less than 0.1. Table 3 gives the summary of variable selection across 100 replications.
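The correlated design and the 0.1 cutoff can be sketched as follows. The helper names `ar1_cov` and `select` are hypothetical; the covariance structure and the cutoff follow the description above.

```python
import numpy as np

def ar1_cov(dim, rho):
    """Covariance matrix with (j, k) entry rho ** |j - k|."""
    idx = np.arange(dim)
    return rho ** np.abs(idx[:, None] - idx[None, :])

def select(estimate, cutoff=0.1):
    """A coefficient is kept if the absolute value of its estimate reaches the cutoff."""
    return np.abs(np.asarray(estimate, dtype=float)) >= cutoff

Sigma = ar1_cov(7, 0.5)                       # scenario (ii): rho = 0.5
rng = np.random.default_rng(7)
# Five covariates and two latent factors drawn jointly, n = 1000.
w = rng.multivariate_normal(np.zeros(7), Sigma, size=1000)

kept = select([1.02, 0.03, -0.40])            # toy estimates
```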

Based on Table 3, it can be found that (i) for nonzero regression coefficients, the two methods exhibit satisfactory performances, with both having 100% correct rates across all situations; (ii) for zero regression coefficients, there are differences between the two methods, and SS uniformly outperforms BaLsso. A possible reason is that, for SS, the variances of estimates are set to be small enough to ensure that the coefficients are close to zero, while for BaLsso, the variances of estimates are controlled by the shrinkage parameters, which may not be large enough to ensure this; (iii) as the strength of dependence increases, the correct rates of the two methods decrease. As an illustration, Figure 3 gives the plots of the correct rates of the selected variables for the three scenarios; the values on the x-axis are the true values of

β_{1}

to

ψ_{7}

in the order given in Table 3.

5. Chinese Household Financial Survey Data

To demonstrate the usefulness of the proposed methodology, in this section, a small portion of Chinese household finance debt data is analyzed. The dataset is collected from the Chinese Household Financial Survey (CHFS), a non-profit survey project organized by the Southwestern University of Finance and Economics in Chengdu, Sichuan Province, China. The survey covers a series of questions that touch on various aspects of a household’s financial situation. In this study, we focus only on the measurement ‘gross debts per household (DEB)’: the amount of secured debt and unsecured debt for the household under investigation. We extracted the data from a survey of Zhejiang Province in 2013. Due to some uncertain factors, some measurements of DEB are missing. The missing proportion is about 2.7%. We remove the subjects with missing entries, and the ultimate sample size is 884. A preliminary data analysis shows that the DEB measurements contain excessive zeros, and the proportion of zeros is about 72.58%. Naturally, we treat this variable as the outcome variable

s_{i}

and identify it with

u_{i}

and

z_{i}

. Figure 4 presents the histogram of DEB as well as that of the logarithms of its positive values. It can be seen clearly that the data exhibit strong heterogeneity. The skewness and kurtosis of DEB are 1.1042 and 2.3361, respectively, which indicates that a single parametric model for DEB may be inappropriate.

We include the following measurements as the potential explanatory factors to interpret the variability of DEB: gender (

x_{1}

), age (

x_{2}

), marital status (

x_{3}

), health condition (

x_{4}

), educational experience (

x_{5}

), employment status of the household head (

x_{6}

), the number of family members (aged over 16,

x_{7}

), and the household annual income (

x_{8}

). Table 4 gives the descriptive summary of these measurements. To unify the scale, all covariates were standardized.

Besides the observed factors mentioned above, we also include family culture

η

as a latent factor into the current analysis. It is well-known that China is an ancient civilization with a long history, and Confucian culture is deeply rooted in social development. Economic activity or social development cannot be independent of cultural development. Hence, it is of practical interest to investigate how family culture affects behavior related to household finance debt. Based on the design of the questionnaire, we select the following three measurements as manifestations for

η

: (i) The preference for boys (BP,

y_{1}

: this is a three-category measurement coded as 0, 1, and 2, which correspond to the attitudes ‘opposed’, ‘doesn’t matter’, and ‘strongly support’; (ii) The attitude toward having a single child (SC), coded as 0, 1, and 2 according to the level of support; (iii) The importance of family in one’s life: this measurement was originally based on a six-point scale (0 to 5) according to the support level. However, because the frequencies in the last three groups are small, we merged the groups into three categories and recoded them as 0 (does not matter), 1 (important), and 2 (very important). In addition, as some manifestations are missing, we treat the missing data as missing at random and ignorable [50] and do not model the specific mechanism that results in the missing data.

Let

U = {u_{i}}

,

Z = {z_{i}}

, and

Y = {Y_{o b s}, Y_{m i s}}

, where

Y_{o b s}

is the collection of observed data and

Y_{m i s}

is the set of missing data. We formulate

U

,

Z

, and

Y

within Equations (1), (2), and (6) and assume that

η_{i} \overset{i.i.d.}{\sim} N (0, 1)

. The inputs of the hyperparameters in the priors are taken as follows:

Λ_{j 0} = 0.0

,

H_{j 0} = 1

, and

η_{j 1} = η_{j 2} = η_{j 3} = 2.0

. The values of other hyperparameters are taken to be the same as those in the simulation study. To implement the MCMC sampling algorithm, we need to impute the missing data in

Y

. This is done by drawing

y_{i j, m i s}

from the conditional distribution

p (y_{i j, m i s} | θ, Y_{o b s}) = N (μ_{j, m i s} + Λ_{j, m i s} η_{i}, 1)

, where

μ_{j, m i s}

and

Λ_{j, m i s}

are the components of

μ

and

Λ

, respectively, that correspond to the missing entries

y_{i j, m i s}

in

y_{i}

. In addition, to identify the model and scale the factor, we set

Λ_{1} = 1

. We also adopt the method in [51] in the context of a latent variable model with polytomous data and fix

δ_{j 1}

at

Φ^{- 1} (f_{j 1} / n_{j})

, where

n_{j}

is the size of

y_{o b s, i j}

that is equal to 1, and

f_{j 1}

is the observed frequency of 0 in

y_{o b s, i j}

. To assess the convergence of the algorithm, for SS, we plot the traces of estimates under three different initial values (see Figure 5). It can be seen that the algorithm converges at about 3000 iterations. To be conservative, we collect 6000 observations after deleting the initial 4000 observations for calculating the estimates and their standard deviations.

Table 5 summarizes the SS and BaLsso estimates of the unknown parameters in the two-part model and the factor analysis model. Examination of Table 5 shows that most of the estimates under the two methods are very close, but there exist differences in the estimates of

β_{4}

,

β_{5}

,

β_{7}

,

β_{8}

,

ψ_{2}

,

ψ_{7}

, and

ψ_{8}

. For example, the estimates of

β_{4}

,

β_{5}

, and

β_{7}

under SS are

0.428

,

0.577

, and

0.747

with standard deviations of

0.062

,

0.070

, and

0.072

, respectively, while they equal

0.072

,

0.082

, and

0.092

with standard deviations of

0.07

,

0.081

, and

0.092

under BaLsso. These differences reflect the fact that the two methods impose different penalties on the regression coefficients during variable selection.

To show the differences more clearly, Table 6 gives the variables selected by SS and BaLsso. It can be seen that (i) for Part One, both methods give the same results for the selection of the factors ‘gender’, ‘age’, ‘marital status’, ‘employment’, ‘number of adults’, and ‘family culture’. The two methods favor ‘age’, ‘marital status’, and ‘number of adults’ as helpful for improving model fit, while ‘gender’ and ‘family culture’ have less influence on the probability of holding household finance debt. However, the two methods reach contradictory conclusions when selecting ‘health condition’, ‘education’, and ‘income’. (ii) For Part Two, except for the factors ‘age’ and ‘number of adults’, the two methods give the same results. In particular, both methods support that ‘family culture’ is relevant to the amount of household finance debt being held. This fact is also revealed by [18] in an analysis of the CHFS using a two-part nonlinear latent variable model. Further interpretation is omitted in order to save space.

6. Discussion

A two-part latent variable model can be considered an extension of the traditional two-part model to situations where latent variables are included to capture the unobserved heterogeneity of a population that results from the absence of observed covariates. When analyzing such a model, an important issue is to determine which factors are relevant to the outcome variable. This is especially true when the number of exogenous factors is high, because the usual model selection/comparison procedures are extremely time-consuming. In this paper, we resort to a Bayesian variable selection method and develop a fully Bayesian variable selection procedure for semi-continuous data. Our formulation is along the lines of a spike and slab bimodal prior and recasts the distribution of regression coefficients and factor loadings as a hierarchy of priors over the parameter and model space. The selected variables are identified based on having a high posterior probability of occurrence. We also consider an adaptive Bayesian lasso (BaLsso) for reference. To facilitate computation, we recast the logistic regression model in Part One as a normal mixture model by introducing latent Pólya–Gamma variables. This yields conjugate full conditional distributions for all regression coefficients, factor loadings, and factor variables.

Although Bayesian variable selection has its unique advantages, there are still some limitations that need to be considered with care. Firstly, its computational complexity is high. The Bayesian SS method requires Monte Carlo sampling to estimate the posterior distribution, which can lead to slow computation, especially when working with high-dimensional datasets. Secondly, the method is sensitive to the hyperparameters and to the distributional assumptions on the data. The choice of the hyperparameters of the prior distribution, such as the ratio of spike to slab and the lasso penalty parameters, together with the distributional assumptions, can have a great impact on the results. When the data do not conform to the model’s assumptions, the performance of the method is poor. Therefore, these issues need to be carefully considered in real applications to ensure that the Bayesian SS method can be effectively applied to the specific dataset.

The proposed method can be applied to more general latent variable models, including multilevel SEMs [51] and longitudinal dynamic variable models with discrete variables [17,52]. These extensions are left for further study.

Author Contributions

Conceptualization, Q.Z.; methodology, Q.Z., Y.Z. and Y.X.; software, Q.Z. and Y.Z.; validation, Q.Z., Y.Z. and Y.X.; formal analysis, Q.Z., Y.Z. and Y.X.; investigation, Q.Z., Y.Z. and Y.X.; resources, Q.Z. and Y.X.; writing—original draft preparation, Y.Z. and Y.X.; writing—review and editing, Q.Z. and Y.X.; visualization, Y.Z.; supervision, Y.X.; project administration, Q.Z., Y.Z. and Y.X.; funding acquisition, Y.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (NNSF 11471161) and the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province (15KJB110010).

Data Availability Statement

All the data included in this study are available upon request from the corresponding author.

Acknowledgments

The authors thank the editor and the two reviewers for many insightful comments and suggestions. The authors thank Xin-Yuan Song, Department of Statistics, The Chinese University of Hong Kong, Hong Kong, for providing us with CHFS data.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

TPM	Two-part model
TPLVM	Two-part latent variable model
SS	Spike and slab bimodal prior
BaLsso	Bayesian lasso
MCMC	Markov chain Monte Carlo
CHFS	Chinese Household Financial Survey

Appendix A

In this section, we present some technical details of the full conditionals for MCMC sampling. For ease of exposition, for any scalar or vector x, we use

p (x | \dots)

to denote the conditional distribution of x given ‘⋯’. Note that under the scenarios SS and BaLsso, the full conditionals of

Ω

,

U^{*}

,

Y^{*}

, and

θ

are exactly the same. The following derivations are mainly based on Bayes’ theorem.

1.: Full conditional of $p (Ω | \dots)$

It follows from Equations (2), (6), and (9) that

\begin{matrix} p (Ω | \dots) = \prod_{i = 1}^{n} p (ω_{i} | \dots), \end{matrix}

where

\begin{matrix} p (ω_{i} | \dots) \propto p (u_{i}, u_{i}^{*} | ω_{i}, α, β) p (z_{i} | u_{i}, ω_{i}, γ, ψ, σ^{2}) p (y_{i}^{*} | ω_{i}, μ, Λ) p (ω_{i} | Φ) . \end{matrix}

Let

κ_{i}^{*} = κ_{i} - u_{i}^{*} (α + x_{i}^{T} β_{x})

and

z_{i}^{*} = z_{i} - γ - x_{i}^{T} ψ_{x}

. By some algebra, it can be shown that

\begin{matrix} p (ω_{i} | \dots) \overset{D}{=} N_{m} ({\hat{μ}}_{ω i}, {\hat{Σ}}_{ω i}), \end{matrix}

(A1)

where

\begin{matrix} {\hat{μ}}_{ω i} = {\hat{Σ}}_{ω i} [β_{ω} κ_{i}^{*} + ψ_{ω} u_{i} z_{i}^{*} / σ^{2} + Λ^{T} (y_{i}^{*} - μ)], \\ {\hat{Σ}}_{ω i} = {[β_{ω} β_{ω}^{T} u_{i}^{*} + ψ_{ω} ψ_{ω}^{T} u_{i} / σ^{2} + Λ^{T} Λ + Φ^{- 1}]}^{- 1} . \end{matrix}

Hence, the drawing of

Ω

can be obtained by simulating

ω_{i}

independently from the normal distribution (A1).
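A single draw from (A1) can be sketched as below. The function and argument names are illustrative; `beta_w` and `psi_w` stand for the sub-vectors of β and ψ that multiply the factors, and `kappa_star` and `z_star` are the adjusted quantities defined above. When u_i = 0, the intensity term drops out, so the caller is assumed to pass `z_star=0.0` in that case.

```python
import numpy as np

def draw_omega_i(rng, kappa_star, u_star, u, z_star, y_star, mu,
                 beta_w, psi_w, sigma2, Lam, Phi):
    """One draw of omega_i from the normal full conditional (A1)."""
    # Posterior precision: logistic part, intensity part, measurement part, prior.
    prec = (u_star * np.outer(beta_w, beta_w)
            + u * np.outer(psi_w, psi_w) / sigma2
            + Lam.T @ Lam
            + np.linalg.inv(Phi))
    cov = np.linalg.inv(prec)
    lin = beta_w * kappa_star + psi_w * u * z_star / sigma2 + Lam.T @ (y_star - mu)
    mean = cov @ lin
    return rng.multivariate_normal(mean, cov), mean, cov

rng = np.random.default_rng(5)
Lam = np.array([[1.0, 0.0], [0.8, 0.0], [0.8, 0.0],
                [0.0, 1.0], [0.0, 0.8], [0.0, 0.8]])
Phi = np.array([[1.0, 0.3], [0.3, 1.0]])
# Toy inputs for one subject; all numerical values are made up.
omega_i, mean_i, cov_i = draw_omega_i(
    rng, kappa_star=0.2, u_star=0.25, u=1, z_star=0.4,
    y_star=np.full(6, 1.0), mu=np.full(6, 0.7),
    beta_w=np.array([0.0, 0.8]), psi_w=np.array([0.8, 0.0]),
    sigma2=1.0, Lam=Lam, Phi=Phi)
```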

2.: Full conditional of $p (U^{*} | \dots)$

Following a similar derivation to that in [42], it can be shown that, given

U

,

Ω

, and

θ

,

U^{*}

follows a Pólya–Gamma distribution obtained through exponential tilting of the standard Pólya–Gamma density, given by

\begin{matrix} p (U^{*} ∣ \dots) = \prod_{i = 1}^{n} P G (u_{i}^{*} | 1, η_{i}) \end{matrix}

(A2)

where

η_{i} = α + β^{T} w_{i}

. Drawing

u_{i}^{*}

from this distribution can be achieved via a rejection sampling method; see [42] or [53] for more details about this issue.
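For illustration only, a Pólya–Gamma variate can also be approximated by truncating its infinite sum-of-gammas representation. This is not the rejection sampler referred to above but a simpler swapped-in approximation, and the function name and the truncation level `n_terms` are assumptions.

```python
import numpy as np

def draw_pg_approx(rng, b, c, size, n_terms=200):
    """Approximate PG(b, c) draws via a truncated sum of gamma variables.

    Uses the representation
        PG(b, c) = (1 / (2 pi^2)) * sum_k g_k / ((k - 1/2)^2 + c^2 / (4 pi^2)),
    with g_k ~ Gamma(b, 1) independent, truncated after n_terms terms.
    """
    k = np.arange(1, n_terms + 1)
    denom = (k - 0.5) ** 2 + (c / (2.0 * np.pi)) ** 2
    g = rng.gamma(shape=b, scale=1.0, size=(size, n_terms))
    return (g / denom).sum(axis=1) / (2.0 * np.pi ** 2)

rng = np.random.default_rng(9)
pg_draws = draw_pg_approx(rng, b=1.0, c=1.0, size=20000)
pg_mean_exact = np.tanh(0.5) / 2.0     # E[PG(1, c)] = tanh(c / 2) / (2 c) at c = 1
```

The truncation slightly underestimates the mean, so an exact sampler is preferable in a production implementation.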

3.: Full conditional of $p (Y^{*} | \dots)$

Note that

\begin{matrix} p (Y^{*} | \dots) & \propto p (Y | Y^{*}, δ) p (Y^{*} | Ω, μ, Λ) \\ & = \prod_{i = 1}^{n} \prod_{k = 1}^{p} (\sum_{ℓ = 0}^{c} I \{y_{i k} = ℓ, δ_{k ℓ} < y_{i k}^{*} \leq δ_{k, ℓ + 1}\}) \times \frac{1}{\sqrt{2 π}} \exp \{- \frac{1}{2} {(y_{i k}^{*} - μ_{k} - Λ_{k}^{T} ω_{i})}^{2}\} . \end{matrix}

Hence, given

Ω

, the full conditional of

Y^{*}

only depends on

μ

,

Λ

,

Y

, and

Ω

and is given by

\begin{matrix} p (Y^{*} | \dots) = \prod_{i = 1}^{n} \prod_{k = 1}^{p} p (y_{i k}^{*} | ω_{i}, θ, y_{i k}), \\ p (y_{i k}^{*} | ω_{i}, θ, y_{i k}) = N (μ_{k} + Λ_{k}^{T} ω_{i}, 1) I {δ_{k, y_{i k}} < y_{i k}^{*} \leq δ_{k, y_{i k} + 1}} . \end{matrix}

(A3)

This is the truncated normal distribution, and its drawing can be obtained via an inverse distribution sampling method; see, for example, [54].
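The inverse distribution method for (A3) can be sketched with the standard library alone. The function name is hypothetical; the variance is fixed at one as in (A3), and the clipping constant is a numerical safeguard introduced here, not part of the original algorithm.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()

def draw_truncated_normal(mean, lower, upper, rng=random):
    """Draw from N(mean, 1) restricted to (lower, upper] by inverting the CDF."""
    a = STD_NORMAL.cdf(lower - mean)          # CDF mass below the lower bound
    b = STD_NORMAL.cdf(upper - mean)          # CDF mass below the upper bound
    p = a + rng.random() * (b - a)            # uniform on [a, b)
    p = min(max(p, 1e-12), 1.0 - 1e-12)       # keep inv_cdf away from 0 and 1
    return mean + STD_NORMAL.inv_cdf(p)

rng = random.Random(0)
# Latent response for a subject with y_ik = 1, thresholds (0.0, 1.2], and mean 0.9.
tn_draws = [draw_truncated_normal(0.9, 0.0, 1.2, rng) for _ in range(1000)]
# Lowest category: the lower bound is minus infinity.
tn_low = draw_truncated_normal(0.9, float("-inf"), -1.5, rng)
```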

4.: Full conditional of $p (θ | \dots)$

Recall that

θ

consists of

α

,

β

,

γ

,

ψ

,

σ^{2}

,

μ

,

Λ

,

Φ

, and

δ

. Hence, the drawing of

θ

can be accomplished by (i) drawing

α

from

p (α | \dots)

, (ii) drawing

β

from

p (β | \dots)

, (iii) drawing

γ

from

p (γ | \dots)

, (iv) drawing

(ψ, σ^{2})

from

p (ψ, σ^{2} | \dots)

, (v) drawing

μ

from

p (μ | \dots)

, (vi) drawing

Λ

from

p (Λ | \dots)

, (vii) drawing

Φ

from

p (Φ | \dots)

, and (viii) drawing

δ

from

p (δ | \dots)

sequentially. Note that given

U^{*}

,

Y^{*}

, and

Ω

, the models (2), (6), and (9) reduce to ordinary regression models; hence, most of the full conditionals, like those of the regression coefficients and variance/covariance parameters in Bayesian regression analysis, are standard distributions such as normal, gamma, inverse gamma, and Wishart distributions. As a matter of fact, by some tedious but straightforward calculations, it can be shown that

\begin{matrix} p (α | \dots) = N ({\hat{μ}}_{α}, {\hat{σ}}_{α}^{2}), \quad p (β | \dots) = N_{q} ({\hat{μ}}_{β}, {\hat{Σ}}_{β}), \end{matrix}

(A4)

\begin{matrix} p (γ | \dots) = N ({\hat{μ}}_{γ}, {\hat{σ}}_{γ}^{2}), p (ψ, σ^{2} | \dots) = I G ({\hat{α}}_{σ}, {\hat{β}}_{σ}) \times N_{q} ({\hat{μ}}_{ψ}, σ^{2} {\hat{Σ}}_{ψ}), \end{matrix}

(A5)

\begin{matrix} p (μ | Ω, Λ, Y^{*}) = N_{p} ({\hat{m}}_{μ}, {\hat{Σ}}_{μ}), p (Λ | \dots) = \prod_{k = 1}^{p} p (Λ_{k} | \dots) = \prod_{k = 1}^{p} N_{m} ({\hat{Λ}}_{k}, {\hat{H}}_{k}), \end{matrix}

(A6)

\begin{matrix} p (Φ^{- 1} | \dots) = W_{m} (ρ_{0} + n, \hat{R}), \end{matrix}

(A7)

in which

\begin{matrix} {\hat{μ}}_{α} = {\hat{σ}}_{α}^{2} \sum_{i = 1}^{n} (κ_{i} - u_{i}^{*} β^{T} w_{i}), \quad {\hat{σ}}_{α}^{2} = {(\sum_{i = 1}^{n} u_{i}^{*} + σ_{α 0}^{- 2})}^{- 1}, \\ {\hat{μ}}_{β} = {\hat{Σ}}_{β} \sum_{i = 1}^{n} w_{i} (κ_{i} - α u_{i}^{*}), \quad {\hat{Σ}}_{β}^{- 1} = \sum_{i = 1}^{n} u_{i}^{*} w_{i} w_{i}^{T} + diag \{γ_{β}^{- 2}\}, \\ {\hat{μ}}_{γ} = {\hat{σ}}_{γ}^{2} \sum_{i = 1}^{n} u_{i} (z_{i} - ψ^{T} w_{i}) / σ^{2}, \quad {\hat{σ}}_{γ}^{2} = {(\sum_{i = 1}^{n} u_{i} / σ^{2} + σ_{γ 0}^{- 2})}^{- 1}, \\ {\hat{μ}}_{ψ} = {\hat{Σ}}_{ψ} \sum_{i = 1}^{n} w_{i} (z_{i} - γ) u_{i} / σ^{2}, \quad {\hat{Σ}}_{ψ}^{- 1} = \sum_{i = 1}^{n} u_{i} w_{i} w_{i}^{T} + diag \{γ_{ψ}^{- 2}\}, \\ {\hat{α}}_{σ} = a_{0} + | I | / 2, \\ {\hat{β}}_{σ} = b_{0} + \frac{1}{2} (\sum_{i = 1}^{n} u_{i} z_{i}^{2} - {\hat{μ}}_{ψ}^{T} {\hat{Σ}}_{ψ}^{- 1} {\hat{μ}}_{ψ} + Λ_{0 k}^{T} H_{0 k}^{- 1} Λ_{0 k}), \\ {\hat{m}}_{μ} = {\hat{Σ}}_{μ} (Σ_{0}^{- 1} μ_{0} + n ({\bar{Y}}^{*} - Λ \bar{Ω})), \quad {\hat{Σ}}_{μ}^{- 1} = n I_{p} + Σ_{0}^{- 1}, \\ {\hat{Λ}}_{k} = {\hat{H}}_{k} (H_{0 k}^{- 1} Λ_{0 k} + Ω^{T} Y_{[k]}^{* *}), \quad {\hat{H}}_{k}^{- 1} = H_{0 k}^{- 1} + Ω^{T} Ω, \\ {\hat{R}}^{- 1} = R_{0}^{- 1} + Ω^{T} Ω, \end{matrix}

where

Y^{* *}

is the

n \times p

matrix with

i^{t h}

row

y_{i}^{* T} - μ^{T}

,

Y_{[k]}^{* *}

is the

k^{t h}

column of

Y^{* *}

, and

Ω

is the

n \times m

matrix with

i^{t h}

row

ω_{i}

;

{\bar{Y}}^{*} = \sum_{i = 1}^{n} y_{i}^{*} / n

,

\bar{Ω} = \sum_{i = 1}^{n} ω_{i} / n

are the sample means of

Y^{*}

and

Ω

, respectively, and

| I |

denotes the size of

I = {u_{i} = 1}

.

However, for

δ

, we note that

\begin{matrix} p (δ | \dots) = \prod_{k = 1}^{p} p (δ_{k} | Y_{[k]}^{*}, Y_{[k]}), and \\ p (δ_{k} | Y_{[k]}^{*}, Y_{[k]}) \propto p (δ_{k}) \prod_{i = 1}^{n} \prod_{ℓ = 0}^{c} I {y_{i k} = ℓ, δ_{k ℓ} < y_{i k}^{*} \leq δ_{k, ℓ + 1}} . \end{matrix}

Hence, drawing of

δ

can be obtained by drawing

δ_{k}

from

p (δ_{k} | \dots)

independently. Moreover, under prior (25), it can be shown that

\begin{matrix} p (δ_{k ℓ} | δ_{k, (- ℓ)}, Y_{[k]}^{*}, Y_{[k]}) \propto p (δ_{k ℓ}, δ_{k, (- ℓ)}) I \{\max_{y_{i k} = ℓ - 1} \{y_{i k}^{*}\} \leq δ_{k ℓ} < \min_{y_{i k} = ℓ} \{y_{i k}^{*}\}\}, \end{matrix}

where

δ_{k, (- ℓ)}

is the vector of

δ_{k}

with

δ_{k ℓ}

removed. Let

h_{k, ℓ} = max {δ_{k, ℓ - 1}, {max}_{y_{i k} = ℓ - 1} {y_{i k}^{*}}}

,

g_{k, ℓ} = min {δ_{k, ℓ + 1}, {min}_{y_{i k} = ℓ} {y_{i k}^{*}}}

. It follows from (26) that

\begin{matrix} \frac{F_{0} (δ_{k ℓ}) - F_{0} (δ_{k, ℓ - 1})}{F_{0} (δ_{k, ℓ + 1}) - F_{0} (δ_{k, ℓ - 1})} | δ_{k, - ℓ}, Y_{[k]}^{*}, Y_{[k]} \sim B e t a (η_{k, ℓ}, η_{k, ℓ + 1}) I {(s_{k, ℓ}, t_{k, ℓ})}, \end{matrix}

(A8)

where

\begin{matrix} s_{k, ℓ} = \frac{F_{0} (h_{k, ℓ}) - F_{0} (δ_{k, ℓ - 1})}{F_{0} (δ_{k, ℓ + 1}) - F_{0} (δ_{k, ℓ - 1})}, t_{k, ℓ} = \frac{F_{0} (g_{k, ℓ}) - F_{0} (δ_{k, ℓ - 1})}{F_{0} (δ_{k, ℓ + 1}) - F_{0} (δ_{k, ℓ - 1})} . \end{matrix}

As a result, we can draw

δ_{k ℓ}

by first generating a

δ_{k ℓ}^{*}

from the truncated beta distribution (A8) and then transform it to

δ_{k ℓ}

via an inverse transformation by setting

F_{0}^{- 1} (δ_{k ℓ}^{*} [F_{0} (δ_{k, ℓ + 1}) - F_{0} (δ_{k, ℓ - 1})] + F_{0} (δ_{k, ℓ - 1}))

. A drawing of the truncated beta distribution can be obtained by implementing an inverse distribution sampling method.

5.: Full conditional of $p (Q^{*} | \dots)$

First of all, it is noted that

Q^{*}

consists of

F_{β}

,

F_{ψ}

,

w_{β}

,

w_{ψ}

,

η_{β}^{- 2}

, and

η_{ψ}^{- 2}

under SS, and it is formed by

γ_{β}^{2}

,

γ_{ψ}^{2}

,

λ_{β}^{2}

, and

λ_{ψ}^{2}

under BaLsso. Similar to that of

θ

, we update

Q^{*}

by drawing observations from their full conditionals per component sequentially.

Firstly, it is noted that

\begin{matrix} p (F_{β} | \dots) \propto \prod_{k = 1}^{q} p (β_{k} | f_{β k}, η_{β_{k}}^{2}) p (f_{β_{k}} | w_{β}), \\ p (F_{ψ} | \dots) \propto \prod_{k = 1}^{q} p (ψ_{k} | σ^{2}, f_{ψ k}, η_{ψ_{k}}^{2}) p (f_{ψ_{k}} | w_{ψ}), \end{matrix}

which indicates that the components in the posteriors of

F_{β}

and

F_{ψ}

are independent. Further, it follows easily from (13) that

\begin{matrix} p (f_{β k} | w_{β}, η_{β k}^{2}, β_{k}) = (1 - {\hat{q}}_{β k}) δ_{ν_{β 0}} (\cdot) + {\hat{q}}_{β k} δ_{1} (\cdot), \\ p (f_{ψ k} | w_{ψ}, η_{ψ k}^{2}, ψ_{k}) = (1 - {\hat{q}}_{ψ k}) δ_{ν_{ψ 0}} (\cdot) + {\hat{q}}_{ψ k} δ_{1} (\cdot), \end{matrix}

where

\begin{matrix} {\hat{q}}_{β k} = \frac{w_{β} ϕ (β_{k} / η_{β k})}{(1 - w_{β}) ϕ (β_{k} / (\sqrt{ν_{β 0}} η_{β k})) / \sqrt{ν_{β 0}} + w_{β} ϕ (β_{k} / η_{β k})}, \\ {\hat{q}}_{ψ k} = \frac{w_{ψ} ϕ (ψ_{k} / (σ η_{ψ k}))}{(1 - w_{ψ}) ϕ (ψ_{k} / (σ \sqrt{ν_{ψ 0}} η_{ψ k})) / \sqrt{ν_{ψ 0}} + w_{ψ} ϕ (ψ_{k} / (σ η_{ψ k}))}, \end{matrix}

and

ϕ (\cdot)

is the standard normal probability density function.
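The update of a single indicator for β can be sketched as follows. The function names are hypothetical and the numerical inputs are made up; the probability follows the expression for the β indicators given above. For very large coefficients, both densities can underflow, so a log-scale version would be safer in practice.

```python
import numpy as np

def slab_probability(beta_k, eta_k, w, nu0):
    """Posterior probability that f_{beta k} = 1 given the current beta_k."""
    def phi(x):
        return np.exp(-0.5 * x * x) / np.sqrt(2.0 * np.pi)   # standard normal pdf
    slab = w * phi(beta_k / eta_k)
    spike = (1.0 - w) * phi(beta_k / (np.sqrt(nu0) * eta_k)) / np.sqrt(nu0)
    return slab / (spike + slab)

def draw_indicator(rng, beta_k, eta_k, w, nu0):
    """Two-point draw: 1 with probability q_hat, nu0 otherwise."""
    q_hat = slab_probability(beta_k, eta_k, w, nu0)
    return 1.0 if rng.random() < q_hat else nu0

q_large = slab_probability(beta_k=0.7, eta_k=1.0, w=0.5, nu0=0.001)
q_small = slab_probability(beta_k=0.001, eta_k=1.0, w=0.5, nu0=0.001)
f_new = draw_indicator(np.random.default_rng(4), 0.7, 1.0, 0.5, 0.001)
```

A coefficient far from zero is assigned to the slab almost surely, while a coefficient near zero is assigned to the spike with high probability.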

Secondly, it is noted that

\begin{matrix} p (w_{β} | F_{β}) & \propto p (w_{β}) p (F_{β} | w_{β}) = p (w_{β}) \prod_{k = 1}^{q} p (f_{β k} | w_{β}) \\ & = c w_{β}^{a_{β} - 1} {(1 - w_{β})}^{b_{β} - 1} \prod_{k = 1}^{q} w_{β}^{I \{f_{β k} = 1\}} {(1 - w_{β})}^{I \{f_{β k} = ν_{β 0}\}}, \\ p (w_{ψ} | F_{ψ}) & \propto p (w_{ψ}) p (F_{ψ} | w_{ψ}) = p (w_{ψ}) \prod_{k = 1}^{q} p (f_{ψ k} | w_{ψ}) \\ & = c w_{ψ}^{a_{ψ} - 1} {(1 - w_{ψ})}^{b_{ψ} - 1} \prod_{k = 1}^{q} w_{ψ}^{I \{f_{ψ k} = 1\}} {(1 - w_{ψ})}^{I \{f_{ψ k} = ν_{ψ 0}\}} . \end{matrix}

Hence,

\begin{matrix} p (w_{β} | \dots) = B e t a (c_{β 1} + | {f_{β k} = 1} |, c_{β 2} + | {f_{β k} = ν_{β 0}} |), \end{matrix}

(A9)

\begin{matrix} p (w_{ψ} | \dots) = B e t a (c_{ψ 1} + | {f_{ψ k} = 1} |, c_{ψ 2} + | {f_{ψ k} = ν_{ψ 0}} |), \end{matrix}

(A10)

where

| A |

, as before, is the size of set A.

Lastly, it follows from

\begin{matrix} p (η_{β}^{- 2} | F_{β}, β) & \propto p (β | F_{β}, η_{β}^{- 2}) p (η_{β}^{- 2}) \\ & = \prod_{k = 1}^{q} {(η_{β k}^{- 2})}^{1 / 2} \exp \{- \frac{1}{2} η_{β k}^{- 2} β_{k}^{2} / f_{β k}\} {(η_{β k}^{- 2})}^{a_{β 1} - 1} \exp \{- a_{β 2} η_{β k}^{- 2}\}, \\ p (η_{ψ}^{- 2} | F_{ψ}, ψ) & \propto p (ψ | F_{ψ}, η_{ψ}^{- 2}) p (η_{ψ}^{- 2}) \\ & = \prod_{k = 1}^{q} {(η_{ψ k}^{- 2})}^{1 / 2} \exp \{- \frac{1}{2} η_{ψ k}^{- 2} ψ_{k}^{2} / f_{ψ k}\} {(η_{ψ k}^{- 2})}^{a_{ψ 1} - 1} \exp \{- a_{ψ 2} η_{ψ k}^{- 2}\} \end{matrix}

that

\begin{matrix} p (η_{β}^{- 2} | F_{β}, β) = \prod_{k = 1}^{q} p (η_{β k}^{- 2} | f_{β k}, β_{k}) = \prod_{k = 1}^{q} G a ({\hat{τ}}_{β k}, {\hat{ζ}}_{β k}), \\ p (η_{ψ}^{- 2} | F_{ψ}, ψ) = \prod_{k = 1}^{q} p (η_{ψ k}^{- 2} | f_{ψ k}, ψ_{k}) = \prod_{k = 1}^{q} G a ({\hat{τ}}_{ψ k}, {\hat{ζ}}_{ψ k}), \end{matrix}

where

\begin{matrix} {\hat{τ}}_{β k} = τ_{β 0} + 1 / 2, \quad {\hat{ζ}}_{β k} = ζ_{β 0} + β_{k}^{2} / (2 f_{β k}), \\ {\hat{τ}}_{ψ k} = τ_{ψ 0} + 1 / 2, \quad {\hat{ζ}}_{ψ k} = ζ_{ψ 0} + ψ_{k}^{2} / (2 f_{ψ k}) . \end{matrix}

For BaLsso, we follow the practice in [38] and can show that

\begin{matrix} p (γ_{β}^{- 2} | \dots) = \prod_{k = 1}^{q} p (γ_{β k}^{- 2} | \dots) = \prod_{k = 1}^{q} I G ({\hat{μ}}_{β}, {\hat{λ}}_{β}), \\ p (γ_{ψ}^{- 2} | \dots) = \prod_{k = 1}^{q} p (γ_{ψ k}^{- 2} | \dots) = \prod_{k = 1}^{q} I G ({\hat{μ}}_{ψ}, {\hat{λ}}_{ψ}), \end{matrix}

in which

\begin{matrix} {\hat{μ}}_{β} = \sqrt{{\hat{λ}}_{β} / β_{k}^{2}}, \quad {\hat{λ}}_{β} = λ_{β k}^{2}, \\ {\hat{μ}}_{ψ} = \sqrt{σ^{2} {\hat{λ}}_{ψ} / ψ_{k}^{2}}, \quad {\hat{λ}}_{ψ} = λ_{ψ k}^{2}, \end{matrix}

where

I G (μ, λ)

is an inverse Gaussian distribution with density

\sqrt{λ / (2 π)} x^{- 3 / 2} exp {- λ {(x - μ)}^{2} / (2 μ^{2} x)} (x > 0)

[55].
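The inverse Gaussian density above has the same form as NumPy's Wald distribution, with `mean` playing the role of μ and `scale` the role of λ, so one such update can be sketched as below. The numerical values of the shrinkage parameter and the coefficient are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(11)
lam2 = 2.0      # hypothetical current value of lambda_{beta k}^2
beta_k = 0.5    # hypothetical current value of beta_k

mu_hat = np.sqrt(lam2 / beta_k ** 2)     # mean parameter of the inverse Gaussian
# Draws of gamma_{beta k}^{-2}; many draws are taken here only to check the mean.
inv_gamma2 = rng.wald(mean=mu_hat, scale=lam2, size=50000)
gamma2 = 1.0 / inv_gamma2                # implied draws of gamma_{beta k}^2
```

Within a Gibbs sweep, only one draw per coefficient would be taken at each iteration.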

Similarly,

\begin{matrix} p (λ_{β}^{2} | \dots) = \prod_{k = 1}^{q} p (λ_{β k}^{2} | \dots) = \prod_{k = 1}^{q} G a ({\hat{a}}_{β k}, {\hat{b}}_{β k}), \\ p (λ_{ψ}^{2} | \dots) = \prod_{k = 1}^{q} p (λ_{ψ k}^{2} | \dots) = \prod_{k = 1}^{q} G a ({\hat{c}}_{ψ k}, {\hat{d}}_{ψ k}), \end{matrix}

in which

\begin{matrix} {\hat{a}}_{β k} = a_{k 0} + 1.0, {\hat{b}}_{β k} = b_{k 0} + 0.5 γ_{β k}^{2}, \\ {\hat{c}}_{ψ k} = c_{k 0} + 1.0, {\hat{d}}_{ψ k} = d_{k 0} + 0.5 γ_{ψ k}^{2} . \end{matrix}

References

1. Deb, P.; Munkin, M.K.; Trivedi, P.K. Bayesian analysis of the two-part model with endogeneity: Application to health care expenditure. J. Appl. Econ. 2006, 21, 1081–1099.
2. Cragg, J.G. Some statistical models for limited dependent variables with application to the demand for durable goods. Econometrica 1971, 39, 829–844.
3. Neelon, B.; Zhu, L.; Neelon, S.E.B. Bayesian two-part spatial models for semicontinuous data with application to emergency department expenditures. Biostatistics 2015, 16, 465–479.
4. Manning, W.G.; Morris, C.N.; Newhouse, J.P.; Orr, L.L.; Duan, N.; Keeler, E.B.; Leibowitz, A. A two-part model of the demand for medical care: Preliminary results from the health insurance experiment. In Health, Economics, and Health Economics; van der Gaag, J., Perlman, M., Eds.; North-Holland: Amsterdam, The Netherlands, 1991; pp. 103–104.
5. Su, L.; Tom, B.D.; Farewell, V.T. Bias in 2-part mixed models for longitudinal semi-continuous data. Biostatistics 2009, 10, 374–389.
6. Su, L.; Tom, B.D.; Farewell, V.T. A likelihood-based two-part marginal model for longitudinal semi-continuous data. Stat. Methods Med. Res. 2015, 24, 194–205.
7. Duan, N.; Manning, W.G.; Morris, C.N.; Newhouse, J.P. A comparison of alternative models for the demand for medical care. J. Bus. Econ. Stat. 1983, 1, 115–126.
8. Liu, L.; Cowen, M.E.; Strawderman, R.L.; Shih, Y.C.T. A flexible two-part random effects model for correlated medical costs. J. Health Econ. 2010, 29, 110–123.
9. Smith, V.A.; Neelon, B.; Preisser, J.S.; Maciejewski, L. A marginalized two-part model for semicontinuous data. Stat. Med. 2015, 33, 4891–4903.
10. Tooze, J.A.; Grunwald, J.K.; Jones, R.H. Analysis of repeated measures data with clumping at zero. Stat. Methods Med. Res. 2002, 11, 341–355.
11. Brown, R.A.; Monti, P.M.; Myers, M.G.; Martin, R.A.; Rivinus, T.; Dubreuil, M.E.T.; Rohsenow, D.J. Depression among cocaine abusers in treatment: Relation to cocaine and alcohol use and treatment outcome. Am. J. Psychiat. 1998, 155, 220–225.
12. Olsen, M.K.; Schafer, J.L. A two-part random-effects model for semicontinuous longitudinal data. J. Am. Stat. Assoc. 2001, 96, 730–745.
13. Xing, D.Y.; Huang, Y.X.; Chen, H.N.; Zhu, Y.L.; Dagen, G.A.; Baldwin, J. Bayesian inference for two-part mixed effects model using skew distributions, with application to longitudinal semi-continuous alcohol data. Stat. Methods Med. Res. 2017, 26, 1838–1853.
14. Chen, J.Y.; Zheng, L.Y.; Xia, Y.M. Bayesian analysis for two-part latent variable model with application to fractional data. Commun. Stat. Theory Methods 2023, preprint.
15. Kim, Y.; Muthén, B.O. Two-part factor mixture modeling: Application to an aggressive behavior measurement instrument. Struct. Equ. Model. Multidiscip. J. 2009, 16, 602–624.
16. Feng, X.; Lu, B.; Song, X.Y.; Ma, S. Financial literacy and household finances: A Bayesian two-part latent variable modeling approach. J. Empir. Financ. 2019, 51, 119–137.
17. Xia, Y.M.; Tang, N.S. Bayesian analysis for mixture of latent variable hidden Markov models with multivariate longitudinal data. Comput. Stat. Data Anal. 2019, 132, 190–211.
18. Gou, J.W.; Xia, Y.M.; Jiang, D.P. Bayesian analysis of two-part nonlinear latent variable model: Semiparametric method. Stat. Model. 2023, 23, 721–741.
19. Xiong, S.C.; Xia, Y.M.; Lu, B. Bayesian analysis of two-part latent variable model with mixed data. Commun. Math. Stat. 2023, preprint.
20. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 2001, 96, 1348–1360.
21. Fu, W.J. Penalized regression: The bridge versus the lasso. J. Comput. Graph. Stat. 1998, 7, 109–148.
22. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning; Springer: New York, NY, USA, 2009.
23. Hastie, T.; Tibshirani, R.; Wainwright, M. Statistical Learning with Sparsity—The Lasso and Generalization; CRC Press: New York, NY, USA, 2015.
24. Kuo, L.; Mallick, B.K. Variable selection for regression models. Sankhyā Indian J. Stat. Ser. B 1998, 60, 65–81.
25. Tibshirani, R. Regression shrinkage and selection via the Lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 1996, 58, 267–288.
26. Zou, H.; Hastie, T. Regularization and variable selection via the elastic net. J. R. Stat. Soc. Ser. B Stat. Methodol. 2005, 67, 301–320.
27. Zou, H. The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. 2006, 101, 1418–1429.
28. Zhang, W.; Ota, T.; Shridhar, V.; Chien, J.; Wu, B.; Kuang, R. Network based survival analysis reveals subnetwork signatures for predicting outcomes of ovarian cancer treatment. PLoS Comput. Biol. 2013, 9, e1002975.
29. Zhao, Q.; Shi, X.J.; Xie, Y.; Huang, J.; Shia, B.C.; Ma, S. Combining multidimensional genomic measurements for predicting cancer prognosis: Observations from TCGA. Brief. Bioinform. 2015, 16, 291–303.
30. George, E.I.; McCulloch, R.E. Variable selection via Gibbs sampling. J. Am. Stat. Assoc. 1993, 88, 881–889.
31. George, E.I.; McCulloch, R.E. Approaches for Bayesian variable selection. Stat. Sin. 1997, 7, 339–373.
32. Chipman, H.A. Bayesian variable selection with related predictors. Canad. J. Statist. 1996, 24, 17–36.
33. Ishwaran, H.; Rao, J.S. Spike and Slab gene selection for multigroup microarray data. J. Am. Stat. Assoc. 2005, 87, 371–390.
34. Ishwaran, H.; Rao, J.S. Spike and Slab variable selection: Frequentist and Bayesian strategies. Ann. Stat. 2005, 33, 730–773.
35. Mitchell, T.J.; Beauchamp, J.J. Bayesian variable selection in linear regression. J. Am. Stat. Assoc. 1988, 83, 1023–1032.
36. Rockova, V.; George, E.I. EMVS: The EM approach to Bayesian variable selection. J. Am. Stat. Assoc. 2014, 109, 828–846.
37. Tang, Z.X.; Shen, Y.P.; Zhang, X.Y.; Yi, N.J. The Spike-and-Slab Lasso generalized linear models for prediction and associated genes detection. Genetics 2017, 205, 77–88.
38. Park, T.; Casella, G. The Bayesian Lasso. J. Am. Stat. Assoc. 2008, 103, 681–686.
39. Skrondal, A.; Rabe-Hesketh, S. Generalized Latent Variable Modelling: Multilevel, Longitudinal and Structural Equation Models; Chapman & Hall/CRC: London, UK, 2004.
40. Bollen, K.A. Structural Equations with Latent Variables; John Wiley & Sons: New York, NY, USA, 1989.
41. Lee, S.Y. Structural Equation Modeling: A Bayesian Approach; John Wiley & Sons: New York, NY, USA, 2007.
42. Polson, N.G.; Scott, J.G.; Windle, J. Bayesian inference for logistic models using Polya-Gamma latent variables. J. Am. Stat. Assoc. 2013, 108, 1339–1349.
43. Feng, X.; Wang, Y.F.; Lu, B.; Song, X.Y. Bayesian regularized quantile structural equation models. J. Multivar. Anal. 2017, 154, 234–248.
44. Anderson, T.W. An Introduction to Multivariate Statistical Analysis; John Wiley & Sons: New York, NY, USA, 1984.
45. Sha, N.J.; Dechi, B.O. A Bayes inference for ordinal response with latent variable approach. Stats 2019, 2, 321–331.
46. Tanner, M.A.; Wong, W.H. The calculation of posterior distributions by data augmentation (with discussion). J. Am. Stat. Assoc. 1987, 82, 528–550.
47. Gelfand, A.E.; Smith, A.F.M. Sampling-based approaches to calculating marginal densities. J. Am. Stat. Assoc. 1990, 85, 398–409.
48. Geman, S.; Geman, D. Stochastic relaxation, Gibbs distribution, and the Bayesian restoration of images. IEEE Trans. Pattern Anal. Mach. Intell. 1984, PAMI-6, 721–741.
49. Gelman, A.; Rubin, D.B. Inference from iterative simulation using multiple sequences (with discussion). Stat. Sci. 1992, 7, 457–511.
50. Little, R.J.A.; Rubin, D.B. Statistical Analysis with Missing Data; John Wiley & Sons: New York, NY, USA, 2002.
51. Song, X.Y.; Lee, S.Y. A tutorial on the Bayesian approach for analyzing structural equation models. J. Math. Psychol. 2012, 56, 135–148.
52. Song, X.Y.; Xia, Y.M.; Zhu, H.T. Hidden Markov latent variable models with multivariate longitudinal data. Biometrics 2017, 73, 313–323.
53. Devroye, L. Non-Uniform Random Variate Generation; Springer: New York, NY, USA, 1986.
54. Ross, S.M. A Course in Simulation; MacMillan: New York, NY, USA, 1991.
55. Chhikara, R.S.; Folks, L. The Inverse Gaussian Distribution: Theory, Methodology, and Applications; Marcel Dekker: New York, NY, USA, 1989.

Figure 1. Plot of the densities of the Laplace distribution for different choices of $λ$.

Figure 2. Plot of the EPSR values of the unknown parameters under three different starting values; the colored solid lines represent the trajectories of the EPSR of the estimates against the number of iterations: simulation study with $n = 400$.

Figure 3. Plots of the correct rates of the selected variables under three scenarios: simulation study with $n = 1000$.

Figure 4. Histograms of DEB and of the logarithms of its positive values: Chinese Household Financial Survey data. The left panel corresponds to DEB and the right panel to $log (DEB | DEB > 0)$.

Figure 5. Trace plots of the estimates of the unknown parameters against the number of iterations under the SS prior; the colored solid lines represent the traces of the estimates under three starting values: CHFS data.

Table 1. Summary of the estimates of the unknown parameters under SS and BaLasso: simulation study with $n = 400$.

	SS			BaLasso
PAR	BIAS	RMS	SD	BIAS	RMS	SD
$α = 0.7$	−0.015	0.097	0.129	0.028	0.150	0.134
$β_{1} = 0.7$	−0.056	0.143	0.142	−0.152	0.217	0.136
$β_{2} = 0.0$	−0.001	0.021	0.061	−0.019	0.042	0.079
$β_{3} = 0.7$	−0.144	0.216	0.145	−0.122	0.251	0.148
$β_{4} = 0.0$	0.005	0.030	0.064	−0.008	0.040	0.078
$β_{5} = 0.7$	−0.091	0.147	0.137	−0.045	0.135	0.137
$β_{6} = 0.0$	0.017	0.028	0.075	0.026	0.055	0.096
$β_{7} = 0.8$	−0.187	0.237	0.184	−0.126	0.209	0.184
$γ = 0.7$	0.010	0.079	0.084	0.008	0.063	0.085
$ψ_{1} = 0.7$	−0.035	0.079	0.077	−0.011	0.065	0.074
$ψ_{2} = 0.0$	0.005	0.032	0.051	−0.018	0.031	0.054
$ψ_{3} = 0.7$	−0.007	0.061	0.070	−0.021	0.085	0.069
$ψ_{4} = 0.0$	−0.007	0.029	0.049	−0.003	0.031	0.053
$ψ_{5} = 0.7$	−0.070	0.093	0.077	−0.018	0.082	0.075
$ψ_{6} = 0.8$	−0.040	0.086	0.089	−0.020	0.069	0.088
$ψ_{7} = 0.0$	−0.011	0.033	0.062	0.014	0.036	0.069
$σ^{2} = 1.0$	0.085	0.129	0.117	0.038	0.082	0.111
$λ_{21} = 0.8$	0.042	0.078	0.073	0.058	0.098	0.071
$λ_{31} = 0.8$	0.030	0.072	0.071	0.034	0.063	0.072
$λ_{52} = 0.8$	0.058	0.079	0.072	0.052	0.090	0.073
$λ_{62} = 0.8$	0.031	0.060	0.072	0.037	0.064	0.073
$Φ_{12} = 0.3$	0.014	0.041	0.074	0.018	0.058	0.076
Total	-	1.870	1.975	-	2.016	2.035

Table 2. Summary of the estimates of the unknown parameters under SS and BaLasso: simulation study with $n = 1000$.

	SS			BaLasso
PAR	BIAS	RMS	SD	BIAS	RMS	SD
$α = 0.7$	0.052	0.096	0.087	0.009	0.092	0.087
$β_{1} = 0.7$	0.005	0.069	0.089	0.055	0.117	0.090
$β_{2} = 0.0$	0.003	0.048	0.058	0.032	0.052	0.060
$β_{3} = 0.7$	0.007	0.086	0.093	−0.045	0.076	0.091
$β_{4} = 0.0$	0.004	0.015	0.049	−0.020	0.043	0.060
$β_{5} = 0.7$	0.010	0.071	0.086	0.013	0.074	0.085
$β_{6} = 0.0$	−0.003	0.029	0.059	0.032	0.064	0.077
$β_{7} = 0.8$	0.002	0.102	0.120	−0.042	0.108	0.114
$γ = 0.7$	0.017	0.042	0.053	0.030	0.056	0.054
$ψ_{1} = 0.7$	−0.023	0.038	0.046	−0.016	0.039	0.047
$ψ_{2} = 0.0$	−0.007	0.019	0.033	−0.005	0.018	0.037
$ψ_{3} = 0.7$	−0.028	0.060	0.042	−0.014	0.026	0.043
$ψ_{4} = 0.0$	−0.007	0.023	0.033	0.000	0.018	0.036
$ψ_{5} = 0.7$	−0.005	0.035	0.046	0.003	0.043	0.047
$ψ_{6} = 0.8$	−0.031	0.058	0.053	−0.039	0.063	0.054
$ψ_{7} = 0.0$	−0.001	0.031	0.045	−0.025	0.081	0.053
$σ^{2} = 1.0$	0.018	0.049	0.068	0.041	0.053	0.071
$λ_{21} = 0.8$	0.021	0.041	0.045	0.033	0.038	0.045
$λ_{31} = 0.8$	0.016	0.049	0.045	0.028	0.038	0.045
$λ_{52} = 0.8$	0.032	0.049	0.045	0.054	0.057	0.045
$λ_{62} = 0.8$	0.043	0.059	0.046	0.043	0.054	0.046
$Φ_{12} = 0.3$	0.016	0.043	0.049	0.005	0.037	0.048
Total	-	1.112	1.290	-	1.247	1.335

Table 3. Number of correctly selected variables in the two-part model on the simulated datasets.

	SS			BaLasso
PAR	$ρ = 0.1$	$ρ = 0.5$	$ρ = 0.8$	$ρ = 0.1$	$ρ = 0.5$	$ρ = 0.8$
$β_{1} = 1.0$	100	100	100	100	100	100
$β_{2} = 0.0$	98	96	85	88	86	76
$β_{3} = 1.0$	100	100	100	100	100	100
$β_{4} = 0.0$	96	95	86	93	93	85
$β_{5} = 1.0$	100	100	100	100	100	100
$β_{6} = 0.0$	96	94	93	97	92	87
$β_{7} = 1.0$	100	100	100	100	100	100
$ψ_{1} = 1.0$	99	100	100	100	100	100
$ψ_{2} = 0.0$	100	99	95	100	98	93
$ψ_{3} = 1.0$	100	100	100	100	100	100
$ψ_{4} = 0.0$	100	100	97	98	100	91
$ψ_{5} = 1.0$	100	100	100	100	100	100
$ψ_{6} = 1.0$	100	100	100	100	100	100
$ψ_{7} = 0.0$	100	98	97	97	96	96

Table 4. Descriptive statistics of explanatory variables: CHFS data.

Variable	Description	Mean	Max	Min	SD
Gender ( $x_{1}$ )	=1, male; =0, otherwise	0.756	1	0	0.430
Age ( $x_{2}$ )		51.81	91	19	14.931
Marital status ( $x_{3}$ )	=1, married; =0, otherwise	0.863	1	0	0.344
Health condition ( $x_{4}$ )	=1, good; =0, otherwise	0.833	1	0	0.373
Educational experience ( $x_{5}$ )	=1, high school or above; =0, otherwise	0.352	1	0	0.478
Employment ( $x_{6}$ )	=1, yes; =0, otherwise	0.092	1	0	0.290
No. of adults ( $x_{7}$ )		3.002	3	0	1.301
Annual income (CNY) ( $x_{8}$ ) *		${9.376}^{4}$	${8.060}^{5}$	0	${4.249}^{4}$

* Note: Superscripts denote powers of 10 (thus, $a^{b} = a \times 10^{b}$). The measurement is taken as the middle value of the range in the questionnaire.

Table 5. Estimates and standard deviation estimates of the unknown parameters under SS and BaLasso: CHFS data.

	SS		BaLasso		SS		BaLasso
Par	Est.	SD	Est.	SD	Par	Est.	SD	Est.	SD
$α$	−0.835	0.078	−0.838	0.080	$γ$	9.782	0.152	9.670	0.125
$β_{1}$	0.050	0.063	0.076	0.070	$ψ_{1}$	−0.137	0.103	−0.107	0.088
$β_{2}$	−0.750	0.099	−0.757	0.102	$ψ_{2}$	−0.147	0.141	−0.015	0.081
$β_{3}$	0.107	0.085	0.147	0.088	$ψ_{3}$	−0.022	0.065	−0.006	0.075
$β_{4}$	0.428	0.062	0.072	0.070	$ψ_{4}$	−0.019	0.060	−0.029	0.069
$β_{5}$	0.577	0.070	0.082	0.081	$ψ_{5}$	0.259	0.123	0.322	0.107
$β_{6}$	0.004	0.040	0.005	0.052	$ψ_{6}$	0.035	0.058	0.053	0.067
$β_{7}$	0.118	0.079	0.130	0.079	$ψ_{7}$	0.043	0.072	0.281	0.113
$β_{8}$	0.747	0.073	0.092	0.077	$ψ_{8}$	0.384	0.132	0.188	0.118
$β_{η}$	−0.059	0.112	−0.039	0.092	$ψ_{η}$	1.205	0.106	1.910	0.104
$σ^{2}$	0.312	0.150	0.300	0.152
$λ_{21}$	−0.791	0.062	−0.714	0.057
$λ_{31}$	−0.865	0.067	−0.625	0.068

Table 6. The selected variables in the CHFS data—0: excluded and 1: included.

	Part One		Part Two
VAR	SS	BaLasso	SS	BaLasso
Gender	0	0	1	1
Age	1	1	1	0
Marital status	1	1	0	0
Health condition	1	0	0	0
Education	1	0	1	1
Employment	0	0	0	0
No. of adults	1	1	0	1
Income	1	0	1	1
Family culture	0	0	1	1

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
