1. Introduction
A hidden truncation model, also known as a selective reporting model or latent variable model, describes observed variables that are truncated with respect to one or more hidden covariables. In an example given by Arnold and Beaver [1], the observed variable is the waist size for uniforms of elite troops, and the hidden covariable is the height of the troops. Troops are selected only if they meet a specific minimum height requirement, so waist size is observed only for those who meet it. Hidden truncation models have also been used to study personal income data in which an individual's income is not always correctly specified or may be unreported. Most hidden truncation models and their inference have been developed using the classical approach. If the data are subject to hidden truncation, it is natural to expect the behavior of the random phenomenon to differ from that under a non-truncated model. The notion of a mixture model is related to that of a hidden truncation model. Arnold and Gomez [2] showed that the two-parameter skew–normal distribution can be viewed as arising from a standard normal density either by hidden truncation or by an additive component construction, and they conjectured that this apparent relation between hidden truncation and mixture models occurs only in the normal case. However, the two classes of models differ in that a mixture model has no hidden covariable subject to truncation. Sometimes, through a natural mechanism or otherwise, a dataset has already been subject to some type of hidden truncation, such as the Quasar data originally studied by Efron and Petrosian [3].
A non-exhaustive list of references on hidden truncation models is as follows. The seminal paper by Azzalini [4] propelled research in this area. It has been conjectured, with support from real-life scenarios including but not limited to income modeling, astronomical data, and survival analysis, that hidden truncation models provide flexible alternatives to the somewhat constrained elliptically contoured distributions for modeling bivariate, multivariate, or matrix-variate data in higher dimensions. Arnold [5] developed and studied several univariate, bivariate, and multivariate parametric families of distributions based on hidden truncation. Arnold and Beaver [1] suggested that a hidden truncation model involves a large number of model parameters. It also almost always involves a normalizing constant, which makes inference under a frequentist framework virtually impossible. This is mainly because the parameter estimates for such models are not available in closed form, or are at least analytically intractable, and quite often they are subject to constraints. Furthermore, under a frequentist approach, non-normal models involving two-sided truncation require special attention when exploring inferential aspects. Nevertheless, efforts have been made to estimate the parameters of hidden truncation models under the frequentist approach, with some limitations. For example, Ghosh and Nadarajah [6] discussed inference under the classical approach for a hidden truncated Pareto (type-II) model starting from a bivariate Pareto (type-II) model. Hidden truncation in bivariate and multivariate Pareto data has been developed and applied to income modeling (see the references in [7]). Zaninetti [8] found that a left-truncated beta distribution fits the initial mass function for stars better than the lognormal distribution, which has commonly been used in astrophysics. Kotz et al. [9] (Chapter 44) also discussed in detail the formation of distributions through truncation.
Under the Bayesian paradigm, one-sided hidden truncated Pareto (type-II and type-IV) models have been discussed by Ghosh [10,11], but there has been little discussion of the applicability and efficacy of the proposed Bayesian methodology with respect to prior choices, including the choice of hyperparameters. Notably, nonparametric estimation methods for two-sided truncated data have been developed by Efron and Petrosian [3]. This serves as a major motivation for the current research work.
In this paper, we explore the estimation of model parameters for a two-sided hidden truncation model, assuming that the data come from the bivariate exponential distribution defined by Arnold and Strauss [12]. Because the exponential distribution is an important probability model for studies involving non-negative random variables, such as frailty models, we use it to illustrate the proposed Bayesian estimation strategy, in the hope that the technique can be adapted to several other non-normal models. For a detailed study of the genesis of hidden truncated non-normal models, readers may refer to [7] and the references therein. Next, we discuss hidden truncation models and the reason for starting with a bivariate exponential-type model. Hidden truncation involves a normalizing constant, and the truncation parameter(s) are masked by the other parameters in such a way that an estimate is available for a composite function that involves not only the truncation parameter but also the location and scale parameters of the conditioning variable. Moving away from the normal distribution under a hidden truncation model increases the difficulty of model fitting and related statistical inference. In search of a simpler non-normal model involving two-sided hidden truncation, we select the bivariate exponential distribution proposed by Arnold and Strauss [12] for illustrative purposes. Because parameter estimation for a hidden truncation model is challenging, especially when information on the truncation limits is lacking, we aim to provide a feasible approach based on the Bayesian paradigm that can be used in practical situations.
The rest of the paper is organized as follows. Section 2 provides the mathematical derivation of a basic two-component hidden truncation model and of the two-sided hidden truncated bivariate exponential (HTBEXP, in short) model. Section 3 provides the details of the Bayesian inference adopted in this paper for the HTBEXP model developed in Section 2, under both informative priors and non-informative, improper priors. Section 4 uses the Bayes factor for model selection among the candidate models. In Section 5, Monte Carlo simulation studies under the Bayesian paradigm are used to evaluate the performance of the proposed Bayesian estimation method. For illustrative purposes, a real-life dataset is re-analyzed in Section 6 to exhibit the efficacy of the proposed estimation strategy. Finally, some concluding remarks are presented in Section 7.
2. Hidden Truncation in Arnold–Strauss Bivariate Exponential Model
Let $(X, Y)$ be a two-dimensional absolutely continuous random vector. Consider the conditional distribution of $X$ given $Y \in M$, where $M$ is a Borel set in $\mathbb{R}$. In a hidden truncation model, $X$ is the variable of interest, and $Y$ is an unobservable variable that may be related to $X$. Let $f_{X,Y}(x, y)$ be the joint probability density function (PDF) of the random vector $(X, Y)$, and let $f_X(x)$ and $f_Y(y)$ be the marginal PDFs of the random variables $X$ and $Y$, respectively. The conditional PDF of $X$ given $Y \in M$ can be expressed as
$$f_{X \mid Y \in M}(x) = \frac{f_X(x)\, P(Y \in M \mid X = x)}{P(Y \in M)}. \qquad (1)$$
The following three forms of hidden truncation models can be considered:
- (i)
Truncation from below (also known as lower truncation): , where is the lower truncation point;
- (ii)
Truncation from above (also known as upper truncation): , where is the upper truncation point;
- (iii)
Two-sided truncation: , where and are the lower and upper truncation points, respectively, with .
For a two-sided hidden truncation, observations are only available for
X’s whose corresponding concomitant variable
Y is in
, for
. Hence, Equation (
1) reduces to
This type of hidden truncation model is characterized as follows:
The underlying marginal distribution of X can be specified by the PDF ;
The conditional distribution of Y given can be specified by the conditional PDF ;
The specified values of truncation points are denoted by and ;
There could be some other model parameters in addition to and .
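The characterization above can be illustrated with a short numerical sketch. Everything named here is an assumption made for illustration and does not come from the paper: the function `hidden_truncated_pdf`, the argument names `lower` and `upper` for the truncation points, and the bivariate normal example with correlation 0.6. The sketch evaluates the density of X given lower < Y ≤ upper as the marginal PDF of X times the conditional probability of the truncation window, divided by the marginal probability of that window.

```python
import numpy as np
from scipy import integrate, stats


def hidden_truncated_pdf(x, marginal_pdf, cond_cdf, marg_cdf, lower, upper):
    """Density of X given lower < Y <= upper.

    marginal_pdf(x) : marginal PDF of X
    cond_cdf(y, x)  : CDF of Y given X = x
    marg_cdf(y)     : marginal CDF of Y
    """
    window_given_x = cond_cdf(upper, x) - cond_cdf(lower, x)
    window_marginal = marg_cdf(upper) - marg_cdf(lower)
    return marginal_pdf(x) * window_given_x / window_marginal


# Assumed example: standard bivariate normal with correlation RHO,
# so that Y | X = x is normal with mean RHO * x and variance 1 - RHO**2.
RHO = 0.6


def normal_cond_cdf(y, x):
    return stats.norm.cdf(y, loc=RHO * x, scale=np.sqrt(1.0 - RHO**2))


def example_pdf(x, lower=-0.5, upper=1.5):
    """Hidden truncated density of X for the assumed normal example."""
    return hidden_truncated_pdf(
        x, stats.norm.pdf, normal_cond_cdf, stats.norm.cdf, lower, upper
    )
```

Setting `lower` to minus infinity and `upper` to plus infinity recovers the unconditional marginal PDF of X, which matches the limiting behavior described in the text.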
For the conditional PDF in Equation (
2), if we consider
and
, the PDF reduces to the unconditional marginal PDF of
X. The shape of the conditional PDF will be more sensitive for smaller values of the lower truncation point
compared to small or large values of the upper truncation point
. From the outset, the hidden truncation from below will not augment the original model since the resulting density will again be a member of the same family of distributions with only a reparametrization of the parent model. Consequently, we focus primarily on the hidden truncation from both sides in the more general framework, as the hidden truncation from below and/or from above will be a particular case of two-sided hidden truncation.
In this paper, motivated by applications involving random variables with non-negative support (e.g., survival and reliability analyses), we consider a two-dimensional random vector $(X, Y)$ with non-negative coordinates that follows the Arnold–Strauss bivariate exponential (ASBE) distribution. The joint PDF of $(X, Y)$ under the ASBE distribution is
where
K is the normalizing constant defined as
Here,
is the exponential integral function defined by
. The associated marginal PDF of
X will be
and the marginal density of
Y will be
Therefore, the normalizing constant
K can also be defined as
Since
for each fixed
is monotonically decreasing in
x for all possible values of
y, the joint density in Equation (
3) is negative quadrant dependent, and hence it has a negative correlation coefficient. In the context of hidden truncation, we assume that the random variable of interest
X and the hidden covariable
Y are dependent non-negative random variables that follow the ASBE distribution.
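As a numerical check on the normalizing constant, the following sketch assumes that the ASBE joint density has the commonly used form f(x, y) = K exp{-(ax + by + cxy)} for x, y > 0, with a, b > 0 and c > 0. This parameterization, the function names, and the use of the exponential integral E1(z) = ∫_z^∞ e^{-t}/t dt (SciPy's `exp1`) are assumptions made here; the paper's own notation and sign convention for the exponential integral may differ. Under this assumption, integrating out y gives the marginal K e^{-ax}/(b + cx), and hence 1/K = (1/c) e^{ab/c} E1(ab/c).

```python
import numpy as np
from scipy import integrate, special


def asbe_norm_const_closed(a, b, c):
    """K from the closed form 1/K = (1/c) * exp(ab/c) * E1(ab/c); needs c > 0."""
    z = a * b / c
    return c / (np.exp(z) * special.exp1(z))


def asbe_norm_const_numeric(a, b, c):
    """K from numerical integration of exp(-a x) / (b + c x) over x > 0."""
    integral, _ = integrate.quad(
        lambda x: np.exp(-a * x) / (b + c * x), 0.0, np.inf
    )
    return 1.0 / integral


def asbe_marginal_x(x, a, b, c):
    """Marginal PDF of X under the assumed ASBE form."""
    return asbe_norm_const_closed(a, b, c) * np.exp(-a * x) / (b + c * x)
```

The closed form can overflow when ab/c is very large, in which case the numerical version is the safer choice.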
Note that the conditional PDF of
Y given
is
Next, a two-sided hidden truncation for the random variable
Y, say
, where
is considered. We can obtain
and
Therefore, the conditional PDF of
X given
, can be expressed as
where
is a constant depending only on the parameters and not the random variable
X.
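The two-sided hidden truncated density can be sketched numerically under the same assumed ASBE form f(x, y) = K exp{-(ax + by + cxy)}. The names `htbexp_pdf`, `t1`, and `t2` (lower and upper truncation points) are hypothetical and are not the paper's notation. Under this assumption, Y given X = x is exponential with rate b + cx, so P(t1 < Y ≤ t2 | X = x) = e^{-(b+cx)t1} - e^{-(b+cx)t2}, and the constant K cancels between the numerator and the denominator.

```python
import numpy as np
from scipy import integrate


def truncation_integral(a, b, c, t1, t2):
    """Integral of exp(-b y) / (a + c y) over (t1, t2), i.e. P(t1 < Y <= t2) / K."""
    value, _ = integrate.quad(lambda y: np.exp(-b * y) / (a + c * y), t1, t2)
    return value


def htbexp_pdf(x, a, b, c, t1, t2):
    """Density of X given t1 < Y <= t2 under the assumed ASBE form."""
    x = np.asarray(x, dtype=float)
    rate = b + c * x  # rate of Y given X = x
    window = np.exp(-rate * t1) - np.exp(-rate * t2)
    return np.exp(-a * x) * window / (rate * truncation_integral(a, b, c, t1, t2))
```

With `t2 = np.inf`, the density for lower truncation at `t1` coincides with the untruncated marginal of X in which `a` is replaced by `a + c * t1`. This is consistent with the remark in the text that truncation from below only reparametrizes the parent model.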
Figure 1 presents the hidden truncated PDFs of
X given
for different values of the parameters
a,
b,
c,
, and
based on the ASBE distribution along with the non-truncated exponential distribution.
From
Figure 1, the hidden truncated PDFs are right-skewed in all cases shown, to varying degrees depending on the choices of a, b, and c and the truncation parameters
. In
Section 3, we discuss in detail the Bayesian inference of the density given in Equation (
7).
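For readers who wish to simulate from the hidden truncation mechanism, the following is a minimal sketch under the assumed ASBE form f(x, y) = K exp{-(ax + by + cxy)}. It is not the authors' data-generation code; the function names and the truncation symbols `t1` and `t2` are hypothetical. X is drawn from its marginal (proportional to e^{-ax}/(b + cx)) by rejection from an Exp(a) proposal with acceptance probability b/(b + cx), Y is then drawn from its exponential conditional distribution with rate b + cx, and X is retained only when t1 < Y ≤ t2.

```python
import numpy as np


def sample_asbe(n, a, b, c, rng):
    """Draw n pairs (X, Y) from the assumed ASBE density."""
    xs = np.empty(n)
    filled = 0
    while filled < n:
        m = 2 * (n - filled) + 10
        proposal = rng.exponential(scale=1.0 / a, size=m)
        # Target marginal is proportional to exp(-a x) / (b + c x),
        # so an Exp(a) proposal is accepted with probability b / (b + c x).
        accept = rng.uniform(size=m) < b / (b + c * proposal)
        keep = proposal[accept][: n - filled]
        xs[filled:filled + keep.size] = keep
        filled += keep.size
    # Y given X = x is exponential with rate b + c x.
    ys = rng.exponential(scale=1.0 / (b + c * xs))
    return xs, ys


def sample_htbexp(n, a, b, c, t1, t2, rng, batch=10000):
    """Draw n values of X observed only when t1 < Y <= t2."""
    kept, total = [], 0
    while total < n:
        xs, ys = sample_asbe(batch, a, b, c, rng)
        selected = xs[(ys > t1) & (ys <= t2)]
        kept.append(selected)
        total += selected.size
    return np.concatenate(kept)[:n]
```

This direct approach becomes slow when the truncation window has very small probability, since most simulated pairs are then discarded.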
4. Selection between Different Hidden Truncation Models
In this section, we consider the following four models for the hidden truncated bivariate exponential distribution:
: No hidden truncation, i.e., and .
: Truncation from below, i.e., .
: Truncation from above, i.e., .
: Truncation from both sides, i.e., and .
Let
denote the parameter space for model
(
). Hence, we have
,
,
, and
. Based on the observed values
from each of models
,
, and
, the likelihood functions under these three models are given respectively by
where
is the same as
in Equation (
4) of the current text, and
For model
, the likelihood function
is presented in Equation (
9).
Remark 3. For models , , and , we use the same prior distributions as in the full model, omitting the priors on parameters that do not appear in the reduced model. MCMC outputs can then be obtained for these three models and used to approximate the marginal distributions.
To choose the most plausible model based on the observed data, we consider using the Bayesian approach based on the Bayes factor. Suppose that there are
q different models, namely,
, any of which could be meaningful, and they contend with each other for model selection. If model
holds true, the data
follow a parametric distribution with PDF
, where
is an unknown parameter vector for
. Let
be the parameter space for the parameter vector
, where
may or may not be nested. Bayesian model selection proceeds by choosing a prior PDF
for
under
, and a prior model probability
of
being true for
. The posterior probability that
is true can be obtained as (for pertinent details, see [17])
where the Bayes factor
of model
to
is defined by
where
is called the marginal or predictive density of
x under
. Subsequently, the model with the largest posterior probability in Equation (
19) becomes the most plausible model. It is customary to use vague model probabilities, i.e.,
for
. Then, Equation (
19) becomes
. Clearly,
Meanwhile, we use the method proposed by Newton and Raftery [
18] to evaluate the marginals of the four models. The procedure is described as follows. Observe that the posterior distribution can be represented as
under model
for
. For notational simplicity, let
under the full model
. Now, let
be
G draws from the posterior density obtained by conventional MCMC methods such as the Gibbs sampler. Newton and Raftery [
18] suggested that the marginal can be estimated as
Note that the estimate in Equation (
22) is the harmonic mean of the likelihood evaluated at MCMC samples.
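The computations in this section can be sketched as follows, working on the log scale for numerical stability. The function names are illustrative and not from the paper. The first function computes the harmonic mean estimate of the marginal from the log-likelihood evaluated at the G posterior draws; the second converts log marginals into posterior model probabilities, which under equal prior model probabilities reduces to normalizing the marginals; the third returns the Bayes factor of one model against another.

```python
import numpy as np
from scipy.special import logsumexp


def log_marginal_harmonic_mean(loglik_draws):
    """Log of the harmonic mean of the likelihood over posterior draws."""
    ll = np.asarray(loglik_draws, dtype=float)
    # harmonic mean = G / sum_g exp(-ll_g), evaluated on the log scale
    return np.log(ll.size) - logsumexp(-ll)


def posterior_model_probs(log_marginals, prior_probs=None):
    """Posterior model probabilities from log marginal likelihoods."""
    lm = np.asarray(log_marginals, dtype=float)
    if prior_probs is None:
        prior_probs = np.full(lm.size, 1.0 / lm.size)  # equal prior weights
    log_post = lm + np.log(prior_probs)
    return np.exp(log_post - logsumexp(log_post))


def bayes_factor(log_marginals, j, k):
    """Bayes factor of model j to model k (zero-based indices)."""
    return float(np.exp(log_marginals[j] - log_marginals[k]))
```

The harmonic mean estimator is simple to compute from existing MCMC output, but it can be dominated by a few draws with very small likelihood, so the resulting estimates are best checked across several independent runs.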
5. Monte Carlo Simulation Studies
In this section, Monte Carlo simulation studies are used to evaluate the Bayesian parameter estimation and model selection procedures proposed in this paper for the hidden truncation models. First, a Monte Carlo simulation study is used to evaluate the performance of Bayes estimates for the model parameters of the probability distribution given in Equation (
7). For illustrative purposes, we consider the true values of the model parameters to be
and three different sample sizes
n = 100, 200, and 300. For each simulated dataset, the posterior means, posterior medians, and the associated 95% highest posterior density (HPD) credible intervals of the model parameters are computed. After collecting the posterior estimates from a total of 200 replications, the performance of the estimates is evaluated based on the average posterior mean, the coverage probability (CP), and the average width (AW) of the 95% HPD credible intervals. The total number of MCMC iterations and the number of burn-in iterations of the Bayesian estimation procedure are set to 20,000 and 5000, respectively. Regarding the hyperparameters of the prior distributions on
a,
b, and
c, we set
,
, and
, so that each of the three gamma prior distributions has a mean equal to the true parameter value and a variance of one. For the hyperparameter(s) of the prior distribution on
, we set
. Moreover, regarding the initial variances in Equation (
15) of the zero-truncated normal distributions, we set a standard deviation of two for each of
a,
b, and
c.
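The moment-matching rule for the gamma priors described above can be written as a one-line calculation. The sketch assumes a shape-rate parameterization of the gamma distribution (mean = shape/rate, variance = shape/rate²); the paper's parameterization is not visible in this section, so this is an assumption, as is the function name.

```python
def gamma_hyperparameters(prior_mean, prior_var=1.0):
    """Shape and rate of a gamma prior with the given mean and variance."""
    shape = prior_mean**2 / prior_var
    rate = prior_mean / prior_var
    return shape, rate
```

For example, a prior mean of 2 with unit variance corresponds to shape 4 and rate 2 under this parameterization.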
Table 1 provides the average posterior means, the mean square errors (MSEs) of the posterior means, the average posterior variances, the CPs, and AWs for the model parameters based on 200 simulations.
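The interval-based summaries reported in Table 1 can be computed from MCMC output along the following lines. This is a generic sketch rather than the authors' code: the HPD interval is approximated by the shortest interval containing the required fraction of the sorted posterior draws, and CP and AW are then averaged over replications. The function names are hypothetical.

```python
import numpy as np


def hpd_interval(samples, level=0.95):
    """Shortest interval containing a fraction `level` of the sorted draws."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = s.size
    k = int(np.ceil(level * n))  # number of draws inside the interval
    widths = s[k - 1:] - s[: n - k + 1]
    i = int(np.argmin(widths))
    return s[i], s[i + k - 1]


def coverage_and_width(intervals, true_value):
    """Coverage probability and average width over replications."""
    iv = np.asarray(intervals, dtype=float)
    covered = (iv[:, 0] <= true_value) & (true_value <= iv[:, 1])
    return covered.mean(), (iv[:, 1] - iv[:, 0]).mean()
```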
From
Table 1, we observe that all the CPs are 100%, which exceeds the nominal 95% level and indicates that the posterior uncertainty, and hence the width of the HPD intervals, is relatively large. The posterior means of
b,
c,
, and
are close to the true values for moderately large sample sizes. However, the posterior means for
a are overestimated, and the corresponding HPD intervals have large average widths. We also observe no dramatic improvement in estimation accuracy as the sample size increases. Because the prior choices are subjective, there are infinitely many possible combinations of hyperparameters, and the performance of the Bayesian estimation procedure depends on that choice. For instance, our preliminary study (results not shown here) indicates that the performance can be poor when prior distributions with large variances are used. Although the hyperparameter values used in this simulation study may not be optimal, they provide satisfactory results among the choices we examined.
In the second Monte Carlo simulation study, the performance of the model selection procedure is evaluated when the four models
–
are considered candidate models. In this simulation study, data are generated from the four models
–
, and the model selection method based on Bayes factors presented in
Section 4 is applied to each simulated dataset. We then record, for each candidate model, the proportion of replications in which it has the largest posterior model probability. Recall that the posterior probabilities can be computed by Equation (
19) in conjunction with Equation (
22). For comparative purposes, we compute the average posterior probabilities in two ways by assuming the equal prior model probability of
for each model
–
:
- I.
The average of the posterior probabilities over all 500 replications, regardless of which model is selected (denoted as Post. Prob. I in
Table 2);
- II.
The average of the posterior probabilities computed only over the replications in which the model has the highest posterior probability (denoted as Post. Prob. II in
Table 2). For example, the “Post. Prob. II” when the true model is
in
Table 2, 0.265 for contending model
is computed based on the 125 samples in which model
is selected.
Based on 500 simulated samples from each model, the above two types of posterior probabilities, the frequency and proportion of selecting a particular model, and the average posterior means (medians) based only on selected models for each model parameter are presented in
Table 2. The numerical summaries in
Table 2 are obtained using the same prior specifications and iteration settings, following the MCMC procedures described in
Section 3.2.
From
Table 2, we observe that the parameter estimates based on either the posterior mean or the posterior median yield reasonable results under all four models in all cases considered. As for model selection, the procedure based on the Bayes factor (posterior probability) does not identify the true model in every case; it tends to select model
, which is the most general hidden truncation model among the four models with the largest number of parameters.
Since the numbers of parameters in models
–
are different, instead of evaluating the performance based on the parameter estimates, we consider the estimation of the conditional mean
, where
and
are pre-specified values based on the model and the values of
and
(e.g., for model
,
and
), which is a function of the model parameters. Specifically, for given values of
and
, the conditional expectation is
where
Let
R be the total number of simulations (i.e.,
in this study) and
be the number of samples that model
is selected with the largest posterior model probability (
), where
. Suppose
is the estimate of
based on the
j-th sample, where model
is selected as the most plausible model for
,
The MSEs based on the simulated data corresponding to selected model
, denoted as
, can be calculated as
To illustrate the effect on the performance of estimating the conditional expectation with the model selection procedure, the MSE with the model selection procedure is calculated as
The simulated values of
(
) and
for estimating the conditional expectation associated with the model selection perspectives are presented in
Table 3. Note that the proportions of selecting the correct models with the highest posterior probability are all the same as in
Table 2. As expected, the value of
is the smallest if
is the true model in most cases. The simulation results in
Table 3 show that the model selection procedure can reduce the risk of mis-specifying the underlying model. On the other hand, we observe that the values of
and
are similar, while the values of
and
are similar. Note that models
and
assume
, while
and
assume
with an unknown value. This observation suggests that determining whether
may provide further information to improve the estimation of the conditional expectation.
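The MSE summaries discussed above can be sketched as follows. Since the displayed formulas are not reproduced here, the sketch rests on a stated assumption: the MSE for model k averages the squared error of the conditional-mean estimate over the replications in which model k was selected, and the MSE with the model selection procedure pools the squared errors over all R replications, each using the estimate from its selected model. The function name and the integer model labels 1 to 4 are hypothetical.

```python
import numpy as np


def mse_by_selected_model(estimates, selected, true_value, n_models=4):
    """Per-model MSE over replications selecting that model, and pooled MSE.

    estimates  : estimate of the conditional mean from the selected model,
                 one entry per replication
    selected   : label (1..n_models) of the model selected in each replication
    true_value : true conditional mean under the data-generating model
    """
    est = np.asarray(estimates, dtype=float)
    sel = np.asarray(selected)
    sq_err = (est - true_value) ** 2
    per_model = {}
    for k in range(1, n_models + 1):
        mask = sel == k
        # NaN marks a model that was never selected in the simulation
        per_model[k] = float(sq_err[mask].mean()) if mask.any() else float("nan")
    return per_model, float(sq_err.mean())
```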
6. Illustrative Example
In this section, the proposed methodology is illustrated using a real dataset of independently collected quadruplets containing the redshift and the apparent magnitude of quasars, previously analyzed by Efron and Petrosian [3]. The dataset, named Quasars, is available in the R package DTDA version 3.0.1 [19]. The original data consist of
observations with the following variables
for
, where
denotes the redshift of the
i-th quasar,
denotes its apparent magnitude, and the two values
and
indicate lower and upper truncation bounds corresponding to the apparent magnitude, respectively. The variable of interest is the transformed logarithm of luminosity values provided in the first column of the dataset
Quasars, i.e.,
, where
t is a transformation that depends on the cosmological model assumed (see [
20,
21] for a detailed description). The anti-logarithm transformation
is considered such that the support of the random variable
is
. We conjecture that the dataset has been subjected to a two-sided truncation and the PDF in Equation (
7) would be appropriate to model this dataset, where the observation
is subjected to a two-sided truncation by other covariates, and the same truncation points apply for
,
, i.e.,
and
.
Based on the model in Equation (
7), and using the Bayesian methodologies proposed in this paper, the results presented in
Table 4 are obtained. Regarding the hyperparameters of the prior distributions on
a,
b, and
c, we set
for
because we have no information on these parameters. The same value of
as in the Monte Carlo simulation studies is used. In
Table 4, we report the posterior probabilities for each of the four models
–
, and the posterior means (with medians in parentheses) along with the corresponding 95% HPD intervals. From a model selection perspective, model
is selected with the largest posterior probability of
, assuming equal prior model probability for all four models. We notice that
has the second largest posterior probability of
, which agrees with our observation of the similarity of models
and
in the simulation results. Regarding the posterior inference, we observe that the estimates of
a,
b, and
c exhibit a pattern. The estimates under models
and
are close to each other, as are the estimates under models
and
. The HPD intervals change accordingly.
Figure 2 shows the trace plots of all parameters under each of the four models. Overall, the plots for
a,
b, and
c are reasonably well behaved, while there are some fluctuations in the plots of
and
.
We highlight that the estimates of the parameters
and
(i.e., the last two columns) in
Table 4 under the model
with the maximum posterior probability indicate that the data (i.e., the main study variable,
) have been subject to a one-sided upper truncation, as the value of
In this data application, we have explored all possible hidden truncation scenarios (including no truncation), and the adopted strategy then searches for the most plausible scenario under the Bayesian paradigm.
7. Concluding Remarks
In this article, we explore the features of a two-sided hidden truncated model in terms of estimation under the Bayesian paradigm and, most importantly, detecting with a desired level of accuracy whether or not the data have been subject to hidden truncation starting with a simple model in two dimensions, namely, the bivariate exponential distribution proposed by Arnold and Strauss [
12]. Indeed, there are several other versions of the classical bivariate exponential distribution, but they may have a singular part (i.e., if two random variables
and
follow a bivariate distribution, there is a positive probability that
) [
22] (see, for example, [
23]), which makes them more difficult to handle from both practical and mathematical perspectives. For this reason, we resort to a simple, absolutely continuous model, develop the corresponding hidden truncation model, and provide feasible model-fitting methodologies that can be used in practice. To the best of our knowledge, inference under the Bayesian paradigm for the parameters of a doubly (two-sided) hidden truncation model is still in its infancy and has not been explored in detail. Statistical inference for hidden truncation models with unknown truncation point(s) is a challenging problem, especially when the observed sample carries very little information about the truncation point(s). The inferential approach of this paper can be expected to extend to other types of hidden truncated bivariate exponential models, albeit with additional computational complexity and possibly with a different set of informative and non-informative priors, which might be challenging to choose. The associated computational complexity and the judicious selection of priors, especially for the truncation points, are natural hindrances to developing the methodology and theory. Inference under the Bayesian framework, as discussed in the simulation study and in the real-data application, is encouraging in the sense that the Bayesian estimates of the parameters are reasonably good in both settings. Efficient estimation for multiple-constraint models, i.e., the estimation of model parameters for a hidden truncated model under a multi-component set-up, where, for example, the main study variable
Y is observed only when the associated concomitant variables, say,
are truncated from below, such as
(or
), will be of great interest in the context of several real-life applications. However, the real-life applicability of such models and their analytical tractability are the two key factors that should be addressed first. Research in this direction is in progress, and we hope to report the results in a future paper.