Modeling the Conditional Dependence between Discrete and Continuous Random Variables with Applications in Insurance

: We jointly model amount of expenditure for outpatient visits and number of outpatient visits by considering both dependence and simultaneity by proposing a bivariate structural model that describes both variables, speciﬁed in terms of their conditional distributions. For that reason, we assume that the conditional expectation of expenditure for outpatient visits with respect to the number of outpatient visits and also, the number of outpatient visits expectation with respect to the expenditure for outpatient visits is related by taking a linear relationship for these conditional expectations. Furthermore, one of the conditional distributions obtained in our study is used to derive Bayesian premiums which take into account both the number of claims and the size of the correspondent claims. Our proposal is illustrated with a numerical example based on data of health care use taken from Medical Expenditure Panel Survey (MEPS), conducted by the U.S. Agency of Health Research and Quality. Author Contributions: Conceptualization, E.G.-D. and E.C.-O.; methodology, E.G.-D. and E.C.-O.; validation, E.G.-D. and E.C.-O.; writing—original draft preparation, E.G.-D. and E.C.-O.; writ-ing—review and editing, E.G.-D. and E.C.-O.; funding acquisition, E.G.-D.


Introduction
The use of bivariate distributions where the marginals are one continuous and the other one discrete are extremely rare in statistics and more specifically in applied statistics. Some examples in which these types of distributions have been previously considered are [1][2][3]. The latter work was applied in tourism settings and the first two papers were applied in actuarial context. Conditional specification structure has also been applied in insurance framework when both marginal distributions are continuous. See [4] for details.
The statistical literature contains numerous methods to obtain bivariate distributions, some of them within the copula framework. These are functions that enable us to separate the marginal distributions from the dependency structure of a given multivariate distribution (see [5,6], among others). Several flexible approaches based on different bivariate copula families such as [7][8][9] have been proposed in the literature, including the Plackett and Frank copulas. The methodology proposed in this work allows us to build a bivariate distribution based on a conditional specification technique (see for instance [10,11]). In general, although the models obtained through this methodology rely on formulations that incorporate many parameters, this requirement can be relaxed, as it will be shown below, to obtain simple and practical expressions. The major advantage of this methodology, as compared to copula approach, is that the latter one incorporates a parameter that controls the correlation which is difficult to estimate, since it is restricted to a range of allowed values.(In practice, this can often be solved by choosing a suitable parametrization (e.g., using the logit or arctan of the correlation as parameter)).
In this work, we jointly model amount of expenditure for outpatient visits and number of outpatient visits by taking into account both dependence and simultaneity. It is noted that both variables are important inputs in the structure of health budgets and in health insurance. More specifically, on the one hand, we model dependence by proposing a bivariate structural model that describes both variables specified in terms of their conditional distributions [10,11]. In fact, the model proposed here is a special case of the one obtained in [3]. On the other hand, simultaneity is implemented into the bivariate distribution, by making two assumptions: (i) the conditional expectation expenditure for outpatient visits with respect to the number of outpatient visits is linear; and (ii) the conditional expectation of outpatient visits with respect to the expenditure for outpatient visits is also linear. Thus, we assume that more expenditure implies that the considered health system is capable of handling more visits (note that we are not assuring that more expenditure involves a greater number of visits, which is difficult to sustain in practice); on the other hand, more visits obviously imply more expense.
Furthermore, one of the conditional distributions obtained in our study is used to calculate Bayesian premiums which take into account both the number of claims and the size of the correspondent claim. A credibility expression for the Bayesian premium in which the credibility factor obeys the classical expression provided in [12] is also derived. This type of premiums can be useful in automobile insurance contract to take into consideration both the number of claims and the severity of claim.
The second major contribution of this work is based on the fact that the proposed model enables us to evaluate the effect of covariates on both expenditure and number of visits. Thus, we can distinguish between the effects of these explanatory variables on both responses and, therefore, we can determine which covariates simultaneously affect both expenditure and the number of visits.
The proposed model is empirically evaluated by using data of health care use taken from Medical Expenditure Panel Survey (MEPS), conducted by the U.S. Agency of Health Research and Quality. This set of data can be downloaded from the web page of Professor E. Frees. We jointly model the number of outpatient visits and the amount of expenditure of outpatient visits. Some factors affecting these two random variables will be considered.
The rest of this paper is organized as follows. Section 2 provides the general motivation of this study and the main results. The proposed model for number of visits and expenditure is introduced here. A bivariate model allowing for the implementation of fixed effects is provided in Section 3. Some methods of estimation are shown in Section 4. Next, an empirical analysis of these models is performed in Section 5 and finally Section 6 summarizes the main conclusions drawn from this work.

Motivation and Results
In this section, we propose a non-linear bivariate model of amount of expenditure for outpatient visits and number of outpatient visits. The model is a special case of the one provided in [3] which was built by using a characterization via conditional expectations. In this case it was constructed through a class of bivariate distributions such that the conditional distributions belong to a specified exponential family.
As is well known, multivariate distributions can be specified in terms of their conditional distributions rather than directly. In this case, the dependence structure can be modeled using bivariate copulas or other mathematical or statistical methods such as conditional distributions or mixing distributions. By assuming that the conditional distributions belong to certain parametric families of distributions, the joint distribution can be obtained as described in [10] (see also [11] for an introduction to this topic). For applied works in this setting, see [1,3,13], among others. To obtain the joint distribution, it is first necessary to determine the solution of certain functional equations, which facilitates the derivation of highly flexible multiparametric distributions. As [11] pointed out, when we wish to specify a bivariate distribution it is sometimes convenient to visualize conditional distributions rather than marginal or joint distributions. In this context, it is useful in statistical modeling to have tractable multivariate distributions with given marginals to quantify the dependence effect of the variables in the model. Accordingly, let us assume that the amount of expenditure for outpatient visits is represented as a random variable X and that the number of outpatients visits to the hospital is also random, and denoted by N. We now wish to obtain the more general bivariate distribution (X, N) whose conditional distributions satisfy the following conditions based on expenditure, conditional on the number of visits and on the number of visits to the hospital conditional to expenditure.
On the one hand, the expenditure for outpatient visits depends on the number of outpatient visits, i.e., if X|N = n is the expenditure conditional to N = n visits is directly proportional to the number of outpatient visits. Hence, According to expression (1), assuming that n takes values within the set of integer numbers {0, 1, . . . , n} there is a minimum expense given by α 1 in the case of no outpatient visits. Then, it seems coherent to assume that the expense increases as the number of visits increases.
On the other hand, it also seems logical to believe that the number of outpatient visits will depend on the expenditure, i.e., the number of outpatient visits that the health care system can assist will depend on its capacity, which in turn is conditioned by the budget (in terms of expenditure) available. Hence, Although Equation (1) includes an intercept since some expenditure would be necessary even for zero visits, pointing out that the health system must be prepared for any future visit, this is not the case of Equation (2). This is mainly due to two reasons: (i) On the one hand, this modelization facilitates the construction of the mathematical model obtained from Equation (2); and (ii) it seems obvious that an initial expenditure of near-zero monetary units implies that the health system is unable to absorb patients and, therefore, the mean of visits, and therefore the conditional mean, will be close to zero.
In some circumstances it is certain that this linear dependence relationship is not easily to be assumed. For example, an increase in the number of patient visits to the hospital might be caused by the suffering of more serious diseases that, in turn, incur in more expensive treatments. In this case, it is likely that a non-linear or logarithmic polynomial relationship could be more realistic. Obviously, the modeling in this case would be different from the one considered in this work.
In this work, we propose the use of a bivariate distribution that satisfies the conditions (1) and (2). We will consider the case where one of the conditional random variable is Gamma and the other one Poisson. A discrete random variable is said to follow a Poisson distribution with parameter λ > 0 if its probability mass function (pmf) is given by In the following, N ∼ P o(λ) will denote a random variable following a Poisson distribution with pmf given by (3). Moreover, a continuous random variable follows a gamma distribution with shape parameter α > 0 and rate parameter β > 0 if its probability density function (pdf) is expressed as We will write in this case, X ∼ G(α, β). By searching for the bivariate conditional distribution, let us now assume that for given functions ϕ(x) : N −→ R + , σ(n) : R + −→ R + and η(n) : R + −→ R + . Then, the conditional distributions given in (5) and (6) have pdfs given by i.e., they are members of the exponential family of distributions. Ref. [3] (see also [10], Chapter 4, p. 98) studied the most general bivariate distribution with conditionals given by (7) and (8) and concluded that this has pdf given by, The parameter κ is a normalizing constant, a function of the remaining parameters, satisfying which can be obtained from one of the conditional distributions given in (5) and (6), where Certainly, the major limitation of this type of models, built by conditional specification, lies in the difficulty that both the marginal distributions and the joint distribution depend on a normalization constant that in most cases is very difficult or even impossible to calculate in closed form (see for example [14]). Obviously, the difficulty grows when the two variables on which the model depends have different support or it is even more complicated when one variable is discrete and the other one is continuous, as the case considered here. However, from a numerical point of view, this is not an issue of relative importance since there exist integration and summation techniques (more difficult in the latter case) that allow us to compute the constant term. The reader is referred to [15] for more details on how to proceed in these cases.
For practical purposes, an appropriate choice of the parameter could enable us to obtain a closed-form expression for the normalizing constant. For example, the case ε = τ = 0 corresponds to that one where N and X are independent and the marginal distributions are Poisson and gamma as in (3) and (4), respectively. The cases β = 1 + γ, ε = 0, τ = 1 and ε = 0, τ = 1 provide a closed-form model with conditional means of the type of (1) and (2) and thus suitable for our purpose. This latter model that includes an additional parameter as compared to the previous one will be considered here. Then, the bivariate distribution given in (9) can be rewritten as Please note that the bivariate distribution (13) has a marginal distribution with continuous support and the other one with discrete support. This type of bivariate distribution is uncommon in theoretical and applied statistical literature, allowing us to model phenomena that are not usual in practical situations but sometimes occur, as it is the case described in this paper. See for example [1,16,17].

Marginal Distributions
The marginal distributions of X and N are given by where N B refers to the negative binomial distribution, i.e., The marginal distribution of N is obtained by integrating (13) with respect to x over (0, +∞) and the marginal distribution of X is calculated by summing (13) with respect to n in the support {0, 1, . . . }.
Furthermore, it is readily apparent that the marginal means and variances are given by while the cross moment of X and N is Simple calculations provide the covariance and the coefficient of correlation. These are given by respectively, which are always positive. The coefficient of correlation is bounded between 0 (γ ≈ β) and 1 (γ ≈ 0). Thus, parameter γ controls the dependence structure of the model. If the researcher wanted to work with a model that allows for positive and negative correlations, it would be enough to choose in (9) a less restrictive model.

Conditional Distributions
The conditional distribution of X given N, shown in (5), is gamma with the parameters given in (10) and (11) and the conditional distribution of N given X, see (6), is Poisson with parameter given in (12). Thus, the conditional distribution of X given N = n is G(σ(n), η(n)), where Observe that under these conditional distributions, expressions (1) and (2) are guaranteed.
Both the moments and maximum likelihood methods are feasible ways of estimating the vector of parameters φ φ φ = (α, β, γ) of the distribution through sample observations.

Hypothesis Testing
By using bivariate data, hypothesis testing can be performed with the parameters α, β and γ. We may also be interested in determining, via the likelihood ratio test, when the model might depend only on two parameters, e.g., when β = γ + 1. In this case, the vector φ φ φ can be estimated with the constraint that β = γ + 1. If we denote the new vector by φ * φ * φ * , the critical region for the null hypothesis H 0 : β = γ + 1 is determined from the test statistic which asymptotically has a χ 2 -squared distribution with one degree of freedom.

Allowing for Covariates
In this section, we introduce a more practical model where covariates may be easily implemented. The bivariate linear regression model, which makes no distributional assumptions, is likely to be unsatisfactory because certain combinations of parameters and regressors could violate the non-negative restriction of the mean of both variables and therefore prediction could give non-realistic values. To avoid this situation, we propose a parametric model based on the distributional assumptions presented in the previous section when the researcher wishes to examine the factors or covariates that can affect simultaneously both means.
For our regression analysis, we reparameterized (13) as a function of its marginal means µ 1 = E(X) and µ 2 = E(N), then α is replaced by γµ 1 and β is replaced by γ + µ 2 /µ 1 , where µ 1 > 0 and µ 2 ≥ 1. Then, the pdf (13) can be rewritten as for x > 0 and n = 0, 1, . . . After this reparameterization we also obtain the cross moment, the covariance and the correlation, which are given by Thus, parameter γ controls the dependence or independence of the model. As Therefore, the model provided in (14) is suitable for including covariates since the marginal means are given by µ 1 and µ 2 . The expression (X, N) ∼ BGN B(γ, µ 1 , µ 2 ) is used to denote a bivariate random variable (X, N) that follows the distribution given in (14), with marginals gamma and negative binomial. Graphs of the density function for different parameter values and their corresponding contour plots are shown in Figure 1, revealing that the density seems to be unimodal for different values of its parameters. Now, let y y y i = (y 1i , . . . , y ki ) and w i = (w 1i , . . . , w ki ) be two vectors of k covariates associated with the ith observation. They are two vectors of linearly independent regressors that are thought to determine (x, n). For the ith observation, the model takes the form log(µ 1i ) = y y y i δ δ δ, log(µ 2i − 1) = w w w i η η η, for i = 1, . . . , t and where t denotes the number of observations and δ δ δ = (δ 1 , . . . , δ k ) and η η η = (η 1 , . . . , η k ) the corresponding vectors of regression coefficients.
Moreover, each of the variables, X and N, can be influenced by different factors, hence the explanatory variables that are taken to explain µ κi , κ = 1, 2, could not be the same. Furthermore, observe that the log link assumed ensures that µ 1i falls within the interval (0, ∞) and µ 2i within the interval (1, ∞). Furthermore, in practice the model can handle factors that are only related to just one outcome and not to the other one.
In this study, all estimation computations were implemented by using the Wolfram Mathematica (v. 12.0, Wolfram Research Inc., Champaing, IL, USA) and RATS (v. 7.00 Estima, Evanston, IL, USA) software. In the latter case, the approximation given in (16) was used, obtaining the same results as those ones computed with Mathematica. For further information on these software packages, the reader is referred to [18,19].

Some Methods of Estimation
In the following, we derive estimators based on the moments and maximum likelihood methods for the model with and without covariates. We also provide closed-form expressions for the Fisher's information matrix.

Estimation of the Model without Covariates
The log-likelihood function and the normal equations to provide the corresponding estimates are almost given in closed form. Let us first consider the case of the model without covariates. If (x,ñ) = {(x 1 , n 1 ), (x 2 , n 2 ), . . . , (x t , n t )} is a sample obtained from the distribution with joint density (14) x i n i are the corresponding sample moments, some computations provide the estimators based on these sample moments, which are given by ) .

The Score Vector and Fisher Information Matrix
We now consider the method of maximum likelihood estimation. Let Θ = (γ, µ 1 , µ 2 ) be the vector of parameters to be estimated. The log-likelihood function is proportional to

Thus, the normal equations which provide the estimators of the parameters are given byx
where ψ(z) is the digamma function (the logarithmic derivative of the Euler gamma function). After some algebra the maximum likelihood estimators of µ 1 and µ 2 are obtained, which are given by µ 1 =x and µ 2 =n. Finally, the estimator of the parameter γ is the solution of the equation log γ − ψ(γ µ 1 ) +x * = 0, which can be solved numerically.
The second partial derivatives are as follows: where ψ 1 (·) is the first derivative of the digamma function. The entries of the Fisher's information matrix, J ( Θ), are therefore Here, Θ represents the maximum likelihood estimator of Θ. Observe that the analytic expression of E(∑ t i=1 log x i ) is not feasible. For computational reasons, for large values of t, this is evaluated by ignoring the expectation operator and replacing it by ∑ t i=1 log x i . The asymptotic variance-covariance matrix of Θ is obtained by inverting the observed information matrix.
The above model has the advantage of its simplicity; however, the normal equations require the use of the digamma function, ψ(z) = d dz log(Γ(z)), z > 0. Although the digamma function is provided in most of the statistical software, in practical situations it is convenient to approximate the digamma function by the following expression, which is well known in the statistical literature and it has been proven to perform well in practice.

Estimation of the Model with Covariates
For the sake of simplicity, we will assume (This assumption is only made to display the normal equations and the Fisher information matrix in a nicer format. In practice, as is well known, the maximum of the likelihood function can be obtained directly by maximizing the logarithm of the likelihood function, and therefore, the covariates used do not have to be the same for the two response variables.) η η η = δ δ δ and write µ 1i = µ 1i (δ δ δ) and µ 2i = µ 2i (δ δ δ). Let us also denote Θ = (γ, δ δ δ), thus the log-likelihood function is proportional to Thus, the normal equations, for i = 1, . . . , t, are given by where j = 1, . . . , k, ∂µ 1i ∂δ j = y ij µ 1i and ∂µ 2i ∂δ j = w ij µ 2i . Again, the approximation given in (16) can be used instead of the digamma function. Finally, after computing the second partial derivatives we obtain the elements of the Fisher's information matrix, as follows: −y ij y il µ 2 1i γ 2 ψ 1 ( γ µ 1i ) + x i µ 2i µ 1i w ij w il − w ij y il − y ij w il + y ij y il , j = l.

Empirical Analysis
To analyze the practical performance of the model proposed, two different examples are provided in this section.

Example 1
In this first example a set of data that can be downloaded from the web page of Professor E. Frees (https://instruction.bus.wisc.edu/jfrees/jfreesbooks/Regression%20 Modeling/BookWebDec2010/data.html) is considered. The dataset contains information about the Medical Expenditure Panel Survey (MEPS), conducted by the U.S. Agency of Health Research and Quality. This survey provides nationally representative estimates of health care use, expenditure, sources of payment, and insurance coverage for the U.S. civilian population. This survey collects detailed information on individuals (Please note that the unit of study is not the individual patient but a health system (public or private) which is the case considered here, although it could also be a hospital unit or similar) of each medical care episode by type of services including physician office visits, hospital emergency room visits, hospital outpatient visits, hospital inpatient stays, all other medical provider visits, and use of prescribed medicines. This detailed information allows the researcher to develop models of health care use to predict future expenditure (see http://www.meps.ahrq.gov/mepsweb/ for more details). We consider MEPS data consisting of 1352 individuals that have positive outpatient expenditure for individuals between ages 18 and 65. In Table 1 the summary of descriptive statistics of these two random variables is displayed. Our dependent variables consist of the logarithm of the amounts of expenditure for outpatient log(EXPENDOP) visits (i.e., Expenditure) and number of outpatient visits COUNTOP, (i.e., Outpatient visits). The degree of association between the two response variables Expenditure and Outpatient visits in the sample for the different levels is summarized in terms of some measures of correlation for bivariate data. In Table 2 Pearson's, Spearman's and Kendall's measures of correlation for these random variables are displayed. It is noticeable that there exists a moderate degree of correlation between these two variables. This is also confirmed by Figure 2 where the scatter plot of these two variables and the theoretical (smoothed) distribution are illustrated. In addition the fitted correlation obtained from (15) by using the maximum likelihood estimates of the parameters provided in Table 3 is 0.592.  For the bivariate model without covariates parameters estimates together with their corresponding standard errors are provided in Table 3 by using the parametrization of the bivariate model given in (14). In addition, the minimum value of the negative of the log-likelihood function (NLL) and the total sample size is displayed. Under this parametrization, µ 1 and µ 2 correspond to the estimates of the marginal means.
Parameter estimates and standard errors for the model with covariates are given below in Table 4 for the continuous random variable, Expenditure, and the discrete one, Outpatient visits. Explanation of the covariates considered can be found at the Data Descriptions file available on the same web page. As it can be seen many explanatory variables are statistically significant at the 5% level. Only the covariates unemploy, i.e., employment status of patients and income, i.e., income compared to poverty line do not have a significant impact simultaneously on the marginal means of the variables Expenditure and Outpatient visits. In addition, the dummy variable high school degree highsch, managedcare, region and marital status, maristat are not significant for the continuous response variable. On the other hand, the explanatory variable race, i.e., race of the patient is not statistically significant at the same level for the Outpatient visits response variable.

Example 2
By using statistical tools from risk theory, the conditional distribution of N given X = x provided in Section 2 can be used to compute bonus-malus premiums based on both number and size of the claims. The procedure of handling the variables number of claims and their respective claims size in the actuarial literature and, in particular, in automobile insurance, is usually based, given the difficulty involved in their study, on considering both variables separately (see [20,21]). The main advantage provided by the model proposed in this work is that it treats them together by taking into account their positive correlation. Please note that this model differs from the traditional collective risk model where the variable of interest is S = ∑ N j=1 X i and it is assumed that X j are i.i.d. and independent of the variable N.
Let us now consider that the conditional distribution N|X = x ∼ P (µ 2 x/µ 1 ) is defined in a way such that µ 2 follows a gamma prior distribution with shape parameter a > 0 and scale parameter b > 0. Then, it is straightforward that after observing the data n = (n 1 , . . . , n t ), the posterior distribution of µ 2 given the datañ, is again gamma with updated parameters Now, under squared-error loss function the net risk premium is just the mean of the random variable N, depending obviously on x, i.e., and the collective (prior) premium is the mean of the latter one and it results Since the posterior distribution is conjugate with respect to the likelihood, the Bayesian net premium is obtained directly from (19) by using (17) and (18) and results where n + = ∑ t i=1 n i . In actuarial statistics, the Bayesian premium is a useful premium calculation tool, especially in the automobile insurance sector, since it combines sample information (of an insured) with a priori information (related to the group of insured). This Bayes premium can be easily calculated, i.e., the posterior mean of the parameter µ 2 , when the prior distribution is conjugated with respect to the likelihood used since it is obtained directly from the collective premium by updating the parameters of this prior obtained from the data.
Observe that expression (20) increases with both the number of claims and the claims size, x. Additionally, this Bayesian premium can be rewritten as P * = Z tn + (1 − Z t )P, wheren = n + /t is the sample mean and Z t = tx bµ 1 + tx is known in the actuarial literature as the credibility factor. It is straightforward to see that Z t → 0 as t → 0, Z t → 1 as t → ∞ and Z t = t/(t + K) where K = var(E(N|X = x, µ 1 , µ 2 ))/E(var(N|X = x, µ 1 , µ 2 )). Thus, the credibility factor obeys the expression given in [12] (see also [22,23], among others).
A simple application of the methodology used in this paper is provided in the following. We suppose a motor vehicle insurance portfolio with mean value of 1 and 1/2 for the number of claims and the amount of claims, respectively.
Then, the expression (20) has been used to compute the Bayesian premiums of an automobile insurance portfolio for different values of the claim size x, by assuming that the mean of the number of claims is 1 (i.e., a = b = 1) and that the mean of claims size is µ 1 = 0.5. Then, the Bayesian premiums obtained are shown in Table 5 for different values of the number of claims, n + = ∑ t i=1 n i , observed over the time period (usually a year) t.
It is discernible that for all the values of x, i.e., number of claims, the premium increases with the value of the number of claims, n + . In a similar way, the premium decreases with the period of time, t in which the policyholder does not declare a claim. Also, for a fixed value of k and t, the value of the premium increases with claim size x.

Final Comments
It is useful in applied statistics to obtain tractable bivariate distributions with given marginals that allow us to quantify the degree of dependence of the variables in the model. Although the most traditionally used methodology has been through copulas, in this article, we have examined this issue by using conditional specification structure. In general, the models derived via this approach, are based on expressions that might incorporate many parameters; however, this requirement may be relaxed to obtain simple and useful expressions. Furthermore, this methodology allows us to derive bivariate distributions with discrete and/or continuous marginals. In addition, one of the conditional distributions obtained in this study is used to calculate Bayesian premiums that take into account both the number of claims and the size of the correspondent claims. Also, a credibility expression for the Bayesian premium was derived. Finally, as a numerical application, an example that jointly explain the amount of expenditure for outpatient visits and number of outpatient visits by taking into account both dependence and simultaneity was considered. The model proposed in this paper enable us to evaluate and distinguish the effect of different explanatory variables on both number of visits and expenditure for outpatient visits. Funding: Emilio Gómez-Déniz was partially funded by grant ECO2017-85577-P (Ministerio de Economía, Industria y Competitividad. Agencia Estatal de Investigación).
Informed Consent Statement: Not applicable.