Bayesian Dependence Tests for Continuous, Binary and Mixed Continuous-Binary Variables

Abstract: Tests for dependence of continuous, discrete and mixed continuous-discrete variables are ubiquitous in science. The goal of this paper is to derive Bayesian alternatives to frequentist null hypothesis significance tests for dependence. In particular, we present three Bayesian tests for dependence of binary, continuous and mixed variables. These tests are nonparametric and based on the Dirichlet Process, which allows us to use the same prior model for all of them. Therefore, the tests are “consistent” among each other, in the sense that the probabilities that variables are dependent computed with these tests are commensurable across the different types of variables being tested. By means of simulations with artificial data, we show the effectiveness of the new tests.


Introduction
Tests for dependence of continuous, discrete and mixed continuous-discrete variables are fundamental in science. The standard way to statistically assess whether two (or more) variables are dependent is to use null-hypothesis significance tests (NHST), such as the $\chi^2$-test or Kendall's τ test. However, these tests are affected by the drawbacks that characterize NHST [1][2][3]. An NHST computes the probability of getting the observed (or a larger) value of the statistic under the assumption that the null hypothesis of independence is true, which is obviously not the same as the probability of the variables being dependent on each other, given the observed data. Another common problem is that the claimed statistical significance might have no practical impact. Indeed, the usage of NHST often relies on the wrong assumptions that p-values are a reasonable proxy for the probability of the null hypothesis and that statistical significance implies practical significance.
In this paper, we propose a collection of Bayesian dependence tests. The questions we are actually interested in (for example, Is variable Y dependent on Z? or Based on the experiments, how probable is it that Y is dependent on Z?) are questions about posterior probabilities. Answers to these questions are naturally provided by Bayesian methods. The core of this paper is thus to derive Bayesian alternatives to frequentist NHST and to discuss their inference and results. In particular, we present three Bayesian tests for dependence of binary, continuous and mixed variables. All of these tests are nonparametric and based on the Dirichlet Process. This allows us to use the same prior model for all the tests we develop. Therefore, they are "consistent" in the sense that the probabilities of dependence we compute are commensurable across the tests. This is another main difference between such an approach and the use of p-values, since the latter usually cannot be compared across different types of tests.
To address the issue of how to choose the prior parameters in case of lack of information, we propose the use of the Imprecise Dirichlet Process (IDP) [4]. It consists of a family of Dirichlet processes with fixed prior strength and a prior probability measure that is free to span the set of all distributions. In this way, we obtain as a byproduct a measure of the sensitivity of inferences to the choice of the prior parameters.
Nonparametric tests based on the Dirichlet Process and on similar ideas to those presented in this paper have also been proposed in [4] to develop a Bayesian rank test, in [5] for a Bayesian signed-rank test, in [6] for a Bayesian Friedman test and in [7] for a Bayesian test that accounts for censored data.
Several alternative Bayesian methods are available for testing independence. The test of linear dependence between two continuous univariate random variables can be achieved by fitting a linear model and inspecting the posterior distribution of the correlation coefficient. A more sophisticated test based on a Dirichlet Process Mixture prior is instead presented in [8] to deal with linear and nonlinear dependences. Other methods were proposed for testing independence based on a contingency table [9][10][11]. The main difference between these works and the work presented in this paper is that we provide tests for continuous, categorical (binary) and mixed variables using the same approach. This allows us to derive a very general framework to test independence/dependence (these tests could be used, for instance, for feature selection in machine learning [12][13][14][15]).
By means of simulations on artificial data, we use our test to decide if two variables are dependent. We show that our Bayesian test achieves equal or better results than the frequentist tests. We moreover show that the IDP test is more robust, in the sense that it acknowledges when the decision is prior-dependent. In other words, the IDP test suspends the judgment and becomes indeterminate when the decision becomes prior-dependent. Since IDP has all the positive features of a Bayesian test and it is more reliable than the frequentist tests, we propose IDP as a new test for testing dependence.

Dirichlet Process
The Dirichlet Process was developed by Ferguson [16] as a probability distribution on the space of probability distributions. Let $X$ be a standard Borel space with Borel σ-field $\mathcal{B}_X$ and $\mathcal{P}$ be the space of probability measures on $(X, \mathcal{B}_X)$ equipped with the weak topology and the corresponding Borel σ-field $\mathcal{B}_{\mathcal{P}}$. Let $\mathcal{M}$ be the class of all probability measures on $(\mathcal{P}, \mathcal{B}_{\mathcal{P}})$. We call the elements $\mu \in \mathcal{M}$ nonparametric priors.
An element of $\mathcal{M}$ is called a Dirichlet Process distribution $\mathcal{D}(\alpha)$ with base measure $\alpha$ if, for every finite measurable partition $B_1, \dots, B_m$ of $X$, the vector $(P(B_1), \dots, P(B_m))$ has a Dirichlet distribution with parameters $(\alpha(B_1), \dots, \alpha(B_m))$, where $\alpha(\cdot)$ is a finite positive Borel measure on $X$. Consider the partition $B_1 = A$ and $B_2 = A^c = X \setminus A$ for some measurable set $A \subseteq X$; then, if $P \sim \mathcal{D}(\alpha)$, from the definition of the DP we have that $(P(A), P(A^c)) \sim \mathrm{Dir}(\alpha(A), \alpha(X) - \alpha(A))$, which is a Beta distribution. From the moments of the Beta distribution, we can thus derive that:

$$\mathcal{E}[P(A)] = \frac{\alpha(A)}{\alpha(X)}, \qquad \mathcal{E}\big[(P(A) - \mathcal{E}[P(A)])^2\big] = \frac{\alpha(A)\,(\alpha(X) - \alpha(A))}{\alpha(X)^2\,(\alpha(X) + 1)}, \tag{1}$$

where we have used the calligraphic letter $\mathcal{E}$ to denote expectation with respect to the Dirichlet process. This shows that the normalized measure $\alpha(\cdot)/\alpha(X)$ of the DP reflects the prior expectation of $P$, while the scaling parameter $\alpha(X)$ controls how much $P$ is allowed to deviate from its mean $\alpha(\cdot)/\alpha(X)$. Let $s = \alpha(X)$ stand for the total mass of $\alpha(\cdot)$ and $\alpha^*(\cdot) = \alpha(\cdot)/s$ stand for the probability measure obtained by normalizing $\alpha(\cdot)$. If $P \sim \mathcal{D}(\alpha)$, we shall also describe this by saying $P \sim Dp(s, \alpha^*)$ or, if $X = \mathbb{R}$, $P \sim Dp(s, G_0)$, where $G_0$ stands for the cumulative distribution function of $\alpha^*$. Let $P \sim Dp(s, \alpha^*)$ and $f$ be a real-valued bounded function defined on $(X, \mathcal{B}_X)$. Then the expectation with respect to the Dirichlet Process of $E[f] = \int f\,dP$ is

$$\mathcal{E}\big[E[f]\big] = \mathcal{E}\left[\int f\,dP\right] = \int f\,d\alpha^*. \tag{2}$$

One of the most remarkable properties of the DP priors is that the posterior distribution of $P$ is again a DP. Let $X_1, \dots, X_n$ be an independent and identically distributed sample from $P$ and $P \sim Dp(s, \alpha^*)$; then the posterior distribution of $P$ given the observations, denoted as $P_{X|X^n}$, is

$$P_{X|X^n} \sim Dp\left(s + n,\; \frac{s}{s+n}\,\alpha^* + \frac{1}{s+n}\sum_{i=1}^{n} \delta_{X_i}\right), \tag{3}$$

where $\delta_{X_i}$ is an atomic probability measure centered at $X_i$ and $X^n = \{X_1, \dots, X_n\}$. This means that the Dirichlet Process satisfies a property of conjugacy, in the sense that the posterior for $P$ is again a Dirichlet Process with updated unnormalized base measure $\alpha + \sum_{i=1}^{n} \delta_{X_i}$. From Equations (1)-(3), we can easily derive the posterior mean and variance of $P(A)$ and, respectively, the posterior expectation of $f$. Hereafter we list some useful properties of the DP that will be used in the sequel (see Chapter 3 in [17]).
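As an illustration of the conjugacy property, the following minimal Python sketch computes the Beta distribution of $P(A)$ for a single measurable set $A$ before and after observing data. It relies only on the fact that, under $Dp(s, \alpha^*)$, $P(A)$ is Beta distributed with parameters $s\alpha^*(A)$ and $s(1 - \alpha^*(A))$, and that each observation adds one unit of mass to the cell it falls in. The function names and the numerical values are hypothetical and chosen only for illustration.

```python
import numpy as np

def beta_params_for_set(s, alpha_star_A, n=0, n_A=0):
    """Beta parameters of P(A) under Dp(s, alpha*) after n observations,
    n_A of which fall in A (n = 0 gives the prior)."""
    a = s * alpha_star_A + n_A
    b = s * (1.0 - alpha_star_A) + (n - n_A)
    return a, b

def mean_of_set_probability(s, alpha_star_A, n=0, n_A=0):
    """Mean of P(A): (s * alpha*(A) + n_A) / (s + n)."""
    a, b = beta_params_for_set(s, alpha_star_A, n, n_A)
    return a / (a + b)

# Illustrative values (assumptions): s = 0.5, alpha*(A) = 0.3, 4 of 10 points in A.
rng = np.random.default_rng(0)
a, b = beta_params_for_set(s=0.5, alpha_star_A=0.3, n=10, n_A=4)
draws = rng.beta(a, b, size=20000)
print(mean_of_set_probability(0.5, 0.3, 10, 4), draws.mean())
```

With a small prior strength $s$, the posterior mean is dominated by the empirical frequency $n_A/n$, which is the behaviour the prior (near-)ignorance models discussed next build on.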
An issue in the use of the DP as prior measure on P is how to choose the infinite-dimensional parameter G 0 in case of lack of prior information. There are two avenues that we can follow. The first assumes that prior ignorance can be modelled satisfactorily by a so-called noninformative prior. In the DP setting, the only noninformative prior that has been proposed so far is the limiting DP obtained for s → 0, which has been introduced by [16] and discussed by [18]. The second approach suggests that lack of prior information should be expressed in terms of a set of probability distributions. This approach, known as Imprecise Probability [19][20][21], is connected to Bayesian robustness [22][23][24] and it has been extensively applied to model prior (near-)ignorance in parametric models. In this paper, we implement a prior (near-)ignorance model by considering a set of DPs obtained by fixing s to a strictly positive value and letting G 0 span the set of all distributions. This model has been introduced in [4] with the name of Imprecise Dirichlet Process (IDP).

Bayesian Independence Tests
Let us denote by $X$ the vector of variables $[Y, Z]^T$, so that the $n$ observations of $X$ can be rewritten as

$$X^n = \{[Y_1, Z_1]^T, \dots, [Y_n, Z_n]^T\},$$

that is, a set of $n$ vector-valued i.i.d. observations of $X$. We also consider an auxiliary variable $X'$ together with $X$. We assume that $X, X'$ are independent variables from the same unknown distribution and that $X'^n = X^n$, that is, we have the same observations of $X$ and $X'$.
Let $P$ be the unknown distribution of $X, X'$ and assume that the prior distribution of $P$ is $Dp(s, \alpha^*)$. Our goal is to compute the posterior of $P$. The posterior of $P$ is given in (3) and, by exploiting (6), we know that

$$P_{X|X^n} = \omega_0 P + \sum_{i=1}^{n} \omega_i \delta_{X_i}, \tag{8}$$

with $(\omega_0, \omega_1, \dots, \omega_n) \sim \mathrm{Dir}(s, 1, \dots, 1)$ and $P \sim Dp(s, \alpha^*)$. The distribution $P_{X'|X^n}$ of $X'$ is similarly defined. The questions we pose in a statistical analysis can all be answered by querying this posterior distribution in different ways. We adopt this posterior distribution to devise Bayesian counterparts of the independence hypothesis tests.
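The weights $(\omega_0, \omega_1, \dots, \omega_n) \sim \mathrm{Dir}(s, 1, \dots, 1)$ are the only random quantities that need to be sampled on the data side of the posterior. A minimal Python sketch of this sampling step is given below; the function name `sample_posterior_weights` and the chosen values of `n`, `s` and `n_mc` are illustrative assumptions, not part of the original method description.

```python
import numpy as np

def sample_posterior_weights(n, s, n_mc, seed=0):
    """Draw n_mc weight vectors (w_0, w_1, ..., w_n) ~ Dir(s, 1, ..., 1).

    Column 0 is the weight of the prior component; columns 1..n are the
    weights attached to the n observations.
    """
    rng = np.random.default_rng(seed)
    alpha = np.concatenate(([s], np.ones(n)))
    return rng.dirichlet(alpha, size=n_mc)

# Illustrative values (assumptions): 10 observations, prior strength s = 0.5.
w = sample_posterior_weights(n=10, s=0.5, n_mc=20000)
# The average prior weight should be close to s / (s + n).
print(w[:, 0].mean(), 0.5 / 10.5)
```

The prior component receives on average a fraction $s/(s+n)$ of the total mass, so its influence vanishes as the number of observations grows.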

Bayesian Bivariate Independence Test for Binary Variables
Let us assume that the variables $Y, Z \in \{0, 1\}$ (that is, they are binary). Our aim is to devise a Bayesian independence test for binary variables based on the DP. We will also show that our test is a Bayesian generalisation of the frequentist $\chi^2$-test for independence applied to binary variables. We start by defining the following quantities: where we have exploited the independence of $X, X'$ and here $F(X|X^n)$ denotes the posterior cumulative distribution of $P_{X|X^n}$ defined in (8). From (8), it can easily be verified that where and where in the last equality we have exploited the fact that $X'$ has the same distribution as $X$ and also the same observations. The two quantities $\omega_{00}$, $\omega_{11}$ include two terms. The first is the term due to the prior $dF \sim Dp(s, \alpha^*)$ and the second term is due to the observations. Similarly, we compute where and Summing up, $\omega_{00}, \omega_{10}, \omega_{01}, \omega_{11}$ represent the posterior probabilities of the events (0, 0) (that is, $Y = 0$ and $Z = 0$), (1, 0), (0, 1) and (1, 1), respectively, according to the posterior joint distribution $F(X|X^n)$.
Proof. We just derive the third statement. The other two statements are analogous. We first consider the indicator functions and same for the auxiliary variables $Y'$, $Z'$. By computing the expectation of these functions, we can obtain the marginals of the variables $Y, Z$ with respect to the joint $P_X$: where $\omega_{0\bullet}$ (resp. $\omega_{1\bullet}$) denotes the marginal with respect to $Z$ when $Y = 0$ (resp. $Y = 1$), while $\omega_{\bullet 0}$ (resp. $\omega_{\bullet 1}$) denotes the marginal with respect to $Y$ when $Z = 0$ (resp. $Z = 1$). Then, by exploiting independence between $X$ and $X'$, we derive

We are now ready to define the independence test. If the two variables $Y, Z$ are independent, then the vector $v = (\omega_{00}, \omega_{10}, \omega_{01},$ and thus is a well-defined quantity with respect to our probabilistic model (similarly for the other terms). Therefore, the independence test reduces to checking whether the $100(1 - \gamma)\%$ highest density credible region (HCR) of $v$ includes the zero vector. It can be easily verified that In fact, we have for $i, j \in \{0, 1\}$ and $\bar{i} = 1 - i$, $\bar{j} = 1 - j$, and so Therefore, it is enough to check whether If this is the case, then we can declare that the two variables are dependent with probability $(1 - \gamma)$. Here, the multiplier 2 in $2(\omega_{00}\omega_{11} - \omega_{01}\omega_{10})$ is only a scaling factor so that $2(\omega_{00}\omega_{11} - \omega_{01}\omega_{10})$ varies in $[-0.5, 0.5]$.
From the proof of Theorem 1, the similarity of the test with the frequentist $\chi^2$-test for independence is evident. Both tests use the difference between the joint and the product of the marginals as a measure of dependence. The advantage of the Bayesian approach is that we compute posterior probabilities for the hypothesis in which we are interested and not the probability of getting the observed (or a larger) difference under the assumption that the null hypothesis of independence is true.
The probabilities computed in Theorem 1 depend on the prior information F ∼ Dp(s, α * ). In this paper we adopt IDP as prior model. We can then perform a Bayesian nonparametric test that is based on extremely weak prior assumptions and is easy to elicit, since it requires only the choice of the strength s of the DP instead of its infinite-dimensional parameter α * . The infinite-dimensional parameter α * is free to vary in the set of all distributions.
Let us consider for instance (13). Each one of these priors gives a posterior probability $\mathcal{P}(2(\omega_{00}\omega_{11} - \omega_{01}\omega_{10}) > 0 \mid X^n)$. We can characterize this set of posteriors by computing the lower and upper bounds $\underline{\mathcal{P}}(2(\omega_{00}\omega_{11} - \omega_{01}\omega_{10}) > 0 \mid X^n)$ and $\overline{\mathcal{P}}(2(\omega_{00}\omega_{11} - \omega_{01}\omega_{10}) > 0 \mid X^n)$. Inferences with IDP can be computed by verifying if

$$\underline{\mathcal{P}}(2(\omega_{00}\omega_{11} - \omega_{01}\omega_{10}) > 0 \mid X^n) > 1 - \gamma, \qquad \overline{\mathcal{P}}(2(\omega_{00}\omega_{11} - \omega_{01}\omega_{10}) > 0 \mid X^n) > 1 - \gamma,$$

and then by taking the following decisions:
1. if both the inequalities are satisfied, then we declare that the two variables are dependent with probability larger than $1 - \gamma$;
2. if only one of the inequalities is satisfied (which necessarily has to be the one for the upper probability), we are in an indeterminate situation, that is, we cannot decide;
3. if neither is satisfied, then we declare that the probability that the two variables are dependent is lower than the desired probability of $1 - \gamma$.
When IDP returns an indeterminate decision, it means that the evidence from the observations is not enough to declare that the probability of the hypothesis being true is either larger or smaller than the desired value 1 − γ; more observations are necessary to reach a reliable decision.
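The three-way decision rule can be written as a small function. The sketch below assumes that the lower and upper posterior probabilities have already been computed; the function name `idp_decision` and the returned labels are hypothetical and only serve the illustration.

```python
def idp_decision(p_lower, p_upper, gamma=0.05):
    """Three-way IDP decision from the lower and upper posterior probabilities.

    p_lower and p_upper are the lower and upper probabilities that the
    dependence statistic is greater than zero; gamma sets the level 1 - gamma.
    """
    if not 0.0 <= p_lower <= p_upper <= 1.0:
        raise ValueError("expected 0 <= p_lower <= p_upper <= 1")
    level = 1.0 - gamma
    if p_lower > level:
        # Both inequalities hold, because p_upper >= p_lower.
        return "dependent"
    if p_upper > level:
        # Only the upper inequality holds: the decision is prior-dependent.
        return "indeterminate"
    return "not dependent at this level"

# Illustrative inputs (assumptions, not results from the experiments).
print(idp_decision(0.97, 0.99))
print(idp_decision(0.80, 0.97))
print(idp_decision(0.60, 0.70))
```

Because the lower probability can never exceed the upper one, checking the lower bound first is enough to cover the case in which both inequalities are satisfied.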
3. compute the histogram of the elements in V (this gives us the plot of the posterior of 2(ω 00 ω 11 − ω 01 ω 10 ));
4. compute the posterior upper probability that 2(ω 00 ω 11 − ω 01 ω 10 ) is greater than zero as the fraction of the elements of V that are greater than zero.

The number of Monte Carlo samples N mc is equal to 100 thousand in the next examples and figures.
The lower and upper HDI intervals in Theorem 1 can also be obtained as in Theorem 2 and computed via Monte Carlo sampling (the HDI can be computed using the values stored in V, see pseudo-code). Hereafter we will denote the two intervals corresponding to the lower and upper distributions as $\underline{HDI}(2(\omega_{00}\omega_{11} - \omega_{01}\omega_{10}))$ and $\overline{HDI}(2(\omega_{00}\omega_{11} - \omega_{01}\omega_{10}))$, respectively.
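A Monte Carlo computation of this kind can be sketched in Python as follows. Since $Y, Z$ are binary, a DP posterior restricted to the four cells reduces to a Dirichlet distribution with parameters $s\alpha^*(\text{cell}) + \text{count}$, which is what the code samples. The two priors used at the end (prior mass split equally over the discordant cells, or over the concordant cells) are illustrative assumptions meant to show how the result moves with $\alpha^*$; they are not claimed to be the extremal priors of Theorem 2. All function names are hypothetical.

```python
import numpy as np

CELLS = [(0, 0), (1, 0), (0, 1), (1, 1)]  # order of (w00, w10, w01, w11)

def cell_counts(y, z):
    """Number of observations (Y, Z) falling in each of the four cells."""
    y, z = np.asarray(y), np.asarray(z)
    return np.array([np.sum((y == a) & (z == b)) for a, b in CELLS], dtype=float)

def sample_cell_weights(counts, s, alpha_star, n_mc, rng):
    """Draw (w00, w10, w01, w11) ~ Dir(s * alpha_star + counts).

    Normalized Gamma draws are used so that cells with zero parameter
    receive exactly zero weight.
    """
    shape = s * np.asarray(alpha_star, dtype=float) + counts
    positive = shape > 0
    g = np.zeros((n_mc, 4))
    g[:, positive] = rng.gamma(shape[positive], size=(n_mc, int(positive.sum())))
    return g / g.sum(axis=1, keepdims=True)

def dependence_statistic(w):
    """2 * (w00 * w11 - w01 * w10), which lies in [-0.5, 0.5]."""
    return 2.0 * (w[:, 0] * w[:, 3] - w[:, 2] * w[:, 1])

def prob_concordant(y, z, s, alpha_star, n_mc=20000, seed=0):
    """Monte Carlo estimate of the posterior probability that the statistic is > 0."""
    rng = np.random.default_rng(seed)
    w = sample_cell_weights(cell_counts(y, z), s, alpha_star, n_mc, rng)
    v = dependence_statistic(w)
    return float(np.mean(v > 0)), v

# Ten fully concordant observations (illustrative data).
y = [0] * 5 + [1] * 5
z = [0] * 5 + [1] * 5
p_disc, v_disc = prob_concordant(y, z, s=0.5, alpha_star=[0.0, 0.5, 0.5, 0.0])
p_conc, v_conc = prob_concordant(y, z, s=0.5, alpha_star=[0.5, 0.0, 0.0, 0.5])
print(p_disc, p_conc)
```

The array `v` plays the role of the vector V of Monte Carlo samples: its histogram approximates the posterior of the statistic and the fraction of positive entries approximates the posterior probability of concordance under the chosen prior.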
The only prior parameter that must be selected with IDP is the prior strength s. The value of s determines how quickly the posteriors corresponding to the lower and upper probabilities converge as the number of observations increases. We select s = 0.5; this means that we need at least 4 concordant binary observations to take a decision with 1 − γ = 0.95. In other words, for s = 0.5 we need two observations of type Y = 0, Z = 0 and two of type Y = 1, Z = 1 to guarantee that both 95% HDI intervals, i.e., $\underline{HDI}(2(\omega_{00}\omega_{11} - \omega_{01}\omega_{10}))$ and $\overline{HDI}(2(\omega_{00}\omega_{11} - \omega_{01}\omega_{10}))$, do not include zero. For any number (and configuration) of observations less than four, the test is always indeterminate (i.e., no decision can be taken). Thus, four is the minimum number of observations that is required to take a decision. This choice is arbitrary and subjective, but it is our measure of cautiousness. We make the meaning of determinate and indeterminate clearer in the following example.
Example 1. Let us consider the following three matrices of 10 paired binary i.i.d. observations They correspond to different degrees of dependence. Figure 1 shows the lower and upper distributions of $2(\omega_{00}\omega_{11} - \omega_{01}\omega_{10})$ and the relative 95% HDI, i.e., $\underline{HDI}(2(\omega_{00}\omega_{11} - \omega_{01}\omega_{10}))$ and $\overline{HDI}(2(\omega_{00}\omega_{11} - \omega_{01}\omega_{10}))$, for the three cases a, b, c (the filled-in areas). In case (a), the two variables are dependent (concordant) with probability greater than 0.95, since all the mass of the lower and upper distributions is in the interval [0, 0.5]. In the second case, we are in an indeterminate situation, that is, the lower and upper are in disagreement, which means that the inference is prior-dependent. In the third case, we can only say that they are not dependent at 95%, since both the HDI intervals include zero.

Bayesian Bivariate Independence Test for Continuous Variables
Let us assume that the variables $Y, Z \in \mathbb{R}$, that is, they are real continuous variables. Our aim is to devise a Bayesian independence test for continuous variables based on the DP. We will also show that our test is a Bayesian generalisation of Kendall's τ test for independence. This test uses results from [25], which derived a Bayesian Kendall's τ statistic using the DP. As before, we introduce auxiliary variables $Y'$, $Z'$. We start by defining the following quantities: $T_1$ and $T_2$ are concordance measures. We can then compute where we have exploited the independence of $X, X'$ and here $F(X|X^n)$ denotes the posterior cumulative distribution of $P_{X|X^n}$. This quantity is equal to where we have exploited the fact that $X'$ has the same distribution as $X$ and the same observations. Given $(\omega_0, \dots, \omega_n)$, it can be seen that the first two terms depend on the prior distribution $F \sim Dp(s, \alpha^*)$ and the last term is only due to the observations.

Theorem 3. The variables $Y$ and $Z$ are said to be concordant (dependent) with posterior probability $(1 - \gamma)$ and they are said to be discordant provided that where $\mathcal{P}$ is the probability computed with respect to $(\omega_0, \omega_1, \dots, \omega_n) \sim \mathrm{Dir}(s, 1, \dots, 1)$ and $dF \sim Dp(s, \alpha^*)$.
Finally, they are said to be simply dependent with posterior probability $(1 - \gamma)$ provided that where HDI denotes the posterior Highest Density Interval of $E[I_{T_1} - I_{T_2}]/2$. The divisor 2 in $E[I_{T_1} - I_{T_2}]/2$ is only a scaling factor so that the expectation lies in $[-0.5, 0.5]$. The theorem simply follows from the fact that $E[I_{T_1} - I_{T_2}]$ is the same measure of dependence used in Kendall's τ test. In this respect, it is worth highlighting the connection with Kendall's τ. By exploiting the properties of the DP, we have that the posterior mean $\mathcal{E}(E[I_{T_1} - I_{T_2}])$ for large $n$ is approximately equal to Kendall's sample τ coefficient. In fact, Kendall's sample τ coefficient is defined as: with Observe that $T$ can also be rewritten as: in terms of all the $A_{ij}$ pairs, which is proportional to (30) for large $n$. This clarifies the connection between our Bayesian test of dependence for continuous variables based on $E[I_{T_1} - I_{T_2}]/2$ and Kendall's τ test.
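The connection with Kendall's τ can be checked numerically. The Python sketch below computes the sample τ from all ordered pairs and, separately, draws Monte Carlo samples of the data-only part of the concordance statistic in the limit $s \to 0$, where the prior weight $\omega_0$ vanishes and the observation weights follow $\mathrm{Dir}(1, \dots, 1)$. It assumes that $I_{T_1}$ and $I_{T_2}$ indicate concordant and discordant pairs, respectively; the function names are hypothetical and the IDP prior terms are deliberately left out.

```python
import numpy as np

def concordance_matrix(y, z):
    """A[i, j] = +1 for a concordant pair, -1 for a discordant pair, 0 for ties."""
    y, z = np.asarray(y, dtype=float), np.asarray(z, dtype=float)
    dy = np.sign(y[:, None] - y[None, :])
    dz = np.sign(z[:, None] - z[None, :])
    return dy * dz

def kendall_tau(y, z):
    """Sample tau (tau-a): (concordant - discordant) / (n * (n - 1) / 2)."""
    a = concordance_matrix(y, z)
    n = a.shape[0]
    return a.sum() / (n * (n - 1))  # each unordered pair is counted twice

def posterior_concordance(y, z, n_mc=5000, seed=0):
    """Samples of sum_ij w_i w_j A_ij / 2 with w ~ Dir(1, ..., 1) (s -> 0 limit)."""
    rng = np.random.default_rng(seed)
    a = concordance_matrix(y, z)
    w = rng.dirichlet(np.ones(a.shape[0]), size=n_mc)
    return 0.5 * np.einsum("ki,ij,kj->k", w, a, w)

# Perfectly concordant illustrative data.
y = np.arange(10.0)
z = y.copy()
samples = posterior_concordance(y, z)
print(kendall_tau(y, z), 2.0 * samples.mean())
```

For $n$ observations without ties, the mean of the sampled statistic (before halving) equals $\tau\,(n-1)/(n+1)$, which approaches τ as $n$ grows.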
As for the dependence test for binary variables, we will make inferences using IDP. Inferences with IDP can be computed by verifying if

Theorem 4. The upper probability P (E

Proof. We have that We want to maximize $I_{T_1}(X, X')$. Since $I_{T_1}(X, X')\,\delta_{X_{a_0}}(X)\,\delta_{X_{a_0}}(X')\,dX\,dX' = 0$, we need at least two Dirac deltas. Hence, we consider the mixture $dF = m\delta_{X_a}$ and so we have maximized the second term. For the first term depending on $m(1 - m)$, the maximum is obtained at $m = 1/2$. For the lower probability, the proof is similar.
The lower and upper HDI intervals can also be obtained as in Theorem 4. Again in this case, the value of s determines how quickly the lower and upper posteriors converge as the number of observations increases. We choose s = 0.5 as for the binary test.
They correspond to different degrees of dependence. Figure 2 shows the lower and upper posteriors for the three cases a, b, c and the relative HDI intervals at 95% probability (the filled-in areas). In case (a), the two variables are dependent (concordant) with probability greater than 0.95, since all the mass of the lower and upper distributions is in the interval [0, 0.5]. In the second case, we are in an indeterminate situation, that is, the lower and upper are in disagreement, which means that the inference is prior-dependent. In the third case, we can only say that they are not dependent at 95%, since both the HDI intervals include zero.

Bayesian Bivariate Independence Test for Mixed Continuous-Binary Variables
Let us assume that the variables $Y \in \mathbb{R}$ and $Z \in \{0, 1\}$. Our aim is to devise a Bayesian independence test based on the DP. We introduce the auxiliary variable $X'$ as done before. To derive our test, we start by defining the following indicator: This indicator is one if $X = (Y, 0)$ and $X' = (Y', 1)$, with $Y > Y'$, and zero otherwise. We can compute where we have exploited the independence of $X, X'$. $F(X|X^n)$ denotes the posterior cumulative distribution of $P_{X|X^n}$. This quantity is equal to For large $n$, we have that which is equal to the rank of $Y$ in the observations $(Y, 0)$ with respect to the observations $(Y, 1)$. Therefore, our dependence test is rank-based. It is clear that, in the case of independence of the variables $Y$ and $Z$, the mean rank is equal to 0.125. Hence, we can formulate the independence test for mixed variables.
Theorem 5. The variables Y and Z are dependent with posterior probability (1 − γ) provided that where HDI denotes the posterior Highest Density Interval of The theorem follows from the fact that, in case of independence between the variables Y and Z, the mean rank (36) scaled by 4 and shifted by −0.5 is equal to 0. Also in this case, we make inferences using IDP.
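The rank-based statistic can be sketched in Python for the limit $s \to 0$, in which only the observations contribute and the weights follow $\mathrm{Dir}(1, \dots, 1)$. The code evaluates the weighted frequency of pairs with $Z = 0$, $Z' = 1$ and $Y > Y'$, then scales it by 4 and shifts it by −0.5 as described above. The function name and the toy data are hypothetical, and the prior-dependent IDP terms are omitted.

```python
import numpy as np

def mixed_statistic_samples(y, z, n_mc=5000, seed=0):
    """Samples of 4 * sum_ij w_i w_j I(Z_i = 0, Z_j = 1, Y_i > Y_j) - 0.5,
    with w ~ Dir(1, ..., 1), i.e. the s -> 0 limit without prior terms."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z)
    rng = np.random.default_rng(seed)
    # ind[i, j] = 1 when observation i has Z = 0, j has Z = 1 and Y_i > Y_j.
    ind = ((z[:, None] == 0) & (z[None, :] == 1)
           & (y[:, None] > y[None, :])).astype(float)
    w = rng.dirichlet(np.ones(len(y)), size=n_mc)
    e = np.einsum("ki,ij,kj->k", w, ind, w)
    return 4.0 * e - 0.5

# Illustrative data: every Y observed with Z = 0 exceeds every Y observed with Z = 1.
z = np.array([0] * 5 + [1] * 5)
y_high_when_zero = np.concatenate((np.arange(10.0, 15.0), np.arange(0.0, 5.0)))
y_low_when_zero = np.concatenate((np.arange(0.0, 5.0), np.arange(10.0, 15.0)))
pos = mixed_statistic_samples(y_high_when_zero, z)
neg = mixed_statistic_samples(y_low_when_zero, z)
print(pos.mean(), neg.mean())
```

The scaled statistic lies in $[-0.5, 0.5]$, so it can be read on the same scale as the binary and continuous statistics.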

Proof. Consider the quantity
By computing the derivative . The result is obtained by exploiting the fact that m ∈ [0, 1]. For the lower probability, the computation is straightforward.
The lower and upper HDI intervals can also be obtained as in Theorem 4. We choose s = 0.5 as for the previous tests.
Example 3. We consider three matrices of 10 paired binary-continuous i.i.d. observations Again, they correspond to different degrees of dependence. Figure 3 shows the lower and upper posteriors for the three cases a, b, c and the relative HDI intervals at 95% probability (the filled-in areas). In case (a), the two variables are dependent (concordant) with probability greater than 0.95, since all the mass of the lower and upper distributions is in the interval [0, 0.5]. In the second case, we are in an indeterminate situation, that is, the lower and upper are in disagreement. In the third case, we can only say that they are not dependent at 95%, since both the HDI intervals include zero.

Experiments
We compare our Bayesian testing approach in the three main scenarios discussed above, where both variables are binary, both are continuous, and one is binary and the other is continuous. The goal is to decide whether the two variables are dependent or independent. We generate n samples (n = 20 and 50) using the distributions defined in Table 1. Ten thousand repetitions are used by forcing the variables to be independent (so β = 0) and ten thousand repetitions where the variables are dependent, for each value of β > 0. The value of β is varied as explained in the table. For each n, β and each of these twenty thousand samples (for which we know the correct result of the test), we run the new approach versus the $\chi^2$ test, Kendall's τ test and the Kolmogorov-Smirnov test, respectively for the binary-binary, continuous-continuous and binary-continuous cases. For each run of each method, we record their p-values, while for the new approach we compute the value of γ corresponding to the limiting 1 − γ credible region at which the decision changes between dependent and independent. Such a value is related to the p-values of the other tests and can be used for decision making by comparing it against a threshold (just as is done with the p-values). However, it should be observed that thresholds different from 0.05 or 0.01 are hardly used in practice in null hypothesis significance tests. Conversely, for a Bayesian test 1 − γ is a probability and, therefore, we can take decisions with probability 0.99 or 0.95, but also 0.7 or even 0.51, depending on the application (and the loss function). However, instead of fixing a threshold (which is a subjective choice) to decide between the options dependent and non-dependent with probability 1 − γ, we use Receiver Operating Characteristic (ROC) curves. ROC curves give the quality of the approaches for all possible thresholds. The curves are calculated as usual by varying the threshold from 0 to 1 and computing the sensitivity (or true positive rate) and specificity (or one minus the false positive rate); this is slightly different from the common approach of drawing ROC curves as a function of the true positive rate and false positive rate [26][27][28]. ROC curves are always computed considering different degrees of dependence (different values of β > 0) against independence (β = 0). We apply the same criterion to p-values for comparing the methods across a wider range of decision criteria. We have used the R package "pROC" to compute the ROC curves [29].

Figures 4-6 present the comparison of the new approach (which we name IBinary, ICont or IMixed to explicitly account for the types of variables being analyzed) using s ≈ 0 against the appropriate competitor. With such a choice of s, the new approach runs without indeterminacy and can be directly compared against the usual methods. As we see in the figures, the new method performs very similarly to each competitor, with the advantage of being compatible among different types of data (the p-values of the other methods, among different data types, cannot be compared to each other). This is useful when one works with multivariate models involving multiple data types. As expected, the quality of the methods increases with the increase of β and of the sample size.
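The threshold sweep behind a ROC curve can be illustrated with a short Python sketch. It is not the pROC implementation used for the reported results; it only shows the computation of sensitivity and specificity when a sample is declared dependent whenever its score (a p-value or the limiting γ, both assumed to lie in [0, 1]) does not exceed the threshold. The function names and the toy scores are hypothetical.

```python
import numpy as np

def roc_curve_points(scores, is_dependent):
    """Sensitivity and specificity for every threshold in [0, 1].

    A sample is declared dependent when its score is <= the threshold.
    """
    scores = np.asarray(scores, dtype=float)
    is_dependent = np.asarray(is_dependent, dtype=bool)
    thresholds = np.unique(np.concatenate(([0.0, 1.0], scores)))
    sens = np.array([np.mean(scores[is_dependent] <= t) for t in thresholds])
    spec = np.array([np.mean(scores[~is_dependent] > t) for t in thresholds])
    return thresholds, sens, spec

def area_under_curve(sens, spec):
    """Trapezoidal area under sensitivity as a function of 1 - specificity."""
    fpr = 1.0 - spec
    return float(np.sum(np.diff(fpr) * (sens[1:] + sens[:-1]) / 2.0))

# Toy scores (assumptions): two dependent samples with small scores,
# two independent samples with large scores.
scores = [0.01, 0.02, 0.5, 0.9]
labels = [True, True, False, False]
thresholds, sens, spec = roc_curve_points(scores, labels)
print(area_under_curve(sens, spec))
```

An area close to 1 indicates that small scores separate dependent from independent samples well, while an area close to 0.5 corresponds to random guessing.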
Figures 7-9 present the ROC curves for the methods $\chi^2$, Kendall's τ and Kolmogorov-Smirnov, respectively. These curves are separated according to whether the instance is considered determinate or indeterminate by the new approach. In other words, for each one of the twenty thousand repetitions, we run the corresponding usual test and then we check whether the output of the new approach is determinate or indeterminate (applying s = 0.5), and we split the instances accordingly (blue curves show the accuracy over instances that are considered easy (determinate cases), while green curves show the accuracy over instances that are hard (indeterminate cases); we also present the overall accuracy of the method using red curves). As we see, such a division is able to identify easy-to-classify and hard-to-classify cases, since the ROC curves for the cases deemed indeterminate by the new approach suggest a performance not better than a random guess (green curves). This means that if we devised another test (called "50/50 when indeterminate") which returns the same response as IBinary, ICont or IMixed when they are determinate, and issues a random answer (with 50/50 chance) otherwise, then this "50/50 when indeterminate" test would have the same ROC curve as $\chi^2$, Kendall's τ and Kolmogorov-Smirnov, respectively. This suggests that the indeterminacy of IDP-based tests is additional useful information that our approach gives to the analyst. In these cases she/he knows that (i) her/his posterior decisions would depend on the choice of the prior DP measure; (ii) deciding between the two hypotheses under test is a difficult problem, as shown by the comparison with the DP with s = 0, $\chi^2$, Kendall's τ and Kolmogorov-Smirnov. Based on this additional information, the analyst can, for example, decide to collect additional measurements to eliminate the indeterminacy (in fact, when the number of observations goes to infinity the indeterminacy goes to zero).

Conclusions
We have proposed three novel Bayesian methods for performing independence tests for binary, continuous and mixed binary-continuous variables. All of these tests are nonparametric and based on the Dirichlet Process. This has allowed us to use the same prior model for all the tests we have developed. Therefore, all the tests are "consistent", in the sense that the probabilities of dependence we compute with these tests are commensurable across the tests.
We have presented two versions of these tests: one based on a noninformative prior and one based on a conservative model of prior ignorance (IDP). Experimental results show that the prior ignorance method is more reliable than both the frequentist test and the noninformative Bayesian one, being able to isolate instances in which these tests are almost guessing at random. For future work, we plan to extend this approach in two directions: (1) feature selection in classification; (2) learning the structure (graph) of Bayesian networks and Markov Random Fields. The idea is to use our dependence tests to replace the frequentist tests that are commonly used for that purpose and evaluate the gain in terms of performance. For instance, in case (1), we could then compare the accuracy of a classifier whose features are selected using our tests with that of a classifier whose features are selected by using frequentist tests. Our new approach is suitable since it addresses two limitations of currently used tests: they are based on null-hypothesis significance tests, and they cannot be applied to categorical and continuous variables at the same time in a commensurable way.

Figure 2. Three possible results of the independence hypothesis testing for continuous variables. The red and blue filled areas correspond respectively to the lower and upper HDI. (a) Dependent at 95%; (b) Indeterminate at 95%; (c) They are not dependent at 95%.

Figure 3. Three possible results of the independence hypothesis testing for pairs binary-continuous. The red and blue filled areas correspond respectively to the lower and upper HDI. (a) Dependent at 95%; (b) Indeterminate at 95%; (c) They are not dependent at 95%.

Table 1. Data generation setup. In order to generate independent data, β is set to zero. Larger values of β increase their dependency. Binary-continuous case: half of the samples have the binary variable set to zero and half to one. When that variable is zero, the continuous variable is drawn from Γ(10, 2), otherwise from Γ(10 + β, 2 + β).
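The binary-continuous generation scheme described in the caption can be sketched in Python as follows. The caption does not state whether the second parameter of Γ is a scale or a rate; the code assumes it is a scale, which is NumPy's convention, and this is an assumption rather than something confirmed by the text. The function name is hypothetical.

```python
import numpy as np

def generate_mixed_sample(n, beta, seed=0):
    """Draw n paired (Y, Z) observations for the binary-continuous case.

    Half of the samples have Z = 0 and half Z = 1. Y is Gamma(10, 2) when
    Z = 0 and Gamma(10 + beta, 2 + beta) when Z = 1, with the second
    parameter interpreted as a scale (assumption).
    """
    if n % 2 != 0:
        raise ValueError("n must be even to split the sample in two halves")
    rng = np.random.default_rng(seed)
    z = np.repeat([0, 1], n // 2)
    shape = np.where(z == 0, 10.0, 10.0 + beta)
    scale = np.where(z == 0, 2.0, 2.0 + beta)
    y = rng.gamma(shape, scale)
    return y, z

# beta = 0 gives independent data; beta > 0 makes Y depend on Z.
y_indep, z_indep = generate_mixed_sample(n=20, beta=0.0)
y_dep, z_dep = generate_mixed_sample(n=20, beta=1.0, seed=1)
print(y_indep[:3], y_dep[-3:])
```

With β = 0 both halves share the same distribution, so any apparent dependence in a generated sample is due to sampling variability alone.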

Table 4. Area under the ROC curve (AUC) values for all the performed experiments using different values of s, β and n. IMixed shows the AUC for the new test applied to one binary and one continuous variable and s ≈ 0. Kolmogorov-Smirnov (KS), Det. cases and Indet. cases show the AUC obtained by the KS test over all samples, only over samples considered determinate by IMixed (with the corresponding s) and finally only over samples considered indeterminate by IMixed.