Article

Selection Criteria for Overlapping Binary Models—A Simulation Study

by Teresa Aparicio and Inmaculada Villanúa *
Department of Economic Analysis, University of Zaragoza, Gran Vía, 2, 50005 Zaragoza, Spain
* Author to whom correspondence should be addressed.
Mathematics 2022, 10(3), 478; https://doi.org/10.3390/math10030478
Submission received: 28 December 2021 / Revised: 25 January 2022 / Accepted: 29 January 2022 / Published: 2 February 2022
(This article belongs to the Special Issue Statistical Methods in Economics)

Abstract: This paper deals with the problem of choosing the optimum criterion for selecting the best model out of a set of overlapping binary models. The criteria we studied were the well-known AIC and SBIC, and a third one called $C_2$. Special attention was paid to the setting where neither of the competing models was correctly specified. This situation has not been studied very much, but it is the most common case in empirical works. The theoretical study we carried out allowed us to conclude that, in general terms, all criteria perform well. A Monte Carlo exercise corroborated those results.

1. Introduction

This work focused on the analysis of model selection criteria within the framework of binary choice models (BCM), where the endogenous variable $Y_i$ is binary, representing the choice of decision-maker $i$ between two options quantified by the values 1 and 0. These models are usually expressed as $p_i = F(x_i'\beta)$, with $F$ the cumulative distribution function (c.d.f.), $x_i$ the vector of regressors and $p_i$ the probability that $Y_i = 1$. The c.d.f. can be normal or logistic, leading to a probit model or a logit model, respectively. Although the common analysis procedure for these models is to apply the maximum likelihood estimation (MLE) method, they can also be implemented in a Bayesian framework using Gibbs sampling Markov Chain Monte Carlo (MCMC) methods [1,2]. Nevertheless, in this work, we considered the conventional context; thus, the MLE procedure was used.
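To make the setting concrete, here is a minimal sketch (our own illustration, not taken from the paper; the data and names are hypothetical) of fitting a probit model by MLE with NumPy and SciPy:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)

# Hypothetical sample: intercept plus two uniform regressors.
N = 1000
X = np.column_stack([np.ones(N), rng.uniform(size=N), rng.uniform(size=N)])
beta_true = np.array([-2.0, 3.0, 1.0])
y = (X @ beta_true + rng.standard_normal(N) > 0).astype(float)

def neg_loglik(beta, y, X):
    """Negative binary log-likelihood; norm.cdf gives a probit, a logistic c.d.f. would give a logit."""
    p = np.clip(norm.cdf(X @ beta), 1e-10, 1 - 1e-10)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

beta_hat = minimize(neg_loglik, np.zeros(X.shape[1]), args=(y, X), method="BFGS").x
```

The same helper is reused in the later sketches, replacing `norm.cdf` with a logistic c.d.f. whenever a logit model is wanted.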
This paper compares several models in order to select the “best” of them. In a general context, and following [3], the compared models can be nested, overlapping or non-nested. In the specific framework of BCM, two binary models are nested if they possess the same c.d.f. (both probit or both logit) and the regressors of one of the models are included in the other one. Two binary models are overlapping if both possess the same c.d.f. (both probit or both logit), with some common explanatory variables and other specific variables. Finally, the compared models are non-nested if they possess only specific regressors. Moreover, the models are also non-nested when they possess different c.d.f.s (probit versus logit), even if there are some common regressors. Many works define nested and non-nested models and describe how to proceed in each situation [4,5,6], while overlapping models have been analysed the least. In this paper, we compared overlapping models, and we found either that they were equivalent or that one of them was better than the other (non-equivalent).
Although hypothesis testing procedures (HTP) are widely used to discriminate between models, we can only use them to choose between pairs of models. In comparison, selection criteria allowed us to select the best model from quite a large set. This is an important advantage in empirical econometric works. The latter approach allows researchers to express their objectives in the form of a loss function, or by using the discrepancy concept. As [7] established, the discrepancy concept is a particular case of loss function. For non-linear regression models (our framework), the procedures developed by [3,8,9] belong to the first category (HTP). The second category involves the well-known AIC [10] and SBIC [11], where the discrepancy was obtained from the Kullback–Leibler distance. Additionally, the use of the mean square error (MSE) of prediction as a discrepancy enabled us to derive another criterion, denoted as $C_2$ (see [12]).
Many works have studied the behaviour of selection procedures in linear regression models. However, this subject has been less analysed in a non-linear regression context, and the nested framework is nearly always assumed [13,14,15]. The performance of some selection procedures has also been studied in phylogenetics, where partitioned models were used [16,17,18,19]. Specific references for discrete choice models are [12] for nested models, and [20] for non-nested models.
In this paper, the competing models we selected from were overlapping models. The purpose was to investigate the discriminatory power of certain model selection criteria assuming two situations: (i) at least one of the models was correctly specified; (ii) neither of the models was correctly specified. According to [21], a well-specified model can include irrelevant variables together with the set of regressors of the data generating process (DGP). In our opinion, situation (ii) is the most interesting in practice but the least studied in the literature. Given that, in this case, no model was well-specified, we could not consider consistency as the condition that makes a given selection criterion adequate. The requirement we proposed is that the criterion selects the closest model to the DGP.
The article is organised as follows. In Section 2, we establish the general context and the methodology. Section 3 is dedicated to studying the theoretical behaviour of the criteria. Section 4 presents and discusses the results from a Monte Carlo experiment. Conclusions are presented in Section 5.

2. Materials and Methods

Consider the following DGP:
$M_0: \quad p_i = F(x_i'\beta^0) = F(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i}) \qquad (i = 1, \ldots, N) \qquad (1)$
and a pair of overlapping models which, in general terms, are defined as follows:
$M_1: \; p_i = F(a_i'\gamma) \qquad M_2: \; p_i = F(b_i'\delta) \qquad (i = 1, \ldots, N) \qquad (2)$
where $F(\cdot)$ is the cumulative distribution function (c.d.f.), which can be normal or logistic, leading to the probit model or the logit model, respectively. The two competing models have the same c.d.f.; $a_i$ and $b_i$ are the $1 \times k_1$ and $1 \times k_2$ vectors of explanatory variables of M1 and M2 for the $i$-th observation, and $\gamma$ and $\delta$ are the corresponding parameter vectors. Given the definition of overlapping models, $a \not\subset b$ and $b \not\subset a$ are satisfied, and both vectors share some common variables.
In order to describe the relationship of each of the competing models with the true model (DGP) we used the Kullback–Leibler distance (KLIC) to the DGP:
$\mathrm{KLIC}(M_0, M_j) = E_0\left[ \ln \frac{f_0}{f_j} \right] \qquad (j = 1, 2) \qquad (3)$
where $f_0$ is the density function of the DGP and $f_j$ the one corresponding to model $M_j$.
From (3), we can write:
$\mathrm{KLIC}(M_0, M_1) - \mathrm{KLIC}(M_0, M_2) = E_0[\ln f(y \mid b, \delta^*)] - E_0[\ln f(y \mid a, \gamma^*)] = E_0[\ell_2^*] - E_0[\ell_1^*] \qquad (4)$
where $\gamma^*$ and $\delta^*$ are the corresponding pseudo-true parameter vectors (see [22]) and $\ell_j^*$ denotes the log-likelihood of model $M_j$ evaluated at them.
It is well known that if this statistic (expression (4)) is positive, then M2 is the preferred model, M1 being preferred if (4) is negative. If it is null, the two models are equivalent.
Given the DGP of (1), and following [21], any model which is correctly specified can be written as:
$p_i = F\left( \gamma_0 + \gamma_1 x_{1i} + \gamma_2 x_{2i} + \sum_{j=3}^{k} \gamma_j d_{ji} \right)$
where the $d_j$ are additional regressors, including the particular case in which no $d_j$ is present.
It is worth noting that, for each competing model, the maximum likelihood estimation of the parameter vectors satisfies:
$\hat{\gamma} \stackrel{p}{\longrightarrow} \gamma^* \qquad \hat{\delta} \stackrel{p}{\longrightarrow} \delta^* \qquad (5)$
We can distinguish two cases: the case where at least one of the competing models was correctly specified, and the case where neither of them was correct.

2.1. Case 1: At Least One of the Competing Models Is Correctly Specified

Then, the situations we considered are:
  • Case 1.1: Both models were well-specified (or both models included the DGP):
    $M_1: \; p_i = F\left( \gamma_0 + \gamma_1 x_{1i} + \gamma_2 x_{2i} + \sum_{j=3}^{k_1} \gamma_j d_{ji} \right) \qquad M_2: \; p_i = F(\delta_0 + \delta_1 x_{1i} + \delta_2 x_{2i} + \delta_3 z_i) \qquad (6)$
    with $z \neq d_j \; \forall j$.
  • Case 1.2: Only one of them was well-specified (or only one of them included the DGP):
$M_1: \; p_i = F\left( \gamma_0 + \gamma_1 x_{1i} + \gamma_2 x_{2i} + \sum_{j=3}^{k_1} \gamma_j d_{ji} \right) \qquad M_2: \; p_i = F(\delta_0 + \delta_1 x_{1i} + \delta_2 z_i) \qquad (7)$
Let $\beta^{0+}$ be the parameter vector extended with elements equal to zero in the places corresponding to the variables that are not included in the DGP, that is, $\beta^{0+} = (\beta^0 \mid 0)$. From the convergence result (5), and according to [23], in case 1.1 the equality $\gamma^* = \delta^* = \beta^{0+}$ holds, implying that both models are equivalent. However, in case 1.2 $\gamma^* = \beta^{0+}$ but $\delta^* \neq \beta^{0+}$, M1 being better than M2.
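To see the convergence result (5) and the case 1.1/1.2 distinction numerically, one can approximate the pseudo-true vectors by estimating each competing model on a very large sample drawn from the DGP. The sketch below is our own illustration (it reuses the `neg_loglik` helper from the earlier probit sketch; the DGP parameters and regressor distributions are hypothetical):

```python
import numpy as np
from scipy.optimize import minimize
# assumes neg_loglik from the earlier probit sketch is available

rng = np.random.default_rng(1)
N_big = 200_000                                   # large N, so the MLEs sit close to the pseudo-true values
x1, x2 = rng.uniform(size=N_big), rng.uniform(size=N_big)
d = rng.normal(3, 1, size=N_big)                  # irrelevant regressor included in M1
z = rng.chisquare(1, size=N_big)                  # regressor of M2 that replaces x2
y = (-2 + 3 * x1 + x2 + rng.standard_normal(N_big) > 0).astype(float)   # probit DGP as in (1)

A = np.column_stack([np.ones(N_big), x1, x2, d])  # M1: correctly specified (case 1.2)
B = np.column_stack([np.ones(N_big), x1, z])      # M2: misspecified
gamma_star = minimize(neg_loglik, np.zeros(4), args=(y, A), method="BFGS").x
delta_star = minimize(neg_loglik, np.zeros(3), args=(y, B), method="BFGS").x
# gamma_star should be close to (-2, 3, 1, 0), i.e. beta0 padded with a zero,
# while delta_star settles on a pseudo-true vector different from beta0+.
```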

2.2. Case 2: Neither of the Models Is Correctly Specified (or Neither of the Models Includes the DGP)

In this situation the compared models are:
$M_1: \; p_i = F(\gamma_0 + \gamma_1 x_{1i} + \gamma_2 w_i) \qquad M_2: \; p_i = F(\delta_0 + \delta_1 x_{2i} + \delta_2 w_i) \qquad (8)$
We again used the convergence result (5) to conclude that, in this case, $\gamma^* \neq \beta^{0+}$ and $\delta^* \neq \beta^{0+}$, so the competing models may or may not be equivalent. Specifically, according to [3], there are two possible situations:
  • Case 2.1: $f(y_i; \gamma^*, a_i) = f(y_i; \delta^*, b_i)$, that is, the density functions of $y_i$ in M1 and M2, evaluated at the corresponding pseudo-true parameter vectors, were observationally identical. It implies that M1 and M2 are equivalent specifications.
  • Case 2.2: $f(y_i; \gamma^*, a_i) \neq f(y_i; \delta^*, b_i)$. In this situation the models can be:
    (a) Equivalent, which means that $E_0[\ell_1^*] = E_0[\ell_2^*]$.
    (b) Non-equivalent, that is, $E_0[\ell_1^*] \neq E_0[\ell_2^*]$.
Now we present the selection criteria whose behaviour is the object of this paper: the well-known information criteria (IC) of Akaike (AIC) and Schwarz (SBIC), and another criterion we call $C_2$. To obtain them, we adopted the discrepancy concept (see [7]). As we can read in [12], “A discrepancy measures the lack of fit between the proposed model and the DGP, in the aspect which the researcher considers the most relevant”. The discrepancy for model M1 can then be written as $\Delta(F_1, F_0)$, and we wish to minimize the “overall discrepancy”, expressed as $\Delta(F_{\hat{\gamma}}, F_0)$, or equivalently $\Delta(\hat{\gamma})$, with $F_{\hat{\gamma}}$ the estimated model M1 (that is, $\hat{p}_i = F(a_i'\hat{\gamma})$). The estimation of the expected overall discrepancy, $\hat{E}_0[\Delta(\hat{\gamma})]$, constitutes the selection criterion. Details about this procedure can be found in [7,12]. We consider two discrepancies, called $\Delta_1$ and $\Delta_2$. The first one is the Kullback–Leibler distance: for a model $M_j$ it is expressed as $\Delta_1(F_j, F_0) = \mathrm{KLIC}(M_j, M_0)$, and it leads to the information criteria (IC) AIC and SBIC:
$IC(M_j) = -\frac{\hat{\ell}_j}{N} + \frac{K_N(M_j)}{N} \qquad (9)$
where $\hat{\ell}_j$ $(j = 1, 2)$ denotes the log-likelihood of model $M_j$, evaluated at the corresponding vector of estimates, $k_j$ is the number of parameters of $M_j$, and $K_N(M_j)$ is the correction factor ($k_j$ for AIC and $k_j \log N / 2$ for SBIC).
The second discrepancy is the mean square error (MSE) of prediction. For model M1, this discrepancy is $\Delta_2(F_1, F_0) = E_0\left[ (Y_{N+1} - F(a_{N+1}'\gamma))^2 \right]$, with “$N+1$” indicating an out-of-sample observation. For any model $M_j$, and following the previously mentioned procedure, the expression of the criterion is:
$C_2(M_j) = \frac{SSD_j}{N} \left( 1 + \frac{2 k_j}{N} \right) \qquad (10)$
where $SSD_j = \sum_{i=1}^{N} (Y_i - \hat{F}_{ji})^2$ is the sum of squared differences between the binary variable and the probability estimated with model $M_j$. The proof of (10) is developed in [24].
The model with the lowest value of the criterion was chosen, so different criteria could lead to different choices. Nevertheless, we were interested in analysing whether the criteria worked well, that is, whether the selection was correct in the sense defined in the following section.
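As a practical illustration (ours, not part of the original paper), the following sketch computes the three criteria for an estimated probit model, directly following expressions (9) and (10); the helper name `criteria` and the numerical clipping constant are our own choices:

```python
import numpy as np
from scipy.stats import norm

def criteria(y, X, beta_hat):
    """AIC and SBIC per expression (9), and C2 per expression (10), for an estimated probit model."""
    N, k = X.shape
    p_hat = np.clip(norm.cdf(X @ beta_hat), 1e-10, 1 - 1e-10)   # estimated probabilities F_ji
    loglik = np.sum(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat))
    aic = -loglik / N + k / N                      # correction factor K_N(Mj) = k_j
    sbic = -loglik / N + k * np.log(N) / (2 * N)   # correction factor K_N(Mj) = k_j log(N)/2
    ssd = np.sum((y - p_hat) ** 2)                 # SSD_j
    c2 = ssd / N * (1 + 2 * k / N)
    return aic, sbic, c2
```

For two competing overlapping models one would call `criteria` with each design matrix and estimate, and keep the model with the lowest value of the chosen criterion.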

3. Theoretical Results

In this section, we study the theoretical behaviour of the criteria in order to determine whether they perform well. All proofs of the results presented in this section can be seen, in detail, in [24].
We carry out an asymptotic analysis, which requires the initial assumptions, results and definitions that we state below.
Assumption 1.
The $x_i$, $a_i$ and $b_i$ regressor vectors of the models specified in (1) and (2) are non-stochastic. The variables of these vectors have sample means and variances with finite limits.
Lemma 1.
Let $y_i$ be a variable which is not i.i.d. but heterogeneous (non-identical means and non-identical variances). Then:
$N^{-1} \sum_{i=1}^{N} a(y_i, \tilde{\theta}) \stackrel{p}{\longrightarrow} E\left[ \frac{1}{N} \sum_{i=1}^{N} a(y_i, \theta_0) \right] \qquad (11)$
Proof. 
The proof of (11) is based on the law of large numbers for heterogeneous variables, together with a lemma of [25].
The law of large numbers for heterogeneous variables is expressed in the following terms [26]: “Let the sequence $\{y_i - \mu_i\}$ be independent with $E(y_i - \mu_i) = 0$. If $E|y_i - \mu_i|^{1+\delta} \leq B < \infty$ for all $i$, with $\delta > 0$, then $\frac{1}{N} \sum_{i=1}^{N} (y_i - \mu_i) \stackrel{p}{\longrightarrow} 0$.”
The lemma of [25] (p. 2156) is expressed as follows: “If $z_i$ is i.i.d., $a(z, \theta)$ is continuous at $\theta_0$ with probability one, and there is a neighbourhood $\Gamma$ of $\theta_0$ such that $E[\sup_{\theta \in \Gamma} \| a(z, \theta) \|] < \infty$, then for any $\tilde{\theta} \stackrel{p}{\longrightarrow} \theta_0$, $N^{-1} \sum_{i=1}^{N} a(z_i, \tilde{\theta}) \stackrel{p}{\longrightarrow} E[a(z, \theta_0)]$.” This lemma, together with the law of large numbers, allows us to write (11). □
Definition 1.
M1 and M2 are equivalent models if
$E_0 \left[ \log \frac{f(y_i; a_i, \gamma^*)}{f(y_i; b_i, \delta^*)} \right] = 0 \qquad (12)$
which leads to:
$F_{0i} \log \frac{F_{1i}}{F_{2i}} + (1 - F_{0i}) \log \frac{1 - F_{1i}}{1 - F_{2i}} = 0 \qquad (13)$
where $F_{0i} = F(x_i'\beta^0)$, $F_{1i} = F(a_i'\gamma^*)$ and $F_{2i} = F(b_i'\delta^*)$.
Definition 2.
M1 is closer to the DGP than M2 if:
$E_0 \left[ \log \frac{f(y_i; a_i, \gamma^*)}{f(y_i; b_i, \delta^*)} \right] > 0 \qquad (14)$
which leads to:
$F_{0i} \log \frac{F_{1i}}{F_{2i}} + (1 - F_{0i}) \log \frac{1 - F_{1i}}{1 - F_{2i}} > 0 \qquad (15)$
Definition 3.
Let $R(\cdot)$ be a model selection criterion. It is said that $R(\cdot)$ is adequate when:
(i) If M1 and M2 are equivalent, $\mathrm{plim}[R(M_1)] = \mathrm{plim}[R(M_2)]$.
(ii) If M1 is closer than M2 to the DGP, then $\mathrm{plim}[R(M_1)] < \mathrm{plim}[R(M_2)]$.
Now, for every case stated in the previous section, we must prove whether the definition of an “adequate criterion” is satisfied.
Result 1.
The IC criteria behave well in all settings.
Proof. 
The basic tool for achieving this result is the comparison of Definitions 1 and 2 with Definition 3. In this sense, Definitions 1 and 2 establish the condition that must be met when the compared models are equivalent or non-equivalent, respectively. On the other hand, Definition 3 tells us the requirements for determining if a specific criterion is adequate in each context (of equivalence or not).
Expression (9) can be written as:
$IC(M_j) = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \log \hat{F}_{ji} + (1 - y_i) \log (1 - \hat{F}_{ji}) \right] + \frac{K_N(M_j)}{N} \qquad (j = 1, 2) \qquad (16)$
with $\hat{F}_{1i} = F(a_i'\hat{\gamma})$ and $\hat{F}_{2i} = F(b_i'\hat{\delta})$.
Using Lemma 1 and the convergences given in (5) for the first term, we obtain:
$\frac{\hat{\ell}_j}{N} \stackrel{p}{\longrightarrow} \frac{1}{N} \sum_{i=1}^{N} E_0 \left[ y_i \log F_{ji} + (1 - y_i) \log (1 - F_{ji}) \right] \qquad (j = 1, 2) \qquad (17)$
The correction factor $K_N(M_j)/N$ converges to zero for every model.
When the competing models are equivalent (cases 1.1, 2.1 and 2.2.(a)), the IC criteria will be adequate if equality of Definition 3 (i) holds, which, using (17), leads to the following expression:
$\frac{1}{N} \sum_{i=1}^{N} \left[ F_{0i} \log \frac{F_{1i}}{F_{2i}} + (1 - F_{0i}) \log \frac{1 - F_{1i}}{1 - F_{2i}} \right] = 0 \qquad (18)$
This result is always satisfied, given Definition 1, so we can say that the IC criteria performed well.
When the models are non-equivalent, and assuming M1 is always better than M2 (cases 1.2 and 2.2.(b)), the IC criteria will be adequate if the inequality of Definition 3 (ii) holds, which, using (17), leads to:
$\frac{1}{N} \sum_{i=1}^{N} \left[ F_{0i} \log \frac{F_{1i}}{F_{2i}} + (1 - F_{0i}) \log \frac{1 - F_{1i}}{1 - F_{2i}} \right] > 0 \qquad (19)$
this result being identical to Definition 2. Thus, the IC criteria performed well. □
It should be noted that $K_N(M_j)/N$ converges to zero faster for the model with the lower $k_j$. Additionally, expression (19) shows that AIC and SBIC are asymptotically identical, which is not strictly true in finite samples when $k_1 \neq k_2$. In this situation, for a given pair of competing models, the difference between the two criteria is due to the different rates of convergence to zero of $AIC(M_1) - AIC(M_2)$ and $SBIC(M_1) - SBIC(M_2)$. This difference is caused by the correction factor.
Specifically, we can write:
$\frac{SBIC(M_1) - SBIC(M_2)}{AIC(M_1) - AIC(M_2)} = \frac{O\left( \frac{\log N}{N} \right)}{O\left( \frac{1}{N} \right)} = O(\log N) \qquad (20)$
which means that, as N increases, the gap between the convergence rates of the numerator and the denominator of expression (20) becomes larger. This implies that SBIC will tend toward one of the models more strongly than AIC. Which model? It is evident that, if $k_1 > k_2$, the tendency will be toward M2, given that both AIC and SBIC select the model with the lower value of the criterion. It is important to remark that $\mathrm{plim}[IC(M_1) - IC(M_2)] = 0$ does not contradict a stronger tendency toward the more parsimonious model, given that both models are equivalent.
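As a rough numerical illustration (ours, not the authors'): with $k_1 - k_2 = 1$, the penalty difference is $\log N/(2N) \approx 0.0132$ for SBIC against $1/N = 0.005$ for AIC when $N = 200$, and about $0.0019$ against $0.0005$ when $N = 2000$, so the ratio $\log N/2$ grows from roughly 2.6 to 3.8.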
Result 2.
The $C_2$ criterion is adequate, except in a specific situation.
Proof. 
Expression (10) can be written as:
$C_2(M_j) = \frac{\sum_{i=1}^{N} (y_i - \hat{F}_{ji})^2}{N} \left( 1 + \frac{2 k_j}{N} \right) \qquad (j = 1, 2) \qquad (21)$
Applying Lemma 1 together with convergences (5) we obtain:
$\frac{SSD_j}{N} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{F}_{ji})^2 \stackrel{p}{\longrightarrow} \frac{1}{N} \sum_{i=1}^{N} E_0 \left[ (y_i - F_{ji})^2 \right] \qquad (j = 1, 2) \qquad (22)$
Additionally, the term $\left( 1 + \frac{2 k_j}{N} \right)$ ($j = 1, 2$) converges to 1 when $N \to \infty$.
When the compared models are equivalent and well-specified (case 1.1), the probability limit (22) is the same for both models. This implies that Definition 3 (i) is satisfied; in other words, the $C_2$ criterion performed well. Note that the convergence rate differs between M1 and M2 when $k_1 \neq k_2$: the term $2 k_j / N$ is $O(1/N)$ and converges to zero faster for the model with the lower $k_j$.
When the compared models are non-equivalent and only M1 is well-specified (case 1.2), the probability limit (22) is different for each model:
$\frac{SSD_1}{N} \stackrel{p}{\longrightarrow} \frac{1}{N} \sum_{i=1}^{N} F_{0i} (1 - F_{0i}) = h_1 \qquad (23)$
$\frac{SSD_2}{N} \stackrel{p}{\longrightarrow} \frac{1}{N} \sum_{i=1}^{N} F_{0i} (1 - F_{0i}) + \frac{1}{N} \sum_{i=1}^{N} (F_{0i} - F_{2i})^2 = h_1 + h_2 \qquad (24)$
where $h_1$ and $h_2$ are positive terms. It is straightforward to see that Definition 3 (ii) is satisfied, so the $C_2$ criterion performed adequately.
If neither of the competing models is correctly specified (case 2), the probability limits of (22) for each model can be written as:
$\frac{SSD_1}{N} \stackrel{p}{\longrightarrow} \frac{1}{N} \sum_{i=1}^{N} F_{0i} (1 - F_{0i}) + \frac{1}{N} \sum_{i=1}^{N} (F_{0i} - F_{1i})^2 = h_1 + h_3 \qquad (25)$
$\frac{SSD_2}{N} \stackrel{p}{\longrightarrow} \frac{1}{N} \sum_{i=1}^{N} F_{0i} (1 - F_{0i}) + \frac{1}{N} \sum_{i=1}^{N} (F_{0i} - F_{2i})^2 = h_1 + h_2 \qquad (26)$
with $h_i$ ($i = 1, 2, 3$) being positive constants.
Now, the final conclusions depend on the relationship between the density functions. Then, in Case 2.1, where the density functions were observationally identical (equivalent models), $F_{1i} = F_{2i}$ is satisfied. It implies that $h_3 = h_2$, so Definition 3 is verified, and the $C_2$ criterion behaved well.
In Case 2.2.(a), with non-observationally identical density functions but equivalent models, the only possibility for achieving $h_3 = h_2$ is that, on average, $(F_{0i} - F_{1i})^2 = (F_{0i} - F_{2i})^2$ or, equivalently, $2 F_{0i} = F_{1i} + F_{2i}$. Therefore, there can be empirical settings where the criterion $C_2$ does not behave well. The Monte Carlo experiment will allow us a more specific analysis of the behaviour of the criterion.
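For instance (our own numerical illustration), with $F_{0i} = 0.5$, $F_{1i} = 0.3$ and $F_{2i} = 0.7$, both squared deviations equal $0.04$, so such an observation contributes identically to $h_3$ and $h_2$ even though $F_{1i} \neq F_{2i}$.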
Finally, in Case 2.2.(b), where the competing models were non-equivalent, we assumed that M1 was better than M2. In order to study the power of the criterion, we applied a strategy similar to that used in the IC criteria. That is, we related Definitions 2 and 3 (ii). Definition 2 can be written as:
$\left( \frac{F_{1i}}{F_{2i}} \right)^{F_{0i}} > \left( \frac{1 - F_{2i}}{1 - F_{1i}} \right)^{1 - F_{0i}} \qquad (27)$
We wanted to find the combinations of $F_{0i}$, $F_{1i}$ and $F_{2i}$ that satisfy (27). The results we obtained are summarized in Table 1.
Definition 3 (ii) establishes that the $C_2$ criterion behaves well if $h_3 < h_2$, that is to say:
$\frac{1}{N} \sum_{i=1}^{N} (F_{1i} - F_{2i}) (F_{1i} + F_{2i} - 2 F_{0i}) < 0 \qquad (28)$
For every combination presented in Table 1, we get the previous result, so the $C_2$ criterion is adequate. □

4. Simulation Study and Discussion

The objective of the Monte Carlo experiments is twofold: to confirm the theoretical results and to assess the performance of all the criteria in finite samples.
The generation of the binary variable $y_i$ is based on the latent linear model that underlies any binary model:
$y_i^* = x_i'\beta + u_i \qquad (29)$
where $y_i^*$ is a latent (unobservable) variable which generates $y_i$ through:
$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{if } y_i^* \leq 0 \end{cases} \qquad (30)$
Under the assumption established in Section 3, and following the procedure of [27], we obtained the values of $y_i$. We considered different sets of parameter values and different kinds of explanatory variables (continuous and dummy), and the standard normal distribution function was chosen for the error term, implying an exclusive focus on probit models. Two sample sizes, N = 200 and N = 2000, were used, and we carried out 500 replications for each experiment. Additionally, the intercept was fixed at a value of −2 in order to avoid an unbalanced proportion of ones in the sample of $y_i$, which would lead to problems when estimating and interpreting results.
In each of the 500 replications, we estimated M1 and M2 and calculated the value of the IC and $C_2$ criteria. The corresponding tables for every experiment show the number of times that each criterion selected M1. Note that we only present the tables for N = 2000 and comment on the differences with respect to N = 200 where such differences exist. In all cases, the DGP is $p_i = F(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i})$.
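A condensed sketch of one such replication loop is shown below (our own illustration, not the authors' code; it reuses the `neg_loglik` and `criteria` helpers defined in the earlier sketches and takes the Case 1.2 layout of Section 4.2 as an example, with hypothetical regressor choices):

```python
import numpy as np
from scipy.optimize import minimize
# assumes neg_loglik and criteria from the earlier sketches are available

def one_experiment(beta0, n=2000, reps=500, seed=123):
    """Counts how many times (out of reps) AIC, SBIC and C2 select M1 over M2."""
    rng = np.random.default_rng(seed)
    wins = np.zeros(3, dtype=int)
    for _ in range(reps):
        x1, x2 = rng.uniform(size=n), rng.uniform(size=n)
        w, z = rng.normal(3, 1, size=n), rng.chisquare(1, size=n)
        y = (beta0[0] + beta0[1] * x1 + beta0[2] * x2 + rng.standard_normal(n) > 0).astype(float)
        A = np.column_stack([np.ones(n), x1, x2, w])      # M1: well-specified plus irrelevant w
        B = np.column_stack([np.ones(n), x1, z])          # M2: misspecified
        g = minimize(neg_loglik, np.zeros(A.shape[1]), args=(y, A), method="BFGS").x
        d = minimize(neg_loglik, np.zeros(B.shape[1]), args=(y, B), method="BFGS").x
        cm1, cm2 = criteria(y, A, g), criteria(y, B, d)
        wins += np.array([cm1[j] < cm2[j] for j in range(3)], dtype=int)
    return wins    # e.g. one_experiment(np.array([-2.0, 3.0, 1.0]))
```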

4.1. Monte Carlo Exercise When Both Models Are Correctly Specified (Case 1.1)

We consider the following well-specified models M1 and M2:
$M_1: \; p_i = F(\gamma_0 + \gamma_1 x_{1i} + \gamma_2 x_{2i} + \gamma_3 w_i + \gamma_4 s_i) \qquad M_2: \; p_i = F(\delta_0 + \delta_1 x_{1i} + \delta_2 x_{2i} + \delta_3 z_i)$
Firstly, we assumed $\gamma_4 = 0$, so we chose between models with the same number of parameters and, afterwards, we assumed $\gamma_4 \neq 0$. These settings are called A and B, respectively; the results are presented in Table 2.
For setting A, we can see that the number presented in each cell was around 250 (50% of the 500 replications), which is the correct behaviour for equivalent models. However, in setting B, where M1 had more irrelevant regressors than M2, the criteria tended to select the model with fewer parameters, that is, the more parsimonious model (M2), and this tendency grew with the sample size. This behaviour was also correct, given that both specifications were equivalent. Specifically, the most parsimonious criterion is SBIC, so the Monte Carlo exercise corroborated this theoretical aspect of the previous section. Additionally, we observed that neither the kind of variables nor the set of values of the DGP parameter vector seemed to affect the behaviour of the criteria.

4.2. Monte Carlo Exercise When Only One of the Models Is Correctly Specified (Case 1.2)

In these experiments, M1 and M2 are expressed as follows:
$M_1: \; p_i = F(\gamma_0 + \gamma_1 x_{1i} + \gamma_2 x_{2i} + \gamma_3 w_i) \qquad M_2: \; p_i = F(\delta_0 + \delta_1 x_{1i} + \delta_2 z_i)$
The theoretical results were corroborated, so all the criteria tended to select M1 for whatever kind of explanatory variables. The corresponding table has been omitted, given that the value in all cells was 500.
However, for N = 200 the results were not so evident, although they tended towards adequate behaviour. Specifically, differences were found when the variables x 1 and x 2 were both uniform and the weight of x 2 was not greater than that of x 1 ; this difference was more evident for the SBIC criterion.

4.3. Monte Carlo Experiment When Neither of the Models Is Correctly Specified (Case 2)

We needed to analyse each of the situations defined in Case 2 of Section 2 separately. The two compared models are:
$M_1: \; p_i = F(\gamma_0 + \gamma_1 x_{1i} + \gamma_2 w_i) \qquad M_2: \; p_i = F(\delta_0 + \delta_1 x_{2i} + \delta_2 w_i)$
The implementation of the experiments for cases 2.1 and 2.2.(a) required using the relationship between the true parameter vector ($\beta^0$) and each of the pseudo-true parameter vectors ($\gamma^*$ and $\delta^*$). In case 2.1, we should have been able to obtain from that relationship the value of $\beta^0$ satisfying the equality of the density functions. In other terms, if $\gamma^* = m_1(\beta^0)$ and $\delta^* = m_2(\beta^0)$, we were interested in obtaining the $\beta^0$ that makes $f(Y_i; m_1(\beta^0), a_i) = f(Y_i; m_2(\beta^0), b_i)$. The same idea could be used in 2.2.(a) in order to make $E_0[\ell_1^*]$ and $E_0[\ell_2^*]$ equal, that is, $E_0[\ell_1^*(m_1(\beta^0))] = E_0[\ell_2^*(m_2(\beta^0))]$. Nevertheless, the non-linear equation system that we needed to solve presented insurmountable problems. Given that we could not obtain the exact relationship, we approximated the equalities of densities and likelihoods. To this end, we generated several DGPs, modifying the value of the parameter vector $\beta^0$ and the kind of explanatory variables. Again, the intercept was fixed at a value of −2, while $\beta_1$ and $\beta_2$ took values in the range (−2, 2) in steps of 0.5; additionally, they took the values 3, 4, 5 and 7. As a result of this strategy, we generated 132 different DGPs. Each of the outlined DGPs led to a specific relationship between models M1 and M2: equivalent models (with identical or non-identical densities) or non-equivalent models.
In order to classify the 132 experiments into the two categories, we used the following indicators:
$absel = \frac{1}{N} \sum_{i=1}^{N} |del_i|$
$AM1 = $ number of times (out of the $N$ observations) that $del_i > 0$
$absdifden = \frac{1}{N} \sum_{i=1}^{N} |difden_i|$
with $del_i = E_{1i} - E_{2i}$ and $difden_i = f(y_i; \gamma^*, x_1, x_2) - f(y_i; \delta^*, x_1, x_2)$, where $E_{ji}$ denotes the expected log-likelihood of observation $i$ in model $M_j$, evaluated at the pseudo-true parameters.
Firstly, we classified the experiments into “containing equivalent models”, or “containing nonequivalent models”. Secondly, in the first group, we distinguished identical from non-identical densities. Finally, we classified the non-equivalent models depending on their closeness to the DGP. The following three stages were carried out:
Step 1.
The two requirements for considering the models as equivalents are:
(R.1)
A value of absel close to zero.
(R.2)
A value of AM1 close to N / 2 .
Taking into account absel, two models will tend to be equivalent if, for most of the observations, $del_i \approx 0$, which should lead to $absel \approx 0$. Could we have used only this measure to affirm that the models were equivalent? The answer is no, because we could find $absel \approx 0$ but with most of the observations satisfying $E_{1i} > E_{2i}$, which means that M1 was closer to the DGP. Using AM1 instead of absel, two models will tend to be equivalent if $AM1 \approx N/2$. Could we have used only AM1 to classify the models? Again, the answer is no, because it could happen that $AM1 \approx N/2$ but with a large value of absel, due to large values of $|del_i|$. Hence, we need the two requirements (R.1) and (R.2).
Step 2.
To consider that two equivalent models have identical densities, a value of difden close to zero is required.
Step 3.
If model M1 is better than M2, $N/2 < AM1 < N$ must be satisfied, while M2 will be better if $0 < AM1 < N/2$.
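A sketch of how these indicators and the three classification steps might be implemented is given below (our own illustration; the tolerance values are arbitrary choices, and the per-observation quantities $F_{0i}$, $F_{1i}$ and $F_{2i}$ are assumed to have been computed from the DGP and from the pseudo-true parameter vectors, e.g. as approximated in the Section 2 sketch):

```python
import numpy as np

def classify_experiment(F0, F1, F2, y, tol_el=0.02, tol_den=0.03, tol_am1=0.05):
    """Compute absel, AM1 and absdifden and apply Steps 1-3 (tolerances are illustrative)."""
    N = len(F0)
    # expected log-likelihood of observation i under each model, at the pseudo-true parameters
    E1 = F0 * np.log(F1) + (1 - F0) * np.log(1 - F1)
    E2 = F0 * np.log(F2) + (1 - F0) * np.log(1 - F2)
    deli = E1 - E2
    difden = F1**y * (1 - F1)**(1 - y) - F2**y * (1 - F2)**(1 - y)   # difference of f(y_i; .)

    absel = np.mean(np.abs(deli))
    AM1 = int(np.sum(deli > 0))
    absdifden = np.mean(np.abs(difden))

    if absel < tol_el and abs(AM1 - N / 2) < tol_am1 * N:     # Step 1: equivalent models
        if absdifden < tol_den:                               # Step 2: identical densities
            return "equivalent (identical densities)", absel, AM1, absdifden
        return "equivalent (non-identical densities)", absel, AM1, absdifden
    if AM1 > N / 2:                                           # Step 3: non-equivalent models
        return "M1 closer to the DGP", absel, AM1, absdifden
    return "M2 closer to the DGP", absel, AM1, absdifden
```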
Each experiment was numbered from 1 to 132 and classified as type A, B or C, as shown in Table 3.
Those experiments with an extreme percentage of zeros in the sample of the binary variable were omitted. Table 4 presents the experiments with the lowest values of absel, the values of AM1 closest to 1000 ($N/2$), and the values of absdifden nearest to zero.
We concluded that experiments 27, 3 and 6 included equivalent models, 27 and 3 having identical densities. The rest of the experiments corresponded to non-equivalent models, and we needed to classify them according to their closeness to the DGP. Given that AM1 was the adequate indicator, Table 5 shows all the experiments sequenced from the highest to the lowest value of this measure.
In Table 5, the variable w is always generated as N(3,1), and the IC column groups AIC and SBIC together because the results of both criteria were identical.
We find the experiments with equivalent models in the middle of this table. Above them (the upper part of the table) are the experiments where M1 was better and, below them (the lower part), those where M2 was the best model. The results showed that, at the upper end of the table, the values in the IC and C2 columns tended towards 500 and, at the lower end, towards zero, corroborating the theoretical conclusions.
The experiments that contained equivalent models with identical densities (27 and 3) corroborated the theoretical results. On the other hand, in experiment 6 (equivalent models with non-identical densities), both IC and $C_2$ performed well. Nevertheless, the theoretical results for $C_2$ concluded that this criterion is adequate only in some situations. We can affirm that experiment 6 belonged to one of these situations, characterized by uniformly distributed DGP variables and similar (but not excessively large) weights for both variables. We think that these characteristics could explain why $C_2$ behaved well.
Finally, we observed non-adequate behaviour of the criteria in some experiments. In the upper part of the table, experiments 62, 48 and 57 showed values far below 500 and large differences between the values of the criteria columns. In the lower part of the table, experiments 21 and 109 have IC and C2 column values far from 0. Could this atypical behaviour be due to anomalous observations? Taking into account that $del_i$ is the main element underlying the indicators used to classify the experiments, we studied whether extreme values of $del_i$ were always associated with the same selection (same sign of $del_i$). We found that this happened in all the experiments except in 21. Eliminating these extreme values, the behaviour of the criteria became adequate, as we show in Table 6.
As a final comment, we observed a very small number of experiments with equivalent models. We understood that this was logical, because the experiments of Case 2 corresponded to pairs of models where the DGP was not nested in M1 or M2. Given that model M1 contained the variable $x_1$ and model M2 contained $x_2$ ($x_1$ and $x_2$ being the only DGP variables), it was very difficult to find cases where the M1 and M2 models were equivalent.
When we re-executed the analysis for a sample size of 200, the results were similar in general terms, although the tendency toward correct behaviour of the criteria was slower. Nevertheless, we could affirm that the three criteria performed quite well for finite sample sizes.

5. Conclusions

Within the framework of overlapping binary models, we have studied the power of model selection criteria: the well-known information criteria AIC and SBIC, and the $C_2$ criterion, based on the mean square error of prediction.
As we previously mentioned, two binary models are overlapping if both have the same functional form (both probit or both logit), with some common explanatory variables and some specific variables. In this article, we distinguished two cases: (i) at least one of the competing models is well-specified, and (ii) neither of them is correctly specified. This last case is an important aspect of our work because it is not commonly considered in empirical works.
From a theoretical point of view, we have classified the competing models as equivalent or non-equivalent. Once this classification had been carried out, the task was to define the requirement that a given criterion must satisfy in order to be considered adequate. Specifically, if two models are equivalent, the probability limits of a given criterion must be the same in both models; if one of them is better, its corresponding limit must be lower than that of the other model. The theoretical analysis confirmed that the criteria perform well in almost every situation; only $C_2$ may occasionally fail to behave well in one specific setting.
These theoretical results have been corroborated by a Monte Carlo experiment. The most complicated situation to simulate was, as we expected, when neither of the two models were well-specified. This situation can lead to three possibilities: equivalent models with identical densities, equivalent models with non-identical densities, and non-equivalent models.
In order to develop this part of the Monte Carlo exercise, we had to generate 132 different DGPs, leading to 132 different experiments. Each of the experiments corresponded to one of the three theoretical relationships mentioned above. To establish the specific relationship, we have defined three indicators:
(a) The average of the absolute differences between the expected log-likelihoods (at the pseudo-true parameters) of both models. We have denoted it as absel.
(b) The number of observations in the sample where the expected log-likelihood in model M1 is larger than in M2 (note that we have assumed that M1 is the closest to the DGP). This indicator is called AM1.
(c) The average of the absolute differences between the density functions (at the pseudo-true parameters) of both models. We have denoted it as absdifden.
The general conclusion is that the three criteria behaved well for overlapping binary models: when neither of the two competing models was well-specified, the criteria tended to choose the best of them, that is, the closest to the DGP. In the most commonly studied case, where at least one of the competing models was correct, our conclusion was that the criteria also performed well, as we expected. Furthermore, when both models were correct, the criteria tended to choose the most parsimonious model.
It is important to note that these criteria can also be used when comparing an extensive set of models, correctly specified or misspecified. The criteria AIC, SBIC and $C_2$ allow us to rank them, with the correctly specified models, which will be equivalent to each other, occupying the first places. Among them, the first one (the selected model) will be the most parsimonious if we use the SBIC criterion. The misspecified models will be at the bottom of the ranking.
This paper has focused on the restricted framework of overlapping binary models. In order to complete this analysis, future work should study the behaviour of AIC, SBIC and $C_2$ in the non-nested framework. Moreover, the wider context of multinomial dependent variables could be the aim of future research. Given that the MLE procedure is also applied to estimate these models, the formal expression of the IC would be quite straightforward, while $C_2$ would require a deeper analysis.

Author Contributions

Conceptualization, T.A. and I.V.; methodology, T.A. and I.V.; software, I.V.; formal analysis, T.A. and I.V.; numerical simulation, I.V.; writing—original draft preparation, T.A. and I.V.; writing—review and editing, I.V.; supervision, I.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by DGA Reference Group S40_20R and Agencia Estatal de Investigación Reference PID2019-106822RB-I00.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data were generated in the simulation study.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Aljarallah, R.; Kharroubi, S.A. Use of Bayesian Markov Chain Monte Carlo Methods to Model Kuwait Medical Genetic Center Data: An Application to Down Syndrome and Mental Retardation. Mathematics 2021, 9, 248. [Google Scholar] [CrossRef]
  2. Li, Z.; Wang, E.; Su, J.; Yu, Y. Using MCMC Probit Model to Value Coastal Beach Quality Improvement. J. Environ. Prot. 2011, 2, 109–114. [Google Scholar] [CrossRef] [Green Version]
  3. Vuong, Q.H. Likelihood Ratio Tests for Model Selection and Non-Nested Hypotheses. Econometrica 1989, 57, 307–333. [Google Scholar] [CrossRef] [Green Version]
  4. Lewis, F.; Butler, A.; Gilbert, L. A Unified Approach to Model Selection Using the Likelihood Ratio Test. Methods Ecol. Evol. 2011, 2, 155–162. [Google Scholar] [CrossRef]
  5. Hendry, D.F. Econometric Modelling; Department of Economics, University of Oslo: Oslo, Norway, 2000. [Google Scholar]
  6. Hong, H.; Preston, B. Nonnested Model Selection Criteria; Department of Economics, Stanford University: Stanford, CA, USA, 2006; pp. 1–33. [Google Scholar]
  7. Linhart, H.; Zucchini, W. Model Selection; John Wiley and Sons: New York, NY, USA, 1986. [Google Scholar]
  8. Pesaran, M.H.; Pesaran, B. A Simulation Approach to the Problem of Computing Cox’s Statistics for Testing Nonnested Models. J. Econom. 1993, 57, 377–392. [Google Scholar] [CrossRef]
  9. Santos Silva, J.M.C. A Score Test for Non-Nested Hypotheses with Applications to Discrete Data Models. J. Appl. Econom. 2001, 16, 577–592. [Google Scholar] [CrossRef]
  10. Akaike, H. Information Theory and an Extension of the Maximum Likelihood Principle. In Proceedings of the Second International Symposium on Information Theory, Tsahkadsor, Armenia, USSR, 2–8 September 1971; Petrov, B.N., Csáki, F., Eds.; Akademiai Kiado: Budapest, Hungary, 1973; pp. 267–281. [Google Scholar]
  11. Schwarz, G. Estimating the Dimension of a Model. Ann. Stat. 1978, 6, 461–464. [Google Scholar] [CrossRef]
  12. Aparicio, T.; Villanúa, I. Some Selection Criteria for Nested Binary Choice Models: A Comparative Study. Comput. Stat. 2007, 22, 635–660. [Google Scholar] [CrossRef]
  13. Kim, H.; Cavanaugh, J.E. Model Selection Criteria Based on Kullback Information Measures for Nonlinear Regression. J. Stat. Plan. Inference 2005, 134, 332–349. [Google Scholar] [CrossRef]
  14. Van Der Hoeven, N. The Probability to Select the Correct Model Using Likelihood-Ratio Based Criteria in Choosing Between Two Nested Models of Which the More Extended One Is True. J. Stat. Plan. Inference 2005, 135, 477–486. [Google Scholar] [CrossRef]
  15. Lalou, P.; Chalikias, M.; Skordoulis, M.; Papadopoulos, P.; Fatouros, S. A Probabilistic Evaluation of Sales Expansion. In Proceedings of the 5th International Symposium and 27th National Conference on Operation Research, Egaleo, Greece, 9–11 June 2016; pp. 109–113, ISBN 978-618-80361-6-1. [Google Scholar]
  16. Seo, T.K.; Thorne, J.L. Information Criteria for Comparing Partition Schemes. Syst. Biol. 2018, 67, 616–632. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  17. Jhwueng, D.C.; Huzurbazar, S.; O’Meara, B.C.; Liu, L. Investigating the Performance of AIC in Selecting Phylogenetic Models. Stat. Appl. Genet. Mol. Biol. 2014, 13, 459–475. [Google Scholar] [CrossRef] [PubMed]
  18. Susko, E.; Roger, A.J. On the Use of Information Criteria for Model Selection in Phylogenetics. Mol. Biol. Evol. 2020, 37, 549–562. [Google Scholar] [CrossRef] [PubMed]
  19. Dziak, J.J.; Coffman, D.L.; Lanza, S.T.; Li, R.; Jermiin, L.S. Sensitivity and Specificity of Information Criteria. Brief. Bioinform. 2020, 21, 553–565. [Google Scholar] [CrossRef] [PubMed]
  20. Monfardini, C. An Illustration of Cox’s Non-Nested Testing Procedure for Logit and Probit Models. Comput. Stat. Data Anal. 2003, 42, 425–444. [Google Scholar] [CrossRef] [Green Version]
  21. Bierens, H.J. Topics in Advanced Econometrics; Cambridge University Press: Cambridge, UK, 1994. [Google Scholar]
  22. White, H. Maximum Likelihood Estimation of Misspecified Models. Econometrica 1982, 50, 1–25. [Google Scholar] [CrossRef]
  23. Kohn, R. Consistent Estimation of Minimal Subset Dimension. Econometrica 1983, 51, 367–376. [Google Scholar] [CrossRef]
  24. Aparicio, T.; Villanúa, I. Selection Criteria for Overlapping Binary Models; Documentos de Trabajo; Facultad de Economía y Empresa, Universidad de Zaragoza: Zaragoza, Spain, 2012; pp. 1–54. [Google Scholar]
  25. Newey, W.K.; McFadden, D. Large sample estimation and hypothesis testing. In Handbook of Econometrics; Elsevier: Amsterdam, The Netherlands, 1994; Volume 4, pp. 2111–2245. [Google Scholar]
  26. Davidson, J. Econometric Theory; Blackwell: Oxford, UK, 2000. [Google Scholar]
  27. Gourieroux, C.; Monfort, A.; Renault, E.; Trognon, A. Simulated Residuals. J. Econom. 1987, 34, 201–252. [Google Scholar] [CrossRef]
Table 1. Combinations of F0, F1 and F2 which lead to M1 being better than M2.

| Value of F0i | Condition Satisfied When M1 Is Better than M2 |
| F0i = 0 | F1i < F2i and F1i → 0 |
| F0i ∈ (0, 0.5] | F1i < F2i and F1i + F2i > 1; or F1i > F2i and F1i + F2i < 1 and F0i → 0.5; or F1i < F2i and F1i + F2i < 1 and F0i → 0 |
| F0i ∈ (0.5, 1) | F1i > F2i and F1i + F2i < 1; or F1i > F2i and F1i + F2i > 1 and F0i → 1; or F1i < F2i and F1i + F2i > 1 and F0i → 0.5 |
| F0i = 1 | F1i > F2i and F1i → 1 |
Table 2. Behaviour of the selection criteria in Case 1.1, N = 2000 (cells show the number of times, out of 500 replications, that each criterion selected M1).

| Setting | β | x1 | x2 | w | z | s | AIC | SBIC | C2 |
| A | (−2,3,1) | U | U | U | N(3,1) | — | 273 | 273 | 275 |
| A | (−2,1,3) | U | U | U | N(3,1) | — | 236 | 236 | 240 |
| A | (−2,1,1) | U | U | U | N(3,1) | — | 234 | 234 | 244 |
| A | (−2,3,1) | χ² | χ² | U | N(3,1) | — | 240 | 240 | 252 |
| A | (−2,1,3) | χ² | χ² | U | N(3,1) | — | 273 | 273 | 261 |
| A | (−2,1,1) | χ² | χ² | U | N(3,1) | — | 236 | 236 | 238 |
| A | (−2,3,1) | U | χ² | U | N(3,1) | — | 247 | 247 | 245 |
| A | (−2,1,3) | U | χ² | U | N(3,1) | — | 231 | 231 | 245 |
| A | (−2,1,1) | U | χ² | U | N(3,1) | — | 243 | 243 | 260 |
| A | (−2,3,1) | U | dummy | U | N(3,1) | — | 259 | 259 | 269 |
| A | (−2,1,3) | U | dummy | U | N(3,1) | — | 231 | 231 | 245 |
| A | (−2,1,1) | U | dummy | U | N(3,1) | — | 242 | 242 | 261 |
| A | (−2,3,1) | χ² | dummy | U | N(3,1) | — | 255 | 255 | 251 |
| A | (−2,1,3) | χ² | dummy | U | N(3,1) | — | 248 | 248 | 257 |
| A | (−2,1,1) | χ² | dummy | U | N(3,1) | — | 251 | 251 | 234 |
| B | (−2,3,1) | U | U | U | N(3,1) | χ² | 144 | 10 | 179 |
| B | (−2,1,3) | U | U | U | N(3,1) | χ² | 133 | 10 | 151 |
| B | (−2,1,1) | U | U | U | N(3,1) | χ² | 126 | 2 | 144 |
| B | (−2,3,1) | χ² | χ² | U | N(3,1) | χ² | 128 | 12 | 215 |
| B | (−2,1,3) | χ² | χ² | U | N(3,1) | χ² | 140 | 10 | 230 |
| B | (−2,1,1) | χ² | χ² | U | N(3,1) | χ² | 130 | 11 | 185 |
| B | (−2,3,1) | U | χ² | U | N(3,1) | χ² | 128 | 9 | 165 |
| B | (−2,1,3) | U | χ² | U | N(3,1) | χ² | 106 | 2 | 192 |
| B | (−2,1,1) | U | χ² | U | N(3,1) | χ² | 120 | 5 | 161 |
| B | (−2,3,1) | U | dummy | U | N(3,1) | χ² | 144 | 12 | 177 |
| B | (−2,1,3) | U | dummy | U | N(3,1) | χ² | 145 | 7 | 153 |
| B | (−2,1,1) | U | dummy | U | N(3,1) | χ² | 141 | 7 | 162 |
| B | (−2,3,1) | χ² | dummy | U | N(3,1) | χ² | 147 | 10 | 200 |
| B | (−2,1,3) | χ² | dummy | U | N(3,1) | χ² | 136 | 8 | 185 |
| B | (−2,1,1) | χ² | dummy | U | N(3,1) | χ² | 143 | 12 | 185 |
Table 3. Number of every experiment in Case 2, and types of x in the DGP (A, B, C) ¹.

| Number (A,B,C) | DGP (β1, β2) | Number (A,B,C) | DGP (β1, β2) | Number (A,B,C) | DGP (β1, β2) | Number (A,B,C) | DGP (β1, β2) |
| 1,45,89 | (1,−2) | 12,56,100 | (1,7) | 23,67,111 | (3,5) | 34,78,122 | (7,1) |
| 2,46,90 | (1,−1.5) | 13,57,101 | (3,−2) | 24,68,112 | (3,7) | 35,79,123 | (−2,3) |
| 3,47,91 | (1,−1) | 14,58,102 | (3,−1.5) | 25,69,113 | (−2,1) | 36,80,124 | (−1.5,3) |
| 4,48,92 | (1,−0.5) | 15,59,103 | (3,−1) | 26,70,114 | (−1.5,1) | 37,81,125 | (−1,3) |
| 5,49,93 | (1,0.5) | 16,60,104 | (3,−0.5) | 27,71,115 | (−1,1) | 38,82,126 | (−0.5,3) |
| 6,50,94 | (1,1) | 17,61,105 | (3,0.5) | 28,72,116 | (−0.5,1) | 39,83,127 | (0.5,3) |
| 7,51,95 | (1,1.5) | 18,62,106 | (3,1) | 29,73,117 | (0.5,1) | 40,84,128 | (1.5,3) |
| 8,52,96 | (1,2) | 19,63,107 | (3,1.5) | 30,74,118 | (1.5,1) | 41,85,129 | (2,3) |
| 9,53,97 | (1,3) | 20,64,108 | (3,2) | 31,75,119 | (2,1) | 42,86,130 | (4,3) |
| 10,54,98 | (1,4) | 21,65,109 | (3,3) | 32,76,120 | (4,1) | 43,87,131 | (5,3) |
| 11,55,99 | (1,5) | 22,66,110 | (3,4) | 33,77,121 | (5,1) | 44,88,132 | (7,3) |

¹ In type A both variables are U(0,1); in type B x1 is U(0,1) and x2 is χ²(1); in type C both variables are χ²(1).
Table 4. Experiments sequenced by each of the three indicators (absel, AM1 and absdifden).

| Experiments with the lowest value of absel | | | Experiments with AM1 around 1000 | | | Experiments with the lowest value of absdifden | |
| Exp. | absel | AM1 | Exp. | AM1 | absel | Exp. | absdifden |
| 3 | 0.00716535 | 1001 | 57 | 1011 | 0.17489579 | 27 | 0.0224 |
| 27 | 0.00716886 | 1003 | 27 | 1003 | 0.00716886 | 3 | 0.0227 |
| 28 | 0.00731291 | 505 | 3 | 1001 | 0.00716535 | 28 | 0.0257 |
| 4 | 0.00737039 | 1498 | 6 | 1000 | 0.0185438 | 24 | 0.0261 |
| 5 | 0.01279404 | 1514 | 63 | 995 | 0.25329468 | 48 | 0.0325 |
| 29 | 0.0127966 | 510 | 21 | 991 | 0.13463303 | 47 | 0.0341 |
| 48 | 0.01515962 | 1035 | 109 | 980 | 0.44042798 | 46 | 0.0349 |
| | | | | | | 45 | 0.0354 |
| | | | | | | 29 | 0.0510 |
Table 5. Behaviour of the selection criteria in Case 2, N = 2000 ¹.

| Exp. | AM1 | IC | C2 | Exp. | AM1 | IC | C2 | Exp. | AM1 | IC | C2 | Exp. | AM1 | IC | C2 |
| 104 | 1868 | 500 | 500 | 20 | 1393 | 500 | 500 | 129 | 805 | 0 | 0 | 84 | 315 | 0 | 0 |
| 34 | 1861 | 500 | 500 | 76 | 1366 | 500 | 500 | 47 | 786 | 2 | 22 | 37 | 315 | 0 | 0 |
| 17 | 1846 | 500 | 500 | 119 | 1365 | 500 | 500 | 65 | 764 | 0 | 0 | 50 | 261 | 0 | 0 |
| 16 | 1834 | 500 | 500 | 30 | 1359 | 499 | 499 | 112 | 719 | 0 | 0 | 123 | 245 | 0 | 0 |
| 33 | 1827 | 500 | 500 | 13 | 1342 | 500 | 500 | 128 | 708 | 0 | 0 | 124 | 232 | 0 | 0 |
| 92 | 1825 | 500 | 500 | 107 | 1321 | 500 | 500 | 96 | 705 | 0 | 0 | 51 | 226 | 0 | 0 |
| 103 | 1809 | 500 | 500 | 88 | 1292 | 500 | 500 | 22 | 701 | 0 | 0 | 10 | 224 | 0 | 0 |
| 91 | 1807 | 500 | 500 | 42 | 1288 | 500 | 500 | 66 | 686 | 0 | 0 | 114 | 209 | 0 | 0 |
| 90 | 1794 | 500 | 500 | 118 | 1268 | 500 | 500 | 7 | 685 | 2 | 2 | 125 | 208 | 0 | 0 |
| 102 | 1779 | 500 | 500 | 59 | 1252 | 500 | 500 | 46 | 668 | 0 | 5 | 52 | 204 | 0 | 0 |
| 89 | 1779 | 500 | 500 | 132 | 1219 | 500 | 500 | 35 | 662 | 0 | 0 | 11 | 195 | 0 | 0 |
| 101 | 1753 | 500 | 500 | 108 | 1188 | 500 | 500 | 41 | 636 | 0 | 0 | 113 | 194 | 0 | 0 |
| 32 | 1751 | 500 | 500 | 131 | 1156 | 500 | 500 | 67 | 601 | 0 | 0 | 69 | 190 | 0 | 0 |
| 105 | 1691 | 500 | 500 | 62 | 1154 | 67 | 244 | 45 | 589 | 0 | 2 | 115 | 180 | 0 | 0 |
| 18 | 1691 | 500 | 500 | 94 | 1112 | 441 | 451 | 117 | 583 | 0 | 0 | 53 | 170 | 0 | 0 |
| 15 | 1675 | 500 | 500 | 58 | 1112 | 352 | 494 | 97 | 576 | 0 | 0 | 39 | 170 | 0 | 0 |
| 78 | 1641 | 500 | 500 | 130 | 1093 | 500 | 500 | 23 | 537 | 0 | 0 | 38 | 170 | 0 | 0 |
| 122 | 1606 | 500 | 500 | 87 | 1067 | 447 | 496 | 68 | 518 | 0 | 0 | 116 | 167 | 0 | 0 |
| 44 | 1582 | 500 | 500 | 48 | 1035 | 72 | 195 | 98 | 513 | 0 | 0 | 54 | 162 | 0 | 0 |
| 121 | 1556 | 500 | 500 | 57 | 1011 | 41 | 322 | 29 | 510 | 1 | 17 | 70 | 162 | 0 | 0 |
| 19 | 1540 | 500 | 500 | 27 | 1003 | 245 | 240 | 28 | 505 | 10 | 13 | 126 | 161 | 0 | 0 |
| 120 | 1538 | 500 | 500 | 3 | 1001 | 250 | 245 | 8 | 504 | 0 | 0 | 79 | 157 | 0 | 0 |
| 31 | 1524 | 500 | 500 | 6 | 1000 | 249 | 252 | 36 | 494 | 0 | 0 | 12 | 137 | 0 | 0 |
| 77 | 1516 | 500 | 500 | 63 | 995 | 0 | 0 | 74 | 491 | 0 | 0 | 71 | 133 | 0 | 0 |
| 5 | 1514 | 499 | 498 | 21 | 991 | 240 | 223 | 99 | 486 | 0 | 0 | 56 | 131 | 0 | 0 |
| 93 | 1510 | 500 | 500 | 109 | 980 | 357 | 324 | 40 | 475 | 0 | 0 | 55 | 129 | 0 | 0 |
| 14 | 1504 | 500 | 500 | 86 | 932 | 0 | 2 | 85 | 444 | 0 | 0 | 80 | 128 | 0 | 0 |
| 4 | 1498 | 484 | 484 | 110 | 891 | 0 | 0 | 100 | 414 | 0 | 0 | 73 | 104 | 0 | 0 |
| 60 | 1492 | 500 | 500 | 64 | 877 | 0 | 0 | 49 | 390 | 0 | 0 | 81 | 97 | 0 | 0 |
| 106 | 1474 | 500 | 500 | 95 | 835 | 0 | 0 | 24 | 385 | 0 | 0 | 72 | 84 | 0 | 0 |
| 43 | 1438 | 500 | 500 | 111 | 813 | 0 | 0 | 127 | 329 | 0 | 0 | 83 | 80 | 0 | 0 |
| 61 | 1429 | 500 | 500 | 75 | 813 | 0 | 0 | 9 | 325 | 0 | 0 | 82 | 57 | 0 | 0 |

¹ Experiments 1, 2, 25 and 26 are omitted, due to the extreme percentage of ones/zeros in the samples.
Table 6. Atypical experiments in Case 2. Behaviour of the selection criteria.

| Experiment | (β1, β2) | Type of x | N | AM1 | absdel | IC | C2 |
| 62 | (3, 1) | B | 1800 | 1144 | 0.1517 | 500 | 500 |
| 48 | (1, −0.5) | B | 1750 | 1035 | 0.0097 | 377 | 499 |
| 57 | (3, −2) | B | 1800 | 1003 | 0.12 | 499 | 500 |
| 109 | (3, 3) | C | 1925 | 909 | 0.4 | 0 | 0 |