1. Introduction
The endogenous switching regression model is useful when analyzing individuals and firms that switch between two regimes, for example, being credit constrained versus being credit unconstrained. A credit constrained business may not be able to make the necessary investments, which may lower the productivity of that business. Similarly, a credit constrained farmer may not be able to purchase fertilizer or tools, which may also cause their productivity to be lower. The decision to switch from one regime to the other could also depend on unobserved factors, which would cause the state, such as being credit constrained, to be endogenous.
Charlier et al. (2001) estimated several switching models and found that the data rejected all of them, including a linear panel model and Kyriazidou’s (1997) semi-parametric model.
Kyriazidou’s (1997) model did not allow for conditional time-varying heteroscedasticity, so it is perhaps not surprising that the data rejected it. Allowing for conditional time-varying heteroscedasticity is important because the variance of the error terms often depends on the predictors. For example, the residual variance of farm output may be greater in some years than in others, and our model allows for this possibility.
We propose to generalize the existing fixed effects and random effects models to allow for endogenous switching. This generalization allows for conditional heteroscedasticity in the outcome equation, a feature of many datasets. In particular, Maddala and Nelson’s (1974) switching model is a special case of the proposed model, as is the linear model with fixed effects and heteroscedastic errors.
Maddala and Nelson (1974) and Kyriazidou (1997) both considered the problem of endogenous switching in switching regression models. We expand on these papers in the following ways.
Maddala and Nelson (1974) did not allow for fixed effects or time dummies. However, many empirical settings require fixed effects to absorb time-invariant individual-level heterogeneity, such as a farm’s land quality, and time dummies to absorb shocks common to all individuals in a given period. Therefore, we generalize Maddala and Nelson’s model to allow for fixed effects and time dummies.
Kyriazidou (1997), on the other hand, allowed for fixed effects in a switching regression model but assumed homoscedasticity. In many empirical settings, however, homoscedasticity does not fit the data well. Therefore, we relax this assumption to allow for conditional time-varying heteroscedasticity. By (i) allowing for fixed effects and time dummies and (ii) allowing for conditional time-varying heteroscedasticity, we generalize the existing fixed effects and random effects models to allow for endogenous switching, which lets our model fit the data better. We demonstrate this better fit by re-estimating the effects of removing credit constraints on farm productivity, using a dataset that had previously been analyzed with both a linear panel model and Kyriazidou’s (1997) method.
Specifically, our application concerns agricultural financing and non-farm micro-financing in developing countries. Studying the effects of credit constraints on farmers is a focus of the development literature, which seeks to better understand the role of institutions in reducing rural poverty.
Guirkinger and Boucher (2008) applied both a fixed effects model and Kyriazidou’s (1997) model to farmers who may switch between being credit constrained and unconstrained, in order to estimate the effects of such credit constraints on farm productivity. However, it might be argued that the fixed effects models that Charlier et al. (2001) and Guirkinger and Boucher (2008) estimated did not adequately take the endogenous switching decision into account. Specifically, the linear panel model employed by Guirkinger and Boucher (2008) did not incorporate a selection equation. A selection equation is important because it takes the endogenous switching decision into account in the outcome equation. In this context, unobserved factors that affect a farmer’s credit constraint may also affect their farm productivity, so switching between being credit constrained and unconstrained may not be exogenous to farm output. Our model is designed to address this endogeneity problem.
In addition to our methodological contribution, our paper contributes to the literature on credit constraints in agriculture. We show that accounting for unobserved farmer characteristics that influence both credit constraints and farm output substantially reduces the estimated benefits of removing credit constraints. This result has important policy implications, both for agricultural policy and in other settings. Policy makers attempting to alleviate rural poverty should take into account that removing credit constraints may not increase farm output as much as previously thought. Additionally, many policy settings involve estimating the impact of moving individuals or households from one regime to another, and we demonstrate that accounting for endogenous selection into those programs may be important for understanding the policy outcomes associated with them.
Guirkinger and Boucher (2008) estimated that removing the credit constraint from constrained farmers would increase productivity by 26%. This number is based on an estimate from a fixed effects model without a selection equation. We extend such a model by adding a selection equation and, using the same dataset, find that the credit constraint has a much smaller impact on farm production (11% versus 26%), demonstrating the importance of including a selection equation.
Aside from the papers previously mentioned, our paper relates to two main strands of literature. First, it relates to the econometric literature on fixed effects and random effects models and selection bias. Like this paper, Wooldridge (1995) addressed selection bias in panel models; however, Wooldridge (1995) did not consider endogenous switching models, which are the focus of this paper, so our methodology differs from that of Wooldridge (1995).
Semykina and Wooldridge (2010) further considered selection bias in panel models and used the inverse Mills ratio as an instrumental variable. Our method also uses the inverse Mills ratio, but, unlike Semykina and Wooldridge (2010), we do not use it as an instrument.
Second, our paper pertains to the empirical literature on credit constraints in agriculture. Two related papers are Feder et al. (1988, 1990). Like Guirkinger and Boucher (2008) and the present paper, these two papers argued that being credit constrained may be endogenous. However, unlike Guirkinger and Boucher (2008) and the present paper, they did not use individual effects to control for heterogeneity in farmland quality. Controlling for time-invariant individual-level effects is important in many empirical settings, but it complicates obtaining unbiased estimators of the coefficients of interest in a switching regression model with endogenous switching. Our model allows for both individual-level fixed effects and endogenous switching. More recently, Sekyi et al. (2017) used survey data and a conditional mixed logit model to analyze access to credit and farmer productivity simultaneously. Unlike our paper, Sekyi et al. (2017) do not use an endogenous switching model.
Seck (2019) and Zabatantou Louyindoula et al. (2023), which both used a switching model to analyze credit constraints, are most similar to our paper. However, both papers followed Wooldridge (2010) in using an instrument to account for the endogeneity of credit constraints with respect to farm productivity. Unlike Seck (2019) and Zabatantou Louyindoula et al. (2023), we do not use an instrument. Instead, we account for this endogeneity directly and use an exclusion restriction.
Switching models are useful not only for loan decisions but also for labor supply and household expenditure decisions. For example, Lee (1978) and Adamchik and Bedi (2000) estimated switching models to analyze wage differences between sectors of the economy. We expect our extensions to be useful for such applications as well.
This paper is organized as follows.
Section 2 introduces the model and states the consistency and asymptotic normality result of our estimator.
Section 3 applies the new estimator to data on productivity in Peruvian agriculture.
Section 4 concludes.
2. Model and Theorem
In our application, farmers can be credit constrained or credit unconstrained. Being credit constrained may reduce farm output, since it may be more difficult to buy inputs such as fertilizer and machines, or to hire farm hands or specialized workers. Thus, being credit constrained may reduce productivity. However, being credit constrained and having low productivity could also both be caused by an unobserved shock, such as illness of the farmer. It is therefore important to account for this sample selection, which is what Maddala and Nelson’s (1974) “switching regression model with endogenous switching” is intended to do. In particular, their switching regression model has a selection equation and an outcome equation, and it is a special case of the more general model we describe below.
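The selection concern can be made concrete with a small simulation. The Python sketch below uses a hypothetical cross-section in which an unobserved health shock both raises the probability of being credit constrained and lowers output. The function name and all parameter values (a true constraint effect of −0.1 and a health effect of 0.5) are illustrative assumptions, not estimates from this paper. Under these assumptions, the naive difference in mean output between constrained and unconstrained farms mixes the true effect of the constraint with the effect of health.

```python
import numpy as np

def naive_constraint_gap(n=200_000, effect=-0.1, health_effect=0.5, seed=0):
    """Simulate a hypothetical DGP with a common unobserved shock and return the
    naive difference in mean output (constrained minus unconstrained)."""
    rng = np.random.default_rng(seed)
    health = rng.normal(size=n)              # unobserved by the econometrician
    v = rng.normal(size=n)                   # idiosyncratic selection noise
    d = (v - health > 0).astype(float)       # poorer health -> more likely constrained
    e = rng.normal(size=n)                   # outcome noise, independent of d
    y = 1.0 + effect * d + health_effect * health + e
    return y[d == 1].mean() - y[d == 0].mean()

# With health_effect=0.5, the naive gap is close to -0.1 - 1/sqrt(pi) (about -0.66),
# far from the true effect of -0.1. With health_effect=0.0, it is close to -0.1.
```

In this design, the naive gap overstates the harm of the constraint because constrained farmers are, on average, also the less healthy ones. This is the type of bias that a selection equation is meant to correct.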
Let $d_{it}$ be equal to one if farmer $i$ is credit constrained in period $t$, and zero otherwise. If farmer $i$ is not credit constrained in period $t$ ($d_{it}=0$), then the productivity of the farm is

$$y_{0it} = x_{it}'\beta_0 + \alpha_{0i} + \tau_{0t} + \varepsilon_{0it}, \qquad (1)$$

where $x_{it}$ denotes the regressors, $\alpha_{0i}$ is an individual-specific fixed effect, $\tau_{0t}$ is a time dummy, and $\varepsilon_{0it}$ is the error term. The models considered by Maddala and Nelson (1974) or Maddala (1983) do not include fixed effects or time dummies, but we use them here. Such individual fixed effects are important in our application to control for land quality, and they are also important in many other settings to control for time-invariant individual-level heterogeneity. However, including fixed effects and time dummies complicates obtaining unbiased estimates of the coefficients of interest in the outcome equation. Kyriazidou (1997) addresses this issue by assuming homoscedasticity. A contribution of our paper is to solve this issue without assuming homoscedasticity, by generalizing fixed effects and random effects models to allow for endogenous switching.

Similarly, if farmer $i$ is credit constrained in period $t$ ($d_{it}=1$), then the productivity of the farm is

$$y_{1it} = x_{it}'\beta_1 + \alpha_{1i} + \tau_{1t} + \varepsilon_{1it}, \qquad (2)$$

where the fixed effect $\alpha_{1i}$ and error term $\varepsilon_{1it}$ are, in general, different from $\alpha_{0i}$ and $\varepsilon_{0it}$ in Equation (1).
Maddala and Nelson (1974) assume that the error terms in the selection equation and in the outcome equations are jointly normal. This assumption implies that, within each regime, the error terms in the outcome equations do not have expectation zero conditional on the regressors. Therefore, Maddala and Nelson (1974) and Maddala (1983) subtract the inverse Mills ratio, multiplied by a known coefficient, from the outcome equations. The inverse Mills ratio is the ratio of the probability density function to the complementary cumulative distribution function of a distribution. Specifically, let $X$ be a normally distributed random variable with mean $\mu$ and variance $\sigma^2$. Then, the inverse Mills ratio is given by the two fractions

$$\frac{\phi\left(\frac{c-\mu}{\sigma}\right)}{1-\Phi\left(\frac{c-\mu}{\sigma}\right)} \qquad \text{and} \qquad \frac{\phi\left(\frac{c-\mu}{\sigma}\right)}{\Phi\left(\frac{c-\mu}{\sigma}\right)}.$$

Above, $c$ denotes a constant (the truncation point), $\phi$ denotes the standard normal density function, and $\Phi$ denotes the standard normal cumulative distribution function.
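The two fractions can be checked numerically. The Python sketch below evaluates them with SciPy and relies on the standard truncated-normal results $E[X \mid X > c] = \mu + \sigma\,\phi(a)/(1-\Phi(a))$ and $E[X \mid X < c] = \mu - \sigma\,\phi(a)/\Phi(a)$, with $a = (c-\mu)/\sigma$. It also shows why an outcome error has a nonzero conditional mean within a regime. Suppose a hypothetical outcome error $\varepsilon$ and a standard normal selection error $u$ are jointly normal with correlation $\rho$. Then $E[\varepsilon \mid u < c] = -\rho\,\sigma_\varepsilon\,\phi(c)/\Phi(c)$. The function names are illustrative only.

```python
import numpy as np
from scipy.stats import norm

def inverse_mills_pair(c, mu=0.0, sigma=1.0):
    """Return phi(a) / (1 - Phi(a)) and phi(a) / Phi(a), where a = (c - mu) / sigma."""
    a = (np.asarray(c, dtype=float) - mu) / sigma
    upper = norm.pdf(a) / norm.sf(a)   # norm.sf(a) equals 1 - Phi(a), computed stably
    lower = norm.pdf(a) / norm.cdf(a)
    return upper, lower

def truncated_means(c, mu=0.0, sigma=1.0):
    """E[X | X > c] and E[X | X < c] for X ~ N(mu, sigma^2)."""
    upper, lower = inverse_mills_pair(c, mu, sigma)
    return mu + sigma * upper, mu - sigma * lower

def selected_error_mean(c, rho, sigma_eps=1.0):
    """E[eps | u < c] when (eps, u) are jointly normal, u ~ N(0, 1), corr(eps, u) = rho."""
    _, lower = inverse_mills_pair(c)
    return -rho * sigma_eps * lower
```

The last function shows that the bias term is a known function of the selection index multiplied by $\rho\,\sigma_\varepsilon$. Under joint normality this coefficient is tied to the normal parameters. Without normality, its coefficient has to be estimated, which is the approach taken here.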
Like Maddala and Nelson (1974), we subtract the inverse Mills ratio from the outcome equations. However, since we do not assume that the error terms in the outcome equations are normally distributed, we need to estimate the coefficient of the inverse Mills ratio. In particular, we propose the following procedure, which is the main contribution of our paper.
First, we estimate a selection equation. In our application, this selection equation predicts whether a farmer is credit constrained. Second, we difference out the fixed effects to obtain unbiased estimates of the coefficients of interest. In our application, these coefficients of interest are the marginal impacts of endowments on the productivity of credit constrained and credit unconstrained farms. This method generalizes fixed effects and random effects models to allow for endogenous switching. Specifically, it takes a fixed effects or random effects model and explicitly incorporates a selection equation as a first stage to account for endogenous selection into either regime (e.g., credit constrained or unconstrained). Further, our method differs from Kyriazidou (1997) because it allows for conditional heteroscedasticity in the outcome equation, which means that it may fit the data better.
In our model, let $z_{it}$ denote the regressors of the selection equation of individual i in period t, and suppose we observe N individuals for T periods. Our procedure allows either for predetermined regressors (step 1A) or for strictly exogenous regressors and correlated random effects (step 1B).
Step 1A (selection equation with predetermined regressors): Estimate a Probit model with predetermined regressors. Let
denote the quasi maximum likelihood estimator, i.e.,
Using , calculate for and .
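The Python sketch below implements step 1A under one reading of it. The assumption is that a separate probit is estimated in each period with period-specific coefficients, which allows the scale of the selection error to differ across periods. It also assumes $d_{it} = 1\{z_{it}'\gamma_t + u_{it} > 0\}$ with $u_{it}$ standard normal within each period. The function names, the array layout, and this sign convention are hypothetical. Under this convention, the inverse Mills terms are $E[u_{it} \mid d_{it}=0] = -\phi(\cdot)/(1-\Phi(\cdot))$ and $E[u_{it} \mid d_{it}=1] = \phi(\cdot)/\Phi(\cdot)$, both evaluated at the fitted index.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def probit_qmle(d, Z):
    """Probit quasi-maximum likelihood for one cross-section.
    d: (N,) array of 0/1 outcomes; Z: (N, K) regressors including a constant."""
    def neg_loglik(g):
        idx = Z @ g
        return -np.mean(d * norm.logcdf(idx) + (1 - d) * norm.logsf(idx))

    def neg_score(g):
        idx = Z @ g
        # derivative of each observation's log-likelihood with respect to its index
        w = (d * np.exp(norm.logpdf(idx) - norm.logcdf(idx))
             - (1 - d) * np.exp(norm.logpdf(idx) - norm.logsf(idx)))
        return -(Z.T @ w) / len(d)

    return minimize(neg_loglik, np.zeros(Z.shape[1]), jac=neg_score, method="BFGS").x

def step_1a(d, Z):
    """Hypothetical step 1A: one probit per period.
    d: (N, T) 0/1 array; Z: (N, T, K) selection regressors (with a constant).
    Returns gamma (T, K), the fitted index (N, T), and the Mills terms per regime."""
    N, T, K = Z.shape
    gamma = np.vstack([probit_qmle(d[:, t], Z[:, t, :]) for t in range(T)])
    index = np.einsum("ntk,tk->nt", Z, gamma)
    mills0 = -np.exp(norm.logpdf(index) - norm.logsf(index))   # E[u | d = 0]
    mills1 = np.exp(norm.logpdf(index) - norm.logcdf(index))   # E[u | d = 1]
    return gamma, index, mills0, mills1
```

The log-density and log-CDF formulation keeps the Mills terms finite for large absolute index values. Averaging the log-likelihood also keeps the gradient on a scale where BFGS's default tolerance is reachable.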
Step 1B (selection equation with correlated random effects): Estimate a Probit model with strictly exogenous regressors, constant slope coefficients and correlated random effects. Let
denote the quasi maximum likelihood estimator, i.e.,
Using , calculate for and .
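One way to implement step 1B is the Mundlak (1978) device, sketched below in Python under stated assumptions. The individual effect in the selection equation is assumed to be a linear function of the time averages $\bar z_i$ plus normal noise. A pooled probit of $d_{it}$ is then run on period dummies, $z_{it}$, and $\bar z_i$, with slopes constant over time. Under this assumption, the pooled probit identifies the coefficients only up to the scale of the composite error. That is enough to compute the fitted index used in the inverse Mills ratios. The function names are hypothetical, and Assumption 2 allows more general correlated random effects than this specification.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def probit_qmle(d, X):
    """Pooled probit quasi-maximum likelihood (average log-likelihood)."""
    def neg_loglik(g):
        idx = X @ g
        return -np.mean(d * norm.logcdf(idx) + (1 - d) * norm.logsf(idx))

    def neg_score(g):
        idx = X @ g
        w = (d * np.exp(norm.logpdf(idx) - norm.logcdf(idx))
             - (1 - d) * np.exp(norm.logpdf(idx) - norm.logsf(idx)))
        return -(X.T @ w) / len(d)

    return minimize(neg_loglik, np.zeros(X.shape[1]), jac=neg_score, method="BFGS").x

def step_1b(d, Z):
    """Hypothetical step 1B with a Mundlak-type correlated random effect.
    d: (N, T) 0/1 array; Z: (N, T, K) time-varying regressors without a constant.
    Returns the coefficients [period intercepts, slopes on z_it, slopes on zbar_i]
    and the fitted index (N, T)."""
    N, T, K = Z.shape
    zbar = np.repeat(Z.mean(axis=1, keepdims=True), T, axis=1)   # (N, T, K)
    dummies = np.broadcast_to(np.eye(T), (N, T, T))             # period intercepts
    X = np.concatenate([dummies, Z, zbar], axis=2).reshape(N * T, T + 2 * K)
    coef = probit_qmle(d.reshape(-1), X)
    return coef, (X @ coef).reshape(N, T)
```

The period intercepts let the threshold vary over time. The $\bar z_i$ columns carry the correlation between the random effect and the regressors.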
Step 2: After step 1A or step 1B, we difference out the fixed effect. The previous literature that relies on Kyriazidou (1997), such as Guirkinger and Boucher (2008), assumes that the propensity of an individual i to be in one category of the selection equation (e.g., to be credit constrained or credit unconstrained) is constant over time. However, if the propensity of individual i to be in one category changes over time, this approach becomes less reliable. Our method therefore relaxes this assumption, differencing out the fixed effect while explicitly accounting for endogenous switching in the selection equation. For example, in our application, the outcome (farm productivity) and the selection equation (credit constrained or unconstrained) might both depend on factors that change over time, such as farmer health, and our method allows for this possibility.
To difference out the fixed effect, for every time period and every individual for which , calculate , and . Next, regress on a constant, , and . The constant takes care of the difference in time dummies, . Then, for every time period and every individual for which , calculate and regress on a constant, , and . This process allows the terms and to difference out the fixed effects and , so that we can build moments that do not depend on these fixed effects.
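The Python sketch below gives one possible reading of step 2 for the unconstrained regime, under stated assumptions. For a pair of periods $s < t$, it keeps individuals with $d_{is} = d_{it} = 0$ and computes $\Delta y = y_{it} - y_{is}$ and $\Delta x = x_{it} - x_{is}$. It then regresses $\Delta y$ on a constant, $\Delta x$, and the inverse Mills terms from step 1 for both periods, each with its own coefficient. Differencing removes the fixed effect, and the constant absorbs the difference in time dummies. The function names and the exact treatment of the Mills terms are assumptions, not the paper's exact specification.

```python
import numpy as np
from scipy.stats import norm

def mills_unconstrained(index):
    """E[u | d = 0] = -phi(index) / (1 - Phi(index)) when d = 1{index + u > 0}."""
    return -np.exp(norm.logpdf(index) - norm.logsf(index))

def step_2_pair(y, x, d, index, s, t):
    """Hypothetical step 2 for periods s and t in the unconstrained regime (d = 0).
    y: (N, T) outcomes; x: (N, T, K) regressors; d: (N, T) 0/1; index: (N, T) from step 1.
    Returns [constant, beta (K values), coef on Mills term at t, coef on Mills term at s]."""
    keep = (d[:, s] == 0) & (d[:, t] == 0)        # unconstrained in both periods
    dy = y[keep, t] - y[keep, s]                  # the fixed effect drops out
    dx = x[keep, t, :] - x[keep, s, :]
    lam_t = mills_unconstrained(index[keep, t])
    lam_s = mills_unconstrained(index[keep, s])
    X = np.column_stack([np.ones(keep.sum()), dx, lam_t, lam_s])
    coef, *_ = np.linalg.lstsq(X, dy, rcond=None)
    return coef
```

For the constrained regime, the analogous regression would keep individuals with $d_{is} = d_{it} = 1$ and use the Mills term $\phi(\cdot)/\Phi(\cdot)$. Dropping the two Mills columns gives a plain fixed effects (first-difference) regression, which is biased when the selection and outcome errors are correlated.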
If the researcher is willing to make stronger assumptions, then other differences can be used as well. In particular, define
, and
for
. Then, define the moment
where
is normalized to be zero and
. Let the moment for the other outcome,
, be similarly defined, where
, and
. One can use this generalized method of moments procedure instead of the least squares regressions in step 2, but we do not consider it in further detail here. In the application, we use a regressor in step 1A that is not used in step 2; this is usually referred to as an exclusion restriction.
We now state the assumptions that underlie the theorem below.
Assumption 1 (Selection equation with predetermined regressors). Let . Let be nonsingular for . Let the parameter space be compact. Define , and let the true value be in the interior of .
Assumption 1 allows for arbitrary correlation of the error in the selection equation, and also allows the variance of this error to vary with time. De Jong and Woutersen (2011) discuss dynamic binary choice models, such as the selection equations discussed here, in more detail. An example of a selection equation that satisfies Assumption 1 is
where
is the health of farmer
i in period 1,
is the harvest of farmer
i in period
t, and
is a standard normal error term that is i.i.d. conditional on the regressors. In Equation (6), the variance of
may be non-constant in t, and our model allows for this possibility. Equation (6) satisfies Assumption 1 because
is defined here as
. However, if
were a singular matrix (e.g., if
), then Assumption 1 would not be satisfied. Assumption 1 corresponds to step 1A.
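A nonsingularity condition of this kind can be checked in a given sample. The Python sketch below assumes, for illustration, that the relevant matrix is the period-by-period second-moment matrix of the selection regressors, whose nonsingularity is a standard requirement for identifying probit coefficients. It uses hypothetical regressors (a constant, period-1 health, and current harvest), mirroring the variables described for Equation (6). The matrix loses rank when a regressor is collinear with the constant, for instance if period-1 health were identical for every farmer. The function name is hypothetical.

```python
import numpy as np

def per_period_ranks(Z):
    """Z: (N, T, K) selection regressors. Return, for each period t, the rank of
    (1/N) * sum_i z_it z_it'. Full rank K in every period is required."""
    N, T, K = Z.shape
    return [int(np.linalg.matrix_rank(Z[:, t, :].T @ Z[:, t, :] / N)) for t in range(T)]
```

A rank below K in some period signals that the selection coefficients for that period cannot be separately identified from the sample.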
Assumption 2 (Selection equation with correlated random effects). Let , where . Let the parameter space, , be compact. Define , , and let the true value be in the interior of . Let be nonsingular for and all .
Assumption 2 allows for correlated random effects because the individual effect in the selection equation can depend on the regressors. Such correlated random effects were proposed by Chamberlain (1980). Mundlak (1978) lets the random effect depend on the time averages of the regressors, and Assumption 2 also allows for this specification. Assumption 2 corresponds to step 1B.
The next assumption states that the error terms are uncorrelated with the regressors after differencing out the fixed effect. Define
where
is calculated if step 1A (predetermined regressors) is used, and
if step 1B (correlated random effects) is used.
Assumption 3. Let if . Let if . Let and be uncorrelated with , , and , and let . Let , and for all , , and . Further, let , , be nonsingular for some and . Let , , be nonsingular for some and .
Assumption 3 helps ensure unbiased estimates of the coefficients of interest in the outcome equation because it implies that the error terms are uncorrelated with the regressors after differencing out the fixed effect. To obtain unbiased estimates of the coefficients of interest, we need to (i) difference out the fixed effect and (ii) allow for endogenous selection at the individual level. Assumption 3 addresses (i), and Assumptions 1 and 2 address (ii).
Assumption 4. Let be i.i.d. across individuals. Let for all , and where .
The above assumption requires boundedness of the fourth moment of , , , and .
Theorem 1 (Consistency and asymptotic normality). (i) Let Assumptions 1 and 3–4 hold. Then the estimator based on step 1A and step 2 is consistent and asymptotically normal, with a positive semidefinite asymptotic variance-covariance matrix. (ii) Let Assumptions 2–4 hold. Then the estimator based on step 1B and step 2 is consistent and asymptotically normal, with a positive semidefinite asymptotic variance-covariance matrix.

The main use of Theorem 1 is to extend fixed effects and random effects models to incorporate endogenous switching between regimes. In particular, our estimator can serve as a check for an endogeneity problem in the linear panel model: unobserved factors may affect both the selection equation and the outcome equation, leading to biased estimates, and our estimator addresses this problem. In Appendix A, we show that our estimator is consistent and asymptotically normal.
Our estimator generalizes Kyriazidou (1997), which imposes exchangeability of the error terms. This assumption implies that the error terms are homoscedastic. By contrast, the outcome equation in this paper allows for conditional heteroscedasticity, and the selection equation of Assumption 1 allows for time-varying variances (see Equation (6)).
To correct the standard errors for our two-step estimator, it is convenient to write the estimator as the maximizer of an objective function. This approach is similar to that used for Heckman’s (1979) sample selection estimator. The objective function that is used to prove asymptotic normality of our estimator, as well as the asymptotic variance-covariance matrix, is presented in Appendix A.
In our application, however, we bootstrap the estimators. That is, we resample the data with replacement and repeat step 1A and step 2 for every bootstrap sample. As Horowitz (2001, Theorem 2.2) shows, bootstrapping an asymptotically normally distributed estimator that can be represented by an influence function yields a consistent variance-covariance matrix and consistent confidence intervals.
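The Python sketch below shows such a bootstrap, assuming that resampling is done over individuals, so that each farmer's full time series is kept together. This matches the requirement in Assumption 4 that observations be i.i.d. across individuals. The `estimator` argument is a hypothetical placeholder for a function that runs step 1A and step 2 on the resampled arrays. The percentile interval is one common choice of bootstrap confidence interval, not necessarily the one used in the application.

```python
import numpy as np

def bootstrap_by_individual(estimator, arrays, n_boot=499, seed=0):
    """Resample individuals (the first axis of every array) with replacement and
    re-run `estimator` on each bootstrap sample.
    estimator: callable taking the resampled arrays, returning a 1-D parameter vector.
    Returns bootstrap standard errors and 95% percentile intervals (2 x P array)."""
    rng = np.random.default_rng(seed)
    N = arrays[0].shape[0]
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, N, size=N)     # same individuals drawn for every array
        draws.append(np.asarray(estimator(*[a[idx] for a in arrays]), dtype=float))
    draws = np.vstack(draws)
    se = draws.std(axis=0, ddof=1)
    ci = np.percentile(draws, [2.5, 97.5], axis=0)
    return se, ci
```

Drawing one set of individual indices per replication and applying it to every array keeps each farmer's outcomes, regressors, and selection indicators aligned across periods.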