FHA Loans in Foreclosure Proceedings: Distinguishing Sources of Interdependence in Competing Risks

A mortgage borrower faces several possible outcomes once foreclosure proceedings are initiated, mainly default and prepayment. Using a sample of FHA mortgage loans, we develop a dependent competing risks framework to examine the determinants of time to default and time to prepayment once foreclosure proceedings are initiated. More importantly, we examine the interdependence between default and prepayment through both the correlation of the unobserved heterogeneity terms and the preventive behavior of individual mortgage borrowers. We find that time to default and time to prepayment are affected by several factors, such as the Loan-To-Value ratio (LTV), FICO score and unemployment rate. In addition, we find strong evidence of interdependence between the default and prepayment hazards through both of these channels. We show that neglecting the interdependence through the preventive behavior of individual mortgage borrowers can lead to biased estimates and misleading inference.


Introduction
A mortgage borrower is technically delinquent once a monthly mortgage payment due date is missed. Most lenders, however, give the borrower a substantial period of time (typically 90 days, but varying by lender) to bring the loan current by making up all missed payments plus the associated late fees. If the borrower is still delinquent after that period, the lender initiates foreclosure proceedings. Loans that are in foreclosure proceedings are not fully terminated. In fact, some of these loans can be reinstated, prepaid or modified (extended term or other alterations to lower the monthly payment), or have other alternative outcomes. These outcomes can be considered competing risks. In this paper, we examine the lifetime of an FHA mortgage loan from the onset of foreclosure until one of the main types of outcomes is observed. In particular, we examine the probability that an FHA mortgage loan in foreclosure proceedings will eventually be prepaid or defaulted to Real Estate Owned (REO).¹
[...] of the unobserved heterogeneity terms and the preventive behavior of individual mortgage borrowers can lead to biased estimates and misleading inference. As for the effects of covariates on the likelihood of default and prepayment, we find that loans with the following characteristics have a higher probability of default: more equity, low FICO score, high unemployment rate in the borrower's geographical area, short delinquency spells, location in a nonjudicial state and a positive interest rate spread. In addition, we find that loans with the following characteristics have a higher probability of prepayment: more equity, high FICO score, high unemployment rate, short delinquency spells, location in a nonjudicial state and a negative interest rate spread.
The remainder of this paper is organized as follows. Section 2 presents the data and some descriptive statistics. Section 3 presents the model. Section 4 reports the empirical results. Section 5 concludes.

Data Description and Summary Statistics
We use a panel dataset of first-lien residential mortgage loans obtained from the OCC Mortgage Metrics data (OCCMM). OCCMM includes loans serviced by seven large banks and covers monthly loan performance from January 2008 until March 2016. The dataset consists of more than 21.1 million first-lien mortgage loans with $3.6 trillion in unpaid principal balances, which make up about 38 percent of all first-lien residential mortgage debt outstanding in the U.S.⁴ For the purposes of this paper, we focus on FHA loans⁵ for which foreclosure proceedings were initiated. There are 231,800 such loans in our sample of 3,359,573 FHA loans.⁶ In our analysis, we exclude loans for which: (i) foreclosure proceedings end for reasons other than default and prepayment; (ii) servicing was transferred to a different servicer; and (iii) values for explanatory variables are missing. After applying these exclusion criteria, our final sample contains 107,627 loans, of which 8,974 are prepaid, 84,012 are defaulted and 14,641 are still in foreclosure proceedings as of March 2016.
We measure the lifetime of an FHA loan as the number of months from the onset of foreclosure until the loan is either defaulted or prepaid. We treat a loan that is still in foreclosure proceedings at the end of the observation period as right censored and measure its lifetime as the number of months from the onset of foreclosure until March 2016. Table 1 presents descriptive statistics of the lifetime by status. The mean lifetime of a loan is about 15 months, which is higher than that reported by Pennington-Cross (2006) for subprime loans. The mean lifetime for prepaid loans is higher than for defaulted loans. This result suggests interdependence between default and prepayment. In other words, if default and prepayment were independent, presumably no loans would make it through to prepayment, since on average they would reach default at 14 months, before prepayment at 18 months. Thus, it might be misleading to assume independence between default and prepayment.
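The lifetime measurement described above can be illustrated with a minimal sketch. The function names, the whole-calendar-month convention and the three example loans are assumptions made for illustration only; they are not taken from the OCCMM data.

```python
from datetime import date

# End of the observation window (March 2016, as stated in the text)
OBSERVATION_END = date(2016, 3, 1)

def months_between(start, end):
    """Whole calendar months from start to end (assumed convention)."""
    return (end.year - start.year) * 12 + (end.month - start.month)

def lifetime(foreclosure_start, event_date=None, event_type=None):
    """Return (lifetime in months, status); status is 'D', 'P' or 'censored'."""
    if event_date is None or event_date > OBSERVATION_END:
        # No default or prepayment observed: right censored at the window end
        return months_between(foreclosure_start, OBSERVATION_END), "censored"
    return months_between(foreclosure_start, event_date), event_type

# Hypothetical loans, for illustration only
loans = [
    (date(2014, 1, 1), date(2015, 3, 1), "D"),   # defaulted
    (date(2013, 6, 1), date(2014, 12, 1), "P"),  # prepaid
    (date(2015, 9, 1), None, None),              # still in foreclosure
]
results = [lifetime(*loan) for loan in loans]
```

In this sketch, a censored loan contributes only the information that its lifetime exceeds the recorded value.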
To examine how default and prepayment rates change with age, Figure 1 shows the smoothed nonparametric estimate of the hazard function.⁷ The figure shows that the probability of default increases during the first 18 months after foreclosure proceedings are initiated and then rapidly decreases. In addition, the figure shows that the probability of prepayment increases as foreclosure proceedings lengthen. The following explanatory variables are used to examine the determinants of the default and prepayment hazards and their interdependence:
• LTV: To measure equity remaining in the property, we calculate the Loan-To-Value ratio (LTV) using the current balance of the loan in each month and the estimated property value.⁸
• FICO score: To proxy for the borrower's overall creditworthiness, we use the borrower's FICO score in each month.
• Unemployment rate: To proxy for financial instability, we use the seasonally-adjusted monthly unemployment rate, lagged by six months, in the state where the property is located.⁹
• Delinquency spell: To measure delinquency behavior, we calculate the fraction of months in delinquency prior to the beginning of foreclosure proceedings.
⁸ The estimated property value is obtained from the Lender Processing Services (LPS) Home Price Index (HPI).
⁹ The seasonally-adjusted monthly unemployment rate is obtained from the Bureau of Labor Statistics (BLS).
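As a rough illustration of how three of these covariates could be constructed from monthly loan records, consider the sketch below. The function names and all input values are hypothetical, and the exact construction used with the OCCMM, LPS and BLS data may differ.

```python
from typing import List, Optional

def loan_to_value(current_balance: float, estimated_value: float) -> float:
    """LTV: current loan balance divided by the estimated property value."""
    return current_balance / estimated_value

def delinquency_spell(delinquent_months: List[bool]) -> float:
    """Fraction of pre-foreclosure months in which the loan was delinquent."""
    return sum(delinquent_months) / len(delinquent_months)

def lagged_value(series: List[float], month_index: int, lag: int = 6) -> Optional[float]:
    """Value of a monthly series `lag` months before `month_index`, if observed."""
    source = month_index - lag
    return series[source] if source >= 0 else None

# Hypothetical inputs, for illustration only
ltv = loan_to_value(95_000.0, 100_000.0)
spell = delinquency_spell([False, True, True, False])
unemployment = [5.0, 5.1, 5.3, 5.6, 6.0, 6.4, 6.9, 7.2]
rate_used_in_month_7 = lagged_value(unemployment, 7)
rate_used_in_month_3 = lagged_value(unemployment, 3)  # lag not yet available
```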
Table 2 provides summary statistics of the explanatory variables. A quick comparison shows that the average characteristics differ between defaulted and prepaid loans. In particular, on average, defaulted loans have less equity, lower FICO scores and higher unemployment rates.

Econometric Methodology
In this section, we propose a dependent competing risks duration model that easily incorporates time-varying covariates and censored observations. More importantly, the model controls for unobserved covariates, allows the default and prepayment hazards to be estimated jointly and accounts for the interdependence of these hazards through both the correlation of the unobserved heterogeneity terms and the preventive behavior of individual mortgage borrowers. We first describe the specification of the model and then derive the likelihood function.

Model Specification
There are two main options available to mortgage borrowers once foreclosure proceedings are initiated, namely default (D) and prepayment (P). The prepayment (P) option can be viewed as "distressed prepayment", since borrowers in foreclosure proceedings want to sell their homes to avoid a default outcome. Suppose that the nonnegative random variables $T_D$ and $T_P$ are the potential lifetimes from the onset of foreclosure until default (D) and prepayment (P), respectively. In the competing risks framework, only the shortest lifetime is actually observed, that is, $T = \min[T_D, T_P]$, together with the corresponding event type $J \in \{D, P\}$. Let $x(t)$ be a vector of observable covariates at time $t$ and $v = (v_D, v_P)$ be a vector of unobservable covariates. The advantage of introducing two unobservable covariates (also called unobserved heterogeneity terms or frailties) is the possibility of exploring the dependence between the default and prepayment hazards, whenever $v_D$ and $v_P$ are positively or negatively correlated. In particular, this specification avoids using a restrictive one-factor model (e.g., Flinn and Heckman 1982; Clayton and Cuzick 1985; Heckman and Walker 1990) and so does not restrict the sign of dependence when a sufficiently flexible class of joint distributions is chosen for the unobserved heterogeneity terms.
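The observation rule of the competing risks framework can be sketched as follows. The function name, the censoring argument and the example lifetimes are hypothetical; the sketch only shows how a pair of latent lifetimes maps to the observed duration and event type.

```python
def observe(t_default, t_prepay, censor_time):
    """Map latent lifetimes (T_D, T_P) to the observed pair (T, J).

    J is 'D' or 'P' for the risk that occurs first, or None when the loan
    is still in foreclosure at the censoring time.
    """
    t = min(t_default, t_prepay)
    if t > censor_time:
        return censor_time, None
    return t, ("D" if t_default <= t_prepay else "P")

# Hypothetical latent lifetimes in months, for illustration only
observed = [
    observe(14, 18, 99),  # default occurs first
    observe(20, 18, 99),  # prepayment occurs first
    observe(30, 40, 24),  # neither occurs before the window closes
]
```

Only one of the two latent lifetimes is ever seen for a given loan, which is why the dependence between them has to be identified through the model structure rather than read off the data directly.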
Assumption 3. The individual heterogeneities are independent of the covariate histories.
Assumption 4. The variables $T_{iD}$ (resp. $T_{iP}$), $i = 1, \ldots, n$, have identical conditional distributions given the individual covariate histories and the individual unobserved heterogeneities.¹⁴
Assumption 5. The type-specific hazard functions conditional on $(x_i(t), v_{iD}, v_{iP})$, $i = 1, \ldots, n$, are mixed proportional hazard functions, given by Equation (1), where $\beta_D$ and $\beta_P$ are the type-specific vectors of regression coefficients and $h_{0D}$ and $h_{0P}$ are the type-specific baseline hazard functions. The parameter $\gamma$ captures the structural dependence of the prepayment hazard rate on the default probability.
Equation (1) accounts for interdependence between the default and prepayment hazards in two ways: (1) the correlation of the unobserved heterogeneity terms: the default and prepayment hazards might share similar or distinctive unobserved loan-specific characteristics, which are identified by the positive or negative correlation of $v_D$ and $v_P$; and (2) the structural dependence: borrowers who are in foreclosure proceedings and foresee a high risk of default might increase their efforts to sell their homes and prepay in order to avoid default. If this hypothesis is true, we should expect $\gamma$ to be significant and positive.
The model defined by Equation (1) nests three restricted models that are commonly used in applied studies. The first restriction sets $\gamma = 0$, which eliminates the structural dependence between the default and prepayment hazards and allows interdependence only through the unobserved heterogeneity terms. The second restriction assumes that the unobserved heterogeneity terms $v_D$ and $v_P$ are independent (i.e., $v_D \perp v_P$). This is a common assumption in empirical competing risks studies.¹⁵ The last restriction ignores the unobserved heterogeneity terms completely. To illustrate the advantage of the model defined by Equation (1) and the potential bias of the restricted models, all four models are estimated.
Assumption 6. The baseline hazard functions follow an expo-power distribution:
$$h_{0j}(t) = \alpha_j t^{\alpha_j - 1} \exp\left(\theta_j t^{\alpha_j}\right),$$
where $j = D, P$, $\alpha_j > 0$, $-\infty < \theta_j < +\infty$.
This parametric specification was introduced by Saha and Hilton (1997). It can represent a variety of hazard patterns, including constant, monotonically increasing, monotonically decreasing, U-shaped, inverted U-shaped or humped. It includes as a special case the Weibull hazard function for $\theta_j = 0$, which is monotone. For $\theta_j \neq 0$, the hazard function has a turning point at $[(1 - \alpha_j)/(\alpha_j \theta_j)]^{1/\alpha_j}$.
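A small numerical check of the turning point may be useful. The sketch assumes the usual expo-power hazard form $h(t) = \alpha t^{\alpha - 1} \exp(\theta t^{\alpha})$, which is consistent with the turning point and the Weibull special case stated above. The parameter values are hypothetical and are not the estimates reported in Table 3.

```python
import math

def expo_power_hazard(t, alpha, theta):
    """Expo-power hazard, assumed form: alpha * t**(alpha-1) * exp(theta * t**alpha)."""
    return alpha * t ** (alpha - 1) * math.exp(theta * t ** alpha)

def turning_point(alpha, theta):
    """Turning point [(1 - alpha) / (alpha * theta)]**(1 / alpha), if it exists.

    Returns None when theta == 0 (Weibull case) or when the ratio is not
    positive, in which case the hazard is monotone on t > 0.
    """
    if theta == 0:
        return None
    ratio = (1 - alpha) / (alpha * theta)
    return ratio ** (1 / alpha) if ratio > 0 else None

# Hypothetical parameters giving an inverted U-shaped hazard
alpha, theta = 2.0, -0.01
t_star = turning_point(alpha, theta)
peak = expo_power_hazard(t_star, alpha, theta)
before = expo_power_hazard(t_star - 0.5, alpha, theta)
after = expo_power_hazard(t_star + 0.5, alpha, theta)
```

With $\alpha > 1$ and $\theta < 0$, the hazard rises up to `t_star` and falls afterwards, which is the inverted U shape; with $\theta = 0$ the function reduces to the Weibull hazard $\alpha t^{\alpha - 1}$.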
Conditional on the observable covariate histories, the distributions of the uncensored and right censored observations are characterized by the probabilities $\Pr(t \le T < t + \Delta t, J = j \mid X(t))$ and $\Pr(T > c \mid X(c))$, respectively. These probabilities are obtained by integrating out $v_D$ and $v_P$.
¹⁵ One of the main reasons for these studies to make the independence assumption, in addition to computational convenience, is the common misunderstanding that dependent competing risks specifications are not identifiable. This non-identifiability property is studied in detail by Tsiatis (1975), who proves that for any joint survival function with arbitrary dependence between the competing risks, one can find a different joint survival function with independent competing risks. If that is the case, then there is no point in complicating the model with the dependence assumption because the data cannot test for it anyway. However, Tsiatis's argument is valid only if the sample is homogeneous. Thus, the problem of non-identifiability can be resolved by introducing heterogeneity through the variation of the observed covariates, as discussed at length by Heckman and Honore (1989), Abbring and Van den Berg (2003) and Colby and Rilstone (2004).
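The probability for uncensored observations, $\Pr(t \le T < t + \Delta t, J = j \mid X(t))$ with $j = D, P$, depends on the covariate histories up to time $t$ only. In addition, the probability for right censored observations, $\Pr(T > c \mid X(c))$, depends on the covariate history up to time $c$ only.
In practice, the model has to be completed by specifying the joint distribution of the unobserved heterogeneity terms. In this subsection, we use an extension of the approach of Heckman and Singer (1984) (see also Nickell 1979; Van den Berg et al. 2004) and assume the following:
Assumption 7. The joint distribution of the unobserved heterogeneity terms is bivariate discrete, in which $v_D$ and $v_P$ can each take only two values.
Let $v_D^1$ and $v_D^2$ denote the values of $v_D$, and $v_P^1$ and $v_P^2$ denote the values of $v_P$. Conditional on covariate histories, the set of individual mortgage loans can be divided into four classes that correspond to $(v_D^1, v_P^1)$, $(v_D^1, v_P^2)$, $(v_D^2, v_P^1)$ and $(v_D^2, v_P^2)$, respectively. The sizes of these classes are unknown a priori and will be approximated by means of their associated probability estimates. Under Assumption 7, the joint distribution of $(v_D, v_P)$ is characterized by the following elementary probabilities:
$$p_{kl} = \Pr(v_D = v_D^k, v_P = v_P^l),$$
with $0 \le p_{kl} \le 1$ for $k, l = 1, 2$ and $\sum_{k=1}^{2} \sum_{l=1}^{2} p_{kl} = 1$.¹⁶,¹⁷ Under Assumption 7, the characteristics of the uncensored and right censored distributions become:
$$\Pr(t \le T < t + \Delta t, J = j \mid X(t)) = \sum_{k=1}^{2} \sum_{l=1}^{2} p_{kl} \Pr(t \le T < t + \Delta t, J = j \mid X(t), v_D^k, v_P^l), \quad j = D, P,$$
$$\Pr(T > c \mid X(c)) = \sum_{k=1}^{2} \sum_{l=1}^{2} p_{kl} \Pr(T > c \mid X(c), v_D^k, v_P^l).$$
¹⁶ To ensure that the probabilities lie in $[0, 1]$ and sum to one, we apply the logistic transformation, i.e.,
$$p_{kl} = \frac{\exp(q_{kl})}{\sum_{m=1}^{2} \sum_{n=1}^{2} \exp(q_{mn})},$$
where $-\infty < q_{kl} < +\infty$, for $k, l = 1, 2$.
¹⁷ The covariance of $v_D$ and $v_P$ can be derived as (see Van den Berg et al. 1994)
$$\mathrm{cov}(v_D, v_P) = (p_{11} p_{22} - p_{12} p_{21})(v_D^1 - v_D^2)(v_P^1 - v_P^2).$$
Therefore, the correlation between $v_D$ and $v_P$ becomes:
$$\rho(v_D, v_P) = \frac{\mathrm{cov}(v_D, v_P)}{\sqrt{\mathrm{var}(v_D)\,\mathrm{var}(v_P)}},$$
with
$$\mathrm{var}(v_D) = (p_{11} + p_{12})(p_{21} + p_{22})(v_D^1 - v_D^2)^2$$
and
$$\mathrm{var}(v_P) = (p_{11} + p_{21})(p_{12} + p_{22})(v_P^1 - v_P^2)^2.$$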
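The two-point heterogeneity distribution can be made concrete with a short sketch. It assumes that the logistic transformation is the standard multinomial form $p_{kl} = \exp(q_{kl}) / \sum_{m,n} \exp(q_{mn})$, and it computes the correlation directly from the moments of a two-point bivariate distribution. The function names and all numerical values are hypothetical.

```python
import math

def class_probabilities(q):
    """Map unrestricted q_kl to class probabilities p_kl (assumed multinomial logit)."""
    shift = max(q.values())  # subtract the maximum for numerical stability
    weights = {kl: math.exp(value - shift) for kl, value in q.items()}
    total = sum(weights.values())
    return {kl: w / total for kl, w in weights.items()}

def heterogeneity_correlation(p, v_d, v_p):
    """Correlation of (v_D, v_P); v_d = (v_D^1, v_D^2), v_p = (v_P^1, v_P^2)."""
    d_gap = v_d[0] - v_d[1]
    p_gap = v_p[0] - v_p[1]
    cov = (p[1, 1] * p[2, 2] - p[1, 2] * p[2, 1]) * d_gap * p_gap
    var_d = (p[1, 1] + p[1, 2]) * (p[2, 1] + p[2, 2]) * d_gap ** 2
    var_p = (p[1, 1] + p[2, 1]) * (p[1, 2] + p[2, 2]) * p_gap ** 2
    return cov / math.sqrt(var_d * var_p)

# Equal q_kl give equal class sizes, so v_D and v_P are uncorrelated
p_equal = class_probabilities({(1, 1): 0.0, (1, 2): 0.0, (2, 1): 0.0, (2, 2): 0.0})
rho_equal = heterogeneity_correlation(p_equal, (1.0, 2.0), (3.0, 5.0))

# All mass on the (1, 1) and (2, 2) classes gives perfect positive correlation
p_diag = {(1, 1): 0.5, (1, 2): 0.0, (2, 1): 0.0, (2, 2): 0.5}
rho_diag = heterogeneity_correlation(p_diag, (1.0, 2.0), (3.0, 5.0))
```

The sign of the correlation is driven by $p_{11} p_{22} - p_{12} p_{21}$ together with the ordering of the mass points, so the specification can accommodate both positive and negative dependence.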

The Likelihood Function
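We derive the likelihood function as follows:
$$L = \prod_{i \in W_{11}} \Pr(t_i \le T_i < t_i + \Delta t, J_i = D \mid X_i(t_i)) \prod_{i \in W_{12}} \Pr(t_i \le T_i < t_i + \Delta t, J_i = P \mid X_i(t_i)) \prod_{i \in W_2} \Pr(T_i > c_i \mid X_i(c_i)),$$
where $W_{11}$ is the set of 84,012 uncensored loans that are defaulted (D), $W_{12}$ is the set of 8,974 loans that are prepaid (P) and $W_2$ is the set of 14,641 right-censored loans that are still in foreclosure proceedings as of March 2016.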
There are three important points that should be noted about the likelihood function: (1) In order to avoid identification problems, we assume no constant covariates, that is, no intercept in the proportionality term. The levels of the intensities are captured by the values $v_D^1$, $v_D^2$, $v_P^1$, $v_P^2$, which are left unconstrained; (2) The likelihood function is valid when the covariates are observed continuously from the time foreclosure proceedings are initiated. This condition is automatically satisfied by covariates $x_i$ that depend on individuals only. However, covariates that depend on time are usually observed in discrete time. In this case, the likelihood function has to be approximated by assuming that the covariates are constant between two consecutive observation dates; (3) There is no closed-form expression for the integral of the prepayment hazard function. Thus, the integral is evaluated using the trapezoidal rule.¹⁸
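The trapezoidal rule mentioned in point (3) can be sketched as follows. Since the prepayment hazard itself is not reproduced here, the sketch uses an expo-power baseline hazard of the assumed form $\alpha t^{\alpha - 1} \exp(\theta t^{\alpha})$ as a stand-in integrand, chosen because its integral $(\exp(\theta t^{\alpha}) - 1)/\theta$ is known in closed form and can serve as a check. The parameter values and the number of grid points are hypothetical.

```python
import math

def expo_power_hazard(t, alpha, theta):
    """Stand-in integrand: expo-power hazard alpha * t**(alpha-1) * exp(theta * t**alpha)."""
    return alpha * t ** (alpha - 1) * math.exp(theta * t ** alpha)

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule for the integral of f over [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

# Hypothetical parameters; integrate the hazard over the first 24 months
alpha, theta = 2.0, -0.01
approx = trapezoid(lambda t: expo_power_hazard(t, alpha, theta), 0.0, 24.0, 2400)
exact = (math.exp(theta * 24.0 ** alpha) - 1.0) / theta
```

When covariates are held constant between monthly observation dates, the same rule can be applied month by month, and the integrated hazards summed.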

Empirical Analysis
Here, we report and discuss the maximum likelihood estimates of the general model and its associated nested models. The general model, Model (1), is the unrestricted model introduced in the previous section. Model (2) is the model in which there is no structural dependence between the default and prepayment hazards, i.e., $\gamma = 0$. Model (3) is the model in which the unobserved heterogeneity terms $v_D$ and $v_P$ are assumed independent. This independence assumption is equivalent to the condition $p_{11} p_{22} - p_{12} p_{21} = 0$, whenever $v_D^1 \neq v_D^2$ and $v_P^1 \neq v_P^2$. Under Model (3), the two competing risks are independent conditional on the observed covariates. Finally, Model (4) is the model without unobserved heterogeneity terms. Tables 3 and 4 provide estimation results for Models (1) and (2) and for Models (3) and (4), respectively.¹⁹ The intercepts are set equal to zero in all models with unobserved heterogeneity terms (that is, Models (1), (2) and (3)), since the intercepts cannot be distinguished from multiplicative constants in the unobserved heterogeneities.
Based on the likelihood ratio tests, all the restricted models are rejected in favor of Model (1) (see Appendix A for details on comparing Models (1), (2), (3) and (4) based on the likelihood ratio tests). Thus, we conclude that both dependent unobserved heterogeneities and the structural dependence of the prepayment hazard rate on the default probability exist. In particular, the results confirm that neglecting the structural dependence can lead to overestimation of the correlation of the unobserved heterogeneity terms. This can be seen by comparing Model (1) to Model (2): in Model (1), the magnitude of the correlation parameter, $\rho$, is smaller but remains significant, and the coefficient of the structural dependence, $\gamma$, is positive and statistically significant.
In the following, we focus on the results of Model (1) to analyze the effects of covariates. The results in Table 3 show that the higher the equity in the property (as measured by LTV), the higher the probabilities of default and prepayment. These results suggest that lenders seek to own properties that have more equity to lower their loss rate, and borrowers prefer to sell properties that have more equity to lower their mortgage debt. In terms of credit scores, the results indicate that loans with higher FICO scores are less likely to default and more likely to prepay. The unemployment rate is used as a proxy for financial instability, and the results suggest that a higher unemployment rate increases the probabilities of default and prepayment. The share of months in which the loan was delinquent prior to foreclosure proceedings also affects the likelihood of default and prepayment. In particular, loans with long delinquency spells are less likely to default and to prepay. Loans in judicial states have lower probabilities of default and prepayment than loans in nonjudicial states, suggesting that the foreclosure process lasts longer in states in which foreclosure is processed through the state court system. In terms of interest rates, an increase in the interest rate spread increases the probability of default and decreases the probability of prepayment.
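The logic of a likelihood ratio comparison between nested models can be sketched for the simplest case of a single restriction, such as $\gamma = 0$. The log-likelihood values below are hypothetical placeholders, not the values from Appendix A, and the p-value uses the identity $\Pr(\chi^2_1 > x) = \mathrm{erfc}(\sqrt{x/2})$, which holds for one degree of freedom only.

```python
import math

CHI2_1_CRITICAL_5PCT = 3.841  # five percent critical value of chi-square(1), rounded

def lr_test_one_restriction(loglik_restricted, loglik_unrestricted):
    """Likelihood ratio statistic and p-value for a single restriction."""
    stat = 2.0 * (loglik_unrestricted - loglik_restricted)
    # Survival function of chi-square with 1 degree of freedom
    p_value = math.erfc(math.sqrt(max(stat, 0.0) / 2.0))
    return stat, p_value

# Hypothetical log-likelihoods, for illustration only
stat, p_value = lr_test_one_restriction(-100.0, -98.0)
reject_restricted = stat > CHI2_1_CRITICAL_5PCT
```

Comparisons that impose more than one restriction at once would require a chi-square distribution with the corresponding degrees of freedom, which this sketch does not cover.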
Table 3 also lists the estimated parameters of the expo-power distribution. Using these estimates from Model (1), Figure 2 presents the baseline hazards for default and prepayment. As shown in Figure 2, the baseline hazard for default appears to be inverted U-shaped. That is, the likelihood of default increases in the first months, reaches a peak and then decreases. The baseline hazard for prepayment features an initial increase followed by a gradual decrease. Note that at all time points, the baseline hazard for prepayment is higher than the baseline hazard for default. This means that, in the absence of covariates, the chance of prepayment for loans in foreclosure proceedings is higher than the chance of default.
$\rho(v_D, v_P)$ and $\gamma$ in Table 3 denote interdependence between the default and prepayment hazards through the correlation of the unobserved heterogeneity terms and through the preventive behavior of individual mortgage borrowers, respectively. The positive and significant estimated correlation between the unobserved heterogeneity terms suggests that there are some unobservable loan-specific characteristics that affect both the default and prepayment hazards in the same direction. The positive and significant sign of $\gamma$ implies that a higher risk of default leads to a higher probability of prepayment. This result supports the hypothesis of structural dependence induced by the preventive behavior of individual mortgage borrowers.

Conclusions
Using panel data on FHA mortgage loans, we specify a dependent competing risks framework to examine the determinants of the default and prepayment hazards once foreclosure proceedings are initiated. More importantly, we examine the interdependence between default and prepayment by incorporating both the correlation of the unobserved heterogeneity terms associated with each risk and the preventive behavior of individual mortgage borrowers.
Our most important empirical finding is that the default and prepayment hazards are interdependent in two distinct ways. First, we find a significant positive correlation between the unobserved heterogeneity terms. This finding suggests that there are some unobservable loan-specific characteristics that affect both the default and prepayment hazards in the same direction. Second, we find a significant positive structural dependence, suggesting that a higher risk of default leads to a higher probability of prepayment. We show that neglecting the interdependence through the correlation of the unobserved heterogeneity terms and through the preventive behavior of individual mortgage borrowers can lead to biased estimates and misleading inference. As for the effects of covariates, we find that equity, FICO score, unemployment rate, delinquency spells, judicial status and interest rate spread affect the default and prepayment hazards.
[...] ratio test is larger than the critical value of $\chi^2_1$ at the five percent level, which supports the existence of interdependence between the default and prepayment hazards through both the correlation of the unobserved heterogeneities and the preventive behavior of individual mortgage borrowers.

Figure 1. Smoothed nonparametric hazard function. The figure displays the smoothed nonparametric estimates of the default and prepayment hazard functions. The estimates are based on the Nelson-Aalen estimator. To smooth the Nelson-Aalen estimator, we specify an Epanechnikov kernel function with the default bandwidth in STATA.
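The estimator behind Figure 1 can be sketched in a few lines. This is a minimal illustration of the Nelson-Aalen increments and their kernel smoothing, not a reproduction of the STATA routine: the bandwidth is a hypothetical value rather than STATA's default, no boundary correction is applied, and the example durations are made up. For a cause-specific hazard, the event indicator is 1 for the outcome of interest and 0 for the competing outcome and for censored loans.

```python
def nelson_aalen_increments(times, events):
    """Return [(t, d_t / n_t)] at each time with at least one event of interest.

    d_t is the number of events at t and n_t the number of loans still at risk.
    """
    data = sorted(zip(times, events))
    n = len(data)
    increments = []
    i = 0
    while i < n:
        t = data[i][0]
        at_risk = n - i
        d = 0
        j = i
        while j < n and data[j][0] == t:
            d += data[j][1]
            j += 1
        if d > 0:
            increments.append((t, d / at_risk))
        i = j
    return increments

def epanechnikov(u):
    """Epanechnikov kernel, 0.75 * (1 - u**2) on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def smoothed_hazard(t, increments, bandwidth):
    """Kernel-weighted average of the Nelson-Aalen increments around t."""
    return sum(epanechnikov((t - s) / bandwidth) * dh for s, dh in increments) / bandwidth

# Hypothetical durations (months) and default indicators, for illustration only
increments = nelson_aalen_increments([1, 2, 2, 3], [1, 1, 0, 1])
hazard_at_2 = smoothed_hazard(2.0, increments, bandwidth=1.0)
```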

Assumption 1. (a) The unobserved heterogeneity terms are time invariant and depend on the individual mortgage loans $i$. (b) The individual heterogeneities $(v_{iD}, v_{iP})$, $i = 1, \ldots, n$, are independent and have the same distribution $G(v_D, v_P)$.¹³

Figure 2. Baseline hazards for default and prepayment. The figure displays the estimates of the baseline hazards for default and prepayment. The estimate of the baseline hazard for event type $j$ ($j = D, P$) is obtained using the maximum likelihood estimates of $\alpha_j$ and $\theta_j$ from Model (1) and the lifetimes of loans from the onset of foreclosure.

Table 1. Statistics for lifetimes of individual FHA loans by status.

Mean  Median  Std. Dev.  Min  Max  Qu. (25%)  Qu. (75%)
The table provides the descriptive statistics of lifetimes (in months) of all FHA loans from January 2008 to March 2016. The descriptive statistics include the mean, median, standard deviation, minimum, maximum, first quartile (25%) and third quartile (75%). All loans refer to all defaulted, prepaid and right censored loans in the sample.

• Judicial status: To examine state foreclosure laws, we use an indicator variable equal to one if the state is a judicial foreclosure state, and zero otherwise.¹⁰
• Interest rate spread: To measure the change in the market interest rate, we use an indicator variable equal to one if the current interest rate is higher than the current 30-year fixed rate, and zero otherwise.

Table 2. Statistics of the explanatory variables by status. LTV, Loan-To-Value ratio. The table provides the descriptive statistics for LTV, FICO score, unemployment rate, delinquency spell, judicial states and interest rate spread from January 2008 to March 2016. The descriptive statistics include the mean, standard deviation, minimum and maximum. The reported descriptive statistics for time-dependent variables are averaged over loans. All loans refer to all defaulted, prepaid and right censored loans in the sample.

Table 3. Dependent competing risks estimates: Models (1) and (2). The table provides the maximum likelihood estimates for Model (1) and Model (2). Model (1) is the model with dependent unobserved heterogeneities and structural dependence, and Model (2) is the model with dependent unobserved heterogeneities only. The numbers in parentheses are the standard errors of the estimated coefficients. *, ** and *** indicate that the coefficients are statistically significant at the 10%, 5% and 1% levels, respectively.