Limiting Loss Distribution of Default and Prepayment for Loan Portfolios and Its Application in RMBS

Chenxi Xia; Xin Zang; Lan Bu; Qinhan Duan; Jingping Yang

doi:10.3390/risks13080153

Abstract

This paper studies the joint distribution of the default and prepayment losses for a large portfolio of loans, based on a bottom-up approach. The repayment behaviors of loans in the portfolio are determined by both systematic and idiosyncratic risk factors and are conditionally independent given the systematic factors. The joint two-dimensional limit distributions of the portfolio default and prepayment losses are obtained, including the strong law of large numbers and the central limit theorem. A numerical study of the portfolio losses is performed for some simplified models. Finally, we conduct an empirical analysis of residential mortgage-backed securities (RMBS) based on Freddie Mac’s dataset. The empirical results reveal the impacts of different factors on the default and prepayment behaviors, and the distributions of the portfolio losses are simulated based on the empirical estimation results to show their difference from the log-normal distribution.

Keywords:

loan portfolio; default risk; prepayment risk; limiting loss distribution; RMBS

1. Introduction

For individual customers, mortgages and personal loans are two common options for borrowing money from financial institutions. A mortgage is a secured loan typically used for real estate financing, while a personal loan is an unsecured loan that can be flexibly used for various purposes. If the borrower fails to repay the mortgage, the lender has the right to foreclose on the loan and repossess the property. The repayment process of a given loan may be subject to both default risk and prepayment risk. Default occurs when the borrower lacks sufficient funds to meet their debt obligations, or when the property value falls below the outstanding loan balance, potentially leading to a strategic default. Prepayment may occur, for instance, when there is a drop in market interest rates, allowing the borrower to pay off the existing loan and refinance at a lower rate. Our study focuses on a large number of similar loans, which can be pooled and securitized as a portfolio for investing and trading, e.g., an asset-backed security (ABS) consisting of a specific pool of underlying loans. The total loss of the portfolio includes the default loss and the prepayment loss, which arise from the potential losses of each loan in the pool.

In early periods, the literature on modeling the loan for individual customers mainly focused on either default or prepayment. For example, in applying the option pricing models to mortgage defaults, Cunningham and Hendershott (1984) and Epperson et al. (1985) priced the default as a put option. The motivation is that the default occurs if the house price falls sufficiently below the mortgage value so that the decision of default follows a similar manner as the exercise of a compound European put option. Under the option pricing framework, Foster and Van Order (1984) and Quigley and Van Order (1995) empirically estimated the default models. In contrast, Dunn and McConnell (1981), Buser and Hendershott (1984), and Brennan and Schwartz (1985) priced the mortgages with prepayment risks by using the option-based approach. The introduction of the option pricing method lies in the fact that the prepayment gives the option to buy the remaining part of the loan (including the outstanding debt and prepayment cost) and therefore can be interpreted as an American call option on a bond. The estimation of prepayment models can be found in Schwartz and Torous (1989) and Quigley and Van Order (1990).

Since the 1990s, a number of works jointly modeled the default and prepayment as competing risks for the personal loan or mortgage. Deng et al. (1996) and Deng et al. (2000) extended the traditional option models to jointly consider default and prepayment as dependent risks in a proportional hazard framework for mortgages. Banasik et al. (1999) introduced the competing risks approach of survival analysis to study the prepayment and default for personal loans, where the applications of Cox’s proportional hazard (CPH) model were further extended by Stepanova and Thomas (2002). In later extensions, Quercia and Spader (2008) applied the multinomial logit model to investigate the homeownership education and counseling completion on the prepayment and default. Steinbuks (2015) studied the effect of prepayment penalty restrictions on the probability of prepayment and default. Zhang et al. (2019) proposed a mixture cure PH model under competing risks for an online loan. Some techniques for estimating the competing risks models include, e.g., a maximum likelihood technique based on constrained optimization in Thackham and Ma (2022) and a nonparametric life-table method in Li et al. (2023).

The existing research on the limiting portfolio loss mainly focused on default losses. Vasicek (1991) initiated this field by deriving the asymptotic distribution of portfolio credit losses under a single-factor Gaussian copula model. Credit Suisse Financial Products (1997) introduced a Poisson mixture model framework for analyzing the limiting portfolio loss. Gordy (2003) developed a single-factor model for the limiting loss of homogeneous portfolios, providing the theoretical basis for regulatory capital framework under Basel II/III.1 Giesecke et al. (2015) derived a law of large numbers for default losses in heterogeneous portfolios using the stochastic PDE methods. Sirignano and Giesecke (2019) proposed data-driven asymptotic approaches for the large loan pools, demonstrating their computational efficiency in computing the distributions of default rates, prepayment rates, and loss from default. In the industry, the top-down approach has been widely used to assess the limiting portfolio loss, focusing on directly modeling the distribution of total portfolio loss (e.g., the log-normal distribution for modeling the default loss in Moody’s Investors Service 2024).

In this paper, we model the limiting loss distribution for a large portfolio of loans by considering both default and prepayment risk as competing events (i.e., only the event that occurs first is observed) and apply the model to a specific loan dataset. Our framework adopts a bottom-up approach that starts by modeling the repayment behaviors of individual loans. At each discrete payment date, the repayment of a loan may end with one of three events: scheduled repayment completion, default, or prepayment. Building on this structure, we develop a joint model for the probabilities and loss values of both default and prepayment. The repayment behaviors of all the loans are determined by both systematic and idiosyncratic factors and are conditionally independent given the systematic factors. Under certain regularity conditions, the limit properties of the portfolio loss are discussed, including the strong law of large numbers (SLLN) and the central limit theorem (CLT). Further, we conduct an empirical analysis to estimate the probabilities of default and prepayment using residential mortgage-backed security (RMBS) data. Based on the estimation results, we simulate the portfolio loss distribution, which is close to the log-normal distribution.

Our main contributions can be summarized as follows.

(a) We propose a discrete-time bottom-up portfolio loss model incorporating both default and prepayment risks. It is different from the existing frameworks for default losses in the single-period (Gordy 2003) or continuous-time (Giesecke et al. 2013) contexts. Several illustrative examples demonstrate that our discrete-time structure better captures the individual loan repayment behaviors, such as the monthly scheduled repayments and periodic event observations.

(b) Given the conditionally independent structure, we investigate the limit properties of the portfolio loss as the pool size tends to infinity, including the SLLN and the CLT along with an upper bound on the approximation error. In particular, under the assumption that the empirical distribution of the individual factors converges to a given distribution, the SLLN and CLT show that the limiting loss distribution of the portfolio is determined only by the systematic factors. We further verify this result by numerical simulations of simplified models and find that the limiting distribution is approximately log-normal. Moreover, the copula of the default and prepayment loss distributions is exhibited.

(c) Our empirical analysis applies the proposed framework to the empirical data, employing a CPH model to estimate the factor coefficients and baseline hazards. The estimated coefficients in the CPH model indicate that the systematic factors significantly influence both hazards. In terms of the influence of individual factors, the default risk is affected by loan-level credit quality, and the prepayment risk mainly depends on refinancing capacity. The out-of-sample validation reveals the good predictive performance of the model, and the estimated loss distribution is consistent with the CLT result. These findings align with the established theoretical and empirical evidence (see Calhoun and Deng 2002; Campbell and Cocco 2015). Finally, we perform the Monte Carlo simulation of portfolio default and prepayment losses based on the empirically estimated parameters. The asymptotic portfolio loss distributions display reasonable agreement with the log-normal distribution, which complies with Moody’s methodology (Moody’s Investors Service 2024).

The rest of this paper is organized as follows. Section 2 introduces the portfolio loss model considering both default and prepayment risk. Section 3 derives the limit properties of the loss distribution under the conditionally independent structure. Section 4 presents the empirical work and a further simulation study. Section 5 concludes.

2. The Portfolio Loss Model Considering Default and Prepayment

In this section, a general model is proposed for a portfolio of loans, where the repayment processes are exposed to both default risk and prepayment risk. The repayment behaviors of all loans are independent conditional on some common systematic factors, and each loan is also affected by some extra idiosyncratic factors.

2.1. The Portfolio Loss for a Pool of Loans

Building on the highly diversified characteristics of loan pools, we specify a general portfolio loss model that reflects the collective behavior of underlying assets. Assume that there are N unit-principal loans in the asset pool, which are issued at the same time and repaid periodically. For simplicity, the common scheduled periodic repayment dates of all loans are denoted as $1, 2, \dots, K$, where $K \in \mathbb{Z}_{+}$ is the terminal time.

During the loan repayment process, default or prepayment may occur on each loan. For the ith loan, let $τ_{i}^{(1)} \in \mathbb{Z}_{+}$ and $τ_{i}^{(2)} \in \mathbb{Z}_{+}$ be its time of potential default and time of potential prepayment, respectively. Since default and prepayment are mutually exclusive, we assume that they do not happen simultaneously.

Assumption 1.

$τ_{i}^{(1)} \neq τ_{i}^{(2)}$ a.s., $1 \leq i \leq N$.

Based on the potential event times $τ_{i}^{(1)}, τ_{i}^{(2)}$, define

$τ_{i} := \min \{τ_{i}^{(1)}, τ_{i}^{(2)}, K\} \quad \text{and} \quad D_{i} := \begin{cases} 0, & \text{if } τ_{i}^{(1)} > K \text{ and } τ_{i}^{(2)} > K, \\ 1, & \text{if } τ_{i}^{(1)} < τ_{i}^{(2)} \text{ and } τ_{i}^{(1)} \leq K, \\ 2, & \text{if } τ_{i}^{(2)} < τ_{i}^{(1)} \text{ and } τ_{i}^{(2)} \leq K . \end{cases}$

(1)

Here, $τ_{i}$ represents the time until termination, and $D_{i}$ indicates the type of termination: default if $D_{i} = 1$, prepayment if $D_{i} = 2$, and normal repayment if $D_{i} = 0$. A model for $(τ_{i}, D_{i})$ belongs to the class of multiple decrement models (see, e.g., Deshmukh (2012) and Chapter 7.2 of Rotar (2014)). For the ith loan, denote by $b_{i}^{(1)}$ and $b_{i}^{(2)}$ the loss given default and the loss given prepayment, which are random variables taking values in $[0, 1]$. Then, the discounted default loss and prepayment loss at time 0 are defined as

$l_{i}^{(1)} := 1_{\{D_{i} = 1\}} b_{i}^{(1)} e^{- r τ_{i}} \quad \text{and} \quad l_{i}^{(2)} := 1_{\{D_{i} = 2\}} b_{i}^{(2)} e^{- r τ_{i}}$

(2)

respectively, where r is the constant risk-free interest rate. Denote by $l_{i} = (l_{i}^{(1)}, l_{i}^{(2)})$ the ith loss vector. Then, the portfolio loss vector $L_{N} = (L_{N}^{(1)}, L_{N}^{(2)})$ for the mortgage pool of N loans is given by

$L_{N} := \frac{1}{N} \sum_{i = 1}^{N} l_{i},$

(3)

where $L_{N}^{(1)} = \sum_{i = 1}^{N} l_{i}^{(1)} / N$ represents the portfolio default loss and $L_{N}^{(2)} = \sum_{i = 1}^{N} l_{i}^{(2)} / N$ represents the portfolio prepayment loss. Throughout this paper, bold symbols are used for (column) vectors.
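As a minimal sketch of definitions (1)–(3), the following Python snippet maps potential event times to $(τ_{i}, D_{i})$ and averages the discounted losses over the pool. The function names and the toy inputs are illustrative assumptions, and the losses given default and prepayment are passed in as already-realized numbers rather than drawn from a model.

```python
import math

def loan_outcome(tau1, tau2, K):
    """Return (tau, D) as in (1): tau1/tau2 are the potential default/prepayment times."""
    if tau1 == tau2:
        raise ValueError("Assumption 1 requires tau1 != tau2")
    tau = min(tau1, tau2, K)
    if tau1 > K and tau2 > K:
        return tau, 0            # normal repayment up to the terminal time K
    if tau1 < tau2:
        return tau, 1            # default happens first (and no later than K)
    return tau, 2                # prepayment happens first (and no later than K)

def discounted_losses(tau1, tau2, b1, b2, K, r):
    """Discounted default and prepayment losses (l1, l2) of one loan, as in (2)."""
    tau, D = loan_outcome(tau1, tau2, K)
    disc = math.exp(-r * tau)
    l1 = b1 * disc if D == 1 else 0.0
    l2 = b2 * disc if D == 2 else 0.0
    return l1, l2

def portfolio_loss(loans, K, r):
    """Average loss vector L_N in (3); loans is a list of (tau1, tau2, b1, b2)."""
    total1 = total2 = 0.0
    for tau1, tau2, b1, b2 in loans:
        l1, l2 = discounted_losses(tau1, tau2, b1, b2, K, r)
        total1 += l1
        total2 += l2
    n = len(loans)
    return total1 / n, total2 / n

# Hypothetical three-loan pool: one default, one prepayment, one normal repayment.
toy_pool = [(2, 5, 0.4, 0.1), (7, 3, 0.4, 0.1), (9, 8, 0.4, 0.1)]
print(portfolio_loss(toy_pool, K=6, r=0.0))
```

Because at most one of the two indicators in (2) is nonzero for a given loan, each loan contributes to at most one component of the portfolio loss vector.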

In practice, loans are classified as defaults only after a period of delinquency or inactivity. While some studies model the delinquency stages when analyzing the repayment behavior, our analysis mainly focuses on the terminal loss outcomes (default/prepayment). Since the observed default losses include those incurred during the delinquency periods, we adopt a single-step default framework for simplicity.

Remark 1.

We develop a discrete-time multi-period model, which naturally incorporates discounting by the risk-free interest rate. Notably, when $K = 1$, the model reduces to a single-period framework where we can set $r = 0$ (i.e., no discounting), making it consistent with many classical single-period models in the literature (e.g., Gordy 2003; Vasicek 1991).

2.2. The Correlation Structure for the Portfolio Loss

In the following, we assume that the payment cash flows of the N loans are influenced by some common systematic and macroeconomic factors $Y$, e.g., the refinancing interest rate, GDP growth rate, unemployment rate, and industry-performance factors, which can vary through time. For $1 \leq i \leq N$, the repayment behavior of the ith loan is also determined by some extra idiosyncratic factors $x_{i}$, which remain unchanged through time. Here, $x_{1}, x_{2}, \dots, x_{N}$ have the same category and dimension. For example, Sirignano et al. (2016) and Sirignano and Giesecke (2019) consider the credit score, LTV ratio, initial loan rate, type of loan, collateral type, and geographic location as the individual factors. Regarding the factors $Y$ and $x_{i}$, we make the following two assumptions.

Assumption 2.

For $1 \leq i \leq N$, $j = 1, 2$, and $1 \leq k \leq K$, there exist continuous functions $q^{(j)} (k, x, y)$, $b^{(j)} (k, x, y)$, and $B^{(j)} (k, x, y)$ such that

$P (τ_{i}^{(j)} = k | τ_{i} > k - 1, Y = y) = q^{(j)} (k, x_{i}, y),$

(4)

$E [b_{i}^{(j)} | τ_{i} = k, D_{i} = j, Y = y] = b^{(j)} (k, x_{i}, y),$

(5)

$E [{(b_{i}^{(j)})}^{2} | τ_{i} = k, D_{i} = j, Y = y] = B^{(j)} (k, x_{i}, y) .$

(6)

Assumption 3.

The N random vectors $(τ_{i}^{(1)}, τ_{i}^{(2)}, b_{i}^{(1)}, b_{i}^{(2)}), 1 \leq i \leq N$ are mutually independent conditional on $Y$.

Assumption 2 means that the loss probabilities of the ith loan are fully determined by the individual factors $x_{i}$ and systematic factors $Y$. Given $Y = y$, the repayment behavior of the ith loan is illustrated in Figure 1. At each time $k = 1, \dots, K$, the ith loan is exposed to the default risk and prepayment risk, with the probability $q^{(1)} (k, x_{i}, y)$ of default and the probability $q^{(2)} (k, x_{i}, y)$ of prepayment. Although the actual loss values are not uniquely determined by the individual factors $x_{i}$ and systematic factors $Y$, the first and second moments of the loss distribution (equivalently, its expectation and variance) are fully specified by these factors.

Figure 1. The repayment behavior of the ith loan.

Assumption 3 implies that the repayment behaviors of the N loans are independent conditional on the systematic factors $Y$, which is a standard modeling approach in the literature (see Credit Suisse Financial Products 1997; Gordy 2003). Note that for a given $1 \leq i \leq N$, the components of $(τ_{i}^{(1)}, τ_{i}^{(2)}, b_{i}^{(1)}, b_{i}^{(2)})$ need not be conditionally independent.

Some commonly used models for $q^{(j)} (k, x, y), j = 1, 2$ in the literature are listed in Example 1.

Example 1.

Three different models for $q^{(j)} (k, x, y), j = 1, 2$ are presented in Table 1. While the predictions of default and prepayment probabilities for the linear regression model may fall outside the range $[0, 1]$, this limitation is resolved by the subsequent use of the multinomial logistic model (Calhoun and Deng 2002). Additionally, the proportional hazard model incorporates the baseline hazard functions $q_{0}^{(j)} (k)$, which capture the time-dependent structure of the baseline risk for each event j at the time period k.

Table 1. Some models for $q^{(j)} (k, x, y)$ in the literature.

Moreover, some widely used models for $b_{i}^{(j)}$, $b^{(j)} (k, x, y)$, and $B^{(j)} (k, x, y)$ in Assumption 2 are given below.

Example 2.

Assume that each loan is repaid by equal installments. For the ith loan, denote by $r_{i}$ its fixed loan rate, $R_{i}$ the fixed periodic payment for K periods, and $M_{i} (k)$ the outstanding balance at time k, $1 \leq k \leq K$. By the discounted cash flow formula,2 we have

$R_{i} = \frac{e^{r_{i}} (1 - e^{- r_{i}})}{1 - e^{- K r_{i}}} \quad \text{and} \quad M_{i} (k) = \frac{e^{r_{i}} (1 - e^{- (K - k + 1) r_{i}})}{1 - e^{- K r_{i}}} .$

(7)

Note that $r_{i}$ is one of the individual factors in $x_{i}$, so both $R_{i}$ and $M_{i} (k)$ in (7) are functions of $x_{i}$.

For the ith loan, the default loss at time k is a percentage of the outstanding balance $M_{i} (k)$ (Flores et al. 2010). As modeled in Moody’s Mortgage Portfolio Analyzer (MPA) methodology (Stein et al. 2011), the loss $b_{i}^{(1)}$ at time k is given by

$b_{i}^{(1)} = M_{i} (k) \cdot LGD_{i} \quad \text{and} \quad LGD_{i} \sim Beta (α (x_{i}, y), β (x_{i}, y))$

conditional on $Y = y$, where $LGD_{i}$ is a Beta-distributed ratio of loss given default (LGD) with parameters $α (x_{i}, y)$ and $β (x_{i}, y)$. Consequently, we have

$b^{(1)} (k, x_{i}, y) = M_{i} (k) \frac{α (x_{i}, y)}{α (x_{i}, y) + β (x_{i}, y)}$

and

$B^{(1)} (k, x_{i}, y) = M_{i}^{2} (k) \frac{α (x_{i}, y) (α (x_{i}, y) + 1)}{(α (x_{i}, y) + β (x_{i}, y)) (α (x_{i}, y) + β (x_{i}, y) + 1)} .$

By contrast, the prepayment losses are determined by interest rate differentials (Jones and Chen 2016; Richard and Roll 1989). For the ith loan, the prepayment loss at time k is given by

$b_{i}^{(2)} = b^{(2)} (k, x_{i}, y) = R_{i} \sum_{h = 0}^{K - k} e^{- h r} - M_{i} (k) = \frac{1 - e^{- (K - k + 1) r}}{1 - e^{- r}} R_{i} - M_{i} (k)$

(8)

conditional on $Y = y$, where $R_{i} \sum_{h = 0}^{K - k} e^{- h r}$ represents the value at time k of the remaining scheduled payments discounted at the risk-free rate, and $M_{i} (k)$ is the outstanding loan balance in (7). Since this loss is deterministic given $x_{i}$, we have $B^{(2)} (k, x_{i}, y) = b^{(2)} {(k, x_{i}, y)}^{2}$ with $b^{(2)} (k, x_{i}, y)$ in (8).
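The quantities in Example 2 can be evaluated directly. The sketch below is an illustration under stated assumptions: the function names are hypothetical, both rates are taken as strictly positive (so no denominator vanishes), and the Beta parameters are supplied as plain numbers instead of functions of $(x_{i}, y)$.

```python
import math

def installment(r_i, K):
    """Fixed periodic payment R_i in (7) for a unit-principal loan."""
    return math.exp(r_i) * (1 - math.exp(-r_i)) / (1 - math.exp(-K * r_i))

def balance(r_i, K, k):
    """Outstanding balance M_i(k) in (7)."""
    return math.exp(r_i) * (1 - math.exp(-(K - k + 1) * r_i)) / (1 - math.exp(-K * r_i))

def default_loss_moments(r_i, K, k, alpha, beta):
    """First and second moments b^(1), B^(1) of M_i(k) * LGD with LGD ~ Beta(alpha, beta)."""
    M = balance(r_i, K, k)
    first = M * alpha / (alpha + beta)
    second = M ** 2 * alpha * (alpha + 1) / ((alpha + beta) * (alpha + beta + 1))
    return first, second

def prepayment_loss(r_i, r, K, k):
    """Prepayment loss b^(2) in (8): remaining payments valued at rate r, minus M_i(k)."""
    annuity = (1 - math.exp(-(K - k + 1) * r)) / (1 - math.exp(-r))
    return installment(r_i, K) * annuity - balance(r_i, K, k)

# Hypothetical monthly loan: 12 periods, loan rate 0.4% and risk-free rate 0.2% per period.
print(installment(0.004, 12), balance(0.004, 12, 5), prepayment_loss(0.004, 0.002, 12, 5))
```

Two consistency checks follow from (7) and (8): the K installments discounted at the loan rate sum to the unit principal, and the prepayment loss vanishes when the risk-free rate equals the loan rate.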

Denote

$S (k, x, y) := \prod_{h = 1}^{k} (1 - q^{(1)} (h, x, y) - q^{(2)} (h, x, y)), \quad 0 \leq k \leq K .$

(9)

In the following, we derive some results for the conditional distribution of $(τ_{i}, D_{i}), 1 \leq i \leq N$ defined in (1).

Lemma 1.

Suppose that Assumptions 1–3 hold.

(i): The conditional survival function of the ith loan at time k is

$P (τ_{i} > k | Y = y) = S (k, x_{i}, y), 1 \leq k \leq K - 1,$

(10)

where $S (k, x, y)$ is defined in (9).
(ii): The conditional joint distribution of $(τ_{i}, D_{i})$ is given by

$P (τ_{i} = k, D_{i} = j | Y = y) = \begin{cases} 0, & \text{if } j = 0, 1 \leq k \leq K - 1, \\ S (K, x_{i}, y), & \text{if } j = 0, k = K, \\ S (k - 1, x_{i}, y) q^{(j)} (k, x_{i}, y), & \text{if } j = 1, 2, 1 \leq k \leq K, \end{cases}$

where $q^{(j)} (k, x, y)$ is defined in (4).
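A short numerical illustration of Lemma 1 is given below, assuming the conditional hazards for a fixed pair $(x_{i}, y)$ are available as Python callables of the period k. The helper names and the hazard values are hypothetical. The sketch builds the joint distribution of $(τ_{i}, D_{i})$ from (9) and can be used to confirm that the probabilities sum to one.

```python
def survival(k, q1, q2):
    """S(k) in (9) for hazards q1(h), q2(h); the empty product gives S(0) = 1."""
    s = 1.0
    for h in range(1, k + 1):
        s *= 1.0 - q1(h) - q2(h)
    return s

def joint_pmf(K, q1, q2):
    """Joint distribution of (tau, D) from Lemma 1 (ii), keyed by (k, j)."""
    pmf = {}
    for k in range(1, K + 1):
        s_prev = survival(k - 1, q1, q2)
        pmf[(k, 1)] = s_prev * q1(k)   # default at period k
        pmf[(k, 2)] = s_prev * q2(k)   # prepayment at period k
    pmf[(K, 0)] = survival(K, q1, q2)  # neither event occurs by the terminal time
    return pmf

# Hypothetical hazards: default hazard rising with k, constant prepayment hazard.
pmf = joint_pmf(5, lambda h: 0.02 * h, lambda h: 0.05)
print(sum(pmf.values()))
```

The total mass equals one because the terms telescope: $S (k - 1) (q^{(1)} + q^{(2)}) = S (k - 1) - S (k)$.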

For later use, below we derive the expectations and variances of $l_{i}$ and $L_{N}$ conditional on $Y = y$, as functions of $q^{(j)} (k, x, y)$, $b^{(j)} (k, x, y)$, and $B^{(j)} (k, x, y)$. For convenience, define

$m (x, y) = (m_{1} (x, y), m_{2} (x, y)) \quad \text{and} \quad V (x, y) = {(v_{i, j} (x, y))}_{2 \times 2}$

(11)

with

$m_{j} (x, y) = \sum_{k = 1}^{K} S (k - 1, x, y) q^{(j)} (k, x, y) b^{(j)} (k, x, y) e^{- k r}, \quad j = 1, 2,$

(12)

$v_{j, j} (x, y) = \sum_{k = 1}^{K} S (k - 1, x, y) q^{(j)} (k, x, y) B^{(j)} (k, x, y) e^{- 2 k r} - m_{j} {(x, y)}^{2}, \quad j = 1, 2,$

$v_{1, 2} (x, y) = v_{2, 1} (x, y) = - m_{1} (x, y) m_{2} (x, y) .$

(13)

Proposition 1.

Suppose that Assumptions 1–3 hold. Then, we have

$E [l_{i} | Y = y] = m (x_{i}, y) \quad \text{and} \quad Var [l_{i} | Y = y] = V (x_{i}, y),$

(14)

with $m (x, y)$ and $V (x, y)$ defined in (11). Moreover,

$E [L_{N} | Y = y] = \frac{1}{N} \sum_{i = 1}^{N} m (x_{i}, y) \quad \text{and} \quad Var [L_{N} | Y = y] = \frac{1}{N^{2}} \sum_{i = 1}^{N} V (x_{i}, y) .$

(15)

Proposition 1 lays a basic setup for deriving the limit distributions in Section 3. Some specific models are introduced in Section 3.3.
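To make (12) and (13) concrete, the following sketch evaluates $m (x, y)$ and $V (x, y)$ for one loan with $(x, y)$ held fixed. It assumes the model ingredients are supplied as callables `q(j, k)`, `b(j, k)` and `B(j, k)`; this interface and the toy numbers are illustrative choices, not part of the paper.

```python
import math

def conditional_moments(K, r, q, b, B):
    """Mean vector m and covariance matrix V of one loan's loss vector, per (12)-(13)."""
    m = [0.0, 0.0]
    second = [0.0, 0.0]
    surv = 1.0                                  # S(k-1), starting from S(0) = 1
    for k in range(1, K + 1):
        for j in (1, 2):
            p = surv * q(j, k)                  # P(tau = k, D = j | Y = y) by Lemma 1
            m[j - 1] += p * b(j, k) * math.exp(-k * r)
            second[j - 1] += p * B(j, k) * math.exp(-2 * k * r)
        surv *= 1.0 - q(1, k) - q(2, k)
    v11 = second[0] - m[0] ** 2
    v22 = second[1] - m[1] ** 2
    v12 = -m[0] * m[1]                          # l1 * l2 = 0 because the events exclude each other
    return m, [[v11, v12], [v12, v22]]

# Hypothetical single-period case with unit losses: probabilities 0.1 and 0.2.
m, V = conditional_moments(
    K=1, r=0.0,
    q=lambda j, k: 0.1 if j == 1 else 0.2,
    b=lambda j, k: 1.0,
    B=lambda j, k: 1.0,
)
print(m, V)
```

The off-diagonal entry is always non-positive, reflecting that a loan cannot produce both a default loss and a prepayment loss.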

Remark 2.

To better reflect the influence of default and prepayment behaviors, we employ a constant market interest rate when calculating the discounted loss amounts from default and prepayment in Proposition 1. Extensions to a stochastic interest rate can be treated similarly.

3. The Limiting Distribution of the Portfolio Loss

In this section, we study the limit properties of the portfolio loss $L_{N}$ as $N \to \infty$, including the law of large numbers, the central limit theorem, and an upper bound on the error term. For technical purposes, two assumptions are introduced below.

Assumption 4.

There exists a distribution function $F_{X}$, such that

$F_{X}^{(N)} (x) := \frac{1}{N} \sum_{i = 1}^{N} 1_{\{x_{i} \leq x\}} \overset{w}{\to} F_{X} (x) \quad \text{as } N \to \infty,$

(16)

where “$\overset{w}{\to}$” denotes weak convergence.

Assumption 5.

There exists some $ε > 0$, such that $\det (Var [l_{i} | Y]) > ε$ a.s., $1 \leq i \leq N$.

Assumption 4 characterizes the limiting behavior of the frequency counts of individual factors $x_{i}, 1 \leq i \leq N$ in the mortgage pool. Aligning conceptually with Assumption 2.2 of Giesecke et al. (2013) and Condition 2.1 of Giesecke et al. (2015), it implies that as the pool size grows, the distribution of these factors converges to a simpler macroscopic characterization. This assumption allows us to model the overall features of the portfolio instead of focusing on the individual data characteristics under the large-sample context.

Assumption 5 imposes a lower bound on the determinant of the covariance matrix $Var [l_{i} | Y]$. This condition rules out some degenerate scenarios by ensuring that (i) both default loss and prepayment loss exhibit non-trivial variability (i.e., their variances are bounded away from zero) and reflect the realistic fluctuations in loan portfolios; (ii) the losses are not perfectly linearly dependent, excluding the artificial cases where one loss becomes completely predictable given the other; and (iii) the joint distribution maintains a genuine two-dimensional randomness structure, preventing pathological concentration of risk. This mathematical formulation captures the natural dispersion and imperfect dependence between default and prepayment behaviors observed in practice.

3.1. The Strong Law of Large Numbers

First, the strong law of large numbers for $L_{N}$ as $N \to \infty$ is provided. For ease of exposition, define

$μ (y) = \int m (x, y) d F_{X} (x) \quad \text{and} \quad Σ (y) = \int V (x, y) d F_{X} (x),$

(17)

with $m (x, y)$ and $V (x, y)$ given in (11), and $F_{X}$ introduced in Assumption 4. For later use, denote $μ (y) = (μ_{1} (y), μ_{2} (y))$ and $Σ (y) = {(Σ_{i, j} (y))}_{2 \times 2}$.

Proposition 2.

Suppose that Assumptions 1–3 hold. Let $L_{N}$ be as in (3).

(i): As $N \to \infty$ ,

$L_{N} - E [L_{N} | Y] \to 0, P - a . s .,$

(18)

where $E [L_{N} | Y = y]$ is given in (15).
(ii): Suppose further that Assumption 4 holds. Then, as $N \to \infty$ ,

$L_{N} \to μ (Y), P - a . s .,$

(19)

with $μ (y)$ defined in (17).

Intuitively, Proposition 2 (i) states that for a portfolio with a large number of loans, $L_{N}$ is asymptotically close to its conditional expectation. Note that $E [L_{N} | Y = y]$ in (15) is determined by both the systematic factors $Y$ and the individual factors $x_{i}, 1 \leq i \leq N$. Compared to (i), Proposition 2 (ii) establishes a simpler almost sure limit of $L_{N}$ in terms of the limit distribution $F_{X}$. As shown in (19), the limit $μ (Y)$ depends only on the vector of systematic factors $Y$, which requires less underlying information and provides greater computational efficiency via the simplified expression.
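The convergence in (19) can be illustrated by simulation conditional on a fixed $Y = y$. The sketch below makes simplifying assumptions that are not in the paper: the pool is homogeneous (all $x_{i}$ identical, so $F_{X}$ is a point mass and $μ (y) = m (x, y)$), the hazards are constant over time, and the losses given default and prepayment are deterministic. All names and parameter values are hypothetical.

```python
import math
import random

def simulate_portfolio_loss(N, K, r, q1, q2, b1, b2, seed=0):
    """Simulate L_N for N conditionally independent loans with constant hazards q1, q2."""
    rng = random.Random(seed)
    total1 = total2 = 0.0
    for _ in range(N):
        for k in range(1, K + 1):
            u = rng.random()
            if u < q1:                       # default at period k
                total1 += b1 * math.exp(-r * k)
                break
            if u < q1 + q2:                  # prepayment at period k
                total2 += b2 * math.exp(-r * k)
                break
    return total1 / N, total2 / N

def limit_mean(K, r, q1, q2, b1, b2):
    """The limit m(x, y) from (12) for the same constant-hazard setting."""
    m1 = m2 = 0.0
    surv = 1.0
    for k in range(1, K + 1):
        disc = math.exp(-r * k)
        m1 += surv * q1 * b1 * disc
        m2 += surv * q2 * b2 * disc
        surv *= 1.0 - q1 - q2
    return m1, m2

params = dict(K=3, r=0.01, q1=0.05, q2=0.10, b1=0.5, b2=0.1)
print(simulate_portfolio_loss(N=50000, seed=1, **params))
print(limit_mean(**params))
```

For a large pool the two printed vectors should be close; the remaining gap is the sampling fluctuation quantified by the central limit theorem in Section 3.2.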

Recall that $μ (y) = (μ_{1} (y), μ_{2} (y))$. The following examples demonstrate some straightforward applications of Proposition 2 to the cases where the dimension of $Y$ is one or two.

Example 3.

(i) Suppose that $Y$ is a one-dimensional random variable (denoted by Y) with a continuous distribution function $F_{Y}$, $μ_{1} (y)$ is a continuous strictly increasing function, and $μ_{2} (y)$ is a continuous strictly decreasing function. Then, we have

$\lim_{N \to \infty} P (L_{N}^{(1)} \leq x) = P (μ_{1} (Y) \leq x) = P (Y \leq μ_{1}^{- 1} (x)) = F_{Y} (μ_{1}^{- 1} (x))$

and

$\lim_{N \to \infty} P (L_{N}^{(2)} \leq x) = P (μ_{2} (Y) \leq x) = P (Y \geq μ_{2}^{- 1} (x)) = 1 - F_{Y} (μ_{2}^{- 1} (x)) .$

In this setting, $L_{N}^{(1)}$ and $L_{N}^{(2)}$ are asymptotically counter-monotonic as $N \to \infty$. This is because the limit of $L_{N}^{(1)}$ increases in Y, while the limit of $L_{N}^{(2)}$ decreases in Y.

(ii) Suppose that $Y$ is a two-dimensional random vector with density $f_{Y}$ and $μ (y) : R^{2} \to R^{2}$ is a continuously differentiable bijection with a continuously differentiable inverse. Then, the limiting distribution of $L_{N}$ as $N \to \infty$ has the probability density function

$f_{μ (Y)} (x_{1}, x_{2}) = f_{Y} (μ^{- 1} (x_{1}, x_{2})) \left| \det \frac{\partial μ^{- 1} (x_{1}, x_{2})}{\partial (x_{1}, x_{2})} \right|,$

where $μ^{- 1} (x_{1}, x_{2})$ denotes the vector $(y_{1}, y_{2})$ satisfying $μ (y_{1}, y_{2}) = (x_{1}, x_{2})$, and $\partial μ^{- 1} / \partial (x_{1}, x_{2})$ is the Jacobian matrix of $μ^{- 1}$.

For a random variable X, the VaR of X at level $α \in (0, 1)$ is defined as ${VaR}_{α} (X) := \inf \{x \in R : P (X \leq x) \geq α\}$. The result (19) leads to the following corollary on the VaR of $L_{N}^{(j)}, j = 1, 2$ as $N \to \infty$.

Corollary 1.

Suppose that Assumptions 1–4 hold. For any $α \in (0, 1)$ and $ε > 0$, we have

$\lim_{N \to \infty} P (L_{N}^{(j)} \leq {VaR}_{α} (μ_{j} (Y)) + ε) \geq α, \quad j = 1, 2,$

and

$\lim_{N \to \infty} P (L_{N}^{(j)} \leq {VaR}_{α} (μ_{j} (Y)) - ε) \leq α, \quad j = 1, 2 .$

Given Corollary 1, the VaR of $μ_{j} (Y)$ can be used to approximate the VaR of $L_{N}^{(j)}, j = 1, 2$ for large N.

3.2. The Central Limit Theorem

In the following, a central limit theorem (CLT) is provided for $L_{N}$ in (3). Denote by $I_{n}$ the $n \times n$ identity matrix. For a positive-definite matrix A, $A^{- 1 / 2}$ is defined as its inverse square root via spectral decomposition; see the footnote for details.3

Theorem 1.

Suppose that Assumptions 1–3 and 5 hold.

(i): Conditional on $Y = y$ ,

$Σ_{N} {(y)}^{- \frac{1}{2}} (L_{N} - μ_{N} (y)) \overset{d}{⟶} N (0, I_{2})$

(20)

as $N \to \infty$ , where

$μ_{N} (y) = \frac{1}{N} \sum_{i = 1}^{N} m (x_{i}, y) and Σ_{N} (y) = \frac{1}{N^{2}} \sum_{i = 1}^{N} V (x_{i}, y)$

(21)

with $m (x, y)$ and $V (x, y)$ given in (11).
(ii): Suppose further that Assumption 4 holds. Then, conditional on $Y = y$ ,

$\sqrt{N} (L_{N} - μ (y)) \overset{d}{⟶} N (0, Σ (y))$

(22)

as $N \to \infty$ , with $μ (y)$ and $Σ (y)$ given in (17).

Similar to Proposition 2, $μ_{N} (y)$ and $Σ_{N} (y)$ in Theorem 1 (i) include the information of both systematic factors $Y$ and individual factors $x_{1}, x_{2}, \dots, x_{N}$, while $μ (y)$ and $Σ (y)$ in Theorem 1 (ii) are only determined by $Y$ and the asymptotic distribution $F_{X}$. The simplified expression in (ii) reduces the data requirements while enhancing the computational efficiency.

Based on Theorem 1, the following corollary further studies the CLT for the default loss $L_{N}^{(1)}$, the prepayment loss $L_{N}^{(2)}$, and the total loss $L_{N}^{(1)} + L_{N}^{(2)}$. Recall that $Σ (y) = {(Σ_{i, j} (y))}_{2 \times 2}$.

Corollary 2.

Suppose that Assumptions 1–5 hold. Then, conditional on $Y = y$, we have

$\sqrt{N} (L_{N}^{(j)} - μ_{j} (y)) \overset{d}{⟶} N (0, Σ_{j, j} (y)), \quad j = 1, 2$

and

$\sqrt{N} (L_{N}^{(1)} + L_{N}^{(2)} - μ_{1} (y) - μ_{2} (y)) \overset{d}{⟶} N (0, Σ_{1, 1} (y) + 2 Σ_{1, 2} (y) + Σ_{2, 2} (y)),$

as $N \to \infty$.

Corollary 2 follows directly from Theorem 2.3 in Van der Vaart (2000) on preserving the asymptotic normality under linear transformations. The proof is therefore omitted.

Theorem 1 (i) implies that conditional on $Y = y$, the distribution of the portfolio loss $L_{N}$ for large N can be approximated by the normal distribution $N (μ_{N} (y), Σ_{N} (y))$. Denote by $F_{Y}$ the distribution function of $Y$ and by $Φ_{(μ, Σ)}$ the distribution function of the normal distribution $N (μ, Σ)$. Then, the joint distribution of $L_{N}$ can be approximated by

$P (L_{N} \leq x) = E [P (L_{N} \leq x | Y)] \approx E [Φ_{(μ_{N} (Y), Σ_{N} (Y))} (x)] = \int Φ_{(μ_{N} (y), Σ_{N} (y))} (x) d F_{Y} (y) .$

(23)

Moreover, the error term in approximation (23) admits an explicit upper bound, as presented below.

Theorem 2.

Suppose that Assumptions 1–3 and 5 hold. The error term in (23) is bounded by

|P (L_{N} \leq x) - \int Φ_{(μ_{N} (y), Σ_{N} (y))} (x) d F_{Y} (y)| \leq \frac{264}{\sqrt{N ε}},

with ε introduced in Assumption 5.

Theorem 2 implies that the error term for the distribution function of $L_{N}$ converges to zero at the rate $O (N^{- 1 / 2})$.
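The mixture-of-normals approximation (23) can be sketched numerically for a single component of the loss. The example below is an illustration under assumptions that go beyond the text: it treats only the marginal default loss, takes a single period with a homogeneous pool and a scalar factor $Y \sim N (0, 1)$, uses a hypothetical conditional default probability `default_prob`, and replaces the integral over $F_{Y}$ with a midpoint rule. The helper `error_bound` simply evaluates the right-hand side of Theorem 2 as written.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def default_prob(y, c=-1.5, rho=0.2):
    """Hypothetical conditional default probability given Y = y (illustrative only)."""
    return STD.cdf((c - math.sqrt(rho) * y) / math.sqrt(1 - rho))

def approx_cdf_default_loss(x, N, b=1.0, grid=4000, lim=8.0):
    """Approximate P(L_N^(1) <= x) by averaging conditional normal CDFs over Y ~ N(0, 1)."""
    h = 2 * lim / grid
    total = 0.0
    for i in range(grid):
        y = -lim + (i + 0.5) * h
        q = default_prob(y)
        mean = b * q                              # conditional mean of L_N^(1)
        sd = b * math.sqrt(q * (1 - q) / N)       # conditional standard deviation
        cond = STD.cdf((x - mean) / sd) if sd > 0 else float(x >= mean)
        total += cond * STD.pdf(y) * h
    return total

def error_bound(N, eps):
    """Upper bound 264 / sqrt(N * eps) from Theorem 2."""
    return 264 / math.sqrt(N * eps)

print(approx_cdf_default_loss(0.1, N=1000))
print(error_bound(10**8, 0.01))
```

Since $264^{2} = 69696$, the bound is below one only when $N ε > 69696$, so it is informative mainly as a statement about the convergence rate rather than as a tight estimate for moderate pool sizes.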

3.3. Numerical Analysis for Some Models

Under certain assumptions, the portfolio loss satisfies the SLLN in Section 3.1 and the CLT in Section 3.2. Specifically, Proposition 2 states that the portfolio loss $L_{N}$ can be approximated by $μ (Y)$ in (19). However, it is sometimes difficult to calculate $μ (Y)$ due to the large number of parameters in $Y$ and the complexity of the expressions involved in $μ$. Hence, this part introduces two simplified models, where $μ (Y)$ can be expressed explicitly.

3.3.1. The Asset Value Model

The asset value models introduced by Merton (1974) and Finger (1999) are widely used in the industry, where the loan’s repayment behavior is determined by its underlying asset value (or return rate). Based on the repayment solvency considerations, we assume that the default occurs when the asset value falls below a predetermined threshold, while the prepayment is triggered when the asset value exceeds a higher threshold. For simplicity, we abstract away from temporal dynamics and consider a single-period setting ($K = 1$ and $r = 0$).

In this model, the vector $Y$ consists of n market factors, jointly following a standard multivariate normal distribution $N (0, I_{n})$. The vector $x_{i}$ comprises the n coefficients (sensitivities) of the ith borrower to these market factors. The return rate is defined as

$V_{i} = x_{i}^{'} Y + \sqrt{1 - x_{i}^{'} x_{i}} \cdot e_{i} \sim N (0, 1),$

where $e_{i} \overset{i.i.d.}{\sim} N (0, 1)$ are idiosyncratic performance terms independent of $Y$. Let $c_{1}$ and $c_{2}$ denote the critical thresholds for default and prepayment, respectively, defined in terms of the return rate. In this case, the probabilities of default and prepayment for the ith loan are given by

$q^{(1)} (1, x_{i}, y) = P (V_{i} < c_{1} | Y = y) = P (x_{i}^{'} y + \sqrt{1 - x_{i}^{'} x_{i}} \cdot e_{i} < c_{1}) = Φ \left( \frac{c_{1} - x_{i}^{'} y}{\sqrt{1 - x_{i}^{'} x_{i}}} \right),$

$q^{(2)} (1, x_{i}, y) = P (V_{i} > c_{2} | Y = y) = P (x_{i}^{'} y + \sqrt{1 - x_{i}^{'} x_{i}} \cdot e_{i} > c_{2}) = Φ \left( - \frac{c_{2} - x_{i}^{'} y}{\sqrt{1 - x_{i}^{'} x_{i}}} \right) .$

For simplicity, assume the default loss is the constant

{\bar{b}}_{1}

and the prepayment loss is the constant

{\bar{b}}_{2}

. Then, we obtain

m_{j} (x, y) = {\bar{b}}_{j} q^{(j)} (1, x, y)

from (12). Building on Equation (17), the entries of

μ (y)

can be expressed as

\begin{matrix} μ_{1} (y) = & \int {\bar{b}}_{1} q^{(1)} (1, x, y) d F_{X} (x) = {\bar{b}}_{1} \int Φ (\frac{c_{1} - x^{'} y}{\sqrt{1 - x^{'} x}}) d F_{X} (x), \end{matrix}

(24)

\begin{matrix} μ_{2} (y) = & \int {\bar{b}}_{2} q^{(2)} (1, x, y) d F_{X} (x) = {\bar{b}}_{2} \int Φ (- \frac{c_{2} - x^{'} y}{\sqrt{1 - x^{'} x}}) d F_{X} (x) . \end{matrix}

(25)
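As a numerical illustration of (24) and (25), the following Python sketch approximates the two entries of μ(y) by averaging the conditional probabilities over draws of x. It uses the upper-panel settings of Figure 2, where F_X(x) = x_1 x_2 / 0.48 corresponds to independent uniform coefficients on [0, 0.8] and [0, 0.6]. The function names, the number of draws, and the stressed scenario y = (-1, -1) are illustrative assumptions and are not taken from the paper.

```python
import math
import random

def std_normal_cdf(z):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def conditional_probs(x, y, c1, c2):
    """Return (q1, q2): default and prepayment probabilities given Y = y."""
    loading = sum(xi * yi for xi, yi in zip(x, y))
    # Guard against x'x reaching 1, where the idiosyncratic weight vanishes.
    resid = math.sqrt(max(1.0 - sum(xi * xi for xi in x), 1e-12))
    q1 = std_normal_cdf((c1 - loading) / resid)
    q2 = std_normal_cdf(-(c2 - loading) / resid)
    return q1, q2

def mu(y, c1, c2, b1, b2, draws=20000, seed=0):
    """Monte Carlo estimate of (mu_1(y), mu_2(y)) in (24) and (25)."""
    rng = random.Random(seed)
    sum1 = sum2 = 0.0
    for _ in range(draws):
        # F_X(x) = x1 * x2 / 0.48: independent uniforms on [0, 0.8], [0, 0.6].
        x = (rng.uniform(0.0, 0.8), rng.uniform(0.0, 0.6))
        q1, q2 = conditional_probs(x, y, c1, c2)
        sum1 += q1
        sum2 += q2
    return b1 * sum1 / draws, b2 * sum2 / draws

C1, C2 = -1.405, 0.842  # thresholds from the upper panel of Figure 2
mu_neutral = mu((0.0, 0.0), C1, C2, b1=0.8, b2=0.2)
mu_stressed = mu((-1.0, -1.0), C1, C2, b1=0.8, b2=0.2)  # hypothetical scenario
print(mu_neutral, mu_stressed)
```

Because the same seed is reused, both scenarios are evaluated on identical draws of x, so the comparison isolates the effect of the systematic factors: lower factor values raise the default entry and lower the prepayment entry.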

3.3.2. The Markov Model

Consistent with the transition mechanism of the Markov model (see Jarrow et al. 1997), we assume that the probabilities of default and prepayment are fixed during the payment period, i.e.,

q^{(j)} (k, x, y) = {\bar{q}}^{(j)} (x, y)

for

1 \leq k \leq K

,

j = 1, 2

. Specifically, it follows the multinomial logistic model given by Example 1:

q^{(j)} (k, x, y) = {\bar{q}}^{(j)} (x, y) = \frac{\exp \{β_{0}^{(j)} + β_{X}^{(j)'} x + β_{Y}^{(j)'} y\}}{1 + \sum_{l = 1}^{2} \exp \{β_{0}^{(l)} + β_{X}^{(l)'} x + β_{Y}^{(l)'} y\}} .

(26)

For simplicity, we assume the default and prepayment losses are proportional to the remaining repayment period, expressed as

b_{i}^{(j)} = b^{(j)} (k, x, y) = {\bar{b}}_{j} \frac{K - k + 1}{K}, given τ_{i} = k, D_{i} = j

(27)

for

j = 1, 2

. This linear scaling implies that the losses decrease linearly in the period k at which the event occurs. We do not assume any special forms of

F_{Y} (y)

and

F_{X} (x)

in Assumption 4. Under this setup, the functions

m_{j} (x, y)

,

j = 1, 2

defined in (12) have the following specific form.

Lemma 2.

Suppose that (26) and (27) hold. Then,

m_{j} (x, y)

defined in (12) is given by

m_{j} (x, y) = \frac{{\bar{b}}_{j} {\bar{q}}^{(j)} (x, y)}{K e^{r}} \cdot \frac{K - (K + 1) a (x, y) + a {(x, y)}^{K + 1}}{{(1 - a (x, y))}^{2}}, j = 1, 2,

(28)

where

a (x, y) = (1 - {\bar{q}}^{(1)} (x, y) - {\bar{q}}^{(2)} (x, y)) / e^{r}

.

Subsequently, the function

μ (y)

is given by (21) with

m (x, y)

specified in Lemma 2.
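The closed form (28) can be checked numerically against the defining sum in (12), which under (26) and (27) reads m_j = \sum_{k=1}^{K} (1 - \bar{q}^{(1)} - \bar{q}^{(2)})^{k-1} \bar{q}^{(j)} \bar{b}_{j} \frac{K - k + 1}{K} e^{- k r}. The Python sketch below compares the two expressions; the hazard values 0.002 and 0.01 are hypothetical and chosen only for illustration, while K, r, and \bar{b}_{1} follow the settings of Figure 3.

```python
import math

def m_direct(q1, q2, q_j, b_bar, K, r):
    """Period-by-period sum from (12) with constant hazards and loss (27)."""
    survive = 1.0 - q1 - q2
    total = 0.0
    for k in range(1, K + 1):
        loss = b_bar * (K - k + 1) / K
        total += survive ** (k - 1) * q_j * loss * math.exp(-k * r)
    return total

def m_closed(q1, q2, q_j, b_bar, K, r):
    """Closed form (28) of Lemma 2; requires a(x, y) != 1."""
    a = (1.0 - q1 - q2) / math.exp(r)
    geom = (K - (K + 1) * a + a ** (K + 1)) / (1.0 - a) ** 2
    return b_bar * q_j / (K * math.exp(r)) * geom

Q1, Q2 = 0.002, 0.01  # hypothetical monthly default and prepayment hazards
direct = m_direct(Q1, Q2, Q1, b_bar=0.8, K=60, r=0.003)
closed = m_closed(Q1, Q2, Q1, b_bar=0.8, K=60, r=0.003)
print(direct, closed)
```

The agreement rests on the identity \sum_{m=0}^{K-1} (K - m) a^{m} = (K - (K+1) a + a^{K+1}) / (1 - a)^{2}, which holds for a ≠ 1.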

3.3.3. Simulation Results

Based on the simplified theoretical loss models, we conduct Monte Carlo simulations to analyze the loss distributions and compare them with the log-normal distribution proposed by Moody’s Investors Service (2024). For each model, we perform the simulations with a portfolio of 10,000 loans and 10,000 Monte Carlo trials. The key parameters include the number of payment periods K, the risk-free interest rate r, the distribution

F_{Y} (y)

of

Y

and

F_{X} (x)

in Assumption 4, and the functions

q^{(j)} (k, x, y)

and

b^{(j)} (k, x, y)

in Assumption 2. These parameters are specified differently for the models of Section 3.3.1 and Section 3.3.2. The simulation proceeds in the following steps.

(i): The individual and systematic factors. First, we generate the individual factors $x_{i}$ from the distribution $F_{X} (x)$ and fix them in the subsequent steps. Then, in each simulation, we generate the systematic factors $Y$ from the distribution $F_{Y} (y)$ , representing an economic scenario.
(ii): The repayment behaviors. Given the factors $x_{i}$ and $Y = y$ in each simulation, the repayment behaviors $(τ_{i}, D_{i})$ are simulated as in Figure 1, with default/prepayment rates $q^{(j)} (k, x, y)$ , $j = 1, 2$ . For simplicity, the default/prepayment losses are fixed at their expected values $b^{(j)} (k, x, y)$ .
(iii): The portfolio losses. In each simulation, the loan losses $l_{i}$ , $1 \leq i \leq N$ in (2) are obtained based on the first two steps, and the portfolio loss $L_{N}$ is derived by (3). By repeating this process across 10,000 economic scenarios, we generate an empirical distribution of $L_{N}$ .
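Steps (i) to (iii) can be sketched in Python for the Markov model with the upper-panel parameters of Figure 3. This is a minimal illustration under stated assumptions: the portfolio loss is taken as the average of the discounted loan losses (our reading of (3)), the pool and scenario counts are reduced from 10,000 to keep the example fast, and all function names are hypothetical.

```python
import math
import random

K, R = 60, 0.003
B_BAR = (0.8, 0.2)
# (beta_0, beta_X, beta_Y) for default and prepayment, Figure 3 upper panel.
BETA = ((-7.0, (-2.0, 0.2), (1.0, 0.6)),
        (-8.0, (0.2, 1.0), (0.8, 1.0)))

def hazards(x, y):
    """Multinomial logistic probabilities (26), constant over the periods."""
    w = [math.exp(b0 + bx[0] * x[0] + bx[1] * x[1] + by[0] * y[0] + by[1] * y[1])
         for b0, bx, by in BETA]
    denom = 1.0 + sum(w)
    return w[0] / denom, w[1] / denom

def simulate_loan(x, y, rng):
    """Step (ii): return the discounted (default, prepayment) loss of a loan."""
    q1, q2 = hazards(x, y)
    for k in range(1, K + 1):
        u = rng.random()
        if u < q1 + q2:
            j = 0 if u < q1 else 1
            loss = B_BAR[j] * (K - k + 1) / K * math.exp(-R * k)  # loss (27)
            return (loss, 0.0) if j == 0 else (0.0, loss)
    return 0.0, 0.0  # loan stays active until maturity

def simulate_portfolio(n_loans=200, n_scenarios=50, seed=1):
    rng = random.Random(seed)
    # Step (i): individual factors are drawn once and then held fixed.
    xs = [(rng.uniform(0.0, 0.8), rng.uniform(0.0, 0.6)) for _ in range(n_loans)]
    samples = []
    for _ in range(n_scenarios):
        # Step (i): Y ~ N((1, 1), ((1, -0.2), (-0.2, 1))) via a Cholesky factor.
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y = (1.0 + z1, 1.0 - 0.2 * z1 + math.sqrt(1.0 - 0.04) * z2)
        losses = [simulate_loan(x, y, rng) for x in xs]
        # Step (iii): average the loan losses (assumed form of (3)).
        samples.append((sum(l[0] for l in losses) / n_loans,
                        sum(l[1] for l in losses) / n_loans))
    return samples

samples = simulate_portfolio()
print(samples[:3])
```

Each element of `samples` is one draw of the pair of portfolio default and prepayment losses under one economic scenario; collecting many such draws gives the empirical joint distribution.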

In the following, denote the distribution function of the Beta(

α, β

) distribution by

I (x; α, β) := \frac{1}{B (α, β)} \int_{0}^{x} u^{α - 1} {(1 - u)}^{β - 1} d u, 0 \leq x \leq 1, α, β > 0 .

Based on the asset value model, Figure 2 presents the simulations for the limit distribution of portfolio loss

L_{N}

under different parameter settings. The main parameters are set based on the existing literature. The default rates

8 %

and

5 %

align with the B-rated bond studies (

5 %

in Frey and McNeil (2003),

6.25 %

in Gordy (2003), and

7.27 %

in Credit Suisse Financial Products (1997)), while the prepayment rates were assumed based on Deng et al. (2000). Since the loan recovery rates are set as

28.29 %, 38.28 %

, and

47.54 %

in Credit Suisse Financial Products (1997) for different seniority levels and securities, we adopt

20 %

and

40 %

, respectively, in Figure 2. Some other parameters are set according to the asset correlations derived from Flores et al. (2010), while the rest of the parameters were selected for the purpose of simulation.

Figure 2. The joint distribution, copula, and marginal distributions of

L_{N}

for the asset value model. Note: The upper panel and lower panel exhibit the joint distribution, copula, and marginal distributions of

L_{N}

for two cases of the asset value model in (24) and (25), respectively. In the upper panel, we set

n = 2

,

{\bar{b}}_{1} = 0.8

,

{\bar{b}}_{2} = 0.2

,

c_{1} = Φ^{- 1} (0.08) = - 1.405

,

c_{2} = Φ^{- 1} (0.8) = 0.842

,

F_{X} (x) = x_{1} x_{2} / 0.48, 0 \leq x \leq (0.8, 0.6)

, and

Y \sim N (0, I_{2})

. In the lower panel, we set

n = 2

,

{\bar{b}}_{1} = 0.6

,

{\bar{b}}_{2} = 0.3

,

c_{1} = Φ^{- 1} (0.05) = - 1.645

,

c_{2} = Φ^{- 1} (0.7) = 0.524

,

F_{X} (x) = I (x_{1}; 2, 3) I (x_{2}; 4, 4)

, and

Y \sim N (0, I_{2})

.

The results in Figure 2 show a clear negative correlation between the loss of default and the loss of prepayment. The empirical distributions of both default and prepayment losses in the simulated data show strong agreement with the log-normal distributions. Furthermore, the loss distribution exhibits a unimodal pattern, peaked at relatively small loss values and decreasing monotonically as the losses increase.

Under the above Markov model, Figure 3 presents the simulations illustrating the distribution of portfolio loss

L_{N}

under different parameters. The parameter selection is similar to that in Figure 2. The results show that the marginal distributions of

L_{N}

fit the log-normal distribution well and further validate the log-normal assumption in Moody’s methodology (Moody’s Investors Service 2024).

Figure 3. The joint distribution, copula, and marginal distributions of

L_{N}

for the Markov model. Note: The upper panel and lower panel exhibit the joint distribution, copula, and marginal distributions of

L_{N}

for two cases of the Markov model in (26) and (27), respectively. In both cases, we set

K = 60

,

r = 0.003

,

{\bar{b}}_{1} = 0.8

,

{\bar{b}}_{2} = 0.2

, and

(β_{0}^{(1)}, β_{X}^{(1)}, β_{Y}^{(1)}) = (- 7, - 2, 0.2, 1, 0.6)

. The upper panel corresponds to

(β_{0}^{(2)}, β_{X}^{(2)}, β_{Y}^{(2)}) = (- 8, 0.2, 1, 0.8, 1)

,

F_{X} (x) = x_{1} x_{2} / 0.48, 0 \leq x \leq (0.8, 0.6)

, and

Y \sim N ((1, 1), ((1, - 0.2), (- 0.2, 1)))

. The lower panel corresponds to

(β_{0}^{(2)}, β_{X}^{(2)}, β_{Y}^{(2)}) = (- 8, 0.2, 1, 1, 0.8)

,

F_{X} (x) = I (x_{1}; 2, 3) I (x_{2}; 4, 4)

, and

Y \sim N ((1, 1), ((1, 0.2), (0.2, 1)))

.

4. Empirical Studies

In this section, we present an empirical analysis of the residential mortgage-backed security (RMBS) under our setup of jointly modeling the default and prepayment. As a type of ABS, the RMBS is created by pooling numerous residential mortgage loans and converting them into tradable investment instruments. Investors can invest in the RMBS according to their risk preferences. Nowadays, the RMBS accounts for the largest share of the cash securitization market globally and is especially popular in the US, continental Europe, the UK, and Australia.4

The residential mortgage loans in the RMBS typically exhibit distinct characteristics, such as fixed or adjustable interest rates, long-term maturities (e.g., 15–30 years), and amortizing payment structures. The RMBS is also exposed to the default and prepayment risks. The default is usually driven by borrower financial distress, declining property values, or macroeconomic shocks, leading to the credit losses. The prepayment is triggered by refinancing (e.g., rate declines) or home sales, altering the cash flow timing and reinvestment risk. The performances of residential mortgages in the RMBS are affected by a series of factors, including the borrower quality, underwriting guidelines, and servicer/originator quality.

As an application of our theoretical framework, the empirical study pursues two objectives. First, we incorporate both default and prepayment risks into the portfolio loss model of the RMBS and empirically investigate which factors significantly affect the default and prepayment. Second, we use the estimated hazards to generate bottom-up Monte Carlo simulations of portfolio losses. Note that it is complex to represent the asymptotic conditional expectation

μ (Y) = (μ_{1} (Y), μ_{2} (Y))

of portfolio loss in Proposition 2. Our empirical results suggest that this conditional expectation of portfolio loss is approximately log-normally distributed, thereby enriching the theoretical model with distributional insight grounded in data.

The study employs Freddie Mac’s single-family loan dataset for loans issued in 2000 and focuses on the 15-year mortgages with monthly repayments. The dataset, with full origination and monthly performance records, has become a benchmark source for mortgage credit studies (see, e.g., Goodman et al. 2014; Goodman and Zhu 2015; and Bhattacharya et al. 2019). Since this standard dataset consists solely of fixed-rate loans, we restrict the analysis to this loan type. The origination year 2000 is selected to ensure a sufficiently long repayment observation period (up to 15 years, i.e., 180 months, as analyzed). Robustness checks confirm that the results do not depend on the choice of the year.

4.1. Model Setup for RMBS Product

With the notation introduced in Section 2, suppose that there are N mortgages in the pool of the RMBS, with the common repayment period of K months. For each mortgage, there are three mutually exclusive states in each month: default, prepaid, or active (normal amortization). The default and prepayment risks of an individual loan are modeled by the Cox proportional hazard functions, which depend on the systematic factors

Y = (Y (1), \dots, Y (K))

and the vector of individual factors

x

fixed at the issue date.

4.1.1. Systematic Factors

The systematic factors

Y = (Y (1), \dots, Y (K))

represent a multivariate time series of macroeconomic variables. Denote the vector of systematic factors observed in the kth month by

Y (k) = (δ (k), H P C (S_{1}, k), \dots, H P C (S_{d}, k)) .

The first component

δ (k)

is the U.S. Treasury long-term yield rate and serves as the refinancing benchmark, which is distinct from the contractual loan rate

r_{i}

set for individual borrowers. The difference

r_{i} - δ (k)

captures the evolving interest rate spread that affects both prepayment for refinancing and default decisions. The symbols

S_{1}, S_{2}, \dots, S_{d}

represent d administrative states in the U.S., so that each mortgage in the pool is located in some state. For

1 \leq m \leq d

,

H P C (S_{m}, k)

denotes the state-level house price changes (HPCs) in the state

S_{m}

at the kth month and captures the house price dynamics, defined by

H P C (S_{m}, k) = log H P I (S_{m}, k) - log H P I (S_{m}, k - 1),

(29)

where

H P I (S_{m}, k)

is the house price index (HPI) for the state

S_{m}

at the kth month. Collecting the systematic factors over

k = 1, \dots, K

yields a random vector

Y

of dimension

K (d + 1)

.

4.1.2. Individual Factors

For the ith loan, the individual factors are collected as

x_{i} = (r_{i}, C S_{i}, L T V_{i}, D T I_{i}, F T B_{i}, N B_{i}, P S_{i}),

where

r_{i}

is the original loan rate,

C S_{i}

is the borrower’s credit score,

L T V_{i}

is the loan-to-value ratio,

D T I_{i}

is the debt-to-income ratio,

F T B_{i}

is the indicator for first-time home-buyer (1 for yes and 0 for no), and

N B_{i}

is the number of borrowers on the loan. The final coordinate

P S_{i} \in \{S_{1}, S_{2}, \dots, S_{d}\}

identifies the U.S. state where the property is located. This identifier links the ith loan to the time series of state-level house price changes

H P C (P S_{i}, k)

.

Based on the systematic and individual factors, the competing risks (default and prepayment) are specified by the CPH model demonstrated in Table 1. The CPH framework is widely adopted in mortgage research and practice; see, for example, Deng et al. (2000), Li et al. (2023), and Moody’s Mortgage Portfolio Analyzer (MPA) methodology (Stein et al. 2011). Specifically, the probabilities of default and prepayment in (4) are given by

\begin{matrix} q^{(j)} (k, x_{i}, Y) = & q_{0}^{(j)} (k) \exp \{β_{r}^{(j)} (r_{i} - δ (k)) + β_{H P C}^{(j)} H P C (P S_{i}, k) + β_{C S}^{(j)} C S_{i} + β_{L T V}^{(j)} L T V_{i} \\ & + β_{D T I}^{(j)} D T I_{i} + β_{F T B}^{(j)} F T B_{i} + β_{N B}^{(j)} N B_{i}\}, 1 \leq k \leq K \end{matrix}

(30)

for

j = 1, 2

. Here,

q_{0}^{(j)} (k)

is the baseline hazard function and captures the time profile of default or prepayment risk when all the explanatory factors are fixed at the neutral levels. The exponential term in (30) rescales the benchmark

q_{0}^{(j)} (k)

according to the individual characteristics of mortgages and prevailing macroeconomic conditions.
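To make the structure of (30) concrete, the following Python sketch evaluates the hazard of one loan in one month. All numerical values (the baseline hazard, the coefficients, and the loan characteristics) are hypothetical placeholders rather than the estimates reported in Table 2, and the dictionary keys are illustrative names.

```python
import math

def cph_hazard(baseline_k, beta, loan, delta_k, hpc_k):
    """Hazard (30): baseline q_0(k) rescaled by the exponential term."""
    linear = (
        beta["r"] * (loan["rate"] - delta_k)  # interest rate spread r_i - delta(k)
        + beta["HPC"] * hpc_k                 # house price change in the loan's state
        + beta["CS"] * loan["CS"]
        + beta["LTV"] * loan["LTV"]
        + beta["DTI"] * loan["DTI"]
        + beta["FTB"] * loan["FTB"]
        + beta["NB"] * loan["NB"]
    )
    return baseline_k * math.exp(linear)

# Hypothetical prepayment coefficients and loan record, for illustration only.
beta_prepay = {"r": 0.5, "HPC": 2.0, "CS": 0.001, "LTV": 0.0,
               "DTI": 0.002, "FTB": 0.0, "NB": 0.1}
loan = {"rate": 7.5, "CS": 720, "LTV": 75, "DTI": 30, "FTB": 0, "NB": 2}

q_high_yield = cph_hazard(0.001, beta_prepay, loan, delta_k=6.0, hpc_k=0.005)
q_low_yield = cph_hazard(0.001, beta_prepay, loan, delta_k=4.0, hpc_k=0.005)
print(q_high_yield, q_low_yield)
```

With a positive coefficient on the spread, a fall in the Treasury yield δ(k) widens r_i - δ(k) and scales the hazard up by the factor exp{β_r × (change in spread)}, which is the proportional-hazard property of the model.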

Following the setup in Example 2, the default loss

b^{(1)} (k, x_{i}, Y)

and prepayment loss

b^{(2)} (k, x_{i}, Y)

of the RMBS are specified as

b^{(1)} (k, x_{i}, y) = M_{i} (k) \frac{α (x_{i}, y)}{α (x_{i}, y) + β (x_{i}, y)} and b^{(2)} (k, x_{i}, y) = R_{i} \sum_{h = 0}^{K - k} e^{- h r} - M_{i} (k),

where

R_{i}

and

M_{i} (k)

are defined in Equation (7), and the functions

α (x_{i}, y), β (x_{i}, y)

are the parameters of the Beta distribution. Consequently, the portfolio loss

L_{N}

of the RMBS is calculated by (2) and (3).

4.2. Freddie Mac Single-Family Loan Dataset

Our empirical analysis utilizes the complete Freddie Mac single-family loan dataset,5 which includes all available 15-year fixed-rate mortgages issued in 2000. We randomly partition the data into a training set of 500,000 loans and a test set of the remaining 57,202 loans. The two sets are disjoint, and the random split preserves their homogeneity. Our robustness checks confirm that the results also hold when only the 500,000 randomly sampled loans are used instead of the entire 557,202, which indicates that the sensitivity to dataset size or selection is small. The dataset comprises loans from all 50 U.S. states, with the distribution among states illustrated in Figure 4. The largest share is from California (10.8%), followed by Florida (6.6%), Texas (6.4%), and Illinois (5.4%); no other state exceeds 5%. The “Others” category aggregates the smaller states and accounts for approximately half (54.2%) of the sample, which indicates that the dataset is broadly representative across geographic regions without significant concentration. In sum, the dataset for estimation consists of

N = 500,000

loans with

K = 180

monthly repayments.

Figure 4. Distribution of mortgages by state.

The loan status can be deduced from the variables “Zero balance code” and “Defect settlement date” in the dataset. With slight modifications from the definitions outlined in Bhattacharya et al. (2019) due to data updates, the default and prepayment are now defined as follows:

A loan is considered in default if, for any month, the “Zero balance code” falls within the set $\{02, 03, 09, 15, 16, 96\}$.
A loan is considered prepaid if, for any month, the “Zero balance code” is equal to 01 and the “Defect settlement date” is “NAN”.
A loan remains active if it does not meet the criteria for prepayment or default.
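The three rules above can be written as a small classification routine. The Python sketch below assumes that each loan is represented by a list of monthly pairs (zero balance code, defect settlement date), with a missing date encoded as `None`; this record layout and the function name are illustrative assumptions and do not reproduce the dataset's actual file format.

```python
DEFAULT_CODES = {"02", "03", "09", "15", "16", "96"}

def loan_status(monthly_records):
    """Classify a loan as 'default', 'prepaid', or 'active'.

    monthly_records: iterable of (zero_balance_code, defect_settlement_date),
    where a code of None means no zero-balance event in that month and a
    date of None stands for the missing value ("NAN") in the dataset.
    """
    for code, defect_date in monthly_records:
        if code in DEFAULT_CODES:
            return "default"
        if code == "01" and defect_date is None:
            return "prepaid"
    return "active"

# Hypothetical loan histories for illustration.
print(loan_status([(None, None), ("03", None)]))   # default
print(loan_status([(None, None), ("01", None)]))   # prepaid
print(loan_status([(None, None), (None, None)]))   # active
```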

As described in Section 4.1, the systematic factors in our study are the U.S. Treasury long-term yield6

δ (k)

available at a monthly frequency, and the quarterly state-level HPI data,7 from which we compute the house price changes

H P C (S_{m}, k)

. The quarterly HPI data are linearly interpolated to a monthly frequency. The individual factors (i.e., loan rate, CS, LTV, DTI, FTB, and NB) are directly extracted from the dataset, which are fixed at the mortgage issue date.
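The interpolation step and the house price changes in (29) can be sketched as follows. The Python example assumes that each quarterly HPI value is placed at every third month and that the months in between are filled linearly; the paper does not state the alignment, so this placement and the sample index values are assumptions.

```python
import math

def interpolate_quarterly_to_monthly(quarterly):
    """Linearly interpolate quarterly index values to a monthly grid."""
    monthly = []
    for left, right in zip(quarterly, quarterly[1:]):
        for step in range(3):
            monthly.append(left + (right - left) * step / 3.0)
    monthly.append(quarterly[-1])
    return monthly

def house_price_changes(hpi):
    """HPC(k) = log HPI(k) - log HPI(k - 1), as in (29)."""
    return [math.log(b) - math.log(a) for a, b in zip(hpi, hpi[1:])]

hpi_quarterly = [100.0, 103.0, 106.0]  # hypothetical state-level index values
hpi_monthly = interpolate_quarterly_to_monthly(hpi_quarterly)
hpc = house_price_changes(hpi_monthly)
print(hpi_monthly)
print(hpc)
```

Since the monthly changes are log differences, they sum to the log growth of the index over the whole window.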

To derive the loss value of the RMBS, the LGDs of defaulted loans are obtained from the variables “Actual Loss Calculation” and “Current Actual UPB” in the dataset,

\mathrm{LGD}_{i} = \frac{\text{Actual Loss Calculation}_{i}}{\text{Current Actual UPB}_{i}} \in [0, 1] .

(31)

Those observations outside the interval

[0, 1]

are discarded.
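A minimal Python sketch of the computation in (31) and the subsequent filtering is given below. It assumes that the loss amount is recorded as a positive number and that the two variables are available as pairs per defaulted loan; both the sign convention and the input layout are assumptions made for illustration.

```python
def lgd_values(defaulted_loans):
    """Compute LGD = loss / UPB per loan and keep only values in [0, 1]."""
    kept = []
    for actual_loss, current_upb in defaulted_loans:
        if current_upb <= 0:
            continue  # ratio undefined, treated as outside [0, 1]
        lgd = actual_loss / current_upb
        if 0.0 <= lgd <= 1.0:
            kept.append(lgd)
    return kept

# Hypothetical (loss, unpaid principal balance) pairs.
pairs = [(20000.0, 80000.0), (90000.0, 60000.0), (-500.0, 40000.0), (0.0, 50000.0)]
print(lgd_values(pairs))
```

In this example only the first and last loans are retained, since the second ratio exceeds one and the third is negative.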

4.3. Estimation Results

4.3.1. The Coefficients of Factors

Based on the training data of 500,000 mortgages issued in 2000 (with performance observed through 2015) in the Freddie Mac dataset, the coefficient vectors in the joint default–prepayment hazard model (30) are estimated by the maximum partial likelihood method; the details of the methodology are provided in Appendix B.1. The estimation results are shown in Table 2 and provide empirical evidence on the effects of different factors on default and prepayment.

Table 2. The coefficient vectors of the CPH model.

First, the borrower behavior is strongly influenced by the market-level systematic factors. Specifically, higher interest rate spreads encourage the borrowers to refinance or sell, substantially increasing the voluntary prepayment; however, the borrowers who are unable to refinance experience greater financial pressure, which leads to higher default risk. Further, the rising house prices enhance the borrower equity, improving the incentives to maintain ownership, thus lowering the default risk (reflected by the negative coefficient of HPC). Conversely, the rising house prices provide easier refinancing opportunities and profitable property sales, increasing the prepayment probabilities (reflected by the positive coefficient of HPC). These findings align with the existing literature (Campbell and Cocco 2015; Quercia and Stegman 1992).

Second, the borrowers’ credit quality and financial structure play a critical role in determining the default risk. Higher credit scores, first-time buyers, and the presence of co-borrowers are associated with significantly lower default probabilities, as indicated by their negative coefficients, reflecting the stronger repayment capabilities and reduced financial distress. In contrast, the single-borrower households demonstrate greater vulnerability to financial distress, leading to elevated default rates. Additionally, higher LTV ratios significantly increase the default risk by reducing the equity buffers and intensifying the financial pressure for borrowers.

Third, the prepayment behavior is accelerated primarily by the stronger refinancing capacity of borrowers. Specifically, the borrowers with higher CS, DTI, and NB prepay more rapidly, reflecting the greater incentives and ability to refinance or sell. This is consistent with the related analysis of prepayment in Munk (2011). In contrast, the prepayment rate is insensitive to the LTV ratio and FTB status.

In summary, the estimation results show that the default and prepayment behaviors of borrowers are affected by distinct economic factors. This can improve the understanding of RMBS loan performance and lead to more accurate portfolio loss forecasts.

4.3.2. Baseline Hazard Functions

Based on the estimated coefficients in Equation (30), we estimate the baseline hazard functions

q_{0}^{(j)} (k)

by matching the expected event numbers with the actual data. The details of the methodology are provided in Appendix B.2. The estimation results for default and prepayment are plotted in Figure 5. The prepayment hazard rate peaks in the first 60 months and flattens thereafter. The default rate is much lower than the prepayment rate and presents a smoother curve.

Figure 5. The baseline hazard functions.

These patterns have important implications for modeling the borrower behavior and predicting the RMBS cash flows. The significantly higher prepayment risk in the early life of loans underscores the need to capture the time-varying characteristics of the interest rate, which influences the borrowers’ refinancing incentives. On the other hand, the relatively smooth and lower default hazard suggests that defaults are relatively rare and occur fairly uniformly over time, rather than clustering as prepayments do.

4.3.3. The Distribution of LGD

To incorporate the borrower credit quality heterogeneity into the loss severity, we partition the loans into three categories based on the CS percentiles. Specifically, the top

40 %

of loans ordered by CS are labeled as the high credit score borrowers, the middle

40 %

as the moderate, and the bottom

20 %

as the low. For each group, we estimate the LGD distribution by fitting the observed loss data derived by (31) to the Beta distributions, following Moody’s RMBS methodology (Moody’s Investors Service 2024). Table 3 reports the mean of the LGD and the corresponding Beta distribution parameters for each category.

Table 3. The LGD distributions for different categories of credit scores.

As shown in Table 3, the borrowers with higher credit scores tend to have lower LGD. Incorporating this classification into the simulation allows for more accurate modeling of loss severity across heterogeneous credit risk profiles.
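One simple way to obtain Beta parameters from observed LGD values is the method of moments, sketched below in Python. The paper does not state which fitting method was used, so this choice is an assumption made for illustration; the synthetic sample drawn from a Beta(2, 5) distribution stands in for the LGD data of one credit score category.

```python
import random
import statistics

def fit_beta_moments(values):
    """Method-of-moments estimates (alpha, beta) for data in (0, 1)."""
    mean = statistics.fmean(values)
    var = statistics.pvariance(values, mu=mean)
    # Match mean = a / (a + b) and var = a * b / ((a + b)^2 * (a + b + 1)).
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common

rng = random.Random(42)
sample = [rng.betavariate(2.0, 5.0) for _ in range(20000)]  # synthetic LGDs
alpha_hat, beta_hat = fit_beta_moments(sample)
print(alpha_hat, beta_hat, alpha_hat / (alpha_hat + beta_hat))
```

By construction, the fitted mean α/(α + β) equals the sample mean, which matches the mean LGD column reported alongside the Beta parameters.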

4.3.4. Out-of-Sample Validation

To check the robustness of the estimation procedure, we conduct an out-of-sample validation using a separate test set consisting of 57,202 loans issued in 2000. Figure 6 compares the model-implied cumulative defaults and prepayments against the actual observed values in this test set.

Figure 6. The estimation results of default (left) and prepayment (right) in the test set. Note: the red dotted lines in the lower panels represent zero difference between the expected and actual number of defaults, serving as a baseline for comparing the percentage difference curves.

The close alignment between the estimated and realized outcomes indicates that the estimation procedure is effective. The estimation error is relatively larger in the first few months, primarily due to the limited number of early default/prepayment behaviors, which affects the statistical precision. Nevertheless, the errors decrease quickly over time, indicating that the model captures the underlying risk dynamics reasonably well. These results suggest that our framework is useful for accurately estimating the default and prepayment probabilities in modeling the RMBS, which are essential for further evaluating the portfolio losses.

With a realized systematic factor path

Y = y

fixed (see Appendix B.3 for details of the structure of

y

), we draw

10,000

independent samples of the portfolio loss

L_{N} = (L_{N}^{(1)}, L_{N}^{(2)})

from the estimated CPH model. Figure 7 plots the empirical distributions of

L_{N}^{(1)}

and

L_{N}^{(2)}

together with the fitted Gaussian densities. The close visual agreement supports the conditional CLT in Theorem 1. The bivariate normal parameters estimated from the

10,000

draws are

\hat{μ} (y) = (0.482, 10.714), \hat{Σ} (y) = (\begin{matrix} 3.922 \times 10^{- 4} & - 9.668 \times 10^{- 5} \\ - 9.668 \times 10^{- 5} & 1.515 \times 10^{- 4} \end{matrix}),

and the correlation coefficient is

- 0.397 .
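The reported correlation follows directly from the entries of the estimated covariance matrix, as the off-diagonal entry divided by the square root of the product of the diagonal entries. The short Python check below reproduces this arithmetic; the variable names are illustrative.

```python
import math

# Entries of the estimated covariance matrix reported above.
var_default = 3.922e-4
var_prepay = 1.515e-4
cov_default_prepay = -9.668e-5

rho = cov_default_prepay / math.sqrt(var_default * var_prepay)
print(round(rho, 3))  # -0.397
```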

The negative correlation between default and prepayment losses reflects the competing-risk nature of these two events in the RMBS portfolios, where the early prepayment naturally reduces the number of loans exposed to default risk. This finding aligns with actual market behavior and highlights the importance of jointly modeling default and prepayment risks.

Figure 7. Empirical limit distributions of default loss (left) and prepayment loss (right).

4.4. Simulation of Portfolio Loss

Recall that in Proposition 2 (ii), the portfolio loss

L_{N}

converges to the conditional mean

μ (Y)

as

N \to \infty

, but the distributions of random variables

μ_{1} (Y)

and

μ_{2} (Y)

in

μ (Y)

do not admit explicit expressions. To further investigate the distribution of

μ (Y)

, in this part we generate the loss distribution of a representative RMBS pool using a bottom-up approach. Given the estimated hazard functions for default and prepayment from Section 4.3, we simulate the losses of individual loans based on the simulated dynamics of systematic risk factors. Then, the portfolio loss for the RMBS is generated.

4.4.1. The Models of Systematic Factors

In our model, both the interest rate and house price changes have influence over the payment period. To simulate the evolution of interest rates, we adopt the Cox–Ingersoll–Ross (CIR) model (Cox et al. 1985), which is widely used due to its ability to capture the non-negativity and mean-reverting nature of interest rates. The discrete-time approximation of the CIR process using the Euler–Maruyama scheme is given by

δ (k + 1) = δ (k) + κ (θ - δ (k)) Δ t + σ \sqrt{δ (k)} \sqrt{Δ t} \cdot Z_{k},

where

Δ t

is the time step (

1 / 12

for monthly frequency),

θ

is the long-run mean level of the interest rate,

κ

is the speed of mean reversion,

σ

is the volatility, and

\{Z_{k}\}

are i.i.d.

N (0, 1)

random variables.
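The Euler–Maruyama recursion above can be sketched in Python as follows. The long-run mean 3.4545 is the estimate quoted in this section, while the initial rate, κ, and σ are hypothetical placeholders (the estimates are in Table 4). Since the discretized process can turn negative, the sketch truncates the rate at zero inside the square root; this truncation is an added assumption and is not described in the paper.

```python
import math
import random

def simulate_cir(delta0, kappa, theta, sigma, n_steps=180, dt=1.0 / 12.0, seed=0):
    """Simulate a monthly interest rate path with the Euler-Maruyama scheme."""
    rng = random.Random(seed)
    path = [delta0]
    for _ in range(n_steps):
        d = path[-1]
        z = rng.gauss(0.0, 1.0)
        drift = kappa * (theta - d) * dt
        # max(d, 0) keeps the square root defined if the path dips below zero.
        diffusion = sigma * math.sqrt(max(d, 0.0)) * math.sqrt(dt) * z
        path.append(d + drift + diffusion)
    return path

THETA = 3.4545  # estimated long-run mean quoted in the text
path = simulate_cir(delta0=6.0, kappa=0.5, theta=THETA, sigma=0.3)
no_noise = simulate_cir(delta0=6.0, kappa=0.5, theta=THETA, sigma=0.0)
print(path[:3], no_noise[-1])
```

Setting σ = 0 isolates the mean-reverting drift: the gap to θ shrinks by the factor (1 - κΔt) each month, which produces the downward trend when the initial rate lies above θ.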

The parameters of the CIR model are estimated using the historical U.S. Treasury long-term yield data obtained from the U.S. Department of the Treasury. The estimated results are reported in Table 4. One sample path of the simulated interest rate over a 180-month period is shown in Figure 8. Here, the actual rate curve represents the observed U.S. Treasury long-term yield data from the historical record over the same period. The downward trend in the simulated interest rate arises naturally due to the mean reversion towards the estimated long-run equilibrium rate. As shown in Table 4, the initial historical rate exceeds the equilibrium value

\hat{θ} = 3.4545

. The declining rates widen the loan spreads, subsequently elevating the default and prepayment probabilities and thereby increasing the potential portfolio losses.

Table 4. The estimators of the CIR model.

Figure 8. A simulation of the interest rate curve.

To simulate the dynamics of the house price across the U.S. states, we employ a second-order autoregressive model (AR(2)) on the log house price index series. Specifically, for each state

S_{m}

, the log house price index evolves according to

log H P I (S_{m}, k) = ϕ_{0} + ϕ_{1} log H P I (S_{m}, k - 1) + ϕ_{2} log H P I (S_{m}, k - 2) + σ_{H} Z (S_{m}, k),

where

Z (S_{m}, k) \sim N (0, 1)

,

1 \leq k \leq K

are i.i.d. standard normal shocks in the kth month. Subsequently, the HPC is calculated by Equation (29).
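A minimal Python sketch of the AR(2) recursion for the log house price index of one state, followed by the house price changes from (29), is shown below. The coefficients, the volatility, and the two initial values are hypothetical placeholders and do not reproduce the estimates in Table 5.

```python
import random

def simulate_log_hpi(initial, phi0, phi1, phi2, sigma_h, n_steps, seed=0):
    """Simulate log HPI(k) from two initial values with the AR(2) recursion."""
    rng = random.Random(seed)
    series = list(initial)  # [log HPI(-1), log HPI(0)]
    for _ in range(n_steps):
        shock = sigma_h * rng.gauss(0.0, 1.0)
        series.append(phi0 + phi1 * series[-1] + phi2 * series[-2] + shock)
    return series

def house_price_changes(log_hpi):
    """HPC(k) = log HPI(k) - log HPI(k - 1), as in (29)."""
    return [b - a for a, b in zip(log_hpi, log_hpi[1:])]

# Hypothetical parameters for one state over a 180-month horizon.
log_hpi = simulate_log_hpi([4.60, 4.61], phi0=0.002, phi1=1.6, phi2=-0.6,
                           sigma_h=0.003, n_steps=180)
hpc = house_price_changes(log_hpi)
print(hpc[:3])
```

Running the same routine once per state, with independent shocks and state-specific parameters, yields the simulated paths of the house price components of Y.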

The estimated AR(2) parameters of all U.S. states are summarized in Table 5. The last row of Table 5 reports the AR(2) estimates for the national-level log house price series, which provides a benchmark for comparing the state-level dynamics. The AR(2) simulations consistently reveal a persistent and autoregressive growth pattern, despite the different initial house price levels across states.

Table 5. Summary statistics of AR(2) parameters for log house price time series across U.S. states.

4.4.2. Simulation Results

Based on the empirical results in Section 4.3, we simulate the RMBS portfolio losses under a bottom-up framework. In each scenario, we simulate one sample of the portfolio default and prepayment loss

(L_{N}^{(1)}, L_{N}^{(2)})

for the pool of 57,202 loans in the test set. By repeating this process over 100,000 times, we generate the distribution of

L_{N}^{(1)}

and

L_{N}^{(2)}

, where the marginal distributions for

L_{N}^{(1)}

and

L_{N}^{(2)}

are presented in Figure 9.

Figure 9. The marginal distributions of the simulated portfolio loss

L_{N}

of the RMBS.

From the right panel in Figure 9, both normal and log-normal distributions provide a good approximation for the distribution of portfolio prepayment loss. In contrast, as seen in the left panel of Figure 9, the log-normal distribution offers a substantially better fit of the portfolio default loss, compared to the normal distribution. This empirical observation provides quantitative evidence for the commonly assumed log-normal specification in the top-down models.

To further empirically examine the distributional form of

L_{N}^{(1)}

and

L_{N}^{(2)}

, we take the logarithm of the simulated portfolio loss and construct the quantile–quantile (QQ) plots against the normal distribution. As shown in Figure 10, the log-losses lie close to the theoretical quantiles of a normal distribution, with only mild deviations in the tails. This evidence suggests that

μ (Y)

is approximately log-normally distributed under the simulated macroeconomic scenarios.

Figure 10. The QQ plots for the simulated portfolio loss of the RMBS. Note: The blue line represents the empirical quantiles of the simulated data, while the red straight line corresponds to the theoretical quantiles of a normal distribution.

Moreover, the value-at-risk (VaR) of simulated

L_{N}^{(1)}

and

L_{N}^{(2)}

and the corresponding fitted log-normal distributions are shown in Table 6. The small deviations between the fitted and simulated values reveal that the log-normal distribution provides a reasonable approximation of the portfolio loss distribution. This result not only supports the log-normal assumption widely adopted in the top-down credit risk models but also fills an important gap in our theoretical framework by offering a plausible empirical distribution for the conditional loss mean.
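The comparison between simulated and fitted VaR values can be sketched as follows. The Python example fits a log-normal distribution by the mean and standard deviation of the log-losses and compares its quantile with the empirical quantile. The input is a synthetic log-normal sample that merely stands in for the simulated portfolio losses, and the fitting method and function names are assumptions made for illustration.

```python
import math
import random
import statistics

def fit_lognormal(losses):
    """Estimate (mu, sigma) of log-losses by sample mean and standard deviation."""
    logs = [math.log(v) for v in losses]
    return statistics.fmean(logs), statistics.stdev(logs)

def lognormal_var(mu, sigma, level):
    """Quantile of the fitted log-normal distribution at the given level."""
    return math.exp(mu + sigma * statistics.NormalDist().inv_cdf(level))

def empirical_var(losses, level):
    """Empirical quantile: smallest order statistic covering the level."""
    ordered = sorted(losses)
    index = min(len(ordered) - 1, math.ceil(level * len(ordered)) - 1)
    return ordered[index]

rng = random.Random(7)
losses = [rng.lognormvariate(-3.0, 0.25) for _ in range(20000)]  # synthetic stand-in
mu_hat, sigma_hat = fit_lognormal(losses)
var_fitted = lognormal_var(mu_hat, sigma_hat, 0.99)
var_empirical = empirical_var(losses, 0.99)
print(var_fitted, var_empirical)
```

A small relative gap between the two quantiles at high confidence levels is the kind of evidence summarized in Table 6.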

Table 6. The VaR of the simulated

L_{N}^{(1)}

and

L_{N}^{(2)}

of the RMBS.

5. Conclusions

This paper discussed the joint distribution of default and prepayment losses for a large portfolio of loans, extending the previous studies that focused solely on default losses. Under the conditionally independent framework, the two-dimensional limit distributions of the portfolio default and prepayment losses were obtained, including the strong law of large numbers and the central limit theorem. Numerical studies for the portfolio losses were performed for some simplified models. Empirical analysis on the RMBS was carried out for analyzing the limit joint distribution of the default and prepayment losses. The portfolio default and prepayment losses were simulated based on the empirically estimated model parameters. The copula of default and prepayment loss distributions was exhibited. Both the simulated distributions of portfolio default and prepayment losses fitted the log-normal distributions well, which is consistent with the log-normal distribution assumptions for portfolio losses in Moody’s Investors Service (2024).

Author Contributions

Conceptualization, methodology, formal analysis and investigation, C.X., X.Z., L.B., Q.D. and J.Y.; software, validation, data curation and visualization, C.X., L.B. and Q.D.; writing—original draft preparation and writing—review and editing, C.X., X.Z., Q.D. and J.Y.; supervision and funding acquisition, X.Z. and J.Y.; resources and project administration, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Key R&D Program of China (Grant No. 2018YFA0703900) and the National Natural Science Foundation of China (Grant Nos. 12471445 and 12071016). The research of Zang was also supported by the National Natural Science Foundation of China (Grant No. 12301598).

Data Availability Statement

The data of the empirical study in this paper is obtained from the Freddie Mac single-family loan dataset released by the Federal Home Loan Mortgage Corporation.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Some Proofs

Appendix A.1. Proof of Lemma 1

Proof.

(i) From Assumption 1,

τ_{i}^{(1)} \neq τ_{i}^{(2)}

a.s. Then, for

1 \leq k \leq K - 1

, we have

\begin{matrix} P (τ_{i} > k | τ_{i} > k - 1, Y = y) = & 1 - P (τ_{i} = k | τ_{i} > k - 1, Y = y) \\ = & 1 - P (τ_{i}^{(1)} = k | τ_{i} > k - 1, Y = y) - P (τ_{i}^{(2)} = k | τ_{i} > k - 1, Y = y) \\ = & 1 - q^{(1)} (k, x_{i}, y) - q^{(2)} (k, x_{i}, y), \end{matrix}

where the probabilities

q^{(j)} (k, x, y), j = 1, 2

are defined in (4), and the last equality follows from Assumption 2. Consequently, the conditional survival function of the ith loan is given by

\begin{matrix} P (τ_{i} > k | Y = y) = & \prod_{h = 1}^{k} P (τ_{i} > h | τ_{i} > h - 1, Y = y) = \prod_{h = 1}^{k} (1 - q^{(1)} (h, x_{i}, y) - q^{(2)} (h, x_{i}, y)) \\ = & S (k, x_{i}, y), 0 \leq k \leq K - 1 \end{matrix}

with

S (k, x, y)

in (9). This yields Equation (10).

(ii) Note that

D_{i} = 0

implies

τ_{i} = K

by Definition (1), so

P (τ_{i} = k, D_{i} = 0 | Y = y) = 0, 1 \leq k \leq K - 1 .

Similarly to the proof of (i),

\begin{matrix} P (τ_{i} = K, D_{i} = 0 | Y = y) = & P (τ_{i} > K - 1 | Y = y) P (τ_{i} = K, D_{i} = 0 | τ_{i} > K - 1, Y = y) \\ = & \prod_{h = 1}^{K} (1 - q^{(1)} (h, x_{i}, y) - q^{(2)} (h, x_{i}, y)) = S (K, x_{i}, y) . \end{matrix}

For

j = 1, 2

and

1 \leq k \leq K

, we have

\begin{matrix} P (τ_{i} = k, D_{i} = j | Y = y) = & P (τ_{i} > k - 1 | Y = y) P (τ_{i} = k, D_{i} = j | τ_{i} > k - 1, Y = y) \\ = & P (τ_{i} > k - 1 | Y = y) P (τ_{i}^{(j)} = k | τ_{i} > k - 1, Y = y) \\ = & S (k - 1, x_{i}, y) q^{(j)} (k, x_{i}, y) \end{matrix}

based on (4) and (10). This finishes the proof. □
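The formulas of Lemma 1 translate directly into a short recursion. The following Python sketch is illustrative only: the function name `exit_distribution` and the three-month hazard values are assumptions and do not come from the paper. It computes S(k) and the joint exit probabilities of one loan given Y = y, and it checks that these probabilities add up to one.

```python
def exit_distribution(q1, q2):
    """Survival function and exit probabilities of one loan, given Y = y.

    q1[k-1] and q2[k-1] play the role of q^(1)(k, x, y) and q^(2)(k, x, y),
    the conditional default and prepayment probabilities in month k = 1..K.
    """
    K = len(q1)
    S = [1.0]  # S(0) = 1: the loan is active at origination
    for k in range(1, K + 1):
        S.append(S[-1] * (1.0 - q1[k - 1] - q2[k - 1]))
    # P(tau = k, D = j) = S(k - 1) * q^(j)(k) for j = 1 (default), 2 (prepayment)
    p_default = [S[k - 1] * q1[k - 1] for k in range(1, K + 1)]
    p_prepay = [S[k - 1] * q2[k - 1] for k in range(1, K + 1)]
    p_mature = S[K]  # P(tau = K, D = 0): the loan is repaid on schedule
    return S, p_default, p_prepay, p_mature


# Hypothetical hazards for a three-month loan (illustration only).
q1 = [0.01, 0.02, 0.03]
q2 = [0.05, 0.04, 0.03]
S, p_def, p_pre, p_mat = exit_distribution(q1, q2)
total = sum(p_def) + sum(p_pre) + p_mat  # should equal 1 up to rounding
```

The sum equals one because S(k - 1)(q^(1) + q^(2)) = S(k - 1) - S(k), so the exit probabilities telescope to 1 - S(K).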

Appendix A.2. Proof of Proposition 1

Proof.

(i) Given

Y = y

, for

j = 1, 2

, the conditional expectation of

l_{i}^{(j)}

in (2) is given by

\begin{matrix} E [l_{i}^{(j)} | Y = y] = & E [1_{D_{i} = j} b_{i}^{(j)} e^{- r τ_{i}} | Y = y] \\ = & \sum_{k = 1}^{K} P (τ_{i} = k, D_{i} = j | Y = y) E [b_{i}^{(j)} | τ_{i} = k, D_{i} = j, Y = y] e^{- k r} \\ = & \sum_{k = 1}^{K} S (k - 1, x_{i}, y) q^{(j)} (k, x_{i}, y) b^{(j)} (k, x_{i}, y) e^{- k r} = m_{j} (x_{i}, y) \end{matrix}

with

m_{j} (x, y)

defined in (12). Similarly, we get

\begin{matrix} E [{(l_{i}^{(j)})}^{2} | Y = y] = & E [1_{D_{i} = j} {(b_{i}^{(j)})}^{2} e^{- 2 r τ_{i}} | Y = y] \\ = & \sum_{k = 1}^{K} P (τ_{i} = k, D_{i} = j | Y = y) E [{(b_{i}^{(j)})}^{2} | τ_{i} = k, D_{i} = j, Y = y] e^{- 2 k r} \\ = & \sum_{k = 1}^{K} S (k - 1, x_{i}, y) q^{(j)} (k, x_{i}, y) B^{(j)} (k, x_{i}, y) e^{- 2 k r}, \end{matrix}

which yields that

\begin{matrix} Var [l_{i}^{(j)} | Y = y] = & E [{(l_{i}^{(j)})}^{2} | Y = y] - E {[l_{i}^{(j)} | Y = y]}^{2} \\ = & \sum_{k = 1}^{K} S (k - 1, x_{i}, y) q^{(j)} (k, x_{i}, y) B^{(j)} (k, x_{i}, y) e^{- 2 k r} - m_{j}^{2} (x_{i}, y) = v_{j, j} (x_{i}, y) \end{matrix}

with

v_{j, j} (x, y)

defined in (13). Moreover,

\begin{matrix} Cov [l_{i}^{(1)}, l_{i}^{(2)} | Y = y] = & E [l_{i}^{(1)} l_{i}^{(2)} | Y = y] - E [l_{i}^{(1)} | Y = y] E [l_{i}^{(2)} | Y = y] \\ = & - m_{1} (x_{i}, y) m_{2} (x_{i}, y), \end{matrix}

where the last equality is due to

l_{i}^{(1)} l_{i}^{(2)} = 0

, and thus,

E [l_{i}^{(1)} l_{i}^{(2)} | Y = y] = 0

. So, conclusion (14) follows.

(ii) For

L_{N} = \frac{1}{N} \sum_{i = 1}^{N} l_{i}

in (3), we have

E [L_{N} | Y = y] = \frac{1}{N} \sum_{i = 1}^{N} E [l_{i} | Y = y] = \frac{1}{N} \sum_{i = 1}^{N} m (x_{i}, y) .

Moreover, by Assumption 3,

l_{1}, l_{2}, \dots, l_{N}

are conditionally independent given

Y = y

. This yields

Var [L_{N} | Y = y] = \frac{1}{N^{2}} \sum_{i = 1}^{N} Var [l_{i} | Y = y] = \frac{1}{N^{2}} \sum_{i = 1}^{N} V (x_{i}, y) .

□
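As a numerical companion to Proposition 1 (i), the sketch below evaluates the conditional mean vector and covariance matrix of the loss pair of a single loan. It is a minimal illustration under stated assumptions: the function name, the two-month horizon, and all input values are hypothetical, and the loss rates are taken to be deterministic so that the second-moment function B equals b squared.

```python
import math


def conditional_moments(q1, q2, b1, b2, B1, B2, r):
    """Return (m, V): conditional mean and covariance of (l^(1), l^(2)) given Y = y.

    q_j[k-1]: conditional exit probability of type j in month k.
    b_j[k-1]: conditional mean loss rate, B_j[k-1]: conditional second moment.
    r: discount rate per month.
    """
    K = len(q1)
    S = 1.0  # S(k - 1), starting from S(0) = 1
    m = [0.0, 0.0]
    second = [0.0, 0.0]
    for k in range(1, K + 1):
        disc = math.exp(-k * r)
        for j, (q, b, B) in enumerate(((q1, b1, B1), (q2, b2, B2))):
            weight = S * q[k - 1]  # P(tau = k, D = j | Y = y)
            m[j] += weight * b[k - 1] * disc
            second[j] += weight * B[k - 1] * disc ** 2
        S *= 1.0 - q1[k - 1] - q2[k - 1]
    v11 = second[0] - m[0] ** 2
    v22 = second[1] - m[1] ** 2
    v12 = -m[0] * m[1]  # l^(1) * l^(2) = 0, so the cross moment vanishes
    return m, [[v11, v12], [v12, v22]]


# Hypothetical inputs: two months, zero discounting, deterministic loss rates.
q1, q2 = [0.1, 0.1], [0.2, 0.2]
b1, b2 = [0.5, 0.5], [0.1, 0.1]
B1, B2 = [x * x for x in b1], [x * x for x in b2]
m, V = conditional_moments(q1, q2, b1, b2, B1, B2, r=0.0)
```

The off-diagonal entry is always non-positive, reflecting that a loan cannot both default and prepay.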

Appendix A.3. Proof of Proposition 2

Proof.

(i) From Assumption 3, the sequence

{l_{i}}_{i \geq 1}

is conditionally independent given

Y = y

. For

j = 1, 2

, conditional on

Y = y

, the sequence

\frac{l_{i}^{(j)} - E [l_{i}^{(j)} | Y = y]}{i}, i \geq 1

satisfies the conditions of Theorem 2.5.3 in Durrett (2019): its terms are conditionally independent with conditional expectation 0, and

\sum_{i = 1}^{\infty} Var [\frac{l_{i}^{(j)} - E [l_{i}^{(j)} | Y = y]}{i} | Y = y] = \sum_{i = 1}^{\infty} \frac{1}{i^{2}} Var [l_{i}^{(j)} | Y = y] \leq \sum_{i = 1}^{\infty} \frac{4}{i^{2}} < \infty,

where the inequality follows from

l_{i}^{(j)} \in [0, 1]

and further

| l_{i}^{(j)} - E [l_{i}^{(j)} | Y = y] | \leq 2

. Applying this theorem, we conclude that

\sum_{i = 1}^{N} (l_{i}^{(j)} - E [l_{i}^{(j)} | Y = y]) / i

converges almost surely as

N \to \infty

conditional on

Y = y

. By Kronecker’s lemma (Theorem 2.5.5 in Durrett 2019), this implies

P (lim_{N \to \infty} (L_{N}^{(j)} - E [L_{N}^{(j)} | Y = y]) = lim_{N \to \infty} \frac{1}{N} \sum_{i = 1}^{N} (l_{i}^{(j)} - E [l_{i}^{(j)} | Y = y]) = 0 | Y = y) = 1 .

That is,

P ({lim}_{N \to \infty} (L_{N} - E [L_{N} | Y = y]) = 0 | Y = y) = 1

in vector notation. Then, we have

P (lim_{N \to \infty} (L_{N} - E [L_{N} | Y]) = 0) = \int P (lim_{N \to \infty} (L_{N} - E [L_{N} | Y = y]) = 0 | Y = y) d F_{Y} (y) = 1,

which yields Equation (18).

(ii) Since the functions

q^{(j)} (k, x, y)

and

b^{(j)} (k, x, y)

in (4) are continuous, the moment functions

m_{1} (x, y), m_{2} (x, y)

defined in (11) are consequently continuous and bounded in [0,1]. Moreover, Assumption 4 establishes the weak convergence

F_{X}^{(N)} (x) \overset{w}{\to} F_{X} (x)

as

N \to \infty

. By the weak convergence theorem (Theorem 3.2.3 in Durrett 2019), we have

\begin{matrix} lim_{N \to \infty} E [L_{N} | Y = y] = & lim_{N \to \infty} \frac{1}{N} \sum_{i = 1}^{N} m (x_{i}, y) = lim_{N \to \infty} \int m (x, y) d F_{X}^{(N)} (x) \\ = & \int m (x, y) d F_{X} (x) = μ (y), \end{matrix}

(A1)

with

μ (y)

defined in (17). Combining (18) and (A1), we obtain Equation (19). □

Appendix A.4. Proof of Corollary 1

Proof.

From Proposition 2 (ii),

L_{N}

converges almost surely to

μ (Y)

as

N \to \infty

. Consequently, for

j = 1, 2

, its component

L_{N}^{(j)} \to μ_{j} (Y)

a.s. This implies convergence in distribution, i.e.,

lim_{N \to \infty} P (L_{N}^{(j)} \leq t) = P (μ_{j} (Y) \leq t)

(A2)

for any continuity point t of the distribution function of

μ_{j} (Y)

.

Fix

j \in {1, 2}

,

α \in (0, 1)

, and

ε > 0

. Since the distribution function of

μ_{j} (Y)

is non-decreasing and right-continuous, we can select its continuity points

t_{1}, t_{2}

satisfying

{VaR}_{α} (μ_{j} (Y)) - ε < t_{1} < {VaR}_{α} (μ_{j} (Y)) < t_{2} < {VaR}_{α} (μ_{j} (Y)) + ε .

(A3)

By the definition of VaR, we have

P (μ_{j} (Y) \leq t_{1}) \leq α \leq P (μ_{j} (Y) \leq t_{2}) .

(A4)

Combining (A2), (A3), and (A4), it follows that

lim_{N \to \infty} P (L_{N}^{(j)} \leq {VaR}_{α} (μ_{j} (Y)) - ε) \leq lim_{N \to \infty} P (L_{N}^{(j)} \leq t_{1}) = P (μ_{j} (Y) \leq t_{1}) \leq α

and

lim_{N \to \infty} P (L_{N}^{(j)} \leq {VaR}_{α} (μ_{j} (Y)) + ε) \geq lim_{N \to \infty} P (L_{N}^{(j)} \leq t_{2}) = P (μ_{j} (Y) \leq t_{2}) \geq α .

This finishes the proof. □

Appendix A.5. Proof of Theorem 1

For the proof of this theorem, we first establish two technical lemmas.

Lemma A1.

For

1 \leq i \leq N

, let

A_{i}

be a

2 \times 2

symmetric matrix of the form

A_{i} = (\begin{matrix} a_{i} & b_{i} \\ b_{i} & c_{i} \end{matrix}), where 0 \leq a_{i}, c_{i} \leq 1, - 1 \leq b_{i} \leq 1 and \det (A_{i}) > ε

(A5)

for some

ε > 0

. Define the sum matrix

A = \sum_{i = 1}^{N} A_{i}

. Then, each entry of the inverse matrix

A^{- 1}

is in the interval

[- {(N ε)}^{- 1}, {(N ε)}^{- 1}]

.

Proof.

For each

A_{i}

, the determinant condition

det (A_{i}) = a_{i} c_{i} - b_{i}^{2} > ε

implies

c_{i} > (b_{i}^{2} + ε) / a_{i}

. Then, it follows that

\sum_{i = 1}^{N} a_{i} \sum_{i = 1}^{N} c_{i} > \sum_{i = 1}^{N} a_{i} \sum_{i = 1}^{N} \frac{b_{i}^{2} + ε}{a_{i}} = \sum_{i = 1}^{N} a_{i} \sum_{i = 1}^{N} \frac{b_{i}^{2}}{a_{i}} + ε \sum_{i = 1}^{N} a_{i} \sum_{i = 1}^{N} \frac{1}{a_{i}} \geq {(\sum_{i = 1}^{N} b_{i})}^{2} + N^{2} ε,

where the last inequality is due to the Cauchy–Schwarz inequality (see Theorem 4.1 in Rudin (1987)). Thus,

det (A) = \sum_{i = 1}^{N} a_{i} \sum_{i = 1}^{N} c_{i} - {(\sum_{i = 1}^{N} b_{i})}^{2} > N^{2} ε

and

0 < {det}^{- 1} (A) < {(N^{2} ε)}^{- 1}

. Note that

A = (\begin{matrix} \sum_{i = 1}^{N} a_{i} & \sum_{i = 1}^{N} b_{i} \\ \sum_{i = 1}^{N} b_{i} & \sum_{i = 1}^{N} c_{i} \end{matrix}) and A^{- 1} = \frac{1}{det (A)} (\begin{matrix} \sum_{i = 1}^{N} c_{i} & - \sum_{i = 1}^{N} b_{i} \\ - \sum_{i = 1}^{N} b_{i} & \sum_{i = 1}^{N} a_{i} \end{matrix}) .

As all entries of matrix A, i.e.,

\sum_{i = 1}^{N} a_{i}, \sum_{i = 1}^{N} b_{i}

, and

\sum_{i = 1}^{N} c_{i}

, are bounded in

[- N, N]

, each entry of matrix

A^{- 1}

is bounded in

[- {(N ε)}^{- 1}, {(N ε)}^{- 1}]

, which finishes the proof. □
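A quick numerical check can make the bound of Lemma A1 concrete. The sketch below is an illustration under assumptions: the three matrices, the value of ε, and the function name are hypothetical. Each matrix is stored as a triple (a_i, b_i, c_i), and the code verifies both det(A) > N^2 ε and the entrywise bound on the inverse.

```python
def inverse_of_sum(mats):
    """Sum symmetric 2x2 matrices given as (a, b, c) and invert the sum.

    Returns det(A) and the entries (A^{-1}_11, A^{-1}_12, A^{-1}_22).
    """
    a = sum(m[0] for m in mats)
    b = sum(m[1] for m in mats)
    c = sum(m[2] for m in mats)
    det = a * c - b * b
    return det, (c / det, -b / det, a / det)


# Hypothetical matrices satisfying (A5) with eps = 0.1:
# their determinants are 0.24, 0.27 and 0.44.
eps = 0.1
mats = [(0.5, 0.1, 0.5), (0.9, -0.3, 0.4), (0.6, 0.2, 0.8)]
N = len(mats)

det, inv_entries = inverse_of_sum(mats)
bound = 1.0 / (N * eps)
det_ok = det > N ** 2 * eps
entries_ok = all(abs(e) <= bound for e in inv_entries)
```

In this example the bound is far from tight, which is typical: the lemma only needs the order 1/N of the inverse entries, not a sharp constant.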

Lemma A2.

Suppose that Assumptions 1–3 and 5 hold. Given vector

y

, define random variables

Z_{N, i} : = Var {[\sum_{i = 1}^{N} l_{i} | Y = y]}^{- \frac{1}{2}} (l_{i} - E [l_{i} | Y = y]), 1 \leq i \leq N .

(A6)

Then,

{∥Z_{N, i}∥}^{2} \leq 4 {(N ε)}^{- 1}

, with ε in Assumption 5.

Proof.

Note that each entry of

l_{i}

is in

[0, 1]

by definition (2). Further, from Assumption 5, the matrices

Var [l_{i} | Y = y]

,

1 \leq i \leq N

satisfy condition (A5). By Lemma A1, each entry of

Var {[\sum_{i = 1}^{N} l_{i} | Y = y]}^{- 1} = {[\sum_{i = 1}^{N} Var [l_{i} | Y = y]]}^{- 1}

is bounded in

[- {(N ε)}^{- 1}, {(N ε)}^{- 1}]

, where the equality is due to the conditional independence in Assumption 3. Moreover, since each entry of

l_{i} - E [l_{i} | Y = y]

is bounded in

[- 1, 1]

, we have

{∥Z_{N, i}∥}^{2} = Z_{N, i}^{'} Z_{N, i} = {(l_{i} - E [l_{i} | Y = y])}^{'} Var {[\sum_{i = 1}^{N} l_{i} | Y = y]}^{- 1} (l_{i} - E [l_{i} | Y = y]) \leq \frac{4}{N ε}

for

1 \leq i \leq N

. This finishes the proof. □

Now, we prove Theorem 1.

(i) Given

Y = y

, the variables

Z_{N, i}, 1 \leq i \leq N

defined in (A6) are conditionally independent by Assumption 3 and satisfy

E [Z_{N, i} | Y = y] = 0

,

1 \leq i \leq N

. Moreover,

\begin{matrix} \sum_{i = 1}^{N} Z_{N, i} = & Var {[\sum_{i = 1}^{N} l_{i} | Y = y]}^{- \frac{1}{2}} \sum_{i = 1}^{N} (l_{i} - E [l_{i} | Y = y]) \\ = & Var {[L_{N} | Y = y]}^{- \frac{1}{2}} (L_{N} - E [L_{N} | Y = y]) \end{matrix}

(A7)

\begin{matrix} = & Σ_{N} {(y)}^{- \frac{1}{2}} (L_{N} - μ_{N} (y)), \end{matrix}

(A8)

where the second equality is due to (3) and the last equality follows by combining (15) and (21).

Note that

\begin{matrix} \sum_{i = 1}^{N} E [Z_{N, i} Z_{N, i}^{'} | Y = y] = & \sum_{i = 1}^{N} Var [Z_{N, i} | Y = y] = Var [\sum_{i = 1}^{N} Z_{N, i} | Y = y] \\ = & Σ_{N} {(y)}^{- \frac{1}{2}} Var [L_{N} | Y = y] Σ_{N} {(y)}^{- \frac{1}{2}} = I_{2}, \end{matrix}

(A9)

where the first equality is due to

E [Z_{N, i} | Y = y] = 0

and the third equality is due to (A8). Moreover, for any

δ > 0

,

\begin{matrix} lim_{N \to \infty} \sum_{i = 1}^{N} E [{∥Z_{N, i}∥}^{2} 1_{{∥ Z_{N, i} ∥ > δ}} | Y = y] \end{matrix}

(A10)

\begin{matrix} \leq & lim_{N \to \infty} \sum_{i = 1}^{N} E [{∥Z_{N, i}∥}^{2} 1_{{4 {(N ε)}^{- 1} > δ^{2}}} | Y = y] = 0, \end{matrix}

(A11)

where the inequality follows from Lemma A2, and the limit is 0 because the indicator vanishes once N ≥ 4 / (ε δ^2). Conditions (A9) and (A11) verify the hypotheses of the Lindeberg CLT; see Theorem 11.1.6 in Athreya and Lahiri (2010). By this theorem, we obtain

Σ_{N} {(y)}^{- \frac{1}{2}} \cdot (L_{N} - μ_{N} (y)) = \sum_{i = 1}^{N} Z_{N, i} \overset{d}{⟶} N (0, I_{2}),

which finishes the proof. □

(ii) As discussed in the proof of Proposition 2 (ii), the moment functions

m_{1} (x, y), m_{2} (x, y)

in (11) are continuous and bounded in [0,1], and the weak convergence (16) holds. By the weak convergence theorem (Theorem 3.2.3 in Durrett 2019), we have

\begin{matrix} lim_{N \to \infty} N Σ_{N} (y) & = lim_{N \to \infty} \frac{1}{N} \sum_{i = 1}^{N} V (x_{i}, y) = lim_{N \to \infty} \int V (x, y) d F_{X}^{(N)} (x) \\ = & \int V (x, y) d F_{X} (x) = Σ (y), \end{matrix}

(A12)

with

Σ (y)

and

Σ_{N} (y)

defined in (17) and (21). Combining (20) and (A12), we obtain Equation (22).

Appendix A.6. Proof of Theorem 2

For the proof of this theorem, we first establish a technical lemma.

Lemma A3.

Suppose that Assumptions 1–3 and 5 hold. Given

Y = y

, the random variables

Z_{N, i}, 1 \leq i \leq N

defined in (A6) satisfy

\sum_{i = 1}^{N} E [{∥Z_{N, i}∥}^{3} | Y = y] \leq 4 {(N ε)}^{- \frac{1}{2}},

(A13)

with ε in Assumption 5.

Proof.

Given

Y = y

, the variables

Z_{N, i}, 1 \leq i \leq N

in (A6) are conditionally independent and satisfy

E [Z_{N, i} | Y = y] = 0

,

1 \leq i \leq N

. Denote by tr

(A)

the trace of matrix A. It follows from (A9) in the proof of Theorem 1 that

\begin{matrix} \sum_{i = 1}^{N} E [{∥Z_{N, i}∥}^{2} | Y = y] = & \sum_{i = 1}^{N} E [tr (Z_{N, i} Z_{N, i}^{'}) | Y = y] = \sum_{i = 1}^{N} tr (E [Z_{N, i} Z_{N, i}^{'} | Y = y]) \\ = & tr (I_{2}) = 2 . \end{matrix}

Then

\begin{matrix} \sum_{i = 1}^{N} E [{∥Z_{N, i}∥}^{3} | Y = y] \leq & \sqrt{\frac{4}{N ε}} \sum_{i = 1}^{N} E [{∥Z_{N, i}∥}^{2} | Y = y] \\ = & \sqrt{\frac{4}{N ε}} \cdot 2 = \frac{4}{\sqrt{N ε}}, \end{matrix}

where the inequality follows from the bound

∥Z_{N, i}∥ \leq \sqrt{4 / (N ε)}, 1 \leq i \leq N

given by Lemma A2. This finishes the proof. □

Now, we prove Theorem 2. Given

Y = y

, the variables

Z_{N, i}, 1 \leq i \leq N

defined in (A6) are conditionally independent by Assumption 3 and satisfy

E [Z_{N, i} | Y = y] = 0

,

1 \leq i \leq N

. Further, from Equation (A13) in Lemma A3, the variables

Z_{N, i}, 1 \leq i \leq N

satisfy the conditions of the multivariate Berry–Esseen theorem; see Theorem 1.1 in Raič (2019). This theorem implies that

\begin{matrix} sup_{A \in \mathcal{A}} | P (Σ_{N} {(y)}^{- \frac{1}{2}} (L_{N} - μ_{N} (y)) \in A | Y = y) - Φ_{(0, I_{2})} (A) | \\ = & sup_{A \in \mathcal{A}} | P (\sum_{i = 1}^{N} Z_{N, i} \in A | Y = y) - Φ_{(0, I_{2})} (A) | \\ \leq & K \cdot \sum_{i = 1}^{N} E [{∥Z_{N, i}∥}^{3} | Y = y] \leq \frac{4 K}{\sqrt{N ε}} \leq \frac{264}{\sqrt{N ε}}, \end{matrix}

where the first equality is due to (A8),

K = 16 + 42 \sqrt[4]{2}

, and

\mathcal{A}

is a suitable class of subsets of

R^{2}

.

Define function

h (x) : = Σ_{N} {(y)}^{- \frac{1}{2}} (x - μ_{N} (y))

and set

A_{x} : = {X : X \leq x}

for vector

x

. Let random vector

ξ \sim N (μ_{N} (y), Σ_{N} (y))

. Then, it follows that

\begin{matrix} | P (L_{N} \leq x | Y = y) - Φ_{(μ_{N} (y), Σ_{N} (y))} (x) | \\ = & | P (L_{N} \in A_{x} | Y = y) - P (ξ \in A_{x}) | \\ = & | P (h (L_{N}) \in h (A_{x}) | Y = y) - P (h (ξ) \in h (A_{x})) | \\ = & | P (h (L_{N}) \in h (A_{x}) | Y = y) - Φ_{(0, I_{2})} (h (A_{x})) | \leq \frac{264}{\sqrt{N ε}} . \end{matrix}

Thus,

\begin{matrix} | P (L_{N} \leq x) - \int Φ_{(μ_{N} (y), Σ_{N} (y))} (x) d F_{Y} (y) | \\ = |\int P (L_{N} \leq x | Y = y) d F_{Y} (y) - \int Φ_{(μ_{N} (y), Σ_{N} (y))} (x) d F_{Y} (y)| \\ \leq \int |P (L_{N} \leq x | Y = y) - Φ_{(μ_{N} (y), Σ_{N} (y))} (x)| d F_{Y} (y) \leq \frac{264}{\sqrt{N ε}} . \end{matrix}

This finishes the proof. □
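The arithmetic behind the constant 264 can be checked in a few lines. The following sketch is illustrative: the function names and the sample values of N, ε, and the target accuracy are assumptions, not quantities from the paper. It confirms that 4K is about 263.79 and shows how many loans the rounded bound requires for a given accuracy.

```python
import math

# K = 16 + 42 * 2^(1/4), the constant used in the proof for dimension 2.
BE_CONSTANT = 16 + 42 * 2 ** 0.25


def berry_esseen_bound(n_loans, eps):
    """Upper bound 4K / sqrt(N * eps) appearing in the proof of Theorem 2."""
    return 4 * BE_CONSTANT / math.sqrt(n_loans * eps)


def min_loans_for_accuracy(target, eps):
    """A portfolio size N for which 264 / sqrt(N * eps) is at most `target`."""
    return math.ceil((264 / target) ** 2 / eps)


four_k = 4 * BE_CONSTANT                       # approximately 263.79
example_bound = berry_esseen_bound(10 ** 6, 0.5)   # hypothetical N and eps
n_needed = min_loans_for_accuracy(0.05, 0.5)       # hypothetical target 0.05
```

Since a difference of distribution functions never exceeds 1, the bound carries information only when N ε is larger than 264^2 = 69,696, so it is mainly of theoretical interest for moderate portfolio sizes.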

Appendix A.7. Proof of Lemma 2

Proof.

Based on (26) and (27), the functions

m_{j} (x, y)

for

j = 1, 2

in (11) have a specific form

\begin{matrix} m_{j} (x, y) = & \sum_{k = 1}^{K} {(1 - {\bar{q}}^{(1)} (x, y) - {\bar{q}}^{(2)} (x, y))}^{k - 1} {\bar{q}}^{(j)} (x, y) \cdot {\bar{b}}_{j} \frac{K - k + 1}{K} e^{- k r} \\ = & \frac{1}{K e^{r}} {\bar{q}}^{(j)} (x, y) {\bar{b}}_{j} \cdot \sum_{k = 1}^{K} (K - k + 1) {(\frac{1 - {\bar{q}}^{(1)} (x, y) - {\bar{q}}^{(2)} (x, y)}{e^{r}})}^{k - 1} \\ = & \frac{1}{K e^{r}} {\bar{q}}^{(j)} (x, y) {\bar{b}}_{j} \cdot \frac{K - (K + 1) a (x, y) + a {(x, y)}^{K + 1}}{{(1 - a (x, y))}^{2}}, \end{matrix}

where

a (x, y) = (1 - {\bar{q}}^{(1)} (x, y) - {\bar{q}}^{(2)} (x, y)) / e^{r}

. The third equality holds because

\begin{matrix} \sum_{k = 1}^{K} (K - k + 1) a^{k - 1} = & \frac{1}{1 - a} [\sum_{k = 1}^{K} (K - k + 1) a^{k - 1} - \sum_{k = 1}^{K} (K - k + 1) a^{k}] \\ = & \frac{1}{1 - a} [\sum_{k = 0}^{K - 1} (K - k) a^{k} - \sum_{k = 1}^{K} (K - k + 1) a^{k}] \\ = & \frac{1}{1 - a} [K - a - \dots - a^{K}] = \frac{K}{1 - a} - \frac{a (1 - a^{K})}{{(1 - a)}^{2}} \\ = & \frac{K - (K + 1) a + a^{K + 1}}{{(1 - a)}^{2}} . \end{matrix}

□
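The closed form of Lemma 2 can be compared with the defining sum numerically. In the sketch below, the function names and the hazard values are hypothetical; K = 60, r = 0.003, and the loss rate 0.8 are the values used for Figure 3. The closed form requires a(x, y) ≠ 1, which holds whenever the total exit probability is positive or r > 0.

```python
import math


def m_direct(q_j, q_total, b_j, K, r):
    """Evaluate m_j(x, y) by summing over the exit months k = 1..K."""
    return sum(
        (1.0 - q_total) ** (k - 1) * q_j * b_j * (K - k + 1) / K * math.exp(-k * r)
        for k in range(1, K + 1)
    )


def m_closed(q_j, q_total, b_j, K, r):
    """Evaluate m_j(x, y) with the closed form of Lemma 2."""
    a = (1.0 - q_total) / math.exp(r)
    ratio = (K - (K + 1) * a + a ** (K + 1)) / (1.0 - a) ** 2
    return q_j * b_j / (K * math.exp(r)) * ratio


# Hypothetical monthly hazards: default 0.002 and prepayment 0.01.
q1_bar, q2_bar = 0.002, 0.01
direct = m_direct(q1_bar, q1_bar + q2_bar, 0.8, 60, 0.003)
closed = m_closed(q1_bar, q1_bar + q2_bar, 0.8, 60, 0.003)
```

When a(x, y) is very close to 1, the numerator and denominator of the closed form both become tiny, so the direct sum is the numerically safer choice in that regime.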

Appendix B. Estimation Methodology

Appendix B.1. Maximum Partial Likelihood Estimator of CPH Model

In the following, we detail the estimation procedure for the joint default–prepayment hazard model, whose hazard function is given in the main text by Equation (30).

For each loan i, we define the risk set

R_{i} = {ℓ : τ_{ℓ} \geq τ_{i}}

representing all loans that remain active in the repayment pool prior to the exit time

τ_{i} .

To formalize event observation, we specify for each discrete-time period

k \in {1, \dots, K}

and event type

j \in {1, 2}

(where 1 = default and 2 = prepayment), the event-specific cohort:

E_{k}^{(j)} = {i : τ_{i} = k, D_{i} = j} .

Let

β^{(j)} = (β_{r}^{(j)}, β_{H P C}^{(j)}, β_{C S}^{(j)}, β_{L T V}^{(j)}, β_{D T I}^{(j)}, β_{F T B}^{(j)}, β_{N B}^{(j)})

denote the parameter vector for event type j. The event-specific partial likelihood incorporates the explicit hazard structure:

L_{j} (β^{(j)}) = \prod_{k = 1}^{K} \prod_{i \in E_{k}^{(j)}} \frac{q^{(j)} (k, x_{i}, y)}{\sum_{ℓ \in R_{i}} q^{(j)} (k, x_{ℓ}, y)} .

This likelihood factorizes into the product of conditional probabilities where each term compares an observed event to all at-risk loans at that event time.

Therefore, for event type j, we estimate

β^{(j)}

by maximizing the log-likelihood

l_{j} (β^{(j)}) = \sum_{k = 1}^{K} \sum_{i \in E_{k}^{(j)}} [η_{i}^{(j)} (k) - log (\sum_{ℓ \in R_{i}} e^{η_{ℓ}^{(j)} (k)})],

where the linear predictor

η_{i}^{(j)} (k)

contains all specified terms:

\begin{matrix} η_{i}^{(j)} (k) = & β_{r}^{(j)} (r_{i} - δ (k)) + β_{H P C}^{(j)} H P C (P S_{i}, k) + β_{C S}^{(j)} C S_{i} \\ + β_{L T V}^{(j)} L T V_{i} + β_{D T I}^{(j)} D T I_{i} + β_{F T B}^{(j)} F T B_{i} + β_{N B}^{(j)} N B_{i} . \end{matrix}

(A14)

The implementation uses Python’s scipy.optimize.minimize with trust-region constraints.
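To make the objective concrete, the following sketch evaluates the log partial likelihood for one event type in pure Python. It is not the authors' implementation: the function name, the nested-list data layout, and the three-loan toy data are assumptions made for illustration. The risk set in month k consists of the loans with exit time at least k, matching the definition of R_i for a loan exiting in month k.

```python
import math


def log_partial_likelihood(beta, covariates, tau, D, j):
    """Log partial likelihood l_j(beta) for event type j.

    covariates[i][k-1]: covariate list of loan i in month k (k <= tau[i]).
    tau[i]: exit month of loan i; D[i]: exit type (0, 1 or 2).
    """
    n_loans = len(tau)
    total = 0.0
    for k in range(1, max(tau) + 1):
        events = [i for i in range(n_loans) if tau[i] == k and D[i] == j]
        if not events:
            continue
        at_risk = [l for l in range(n_loans) if tau[l] >= k]
        # Linear predictors eta_l(k) for every loan still at risk.
        eta = {
            l: sum(bc * xc for bc, xc in zip(beta, covariates[l][k - 1]))
            for l in at_risk
        }
        # Numerically stable log-sum-exp over the risk set.
        top = max(eta.values())
        log_denom = top + math.log(sum(math.exp(e - top) for e in eta.values()))
        total += sum(eta[i] - log_denom for i in events)
    return total


# Toy data (hypothetical): three loans, one covariate.
tau = [1, 2, 2]
D = [1, 2, 1]
covariates = [[[1.0]], [[0.0], [0.0]], [[2.0], [1.0]]]
ll_zero = log_partial_likelihood([0.0], covariates, tau, D, 1)  # equals -log(6)
ll_one = log_partial_likelihood([1.0], covariates, tau, D, 1)
```

The negative of this function is the kind of objective that can be handed to a numerical optimizer such as `scipy.optimize.minimize`; for the dataset sizes of the empirical study, a vectorized version would be needed.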

Appendix B.2. Baseline Hazard Iterative Estimator of CPH Model

After obtaining the MPLE

{\hat{β}}^{(j)}

in Equation (30), the fitted linear predictors

{\hat{η}}_{i}^{(j)} (k)

can be obtained by Equation (A14). Initialize survival weights by

s_{i} (0) = 1

for every loan i. For each month

k = 1, \dots, K

the two baseline hazards

q_{0}^{(j)} (k)

are obtained by matching the expected and observed event counts:

\{\begin{matrix} {\hat{q}}_{0}^{(1)} (k) = \frac{# E_{k}^{(1)}}{\sum_{ℓ = 1}^{N} s_{ℓ} (k - 1) e^{{\hat{η}}_{ℓ}^{(1)} (k)}}, \\ {\hat{q}}_{0}^{(2)} (k) = \frac{# E_{k}^{(2)}}{\sum_{ℓ = 1}^{N} s_{ℓ} (k - 1) e^{{\hat{η}}_{ℓ}^{(2)} (k)}}, \\ s_{i} (k) = s_{i} (k - 1) [1 - {\hat{q}}_{0}^{(1)} (k) e^{{\hat{η}}_{i}^{(1)} (k)} - {\hat{q}}_{0}^{(2)} (k) e^{{\hat{η}}_{i}^{(2)} (k)}], i = 1, \dots, N . \end{matrix}

(A15)

Here,

# E_{k}^{(j)}

denotes the number of loans that exit in month k by event type

j \in {1, 2}

. According to (A15), the baseline hazards

{\hat{q}}_{0}^{(j)} (k)

and the survival weights

s_{i} (k)

can be directly obtained through an iterative procedure. With the initial condition

s_{i} (0) = 1

, they are updated month by month, providing a complete estimation of the survival process over the study period.
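The recursion (A15) can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the list-based layout, and the four-loan example with all fitted predictors equal to zero are assumptions.

```python
import math


def baseline_hazards(eta1, eta2, counts1, counts2):
    """Iterate (A15) month by month.

    eta_j[i][k-1]: fitted linear predictor of loan i in month k for event j.
    counts_j[k-1]: number of observed exits of type j in month k.
    Returns the baseline hazards of both event types and the final weights.
    """
    n_loans = len(eta1)
    s = [1.0] * n_loans  # s_i(0) = 1
    q0_1, q0_2 = [], []
    for k in range(len(counts1)):
        w1 = [math.exp(eta1[i][k]) for i in range(n_loans)]
        w2 = [math.exp(eta2[i][k]) for i in range(n_loans)]
        # Match expected and observed event counts in month k + 1.
        h1 = counts1[k] / sum(s[i] * w1[i] for i in range(n_loans))
        h2 = counts2[k] / sum(s[i] * w2[i] for i in range(n_loans))
        q0_1.append(h1)
        q0_2.append(h2)
        # Update the survival weight of every loan.
        s = [s[i] * (1.0 - h1 * w1[i] - h2 * w2[i]) for i in range(n_loans)]
    return q0_1, q0_2, s


# Hypothetical example: four loans, two months, all predictors equal to zero.
eta = [[0.0, 0.0] for _ in range(4)]
q0_1, q0_2, s = baseline_hazards(eta, eta, counts1=[1, 0], counts2=[1, 1])
```

By construction, the survival weights sum to the number of loans minus the number of observed exits after each month (here 4 - 3 = 1), which is a convenient consistency check.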

Appendix B.3. Structure of Realized Systematic Factors

In our empirical analysis, the realized systematic factors

Y = y

are represented as a multivariate time series that captures the monthly evolution of macroeconomic conditions over the 180-month out-of-sample period from January 2000 to January 2015. Specifically, for each month

k \in {1, \dots, K}

(

K = 180

), we observe the U.S. Treasury long-term yield

δ (k)

and the state-level house price changes

H P C (S_{m}, k)

for all

d (d = 50)

U.S. states, as defined in Section 4.1. As a result, the realized

y

is expressed as a

(d + 1) K

-dimensional vector obtained by concatenating these monthly systematic factors across all

k = 1, \dots, K

. Conditional on this realized

y

, the loan losses across individual loans are independent.
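The layout of the realized factor vector can be illustrated with a short sketch. The ordering within each month (yield first, then the states) and the placeholder values are assumptions; the paper only specifies that the monthly factors are concatenated into a vector of dimension (d + 1)K.

```python
def stack_factors(treasury_yield, hpc_by_state):
    """Concatenate monthly systematic factors into one flat vector y.

    treasury_yield[k-1]: long-term yield delta(k) in month k.
    hpc_by_state[m][k-1]: house price change of state m in month k.
    """
    y = []
    for k in range(len(treasury_yield)):
        y.append(treasury_yield[k])
        y.extend(series[k] for series in hpc_by_state)
    return y


# Placeholder inputs with the dimensions of the empirical study.
K, d = 180, 50
treasury_yield = [0.0] * K
hpc_by_state = [[0.0] * K for _ in range(d)]
y = stack_factors(treasury_yield, hpc_by_state)
dimension = len(y)  # (d + 1) * K
```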

Notes

1	Available online: https://www.bis.org/bcbs/basel3.htm?m=76 (accessed on 1 July 2023).
2	The periodical payment $R_{i}$ satisfies $1 = R_{i} \sum_{k = 1}^{K} e^{- k r_{i}}$ , which yields $R_{i} = e^{r_{i}} (1 - e^{- r_{i}}) / (1 - e^{- K r_{i}})$ . Consequently, the outstanding balance $M_{i} (k)$ is given by $M_{i} (k) = R_{i} \sum_{h = 0}^{K - k} e^{- h r_{i}} = R_{i} (1 - e^{- (K - k + 1) r_{i}}) / (1 - e^{- r_{i}}) = e^{r_{i}} (1 - e^{- (K - k + 1) r_{i}}) / (1 - e^{- K r_{i}})$ .
3	For a positive-definite matrix A with eigen-decomposition $A = Q Λ Q^{'}$ , define $A^{- 1 / 2} : = Q Λ^{- 1 / 2} Q^{'}$ , where Q is an orthogonal matrix and $Λ$ is a diagonal matrix; see Chapter 2 in Van der Vaart (2000).
4	Available online: https://www.assetmanagement.hsbc.com.hk/en/intermediary/news-and-insights/residential-mortgage-backed-securities (accessed on 23 March 2025).
5	The dataset is released by the Federal Home Loan Mortgage Corporation (FHLMC). Available online: https://www.freddiemac.com/research/datasets/sf-loanlevel-dataset (accessed on 1 May 2022).
6	Available online: https://home.treasury.gov/ (accessed on 1 March 2024).
7	Available online: https://fred.stlouisfed.org/ (accessed on 9 September 2024).

References

Athreya, Krishna B., and Soumendra N. Lahiri. 2010. Measure Theory and Probability Theory. New York: Springer.
Banasik, John, Jonathan N. Crook, and Lyn C. Thomas. 1999. Not if but when will borrowers default. Journal of the Operational Research Society 50: 1185–90.
Bhattacharya, Arnab, Simon P. Wilson, and Refik Soyer. 2019. A Bayesian approach to modeling mortgage default and prepayment. European Journal of Operational Research 274: 1112–24.
Brennan, Michael J., and Eduardo S. Schwartz. 1985. Determinants of GNMA mortgage prices. Real Estate Economics 13: 209–28.
Buser, Stephen A., and Patric H. Hendershott. 1984. Pricing default-free fixed-rate mortgages. Housing Finance Review 3: 405–29.
Calhoun, Charles A., and Yongheng Deng. 2002. A dynamic analysis of fixed-and adjustable-rate mortgage terminations. The Journal of Real Estate Finance and Economics 24: 9–33.
Campbell, John Y., and Joao F. Cocco. 2015. A model of mortgage default. The Journal of Finance 70: 1495–554.
Cox, John C., Jonathan E. Ingersoll, and Stephen A. Ross. 1985. A theory of the term structure of interest rates. Econometrica 53: 385–407.
Credit Suisse Financial Products. 1997. CreditRisk+: A Credit Risk Management Framework. Credit Suisse First Boston. Available online: https://globalriskguard.com/resources/credit/creditrisk.pdf (accessed on 1 October 2023).
Cunningham, Donald F., and Patric H. Hendershott. 1984. Pricing FHA mortgage default insurance. Housing Finance Review 3: 373–92.
Deng, Yongheng, John M. Quigley, and Robert Van Order. 2000. Mortgage terminations, heterogeneity and the exercise of mortgage options. Econometrica 68: 275–307.
Deng, Yongheng, John M. Quigley, Robert Van Order, and Freddie Mac. 1996. Mortgage default and low downpayment loans: The costs of public subsidy. Regional Science and Urban Economics 26: 263–85.
Deshmukh, Shailaja. 2012. Multiple Decrement Models in Insurance: An Introduction Using R. New Delhi: Springer.
Dunn, Kenneth B., and John J. McConnell. 1981. Valuation of GNMA mortgage-backed securities. The Journal of Finance 36: 599–616.
Durrett, Rick. 2019. Probability: Theory and Examples, 5th ed. New York: Cambridge University Press.
Epperson, James F., James B. Kau, Donald C. Keenan, and Walter J. Muller, III. 1985. Pricing default risk in mortgages. Real Estate Economics 13: 261–72.
Finger, Christopher C. 1999. Conditional approaches for CreditMetrics portfolio distributions. Credit Metrics Monitor 2: 14–33.
Flores, Jesús Alan Elizondo, Valeria Álvarez Navarro, and Israel Sergio Valladares Cedillo. 2010. An actuarial approach to pricing Mortgage Insurance considering simultaneously mortgage default and prepayment. Paper Presented at International Congress of Actuaries, Cape Town, South Africa, March 7–12; Available online: https://actuaries.org/resources-post/an-actuarial-approach-to-pricing-mortgage-insurance-considering-simultaneously-mortgage-default-andprepayment/ (accessed on 8 May 2025).
Foster, Chester, and Robert Van Order. 1984. An option-based model of mortgage default. Housing Finance Review 3: 351–68.
Frey, Rüdiger, and Alexander J. McNeil. 2003. Dependent defaults in models of portfolio credit risk. Journal of Risk 6: 59–92.
Giesecke, Kay, Konstantinos Spiliopoulos, and Richard B. Sowers. 2013. Default clustering in large portfolios: Typical events. The Annals of Applied Probability 23: 348–85.
Giesecke, Kay, Konstantinos Spiliopoulos, Richard B. Sowers, and Justin A. Sirignano. 2015. Large portfolio asymptotics for loss from default. Mathematical Finance 25: 77–114.
Goodman, Laurie, and Jun Zhu. 2015. Loss Severity on Residential Mortgages: Evidence from Freddie Mac’s Newest Data. The Journal of Fixed Income 25: 48–57.
Goodman, Laurie S., Brian Landy, Roger Ashworth, and Lidan Yang. 2014. A look at Freddie Mac’s loan-level credit performance data. Journal of Structured Finance 19: 52–61.
Gordy, Michael B. 2003. A risk-factor model foundation for ratings-based bank capital rules. Journal of Financial Intermediation 12: 199–232.
Jarrow, Robert A., David Lando, and Stuart M. Turnbull. 1997. A Markov model for the term structure of credit risk spreads. The Review of Financial Studies 10: 481–523.
Jones, Chris, and Xinfu Chen. 2016. Optimal mortgage prepayment under the Cox–Ingersoll–Ross model. SIAM Journal on Financial Mathematics 7: 552–66.
Li, Zhiyong, Aimin Li, Anthony Bellotti, and Xiao Yao. 2023. The profitability of online loans: A competing risks analysis on default and prepayment. European Journal of Operational Research 306: 968–85.
Merton, Robert C. 1974. On the pricing of corporate debt: The risk structure of interest rates. The Journal of Finance 29: 449–70.
Moody’s Investors Service. 2024. Moody’s Approach to Rating US RMBS Using the MILAN Framework. Rating methodology Residential MBS, Moody’s Investors Service. Available online: https://www.moodys.com/research/Rating-Methodology-Moodys-Approach-to-Rating-US-RMBS-Using-the-MILAN-Rating-Methodology–PBS_1411650 (accessed on 3 August 2025).
Munk, Claus. 2011. Fixed Income Modelling. New York: Oxford University Press.
Quercia, Roberto, and Jonathan Spader. 2008. Does homeownership counseling affect the prepayment and default behavior of affordable mortgage borrowers? Journal of Policy Analysis and Management 27: 304–25.
Quercia, Roberto G., and Michael A. Stegman. 1992. Residential mortgage default: A review of the literature. Journal of Housing Research 3: 341–79.
Quigley, John M., and Robert Van Order. 1990. Efficiency in the mortgage market: The borrower’s perspective. Real Estate Economics 18: 237–52.
Quigley, John M., and Robert Van Order. 1995. Explicit tests of contingent claims models of mortgage default. The Journal of Real Estate Finance and Economics 11: 99–117.
Raič, Martin. 2019. A multivariate Berry–Esseen theorem with explicit constants. Bernoulli 25: 2824–53.
Richard, Scott F., and Richard Roll. 1989. Prepayments of fixed-rate mortgage-backed securities. Journal of Portfolio Management 15: 73–82.
Rotar, Vladimir I. 2014. Actuarial Models: The Mathematics of Insurance, 2nd ed. Boca Raton: CRC Press.
Rudin, Walter. 1987. Real and Complex Analysis, 3rd ed. New York: McGraw-Hill.
Schwartz, Eduardo S., and Walter N. Torous. 1989. Prepayment and the valuation of mortgage-backed securities. The Journal of Finance 44: 375–92.
Sirignano, Justin, and Kay Giesecke. 2019. Risk analysis for large pools of loans. Management Science 65: 107–21.
Sirignano, Justin A., Gerry Tsoukalas, and Kay Giesecke. 2016. Large-scale loan portfolio selection. Operations Research 64: 1239–55.
Stein, Roger M., Ashish Das, Yufeng Ding, and Shirish Chinchalkar. 2011. Mortgage Portfolio Analyzer: A Quasi-Structural Model of Mortgage Portfolio Losses. Working Paper. New York: Moody’s Research Labs.
Steinbuks, Jevgenijs. 2015. Effects of prepayment regulations on termination of subprime mortgages. Journal of Banking & Finance 59: 445–56.
Stepanova, Maria, and Lyn Thomas. 2002. Survival analysis methods for personal loan data. Operations Research 50: 277–89.
Thackham, Mark, and Jun Ma. 2022. On maximum likelihood estimation of competing risks using the cause-specific semi-parametric Cox model with time-varying covariates–an application to credit risk. Journal of the Operational Research Society 73: 5–14.
Van der Vaart, Aad W. 2000. Asymptotic Statistics. New York: Cambridge University Press.
Vasicek, Oldrich. 1991. Limiting Loan Loss Probability Distribution. San Francisco: KMV Corporation.
Zhang, Nailong, Qingyu Yang, Aidan Kelleher, and Wujun Si. 2019. A new mixture cure model under competing risks to score online consumer loans. Quantitative Finance 19: 1243–53.

Figure 1. The repayment behavior of the ith loan.

Figure 2. The joint distribution, copula, and marginal distributions of

L_{N}

for the asset value model. Note: The upper panel and lower panel exhibit the joint distribution, copula, and marginal distributions of

L_{N}

for two cases of the asset value model in (24) and (25), respectively. In the upper panel, we set

n = 2

,

{\bar{b}}_{1} = 0.8

,

{\bar{b}}_{2} = 0.2

,

c_{1} = Φ^{- 1} (0.08) = - 1.405

,

c_{2} = Φ^{- 1} (0.8) = 0.842

,

F_{X} (x) = x_{1} x_{2} / 0.48, 0 \leq x \leq (0.8, 0.6)

, and

Y \sim N (0, I_{2})

. In the lower panel, we set

n = 2

,

{\bar{b}}_{1} = 0.6

,

{\bar{b}}_{2} = 0.3

,

c_{1} = Φ^{- 1} (0.05) = - 1.645

,

c_{2} = Φ^{- 1} (0.7) = 0.524

,

F_{X} (x) = I (x_{1}; 2, 3) I (x_{2}; 4, 4)

, and

Y \sim N (0, I_{2})

.

Figure 3. The joint distribution, copula, and marginal distributions of $L_{N}$ for the Markov model. Note: The upper and lower panels exhibit the joint distribution, copula, and marginal distributions of $L_{N}$ for two cases of the Markov model in (26) and (27), respectively. In both cases, we set $K = 60$, $r = 0.003$, ${\bar{b}}_{1} = 0.8$, ${\bar{b}}_{2} = 0.2$, and $(β_{0}^{(1)}, β_{X}^{(1)}, β_{Y}^{(1)}) = (-7, -2, 0.2, 1, 0.6)$. The upper panel corresponds to $(β_{0}^{(2)}, β_{X}^{(2)}, β_{Y}^{(2)}) = (-8, 0.2, 1, 0.8, 1)$, $F_{X}(x) = x_{1} x_{2} / 0.48$ for $0 \leq x \leq (0.8, 0.6)$, and $Y \sim N((1, 1), ((1, -0.2), (-0.2, 1)))$. The lower panel corresponds to $(β_{0}^{(2)}, β_{X}^{(2)}, β_{Y}^{(2)}) = (-8, 0.2, 1, 1, 0.8)$, $F_{X}(x) = I(x_{1}; 2, 3) I(x_{2}; 4, 4)$, and $Y \sim N((1, 1), ((1, 0.2), (0.2, 1)))$.

Figure 4. Distribution of mortgages by state.

Figure 5. The baseline hazard functions.

Figure 6. The estimation results of default (left) and prepayment (right) in the test set. Note: the red dotted lines in the lower panels represent zero difference between the expected and actual number of defaults, serving as a baseline for comparing the percentage difference curves.

Figure 7. Empirical limit distributions of default loss (left) and prepayment loss (right).

Figure 8. A simulation of the interest rate curve.

Figure 9. The marginal distributions of the simulated portfolio loss $L_{N}$ of the RMBS.

Figure 10. The QQ plots for the simulated portfolio loss of the RMBS. Note: The blue line represents the empirical quantiles of the simulated data, while the red straight line corresponds to the theoretical quantiles of a normal distribution.

Table 1. Some models for $q^{(j)} (k, x, y)$ in the literature.

Model	Expression of $q^{(j)} (k, x_{i}, y)$
Linear regression model (Quercia and Stegman 1992)	$β_{0}^{(j)} (k) + β_{X}^{(j)} {(k)}^{'} x_{i} + β_{Y}^{(j)} {(k)}^{'} y$
Multinomial logistic model (Calhoun and Deng 2002)	$\frac{\exp (β_{0}^{(j)} (k) + β_{X}^{(j)} {(k)}^{'} x_{i} + β_{Y}^{(j)} {(k)}^{'} y)}{1 + \sum_{l = 1}^{2} \exp (β_{0}^{(l)} (k) + β_{X}^{(l)} {(k)}^{'} x_{i} + β_{Y}^{(l)} {(k)}^{'} y)}$
Proportional hazard model (Deng et al. 2000; Li et al. 2023)	$q_{0}^{(j)} (k) \exp ({β_{X}^{(j)}}^{'} x_{i} + {β_{Y}^{(j)}}^{'} y)$
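To make the three functional forms in Table 1 concrete, the sketch below evaluates each of them for one loan at a single period $k$. It is an illustration under stated assumptions, not the authors' implementation: the function names, the covariate vectors `x` and `y`, and all coefficient values are hypothetical and chosen only so that the code runs.

```python
import math


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def linear_index(b0, bx, by, x, y):
    """b0 + bx'x + by'y for one cause j at a fixed period k."""
    return b0 + dot(bx, x) + dot(by, y)


def q_linear(b0, bx, by, x, y):
    """Linear regression form: the index itself is used as q."""
    return linear_index(b0, bx, by, x, y)


def q_multinomial_logit(coefs, x, y):
    """Multinomial logistic form; coefs holds (b0, bx, by) for j = 1, 2."""
    weights = [math.exp(linear_index(b0, bx, by, x, y)) for b0, bx, by in coefs]
    denom = 1.0 + sum(weights)  # the "1" is the no-event (continue paying) category
    return [w / denom for w in weights]


def q_proportional_hazard(q0_k, bx, by, x, y):
    """Proportional hazard form: baseline hazard q0(k) times exp(bx'x + by'y)."""
    return q0_k * math.exp(dot(bx, x) + dot(by, y))


# Hypothetical inputs (not taken from the paper).
x = [0.5, 0.3]   # idiosyncratic covariates of loan i
y = [1.0, -0.5]  # systematic factors
coefs = [(-4.0, [0.8, 0.1], [0.3, 0.2]), (-3.0, [0.1, 0.5], [0.4, 0.6])]

print(q_multinomial_logit(coefs, x, y))
print(q_proportional_hazard(0.01, [0.8, 0.1], [0.3, 0.2], x, y))
```

In the logistic form the two probabilities and the implicit "no event" probability sum to one by construction, whereas the linear and proportional hazard forms do not constrain their outputs to the unit interval by themselves.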

Table 2. The coefficient vectors of the CPH model.

Coefficient	Default	Prepayment
interest spread	6.3437	1.1938
HPC	−0.9418	1.1428
CS	−3.3724	0.7926
LTV	3.9230	0.0332
DTI	0.0934	0.4961
FTB	−0.5673	0.0181
NB	−0.6337	0.2550

Table 3. The LGD distributions for different categories of credit scores.

Category	Description	Mean	Distribution of LGD
High Credit Score	Highest 40% CS	0.3089	Beta (0.4799, 1.0739)
Moderate Credit Score	Moderate 40% CS	0.3161	Beta (0.4833, 1.0458)
Low Credit Score	Lowest 20% CS	0.3284	Beta (0.5359, 1.0962)
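As a quick consistency check on Table 3, the mean of a Beta$(a, b)$ distribution is $a / (a + b)$. The sketch below compares that value with the reported mean for each category; the dictionary layout and the function name `beta_mean` are illustrative choices made here.

```python
# (a, b, reported mean) for each credit score category in Table 3.
lgd_table = {
    "High Credit Score": (0.4799, 1.0739, 0.3089),
    "Moderate Credit Score": (0.4833, 1.0458, 0.3161),
    "Low Credit Score": (0.5359, 1.0962, 0.3284),
}


def beta_mean(a, b):
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


for category, (a, b, reported) in lgd_table.items():
    implied = beta_mean(a, b)
    print(f"{category}: implied mean = {implied:.5f}, reported = {reported:.4f}")
```

The implied and reported means agree to within about $10^{-4}$. Small discrepancies are to be expected because the table reports rounded parameters, and the reported mean may be a sample mean rather than the mean of the fitted distribution.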

Table 4. The estimators of the CIR model.

$\hat{κ}$	$\hat{θ}$	$\hat{σ}$
0.3210	3.4545	0.1973
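For readers who want to reproduce a path like the one in Figure 8, the following is a minimal simulation sketch using the estimates in Table 4. It assumes the standard CIR dynamics $dr_{t} = κ(θ - r_{t})\,dt + σ\sqrt{r_{t}}\,dW_{t}$ and an Euler scheme with full truncation; the discretization, the monthly step `dt = 1/12`, the starting rate, and the horizon are all assumptions made here and are not taken from the paper.

```python
import math
import random

# CIR estimates from Table 4.
KAPPA, THETA, SIGMA = 0.3210, 3.4545, 0.1973


def simulate_cir(r0, n_steps, dt, kappa=KAPPA, theta=THETA, sigma=SIGMA, seed=0):
    """Euler scheme with full truncation for the CIR short-rate model."""
    rng = random.Random(seed)
    path = [r0]
    for _ in range(n_steps):
        r_pos = max(path[-1], 0.0)            # truncate inside drift and diffusion
        dw = rng.gauss(0.0, math.sqrt(dt))    # Brownian increment over dt
        step = kappa * (theta - r_pos) * dt + sigma * math.sqrt(r_pos) * dw
        path.append(path[-1] + step)
    return path


# Hypothetical usage: 60 monthly steps starting from a rate of 3.0.
path = simulate_cir(r0=3.0, n_steps=60, dt=1.0 / 12.0)
print(len(path), round(min(path), 4), round(max(path), 4))
```

Full truncation keeps the square root well defined if a discretized rate dips below zero; with the parameters in Table 4 the Feller condition $2κθ > σ^{2}$ holds, so this should be rare.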

Table 5. Summary statistics of AR(2) parameters for log house price time series across U.S. states.

Parameter	Mean	Std Deviation	Minimum	Maximum	U.S. National
$ϕ_{0}$	−0.012569	0.039243	−0.119973	0.053533	−0.008022
$ϕ_{1}$	1.596188	0.141466	1.255425	1.818609	1.708026
$ϕ_{2}$	−0.593126	0.145872	−0.823798	−0.257050	−0.706100
$σ_{H}$	0.012770	0.002779	0.008456	0.021200	0.011299
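The AR(2) parameters in Table 5 can be turned into simulated log house price paths as sketched below. The recursion $h_{t} = ϕ_{0} + ϕ_{1} h_{t-1} + ϕ_{2} h_{t-2} + σ_{H} ε_{t}$ with standard normal $ε_{t}$ is the form assumed here; the two starting values and the horizon are hypothetical.

```python
import random

# "U.S. National" column of Table 5.
NATIONAL = {"phi0": -0.008022, "phi1": 1.708026, "phi2": -0.706100, "sigma": 0.011299}


def simulate_ar2(h_prev2, h_prev1, n_steps, phi0, phi1, phi2, sigma, seed=0):
    """Simulate h_t = phi0 + phi1*h_{t-1} + phi2*h_{t-2} + sigma*eps_t."""
    rng = random.Random(seed)
    path = [h_prev2, h_prev1]  # the two initial log prices
    for _ in range(n_steps):
        shock = sigma * rng.gauss(0.0, 1.0)
        path.append(phi0 + phi1 * path[-1] + phi2 * path[-2] + shock)
    return path


# Hypothetical starting log prices and a 60-period horizon.
log_prices = simulate_ar2(5.000, 5.005, n_steps=60, **NATIONAL)
print(len(log_prices), round(log_prices[-1], 4))
```

Passing `sigma=0.0` gives the deterministic recursion, which is a convenient way to inspect the drift implied by a given parameter set.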

Table 6. The VaR of the simulated $L_{N}^{(1)}$ and $L_{N}^{(2)}$ of the RMBS.

	$L_{N}^{(1)}$ (%)			$L_{N}^{(2)}$ (%)
	$\mathrm{VaR}_{0.95}$	$\mathrm{VaR}_{0.975}$	$\mathrm{VaR}_{0.99}$	$\mathrm{VaR}_{0.95}$	$\mathrm{VaR}_{0.975}$	$\mathrm{VaR}_{0.99}$
Simulated	0.5245	0.5473	0.5766	11.85	11.90	11.95
Fit-log-normal	0.5246	0.5480	0.5765	11.86	11.92	11.98
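The comparison in Table 6 can be sketched as follows: an empirical quantile of the simulated losses against the same quantile of a log-normal distribution fitted to them. The fitting method used in the paper is not stated in this section, so the sketch assumes a maximum likelihood fit on the log losses; the synthetic sample stands in for the simulated portfolio losses and is not the paper's data.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def empirical_var(losses, level):
    """Empirical VaR: smallest sample value with at least `level` mass at or below it."""
    ordered = sorted(losses)
    idx = max(math.ceil(level * len(ordered)) - 1, 0)
    return ordered[idx]


def lognormal_var(losses, level):
    """VaR of a log-normal fitted by maximum likelihood to positive losses."""
    logs = [math.log(v) for v in losses]
    mu, sigma = fmean(logs), pstdev(logs)  # MLE of the log-scale parameters
    return math.exp(mu + sigma * NormalDist().inv_cdf(level))


# Synthetic placeholder sample (an assumption, not the simulated RMBS losses).
rng = random.Random(0)
synthetic = [rng.lognormvariate(0.0, 0.1) for _ in range(20000)]

for level in (0.95, 0.975, 0.99):
    emp = empirical_var(synthetic, level)
    fit = lognormal_var(synthetic, level)
    print(f"level {level}: empirical = {emp:.4f}, fitted log-normal = {fit:.4f}")
```

When the sample really is log-normal, as in this placeholder, the two columns nearly coincide; any gap between the "Simulated" and "Fit-log-normal" rows of Table 6 therefore reflects departure of the simulated losses from log-normality in the tail.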

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
