An Urn-Based Nonparametric Model of the Dependence Between PD and LGD with an Application to Mortgages

We propose an alternative approach to modeling the positive dependence between the probability of default (PD) and the loss given default (LGD) in a portfolio of exposures, using a bivariate urn process. The model combines the power of Bayesian nonparametrics and statistical learning: it allows experts' judgements to be elicited and exploited, and this information to be updated every time new data become available.

A real-world application to mortgages is described, using the Single Family Loan-Level Dataset by Freddie Mac.

is a beta-Stacy process with parameters $\{a_j, b_j\}_{j \in \mathbb{N}_0}$, where $a_j$ and $b_j$ are the initial numbers of red, respectively blue, balls in urn $U(j)$, prior to any sampling. In other words, $F(0) = 0$ with probability 1 and, for $j \geq 1$, the increment $F(j) - F(j-1)$ has the same distribution as $V_j \prod_{i<j}(1 - V_i)$, where $\{V_j\}$ is a sequence of independent random variables such that $V_j \sim \text{beta}(a_j, b_j)$. For a reader familiar with urn processes, each beta-distributed $V_j$ is clearly the result of the corresponding Pólya urn $U(j)$ (Mahmoud 2008).

Let $B_1, B_2, \ldots, B_m$ be the first $m$ blocks generated by a two-color RUP $\{Z_n\}_{n \geq 0}$. With $T_i$ we indicate the last state visited by $\{Z_n\}$ within block $i$. Coming back to Equation (1), we would have $T_1 = 2$, $T_2 = 7$ and $T_3 = 4$. Since the random variables $T_1, \ldots, T_m$ are measurable functions of the exchangeable blocks, they are exchangeable as well, and their de Finetti measure is the same beta-Stacy process governing $B_1, B_2, \ldots, B_m$. In what follows, a sequence $\{T_i\}_{i=1}^m$ is called an LS (Last State) sequence.

In terms of probabilities, for $T_1, \ldots, T_m$, one can easily observe that
$$P[T_1 = j] = \frac{a_j}{a_j + b_j} \prod_{i<j} \frac{b_i}{a_i + b_i}, \qquad (2)$$
and, for $m \geq 1$,
$$P[T_{m+1} = j \mid T_1, \ldots, T_m] = \frac{a_j + r_j}{a_j + b_j + r_j + s_j} \prod_{i<j} \frac{b_i + s_i}{a_i + b_i + r_i + s_i}, \qquad (3)$$
while, trivially,
$$P[T_{m+1} > j \mid T_1, \ldots, T_m] = \prod_{i \leq j} \frac{b_i + s_i}{a_i + b_i + r_i + s_i},$$
where $r_j = \sum_{i=1}^m \mathbf{1}_{\{T_i = j\}}$ and $s_j = \sum_{i=1}^m \mathbf{1}_{\{T_i > j\}}$. This means that, every time the process is reset to 0, a new block is created.

Regarding the initial knowledge, notice that, when we choose the quantities $a_j$ and $b_j$ for $j = 0, 1, 2, \ldots$, from a Bayesian point of view we are eliciting a prior. In fact, by setting $a_j = c_j\, G(\{j\})$ and $b_j = c_j\, G((j, \infty))$, we are just requiring $E[F(\{j\})] = G(\{j\})$, so that the beta-Stacy process $F$ (a random distribution on discrete distributions) is centered on the discrete distribution $G$, which we guess may correctly describe the phenomenon we are modeling. The quantity $c_j \geq 0$ is called the strength of belief, and it represents how confident we are in our a priori.
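As an illustrative sketch of the LS predictive probabilities, assuming the standard beta-Stacy conjugate form with the counts $r_j$ and $s_j$ defined above (the function and parameter names are ours, and we close the support by setting the last $b_j$ to 0, so that the probabilities sum to one):

```python
def ls_predictive(a, b, T_obs):
    """Posterior-predictive pmf of the next last state T_{m+1} of a two-color
    RUP, given urn parameters a[j], b[j] (dicts over the support) and the
    observed LS sequence T_obs. Illustrative sketch only."""
    pmf = {}
    surv = 1.0  # running product over the urns already "passed through"
    for j in sorted(a):
        r_j = sum(1 for t in T_obs if t == j)   # observations stopping at j
        s_j = sum(1 for t in T_obs if t > j)    # observations going beyond j
        denom = a[j] + b[j] + r_j + s_j
        pmf[j] = surv * (a[j] + r_j) / denom    # stop at level j
        surv *= (b[j] + s_j) / denom            # continue past level j
    return pmf

# Prior predictive (no data), symmetric urns over levels 1..3:
a = {1: 1.0, 2: 1.0, 3: 1.0}
b = {1: 1.0, 2: 1.0, 3: 0.0}   # last b_j = 0 closes the support
print(ls_predictive(a, b, []))        # {1: 0.5, 2: 0.25, 3: 0.25}
print(ls_predictive(a, b, [2, 2]))    # mass shifts toward level 2
```

Reinforcement is visible directly: the two observations at level 2 inflate $r_2$ and the corresponding predictive mass.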
Given a constant reinforcement, as the one we are using here (+1 ball of the same color), a $c_j > 1$ reduces the speed of learning of the RUP, making the evidence emerging from sampling less relevant in updating the initial compositions. In other terms, $c_j$ helps in tuning the relative weight of the prior against the data.

We discretise both the PD and the LGD into $l = 0, \ldots, L$ levels, such that, for example, $l = 0$ indicates a PD or LGD of 0%, $l = 1$ something between 0% and 5%, $l = 2$ a quantity in (5%, 17%], and so on until the last level $L$. The levels do not need to correspond to equally spaced intervals, and this gives flexibility to the modeling.

Clearly, the larger $L$, the finer the partition we obtain. As we will see in Section 4, a convenient way of defining levels is through quantiles.
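As a sketch of quantile-based levels (the helper name and the convention of reserving level 0 for exact zeros are our choices):

```python
import numpy as np

def quantile_levels(values, L):
    """Discretise raw PD/LGD values into levels 0..L: exact zeros get the
    special level 0, and positive values are split into L bins at their
    empirical quantiles."""
    values = np.asarray(values, dtype=float)
    pos = values[values > 0]
    # L-1 interior quantile cut points split the positive part into L bins
    cuts = np.quantile(pos, np.linspace(0, 1, L + 1)[1:-1])
    return np.where(values == 0, 0, 1 + np.searchsorted(cuts, values, side="left"))

print(quantile_levels([0.0, 1.0, 2.0, 3.0, 4.0], 2))   # [0 1 1 2 2]
```

By construction the positive observations are spread roughly uniformly across levels $1, \ldots, L$, which is the "uniform behaviour" discussed in Section 4.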
Now, following Bulla et al. (2007), let us assume that, for each exposure $i = 1, \ldots, m$, we have
$$X_i = A_i + B_i, \qquad Y_i = A_i + C_i,$$
where the sequences $\{A_i\}$, $\{B_i\}$ and $\{C_i\}$ are independent of one another. This construction builds a special dependence between the discretised PD $X_i$ and the discretised LGD $Y_i$: we can immediately observe that $\mathrm{Cov}(X_i, Y_i) = \mathrm{Var}(A_i) = \sigma_A^2 \geq 0$.
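The covariance identity for the common-component construction ($X_i = A_i + B_i$, $Y_i = A_i + C_i$ with independent components) can be verified by simulation; the component distributions below are arbitrary placeholders, not the fitted urn predictives:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
A = rng.integers(0, 6, size=n)   # shared component, levels {0,...,5}
B = rng.integers(0, 4, size=n)   # PD-specific component
C = rng.integers(0, 4, size=n)   # LGD-specific component
X, Y = A + B, A + C              # discretised PD and LGD levels
# Since A, B, C are independent, Cov(X, Y) = Var(A):
print(np.cov(X, Y)[0, 1], A.var())
```

Both printed numbers are close to $35/12 \approx 2.92$, the variance of a discrete uniform on $\{0, \ldots, 5\}$.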

Let $F_X$ and $F_Y$ be the marginal distributions of $X_i$ and $Y_i$. Clearly we have
$$F_X = F_A * F_B, \qquad F_Y = F_A * F_C,$$
so that both $F_X$ and $F_Y$ are convolutions of beta-Stacy processes. The dependence between $X$ and $Y$, given $F_{XY}$ and $F_A$, is thus simply $\sigma_A^2$, the variance of $A$ under $F_A$. Furthermore, if $P$ is the probability function corresponding to $F$, one then has
$$P_{XY}(x, y) = \sum_{a=0}^{\min(x, y)} P_A(a)\, P_B(x - a)\, P_C(y - a).$$
Assume now that we have observed $m$ exposures, and we have registered their actual PD and

LGD, which we have discretised to get the couples $(x_i, y_i)$, $i = 1, \ldots, m$. Please observe that exchangeability only applies among the couples $\{(X_i, Y_i)\}_{i=1}^m$, while within each couple there is a clear dependence, so that $X_i$ and $Y_i$ are not exchangeable.
Electronic copy available at: https://ssrn.com/abstract=3360531

The object of interest is the predictive distribution for a new exposure $(X_{m+1}, Y_{m+1})$, given the observed couples $(X^m = x^m, Y^m = y^m)$. This can be extremely useful in applications, when one is interested in making inference about the PD, the LGD and their relation.

In fact, given Equation (6), Equation (9) can be rewritten in terms of the LS predictives, using formulas like those in Equations (2) and (3); something that for a small portfolio can be done explicitly. However, when $m$ is large, it becomes numerically unfeasible to perform all those sums and products.

Luckily, developing an alternative Markov Chain Monte Carlo algorithm is simple and effective.

It is sufficient to go through the following steps.
3. The quantities $A_{m+1}$, $B_{m+1}$, and $C_{m+1}$ are then sampled according to their beta-Stacy predictive distributions.

Each loan in the dataset is identified by an alphanumeric code, which can be used to match the data with the original Freddie Mac source.
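Returning to step 3 of the sampling scheme: given the three predictive pmfs (here passed in as plain arrays over the levels, with how they are obtained from the urns omitted; all names are ours), the couple $(X_{m+1}, Y_{m+1})$ is obtained as:

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_next_exposure(pmf_A, pmf_B, pmf_C):
    """Draw A_{m+1}, B_{m+1}, C_{m+1} from their predictive pmfs and return
    the implied couple (X_{m+1}, Y_{m+1}) = (A + B, A + C)."""
    A = rng.choice(len(pmf_A), p=pmf_A)
    B = rng.choice(len(pmf_B), p=pmf_B)
    C = rng.choice(len(pmf_C), p=pmf_C)
    return A + B, A + C
```

Repeating this draw many times yields a Monte Carlo approximation of the bivariate predictive distribution.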

For each loan, several interesting pieces of information are available, like its origination date, the loan age in months, the geographical location (ZIP code) within the US, the FICO score of the subscriber, the presence of some form of insurance, the loan-to-value, the combined loan-to-value, the debt-to-income ratio, and many others. In terms of credit performance, quantities like the unpaid principal balance and the delinquency status up to termination date are known. Clearly, termination can be due to several reasons, from voluntary prepayment to foreclosure, and this information is also available. Loans are grouped into FICO score classes, from "Very poor" up to "Exceptional", with a score above 800; see Table 1.

To avoid any copyright problem with Freddie Mac, which already shares its data online, from Maio's dataset we only provide the PD and the LGD estimates, together with the unique alphanumeric identifier. In this way, merging the data sources is straightforward. As an example, Figure 1 shows two plots of the relation between PD and LGD for the "Very poor" class.

In this section we discuss the performance of the bivariate urn model on the mortgage data described in Section 3. For the sake of space, we will mainly focus our attention on the "Very poor" and the "Exceptional" FICO score classes, as per Table 1.

In order to use the model, we need 1) to transform and discretise both the PD and the LGD into levels, and 2) to define an a priori for the different beta-Stacy processes involved in the construction of Equation (6).

The results we obtain are promising and suggest that the bivariate urn model can represent an interesting way of modeling PD and LGD dependence.

A first convenient way of defining the levels is through empirical percentiles. Clearly, in using a similar approach, one should remember that she is imposing a uniform behaviour on $X$ and $Y$, in a way similar to copulas (Nelsen 2006). However, differently from copulas, the dependence between $X$ and $Y$ is not restricted to any particular parametric form: dependence will emerge from the combination of the a priori and the data.

Another simple way of defining levels is to round the raw observations up to the nearest integer (ceiling), or to some other set of values; Equations (11) and (12) give two examples. Even if it is not a strong requirement, given the meaning of the value 0 in a RUP (recall the 0-blocks), we recommend using 0 as a special level, not mapping to an interval.
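A minimal version of this ceiling-style discretisation, keeping 0 as the special level (the 5% step size is only an example value, not the one used in the paper):

```python
import math

def ceiling_levels(values, step=0.05):
    """Round each raw PD/LGD value up to the nearest multiple of `step` and
    use the multiple's index as the level; exact zeros keep level 0."""
    return [0 if v == 0 else math.ceil(v / step) for v in values]

print(ceiling_levels([0.0, 0.01, 0.05, 0.07]))   # [0, 1, 1, 2]
```

Unlike quantile-based levels, the bins here are equally spaced, so heavily skewed data may leave some levels almost empty.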

Notice that, if correctly applied, discretisation maintains the dependence structure between the variables; for the "Very poor" class, this can be checked directly using the levels in Equation (11). In what follows, we discuss the results mainly using the levels in Equation (12); however, our conclusions are not tied to this particular choice.

In order to use the bivariate urn model, it is necessary to elicit an a priori for all its components. Natural candidates for the centering distributions are:

• Independent discrete uniform distributions, where the range of variation for $B$ and $C$ is simply inherited from $X$ and $Y$ (but extra conditions can be applied, if needed), while for $A$ the range is chosen to guarantee $\sigma_A^2 = \mathrm{Cov}(X, Y)$. For instance, if the covariance between $X$ and $Y$ is approximately 3, the interval $[0, 5]$ guarantees that $\sigma_A^2 \approx 3$ as well: we can simply use the formula for the variance of a discrete uniform, i.e. $\sigma^2 = \frac{(b - a + 1)^2 - 1}{12}$.
• Independent Poisson distributions, such that $A \sim \text{Poi}(\lambda_A = \mathrm{Cov}(X, Y))$, while for $B$ and $C$ one proceeds similarly.

When the number of data points is not very large, having a strong prior does make a difference. Given the set cardinalities, we shall see that a strong prior has a clear impact for the "Very poor" rating class ("only" 1627 observations), while no appreciable effect is observable for the "Exceptional" one (32403 data points).
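Both elicitation routes can be made concrete: the discrete-uniform variance formula fixes the range of $A$, and a (here truncated) Poisson guess $G$ translates into urn compositions through the centering recipe $a_j = c_j G(\{j\})$, $b_j = c_j G((j, \infty))$. A sketch, with all names ours and an arbitrary truncation level:

```python
import math

def discrete_uniform_var(lo, hi):
    """Variance of a discrete uniform on {lo, ..., hi}: ((hi-lo+1)^2 - 1)/12."""
    return ((hi - lo + 1) ** 2 - 1) / 12

def poisson_urn_prior(lam, c, J):
    """Urn compositions (a_j, b_j), j = 0..J, centring the beta-Stacy prior
    on a Poisson(lam) guess G, with constant strength of belief c."""
    a, b, tail = [], [], 1.0
    for j in range(J + 1):
        g_j = math.exp(-lam) * lam ** j / math.factorial(j)  # G({j})
        tail -= g_j                                          # G((j, inf))
        a.append(c * g_j)
        b.append(c * max(tail, 0.0))
    return a, b

print(discrete_uniform_var(0, 5))            # ~2.92, close to a target Cov of 3
a, b = poisson_urn_prior(lam=3.0, c=10.0, J=8)
print(a[0] / (a[0] + b[0]), math.exp(-3.0))  # centering: E[F({0})] = G({0})
```

A larger $c$ makes the initial compositions harder to overturn, matching the strength-of-belief discussion in Section 2.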

As a final remark, it is worth noticing that one could take all the urns behind $\{A_i\}_{i=1}^m$ to be empty.

This would correspond to assuming that no dependence is actually possible between PD and LGD.

The effect is more visible for PD than for LGD; however, in both cases the KS test rejects the null (p-values 0.005 and 0.038). Please notice that this is not necessarily a problem, if one really believes that the available data do not contain all the necessary information, or if she wants to incorporate specific knowledge about future trends, and so on.
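For completeness, the two-sample Kolmogorov-Smirnov statistic behind such tests is just the largest gap between the two empirical cdfs; a minimal self-contained version (in practice one would use `scipy.stats.ks_2samp`, which also returns the p-value):

```python
import numpy as np

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute gap
    between the empirical cdfs of samples x and y."""
    grid = np.union1d(x, y)
    Fx = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    Fy = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    return float(np.max(np.abs(Fx - Fy)))

print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))   # 0.5
```

Here one sample would be the model's predictive draws and the other the empirical observations.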

As anticipated, the number of observations available in the data plays a major role in updating (and, in case of a wrongly elicited belief, correcting) the prior. Focusing on the PD, Figure 5 shows that, when the "Exceptional" rating class is considered, no big difference can be observed between the weak-prior and strong-prior results.

Figure 6 shows the bivariate distribution we obtain for the discretised PD and LGD of the "Very poor" rating class, using the levels in Equation (12) and a weak prior. Figure 7 shows the case $c_j = 100$. As one would expect, the strong prior provides a smoother joint distribution, while the weak one tends to make the empirical data prevail. In Figure 8, the equivalent of Figure 6 is given for the "Exceptional" rating class.

We have presented a bivariate urn construction to model the dependence between PD and LGD, also showing a promising application on mortgage data.

Exploiting the reinforcement mechanism of Pólya urns and the conjugacy of beta-Stacy processes, the model combines experts' judgements with the empirical evidence, updating the former every time new data become available.

Clearly, if an a priori cannot be elicited, or is not desired, one can still use the model in a totally data-driven way. As shown in Section 4, in the absence of any a priori on the dependence between PD and LGD, the bivariate urn can easily generate an empirical bivariate cumulative distribution function.