Article

A Hypothesis Test for the Long-Term Calibration in Rating Systems with Overlapping Time Windows

1 Landesbank Baden-Württemberg, 70173 Stuttgart, Germany
2 Center for Mathematical Economics, Bielefeld University, 33615 Bielefeld, Germany
* Author to whom correspondence should be addressed.
Risks 2024, 12(8), 131; https://doi.org/10.3390/risks12080131
Submission received: 3 July 2024 / Revised: 9 August 2024 / Accepted: 13 August 2024 / Published: 16 August 2024

Abstract

We present a statistical test for the long-term calibration in rating systems that can deal with overlapping time windows as required by the guidelines of the European Banking Authority (EBA), which apply to major financial institutions in the European System. In accordance with regulation, rating systems are to be calibrated and validated with respect to the long-run default rate. The consideration of one-year default rates on a quarterly basis leads to correlation effects which drastically influence the variance of the long-run default rate. In a first step, we show that the long-run default rate is approximately normally distributed. We then perform a detailed analysis of the correlation effects caused by the overlapping time windows and solve the problem of an unknown distribution of default probabilities.

1. Introduction

Financial institutions use statistical models to estimate the default risk of obligors in order to manage credit risks. According to Basel II, banks are allowed to estimate risk parameters that are used to calculate regulatory capital with their own models. The legal framework for the use of such models in the internal ratings-based (IRB) approach is regulated in the Capital Requirements Regulation (CRR), see CRR (2013). In Cucinelli et al. (2018), the authors conclude that the IRB regulatory framework has promoted stronger risk management practices and strengthened banks’ overall resilience. The CRR imposes specific requirements for the models, e.g., that “institutions shall estimate PDs [probabilities of default] by obligor grade from long run averages of one-year default rates”, see Article 180. In 2017, the European Banking Authority published the “guidelines on PD estimation, LGD [loss given default] estimation and the treatment of defaulted exposures” (EBA-GL) specifying the CRR requirement, see EBA GL (2017). Paragraph 81 EBA-GL states that “institutions should calculate the observed average default rates as the arithmetic average of all one year default rates”. This requirement was additionally specified in EBA (2016) and ECB (2024). Here, the observed one-year default rate at a given reference date is defined as the percentage of defaulters in the 12-month period after the reference date, so that the observed long-run average default rate depends on the choice of the reference dates. Paragraph 80 EBA-GL allows institutions to choose “between an approach based on overlapping and an approach based on non-overlapping one-year time windows”. Overlapping one-year time windows occur when the time interval between two reference dates is less than one year. Due to computational simplicity, it is of course convenient to continue working with non-overlapping time windows and appropriately adjust the long-term default rate. In many cases, these types of adjustments are rather on the conservative side, see, e.g., Jing et al. (2008) for an empirical study and Li (2016); Zhou (2001) for theoretical analyses in the context of asset correlation. We also refer to Caprioli et al. (2023) for a sensitivity analysis of credit portfolio value at risk with respect to asset correlations.
On the other hand, the approach using overlapping time windows provides more information on defaults due to both potential short-term contracts that cannot be observed during one-year periods and possible variations between average default rates using non-overlapping time windows on different reference dates. Especially in such constellations, it is “the ECB’s understanding that overlapping one-year time windows should preferably be used”, see ECB (2024). For this reason, dealing with overlapping time windows is relevant for all major banks in the eurozone using the IRB approach. Furthermore, it is favored by most financial institutions that handle portfolios with only few observed defaults, where the estimation and validation of PDs are particularly challenging, cf. Caprioli et al. (2023); Pluto and Tasche (2011); Tasche (2013). Another advantage of this approach lies in the fact that the bias caused by a specific choice of reference dates can be reduced, e.g., when calculating the long-run average default rate as the arithmetic mean of the one-year default rates on a quarterly basis.
Paragraph 87 EBA-GL requires institutions to use a statistical test of calibration at the level of rating grades and for certain portfolios. Classically, the literature dealing with calibration tests assumes a binomially distributed default rate, see, e.g., Deutsche Bundesbank (2003) and Tasche (2008) for a discussion of different hypothesis tests in this context. We also refer to Coppens et al. (2016) for the consideration of different PDs within the same rating grade and Blochwitz et al. (2004, 2006) for a modified binomial test accounting for correlated defaults. However, when considering overlapping time windows, the assumption of a binomial distribution can no longer be maintained. For an overview of existing statistical tests, including a hybrid test that represents a combination of exact and approximate testing procedures, we refer to Aussenegg et al. (2011).
More generally, Monte Carlo methods can be used to construct tests for distributions that cannot be determined analytically. For the use of Monte Carlo methods, precise knowledge of the distribution of the probabilities of default within the portfolio is essential. However, in our case, these probabilities are unknown and estimated by the underlying model, so they cannot be used to determine the test statistic. On the other hand, analytical tests are desirable since they satisfy requirements such as replicability and reproducibility. We refer to Blöchlinger (2012, 2017) for hypothesis tests in analytic form that take into account correlation effects, replacing the i.i.d. assumption by a conditional i.i.d. assumption. In this context, we also refer to Tasche (2003) for a one-observation-based inference on the adequacy of probability of default forecasts with dependent default events.
In this paper, we therefore present a statistical test that can be used to verify supervisory calibration requirements in the case of overlapping time windows. A major challenge lies in the analysis of the correlation effects that are caused by the overlapping time windows. On the level of individual rating grades, the variance of the test statistic is already determined by the null hypothesis. This, however, is not the case on a portfolio level, which is why we focus in great detail on a conservative estimate for the variance by solving a related minimization problem. We then present a conservative calibration test that can deal with the unknown variance in the test statistic.
The rest of the paper is structured as follows. In Section 2.1, we introduce the terminology and notation for the formulation of the hypothesis test in Section 2.2. In Section 2.3, we show that the long-run default rate is approximately normally distributed with respect to random effects in default realization. Thereafter, we focus on the analysis of the correlation effects that arise due to the overlapping time periods in Section 2.4 and Section 2.5. Taking these correlation effects into account is essential for determining the variance of the long-run default rate, see Section 2.6. In Section 3, we formulate the hypothesis test in detail, first at the level of individual rating grades, cf. Section 3.1, and then at portfolio level, cf. Section 3.2. We conclude with a discussion on the parameters of the test and further considerations in Section 4. A closed form solution to the minimization problem related to the estimation of the variance is derived in Appendix A. In Appendix B, we propose an alternative way to estimate the variance without solving an optimization problem.

2. Setup and Preliminaries

2.1. Setup and Notation

In this section, we introduce the terminology and notation used in this paper and give a formal description of the statistical test that is studied in this work. We begin with the general setup by considering the default state of an individual obligor. Throughout, a default state over a one-year time horizon is described by a Bernoulli-distributed random variable $x \sim B(1, p)$, where
$$p \in \{\mathrm{PD}_1, \ldots, \mathrm{PD}_m\},$$
$m \in \mathbb{N}$, and $0 < \mathrm{PD}_1 < \cdots < \mathrm{PD}_m < 1$ are the default probabilities of the rating grades in the underlying master scale. In the following, we also use the notations $\mathrm{PD}_{\min}$ and $\mathrm{PD}_{\max}$ for the default probabilities $\mathrm{PD}_1$ and $\mathrm{PD}_m$, respectively.
Next, we introduce the long-run default rate for a given history of reference dates $\mathrm{RD}_1 < \cdots < \mathrm{RD}_N$ with $N \in \mathbb{N}$. For all $t = 1, \ldots, N$, we are given a number $n_t \in \mathbb{N}_0$ of customers within the portfolio at the reference date $\mathrm{RD}_t$, and write
$$n_{\min} := \min\big(\{n_t \mid t = 1, \ldots, N\} \setminus \{0\}\big) \quad \text{and} \quad n_{\max} := \max\{n_t \mid t = 1, \ldots, N\}$$
for the minimal and maximal number of existing customers at any reference date throughout the history of the portfolio, respectively. Throughout, we assume that $n_{\max} \neq 0$ or, equivalently, that $n_t \neq 0$ for some $t = 1, \ldots, N$.
Let $q \in \mathbb{N}$ be the number of reference dates within a one-year time horizon starting from an arbitrary reference date. To be more precise, we assume that
$$\mathrm{RD}_{t+1} = \mathrm{RD}_t + \tfrac{1}{q} \quad \text{for all } t = 1, \ldots, N-1.$$
According to the EBA guidelines EBA GL (2017), the long-run default rate should be computed at least on a quarterly basis, i.e., $q = 4$, or on an annual basis, i.e., $q = 1$, performing, however, an analysis of the possible bias that occurs due to the negligence of quarterly data. The main interest of our analysis lies in the case $q > 1$, leading to correlation effects caused by overlapping time windows. For $s, t = 1, \ldots, N$ with $s > t$, we define $w_{t,s} := \max\{0, \mathrm{RD}_t + 1 - \mathrm{RD}_s\}$ for the size of the overlap of the observation periods.
Since new customers are usually added over time and the business relationship ends for others, we assign to each customer during the history of reference dates a number from 1 to $M \in \mathbb{N}$, and we define $\Lambda_t \subseteq \{1, \ldots, M\}$ as the set of obligors at reference date $\mathrm{RD}_t$ for all $t = 1, \ldots, N$. In particular, $\bigcup_{t=1}^N \Lambda_t = \{1, \ldots, M\}$, $|\Lambda_t| = n_t$, and $n_{\max} \leq M$. For $t = 1, \ldots, N$, we then define the one-year default rate among the considered customers at reference date $\mathrm{RD}_t$ by
$$X_t = \frac{1}{n_t} \sum_{j \in \Lambda_t} x_{t,j},$$
where $x_{t,j} \sim B(1, p_{t,j})$ is the one-year default state and $p_{t,j} \in \{\mathrm{PD}_1, \ldots, \mathrm{PD}_m\}$ the probability of default over a one-year time horizon of customer $j \in \Lambda_t$ at the reference date $\mathrm{RD}_t$. Since $n_t = 0$ cannot be excluded, we use the convention $\frac{0}{0} := 0$.
The realized default rate on a reference date $\mathrm{RD}_t$ is denoted by $\mathrm{DR}_t$, i.e., $\mathrm{DR}_t$ is the realization of the random variable $X_t$ for $t = 1, \ldots, N$. Moreover, we define
$$R_N := \big\{ t \in \{1, \ldots, N\} \;\big|\; n_t > 0 \big\}$$
as the set of indices for reference dates where the portfolio contains at least one customer, and denote its cardinality by $R(N)$. Since we assume that $n_{\max} \neq 0$, it follows that $R(N) > 0$, and we define the realized long-run default rate as
$$\mathrm{LRDR} := \frac{1}{R(N)} \sum_{t=1}^N \mathrm{DR}_t,$$
which is a realization of the long-run default rate
$$Z := \frac{1}{R(N)} \sum_{t=1}^N X_t.$$
We emphasize that the long-run default rate is the arithmetic mean of the one-year default rates and not the arithmetic mean of the individual default states of all customers for all reference dates, which intuition might suggest. Due to the chosen convention $\frac{0}{0} = 0$, it follows that $X_t = 0$ if $n_t = 0$, i.e., reference dates where the portfolio contains no customers are completely disregarded in the computation of the long-run default rate. Nevertheless, these reference dates have to be included into the timeline since they describe an overlapping period.
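For concreteness, the following minimal sketch computes $\mathrm{LRDR}$ from observed default counts. The input layout (per-date default and obligor counts) is our own illustrative choice and not prescribed by the paper.

```python
def long_run_default_rate(defaults, counts):
    """Realized long-run default rate LRDR as defined in Section 2.1.

    defaults : list, defaults[t] = number of defaulters in the one-year
               window starting at reference date t
    counts   : list, counts[t] = n_t obligors at reference date t (may be 0;
               the convention 0/0 := 0 applies)
    """
    rates = [d / n if n > 0 else 0.0 for d, n in zip(defaults, counts)]
    r_n = sum(1 for n in counts if n > 0)   # R(N): dates with >= 1 obligor
    return sum(rates) / r_n

# Two years of quarterly reference dates (q = 4), hypothetical numbers:
print(long_run_default_rate([1, 0, 2, 0, 1, 1, 0, 0],
                            [50, 52, 55, 0, 48, 47, 50, 51]))
```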

2.2. Formal Description of the Test

We now give an overview of the hypothesis test presented in this paper. The aim is to formulate a statistical test for comparing the realized long-run default rate $\mathrm{LRDR}$ with the estimated long-run default rate, which is called the long-run central tendency, denoted by $\mathrm{LRCT}$, both on the level of individual rating grades and on the portfolio level. In other words, $\mathrm{LRDR}$ describes the default rate that has actually occurred, while $\mathrm{LRCT}$ represents the long-run prediction from the rating model. Both $\mathrm{LRDR}$ and $\mathrm{LRCT}$ are therefore non-random. We use the random variable $Z$ with expected value $\mu := E(Z)$ as the test statistic. Since $\mathrm{LRCT}$ is the estimated expected value of $Z$, i.e., $\mathrm{LRCT} = \hat{\mu}$, we can formulate the following hypothesis test with
$$\text{null hypothesis } H_0: \mu = \mathrm{LRCT} \quad \text{and} \quad \text{alternative hypothesis } H_1: \mu \neq \mathrm{LRCT}.$$
Based on this, we determine values $k, K \in [0,1]$ and consider the test function
$$\varphi(z) := \begin{cases} 0, & \text{if } z \in [k, K], \\ 1, & \text{otherwise}. \end{cases} \tag{3}$$
If $\varphi(\mathrm{LRDR}) = 0$, the null hypothesis is retained based on the available data, whereas it is rejected in favor of the alternative hypothesis if $\varphi(\mathrm{LRDR}) = 1$. We point out that the distribution of the test statistic $Z$ does not only depend on $\mu$ but also on other parameters that are not determined by the null hypothesis. The main focus of this paper is to postulate a distributional assumption for $Z$ such that its distribution is determined by the null hypothesis. In this context, we first show that $Z$ is approximately normally distributed, i.e., $Z \sim \mathcal{N}(\mu, \sigma^2)$. In a second step, the focus lies on a conservative estimate for the variance $\sigma^2$ based on the information given by the null hypothesis. A major challenge lies in the fact that the consideration of overlapping time windows leads to unavoidable correlation effects that rule out independence assumptions on the random variables $X_1, \ldots, X_N$.

2.3. Distribution of the Long-Run Default Rate

As we have seen in the previous section, the long-run calibration test requires the formulation of a distribution assumption for the long-run default rate. However, the definition of the long-run default rate as the arithmetic mean of the one-year default rates leads to difficulties in the derivation of an analytical description of the distribution based on the Bernoulli-distributed default states of the individual obligors. The aim of this section is to show that, despite the correlation effects and the possibly varying number of obligors with different PDs at each reference date, the long-run default rate is still approximately normally distributed.
In order to simplify notation, we define $y_{t,j} := x_{t,j}$ if $j \in \Lambda_t$ and $y_{t,j} := 0$ otherwise, for all $t = 1, \ldots, N$ and $j = 1, \ldots, M$. Then,
$$Z = \frac{1}{R(N)} \sum_{t=1}^N \frac{1}{n_t} \sum_{j=1}^M y_{t,j} = \sum_{j=1}^M Y_j
\quad \text{with} \quad
Y_j = \sum_{t=1}^N \frac{1}{R(N)} \frac{1}{n_t}\, y_{t,j}.$$
In the sequel, we will show that $Z$ is approximately normally distributed. For this, we assume that default states of different obligors are independent of each other, from which the independence of the family $(Y_j)_{j=1,\ldots,M}$ follows. This assumption is standard and sufficiently conservative for the calibration of rating models.
In order to apply the Lindeberg–Feller central limit theorem (CLT), cf. (Klenke 2020, Theorem 15.44), which is a generalization of the classical CLT, we first show that the variances of the random variables $(Y_j)_{j=1,\ldots,M}$ cannot become arbitrarily small. In fact, by neglecting the covariance between the default states of an obligor at different reference dates,
$$\sigma^2(Y_j) = \frac{1}{R(N)^2} \sum_{t=1}^N \frac{1}{n_t^2}\, \sigma^2(y_{t,j}) + \frac{2}{R(N)^2} \sum_{t=1}^{N-1} \sum_{s=t+1}^N \frac{1}{n_t n_s}\, \mathrm{Cov}(y_{t,j}, y_{s,j}) \geq \frac{1}{R(N)^2} \sum_{t=1}^N \frac{1}{n_t^2}\, \sigma^2(y_{t,j}) \quad \text{for all } j = 1, \ldots, M.$$
Note that, in the previous estimate, we have used the inequality
$$\sum_{t=1}^{N-1} \sum_{s=t+1}^N \frac{1}{n_t n_s}\, \mathrm{Cov}(y_{t,j}, y_{s,j}) \geq 0,$$
which is not automatically satisfied but rather follows from Formula (5) below, which results from the additional assumptions and considerations on default states in Section 2.4. Moreover, for each obligor $j = 1, \ldots, M$, there exists at least one index $t = 1, \ldots, N$ with $\sigma^2(y_{t,j}) > 0$ since $\bigcup_{t=1}^N \Lambda_t = \{1, \ldots, M\}$. Hence, for each obligor $j = 1, \ldots, M$,
$$\sigma^2(Y_j) \geq \frac{1}{\big(R(N)\, n_{\max}\big)^2} \min_{k=1,\ldots,m} \big(\mathrm{PD}_k (1 - \mathrm{PD}_k)\big) =: v_{\min}.$$
Note that $v_{\min}$, as defined on the right-hand side of the previous estimate, is independent of the customer $j = 1, \ldots, M$ and $v_{\min} > 0$. Since, up to now, we only consider finitely many customers, we extend the family $(Y_j)_{j=1,\ldots,M}$ by a sequence of independent random variables $Y_{M+1}, Y_{M+2}, \ldots$ with $\sigma^2(Y_j) \geq v_{\min}$, e.g., $Y_j \sim \mathcal{N}(\mu_j, v_{\min})$ with arbitrary drift $\mu_j \in (0,1)$ for $j \geq M+1$.
Now, the central limit theorem shall be applied to the family $(Y_j)_{j \in \mathbb{N}}$. In the classical version by Lindeberg and Lévy, the CLT states that a proper renormalization of the arithmetic mean is approximately normally distributed if the underlying sequence of random variables is independent and identically distributed. Its generalized version by Lindeberg and Feller also applies to random variables that are not identically distributed, as in our case. It states that a proper renormalization of the arithmetic mean of a family of independent random variables converges in distribution to a normally distributed random variable if this family satisfies the so-called Lindeberg condition, see (Klenke 2020, Definition 15.41). In our case, the Lindeberg condition is satisfied if, for all $\varepsilon > 0$,
$$\lim_{k \to \infty} \frac{1}{s_k^2} \sum_{j=1}^k E\Big[\big(Y_j - E(Y_j)\big)^2 \cdot \mathbf{1}_{\{|Y_j - E(Y_j)| > \varepsilon s_k\}}\Big] = 0,$$
where
$$s_k = \sqrt{\sum_{j=1}^k \sigma^2(Y_j)}.$$
That is, the Lindeberg condition requires that the underlying sequence of random variables does not exhibit arbitrarily large deviations from the expected value.
In the following, we verify the Lindeberg condition in our setup. To that end, let $\varepsilon > 0$. Since, for all $j = 1, \ldots, M$, the random variable $Y_j$ only takes values in $(0,1)$,
$$\sigma^2(Y_j) \geq v_{\min} > 0, \quad \text{and} \quad s_k \to \infty \text{ as } k \to \infty,$$
there exists some $k_0 \in \mathbb{N}$ such that, for all $k \in \mathbb{N}$ with $k \geq k_0$, $j \in \mathbb{N}$, and $\omega \in \Omega$,
$$|Y_j(\omega) - E(Y_j)| \leq \varepsilon \cdot s_k.$$
Hence, for all $k \in \mathbb{N}$ with $k \geq k_0$,
$$\frac{1}{s_k^2} \sum_{j=1}^k E\Big[\big(Y_j - E(Y_j)\big)^2 \cdot \mathbf{1}_{\{|Y_j - E(Y_j)| > \varepsilon s_k\}}\Big] = \frac{1}{s_k^2} \sum_{j=1}^{k_0} E\Big[\big(Y_j - E(Y_j)\big)^2 \cdot \mathbf{1}_{\{|Y_j - E(Y_j)| > \varepsilon s_k\}}\Big] \leq \frac{1}{s_k^2} \sum_{j=1}^{k_0} E\Big[\big(Y_j - E(Y_j)\big)^2\Big] \to 0 \quad \text{as } k \to \infty,$$
i.e., the Lindeberg condition is satisfied. By the central limit theorem, cf. (Klenke 2020, Theorem 15.44), we obtain that
$$\lim_{k \to \infty} P\left( \frac{1}{s_k} \sum_{j=1}^k \big(Y_j - E(Y_j)\big) \leq z \right) = \Phi(z) \quad \text{for all } z \in \mathbb{R},$$
where $\Phi$ is the cumulative distribution function of the standard normal distribution. Since the number of customers $M$ is large but fixed throughout our analysis, we may therefore assume that
$$Z \sim \mathcal{N}(\mu, \sigma^2)$$
with $\mu = \sum_{j=1}^M E(Y_j)$ and $\sigma = s_M$, suppressing the dependence of $\mu$ and $\sigma$ on the number of customers $M$.
The literature suggests that, in standard cases, a sample size of at least 30 is required in order to achieve useful results with the approximation of a normal distribution, see, for example, Hogg and Tanis (1977). In our setting, however, the situation is more complex due to the specific calculation method of the long-run default rate. As we will see in Section 4.3, the convergence depends both on the number of reference dates $N$ and the number of customers $n_t$ per reference date $\mathrm{RD}_t$; hence, indirectly on $M$, but not solely on $M$. As a consequence, a large number of customers does not necessarily imply a good approximation, e.g., if we have $N = 2$ reference dates and $M = 1{,}000{,}000$ customers, but only one customer at the second reference date, i.e., $n_2 = 1$. For more details, we refer to Section 4.3, where we focus on the convergence for different combinations of $N$ and numbers of customers $n_t$ per reference date $\mathrm{RD}_t$.

2.4. Covariance between Default States

Pursuant to Paragraph 80 EBA-GL (EBA GL 2017), when calculating the long-run default rate, institutions may choose between overlapping time windows of one year and non-overlapping time windows. According to Paragraph 78 EBA-GL, the reference dates used should include at least all quarterly reference dates. The overlap of the time windows of the one-year default rates leads to correlations between them, which affects the variance of the long-run default rate and thus the acceptance ranges of the test.
In order to be able to describe the distribution of the test statistic as precisely as possible in Section 2.5 below, we thus focus on the analysis of the covariances between the default rates, and start by considering the covariance between two default states of individual obligors for overlapping observation periods in this subsection. To that end, we consider two reference dates $\mathrm{RD}_t$ and $\mathrm{RD}_s$ with $\mathrm{RD}_t < \mathrm{RD}_s < \mathrm{RD}_t + 1$ and an obligor who can be observed in the period $[\mathrm{RD}_t, \mathrm{RD}_s + 1)$. The default state in the period $[\mathrm{RD}_t, \mathrm{RD}_t + 1)$ is described by a random variable $x_t \in \{0, 1\}$, the default state in the period $[\mathrm{RD}_s, \mathrm{RD}_s + 1)$ by a random variable $x_s \in \{0, 1\}$, cf. Figure 1. Throughout, default events correspond to realizations of the value 1 for default states. That is, a realization of $x_t$ or $x_s$ answers the question whether or not the obligor under consideration defaulted in the time period $[\mathrm{RD}_t, \mathrm{RD}_t + 1)$ or $[\mathrm{RD}_s, \mathrm{RD}_s + 1)$, respectively.
In order to compute the covariance between the two default states $x_t$ and $x_s$, a detailed knowledge of the time of default is necessary, where, in the case of more than one default event, we focus on the time of the temporally first default. The time of first default is described by a random variable $T_1 \in [\mathrm{RD}_t, \infty)$, which creates a link between default status and time of default, see, e.g., European Commission (2021). While $T_1$ describes the timing of the first default starting at time $\mathrm{RD}_t$, we model the timing of the first default after the default described by $T_1$ and after $\mathrm{RD}_s$ using a random variable $T_2^* \in [\mathrm{RD}_s, \infty)$ with $T_2^* > T_1$. We then define
$$T_2 := \begin{cases} T_1, & \text{if } T_1 \geq \mathrm{RD}_s \text{ and } T_2^* > \mathrm{RD}_s + 1, \\ T_2^*, & \text{otherwise}. \end{cases}$$
If, besides $T_1$, a further default is observed in the interval $[\mathrm{RD}_s, \mathrm{RD}_s + 1)$, $T_2$ describes the time of the second default; otherwise, $T_2$ is used to model the time of the first default starting from $\mathrm{RD}_s$. We now divide the considered time period into three disjoint intervals
$$I_1 := [\mathrm{RD}_t, \mathrm{RD}_s], \quad I_2 := (\mathrm{RD}_s, \mathrm{RD}_t + 1], \quad \text{and} \quad I_3 := (\mathrm{RD}_t + 1, \mathrm{RD}_s + 1],$$
and look at the following disjoint events
$$E_1 := \{T_1 \in I_1\}, \quad E_2 := \{T_1 \in I_2\}, \quad \text{and} \quad E_3 := \{T_1 \notin I_1 \cup I_2\}.$$
Moreover, we define
$$E_4 := \{T_2 \in I_2\}, \quad E_5 := \{T_2 \in I_3\}, \quad \text{and} \quad E_6 := \{T_2 \notin I_2 \cup I_3\}.$$
In order to simplify notation, we set $p_t := P(x_t = 1)$ and $p_s := P(x_s = 1)$ and assume that
$$P(x_t = 1 \mid E_5) = p_t \quad \text{and} \quad P(x_s = 1 \mid E_1) = p_s.$$
In a first step, we focus on the probability $P(E_1)$. For this, we assume that the probability of default in a certain time interval depends on the length of this interval and on the general creditworthiness of the obligor, i.e., we assume that
$$P(T_1 \in I \mid x_t = 1) = f_1(|I|)$$
for every interval $I \subseteq [\mathrm{RD}_t, \mathrm{RD}_t + 1)$ and a non-negative and non-decreasing function $f_1$ with $f_1(0) = 0$ and $f_1(1) = 1$. This ensures that the probability of observing a default in a given time interval increases if the length of the interval increases and that the probabilities of default are identical for intervals of the same length. Since $|I_2| = w_{t,s}$, we deduce
$$P(E_1 \mid x_t = 1) = 1 - P(E_2 \mid x_t = 1) = 1 - f_1(w_{t,s}),$$
and therefore, using Bayes’ theorem together with $P(x_t = 1 \mid E_1) = 1$,
$$1 - f_1(w_{t,s}) = P(E_1 \mid x_t = 1) = \frac{P(x_t = 1 \mid E_1) \cdot P(E_1)}{P(x_t = 1)} = \frac{P(E_1)}{p_t}.$$
Rearranging terms, we find that
$$P(E_1) = \big(1 - f_1(w_{t,s})\big)\, p_t$$
and, analogously,
$$P(E_2) = f_1(w_{t,s})\, p_t.$$
Hence,
$$\begin{aligned}
E(x_t \cdot x_s) &= P(x_t = 1, x_s = 1) \\
&= P(x_t = 1, x_s = 1 \mid E_1) \cdot P(E_1) + P(x_t = 1, x_s = 1 \mid E_2) \cdot P(E_2) + P(x_t = 1, x_s = 1 \mid E_3) \cdot P(E_3) \\
&= P(x_s = 1 \mid E_1) \cdot P(E_1) + P(E_2) \\
&= p_s \big(1 - f_1(w_{t,s})\big)\, p_t + f_1(w_{t,s})\, p_t \\
&= p_t p_s + f_1(w_{t,s})\, p_t (1 - p_s).
\end{aligned}$$
Analogously, one obtains
$$P(E_4) = f_2(w_{t,s})\, p_s \quad \text{and} \quad P(E_5) = \big(1 - f_2(w_{t,s})\big)\, p_s$$
for a non-negative and non-decreasing function $f_2$ with $f_2(0) = 0$ and $f_2(1) = 1$, and we find that
$$\begin{aligned}
E(x_t \cdot x_s) &= P(x_t = 1, x_s = 1) \\
&= P(x_t = 1, x_s = 1 \mid E_4) \cdot P(E_4) + P(x_t = 1, x_s = 1 \mid E_5) \cdot P(E_5) + P(x_t = 1, x_s = 1 \mid E_6) \cdot P(E_6) \\
&= P(E_4) + P(x_t = 1 \mid E_5) \cdot P(E_5) \\
&= f_2(w_{t,s})\, p_s + p_t \big(1 - f_2(w_{t,s})\big)\, p_s \\
&= p_t p_s + f_2(w_{t,s})\, p_s (1 - p_t).
\end{aligned}$$
Equating the previous expectations results in
$$f_1(w_{t,s}) = f_2(w_{t,s}) \cdot \frac{p_s (1 - p_t)}{p_t (1 - p_s)},$$
i.e., $f_1$ is a linear transform of $f_2$. To determine the covariance between $x_t$ and $x_s$, knowledge of at least one of the functions $f_1$ or $f_2$ is required. We assume that the distribution of the time of default within an observation year is uniform. We postulate this assumption for $x_s$, since it gives preference to the more recent information at $\mathrm{RD}_s$ over the information at $\mathrm{RD}_t$, which lies further in the past. We therefore set $f_2(w_{t,s}) = w_{t,s}$ and end up with
$$\mathrm{Cov}(x_t, x_s) = E(x_t \cdot x_s) - E(x_t) \cdot E(x_s) = w_{t,s}\, p_s (1 - p_t). \tag{5}$$
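The following minimal sketch makes the derivation above concrete: it evaluates both representations of $E(x_t \cdot x_s)$ for illustrative values of $p_t$, $p_s$, and $w_{t,s}$ (our own choices, not taken from the paper) and confirms that they agree and reproduce Formula (5).

```python
# Numerical check of the two routes to E(x_t * x_s) from Section 2.4.
# The PDs and the overlap below are illustrative values only.
p_t, p_s, w_ts = 0.02, 0.025, 0.75   # e.g., a three-quarter overlap, w = 3/4

f2 = w_ts                                        # uniform default time: f_2(w) = w
f1 = f2 * p_s * (1 - p_t) / (p_t * (1 - p_s))    # linear relation between f_1, f_2

e_first = p_t * p_s + f1 * p_t * (1 - p_s)    # route via the events E_1, E_2, E_3
e_second = p_t * p_s + f2 * p_s * (1 - p_t)   # route via the events E_4, E_5, E_6
assert abs(e_first - e_second) < 1e-12

cov = e_first - p_t * p_s
print(cov, w_ts * p_s * (1 - p_t))   # both equal w_ts * p_s * (1 - p_t), Formula (5)
```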

2.5. Covariance between Default Rates

Having previously examined the covariance between obligor default states, we extrapolate this result to the covariance of default rates. To do this, we consider a sample of debtors on reference dates $\mathrm{RD}_t$ and $\mathrm{RD}_s$ with $\mathrm{RD}_s < \mathrm{RD}_t + 1$ and $t, s = 1, \ldots, N$. During the transition from the first to the second reference date, debtors can be removed from monitoring due to defaults, terminated business relationships, or migrations to other rating systems, and new debtors can be added on the second reporting date due to new business or migrations to the rating system under consideration. In addition, there are debtors who can be observed on both reference dates. The number of these so-called persisting customers with respect to the reference dates $\mathrm{RD}_t$ and $\mathrm{RD}_s$ is denoted by $|\Lambda_t \cap \Lambda_s| =: k_{t,s} \in \mathbb{N}_0$. The one-year default rates on the two reference dates are then given by $X_t = \frac{1}{n_t} \sum_{j \in \Lambda_t} x_{t,j}$ with $x_{t,j} \sim B(1, p_{t,j})$ and $X_s = \frac{1}{n_s} \sum_{j \in \Lambda_s} x_{s,j}$ with $x_{s,j} \sim B(1, p_{s,j})$. Hence,
$$\mathrm{Cov}(X_t, X_s) = \frac{1}{n_t n_s} \sum_{i \in \Lambda_t} \sum_{j \in \Lambda_s} \mathrm{Cov}(x_{t,i}, x_{s,j}).$$
Again, assuming that default states for pairs of different customers are independent and using Formula (5), we find that
$$\mathrm{Cov}(X_t, X_s) = \frac{1}{n_t n_s} \sum_{j \in \Lambda_t \cap \Lambda_s} \mathrm{Cov}(x_{t,j}, x_{s,j}) = \frac{w_{t,s}}{n_t n_s} \sum_{j \in \Lambda_t \cap \Lambda_s} p_{s,j} (1 - p_{t,j}), \tag{6}$$
where $w_{t,s}$ describes, as before, the size of the overlap between the observation periods starting from the reference dates $\mathrm{RD}_t$ and $\mathrm{RD}_s$.
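Formula (6) translates directly into code. The sketch below is our own transcription; the obligor sets and PD maps are hypothetical inputs.

```python
def default_rate_covariance(members_t, members_s, pd_t, pd_s, w_ts):
    """Cov(X_t, X_s) per Formula (6).

    members_t, members_s : sets of obligor ids at the two reference dates
    pd_t, pd_s           : dicts mapping obligor id -> one-year PD at each date
    w_ts                 : overlap of the two observation periods
    """
    persisting = members_t & members_s        # the k_{t,s} persisting obligors
    return (w_ts / (len(members_t) * len(members_s))
            * sum(pd_s[j] * (1 - pd_t[j]) for j in persisting))
```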

2.6. Variance of the Long-Run Default Rate

Based on the considerations for estimating the covariance of default rates in overlapping periods, we now discuss the variance of the long-run default rate. For the random variable $X_t$ with $t = 1, \ldots, N$, we define $E(X_t) =: \mu_t$ and $\sigma(X_t) =: \sigma_t$. For the long-run default rate $Z$, which is approximately normally distributed with $Z \sim \mathcal{N}(\mu, \sigma^2)$, we obtain
$$\mu = \frac{1}{R(N)} \sum_{t=1}^N \mu_t = \frac{1}{R(N)} \sum_{t=1}^N \frac{1}{n_t} \sum_{j \in \Lambda_t} p_{t,j}$$
and
$$R(N)^2 \cdot \sigma^2 = \sum_{t=1}^N \sigma_t^2 + 2 \sum_{t=1}^{N-1} \sum_{s=t+1}^N \mathrm{Cov}(X_t, X_s) = \sum_{t=1}^N \sigma_t^2 + 2 \sum_{i=1}^{q-1} \sum_{t=1}^{N-i} \mathrm{Cov}(X_t, X_{t+i}),$$
where the second term contains the covariances caused by overlapping time periods. Using Equation (6) with $w_{t,t+i} = \frac{q-i}{q}$, we end up with
$$R(N)^2 \cdot \sigma^2 = \sum_{t=1}^N \frac{1}{n_t^2} \sum_{j \in \Lambda_t} p_{t,j} (1 - p_{t,j}) + \sum_{i=1}^{q-1} \frac{2(q-i)}{q} \sum_{t=1}^{N-i} \frac{1}{n_t \cdot n_{t+i}} \sum_{j \in \Lambda_t \cap \Lambda_{t+i}} p_{t+i,j} (1 - p_{t,j}). \tag{8}$$
For the final calibration test, it will be crucial to estimate the variance of the long-run default rate by a term that solely depends on the expected value μ , the number of debtors per reference date, and debtor-independent variables, see Section 3.2.
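A minimal sketch of this computation, assuming per-date obligor sets and PD maps as inputs (the data layout is ours; the formulas are those of this section):

```python
def long_run_moments(members, pds, q):
    """Mean and variance of Z per the formulas of Section 2.6, incl. Formula (8).

    members : list of sets, members[t] = obligor ids at reference date t
    pds     : list of dicts, pds[t][j] = one-year PD of obligor j at date t
    q       : number of reference dates per year (q = 4: quarterly)
    """
    N = len(members)
    n = [len(s) for s in members]
    R = sum(1 for nt in n if nt > 0)                      # R(N)

    mu = sum(sum(pds[t].values()) / n[t] for t in range(N) if n[t]) / R

    var = sum(sum(p * (1 - p) for p in pds[t].values()) / n[t] ** 2
              for t in range(N) if n[t])                  # variance terms
    for i in range(1, q):                                 # overlap covariances
        for t in range(N - i):
            if n[t] and n[t + i]:
                common = members[t] & members[t + i]
                var += (2 * (q - i) / q / (n[t] * n[t + i])
                        * sum(pds[t + i][j] * (1 - pds[t][j]) for j in common))
    return mu, var / R ** 2
```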

3. Hypothesis Test for Long-Term Calibration

The aim of this section is to formulate a statistical test for comparing the realized long-run default rate with the estimated long-run default rate.

3.1. Statistical Test per Rating Grade

We start with the simplest case and consider a portfolio consisting of only one rating grade with an associated probability of default, which we call $\mathrm{PD}_{\mathrm{grade}}$. That is, all considered obligors have the same unknown probability of default, which was estimated by $\mathrm{PD}_{\mathrm{grade}}$. We recall that the measured long-run default rate, denoted by $\mathrm{LRDR}$, is the realization of a random variable $Z$ for which approximately $Z \sim \mathcal{N}(\mu, \sigma^2)$ holds. We consider the hypothesis test, described in Section 2.2, with
$$\text{null hypothesis } H_0: \mu = \mathrm{PD}_{\mathrm{grade}} \quad \text{and} \quad \text{alternative hypothesis } H_1: \mu \neq \mathrm{PD}_{\mathrm{grade}}.$$
From Equation (8), since all considered obligors share the same probability of default $\mu$, we obtain
$$\begin{aligned}
R(N)^2 \cdot \sigma^2 &= \sum_{t=1}^N \sigma_t^2 + \sum_{i=1}^{q-1} \frac{2(q-i)}{q} \sum_{t=1}^{N-i} \frac{1}{n_t \cdot n_{t+i}} \sum_{j \in \Lambda_t \cap \Lambda_{t+i}} \mu (1 - \mu) \\
&= \sum_{t=1}^N \sigma_t^2 + \mu (1 - \mu) \sum_{i=1}^{q-1} \frac{2(q-i)}{q} \sum_{t=1}^{N-i} \frac{k_{t,t+i}}{n_t \cdot n_{t+i}} \\
&= \sum_{t=1}^N \sigma_t^2 + \mu (1 - \mu) \sum_{i=1}^{q-1} \lambda_i,
\end{aligned}$$
where
$$\lambda_i := \frac{2(q-i)}{q} \sum_{t=1}^{N-i} \frac{k_{t,t+i}}{n_t \cdot n_{t+i}}.$$
Moreover,
$$\sum_{t=1}^N \sigma_t^2 = \sum_{t \in R_N} \frac{1}{n_t} \cdot \mu (1 - \mu) = \mu (1 - \mu) \sum_{t \in R_N} \frac{1}{n_t},$$
which implies that
$$\sigma^2 = \frac{\mu (1 - \mu)}{R(N)^2} \left( \sum_{t \in R_N} \frac{1}{n_t} + \sum_{i=1}^{q-1} \lambda_i \right).$$
Therefore, assuming the validity of the null hypothesis, $\sigma$ is known. In order to define the limits of the acceptance range for a significance level $\alpha \in (0,1)$, we choose $k, K \in [0,1]$ in such a way that
$$E\big(\varphi(Z) \mid H_0\big) \leq \alpha,$$
where $\varphi$ is given by (3). To that end, let
$$k = \mathrm{PD}_{\mathrm{grade}} + \Phi^{-1}\!\left(\tfrac{1}{2}\alpha\right) \cdot \sqrt{\frac{\mathrm{PD}_{\mathrm{grade}} (1 - \mathrm{PD}_{\mathrm{grade}})}{R(N)^2} \left( \sum_{t \in R_N} \frac{1}{n_t} + \sum_{i=1}^{q-1} \lambda_i \right)}$$
and
$$K = \mathrm{PD}_{\mathrm{grade}} + \Phi^{-1}\!\left(1 - \tfrac{1}{2}\alpha\right) \cdot \sqrt{\frac{\mathrm{PD}_{\mathrm{grade}} (1 - \mathrm{PD}_{\mathrm{grade}})}{R(N)^2} \left( \sum_{t \in R_N} \frac{1}{n_t} + \sum_{i=1}^{q-1} \lambda_i \right)}.$$
Then, the calibration test passes if
$$\mathrm{LRDR} \in [k, K].$$
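The following sketch assembles the grade-level test from the quantities above. The data structures (a list of counts and a nested dict of persisting-customer counts) are our own illustrative choices.

```python
from math import sqrt
from statistics import NormalDist

def grade_level_test(lrdr, pd_grade, n, k_pers, q=4, alpha=0.05):
    """Two-sided grade-level calibration test of Section 3.1.

    lrdr     : realized long-run default rate of the grade
    pd_grade : PD assigned to the grade (H_0: mu = pd_grade)
    n        : list of obligor counts n_t per reference date
    k_pers   : dict of dicts, k_pers[i][t] = k_{t,t+i} persisting obligors
               between dates t and t + i, for i = 1, ..., q - 1
    """
    N = len(n)
    R = sum(1 for nt in n if nt > 0)
    lam = [2 * (q - i) / q * sum(k_pers[i][t] / (n[t] * n[t + i])
                                 for t in range(N - i) if n[t] and n[t + i])
           for i in range(1, q)]
    sigma = sqrt(pd_grade * (1 - pd_grade) / R ** 2
                 * (sum(1 / nt for nt in n if nt > 0) + sum(lam)))
    z = NormalDist().inv_cdf(1 - alpha / 2)      # Phi^{-1}(1 - alpha/2)
    k_lo, k_hi = pd_grade - z * sigma, pd_grade + z * sigma
    return k_lo <= lrdr <= k_hi, (k_lo, k_hi)
```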
To see the influence of persisting customers on the acceptance range and the density function of the long-run default rate, we refer to Section 4.1 and Section 4.2.

3.2. Statistical Test on Portfolio Level

We now consider a portfolio of heterogeneous obligors, i.e., each obligor has its own individual probability of default. For this purpose, we denote the estimated PD of customer $j$ on reference date $\mathrm{RD}_t$ by $\hat{p}_{t,j}$, the mean PD based on the rating model examined on the reference date $\mathrm{RD}_t$ by $\widehat{\mathrm{PD}}_t$ for $t = 1, \ldots, N$, and the long-run central tendency by
$$\mathrm{LRCT} = \frac{1}{N} \cdot \sum_{t=1}^N \widehat{\mathrm{PD}}_t = \frac{1}{N} \cdot \sum_{t=1}^N \frac{1}{n_t} \sum_{j \in \Lambda_t} \hat{p}_{t,j}.$$
While on the level of rating grades it is not unusual that the number of customers at a reference date $\mathrm{RD}_t$ is zero, i.e., $n_t = 0$, this rarely happens in practice on a portfolio level. In this section, we therefore restrict our attention to the case where $n_t > 0$ for all $t = 1, \ldots, N$ and therefore $R(N) = N$.
We consider the hypothesis test from Section 2.2 with
$$\text{null hypothesis } H_0: \mu = \mathrm{LRCT} \quad \text{and} \quad \text{alternative hypothesis } H_1: \mu \neq \mathrm{LRCT}.$$
As in the case of individual rating grades, the test statistic depends on $\mu$ and $\sigma$ but, in this case, Equation (8) can no longer be reduced to a term depending only on $\mu$. More precisely, if the correctness of the null hypothesis is assumed, the distribution parameters are only partially determined. If, however, we replace $\sigma$ by an expression $\sigma_{\min}(\mu)$ with $\sigma \geq \sigma_{\min}(\mu)$, the null hypothesis implies a concrete distribution for the underlying test statistic. This results in a narrower confidence interval and therefore a greater likelihood of committing a type I error. At the same time, the probability of a type II error is reduced, comparable to lowering the significance level.
The key challenge is to derive an analytic expression for $\sigma_{\min}$ that depends only on PD-independent model parameters, such as $N$, $M$, $n_t$, etc., and satisfies $\sigma_{\min}(\mu) \leq \sigma$ under the null hypothesis for all possible combinations of $p_{t,j} \in [\mathrm{PD}_{\min}, \mathrm{PD}_{\max}]$. Since we aim to minimize the distance between $\sigma_{\min}(\mu)$ and $\sigma$, it is sensible to define $\sigma_{\min}$ via the solution to a minimization problem with cost functional given by the right-hand side of (8) under the side condition
$$\mu = \frac{1}{N} \cdot \sum_{t=1}^N \frac{1}{n_t} \sum_{j \in \Lambda_t} p_{t,j} \quad \text{and} \quad p_{t,j} \in [\mathrm{PD}_{\min}, \mathrm{PD}_{\max}]. \tag{9}$$
Observe that the right-hand side of (8) is, in general, neither convex nor concave with respect to the $p_{t,j}$. This, together with the high dimensionality of the problem, means that the numerical or analytical computation of solutions is rather involved. In order to circumvent this issue, we therefore replace the right-hand side of (8) by a suitable linear cost functional, which turns the minimization into a linear program that can be solved analytically, up to a sorting algorithm, and therefore also numerically with great efficiency.
First, note that mixed terms, i.e., products of PDs of the same customer at different reference dates, always appear in the sums
$$\sum_{j \in \Lambda_t \cap \Lambda_{t+i}} p_{t+i,j} (1 - p_{t,j})$$
caused by the covariances. Hence, in Formula (8), we want to substitute the subtrahend $p_{t,j}$ in the factor $(1 - p_{t,j})$ by an affine linear function of the multiplier $p_{t+i,j}$ in a conservative way. On a portfolio level, it is appropriate to assume that an obligor’s PD remains constant on average over a one-year time horizon, i.e., we assume
$$\sum_{j \in \Lambda_t \cap \Lambda_{t+i}} p_{t+i,j} (1 - p_{t,j}) \approx \sum_{j \in \Lambda_t \cap \Lambda_{t+i}} p_{t+i,j} (1 - p_{t+i,j}). \tag{10}$$
For most portfolios, this is not only a plausible but also a fairly conservative assumption regarding the portfolio variance since, for a given vector $(x_1, \ldots, x_n)$ with $0 < x_i < 1$ for $i = 1, \ldots, n$, the sum
$$\sum_{i=1}^n x_i (1 - y_i)$$
is minimized for $y_i = x_i$, under the assumption that, for each $i = 1, \ldots, n$, there exists some $j = 1, \ldots, n$ with $x_i = y_j$. Certainly, even more conservative assumptions can be made at this point, with $\mathrm{PD}_{\max}$ being the most conservative subtrahend possible. The impact of such a choice is discussed in Section 4.6, where we indicate that the acceptance ranges hardly change when using $\mathrm{PD}_{\max}$ instead of $p_{t+i,j}$. We point out that the subsequent discussion also applies to this choice without limitations.
From Equation (8), we obtain
$$N^2 \cdot \sigma^2 = \sum_{t=1}^N \frac{1}{n_t^2} \sum_{j \in \Lambda_t} p_{t,j} (1 - p_{t,j}) + \sum_{i=1}^{q-1} \frac{2(q-i)}{q} \sum_{t=1}^{N-i} \frac{1}{n_t \cdot n_{t+i}} \sum_{j \in \Lambda_t \cap \Lambda_{t+i}} p_{t+i,j} (1 - p_{t+i,j}) = \sum_{t=1}^N \frac{1}{n_t^2} \sum_{j \in \Lambda_t} g_0(p_{t,j}) + 2 \sum_{i=1}^{q-1} \sum_{t=1}^{N-i} \frac{1}{n_t \cdot n_{t+i}} \sum_{j \in \Lambda_t \cap \Lambda_{t+i}} g_i(p_{t+i,j})$$
with
$$g_i(x) := \frac{q-i}{q} \cdot x (1 - x) \quad \text{for } x \in (0,1) \text{ and } i = 0, \ldots, q-1.$$
Now, we replace each of the functions $g_i$ by an affine linear function $f_i$ such that the functions $g_i$ and $f_i$ coincide at $\mathrm{PD}_{\max}$ and $\mathrm{PD}_{\min}$. Since $g_i$ is concave and $f_i$ is affine linear, this implies that $g_i(x) \geq f_i(x)$ for all $x \in [\mathrm{PD}_{\min}, \mathrm{PD}_{\max}]$. To be precise, the functions $f_i$ are given by
$$f_i(x) := \alpha_i \cdot x + c_i \quad \text{for all } x \in (0,1)$$
with
$$\alpha_i := \frac{q-i}{q} \big(1 - \mathrm{PD}_{\max} - \mathrm{PD}_{\min}\big) \quad \text{and} \quad c_i := \frac{q-i}{q}\, \mathrm{PD}_{\min}\, \mathrm{PD}_{\max}$$
for all $i = 0, \ldots, q-1$. We therefore end up with the estimate
$$\begin{aligned}
N^2 \cdot \sigma^2 &= \sum_{t=1}^N \frac{1}{n_t^2} \sum_{j \in \Lambda_t} g_0(p_{t,j}) + 2 \sum_{i=1}^{q-1} \sum_{t=1}^{N-i} \frac{1}{n_t \cdot n_{t+i}} \sum_{j \in \Lambda_t \cap \Lambda_{t+i}} g_i(p_{t+i,j}) \\
&\geq \sum_{t=1}^N \frac{1}{n_t^2} \sum_{j \in \Lambda_t} f_0(p_{t,j}) + 2 \sum_{i=1}^{q-1} \sum_{t=1}^{N-i} \frac{1}{n_t \cdot n_{t+i}} \sum_{j \in \Lambda_t \cap \Lambda_{t+i}} f_i(p_{t+i,j}) \\
&= \sum_{t=1}^N \frac{1}{n_t^2} \sum_{j \in \Lambda_t} (\alpha_0 \cdot p_{t,j} + c_0) + 2 \sum_{i=1}^{q-1} \sum_{t=1}^{N-i} \frac{1}{n_t \cdot n_{t+i}} \sum_{j \in \Lambda_t \cap \Lambda_{t+i}} (\alpha_i \cdot p_{t+i,j} + c_i) \\
&= \sum_{t=1}^N \sum_{j \in \Lambda_t} \alpha_{t,j}\, p_{t,j} + C,
\end{aligned}$$
where, in the last step, we have isolated all terms that do not depend on $p_{t,j}$ in the constant $C$ by using $k_{t,t+i} = |\Lambda_t \cap \Lambda_{t+i}|$, i.e.,
$$C = c_0 \cdot \sum_{t=1}^N \frac{1}{n_t} + 2 \cdot \sum_{i=1}^{q-1} c_i \cdot \sum_{t=1}^{N-i} \frac{k_{t,t+i}}{n_t \cdot n_{t+i}}$$
and
$$\alpha_{t,j} = \frac{1}{n_t^2}\, \alpha_0\, \mathbf{1}_{\Lambda_t}(j) + \frac{2}{n_t} \cdot \sum_{i=1}^{q-1} \frac{\alpha_i}{n_{t-i}}\, \mathbf{1}_{\Lambda_{t-i} \cap \Lambda_t}(j)$$
with $\Lambda_{t-i} := \emptyset$ if $t - i < 1$, for $t = 1, \ldots, N$ and $j = 1, \ldots, M$. We have therefore reduced the original problem to
$$\text{minimize } \sum_{t=1}^N \sum_{j \in \Lambda_t} \alpha_{t,j}\, p_{t,j} \text{ under the side condition (9)}. \tag{12}$$
As shown in Appendix A, the solution $p(\mu)$ to the minimization problem (12) can be easily computed analytically and numerically. We denote the minimal value by $m(\mu)$, and define
$$\sigma_{\min}^2(\mu) = \frac{m(\mu) + C}{N^2}.$$
For the modified random variable $Z^* \sim \mathcal{N}\big(\mu, \sigma_{\min}^2(\mu)\big)$, we are now able to define the limits $k, K \in [0,1]$ of the acceptance range for a significance level $\alpha \in (0,1)$ in such a way that $E\big(\varphi(Z^*) \mid H_0\big) \leq \alpha$. For this, we define
$$k := \mathrm{LRCT} + \Phi^{-1}\!\left(\tfrac{1}{2}\alpha\right) \cdot \sigma_{\min}(\mathrm{LRCT})$$
and
$$K := \mathrm{LRCT} + \Phi^{-1}\!\left(1 - \tfrac{1}{2}\alpha\right) \cdot \sigma_{\min}(\mathrm{LRCT}).$$
The calibration test passes if
$$\mathrm{LRDR} \in [k, K].$$
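Putting the pieces together, the sketch below computes the portfolio-level acceptance range. It reuses the minimize_linear routine given after Appendix A below, and assumes the coefficients $\alpha_{t,j}$ (flattened into one list), the matching weights $\beta_{t,j} = \frac{1}{N n_t}$, and the constant $C$ have been precomputed as in this section.

```python
from math import sqrt
from statistics import NormalDist

def portfolio_acceptance_range(lrct, alphas, betas, C, N, pd_min, pd_max,
                               alpha=0.05):
    """Acceptance range [k, K] on portfolio level (Section 3.2).

    alphas, betas : flattened coefficient lists alpha_{t,j} and beta_{t,j}
    C             : PD-independent constant from the linearized variance
    """
    p_star = minimize_linear(alphas, betas, m=lrct, c1=pd_min, c2=pd_max)
    m_mu = sum(a * p for a, p in zip(alphas, p_star))   # minimal value m(mu)
    sigma_min = sqrt((m_mu + C) / N ** 2)               # sigma_min(LRCT)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return lrct - z * sigma_min, lrct + z * sigma_min
```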

4. Discussion and Further Considerations

4.1. Effect of Persisting Customers on the Variance of Z

In this section, we aim to highlight the effects caused by persisting customers on the distribution of $Z$. We calculate the density function of the long-run default rate using a sample portfolio in a rating grade with $\mathrm{PD} = 0.02$ and set $q = 4$. The number of reference dates is $N = R(N) = 32$, i.e., we consider a portfolio with a relatively short history of eight years. The number of customers is set to be constant over time with $n = 50$. We furthermore assume that the ratio of persisting customers per time interval is constant over time. We have $k_{t,t+1} = 45$ for all $t \leq 31$, $k_{t,t+2} = 40$ for all $t \leq 30$, and $k_{t,t+3} = 35$ for all $t \leq 29$. Only about 70% of the persisting customers are still in the considered rating grade after three quarters, which represents a relatively high fluctuation.
These specifications determine the density function of the long-run default rate (orange). In comparison, one can see the density functions of the long-run default rate with a maximum and minimum number of persisting customers in gray and blue, respectively. We have
$$\sigma_{\mathrm{orange}}^2 \approx 4.13 \cdot 10^{-5}, \quad \sigma_{\mathrm{blue}}^2 \approx 1.23 \cdot 10^{-5}, \quad \text{and} \quad \sigma_{\mathrm{gray}}^2 \approx 4.71 \cdot 10^{-5}.$$
Figure 2 shows that correlation effects caused by overlapping time windows should not be neglected since, otherwise, the acceptance range would be way too tight, see also Section 4.2 below.

4.2. Effect of Persisting Customers on Acceptance Range

We examine the influence of the number of persisting customers on the width of the acceptance range. Again, we choose $q = 4$ and, for the sake of simplicity, we consider the case at the level of the individual grades, choosing $\mathrm{PD}_{\mathrm{grade}} = 0.02$. The proportion of persisting customers is usually lower at the level of individual grades than in the overall portfolio, since a rating migration automatically means that the customer in question is no longer in the sample for the individual grade but is still in the overall portfolio.
The number of reference dates is $N = R(N) = 60$ and the number of customers is constant, i.e., $n_t := n = 50$ for $t = 1, \ldots, N$. The extreme cases regarding persisting customers now represent the two scenarios in which, on the one hand, all customers remain the same over the entire history and, on the other hand, none of the customers in a specific quarter occurs in the previous quarter or in the following quarter.
The first case implies $k_{t,t+1} = n$ for $t \leq N-1$, $k_{t,t+2} = n$ for $t \leq N-2$, and $k_{t,t+3} = n$ for $t \leq N-3$. Thus, for two one-year default rates $X_t$ and $X_s$ with $1 \leq t < s \leq N$, we have
$$\mathrm{Cov}(X_t, X_s) = \frac{1}{n} \cdot w_{t,s}\, (1 - \mathrm{PD}_{\mathrm{grade}}) \cdot \mathrm{PD}_{\mathrm{grade}}.$$
For the second case, we have $k_{t,t+1} = 0$ for $t \leq N-1$, $k_{t,t+2} = 0$ for $t \leq N-2$, and $k_{t,t+3} = 0$ for $t \leq N-3$, implying
$$\mathrm{Cov}(X_t, X_s) = 0$$
for two one-year default rates $X_t$ and $X_s$.
for two one-year default rates X t and X s . Since the covariance between default rates is always 0, in this case, the distribution of the long-run default rate is the same as if correlation effects due to overlapping time windows were ignored completely. Note that such a scenario is extremely unlikely, especially for a large number of ratings, and is therefore purely hypothetical since customers would constantly switch rating grades, which would imply a high level of instability in the rating system.
In Figure 3, we can see that the width of the acceptance range for a given level $\alpha$ for a portfolio consisting only of persisting customers is almost doubled compared to a portfolio with no persisting customers with respect to a three-month time horizon. A similar picture emerges when comparing the density functions of the long-run default rates, cf. Figure 4.
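As a quick numerical illustration, the sketch below reuses the grade_level_test routine from Section 3.1 above with the sizes of this example (the persisting-customer dictionaries encode the two extreme scenarios):

```python
# Acceptance-range width: full persistence versus no persisting customers.
N, n, q, pd_grade = 60, 50, 4, 0.02
full = {i: {t: n for t in range(N - i)} for i in range(1, q)}   # k_{t,t+i} = n
none = {i: {t: 0 for t in range(N - i)} for i in range(1, q)}   # k_{t,t+i} = 0
for label, pers in (("all persisting", full), ("none persisting", none)):
    _, (lo, hi) = grade_level_test(pd_grade, pd_grade, [n] * N, pers, q=q)
    print(f"{label}: width of acceptance range = {hi - lo:.5f}")
```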

4.3. Some Thoughts on the Rate of Convergence

In view of Section 2.3, we briefly discuss the rate of convergence and mention a few particularities that need to be taken into account concerning the asymptotic behavior of the long-run default rate $Z$. When using the classical central limit theorem, it is common to specify a minimum number of random variables ensuring a reasonable approximation of the normal distribution. In our case, additional factors need to be taken into account since, for example, a large number of customers, i.e., a large number $M$ in
$$Z = \frac{1}{N} \sum_{t=1}^N \frac{1}{n_t} \sum_{j=1}^M y_{t,j} = \sum_{j=1}^M Y_j,$$
does not directly lead to a good approximation. This stems from the calculation logic of the long-run default rate. For a simple illustration, we assume that we are in the setting where all customers have the same PD with $\mathrm{PD} = 0.01$. For $N = 1$ and $M = 1000$ or, analogously, $N = 1000$ and $n_t = 1$ for $t = 1, \ldots, 1000$, the normal approximation is considerably better than for $N = 2$ with $n_1 = 1$ and $n_2 = 999$ since, in the first case, $P(Z \geq 0.5) \approx 0$ while, in the second case, $P(Z \geq 0.5) > 0.01$. Thus, the rate of convergence depends on the number of customers, the number of reference dates, and the number of customers per reference date. While the previously described scenarios are uncommon at portfolio level, at the level of individual rating grades, constellations can arise in which, on certain reference dates, there is only one customer in the corresponding rating grade.
In the sequel, we elaborate more on this particularity and aim to provide a rule of thumb, i.e., conditions for N, n max , and n min that imply a satisfactory normal approximation. To that end, we consider a synthetic portfolio. For the sake of simplicity, we assume that q = 1 and simulate a convolution of weighted binomial distributions. We study the difference between the distribution functions of the long-run default rate on the level of a rating grade with PD = 0.01 when using a simulation on the one hand and using the normal approximation on the other hand in different scenarios. We assume the number of reference dates is N = 56 . The number of customers n t per reference date RD t and thus the number of all customers M in the portfolio varies between the considered scenarios.
In Figure 5, there are eight reference dates where only one to five customers are in the portfolio. On the other reference dates, we assume the number of customers to be constant with $n_t = 100$. We observe large differences in the two distribution functions, especially in the tails, which are particularly relevant for the test. If we now increase the number of customers on the critical reference dates (those with $n_t \leq 5$) to 10, we observe, in Figure 6, that the distribution functions are almost the same. Hence, the poor convergence in Figure 5 is caused by the few reference dates with very few customers. Furthermore, we see that, in this special case, $n_{\min}/n_{\max} \geq \tfrac{1}{10}$ seems to be an appropriate bound in order to obtain a satisfactory approximation.
For a second comparison, we now reduce the number of customers significantly to $1 \leq n_t \leq 10$. Again, the number of customers equals one only on the first eight reference dates. In Figure 7, we still see major differences in the distribution functions. Doubling the number of customers leads to Figure 8, where we already obtain useful results.
On the other hand, if the number of customers is very low, e.g., $n_t = 2$ for half of all reference dates, we still obtain a good approximation, cf. Figure 9.
From our examples and statements from the literature, we conclude that $N \geq 30$, $n_{\min} \geq 2$, and $n_{\min}/n_{\max} \geq \tfrac{1}{10}$ seem to be useful conditions to guarantee a satisfying normal approximation.
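Experiments of this kind can be reproduced in a few lines. The sketch below simulates $Z$ for $q = 1$ (independent one-year windows, as in this section's simulations) and compares tail probabilities with the normal approximation; the seed, sample size, and scenario layout are our own choices.

```python
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(1)
pd_, N = 0.01, 56
n = np.array([1] * 8 + [100] * (N - 8))   # eight critical dates, one obligor each

# Simulated long-run default rate for q = 1: the X_t are independent binomials.
z = (rng.binomial(n, pd_, size=(200_000, N)) / n).mean(axis=1)

mu = pd_
sigma = float(np.sqrt(pd_ * (1 - pd_) * np.sum(1.0 / n)) / N)  # Formula (8), q = 1
normal = NormalDist(mu, sigma)
for c in (mu + 2 * sigma, mu + 3 * sigma):       # compare the right tail
    print(f"P(Z > {c:.4f}): simulated {np.mean(z > c):.4f}, "
          f"normal {1 - normal.cdf(c):.4f}")
```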

4.4. An Alternative Way to Bound the Variance

In Appendix B, we propose a way to estimate the variance even more conservatively. If one accepts the associated stricter test as a conservative check of the calibration, one can use
$$\sigma_{\mathrm{alt}}^2(\mu) := \frac{\frac{1}{N} C + K_1\, \mu - K_2}{N^2}$$
as the variance of the test statistic, with parameters $K_1$ and $K_2$ depending only on the underlying portfolio; see Appendix B for the details.
In this case, we do not have to solve an optimization problem, and can determine the variance directly using the portfolio-specific parameters. However, we additionally need conservative estimates for two parameters that are not directly determined by the null hypothesis. Again, we refer to Appendix B for the details. We briefly discuss the consequences of this alternative method at this point.
The determination of $\sigma_{\mathrm{alt}}^2(\mu)$ is based on a successive estimation of the variance of the test statistic against the maximally conservative value in each single step. This procedure can therefore be understood as a maximally conservative solution to the test problem. In the case of a portfolio with approximately constant size, approximately constant distribution of default probabilities over time, and no other special features, the conservatism can often be tolerated. In the case of build-up or run-down portfolios or portfolios with a focus on high investment grade obligors, the risk of a type I error may become disproportionately large and the test may therefore not be suitable. This can be easily illustrated by the following two points. The estimation of the quadratic terms on the left-hand side of (A6) is performed using $\mathrm{PD}_{\max}$, regardless of how many obligors belong to the associated rating grade. For portfolios with mainly obligors with good ratings, this estimate is certainly too conservative. Moreover, the parameters $K_1$ and $K_2$ depend on
$$\min\left\{ \frac{1}{n_t} \;\middle|\; t = 1, \ldots, N \text{ and } n_t > 0 \right\} \quad \text{and} \quad \min\big\{ k_{t,t+i} \;\big|\; t = 1, \ldots, N-i \big\},$$
i.e., for portfolios where the number of customers or the number of persisting customers changes strongly over time, a lot of information is lost with this method. Thus, the estimate on the variance of the test statistic in Section 3.2 is much less conservative and suitable for significantly more portfolios.

4.5. Additional Conditions on the Rating Distribution

In this section, we give an outlook and offer suggestions as to which requirements can be placed on rating distributions. The implementation of some suggestions is briefly outlined. The rating distribution of $p(\mu)$ obtained by minimizing (12) tends to be U-shaped, cf. Figure 10, with all but at most one rating in the best or the worst rating grade. A U-shaped rating distribution, i.e., a distribution almost exclusively between the two extreme grades, is highly unusual, both in the case of a large number of ratings and in a portfolio that hardly has any customers in the lower rating grade.
Moreover, high values of $\mathrm{PD}_{\max}$ lead to a narrowing of the acceptance range. Depending on the portfolio, this reduction can be disproportionately strong if, for example, the bad rating grades in low-default portfolios are heavily underrepresented. Hence, the question may be raised why the rating distribution, estimated by the rating model, is not used directly. As already mentioned, this would, however, require assumptions that go far beyond the null hypothesis, and specifying the rating distribution in that way would automatically determine the null hypothesis. However, it may be sensible to place additional yet conservative conditions on the rating distribution in (9).
For example, one could demand that the ratings exhibit a conservative distribution, given by a function $\varphi_1: \{\mathrm{PD}_{m-n+1}, \ldots, \mathrm{PD}_m\} \to [0,1]$ across the $n$ worst rating grades, with
$$P\big(p_{t,j} = \mathrm{PD}_k \;\big|\; p_{t,j} \geq \mathrm{PD}_{m-n+1}\big) = \varphi_1(\mathrm{PD}_k) \quad \text{for } k \geq m-n+1.$$
For most portfolios, a conservative choice of $\varphi_1$ would be $\varphi_1(\mathrm{PD}_k) = \tfrac{1}{n}$, cf. Figure 11.
Another approach would be to trust the distribution estimated by the rating model to some extent, and require that there is at least a fixed fraction $\delta \in [0,1]$ of the number of ratings per grade given by this distribution with density $\varphi_2: \{\mathrm{PD}_1, \ldots, \mathrm{PD}_m\} \to [0,1]$. Then, in addition to (9), one might demand
$$n_{\mathrm{PD}_k} \geq n_{\min}^{\mathrm{PD}_k} := \sum_{t=1}^N n_t \cdot \varphi_2(\mathrm{PD}_k) \cdot \delta,$$
where $n_{\mathrm{PD}_k}$ is the number of the ratings in rating grade $k$, cf. Figure 12.
We conclude this section with a modified version of a previously described approach using a uniform distribution among the $n$ worst rating grades. As before, we consider the $n$ worst rating grades $\mathrm{PD}_{m-n+1}, \ldots, \mathrm{PD}_m$ with $\mathrm{PD}_{m-n+1} > \mu$. Our aim is to formulate the requirement of a uniform distribution in such a way that we can again use the method for minimizing the variance already presented. To that end, we define
$$\overline{\mathrm{PD}} := \frac{1}{n} \sum_{i=m-n+1}^m \mathrm{PD}_i.$$
We follow the idea from Section 3.2, and proceed exactly as in Appendix A with $\overline{\mathrm{PD}}$ instead of $\mathrm{PD}_{\max}$. The effect on the variance or, almost equivalently, on the acceptance range is illustrated in Figure 13. On the one hand, this approach avoids unnecessarily high conservatism. On the other hand, it is still sufficiently conservative and simple to implement.

4.6. Impact of Simplification on the Acceptance Range

In this section, we discuss how different choices of affine linear functions for the simplification of the variance affect the acceptance range. We focus on the impact of choosing either the constant function $g \equiv \mathrm{PD}_{\max}$ or the identity function as in (10).
We start by computing the reduction in the acceptance range due to choosing the affine linear function as the constant $\mathrm{PD}_{\max}$ using a synthetic portfolio. We choose the number of customers per reference date to be constant with $n = 1000$, let the number of reference dates be $N = 60$, and $q = 4$. We define $\mathrm{PD}_{\max} = 0.2$, $\mathrm{PD}_{\min} = 0.0003$, and set $\mu = 0.01$. We recall that the components of the minimal solution $p(\mu)$ of (12) consist only of $\mathrm{PD}_{\max}$, $\mathrm{PD}_{\min}$, and $\mu_{k_0+1}$, where the latter is uniquely determined by the minimization problem.
We now choose $g \equiv \mathrm{PD}_{\max}$, i.e., the most conservative function possible. Proceeding exactly as in Section 3.2 and in Appendix A, we simply obtain different values for $\alpha_i$ and $c_i$ with $i = 0, \ldots, q-1$, namely,
$$\alpha_i = \frac{q-i}{q} (1 - \mathrm{PD}_{\max}), \quad c_i = 0.$$
Estimating the variance as in Section 3.2, once with $g \equiv \mathrm{PD}_{\max}$ and once with $g = \mathrm{id}$, we obtain
$$\frac{K(\mathrm{PD}_{\max}) - k(\mathrm{PD}_{\max})}{K(\mathrm{id}) - k(\mathrm{id})} \approx 0.994, \tag{13}$$
where $K(\mathrm{PD}_{\max})$ and $k(\mathrm{PD}_{\max})$, and $K(\mathrm{id})$ and $k(\mathrm{id})$, denote the upper and lower bounds of the acceptance range for $g \equiv \mathrm{PD}_{\max}$ and $g = \mathrm{id}$, respectively. In this particular setting, we see that the choice of $g$ does not have a substantial influence on the size of the acceptance range.
The difference in the size of the acceptance range is largely influenced by the contributions of the obligors with $\mathrm{PD} = \mathrm{PD}_{\min}$. By Theorem A1, large values of $\mathrm{PD}_{\max}$ lead to an increased number of pairs $(t, j)$ with $p_{t,j} = \mathrm{PD}_{\min}$. Hence, the size of the quotient in (13) is reduced for very large values of $\mathrm{PD}_{\max}$ and very small values of $\mu$. For instance, by redefining $\mathrm{PD}_{\max} = 0.45$ and $\mu = 0.0006$, we obtain
$$\frac{K(\mathrm{PD}_{\max}) - k(\mathrm{PD}_{\max})}{K(\mathrm{id}) - k(\mathrm{id})} \approx 0.778.$$
As a result, relevant differences in the acceptance ranges mostly occur in settings with very large values of $\mathrm{PD}_{\max}$ and values of $\mu$ that are close to $\mathrm{PD}_{\min}$. In practice, the maximally conservative choice of $g \equiv \mathrm{PD}_{\max}$ is therefore not recommendable since it assumes that, on each reference date, all persisting customers have been in the worst rating grade, independent of their current rating. To conclude, it is recommended to choose $g = \mathrm{id}$ rather than $g \equiv \mathrm{PD}_{\max}$.

5. Conclusions

We introduced a new statistical test for the long-term calibration in rating systems that can deal with correlation effects caused by overlapping time windows. Hypothesis tests that can deal with this type of correlation effects are necessary to implement regulatory requirements in the eurozone. While on the level of individual rating grades the variance of the test statistic is already determined by the null hypothesis, on a portfolio level, we provided a conservative estimate for the variance by solving a suitable minimization problem.
In Section 2.4, we calculated the covariances under the assumption that the time of default is uniformly distributed over the observation period, which is a natural choice in the absence of additional information about the default behavior. Considering different distributions for the time of default is, in principle, possible but it increases the mathematical complexity disproportionately.
Moreover, assuming that the default behaviors of different customers are independent, we showed that the long-run default rate is approximately normally distributed. We are aware that, in practice, this assumption cannot always be justified, e.g., in sectors that heavily depend on economic cycles. Nevertheless, the assumption of independence ensures a smaller acceptance range and is therefore conservative. Abandoning the assumption of independence would require a completely different methodology, as the properties of the normal distribution are fundamental for the analysis performed in this paper.
All additional assumptions made in the course of the paper are of a conservative nature, in the sense that they ensure a lower estimate of the variance of the long-run default rate, e.g., in Formula (10). In Section 4.5, we discussed and showed that further assumptions on the rating distribution are also possible in specific scenarios. We also presented less conservative but, on a practical level, more plausible choices of rating distributions and discussed their effect on the acceptance range.

Author Contributions

Conceptualization, P.K., M.N. and J.S.; methodology, P.K., M.N. and J.S.; validation, P.K., M.N. and J.S.; formal analysis, P.K., M.N. and J.S.; investigation, P.K., M.N. and J.S.; resources, P.K., M.N. and J.S.; writing—original draft preparation, P.K. and J.S.; writing—review and editing, M.N.; visualization, P.K. and J.S.; supervision, M.N.; project administration, M.N.; funding acquisition, M.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—SFB 1283/2 2021—317210226.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors thank Johannes Emmerling, Markus Klein, and Marco Ritter for helpful discussions related to this work. The first and the third author are grateful for the support of the Landesbank Baden-Württemberg related to this work.

Conflicts of Interest

The authors Patrick Kurth and Jan Streicher were employed by the company Landesbank Baden-Württemberg. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

List of Symbols and Abbreviations

PD: probability of default
LGD: loss given default
$B(1, p)$: Bernoulli distribution with probability $p$
$\mathrm{PD}_k$: default probability of rating grade $k$ in the underlying master scale
$\mathrm{PD}_{\min}$: equal to $\mathrm{PD}_1$, i.e., default probability of the best rating grade
$\mathrm{PD}_{\max}$: equal to $\mathrm{PD}_m$, i.e., default probability of the worst rating grade
$\mathrm{RD}_t$: reference date number $t$
$N$: number of reference dates
$\mathbb{N}$: set of natural numbers
$n_t$: number of obligors on reference date $\mathrm{RD}_t$
$\mathbb{N}_0$: set of natural numbers including zero
$n_{\min}$: minimal number (larger than zero) of existing obligors at any reference date
$n_{\max}$: maximal number of existing obligors at any reference date
$\min A$: minimum of a set $A$
$\max A$: maximum of a set $A$
$q$: number of reference dates within a one-year time horizon starting from an arbitrary reference date
$w_{t,s}$: size of the overlap of observation periods with reference dates $\mathrm{RD}_t$ and $\mathrm{RD}_s$
$M$: total number of all customers during the history
$\Lambda_t$: set of customers at reference date $\mathrm{RD}_t$
$X_t$: one-year default rate at reference date $\mathrm{RD}_t$
$x_t$: one-year default state of an unspecified customer at reference date $\mathrm{RD}_t$
$x_{t,j}$: one-year default state of customer $j$ at reference date $\mathrm{RD}_t$
$p_{t,j}$: probability of default over a one-year time horizon of customer $j$ at reference date $\mathrm{RD}_t$
$\mathrm{DR}_t$: realized default rate on reference date $\mathrm{RD}_t$
$R_N$: set of indices for reference dates where the portfolio contains at least one customer
$R(N)$: cardinality of $R_N$
$Z$: long-run default rate
$\mathrm{LRDR}$: realized long-run default rate
$\mathrm{LRCT}$: estimated long-run default rate (long-run central tendency)
$\mu$: expected value of the long-run default rate
$\sigma^2$: variance of the long-run default rate
$H_0$: null hypothesis
$H_1$: alternative hypothesis
$k, K$: lower and upper bounds of the acceptance range of the hypothesis test
$\mathrm{Cov}(X, Y)$: covariance of the random variables $X$ and $Y$
$\mathcal{N}(\mu, \sigma^2)$: normal distribution with expected value $\mu$ and variance $\sigma^2$
$E(X)$: expected value of a random variable $X$
$P$: probability measure
$\mathbb{R}$: set of real numbers
$\mathbf{1}_A$: indicator function of a set $A$
$\Phi$: cumulative distribution function of the standard normal distribution
$|I|$: length of an interval $I \subseteq \mathbb{R}$ or cardinality of a finite set $I$
$k_{t,s}$: number of persisting customers with respect to reference dates $\mathrm{RD}_t$ and $\mathrm{RD}_s$
$\emptyset$: empty set
$p(\mu)$: solution of the minimization problem
$m(\mu)$: minimal value of the minimization problem
$|x|$: absolute value of a real number $x$

Appendix A. Minimization Problem

Let $n \in \mathbb{N}$ and $\alpha_i \geq 0$ for $i = 1, \dots, n$. In this section, we compute the minimum of the function
$$f\colon \mathbb{R}^n \to \mathbb{R}, \qquad x = (x_1, \dots, x_n) \mapsto f(x) := \sum_{i=1}^{n} \alpha_i x_i \tag{A1}$$
under the side conditions
$$\sum_{i=1}^{n} \beta_i x_i = m \qquad \text{and} \qquad c_1 \leq x_i \leq c_2 \quad \text{for } i = 1, \dots, n \tag{A2}$$
with given constants $\beta_i > 0$ for $i = 1, \dots, n$, $m > 0$, and $0 < c_1 < c_2$. We sort the ratios $(\alpha_i / \beta_i)_{i = 1, \dots, n}$ in descending order and denote by $\alpha_{(k)}$ and $\beta_{(k)}$ the numerator and denominator of the $k$-th largest of these ratios. In case of equal values among the ratios, the one with the smaller index $i$ is listed first. Moreover, $x_{(k)}$ denotes the variable of $f$ with coefficient $\alpha_{(k)}$. We define
$$k_0 := \max\Big\{ k \;:\; \sum_{j=1}^{k} c_1 \beta_{(j)} + \sum_{j=k+1}^{n} c_2 \beta_{(j)} \geq m \Big\}.$$
Theorem A1.
The cost function in (A1) is minimized under (A2) by
$$x_{(k)} = \begin{cases} c_1, & k \leq k_0, \\ \mu_{k_0 + 1}, & k = k_0 + 1, \\ c_2, & k_0 + 2 \leq k \leq n, \end{cases}$$
where the value $\mu_{k_0 + 1}$ is uniquely determined by (A2) through
$$\mu_{k_0 + 1} = \frac{m - \sum_{j=1}^{k_0} c_1 \beta_{(j)} - \sum_{j=k_0 + 2}^{n} c_2 \beta_{(j)}}{\beta_{(k_0 + 1)}}.$$
Proof. We show that, for any $y \in \mathbb{R}^n$ that fulfills (A2), $f(y) \geq f(x)$ holds if $y \neq x$. By construction of $x$, there exists some $\varepsilon > 0$ such that either
$$\sum_{j=1}^{k_0} y_{(j)} \beta_{(j)} = \sum_{j=1}^{k_0} x_{(j)} \beta_{(j)} + \varepsilon \qquad \text{and} \qquad \sum_{j=k_0+1}^{n} y_{(j)} \beta_{(j)} = \sum_{j=k_0+1}^{n} x_{(j)} \beta_{(j)} - \varepsilon$$
or
$$\sum_{j=1}^{k_0+1} y_{(j)} \beta_{(j)} = \sum_{j=1}^{k_0+1} x_{(j)} \beta_{(j)} + \varepsilon \qquad \text{and} \qquad \sum_{j=k_0+2}^{n} y_{(j)} \beta_{(j)} = \sum_{j=k_0+2}^{n} x_{(j)} \beta_{(j)} - \varepsilon$$
holds. Without loss of generality, we consider the first case. Let $\varepsilon = \sum_{i=1}^{k_0} \varepsilon_i$ with $\varepsilon_i \geq 0$ such that
$$y_{(i)} = x_{(i)} + \frac{\varepsilon_i}{\beta_{(i)}} \quad \text{for all } i = 1, \dots, k_0,$$
and $\varepsilon = \sum_{i=k_0+1}^{n} \varepsilon_i$ with $\varepsilon_i \geq 0$ such that
$$y_{(i)} = x_{(i)} - \frac{\varepsilon_i}{\beta_{(i)}} \quad \text{for all } i = k_0+1, \dots, n.$$
Then, since $\alpha_{(j)}/\beta_{(j)} \geq \alpha_{(k_0)}/\beta_{(k_0)}$ for $j \leq k_0$,
$$\sum_{j=1}^{k_0} y_{(j)} \alpha_{(j)} = \sum_{j=1}^{k_0} \Big( x_{(j)} + \frac{\varepsilon_j}{\beta_{(j)}} \Big) \alpha_{(j)} \geq \sum_{j=1}^{k_0} x_{(j)} \alpha_{(j)} + \frac{\alpha_{(k_0)}}{\beta_{(k_0)}} \, \varepsilon$$
and, since $\alpha_{(j)}/\beta_{(j)} \leq \alpha_{(k_0+1)}/\beta_{(k_0+1)}$ for $j \geq k_0+1$,
$$\sum_{j=k_0+1}^{n} y_{(j)} \alpha_{(j)} = \sum_{j=k_0+1}^{n} \Big( x_{(j)} - \frac{\varepsilon_j}{\beta_{(j)}} \Big) \alpha_{(j)} \geq \sum_{j=k_0+1}^{n} x_{(j)} \alpha_{(j)} - \frac{\alpha_{(k_0+1)}}{\beta_{(k_0+1)}} \, \varepsilon.$$
Hence,
$$\sum_{j=1}^{n} y_{(j)} \alpha_{(j)} \geq \sum_{j=1}^{n} x_{(j)} \alpha_{(j)} + \Big( \frac{\alpha_{(k_0)}}{\beta_{(k_0)}} - \frac{\alpha_{(k_0+1)}}{\beta_{(k_0+1)}} \Big) \varepsilon \geq \sum_{j=1}^{n} x_{(j)} \alpha_{(j)},$$
where the last inequality holds because the ratios are sorted in descending order. For the second case, one proceeds in exactly the same way. □
We now translate the previous theorem into the setting of Section 3.2. There, we aim to minimize the sum
$$\sum_{t=1}^{N} \sum_{j \in \Lambda_t} \alpha_{t,j} \, p_{t,j}$$
under the side conditions
$$\mu = \frac{1}{N} \sum_{t=1}^{N} \frac{1}{n_t} \sum_{j \in \Lambda_t} p_{t,j} \qquad \text{and} \qquad p_{t,j} \in [\mathrm{PD}_{\min}, \mathrm{PD}_{\max}].$$
We rewrite
$$\mu = \sum_{t=1}^{N} \sum_{j \in \Lambda_t} \beta_{t,j} \, p_{t,j}$$
with $\beta_{t,j} := \frac{1}{N \, n_t}$. We first sort the set of all possible tuples $(t, j)$ in ascending order, first with respect to $j$ and then with respect to $t$, i.e., we identify each tuple $(t, j)$ with a natural number $i = 1, \dots, n := \sum_{t=1}^{N} n_t$. This simplifies the cost function to
$$f\colon \mathbb{R}^n \to \mathbb{R}, \qquad p = (p_1, \dots, p_n) \mapsto f(p) := \sum_{i=1}^{n} \alpha_i p_i$$
with side conditions
$$\sum_{i=1}^{n} \beta_i p_i = \mu \qquad \text{and} \qquad \mathrm{PD}_{\min} \leq p_i \leq \mathrm{PD}_{\max} \quad \text{for } i = 1, \dots, n.$$
Now, we are exactly in the setting of (A1) and (A2) and can apply Theorem A1.
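To make the construction of Theorem A1 concrete, the following is a minimal Python sketch of the minimizer. It is our illustration of the theorem, not code from the paper; the function name and interface are assumptions.

```python
import numpy as np

def minimize_weighted_sum(alpha, beta, m, c1, c2):
    """Minimize sum(alpha * x) subject to sum(beta * x) = m and
    c1 <= x_i <= c2, following the construction of Theorem A1."""
    alpha, beta = np.asarray(alpha, float), np.asarray(beta, float)
    # Sort indices by alpha_i / beta_i in descending order (ties: smaller index first).
    order = np.lexsort((np.arange(len(alpha)), -alpha / beta))
    b = beta[order]
    # Feasibility: m must lie between the two extreme allocations.
    if not (c1 * b.sum() <= m <= c2 * b.sum()):
        raise ValueError("side condition sum(beta * x) = m is infeasible")
    x = np.full(len(b), c2)
    # k0 = largest k such that setting the first k variables to c1
    # (and the rest to c2) still keeps sum(beta * x) >= m.
    k0 = 0
    for k in range(1, len(b) + 1):
        if c1 * b[:k].sum() + c2 * b[k:].sum() >= m:
            k0 = k
        else:
            break
    x[:k0] = c1
    if k0 < len(b):
        # Intermediate value of the (k0+1)-th variable, forced by the equality constraint.
        x[k0] = (m - c1 * b[:k0].sum() - c2 * b[k0 + 1:].sum()) / b[k0]
    # Undo the sorting.
    result = np.empty_like(x)
    result[order] = x
    return result
```

Applied to the setting above, one would pass the flattened coefficients $\alpha_i$, the weights $\beta_i = 1/(N n_t)$, $m = \mu$, $c_1 = \mathrm{PD}_{\min}$, and $c_2 = \mathrm{PD}_{\max}$. As a small check, $\alpha = (3, 1, 2)$, $\beta = (1, 1, 1)$, $m = 1.5$, $c_1 = 0.1$, $c_2 = 1$ yields $x = (0.1, 1, 0.4)$ with minimal value $2.1$.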

Appendix B. Test on Portfolio Level without Solving the Minimization Problem

We start by estimating the covariance between two default rates at reference dates RD$_t$ and RD$_s$. Using Equation (5), we find that
$$\begin{aligned} \mathrm{Cov}(X_t, X_s) &= \frac{1}{n_t \, n_s} \sum_{j \in \Lambda_t \cap \Lambda_s} \mathrm{Cov}(x_{t,j}, x_{s,j}) \geq \frac{1}{n_t \, n_s} \sum_{j \in \Lambda_t \cap \Lambda_s} p_{s,j} \, w_{t,s} \, (1 - \mathrm{PD}_{\max}) \\ &= \frac{k_{t,s}}{n_t \, n_s} \, w_{t,s} \, (1 - \mathrm{PD}_{\max}) \cdot \frac{1}{k_{t,s}} \sum_{j \in \Lambda_t \cap \Lambda_s} p_{s,j} \geq \frac{k_{t,s}}{n_t \, n_s} \, w_{t,s} \, (1 - \mathrm{PD}_{\max}) \, \gamma \cdot \mathrm{E}(X_s). \end{aligned}$$
The parameter $\gamma > 0$ is chosen less than or equal to the ratio of the average default risk of the persisting customers in the portfolio to the average default risk of the entire portfolio, i.e.,
$$\mathrm{E}\Big( \frac{1}{k_{t,s}} \sum_{j \in \Lambda_t \cap \Lambda_s} x_{s,j} \Big) \geq \gamma \cdot \mathrm{E}\Big( \frac{1}{n_s} \sum_{j \in \Lambda_s} x_{s,j} \Big).$$
In view of the fact that persisting customers should form the majority of the portfolio, it is reasonable to assume that $\gamma$ is close to 1. The value is to be estimated conservatively depending on the portfolio under consideration; in case of doubt, a relatively small value should be chosen. If a portfolio consists only of persisting customers, $\gamma$ is always 1, while for portfolios with a small proportion of persisting customers the ratio can be above or below 1. In the following, the parameter $\gamma$ is to be understood as an estimate that is independent of the reference date and conservative over the entire history.
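As an illustration only, the following sketch shows one way such a reference-date-independent, conservative estimate of $\gamma$ could be obtained from historical portfolio data. The input layout and the haircut value are our assumptions, not prescriptions from the paper.

```python
import numpy as np

def estimate_gamma(avg_pd_persisting, avg_pd_portfolio, haircut=0.9):
    """Conservative, reference-date-independent estimate of gamma.

    avg_pd_persisting[t]: average PD of persisting customers at reference date t
    avg_pd_portfolio[t]:  average PD of all customers at reference date t
    haircut:              assumed safety margin (0.9 is illustrative)
    """
    ratios = np.asarray(avg_pd_persisting) / np.asarray(avg_pd_portfolio)
    # gamma must not exceed the ratio at any reference date, so take the
    # minimum over the history and reduce it further by the haircut.
    return haircut * ratios.min()
```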
From Equation (8), we see that
$$\sigma^2 \geq \frac{1}{N^2} \sum_{t=1}^{N} \frac{1}{n_t^2} \sum_{j \in \Lambda_t} p_{t,j} (1 - p_{t,j}) \geq \frac{C}{N} \cdot \mathrm{E}\Big( \frac{1}{N} \sum_{t=1}^{N} X_t \Big) \tag{A6}$$
with $C = \frac{1 - \mathrm{PD}_{\max}}{n_{\max}}$. Hence, using Equation (6),
$$\begin{aligned} N^2 \sigma^2 &= \sum_{t=1}^{N} \sigma_t^2 + 2 \sum_{i=1}^{q-1} \sum_{t=1}^{N-i} \mathrm{Cov}(X_t, X_{t+i}) \\ &\geq \sum_{t=1}^{N} \sigma_t^2 + 2 \sum_{i=1}^{q-1} \sum_{t=1}^{N-i} \frac{k_{t,t+i}}{n_t \, n_{t+i}} \cdot \frac{q-i}{q} \, (1 - \mathrm{PD}_{\max}) \, \gamma \, \mu_{t+i} \\ &\geq \sum_{t=1}^{N} \sigma_t^2 + 2 \sum_{i=1}^{q-1} \Big( \min_{t=1,\dots,N-i} \frac{k_{t,t+i}}{n_t \, n_{t+i}} \Big) \frac{q-i}{q} \, (1 - \mathrm{PD}_{\max}) \, \gamma \sum_{t=1}^{N-i} \mu_{t+i} \\ &\geq \sum_{t=1}^{N} \sigma_t^2 + K_1 \sum_{t=1}^{N} \mu_t - K_2 \end{aligned}$$
with
$$K_1 := 2 \sum_{i=1}^{q-1} \Big( \min_{t=1,\dots,N-i} \frac{k_{t,t+i}}{n_t \, n_{t+i}} \Big) \frac{q-i}{q} \, (1 - \mathrm{PD}_{\max}) \, \gamma \qquad \text{and} \qquad K_2 := 2 \sum_{i=1}^{q-1} \Big( \min_{t=1,\dots,N-i} \frac{k_{t,t+i}}{n_t \, n_{t+i}} \Big) \frac{q-i}{q} \, (1 - \mathrm{PD}_{\max}) \, \gamma \cdot i \cdot \mu_{\mathrm{old}},$$
where $\mu_{\mathrm{old}} \geq \max_{i=1,\dots,q-1} \mu_i$. The term $\mu_{\mathrm{old}}$ cannot be determined from the null hypothesis; thus, a conservative value has to be chosen. For portfolios consisting of customers with good credit ratings, the choice $\mu_{\mathrm{old}} = \mathrm{PD}_{\max}$ is clearly not adequate. Using Equation (A6), it follows that
$$N^2 \sigma^2 \geq C \sum_{i=1}^{N} \mu_i + K_1 \sum_{i=1}^{N} \mu_i - K_2 = N (C + K_1) \, \mu - K_2,$$
and therefore
$$\sigma^2 \geq \sigma_{\mathrm{alt}}^2(\mu) := \frac{(C + K_1) \, \mu}{N} - \frac{K_2}{N^2}.$$
Moreover, for portfolios with a short history and low PDs, it is not always possible to choose $\mu_{\mathrm{old}} = \mathrm{PD}_{\max}$, since $\frac{(C + K_1)\mu}{N} - \frac{K_2}{N^2}$ could become negative. In order to rule out this case, which also appears to be of little interest for applications of the test, we impose an upper bound on $\mu_{\mathrm{old}}$, i.e.,
$$\mu_{\mathrm{old}} \leq \frac{N (C + K_1) \, \mu}{(q-1) \, K_1}.$$
Considering the random variable $Z^* \sim \mathcal{N}\big(\mathrm{LRCT}, \sigma_{\mathrm{alt}}^2(\mathrm{LRCT})\big)$, we define the limits of the acceptance range as
$$k := \mathrm{LRCT} + \Phi^{-1}\big(\tfrac{\alpha}{2}\big) \, \sigma_{\mathrm{alt}}(\mathrm{LRCT})$$
and
$$K := \mathrm{LRCT} + \Phi^{-1}\big(1 - \tfrac{\alpha}{2}\big) \, \sigma_{\mathrm{alt}}(\mathrm{LRCT}).$$
The calibration test passes if
$$\mathrm{LRDR} \in [k, K].$$
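For illustration only, the following Python sketch assembles the quantities of this appendix into the acceptance range. The function name, the input layout, and the parameter choices are our own assumptions; a practitioner would substitute the portfolio's actual history.

```python
import numpy as np
from scipy.stats import norm

def acceptance_range(lrct, lrdr, n, k_persist, q, pd_max, gamma, mu_old, alpha=0.05):
    """Acceptance range [k, K] for the long-run default rate test of Appendix B.

    n[t]          : number of obligors at reference date t (length N)
    k_persist[i-1][t]: number of persisting customers between reference
                    dates t and t+i, for lags i = 1, ..., q-1
    """
    N = len(n)
    C = (1.0 - pd_max) / max(n)
    # Per-lag terms entering K1 and K2; the minimum runs over admissible dates.
    terms = np.array([
        2.0 * min(k_persist[i - 1][t] / (n[t] * n[t + i]) for t in range(N - i))
        * (q - i) / q * (1.0 - pd_max) * gamma
        for i in range(1, q)
    ])
    K1 = terms.sum()
    K2 = (terms * np.arange(1, q) * mu_old).sum()
    sigma_alt_sq = (C + K1) * lrct / N - K2 / N**2
    if sigma_alt_sq <= 0:
        # This is exactly the case excluded by the upper bound on mu_old above.
        raise ValueError("mu_old violates the upper bound; choose a smaller value")
    sigma_alt = np.sqrt(sigma_alt_sq)
    k_low = lrct + norm.ppf(alpha / 2) * sigma_alt
    k_up = lrct + norm.ppf(1 - alpha / 2) * sigma_alt
    return k_low, k_up, k_low <= lrdr <= k_up
```

The guard on `sigma_alt_sq` enforces the upper bound on $\mu_{\mathrm{old}}$ derived above, so an exception signals that the chosen $\mu_{\mathrm{old}}$ is too conservative for the given history.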

References

1. Aussenegg, Wolfgang, Florian Resch, and Gerhard Winkler. 2011. Pitfalls and remedies in testing the calibration quality of rating systems. Journal of Banking and Finance 35: 698–708.
2. Blochwitz, Stefan, Stefan Hohl, Dirk Tasche, and Carsten S. Wehn. 2004. Validating Default Probabilities on Short Time Series. Chicago: Capital & Market Risk Insights, Federal Reserve Bank of Chicago.
3. Blochwitz, Stefan, Marcus R. W. Martin, and Carsten S. Wehn. 2006. Statistical Approaches to PD Validation. In The Basel II Risk Parameters: Estimation, Validation, and Stress Testing. Berlin and Heidelberg: Springer, pp. 289–306.
4. Blöchlinger, Andreas. 2012. Validation of default probabilities. Journal of Financial and Quantitative Analysis 47: 1089–123.
5. Blöchlinger, Andreas. 2017. Are the Probabilities Right? New Multiperiod Calibration Tests. The Journal of Fixed Income 26: 25–32.
6. Caprioli, Sergio, Emanuele Cagliero, and Riccardo Crupi. 2023. Quantifying Credit Portfolio sensitivity to asset correlations with interpretable generative neural networks. arXiv arXiv:2309.08652.
7. Caprioli, Sergio, Riccardo Cogo, and Raphael Cavallari. 2023. Back-Testing Credit Risk Parameters on Low Default Portfolios: A Bayesian Approach with an Application to Sovereign Risk. Preprint SSRN. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4408217 (accessed on 1 February 2023).
8. Council of European Union. 2013. Regulation (EU) No 575/2013 of the European Parliament and of the Council of 26 June 2013 on prudential requirements for credit institutions and investment firms and amending regulation (EU) No 648/2012. Official Journal of the European Union L 176: 1. Available online: https://eur-lex.europa.eu/legal-content/EN/TXT/?uri=CELEX:32013R0575 (accessed on 1 February 2023).
9. Coppens, Francois, Manuel Mayer, Laurent Millischer, Florian Resch, Stephan Sauer, and Klaas Schulze. 2016. Advances in Multivariate Back-Testing for Credit Risk Underestimation. Frankfurt am Main: European Central Bank.
10. Cucinelli, Doriana, Maria Luisa Di Battista, Malvina Marchese, and Laura Nieri. 2018. Credit risk in European banks: The bright side of the internal ratings based approach underestimation. Journal of Banking and Finance 93: 213–29.
11. Deutsche Bundesbank. 2003. Approaches to the Validation of Internal Rating Systems. Monatsbericht September. Frankfurt am Main: Deutsche Bundesbank, pp. 59–71.
12. European Banking Authority (EBA). 2016. Final Draft Regulatory Technical Standards (RTS) on the Specification of the Assessment Methodology for IRB. Available online: https://www.eba.europa.eu/activities/single-rulebook/regulatory-activities/credit-risk/regulatory-technical-standards-2 (accessed on 1 February 2023).
13. European Banking Authority (EBA). 2017. Guidelines on PD Estimation, LGD Estimation and Treatment of Defaulted Assets. Available online: https://www.eba.europa.eu/regulation-and-policy/model-validation/guidelines-on-pd-lgd-estimation-and-treatment-of-defaulted-assets (accessed on 1 February 2023).
14. European Central Bank (ECB). 2024. ECB Guide to Internal Models. Available online: https://www.bankingsupervision.europa.eu/ecb/pub/pdf/ssm.supervisory_guides202402_internalmodels.en.pdf (accessed on 25 June 2024).
15. European Commission. 2021. Commission Delegated Regulation (EU) 2022/439. Official Journal of the European Union L 90: 1–66.
16. Hogg, Robert V., and Elliot A. Tanis. 1977. Probability and Statistical Inference. New York: Macmillan Publishing Co., Inc. London: Collier Macmillan Publishers.
17. Jing, Zhang, Fanlin Zhu, and Joseph Lee. 2008. Asset correlation, realized default correlation and portfolio credit risk modeling methodology. Moody's KMV, March.
18. Klenke, Achim. 2020. Probability Theory—A Comprehensive Course, 3rd ed. Universitext. Cham: Springer.
19. Li, Weiping. 2016. Probability of Default and Default Correlations. Journal of Risk and Financial Management 9: 7.
20. Pluto, Katja, and Dirk Tasche. 2011. Estimating probabilities of default for low default portfolios. In The Basel II Risk Parameters: Estimation, Validation, Stress Testing—with Applications to Loan Risk Management. Berlin and Heidelberg: Springer, pp. 75–101.
21. Tasche, Dirk. 2003. A Traffic Lights Approach to PD Validation. Preprint arXiv. Available online: https://arxiv.org/abs/cond-mat/0305038 (accessed on 1 February 2023).
22. Tasche, Dirk. 2008. Validation of internal rating systems and PD estimates. In The Analytics of Risk Model Validation. Amsterdam: Elsevier, pp. 169–96.
23. Tasche, Dirk. 2013. Bayesian estimation of probabilities of default for low default portfolios. Journal of Risk Management in Financial Institutions 6: 302–26.
24. Zhou, Chunsheng. 2001. An Analysis of Default Correlations and Multiple Defaults. The Review of Financial Studies 14: 555–76.
Figure 1. Default states per time period.
Figure 2. Density functions of long-run default rates.
Figure 3. Effect of proportion of persisting customers on acceptance range per confidence level α.
Figure 4. Comparison between density functions of long-run default rates.
Figure 5. Distribution functions: simulation and normal approximation with n_t ≤ 5 for t = 1, …, 8 and n_t = 100 for t = 9, …, 56.
Figure 6. Distribution functions: simulation and normal approximation with n_t ≤ 10 for t = 1, …, 8 and n_t = 100 for t = 9, …, 56.
Figure 7. Distribution functions: simulation and normal approximation with n_t = 1 for t = 1, …, 8 and n_t = 10 for t = 9, …, 56.
Figure 8. Distribution functions: simulation and normal approximation with n_t = 2 for t = 1, …, 8 and n_t = 20 for t = 9, …, 56.
Figure 9. Distribution functions: simulation and normal approximation with n_t = 2 for t = 1, …, 28 and n_t = 20 for t = 29, …, 56.
Figure 10. Rating distribution of rating model and rating distribution after minimization without additional assumptions on the distribution.
Figure 11. Rating distribution of rating model and rating distribution after minimization with assumption of uniform distribution for bad rating grades.
Figure 12. Rating distribution of rating model and rating distribution after minimization with assumption of trusting the distribution of the model to a certain degree.
Figure 13. Dependence of σ on PD_max.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
