1. Introduction
Debts from clients with unpaid credits have an important impact on the solvency of banks and other credit institutions. Therefore, one of the most crucial elements influencing credit risk is the probability of default (PD). For a fixed time, $t$, and a horizon time, $b$, the PD can be defined as the probability that a credit that has been paid until time $t$ becomes unpaid no later than time $t + b$.
The probability of default conditional on the credit scoring can be written as a transformation of the conditional survival function. Therefore, in Section 2.1 Beran’s survival estimator is used to obtain a PD estimator. A time variable smoothing for this estimator is proposed in Section 2.2. In Section 3, both estimators are applied to a real data set. Section 4 contains some concluding remarks.
2. Nonparametric PD Estimators
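Let $\{(X_i, Z_i, \delta_i)\}_{i=1}^{n}$ be a simple random sample of $(X, Z, \delta)$, where $X$ is the credit scoring, $Z = \min\{T, C\}$ is the observed maturity, $T$ is the time to default, $C$ is the time until the end of the study or the time until the anticipated cancellation of the credit and $\delta = I(T \le C)$ is the uncensoring indicator. Let $x$ be a fixed value of the covariate $X$, $b$ a horizon time and $S(t|x) = P(T > t \mid X = x)$ the conditional survival function of $T$. Then, the probability of default in a time horizon $t + b$ from a maturity time $t$ is defined as follows:
$$PD(t|x) = P(T \le t + b \mid T > t, X = x) = 1 - \frac{S(t + b|x)}{S(t|x)}. \tag{1}$$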
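Replacing $S(\cdot|x)$ with a nonparametric estimator, $\widehat{S}_h(\cdot|x)$, in (1), the following estimator for $PD(t|x)$ is obtained:
$$\widehat{PD}_h(t|x) = 1 - \frac{\widehat{S}_h(t + b|x)}{\widehat{S}_h(t|x)}. \tag{2}$$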
The theoretical results that yield, under general conditions, the asymptotic properties of a PD estimator are proved in [5]. They are derived from the corresponding properties of the underlying estimator of the conditional survival function.
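The ratio structure of the PD as a transformation of the survival function can be illustrated with a minimal sketch. The function name `pd_from_survival` and the exponential survival curve are assumptions chosen only for illustration; any estimate of the conditional survival function, evaluated at a fixed covariate value, could be plugged in instead.

```python
import math


def pd_from_survival(surv, t, b):
    """Return 1 - S(t + b) / S(t) for a survival function `surv`.

    `surv` is any callable giving the (conditional) survival probability
    at a time point, with the covariate value already fixed.
    """
    s_t = surv(t)
    if s_t <= 0.0:
        # The ratio is undefined when no credit survives up to time t.
        return float("nan")
    return 1.0 - surv(t + b) / s_t


# Hypothetical example: exponential survival S(t) = exp(-rate * t).
rate = 0.1
surv = lambda t: math.exp(-rate * t)
pd_value = pd_from_survival(surv, t=2.0, b=1.0)
```

For an exponential lifetime the result does not depend on $t$ and equals $1 - e^{-\text{rate} \cdot b}$, which gives a quick sanity check of the transformation.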
2.1. Beran’s Estimator
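Beran’s survival estimator proposed in [1] is given by
$$\widehat{S}_h^B(t|x) = \prod_{i=1}^{n} \left( 1 - \frac{I(Z_i \le t, \delta_i = 1)\, w_{n,i}(x)}{1 - \sum_{j=1}^{n} I(Z_j < Z_i)\, w_{n,j}(x)} \right)$$
where the weights are
$$w_{n,i}(x) = \frac{K\left((x - X_i)/h\right)}{\sum_{j=1}^{n} K\left((x - X_j)/h\right)}$$
with $i = 1, \dots, n$, $K$ is a kernel function and $h$ is a smoothing parameter. Now, replacing $\widehat{S}_h(t|x)$ with $\widehat{S}_h^B(t|x)$ in (2), Beran’s estimator of the probability of default, $\widehat{PD}_h^B(t|x)$, is available. It was first used in [2].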
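As an illustration of how Beran’s estimator with kernel weights on the covariate might be computed, a minimal NumPy sketch follows. The Epanechnikov kernel, the function names, the bandwidth value and the synthetic data are all assumptions made for this example; they are not taken from the references. The denominator is computed as the weighted mass of observations with $Z_j \ge Z_i$, which equals one minus the weighted mass with $Z_j < Z_i$ because the weights sum to one.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel (an assumed choice of K)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def nw_weights(x, X, h, kernel=epanechnikov):
    """Kernel weights of each observation at the covariate value x."""
    k = kernel((x - X) / h)
    total = k.sum()
    if total == 0.0:
        raise ValueError("no observations within the bandwidth window")
    return k / total


def beran_survival(t, x, X, Z, delta, h, kernel=epanechnikov):
    """Beran's estimate of the conditional survival function at (t, x)."""
    w = nw_weights(x, X, h, kernel)
    # Weighted mass with Z_j >= Z_i, i.e. 1 - sum of weights with Z_j < Z_i.
    at_risk = (Z[None, :] >= Z[:, None]).astype(float) @ w
    # Only uncensored observations up to time t with positive weight contribute.
    mask = (Z <= t) & (delta == 1) & (w > 0)
    factors = np.ones_like(w)
    factors[mask] = 1.0 - w[mask] / at_risk[mask]
    return float(np.prod(np.clip(factors, 0.0, 1.0)))


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0.0, 1.0, n)            # credit scoring
T = rng.exponential(1.0 / (0.5 + X))    # time to default
C = rng.exponential(1.0, n)             # censoring time
Z = np.minimum(T, C)                    # observed maturity
delta = (T <= C).astype(int)            # uncensoring indicator

s_hat = beran_survival(1.0, 0.5, X, Z, delta, h=0.2)
```

When all weights are equal and there is no censoring, this product reduces to the empirical survival function, which is a convenient check of the implementation.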
The asymptotic properties of Beran’s estimator for the conditional survival function were proven in both [3,4] under certain assumptions. From them, the expressions of the bias and the variance of the estimator $\widehat{PD}_h^B(t|x)$ can be found by using Theorem 1 in [5].
A simulation study was conducted in order to analyse the performance of Beran’s estimator. Its behavior was compared with other estimators of the probability of default obtained from estimators of the survival function, including a benchmark method based on proportional hazards models. For more details about the simulation study, see [5].
The results show that the probability of default estimates obtained by means of the estimators built according to (2) are reasonable, but they exhibit excessive variability and produce very rough curves.
2.2. Smoothed Beran’s Estimator
Beran’s estimator is smoothed with respect to the covariate, but not with respect to the time variable. This fact along with the survival ratio structure of the PD estimator could be the cause of the instability of the estimations. Therefore, a time variable smoothing of the survival estimator is proposed.
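The smoothed Beran’s survival estimator is given by
$$\widetilde{S}_{h,g}^B(t|x) = 1 - \sum_{i=1}^{n} s_{(i)}\, \mathbb{K}\left( \frac{t - Z_{(i)}}{g} \right)$$
where
$$s_{(i)} = \widehat{S}_h^B(Z_{(i-1)}|x) - \widehat{S}_h^B(Z_{(i)}|x)$$
with $Z_{(i)}$ the $i$-th element of the sorted sample of $Z$, $\mathbb{K}(t) = \int_{-\infty}^{t} K(u)\, du$ the distribution function of a kernel $K$ and $g$ is the smoothing parameter for the time variable. Finally, the smoothed Beran’s PD estimator, $\widetilde{PD}_{h,g}^B(t|x)$, is obtained by replacing $\widehat{S}_h(t|x)$ with $\widetilde{S}_{h,g}^B(t|x)$ in Equation (2).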
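The time variable smoothing can be sketched as follows: the jumps of Beran’s estimate at the sorted observed times are spread out with the distribution function of a kernel, and the resulting smooth survival curve is plugged into the PD ratio. This is only a sketch under stated assumptions: the Epanechnikov kernel and its distribution function, the function names, the bandwidth values, the synthetic data, the absence of ties in the observed times, and the convention that the survival estimate equals one before the first observed time are all choices made here for illustration.

```python
import numpy as np


def epanechnikov(u):
    """Epanechnikov kernel (an assumed choice of K)."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)


def epanechnikov_cdf(u):
    """Distribution function of the Epanechnikov kernel."""
    u = np.clip(u, -1.0, 1.0)
    return 0.5 + 0.75 * u - 0.25 * u**3


def smoothed_beran_survival(t, x, X, Z, delta, h, g):
    """Time-smoothed Beran estimate of the conditional survival at (t, x)."""
    order = np.argsort(Z)
    Zs, ds = Z[order], delta[order].astype(float)
    k = epanechnikov((x - X[order]) / h)
    if k.sum() == 0.0:
        raise ValueError("no observations within the bandwidth window")
    w = k / k.sum()
    # Weighted mass with Z_(j) >= Z_(i) (assumes no ties in Z).
    at_risk = np.cumsum(w[::-1])[::-1]
    ratio = np.divide(w, at_risk, out=np.zeros_like(w), where=at_risk > 0)
    # Beran's estimate evaluated at each sorted observed time.
    S = np.cumprod(np.clip(1.0 - ds * ratio, 0.0, 1.0))
    # Jumps s_(i), taking the estimate before the first time as 1 (assumption).
    jumps = np.concatenate(([1.0], S[:-1])) - S
    return float(1.0 - np.sum(jumps * epanechnikov_cdf((t - Zs) / g)))


def smoothed_beran_pd(t, b, x, X, Z, delta, h, g):
    """Plug the smoothed survival estimate into the PD ratio."""
    s_t = smoothed_beran_survival(t, x, X, Z, delta, h, g)
    if s_t <= 0.0:
        return float("nan")
    return 1.0 - smoothed_beran_survival(t + b, x, X, Z, delta, h, g) / s_t


# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n = 200
X = rng.uniform(0.0, 1.0, n)
T = rng.exponential(1.0 / (0.5 + X))
C = rng.exponential(1.0, n)
Z = np.minimum(T, C)
delta = (T <= C).astype(int)

pd_hat = smoothed_beran_pd(t=0.5, b=0.5, x=0.5, X=X, Z=Z, delta=delta, h=0.2, g=0.1)
```

As the time bandwidth `g` shrinks towards zero, the kernel distribution function approaches a step function and the smoothed curve approaches the unsmoothed Beran estimate, so `g` controls how much roughness is removed.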
The asymptotic expressions for the bias and the variance of the smoothed Beran’s estimator of the survival function have been recently found [6]. The results are too extensive to be shown here. By applying Theorem 1 of [5], the corresponding asymptotic properties of the smoothed Beran’s estimator of the PD are obtained.
The simulation study carried out shows that the time variable smoothing significantly reduces the error committed in the PD estimation. This technique implies a considerable increase in computation time, and the improvement is not very noticeable in the estimation of the survival function. However, in the case of the PD, the variability and roughness of the estimates are clearly reduced.
3. Application to Real Data
To illustrate the differences between Beran’s estimator and its smoothed version, we estimate the conditional survival function and the PD on a real data set. The data consist of a sample of 10,000 consumer credits from a Spanish bank registered between July 2004 and November 2006. The sample contains the credit scoring of each borrower, the observed lifetime and the uncensoring indicator. The sample censoring percentage is
. The probability of default is estimated using Beran’s and smoothed Beran’s estimators with
and
.
Figure 1 shows the result.
4. Conclusions
This work proposes a time variable smoothing for Beran’s estimator of the conditional survival function. General asymptotic expressions for the bias and the variance of this estimator are established. It is used to build a doubly smoothed PD estimator whose asymptotic properties are also obtained. In view of the simulation study carried out, it can be concluded that the smoothed Beran’s estimator seems to reduce the estimation error committed when estimating the probability of default.
Work is currently underway to develop a method for choosing the smoothing parameters involved in the above-mentioned estimators. In addition, since censoring is heavy in this context, nonparametric cure models will be considered in future work.