2. Background and Literature Review
Over the past 50 years, much impressive research has been performed to better predict default risk. Among those studies, the first, and one of the most seminal, works is the Z-score model (
Altman 1968). Altman used multivariate discriminant analysis (MDA) to investigate bankruptcy prediction in companies. MDA is a linear and statistical technique that can classify observations into different groups based on predictive variables. The discriminant function of MDA is
where
are observation values of
n features used in the model and
are the discriminant coefficient computed by the MDA method. For the Z-score model, the typical features used are financial ratios from company accounts. This model can then transform an individual company’s features into a one-dimensional discriminant score, or
Z, which can be used to classify companies by risk of bankruptcy. For consumer credit, MDA and other linear models such as logistic regression have traditionally been used for credit scoring (
Khemais et al. 2016;
Sohn et al. 2016;
Thomas et al. 2017).
Although this kind of traditional linear method can determine whether an applicant is in a good or bad financial situation, it cannot deal with dynamic aspects of credit risk, in particular, time to default. Therefore, survival analysis has been proposed as an alternative credit risk modeling approach default time prediction.
Banasik et al. (
1999) pointed out that the use of survival analysis facilitates the prediction of when a borrower is likely to default, not merely a binary forecast of whether or not a default would occur in a fixed time period. This is because survival analysis models permit the inclusion of dynamic behavioral and environmental risk factors in a way that regression models cannot perform.
One of the early popular multivariate survival models is the Cox proportional hazard (PH) model (
Cox 1972). Unlike regression models, Cox’s PH model can include variables that affect survival time. The semiparametric nature of the model means that a general non-linear effect is included. The Cox PH model is composed of a linear component, the parametric form, and a baseline hazard part in non-parametric form as follows:
This function assumes the risk for a particular individual at time
t is the product of a non-specified baseline hazard function of time
and an exponential term of linear series of variables. The coefficients,
, are estimated using maximum partial likelihood estimation without needing to specify the hazard function
. Therefore, the Cox PH model is called a semiparametric model. One advantage of the Cox PH model is that it can estimate the hazard rate (the probability that the event occurs per unit of time conditioning on no prior event happened before) for each variable separately, without needing to estimate the baseline hazard function
. However, for predictive models, as would typically be required in credit risk modeling, the baseline hazard will need to be estimated post hoc.
Thomas (
2000) and
Stepanova and Thomas (
2001) developed survival models for behavioral scoring for credit and showed how these could be used for profit estimation over a portfolio of loans.
Bellotti and Crook (
2009) developed a Cox PH model for credit risk for a large portfolio of credit cards and showed it provided benefits beyond a standard logistic regression model, including improved model fit and forecasting performance. They showed that credit status is influenced by the economic environment represented through time-varying covariates in the survival model.
Dirick et al. (
2017) provided a benchmark of various survival analysis technologies including the Cox PH model, with and without splines, accelerated failure time models, and mixture cure models. They considered multiple evaluation techniques such as the receiver operating characteristic (ROC) curve, the area under the ROC curve (AUC), and deviation from time to default, across multiple datasets. They found that no particular variant strictly outperforms all others, although Cox PH with splines was best overall. In their notes, they mentioned the challenges of choosing the correct performance measure for this problem, when using survival models for prediction. This remains an open problem.
Traditional survival models are continuous time survival models. However, discrete-time survival models (DTSM) have received great attention in recent years. For credit risk, where data are collected at discrete time points, typically monthly or quarterly repayment periods, a discrete time approach matches the application problem better than continuous time modeling; furthermore, for prediction, using discrete time is computationally more efficient (
Bellotti and Crook 2013).
Gourieroux et al. (
2006) pointed out that the continuous time affine model often has a poor model fit due to a lack of flexibility. They developed a discrete-time survival affine analysis for credit risk which allows dynamic factors to be less constrained.
De Leonardis and Rocci (
2008) adapted the Cox PH model to predict the firm default at discrete time points. Although time is viewed as a continuous variable, companies’ datasets are constructed on a monthly or yearly discrete-time basis. Companies’ survival or default is measured within a specific time interval, which means the Cox PH model needs to be adapted so that the time will be grouped into discrete time intervals. The adapted model not only produces a sequence of each firm’s hazard rates at discrete time points but also provides an improvement in default prediction accuracy.
Bellotti and Crook (
2013) used a DTSM framework to model default on a dataset of UK credit cards. They used credit card behavioral data and macroeconomic variables to improve model fit and better predict the time to default. In their paper, time is treated monthly and the model is trained using three large datasets from UK credit card data, including over 750,000 accounts from 1999 to mid-2006. The model is more flexible than the traditional one.
Bellotti and Crook (
2014) followed up by building a DTSM for credit risk and showed how it can be used for stress testing. Both papers treat the credit data as a panel dataset indexed by both account number and loan age (in months), with one observation being a repayment statement for the account at a particular loan age. Unlike previous studies, they used models to measure the risk, forecasting, and stress testing and pointed out that including statistically explanatory behavioral variables can improve model fit and predictive performance.
Even though the Cox model is a popular approach for survival analysis, it still suffers from a number of drawbacks. Firstly, the baseline hazard function is assumed to be the same across all observations but this may not be realistic in many applications, such as credit risk, where we may expect that different population segments may have different default behavior. Furthermore, since the parametric component of the Cox PH model is linear, non-linear effects of variables must be included by transformations or by including explicit interaction terms. But it can be difficult to identify these by manual processes. These difficulties, however, can be handled automatically using some other non-linear methods, such as an underlying machine learning algorithm like the random survival forest (RSF), support vector machine (SVM), and different kinds of artificial neural networks (ANN), which can also potentially improve the model fit.
The random survival forest (
Ptak-Chmielewska and Matuszyk 2020) evolved from random forests and inherited many of its characteristics. Only three hyperparameters need to be specified in a random survival forest (RSF): the number of predictors, which are randomly selected, the number of trees, and the splitting rules. Also, unlike the Cox PH model, RSF is essentially assumption-free, although the downside of this is that it does not (directly) provide statistical inference. However, this is a very useful property in survival modeling in the context of credit risk, where the value of a model is in prediction, rather than inference.
Ptak-Chmielewska and Matuszyk (
2020) showed that RSF has a lower concordance error when compared with the Cox PH model. Therefore, RSF is a promising approach to predict account default.
To further improve the model fit and the model prediction of credit risk, ANN has received increased attention in credit risk, known as a more powerful and complex non-linear method with improved performance in other areas such as computer vision (
Lu and Zhang 2016).
Faraggi and Simon (
1995) upgraded the Cox proportional hazards model with a neural network estimator. The linear term
in Equation (2) is replaced by the output of a neural network
. This neural network model incorporates all the benefits of the standard Cox PH model and retains the proportional hazard assumption but allows for non-linearity amongst the risk factors.
Ohno-Machado (
1996) tackled the survival analysis problem utilizing multiple neural networks. The model is composed of a collection of subnetworks and each of them corresponds to a certain discrete time point, such as a month, quarter, or year. Each subnetwork has a single output forecasting survival probability at its corresponding time period. The datasets are also divided into discrete subsets consisting of cases at specific time points in the same way and assigned to each subnetwork for training. The paper also describes that the learning performance of the neural network can be enhanced by combining the subnetworks, such as using outputs of some subnetworks as inputs for another subnetwork. But the issue of how to configure an optimal architecture of neural networks (i.e., how to combine the neural networks) remains an open research problem.
Gensheimer and Narasimhan (
2019) proposed a scalable DTSM using neural networks that can deal with non-proportional hazards, trained using mini-batch gradient descent. This approach can be especially useful when there are non-proportional hazard effects on observations for large datasets with large numbers of features. Time is divided into multiple intervals dependent on the length of the timelines. Each observation is transformed into a vector format to be used in the model, where one vector represents the survival indicator and another represents the event or default indicator, if it happened. The results show that this discrete-time survival neural network model can speed up the training time and can produce good discrimination and calibration performance.
Many studies using machine learning with survival models are in the medical domain.
Ryu et al. (
2020) utilized a deep learning approach to survival analysis with competing risk called DeepHit, which uses a deep learning neural network in a medical setting to learn individual’s behavior and allow for the dynamic risk effect of time-varying covariates. The architecture of DeepHit consists of a shared network and a group of sub-networks and both are composed of a series of fully connected layers. The output layer uses the SoftMax activation function, which produces the probability of events for different discrete time points.
There are few papers studying the application of neural networks for DTSM specifically in the context of credit risk models. Practitioners and researchers in credit risk have begun to explore machine learning (ML) and deep learning (DL) in application to survival analysis, see, e.g.,
Breeden (
2021) for an overview.
Blumenstock et al. (
2022) explored ML/DL models for survival analysis, comparing them with traditional statistical models such as the Cox PH model motivated by the results in previous work on ML/DL credit risk models. They found that the performance of the DL method DeepHit (
Lee et al. 2018) outperforms the statistical survival model and RSF. Another contribution of their paper is introducing an approach to extract feature importance scores from DeepHit, which is a first step to building trust in black box models among practitioners in industry. However, this approach cannot reveal a clear picture of the mechanism and prediction behavior of DL models for analysts. As we describe later, one of our contributions is using local APC models as a means to interpret the black box of the DL model across different segments of the population.
Dendramis et al. (
2020) also proposed a deep learning multilayer artificial neural network method for survival analysis. The complex interactions of the neural network structures can capture the non-linear risk factors causing loan default among the behavioral variables and other institutional factors. These factors play more important roles in predicting default than the frequently used traditional fundamental variables in statistical models. They also showed that their neural network outperforms alternative logistic regression models. Their experimental results are based on a portfolio of 11,522 Greek business loans during the severe recession period, March 2014 to June 2017, with a relatively high default rate.
In this article, we report the results of a study using neural networks for DTSM on a large portfolio of US mortgage data over a long period, from 2004 to 2013, which covers the global financial crisis and its aftermath. This long period of data allows us to explore and decompose the maturation, vintage, behavioral, and environmental risk factors more clearly. We show that these models are more flexible than the standard linear models and provide an improved predictive performance overall.
Studies in credit risk modeling with machine learning typically focus on predictive performance, which is indeed the primary goal in this application context. However, there is an increasing concern in the financial industry that credit risk models are not interpretable or explainable. Banks and companies do not trust the black box of complex neural network architectures or other machine learning algorithms that lack transparency and interpretability (
Quell et al. 2021;
Breeden 2021). To address this concern, in this paper, we use the output of the DTSM neural network as an input to local linear models that can provide an interpretation of risk behavior for individuals or population segments in terms of the different risk timelines related to loan age (maturity), origination cohort (vintage), and environmental and economic effect over calendar time. These types of models are known as age–period–cohort (APC) models and are well-known outside credit risk (see, e.g.,
Fosse and Winship 2019), but their use as a method to decompose the timeline of credit risk was pioneered by Breeden (e.g., see
Breeden 2016). There remains an identification problem with APC models and in this study, we address this problem using a combination of regularization and fitting the environmental timeline to known macroeconomic data. To the best of our knowledge, this is the first use of APC models in the context of neural networks for credit risk modeling.
3. Research Methodology
In this section, we describe the algorithms and approaches used in our method, including the DTSM, neural network, lexis graph, APC effect, and time-series linear regression model. All of these methods helped predict the time-to-default event and solve the APC identification problem.
3.1. Discrete-Time Survival Model
Even though time-to-default events can be viewed as occurring in continuous time, credit portfolios are typically represented as panel data, which record account usage and repayment in discrete time (typically monthly or quarterly records). Therefore, it is more natural to treat time as a discrete point for a credit risk model (
Bellotti and Crook 2013). If we use discrete time, the data are presented as a panel data indexed on both account
i and discrete time
t. We provided the following concepts, notations, and functions for a credit risk DTSM.
The variable was used as the primary time-to-event variable and indicated the loan age of a credit account. Loan age is the span of the time since a loan account was created. It is also called loan maturity. For this study, data were provided monthly so is the number of months but the period could be different, e.g., quarterly;
The variable represents the number of loan accounts in the dataset, so ;
We let be the loan age time index of the last observation recorded for account , so that for each account , records only exist for loan age ;
The binary variable
represents whether account
defaults or not (1 denotes default and 0 denotes non-default) at a certain loan age
t. The precise definition of default can vary by application but for this study, three months’ consecutive missed payments were used, which is an industry-standard following the Basel II convention of 90 days of missed payments (
BCBS 2006);
Notably, in the survival analysis context, the default must be the last event in a series; hence, for , and for , , which indicates a censored account (i.e., event time, such as death time in medical or default time in credit risk, is unknown during the whole observation period), and indicates the default event;
The variable is a vector of static application variables collected at the time when the customer applies for a loan (e.g., credit score, interest rate, debt-to-income ratio, and loan-to-value);
We let be the origination period, or vintage, of account i. Normally, the period is the quarter or year when the account was originated. This is actually just one of the features in the vector . We let be the total number of vintages (the time when an individual customer open the account) in the dataset;
Meanwhile, we denoted time-varying variables (e.g., behavioral, repayment history, and macroeconomic data) by vector , which is collected across the lifetime of the account;
We let be the calendar time of account i at loan age t, with being the total number of calendar time periods. The measurement of calendar time is typically monthly, quarterly, or annually. Notably, is actually just one of the features in the vector .
Notably, loan age, vintage, and calendar time are related additively: . For example, suppose an account originated () in June 2009 and we consider repayment at loan age (t) of 10 months, then this repayment observation must then have a calendar time () of April 2010.
The DTSM model’s probability of default (PD) for each account
at time
is given as
PD at time
t is dependent on the account not defaulting prior to
t, i.e., the account has survived up to time
. A further constraint on the model is that we did not consider further defaults after the default was first observed. It is these conditions that make such a model a survival model. The linear DTSM was built using the following model structure:
where
F is an appropriate link function, such as logit,
is some transformation of
t,
is the intercept term, and
and
are vectors of coefficients. Even though this model was across observations indexed by both account
i and time
t and we could not assume independence between each time
and
within the same account
i, by applying the chain rule for conditional probabilities, the likelihood function could be expressed as
where
D refers to the panel data, which records accounts behavior at consecutive time points. With
F as the logit link function, it is the same form as logistic regression and hence the intercept, coefficients, and parameters in
can be estimated using a maximum likelihood estimator for logistic regression. Details can be found in
Allison (
1982).
3.2. Vintage Model
In the financial industry, analysis is often performed and models are built at the vintage level (
Siarka 2011). That is, separate models are built on accounts that originate within the same time period, e.g., in the same quarter or same year. The parallel is with wine production where wines produced in the same year generally share the same quality and is referred to as
vintage. A similar phenomenon is recognized in credit due to lenders’ different risk appetites and different borrower demographics, at different times (
Breeden 2016). Using the notation above, it means they all have the same value of
. Vintage modeling leads to a suite of models, one for each origination period. This is a useful practice since it may be expected that different vintages will behave in different ways and hence require separate models. The DTSM can be built as a vintage model for a fixed origination date, in which case loan age
t also corresponds to calendar time, in the context of each separate vintage model.
3.3. Neural Network with DTSM (NN-DTSM) for Credit Risk
The model structure in Equation (4) is constrained as a linear model. We hypothesize that a better model can be built with a non-linear model structure since introducing non-linearity enables interaction terms between features, automated segmentation between population subgroups, and representation of non-linear relationships between features and outcome variables. Equation (4) can be extended by changing the linear term into a nonlinear equation. For this, we replaced Equation (4) with a neural network structure. The log of the likelihood function in Equation (5) could be taken as the objective function for the neural network and this corresponds to the usual cross-entropy loss.
The neural networks are built as vintage models, i.e., a suite of neural networks, one for each origination period, following
Ohno-Machado (
1996). This is to match standard industry practice for vintage models and also to make estimations of neural networks less computationally expensive. Each subnetwork of this architecture is a multilayer perceptron (MLP) neural network (
Correa et al. 2011), which consists of a dropout layer (to moderate overfitting), an input layer, several hidden layers, and an output layer. The dropout was proposed by
Dahl et al. (
2013) who pointed out that the overfitting can be prevented by randomly deleting part of the neurons in hidden layers and repeating this process in different small data samples to reduce the interaction between feature detectors (neurons). Each neuron in the hidden layer receives input from the former layer, computes its corresponding value with a specific activation function, and transfers the output to the next layer. The output of the neuron calculated with the activation function represents the status. In each neural subnetwork, we applied the RELU activation function to each hidden layer,
, and applied the sigmoid activation function to the output layer which corresponds to a logit link function:
The overall architecture of the multiple neural networks with discrete-time survival analysis is shown in
Figure 1.
Each neural network has a single unit in the output layer predicting an estimation of PD (with value 0 to 1) at a certain discrete time point. In our study, we combined 40 neural networks which were constructed at the vintage level based on the datasets covering the period from 2004 to 2013 (i.e., from
v = 1 to
v = 40), which was divided into quarterly subsets (the input of each separate model) and assigned to the subnetworks for training and testing. Each subnetwork predicted the default event at its corresponding quarter and the overall output function for the DTSM with neural network is
where
is the subnetwork in each vintage
j and
is the logit function.
In machine learning frameworks, it is important to tune hyperparameters for optimal performance. Unsuitable selection of hyperparameters can lead to problems of underfitting or overfitting. However, the process of selecting appropriate hyperparameters is time-consuming. Manually combining different hyperparameters in neural networks is tedious. Meanwhile, it is impossible to explore many multiple combinations in a limited time. As a result, grid search is a compromise between exhaustive search and manual selection. Grid search tries all possible combinations of hyperparameters from a given candidate pool and chooses the best set of parameters according to prediction performance. Typically, grid search calculates loss (e.g., mean square error or cross-entropy) for each set of possible values from the pool by using cross-validation (
Huang et al. 2012). The use of cross-validation is intended to reduce overfitting and selection bias (
Pang et al. 2011). In our study, we combined grid search with cross-validation with specified parameter ranges for the number of hidden layers, the number of neurons in each layer, and the control parameter for regularization.
3.4. Age–Period–Cohort Effects and Lexis Graph
In the context of better analyzing time-ordered datasets, age–period–cohort (APC) analysis is proposed to estimate and interpret the contribution of three kinds of time-related changes to social phenomena (
Yang and Land 2013). APC analysis is composed of three time components, age effects, period effect, and cohort effect, where each one plays a different role, see (
Glenn 2005;
Yang et al. 2008;
Kupper et al. 1985).
The age effect reflects effects relating to the aging and developmental changes to individuals across their lifecycle;
The period effect represents an equal environmental effect on all individuals over a specific calendar time period simultaneously, since systematic changes in social events, such as a financial crisis or COVID-19, may cause similar effects on individuals across all ages at the same time;
The cohort effect is the influence on groups of observations that originate at the same time, depending on the context of the problem. For example, it could be people born at the same time or cars manufactured in the same batch.
In the context of credit risk, loan performance can be decomposed by an APC model into loan age performance, period effect through calendar time of loan repayment schedule, and cohort effect through origination date (vintage) of the loan (
Breeden 2016;
Breeden and Crook 2022). In credit risk, the calendar time effect reflects macroeconomic, environmental, and societal effects that impact borrowers at the same time, along with changes in legislation. Operational changes in the lender’s organization, such as changes in risk appetite, can affect vintage (cohort) or environmental (period) timelines.
The Lexis graph is a useful tool to represent and visualize APC in data. We describe it in the context of credit risk here to show default intensities in different timelines. In the Lexis graph, the x-axis represents the calendar time and the y-axis represents the loan age effect. Each square in the graph represents a specific PD modeled by the DTSM based on the account panel data corresponding to that time point. The shade (or color) of each square thus indicates the degree of the probability of default of each account at the corresponding time point, the darker, and the higher. For examples of Lexis graphs, see the figures following experimental results in
Section 5.2.
Although a Lexis graph can be produced for the whole loan population, it is useful to produce Lexis graphs and, consequently, APC analysis by different subpopulations or population segments. If linear survival models are used to construct the Lexis graph, this is not possible, since the time variables are not linked to other variables that could differ between segments. However, the use of NN-DTSM will enable segment-specific Lexis graphs as a natural consequence of the non-linearity in the model structure and it is one of the key contributions of this study.
3.5. Age Period Cohort Model
We considered APC models built on aggregations of accounts using the prediction output of the DTSMs as training data. These data are essentially the points given in the Lexis graph and the APC model could be seen as a way to decompose the three time components in the Lexis graph.
Firstly, it is notable that calendar time
c can be given as the sum of origination date
v and loan age
t:
. Therefore, the outcome of the APC model could be expressed and indexed on any two of these and we used loan age (
t) and vintage (
v). For this study, the outcome variable was the average default rate predicted by the DTSM at this particular time point computed as
where
is the index set of observations to include in the analysis. This may be the whole test set or some segment that we wish to examine. To represent the APC model, we used the following notation for three sets of indicator variables corresponding to each timeline at the time point given by
For all
t such that
, where
For all
v such that
For all
c such that
These represent the one-hot encoding of the time variables. Then, the general APC model in discrete time is
where
,
, and
represent the coefficients on the timeline indicator variables and
is an error term given from a known distribution, typically normal. This is a general model in discrete time since it allows the estimation of a separate coefficient for each value in each timeline. Once the data points
were generated from the model using (8), the APC model (12) could be estimated using linear regression.
However, the APC identification problem needed to be solved. This was due to the linear relationship between the three timelines, i.e.,
. The identification problem in APC analysis cannot be automatically, perfectly, or mechanically solved without making further restrictions and assumptions on the model to find a plausible combination of APC, ensuring those assumptions are validated across the whole lifecycle of analysis (
Bell 2020). The identification problem was derived as follows to show there was not a unique set of solutions to Equation (12); rather, there were different sets of solutions controlled by an arbitrary slope term
.
Firstly,
was combined with Equation (12) for some arbitrary scalar
,
Then, it was noticed from the definition of the indicator variables that
this gave
New coefficients are thus constructed as follows:
to form
which is another solution to the exact same regression problem. Therefore, we showed that Equation (12) has no unique solution and, indeed, there are infinite solutions, one for each choice of
. We called
a slope since it alters each collection of indicator variable coefficients by a linear term scaled by
. The identification problem is to identify the correct value of slope
.
To resolve this problem, there were several approaches which involved placing constraints on the model, such as removing some variables (essentially setting them to zero) or arbitrarily setting one set of the coefficients to a fixed value (i.e., no effect), see, e.g.,
Fosse and Winship (
2019). However, these solutions can be arbitrary. Therefore, in this study, two approaches were considered: (1) regularization and (2) constraining the calendar time effect in relation to observed macroeconomic effects. The first approach includes regularization penalties on the coefficients in the loss function that can be implemented in Ridge regression expressed by the loss function,
which minimizes mean squared error plus the regularization term, where
is the strength of the regularization penalty. This loss function provides a unique solution in the coefficients for (12). The second approach is discussed in detail in the next section.
3.6. Linear Regression and Fitting Macroeconomic Variables
In the initial APC model,
was unknown. A common and recommended solution to the identification problem is to use additional domain knowledge (
Fosse and Winship 2019). In this case, we supposed that the calendar-time effect would be caused by macroeconomic conditions, at least partly. Therefore, by treating the APC model calendar-time coefficients
as outcomes, these could be regressed from observed macroeconomic data. As part of this process, the slope
could be estimated to optimize the fit. Therefore, we used the term
, which relates to the time trend of the calendar to adjust the shape of the calendar time function. Once the calendar time function was fitted, the vintage function and loan age function were also determined. The time-series regression was defined as
where
represents the raw coefficients of the calendar time function from the APC model regression,
is the number of macroeconomic variables (MEVs),
indicates the
MEV, with a particular time lag
and
is an error term, normally distributed as usual. This can be rewritten as
and it can be seen as a linear regression with
a coefficient on variable
c, which can then be estimated with the intercept
and other coefficients
. Typical MEVs for credit risk are gross domestic product (GDP), house price index (HPI), and national unemployment rate (see, e.g.,
Bellotti and Crook 2009). This solution, to fit estimated coefficients against economic conditions, is similar to that used by
Breeden (
2016) who solves the identification problem by retrending the calendar-time effect to zero. However, in that research, the retrending is against a long series of economic data, whereas we used time series regression against a shorter span of data. This is because for shorter periods of time (less than 5 years), the calendar-time effect may have a genuine trend that can be observed in macroeconomic data over that time but retrending over long periods would remove that.
3.7. Lagged Macroeconomic Model
Some MEVs might have a lagged effect in the model. For example, the fluctuation in the house price will not have an immediate effect on people’s behavior but people will change their consumer behavior after a few months. To find the best fit between dynamic economic variables and customer behaviors, a lagged univariate model was used for each MEV:
Formula (21) is in the form of time series regression, where indicates the customer behavioral variables, denotes the macroeconomic time series variable, is the lagged offset, and is a normally distributed error term. Univariate regression of on the MEV was repeated for different plausible values of and the best value of was chosen based on maximizing . Once was selected for each MEV , Equation (20) could then be estimated.
3.8. Overall Framework of the Proposed Method
The overall flowchart for our method is shown in
Figure 2. The neural network in this study consisted of multiple vintage-level sub-networks. After the model was trained, the next step was to extract and analyze different kinds of customer behaviors from the model by inputting different data segments, which might show varied characteristics (e.g., low-interest rate customer groups). These data segments helped to construct Lexis graphs from the model which can visualize the characteristics of different data groups. To better capture and analyze the risk from the model, we used APC ridge regression to decompose the default rate modeled by the neural network in the Lexis graph into the three APC timelines: age, vintage, and calendar time. These can finally be expressed as three APC graphs that can help experts to better understand and explain the behavior of different loan types.
The coefficients of calendar dates extracted from the neural network represented the size and direction of the relationship between the specific calendar date and loan performance. Matching these coefficients with the macroeconomic data revealed the relationship between the macroeconomic effect in the model and its impact on loans. This helped to solve the APC identification problem. It is possible that the regularized APC model already provided the correct slope and that a statistical test on the time-series macroeconomic model was used to test this.
6. Discussion
This new framework will benefit practitioners and supervisors in the financial services industry.
Firstly, many may not have already adopted panel models and DTSM in their credit risk modeling, but this paper may inspire them to do so and provide background on how to do it.
Secondly, machine learning is currently of great interest in industry and wider society and many financial institutions are considering how to leverage this technology in their business, especially neural networks. For those who already use linear DTSM based on existing approaches (e.g.,
Bellotti and Crook 2013;
Breeden and Crook 2022) but who may wonder how to incorporate non-linear machine learning algorithms, this study provides a blueprint that can be used directly or as the basis for their own custom approaches. Our experiments on real-world mortgage data also demonstrate the potential benefit when adopting NN-DTSM for credit risk modeling, in terms of improved model fit and accuracy.
A key challenge for deploying machine learning in financial services is opacity and lack of explainability (
Breeden 2021). The approach in this article addresses this concern by showing how the output of NN-DTSM can be interpreted using APC analysis, which will already be familiar with many practitioners, and can be conducted at the global level (i.e., across the whole population) or local level (i.e., for different segments), thus enabling trust in the model. Some practitioners have recently wondered how generative AI can be used in financial services to gain value (
Kielstra 2023). Interestingly, the explanatory approach taken in this study is an example of generative AI since the Lexis graph is generated from the trained NN-DTSM.
Thirdly, this approach will also be valuable for supervisors as a methodological framework to test bank’s dynamic model development or be recommended for use. It can also be implemented as a benchmark model to check the performance of bank’s models. Finally, it could be extended to support stress tests, e.g., following a similar approach to
Bellotti and Crook (
2014).
The models in this study were implemented using Python 3.5 and Keras 2.4.3. Running on a conventional laptop (CPU:Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz, memory: 16 GB), it took less than 20 min to train NN-DTSM for all 40 vintages. Therefore, this model development is within the technological capacity of all financial institutions.