# Developing an Impairment Loss Given Default Model Using Weighted Logistic Regression Illustrated on a Secured Retail Bank Portfolio

## Abstract

## 1. Introduction

## 2. LGD Methodology

#### 2.1. LGD1 Methodology

#### 2.2. LGD2 Methodology

#### 2.2.1. Step 1: Sample Created

#### 2.2.2. Step 2: Target and Weight Variables Created

- $i = 1, \dots, N$ indexes the observations;
- $\mathit{Exposure}_i$ is the exposure of observation $i$, so that$$\mathit{EAD}_i = \sum_{\forall Y_i} \mathit{Exposure}_i = \mathit{Exposure}_i \, \mathrm{IND}(Y_i = 1) + \mathit{Exposure}_i \, \mathrm{IND}(Y_i = 0),$$where $\mathrm{IND}(\cdot)$ is the indicator function;
- $P(\mathit{Cure})$ is the proportion of cured observations over the total number of worked-out accounts (over the reference period);
- $P(\mathit{redefault})$ is the proportion of observations that re-default over the reference period;
- $\mathit{LGD}_{1.\mathit{Unadj}}$ is the exposure at default (EAD) minus the net present value (NPV) of recoveries from the first point of default, divided by the EAD, for all observations in the reference period (see, e.g., PWC (2017) and Volarević and Varović (2018));
- $\mathit{WO}_i$ is the discounted write-off amount for observation $i$; and
- $P(\mathit{Cure})$, $P(\mathit{redefault})$ and $\mathit{LGD}_{1.\mathit{Unadj}}$ are therefore empirically calculated values, which should be updated regularly so that the final LGD estimate remains a point-in-time estimate, as required by IFRS 9 (IFRS 2014).
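The row duplication behind the weight variable can be illustrated with a small Python sketch. It assumes that each account's loss amount is already known (Equation (1), which defines it, is not reproduced in this section) and never exceeds the exposure; the names `WeightedRow` and `split_exposure` are hypothetical. Each account becomes a $Y=1$ row weighted by the loss and a $Y=0$ row weighted by the remainder, so the two weights add back to the exposure, as in the example with an exposure of 50,000 in the table of binary outcomes and weights.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class WeightedRow:
    """One modelling row: an account appears once with Y=1 and once with Y=0."""
    account_id: str
    y: int            # 1 = loss portion, 0 = non-loss portion
    exposure: float   # full exposure, repeated on both rows
    weight: float     # monetary amount attached to this outcome


def split_exposure(account_id: str, exposure: float, loss_amount: float) -> list[WeightedRow]:
    """Hypothetical helper: split one account's exposure into loss / non-loss weights.

    Assumes `loss_amount` is the (discounted) loss already derived for the account.
    """
    if exposure <= 0:
        raise ValueError("exposure must be positive")
    if not 0.0 <= loss_amount <= exposure:
        raise ValueError("loss_amount must lie between 0 and the exposure")
    return [
        WeightedRow(account_id, 0, exposure, exposure - loss_amount),
        WeightedRow(account_id, 1, exposure, loss_amount),
    ]


rows = split_exposure("A001", 50_000.0, 36_500.0)
for row in rows:
    print(row.y, row.exposure, row.weight)
# 0 50000.0 13500.0
# 1 50000.0 36500.0
```

The implied LGD of this account is 36,500 / 50,000 = 0.73, which is the share of the exposure carried by the $Y=1$ row.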

#### 2.2.3. Step 3: Input Variables (i.e., Variable Selection)

- Sort the data in descending order of the proportion of events in each attribute. Suppose a characteristic has $m$ attributes; the sorted attributes are then placed in groups $1, 2, \dots, m$, with each group corresponding to one attribute.
- For each sorted group $j$, compute the number of events $\#(Y=1)_j$ and the number of nonevents $\#(Y=0)_j$. Then compute the Gini statistic:
  $$\mathrm{Gini} = \left(1 - \frac{2\sum_{j=2}^{m}\left(\#(Y=1)_j \times \sum_{k=1}^{j-1}\#(Y=0)_k\right) + \sum_{j=1}^{m}\left(\#(Y=1)_j \times \#(Y=0)_j\right)}{\#(Y=1) \times \#(Y=0)}\right) \times 100,$$
  where $\#(Y=1)$ and $\#(Y=0)$ are the total numbers of events and nonevents.
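The two steps above translate directly into code. This Python sketch takes hypothetical event and nonevent counts per attribute, sorts the attributes by descending event proportion and evaluates the Gini formula term by term; the function name `gini_statistic` is illustrative only.

```python
from typing import Sequence


def gini_statistic(events: Sequence[int], nonevents: Sequence[int]) -> float:
    """Gini statistic (in %) of one binned characteristic.

    events[j] and nonevents[j] are #(Y=1)_j and #(Y=0)_j for attribute j.
    """
    if len(events) != len(nonevents) or len(events) == 0:
        raise ValueError("need matching, non-empty event and nonevent counts")
    if any(e + n == 0 for e, n in zip(events, nonevents)):
        raise ValueError("every attribute must contain at least one observation")
    # Step 1: sort attributes by descending event proportion.
    groups = sorted(zip(events, nonevents), key=lambda g: g[0] / (g[0] + g[1]), reverse=True)
    total_events = sum(e for e, _ in groups)
    total_nonevents = sum(n for _, n in groups)
    if total_events == 0 or total_nonevents == 0:
        raise ValueError("need at least one event and one nonevent")
    cross = 0  # sum over j >= 2 of #(Y=1)_j * (nonevents in groups 1..j-1)
    ties = 0   # sum over j of #(Y=1)_j * #(Y=0)_j
    cum_nonevents = 0
    for e, n in groups:
        cross += e * cum_nonevents  # zero for j = 1, so the sum effectively starts at j = 2
        ties += e * n
        cum_nonevents += n
    return (1 - (2 * cross + ties) / (total_events * total_nonevents)) * 100


print(round(gini_statistic([30, 20, 10], [70, 80, 90]), 2))  # 27.78
```

Because the groups are sorted by event proportion, a characteristic that separates events from nonevents perfectly scores 100, and one with the same event rate in every group scores 0.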

- An average LGD value can be calculated for missing values, which allows “Missing” to be used in the model fit (otherwise these rows would be excluded from modelling). Note that not all missing values are equal: in some cases they need to be treated separately based on the reason they are missing, e.g., “No hit” at the bureau vs. no trades present. It is therefore essential that business analysts investigate the reason for missing values and treat them appropriately. This again forms part of data preparation, which is always a key prerequisite to predictive modelling.
- Sparse outliers have little effect on the fit of the model: they are absorbed into the nearest bin, and their contribution is diminished through the use of the bin WOE or average LGD.
- Binning aids generalisation (required in predictive modelling), i.e., the ability to predict the target for new cases, by improving the balance between being too vague and too specific.
- Binning can capture possible non-linear trends (as long as a logical causal explanation can be assigned to them).
- Using the standardised average LGD value for each bin ensures that all variables are of the same scale (i.e., average LGD value).
- Using the average LGD value ensures that all types of variables (categorical, numerical, nominal, ordinal) will be transformed into the same measurement type.
- Quantifying the bins (rather than using dummy variables) results in each variable being seen as one group (and not each level as a different variable). This aids in reducing the number of parameter estimates.
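The binning benefits listed above rest on replacing each bin with its average LGD. The Python sketch below illustrates this under stated assumptions: bin labels are already assigned, a missing value (`None`) gets its own “Missing” bin, and “standardised” is taken to mean subtracting the overall mean LGD and dividing by the overall standard deviation of LGD. That standardisation is an assumption, since this section does not specify it, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pstdev
from typing import Dict, Hashable, List, Optional, Sequence

MISSING = "Missing"


def standardised_bin_lgd(bins: Sequence[Optional[Hashable]], lgds: Sequence[float]) -> Dict[Hashable, float]:
    """Map each bin label (None -> "Missing") to a standardised average LGD.

    Assumed standardisation: (bin mean LGD - overall mean LGD) / overall std of LGD.
    """
    if len(bins) != len(lgds) or not lgds:
        raise ValueError("need one LGD value per bin label")
    totals: Dict[Hashable, float] = defaultdict(float)
    counts: Dict[Hashable, int] = defaultdict(int)
    for label, lgd in zip(bins, lgds):
        key = MISSING if label is None else label  # missing values get their own bin
        totals[key] += lgd
        counts[key] += 1
    overall_mean = fmean(lgds)
    overall_sd = pstdev(lgds)
    if overall_sd == 0:
        raise ValueError("LGD has no variation; cannot standardise")
    return {key: (totals[key] / counts[key] - overall_mean) / overall_sd for key in totals}


def transform(bins: Sequence[Optional[Hashable]], mapping: Dict[Hashable, float]) -> List[float]:
    """Replace each raw bin label with its standardised average LGD."""
    return [mapping[MISSING if label is None else label] for label in bins]


bins = ["LTV<=1", "LTV<=1", "LTV>1", None]
lgds = [0.1, 0.3, 0.8, 0.4]
mapping = standardised_bin_lgd(bins, lgds)
x = transform(bins, mapping)
```

Whatever the original variable type, the transformed input is a single numeric column on a common LGD scale, which is what allows one coefficient per variable rather than one per level.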

#### 2.2.4. Step 4: Weighted Logistic Regression

- $p_i = P(Y_i = 1 \mid \mathbf{X}_i, \beta_0, \boldsymbol{\beta})$ is the probability of loss for observation $i$;
- $\beta_0$ and $\boldsymbol{\beta} = (\beta_1, \dots, \beta_K)$ are the regression coefficients;
- $\mathbf{X}_i = (X_{i1}, \dots, X_{iK})$ is the vector of predictor variables for observation $i$; and
- $w_i$ is the weight of observation $i$, calculated from the actual (monetary) loss amount as given in Equation (1).
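The definitions above fit the standard weighted logistic model $\operatorname{logit}(p_i) = \beta_0 + \boldsymbol{\beta}^\top \mathbf{X}_i$, estimated by maximising the weighted log-likelihood $\sum_i w_i \left[y_i \ln p_i + (1 - y_i)\ln(1 - p_i)\right]$. The numpy sketch below shows one generic way to do this (Newton-Raphson); it is not necessarily the software or algorithm used for the results reported later, and `fit_weighted_logistic` is a hypothetical name.

```python
import numpy as np


def _score_and_information(Xd, y, w, beta):
    """Gradient and negative Hessian of the weighted Bernoulli log-likelihood."""
    p = 1.0 / (1.0 + np.exp(-(Xd @ beta)))
    score = Xd.T @ (w * (y - p))
    information = Xd.T @ (Xd * (w * p * (1.0 - p))[:, None])
    return score, information


def fit_weighted_logistic(X, y, w, max_iter=100, tol=1e-10):
    """Weighted logistic regression via Newton-Raphson.

    Returns (beta, model_based_se); beta[0] is the intercept beta_0.
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    Xd = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    beta = np.zeros(Xd.shape[1])
    for _ in range(max_iter):
        score, information = _score_and_information(Xd, y, w, beta)
        step = np.linalg.solve(information, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    _, information = _score_and_information(Xd, y, w, beta)
    se = np.sqrt(np.diag(np.linalg.inv(information)))
    return beta, se


# Intercept-only fit on the two weighted rows of one account (no predictors).
beta, se = fit_weighted_logistic(np.empty((2, 0)), [0, 1], [13_500, 36_500])
```

With monetary weights, `w` acts like a frequency count, so the model-based standard errors shrink with the square root of the total weight; this is one plausible reason why the weighted LR standard errors in the results tables are so small.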

#### 2.2.5. Step 5: Test the Effect of the Dependence Assumption

## 3. Case Study: Secured Retail Portfolio from a South African Bank

#### 3.1. LGD1 Results

#### 3.2. LGD2 Results

#### 3.2.1. Step 1: Sample Created

#### 3.2.2. Step 2: Target and Weight Variables Created

#### 3.2.3. Step 3: Input Variables

- Higher LTV values are associated with higher LGD values (LTV).
- The higher the month on book (MOB) value for a customer, the lower the expected LGD value (MOB).
- The more months a customer has been in default, the higher the LGD value (Default).
- Customers buying old vehicles are associated with higher LGD values (New/Old).
- Certain channels and certain manufacturers are associated with higher LGD values (Channel Manufacturer).

#### 3.2.4. Step 4: Weighted Logistic Regression

#### 3.2.5. Step 5: Test the Effect of the Dependence Assumption

#### 3.3. Additional Investigation: Decision Tree

## 4. Strengths and Weaknesses of the Methodology

- Better handling of missing values, and their usage in the model.
- Better way to deal with outliers by minimising their influence.
- Improved generalisation of data.
- Easier way to capture non-linear trends.
- Easier comparison across variables through the usage of standardised average LGD value for each bin and standardised estimates.
- A reduction in the degrees of freedom introduces stability into the model.

## 5. Conclusions and Recommendation

- This methodology presents a relatively simple approach using logistic regression, which is a well-known and accepted technique in the banking industry.
- The results are easy to interpret and understand, and when converted to the scorecard format, provide a transparent user-friendly output.
- The method also uses transformations that offer better alternatives for dealing with issues such as missing data and outliers.
- Most banks have well-established processes for monitoring and implementing logistic regression models and they are well understood by stakeholders.
- From a practical perspective, there was no discernible difference in model accuracy when comparing the logistic regression model to the GEE model or the decision tree.

## Supplementary Materials

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## References

- Anderson, Raymond. 2007. The Credit Scoring Toolkit: Theory and Practice for Retail Credit Risk Management and Decision Automation. Oxford: Oxford University Press.
- Aptivaa. 2016. Cash Shortfall & LGD – Two Sides of the Same Coin. Available online: http://www.aptivaa.com/blog/cash-shortfall-lgd-two-sides-of-the-same-coin/ (accessed on 4 March 2019).
- Baesens, Bart, Daniel Rösch, and Harald Scheule. 2016. Credit Risk Analytics. Cary: SAS Institute, Wiley.
- Basel Committee on Banking Supervision. 2015a. Guidance on Accounting for Expected Credit Losses. Bank for International Settlements. Available online: https://www.bis.org/bcbs/publ/d350.htm (accessed on 31 January 2017).
- Basel Committee on Banking Supervision. 2015b. Revisions to the Standardised Approach for Credit Risk. Bank for International Settlements. Available online: https://www.bis.org/bcbs/publ/d347.pdf (accessed on 18 February 2019).
- Basel Committee on Banking Supervision. 2019a. CRE33 IRB Approach: Supervisory Slotting Approach for Specialised Lending (CRE Calculation of RWA for Credit Risk). Bank for International Settlements. Available online: https://www.bis.org/basel_framework/chapter/CRE/33.htm?tldate=20220101&inforce=20190101&export=pdf&pdfid=15661993943265707 (accessed on 11 March 2019).
- Basel Committee on Banking Supervision. 2019b. Calculation of RWA for Credit Risk: CRE36 IRB Approach: Minimum Requirements to Use IRB Approach. Bank for International Settlements. Available online: https://www.bis.org/basel_framework/chapter/CRE/36.htm?inforce=20190101&export=pdf&pdfid=0 (accessed on 11 March 2019).
- Beerbaum, Dirk. 2015. Significant increase in credit risk according to IFRS 9: Implications for financial institutions. International Journal of Economics and Management Sciences 4: 1–3.
- Bijak, Katarzyna, and Lyn C. Thomas. 2018. Underperforming performance measures? A review of measures for loss given default models. Journal of Risk Model Validation 12: 1–28.
- Breed, Douw Gerbrand, and Tanja Verster. 2017. The benefits of segmentation: Evidence from a South African bank and other studies. South African Journal of Science 113: 1–7.
- Breiman, Leo, Jerome Friedman, Richard A. Olshen, and Charles J. Stone. 1984. Classification and Regression Trees. Pacific Grove: Wadsworth.
- De Jongh, Pieter Juriaan, Tanja Verster, Elzabe Reynolds, Morne Joubert, and Helgard Raubenheimer. 2017. A critical review of the Basel margin of conservatism requirement in a retail credit context. International Business & Economics Research Journal 16: 257–74.
- European Banking Authority (EBA). 2016. Consultation Paper EBA/CP/2016/10: Draft Guidelines on Credit Institutions’ Credit Risk Management Practices and Accounting for Expected Credit Losses. Available online: https://www.eba.europa.eu/documents/10180/1532063/EBA-CP-2016-10+%28CP+on+Guidelines+on+Accounting+for+Expected+Credit%29.pdf (accessed on 3 May 2017).
- European Central Bank. 2018. Proposal on ELBE and LGD in-Default: Tackling Capital Requirements after the Financial Crisis. Available online: https://www.ecb.europa.eu/pub/pdf/scpwps/ecb.wp2165.en.pdf?176589bb4b7b020c3d3faffee9b982cd:No2165/June2018 (accessed on 11 February 2019).
- Global Public Policy Committee (GPPC). 2016. The Implementation of IFRS 9 Impairment Requirements by Banks: Considerations for Those Charged with Governance of Systemically Important Banks. Global Public Policy Committee. Available online: http://www.ey.com/Publication/vwLUAssets/Implementation_of_IFRS_9_impairment_requirements_by_systemically_important_banks/$File/BCM-FIImpair-GPPC-June2016%20int.pdf (accessed on 25 February 2019).
- IFRS. 2014. IFRS 9 Financial Instruments: Project Summary. Available online: http://www.ifrs.org/Current-Projects/IASB-Projects/Financial-Instruments-A-Replacement-of-IAS-39-Financial-Instruments-Recognitio/Documents/IFRS-9-Project-Summary-July-2014.pdf (accessed on 31 January 2016).
- Joubert, Morne, Tanja Verster, and Helgard Raubenheimer. 2018a. Default weighted survival analysis to directly model loss given default. South African Statistical Journal 52: 173–202.
- Joubert, Morne, Tanja Verster, and Helgard Raubenheimer. 2018b. Making use of survival analysis to indirectly model loss given default. Orion 34: 107–32.
- Kuchibhatla, Maragatha, and Gerda G. Fillenbaum. 2003. Comparison of methods for analyzing longitudinal binary outcomes: Cognitive status as an example. Aging & Mental Health 7: 462–68.
- Lund, Bruce, and Steven Raimi. 2012. Collapsing Levels of Predictor Variables for Logistic Regression and Weight of Evidence Coding. MWSUG 2012: Proceedings, Paper SA-03. Available online: http://www.mwsug.org/proceedings/2012/SA/MWSUG-2012-SA03.pdf (accessed on 9 April 2019).
- Miu, Peter, and Bogie Ozdemir. 2017. Adapting the Basel II advanced internal ratings-based models for International Financial Reporting Standard 9. Journal of Credit Risk 13: 53–83.
- Neter, John, Michael H. Kutner, Christopher J. Nachtsheim, and William Wasserman. 1996. Applied Linear Statistical Models, 4th ed. New York: WCB McGraw-Hill.
- Nguyen, Hoang-Vu, Emmanuel Müller, Jilles Vreeken, and Klemens Böhm. 2014. Unsupervised interaction-preserving discretization of multivariate data. Data Mining and Knowledge Discovery 28: 1366–97.
- PWC. 2017. IFRS 9 for Banks: Illustrative Disclosures. February. Available online: https://www.pwc.com/ee/et/home/majaastaaruanded/Illustrative_discloser_IFRS_9_for_Banks.pdf (accessed on 8 April 2019).
- SAS Institute. 2010. Predictive Modelling Using Logistic Regression. Cary: SAS Institute Inc. Available online: http://support.sas.com/documentation/cdl/en/prochp/67530/HTML/default/viewer.htm#prochp_hpbin_overview.htm (accessed on 6 September 2017).
- SAS Institute. 2017. Development of Credit Scoring Applications Using SAS Enterprise Miner (SAS Course Notes: LWCSEM42). Cary: SAS Institute, ISBN 978-1-63526-092-2.
- Sheu, Ching-fan. 2000. Regression analysis of correlated binary outcomes. Behavior Research Methods, Instruments & Computers 32: 269–73.
- Siddiqi, Naeem. 2006. Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. Hoboken: John Wiley & Sons.
- Siddiqi, Naeem. 2017. Intelligent Credit Scoring: Building and Implementing Better Credit Risk Scorecards. Hoboken: John Wiley & Sons.
- Tevet, Dan. 2013. Exploring model lift: is your model worth implementing? Actuarial Review 40: 10–13.
- Thomas, Lyn C. 2009. Consumer Credit Models: Pricing, Profit and Portfolios. Oxford: Oxford University Press.
- Van Berkel, Anthony, and Naeem Siddiqi. 2012. Building Loss Given Default Scorecard Using Weight of Evidence Bins in SAS® Enterprise Miner™. SAS Institute Inc. Paper 141–2012. Cary: SAS Institute.
- Verster, Tanja. 2018. Autobin: A predictive approach towards automatic binning using data splitting. South African Statistical Journal 52: 139–55.
- Volarević, Hrvoje, and Mario Varović. 2018. Internal model for IFRS 9: Expected credit losses calculation. Ekonomski pregled: Mjesečnik Hrvatskog Društva Ekonomista Zagreb 69: 269.

Binary Outcome ($Y$) | Exposure | Weight Variable |
---|---|---|
0 | $50,000 | $13,500 |
1 | $50,000 | $36,500 |

LTV | Channel & Manufacturer | New/Old | Standardised LGD |
---|---|---|---|
<=1 | Group 1 | New | −1.0553 |
<=1 | Group 1 | Old | −1.00075 |
<=1 | Group 2 | New | −0.87389 |
<=1 | Group 2 | Old | −0.18252 |
<=1 | Group 1 | New | −0.2155 |
<=1 | Group 1 | Old | −0.10513 |
<=1 | Group 3 | New | −0.67346 |
<=1 | Group 3 | Old | 0.050902 |
>1 | Group 1 | New | −0.22311 |
>1 | Group 1 | Old | 0.519007 |
>1 | Group 2 | New | −0.24721 |
>1 | Group 2 | Old | 0.532962 |
>1 | Group 1 | New | 0.365509 |
>1 | Group 1 | Old | 0.957936 |
>1 | Group 3 | New | 0.647134 |
>1 | Group 3 | Old | 1.503425 |

LTV | LTV Range | # | Standardised LGD |
---|---|---|---|
Bin 1 | LTV <= 1 | 18188 | −0.00566 |
Bin 2 | 1 < LTV <= 1.2 | 10461 | −0.00268 |
Bin 3 | LTV > 1.2 | 9703 | 0.004802 |

MOB | MOB Range | # | Standardised LGD |
---|---|---|---|
Bin 1 | MOB <= 24 | 17593 | 0.005193244 |
Bin 2 | 24 < MOB <= 42 | 10431 | −0.000342394 |
Bin 3 | MOB > 42 | 10328 | −0.006198457 |

Default | Default Range | # | Standardised LGD |
---|---|---|---|
Bin 1 | 0 | 1043 | −0.005747327 |
Bin 2 | 1 | 7706 | −0.004411893 |
Bin 3 | 2+ | 16150 | −0.000289465 |
Bin 4 | Other/Missing | 13453 | 0.006032881 |

New/Old | New/Old Range | # | Standardised LGD |
---|---|---|---|
Bin 1 | New | 15249 | −0.004677389 |
Bin 2 | Old | 23103 | 0.004428005 |

Channel Manufacturer | Channel Manufacturer Range | # | Standardised LGD |
---|---|---|---|
Bin 1 | Group 1 | 3870 | −0.008325 |
Bin 2 | Group 2 | 5984 | −0.004694 |
Bin 3 | Group 3 | 26422 | 0.001172 |
Bin 4 | Group 4 | 2076 | 0.011212 |

Analysis of Maximum Likelihood Estimates

Parameter ($X$) | DF | Estimate ($\beta$) | Standard Error | Wald Chi-Square | Pr > ChiSq |
---|---|---|---|---|---|
Intercept | 1 | −1.0977 | 0.000012 | 8528907254 | <0.0001 |
LTV ($X_1$) | 1 | 32.6329 | 0.00256 | 161977546 | <0.0001 |
Months on books ($X_2$) | 1 | 10.3046 | 0.00261 | 15622966.5 | <0.0001 |
Default event ($X_3$) | 1 | 173.9 | 0.00253 | 4709270394 | <0.0001 |
New/Old ($X_4$) | 1 | 18.5934 | 0.00252 | 54593987.2 | <0.0001 |
Channel Manufacturer ($X_5$) | 1 | 17.3602 | 0.00248 | 48935118.5 | <0.0001 |
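As a worked check of how these estimates combine, the Python sketch below scores one hypothetical account with $p_i = 1/\left(1 + e^{-(\beta_0 + \boldsymbol{\beta}^\top \mathbf{X}_i)}\right)$. It assumes that each input $X_k$ is the standardised LGD value of the account's bin from the single-variable tables above, as Step 3 suggests; the chosen bin combination and the function name are illustrative only, and how $p_i$ feeds into the final LGD estimate is not covered here.

```python
import math

# Weighted LR estimates from the maximum likelihood table above.
COEFFICIENTS = {
    "intercept": -1.0977,
    "LTV": 32.6329,
    "MOB": 10.3046,
    "Default": 173.9,
    "New/Old": 18.5934,
    "Channel Manufacturer": 17.3602,
}

# Standardised LGD bin values from the single-variable tables
# (assumption: these are the model inputs X_1..X_5).
EXAMPLE_ACCOUNT = {
    "LTV": -0.00566,                   # Bin 1: LTV <= 1
    "MOB": -0.006198457,               # Bin 3: MOB > 42
    "Default": -0.000289465,           # Bin 3: 2+
    "New/Old": -0.004677389,           # Bin 1: New
    "Channel Manufacturer": 0.001172,  # Bin 3: Group 3
}


def predicted_probability_of_loss(x: dict) -> float:
    """Logistic transform of the linear predictor beta_0 + sum_k beta_k * X_k."""
    eta = COEFFICIENTS["intercept"] + sum(COEFFICIENTS[name] * value for name, value in x.items())
    return 1.0 / (1.0 + math.exp(-eta))


print(round(predicted_probability_of_loss(EXAMPLE_ACCOUNT), 3))  # 0.188
```

The large coefficients are offset by the small magnitude of the standardised inputs: for example, $173.9 \times (-0.000289465) \approx -0.050$.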

Analysis of GEE Parameter Estimates (independent correlation): Empirical Standard Error Estimates

Parameter ($X$) | Estimate ($\beta$) | Standard Error | 95% Lower Confidence Limit | 95% Upper Confidence Limit | Z | Pr > \|Z\| |
---|---|---|---|---|---|---|
Intercept | −1.0978 | 0.0116 | −1.1205 | −1.0750 | −94.44 | <0.0001 |
LTV ($X_1$) | 32.6348 | 2.4257 | 27.8805 | 37.3891 | 13.45 | <0.0001 |
Months on books ($X_2$) | 10.3055 | 2.3708 | 5.6587 | 14.9522 | 4.35 | <0.0001 |
Default event ($X_3$) | 173.8758 | 1.8297 | 170.2897 | 177.4619 | 95.03 | <0.0001 |
New/Old ($X_4$) | 18.5943 | 2.4984 | 13.6976 | 23.4910 | 7.44 | <0.0001 |
Channel Manufacturer ($X_5$) | 17.3607 | 2.5861 | 12.2921 | 22.4293 | 6.71 | <0.0001 |

Analysis of GEE Parameter Estimates (AR(1) correlation): Empirical Standard Error Estimates

Parameter ($X$) | Estimate ($\beta$) | Standard Error | 95% Lower Confidence Limit | 95% Upper Confidence Limit | Z | Pr > \|Z\| |
---|---|---|---|---|---|---|
Intercept | −0.7973 | 0.0080 | −0.8131 | −0.7816 | −99.15 | <0.0001 |
LTV ($X_1$) | 24.8404 | 1.8335 | 21.2468 | 28.4339 | 13.55 | <0.0001 |
Months on books ($X_2$) | 6.8528 | 1.7314 | 3.4592 | 10.2463 | 3.96 | <0.0001 |
Default event ($X_3$) | 129.6377 | 1.3393 | 127.0126 | 132.2627 | 96.79 | <0.0001 |
New/Old ($X_4$) | 12.5228 | 1.8139 | 8.9677 | 16.0779 | 6.90 | <0.0001 |
Channel Manufacturer ($X_5$) | 11.7312 | 1.8959 | 8.0154 | 15.4470 | 6.19 | <0.0001 |

Parameter (estimate) | Weighted LR | GEE (Ind Corr) | GEE (AR1 Corr) |
---|---|---|---|
$\beta_0$ | −1.0977 | −1.0978 | −0.7973 |
$\beta_1$ | 32.6329 | 32.6348 | 24.8404 |
$\beta_2$ | 10.3046 | 10.3055 | 6.8528 |
$\beta_3$ | 173.9 | 173.8758 | 129.6377 |
$\beta_4$ | 18.5934 | 18.5943 | 12.5228 |
$\beta_5$ | 17.3602 | 17.3607 | 11.7312 |

Parameter (standard error) | Weighted LR | GEE (Ind Corr) | GEE (AR1 Corr) |
---|---|---|---|
$\beta_0$ | 0.000012 | 0.0116 | 0.0080 |
$\beta_1$ | 0.00256 | 2.4257 | 1.8335 |
$\beta_2$ | 0.00261 | 2.3708 | 1.7314 |
$\beta_3$ | 0.00253 | 1.8297 | 1.3393 |
$\beta_4$ | 0.00252 | 2.4984 | 1.8139 |
$\beta_5$ | 0.00248 | 2.5861 | 1.8959 |
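In the comparison tables above, the GEE with an independent correlation structure reproduces the weighted LR estimates almost exactly, while its empirical standard errors are roughly a thousand times larger. A minimal numpy sketch of the cluster-robust (sandwich) covariance, which is what empirical GEE standard errors under working independence amount to (up to implementation details such as small-sample corrections), suggests why: the model-based standard error shrinks as the monetary weights grow, whereas the sandwich standard error is unchanged when all weights are rescaled. Treating each account as a cluster is an assumption here, the data are toy values, and the function name is hypothetical.

```python
import numpy as np


def model_and_robust_se(X, y, w, groups, beta):
    """Model-based and cluster-robust (sandwich) standard errors for a weighted
    logistic regression evaluated at coefficients `beta` (beta[0] = intercept).

    `groups` identifies the cluster of each row (assumed here to be the account).
    """
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    groups = np.asarray(groups)
    Xd = np.column_stack([np.ones(len(y)), X])  # prepend the intercept column
    p = 1.0 / (1.0 + np.exp(-(Xd @ np.asarray(beta, dtype=float))))
    bread = Xd.T @ (Xd * (w * p * (1.0 - p))[:, None])  # weighted information matrix
    weighted_resid = w * (y - p)
    meat = np.zeros_like(bread)
    for g in np.unique(groups):
        in_g = groups == g
        u = Xd[in_g].T @ weighted_resid[in_g]  # summed score of one cluster
        meat += np.outer(u, u)
    bread_inv = np.linalg.inv(bread)
    robust_cov = bread_inv @ meat @ bread_inv
    return np.sqrt(np.diag(bread_inv)), np.sqrt(np.diag(robust_cov))


X = [0, 0, 1, 1, 2, 2]
y = [0, 1, 0, 1, 0, 1]          # each account contributes a Y=0 row and a Y=1 row
w = [30, 70, 50, 50, 80, 20]
groups = [1, 1, 2, 2, 3, 3]
beta = [0.1, -0.2]              # illustrative coefficients; the scaling argument holds for any beta
model_se, robust_se = model_and_robust_se(X, y, w, groups, beta)
model_se_x100, robust_se_x100 = model_and_robust_se(X, y, [100 * v for v in w], groups, beta)
print(np.allclose(robust_se, robust_se_x100), np.allclose(model_se_x100, model_se / 10))
# True True
```

Multiplying every weight by 100 scales the information matrix by 100 and the cluster scores by 100, so the model-based standard errors fall by a factor of 10 while the sandwich standard errors cancel back to their original values.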

Technique | Train MSE | Valid MSE | Train Gini | Valid Gini |
---|---|---|---|---|
Weighted logistic regression | 0.04727719 | 0.04274367 | 0.45492145910 | 0.36039085030 |
GEE (independent correlation) | 0.04727703 | 0.04274417 | 0.45492145910 | 0.36039085030 |
GEE (AR 1 correlation) | 0.05222953 | 0.04062386 | 0.45482289180 | 0.36037450660 |

Technique | Valid MSE |
---|---|
Decision tree (default settings) | 0.1012884759 |
Decision tree (prune on ASE) | 0.1002412789 |
Decision tree (no pruning) | 0.1041756997 |

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Breed, D.G.; Verster, T.; Schutte, W.D.; Siddiqi, N.
Developing an Impairment Loss Given Default Model Using Weighted Logistic Regression Illustrated on a Secured Retail Bank Portfolio. *Risks* **2019**, *7*, 123.
https://doi.org/10.3390/risks7040123
