# Using Model Performance to Assess the Representativeness of Data for Model Development and Calibration in Financial Institutions

## Abstract

## 1. Introduction

## 2. Literature Review

#### 2.1. Regulatory Perspective

- (a) the scope of application;
- (b) the definition of default;
- (c) the distribution of the relevant risk characteristics;
- (d) the lending standards and recovery policies.

- (e) the current and foreseeable economic or market conditions.

#### 2.2. Qualitative Aspects of Representativeness

#### 2.3. Quantitative Aspects of Representativeness

- assessing the qualitative aspects of representativeness using expert judgement;
- assessing the quantitative aspects of representativeness using distributional comparisons, for example, the use of the PSI;
- using the Prediction Accuracy Index (PAI) as an alternative to the PSI when comparing distributions (Taplin and Hunt 2019).
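
As a point of reference for the distributional comparison mentioned above, the PSI between two binned samples is commonly computed as the sum over bins of $(a_j - e_j)\ln(a_j/e_j)$, where $e_j$ and $a_j$ are the bin proportions in the two samples. The following is a minimal Python sketch of that calculation; the function name `psi`, the `eps` floor for empty bins and the example counts are illustrative assumptions, not taken from the paper.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index between two samples binned on the same bins."""
    if len(expected_counts) != len(actual_counts):
        raise ValueError("both samples must be binned on the same bins")
    expected_total = sum(expected_counts)
    actual_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        # Convert counts to proportions; floor at eps so that an empty bin
        # does not cause log(0) or a division by zero (assumption of this sketch).
        p_e = max(e / expected_total, eps)
        p_a = max(a / actual_total, eps)
        total += (p_a - p_e) * math.log(p_a / p_e)
    return total


# Hypothetical bin counts for one risk characteristic.
base_counts = [120, 300, 380, 200]      # base data
external_counts = [100, 310, 390, 200]  # candidate external data
print(round(psi(base_counts, external_counts), 4))
```

Identical distributions give a PSI of zero, and the measure is symmetric in its two arguments, so the labels "expected" and "actual" only matter for interpretation.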

## 3. Generic Methodology to Assess Data Representativeness Quantitatively

- Define $MS{E}_{Q+BB,BT}$ as the MSE (or any other model performance measure) of Data set BT using the model developed on Data set Q + BB and $MS{E}_{BB,BT}$ as the MSE of Data set BT using the model developed on Data set BB.
- If $MS{E}_{Q+BB,BT}<MS{E}_{BB,BT}$, the model developed on the augmented data has improved model performance compared with the model developed only on the base data. If more substantiation is required regarding the significance of the difference between the MSEs, the formal tests proposed in Step 5 can optionally be performed. However, if $MS{E}_{Q+BB,BT}\ge MS{E}_{BB,BT}$, the formal tests proposed in Step 5 should be performed.
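
For concreteness, when the MSE is the chosen performance measure, the two quantities compared in this step can be written as below. This is the standard definition of the mean squared error, given here as a sketch under that assumption; the symbols are those defined in the list that follows.

$$
MSE_{Q+BB,BT} = \frac{1}{N_{BT}} \sum_{i=1}^{N_{BT}} \left( LGD_{i} - \widehat{LGD}_{i,Q+BB,BT} \right)^{2},
\qquad
MSE_{BB,BT} = \frac{1}{N_{BT}} \sum_{i=1}^{N_{BT}} \left( LGD_{i} - \widehat{LGD}_{i,BB,BT} \right)^{2}
$$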

- $LG{D}_{i}$ indicates the observed outcome for observation $i$,
- ${\widehat{LGD}}_{i,Q+BB,BT}$ indicates the predicted outcome for observation $i$ calculated on Data set BT using the model built on Data set Q + BB and
- ${\widehat{LGD}}_{i,BB,BT}$ indicates the predicted outcome for observation $i$ calculated on Data set BT using the model built on Data set BB, for $i=1,\dots ,{N}_{BT}$, where ${N}_{BT}$ is the number of observations in Data set BT.
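
The comparison of the two models on Data set BT can be sketched in Python as follows. The sketch computes both MSEs and a paired (dependent two-sample) test statistic on the per-observation squared errors. Two things are assumptions of this sketch rather than the paper's procedure: the p-value uses a normal approximation to the t-distribution, which is only reasonable when $N_{BT}$ is large, and all function names and example values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def mse(observed, predicted):
    """Mean squared error of the predictions against the observed outcomes."""
    return mean((y - p) ** 2 for y, p in zip(observed, predicted))


def paired_test_on_squared_errors(observed, pred_augmented, pred_base):
    """Paired test of H0: both models have the same mean squared error."""
    # d_i = squared error of the augmented model minus that of the base model.
    diffs = [
        (y - pa) ** 2 - (y - pb) ** 2
        for y, pa, pb in zip(observed, pred_augmented, pred_base)
    ]
    spread = stdev(diffs)
    if spread == 0:
        raise ValueError("all differences are identical; the statistic is undefined")
    t_stat = mean(diffs) / (spread / math.sqrt(len(diffs)))
    # Two-sided p-value from a normal approximation (assumes a large sample).
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value


# Hypothetical observed LGDs on Data set BT and predictions from the two models.
lgd = [0.10, 0.45, 0.00, 0.80, 0.25, 0.60]
pred_q_bb = [0.15, 0.40, 0.05, 0.70, 0.30, 0.50]  # model built on Q + BB
pred_bb = [0.20, 0.30, 0.10, 0.60, 0.35, 0.45]    # model built on BB

mse_q_bb_bt = mse(lgd, pred_q_bb)
mse_bb_bt = mse(lgd, pred_bb)
t_stat, p_value = paired_test_on_squared_errors(lgd, pred_q_bb, pred_bb)
print(mse_q_bb_bt, mse_bb_bt, t_stat, p_value)
```

Replacing the squared errors with absolute errors in `diffs` gives the corresponding test on absolute errors. With a sample as small as the one above the normal approximation is not appropriate; the data are there only to make the sketch runnable.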

#### 3.1. Remarks

- Stability measures the extent to which the population used to construct the rating system is similar to the current population.
- Discrimination measures how well the rating system provides an ordinal ranking of the risk.
- Calibration measures whether the estimated risk measure deviates from what has been observed ex post.

- the model methodology;
- the performance measures;
- the type of dependent two-sample test;
- the significance level.

#### 3.2. Roadmap/Summary to Assessing Representativeness

## 4. Case Studies

#### 4.1. Methodology and Data

#### 4.1.1. Modelling Technique Used

- The model is linear in the parameters and variables.
- The error terms are normally distributed.
- The regressors are independent of one another (no collinearity).
- The error terms are independently distributed.
- The error terms have constant variance (no heteroscedasticity).
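
One of the assumptions listed above, the absence of collinearity, can be screened informally before fitting. The sketch below flags pairs of regressors with a high absolute Pearson correlation. It is only a rough pairwise screen (variance inflation factors are a more complete check), and the 0.8 threshold, the function names and the example values are assumptions made for illustration.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_collinear_pairs(regressors, threshold=0.8):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    flagged = []
    for (name_a, col_a), (name_b, col_b) in combinations(regressors.items(), 2):
        r = pearson(col_a, col_b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical regressors, already encoded as the average LGD of their bin.
regressors = {
    "facility_type": [0.20, 0.30, 0.25, 0.40, 0.35],
    "seniority_code": [0.21, 0.29, 0.26, 0.41, 0.34],
    "collateral_indicator": [0.30, 0.10, 0.35, 0.15, 0.30],
}
flagged_pairs = flag_collinear_pairs(regressors)
print(flagged_pairs)
```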

#### 4.1.2. Data

#### 4.1.3. Dependent Variable Used

#### 4.1.4. Independent Variables Used

- EAD: Exposure at default.
- Facility type: Represents the different loan types. Member banks are responsible for mapping their own internal facilities denominations to the GCD Facility types.
- Seniority code: Debt grouped and assigned a code according to seniority level (e.g., Super Senior, Pari-Passu, Subordinated and Junior).
- Guarantee indicator: Indicates whether a loan has underlying protection in the form of a guarantee, a credit default swap or support from a key party.
- Collateral indicator: Indicates whether a loan has underlying protection in the form of collateral or a security.
- Industry code: The industry that accounts for the largest percentage of the entity’s revenues.

#### 4.1.5. Data Preparation on Independent Variables

- Missing values will also be coded as the average LGD value and will therefore be used in the model fit (otherwise these rows would not be used in modelling).
- Outliers will have little effect on the fit of the model, because all high values (or all low values) that fall in the same bin receive the same LGD value.
- Binning can capture some of the generalisation (required in predictive modelling) (Verster 2018).
- Binning can capture possible nonlinear trends (Siddiqi 2006).
- Using the average LGD value for each bin ensures that all variables are of the same scale (i.e., average LGD value). Note that many measures could have been used to quantify each bin, and the average was arbitrarily chosen.
- Using the average LGD value ensures that all types of variables (categorical, numerical, nominal and ordinal) will be transformed into the same measurement type.

- Using Data set BB to bin and then applying the binning results to both Data set BB and Data set BT.
- Using Data set Q + BB to bin and then applying the binning results to Data set Q + BB and then to Data set BT.
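
The fit-then-apply pattern described above can be sketched in Python: bin averages are learned on the build data only and then applied unchanged to the test data. The sketch treats missing values as a bin of their own, which is one reading of the treatment of missing values described earlier. The function names, the `MISSING` label, the fallback for levels unseen in the build data and the example values are all assumptions of this sketch.

```python
from collections import defaultdict
from statistics import mean

MISSING = "__missing__"  # hypothetical label for the missing-value bin


def fit_bin_averages(values, lgds):
    """Learn the average observed LGD of every bin on the build data."""
    grouped = defaultdict(list)
    for value, lgd in zip(values, lgds):
        grouped[MISSING if value is None else value].append(lgd)
    return {label: mean(bin_lgds) for label, bin_lgds in grouped.items()}


def apply_bin_averages(values, bin_averages, fallback):
    """Replace each raw value by the average LGD learned for its bin."""
    return [
        bin_averages.get(MISSING if value is None else value, fallback)
        for value in values
    ]


# Hypothetical build data (in the role of Data set BB) and test data (Data set BT).
build_seniority = ["Senior", "Senior", "Junior", None, "Junior"]
build_lgd = [0.10, 0.20, 0.60, 0.40, 0.50]
test_seniority = ["Junior", None, "Senior", "Subordinated"]

bin_averages = fit_bin_averages(build_seniority, build_lgd)
# Levels never seen in the build data fall back to the overall build mean
# (an assumption of this sketch, not a rule from the paper).
encoded_test = apply_bin_averages(test_seniority, bin_averages, mean(build_lgd))
print(encoded_test)
```

For the second option, the same two functions would be fitted on the combined Data set Q + BB instead and then applied to Data set Q + BB and to Data set BT.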

- Seniority code: Senior debt is associated with less risk (lower LGD) than junior debt.
- Guarantee indicator: Debt with a guarantee indicator is associated with less risk (lower LGD values).
- Collateral indicator: Debt with a collateral indicator is associated with less risk (lower LGD values).
- Industry code: Some industries (e.g., mining) are associated with less risk than others (e.g., education).
- Type of loan: Some types of loans (e.g., revolver loans) are associated with less risk (lower LGD) than others (e.g., overdrafts).
- Exposure at default: We used ten equal-sized bins for the EAD. The risk increases as the EAD decreases. This seems counterintuitive, and a possible reason is that both large corporates and SMEs were included. Large corporates typically have larger loan amounts but are lower-risk companies, whereas SME loans are usually smaller but can be associated with a higher risk.
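
Equal-sized bins of a numerical variable such as the EAD can be obtained from the empirical quantiles of the build data. The sketch below is one simple way to do this in Python; the function names and the example exposures are hypothetical, and ties at a bin edge are assigned to the lower bin.

```python
from bisect import bisect_left


def equal_sized_bin_edges(values, n_bins=10):
    """Upper edges of the first n_bins - 1 bins, each holding about the same number of rows."""
    ordered = sorted(values)
    n = len(ordered)
    if n < n_bins:
        raise ValueError("need at least as many observations as bins")
    # Edge k is the largest value among the first k/n_bins share of the rows.
    return [ordered[(k * n) // n_bins - 1] for k in range(1, n_bins)]


def assign_bin(value, edges):
    """Index of the first bin whose upper edge is >= value; the last bin otherwise."""
    return bisect_left(edges, value)


# Hypothetical exposures at default on the build data.
ead = list(range(1, 101))
edges = equal_sized_bin_edges(ead, n_bins=10)
bin_index = [assign_bin(x, edges) for x in ead]
bin_sizes = [bin_index.count(b) for b in range(10)]
print(edges)
print(bin_sizes)
```

The edges are learned once on the build data, so the same `edges` list can be reused to bin the test data.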

#### 4.1.6. The Multivariate Prediction Accuracy Index (MPAI) as a Potential Measure to Relate Our Result

#### 4.2. Results of Case Study 1

#### 4.3. Results of Case Study 2

## 5. Conclusions and Recommendations

- The modelling technique used to illustrate our proposed methodology was linear regression. Many other modelling techniques could be investigated, and it would be ideal to evaluate the performance of our proposed methodology and the MPAI in a simulation design by fitting different models to simulated data and comparing the outcomes under controlled conditions;
- A similar simulation design could be employed to assess the p-value cut-offs for our proposed methodology and to evaluate the MPAI thresholds proposed by Taplin and Hunt (2019) and those proposed for our setting of assessing representativeness;
- We used a clustering algorithm to bin industries together, although many other methods exist. A future research study could be to bin the industries using other techniques, such as the classification used by Krüger and Rösch (2017).

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Note

1. A loan from a specific country or region can originate from any global bank that submits data to the GCD.

## References

- Arlot, Sylvain, and Alain Celisse. 2010. A survey of cross-validation procedures for model selection. Statistics Surveys 4: 40–79.
- Baesens, Bart, Daniel Rosch, and Harald Scheule. 2016. Credit Risk Analytics: Measurement Techniques, Applications and Examples in SAS. Hoboken: Wiley & Sons.
- Barnard, George Alfred. 1974. Discussion of Cross-Validatory Choice and Assessment of Statistical Predictions. Journal of the Royal Statistical Society 36: 133–35.
- BCBS. 2006. Basel II: International Convergence of Capital Measurement and Capital Standards: A Revised Framework. Basel: Bank for International Settlements. Available online: https://www.bis.org/publ/bcbs128.htm (accessed on 19 January 2018).
- Breed, Douw Gerbrand, and Tanja Verster. 2017. The benefits of segmentation: Evidence from a South African bank and other studies. South African Journal of Science 113: 1–7.
- Cortés, Lina Marcela, Andres Mora-Valencia, and Javier Perote. 2017. Measuring firm size distribution with semi-nonparametric densities. Physica A: Statistical Mechanics and its Applications 485: 35–47.
- Cutaia, Massimo. 2017. Isn’t There Really Enough Data to Produce Good LGD and EAD Models? Edinburgh: Credit Research Centre, Business School, University of Edinburgh. Available online: https://www.crc.business-school.ed.ac.uk/sites/crc/files/2020-11/17-Massimo_Cutaia.pdf (accessed on 15 March 2019).
- D’Agostino, Ralph, and Michael Stephens. 1986. Goodness-of-Fit Techniques. New York: Marcel Dekker Inc.
- de Jongh, Pieter Juriaan, Tanja Verster, Elsabe Reynolds, Morne Joubert, and Helgard Raubenheimer. 2017. A Critical Review of the Basel Margin of Conservatism Requirement in a Retail Credit Context. International Business & Economics Research Journal 16: 257–74. Available online: https://clutejournals.com/index.php/IBER/article/view/10041/10147 (accessed on 3 December 2020).
- Diebold, Francis X. 2015. Comparing Predictive Accuracy, Twenty Years Later: A Personal Perspective on the Use and Abuse of Diebold–Mariano Tests. Journal of Business & Economic Statistics 33: 1.
- EBA. 2017. Guidelines on PD Estimation, EAD Estimation and the Treatment of Defaulted Exposures. Available online: https://www.eba.europa.eu/regulation-and-policy/model-validation/guidelines-on-pd-lgd-estimation-and-treatment-of-defaulted-assets (accessed on 18 June 2019).
- Engelman, Bernd, and Robert Rauhmeier. 2011. The Basel II Risk Parameters: Estimation, Validation, and Stress Testing, 2nd ed. Berlin: Springer.
- European Capital Requirement Regulations. 2013. Regulation (EU) No 575/2013 of the European Parliament and of the Council. Luxembourg: Official Journal of the European Union. Available online: https://eur-lex.europa.eu/eli/reg/2013/575/oj (accessed on 5 December 2019).
- GCD. 2018. LGD Report 2018—Large Corporate Borrowers. Reeuwijk: Global Credit Data. Available online: https://www.globalcreditdata.org/library/lgd-report-large-corporates-2018 (accessed on 21 February 2020).
- GCD. 2019. Global Credit Data. Available online: https://www.globalcreditdata.org/ (accessed on 5 December 2019).
- Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference and Prediction. New York: Springer.
- James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning with Applications in R. New York: Springer.
- Joubert, Morne, Tanja Verster, and Helgard Raubenheimer. 2018a. Default weighted survival analysis to directly model loss given default. South African Statistical Journal 52: 173–202. Available online: https://hdl.handle.net/10520/EJC-10cdc036ea (accessed on 5 December 2019).
- Joubert, Morne, Tanja Verster, and Helgard Raubenheimer. 2018b. Making use of survival analysis to indirectly model loss given default. Orion 34: 107–32.
- Krüger, Steffen, and Daniel Rösch. 2017. Downturn LGD modeling using quantile regression. Journal of Banking and Finance 79: 42–56.
- Li, David, Ruchi Bhariok, Sean Keenan, and Stefano Santilli. 2009. Validation techniques and performance metrics for loss given default models. The Journal of Risk Model Validation 3: 3–26.
- Loterman, Gert, Iain Brown, David Martens, Christophe Mues, and Bart Baesens. 2012. Benchmarking regression algorithms for loss given default modeling. International Journal of Forecasting 28: 161–70.
- Lund, Bruce, and Steven Raimi. 2012. Collapsing Levels of Predictor Variables for Logistic Regression and Weight of Evidence Coding. In MWSUG 2012: Proceedings. Paper SA-03. Minneapolis: Midwest SAS Users Group, Inc. Available online: http://www.mwsug.org/proceedings/2012/SA/MWSUG-2012-SA03.pdf (accessed on 19 January 2018).
- Mountrakis, Giorgos, and Bo Xi. 2013. Assessing the reference dataset representativeness through confidence metrics based on information density. ISPRS Journal of Photogrammetry and Remote Sensing 78: 129–47.
- Neter, John, Michael H. Kutner, Christopher J. Nachtsheim, and William Wasserman. 1996. Applied Linear Statistical Models, 4th ed. New York: WCB McGraw-Hill.
- OCC. 2011. Supervisory Guidance on Model Risk Management; Attachment to Supervisory Letter 11-7. Washington, DC: Board of Governors of the Federal Reserve System. Available online: https://www.federalreserve.gov/boarddocs/srletters/2011/sr1107a1.pdf (accessed on 19 January 2018).
- Picard, Richard, and Kenneth Berk. 1990. Data splitting. The American Statistician 44: 140–47.
- Prorokowski, Lukasz. 2018. Validation of the backtesting process under the targeted review of internal models: Practical recommendations for probability of default models. Journal of Risk Model Validation 13: 109–47.
- Prudential Regulation Authority. 2019. Internal Ratings Based (IRB) Approaches (Supervisory Statement SS11/13). London: Bank of England. Available online: https://www.bankofengland.co.uk/prudential-regulation/publication/2013/internal-ratings-based-approaches-ss (accessed on 21 February 2020).
- Ramzai, Juhi. 2020. PSI and CSI: Top 2 Model Monitoring Metrics. Available online: https://towardsdatascience.com/psi-and-csi-top-2-model-monitoring-metrics-924a2540bed8 (accessed on 1 March 2021).
- Riskworx. 2011. LGD Distributions. Available online: http://www.riskworx.co.za/resources/LGD%20Distributions.pdf (accessed on 20 February 2020).
- SARB. 2015. Bank’s Act Reporting. Available online: https://www.resbank.co.za/Lists/News%20and%20Publications/Attachments/6864/07%20Chapter%202%20credit%20risk.pdf (accessed on 19 January 2018).
- SAS Institute. 2010. Predictive Modelling Using Logistic Regression. Cary: SAS Institute.
- SAS Institute. 2019. The Modeclus Procedure (SAS/STAT 14.3 User’s Guide). Cary: SAS Institute. Available online: http://documentation.sas.com/?cdcId=pgmsascdc&cdcVersion=9.4_3.4&docsetId=statug&docsetTarget=statug_modeclus_toc.htm&locale=en (accessed on 2 February 2018).
- Sheather, Simon. 2009. A Modern Approach to Regression with R. New York: Springer Science & Business Media.
- Siddiqi, Naeem. 2006. Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. Hoboken: John Wiley & Sons.
- Sprent, Peter, and Nigel C. Smeeton. 2001. Applied Nonparametric Statistical Methods. London: Chapman & Hall/CRC.
- Taplin, Ross, and Clive Hunt. 2019. The Population Accuracy Index: A New Measure of Population Stability for Model Monitoring. Risks 7: 53.
- Thompson, Steven K. 2012. Sampling, 3rd ed. Hoboken: Wiley.
- Verster, Tanja. 2018. Autobin: A Predictive Approach towards Automatic Binning Using Data Splitting. South African Statistical Journal 52: 139–55. Available online: https://hdl.handle.net/10520/EJC-10ca0d9e8d (accessed on 5 June 2020).
- Wasserstein, Ronald L., and Nicole A. Lazar. 2016. The ASA’s statement on p-values: Context, process and purpose. The American Statistician 70: 129–33.
- Yurdakul, Bilal, and Joshua Naranjo. 2020. Statistical properties of the population stability index. Journal of Risk Model Validation 14: 89–100.
- Zhang, Yongli, and Yuhong Yang. 2015. Cross-validation for selecting a model selection procedure. Journal of Econometrics 187: 95–112.

**Table 1.** GCD definition of the asset classes (GCD 2019).

| Asset Class | GCD Definition |
|---|---|
| SME | Borrowers in the Corporate Asset Class as defined in the Basel II Accord §218 and §273, where the reported sales for the consolidated group of which the firm is a part is less than €50 million and where the exposure is not treated as retail, i.e., group exposure > €1 million. |
| Large corporate | Borrowers in the Corporate Asset Class as defined in the Basel II Accord §218 and §273, where the reported sales for the consolidated group of which the firm is a part is greater than or equal to €50 million but which is not reported in a more specialised Asset Class. |

| SMEs | n | Large Corporates | n |
|---|---|---|---|
| LGD < −0.01 | 175 | LGD < −0.01 | 14 |
| −0.01 ≤ LGD ≤ 1.5 | 3600 | −0.01 ≤ LGD ≤ 1.5 | 231 |
| LGD > 1.5 | 0 | LGD > 1.5 | 0 |

| Parameter Estimates | | |
|---|---|---|
| Variable (Binned, Average LGD) | Parameter Estimate | p-Value |
| Intercept | 1.31 | 0.02 |
| Facility type | 0.41 | 0.08 |
| Industry code | 0.25 | 0.02 |
| Collateral indicator | 0.59 | <0.01 |
| Seniority code | −4.89 | 0.01 |
| Exposure at default | 0.81 | <0.01 |
| Goodness-of-fit statistics | | |
| R-squared on ABC build data set (Data set BB) | 32.67% | |
| Adjusted R-squared on ABC build data set (Data set BB) | 32.23% | |

| Parameter Estimates | | |
|---|---|---|
| Variable (Binned, Average LGD) | Parameter Estimate | p-Value |
| Intercept | 0.59 | 0.01 |
| Facility type | 0.32 | <0.01 |
| Guarantee indicator | −0.40 | 0.05 |
| Industry code | 0.40 | <0.01 |
| Collateral indicator | 0.68 | <0.01 |
| Seniority code | −2.52 | <0.01 |
| Exposure at default | 0.72 | <0.01 |
| Goodness-of-fit statistics | | |
| R-squared on SA GCD data (Data set Q + BB) | 32.16% | |
| Adjusted R-squared on SA GCD data (Data set Q + BB) | 32.05% | |

| | MSE | MPAI |
|---|---|---|
| $MS{E}_{BB,BB}$ | 10.84% | 1 |
| $MS{E}_{BB,BT}$ | 11.11% | 0.86 |
| $MS{E}_{Q+BB,Q+BB}$ | 10.70% | 1 |
| $MS{E}_{Q+BB,BT}$ | 10.78% | 1.09 |

| Variable: LGD | |
|---|---|
| Number of observations | 3065 |
| Difference between the mean and median | 25.94% |
| Standard deviation | 0.40 |

| Parameter Estimates | | |
|---|---|---|
| Variable (Binned, Average LGD) | Parameter Estimate | p-Value |
| Intercept | 2.24 | <0.01 |
| Facility type | 0.32 | <0.01 |
| Industry code | 0.44 | <0.01 |
| Collateral indicator | 0.70 | <0.01 |
| Seniority code | −7.91 | <0.01 |
| Exposure at default | 0.69 | <0.01 |
| Goodness-of-fit statistics | | |
| R-squared on SA build data set | 32.51% | |
| Adjusted R-squared on SA build data set | 32.40% | |

| Parameter Estimates | | |
|---|---|---|
| Variable (Binned, Average LGD) | Parameter Estimate | p-Value |
| Intercept | 0.28 | 0.08 |
| Facility type | 0.40 | <0.01 |
| Industry code | 0.46 | <0.01 |
| Collateral indicator | 0.68 | <0.01 |
| Seniority code | −2.12 | <0.01 |
| Exposure at default | 0.72 | <0.01 |
| Goodness-of-fit statistics | | |
| R-squared on SA plus Country L | 28.25% | |
| Adjusted R-squared on SA plus Country L | 28.14% | |

| Parameter Estimates | | |
|---|---|---|
| Variable (Binned, Average LGD) | Parameter Estimate | p-Value |
| Intercept | −0.94 | <0.01 |
| Facility type | 0.80 | <0.01 |
| Guarantee indicator | 0.46 | <0.01 |
| Industry code | 0.91 | <0.01 |
| Collateral indicator | 0.79 | <0.01 |
| Seniority code | 0.66 | <0.01 |
| Exposure at default | 0.60 | <0.01 |
| Goodness-of-fit statistics | | |
| R-squared on SA plus Country AH | 7.42% | |
| Adjusted R-squared on SA plus Country AH | 7.39% | |

| Country | R-Squared (Q + BB) | MSE (Q + BB, Q + BB) | MSE (Q + BB, BT) | p-Value of t-Test (Squared Error) * | p-Value of t-Test (Absolute Error) * | MPAI * |
|---|---|---|---|---|---|---|
| Country A | 29.45% | 10.77% | 11.31% | 0.04 | 0.23 | 0.952 |
| Country B | 30.35% | 10.98% | 11.11% | 0.05 | 0.04 | 0.974 |
| Country C | 25.75% | 11.78% | 11.32% | <0.01 | <0.01 | 0.69 |
| Country D | 28.54% | 11.11% | 11.05% | 0.16 | 0.01 | 0.983 |
| Country E | 19.03% | 10.85% | 13.08% | <0.01 | <0.01 | 0.019 |
| Country F | 32.02% | 10.58% | 10.95% | 0.71 | 0.32 | 0.912 |
| Country G | 29.67% | 11.07% | 11.09% | 0.05 | 0.15 | 0.983 |
| Country H | 29.80% | 11.09% | 11.14% | 0.05 | <0.01 | 0.853 |
| Country I | 28.25% | 11.22% | 11.04% | 0.2 | 0.06 | 0.9 |
| Country J | 29.26% | 11.10% | 11.19% | 0.01 | 0.01 | 1.012 |
| Country K | 10.56% | 14.35% | 13.41% | <0.01 | <0.01 | 0.002 |
| Country L | 28.25% | 11.11% | 10.94% | 0.75 | 0.12 | 0.905 |
| Country M | 30.19% | 10.84% | 11.04% | 0.16 | 0.16 | 0.925 |
| Country N | 28.49% | 11.29% | 11.13% | 0.02 | <0.01 | 0.9 |
| Country O | 14.99% | 9.15% | 13.85% | <0.01 | <0.01 | 0.01 |
| Country P | 19.58% | 13.45% | 13.13% | <0.01 | <0.01 | 0.033 |
| Country Q | 21.86% | 10.58% | 13.18% | <0.01 | <0.01 | 0.105 |
| Country R | 27.23% | 10.96% | 11.21% | 0.1 | <0.01 | 0.704 |
| Country S | 13.96% | 11.45% | 13.13% | <0.01 | <0.01 | 0.534 |
| Country T | 30.80% | 10.62% | 11.00% | 0.34 | 0.54 | 0.914 |
| Country U | 28.54% | 11.21% | 11.10% | 0.07 | <0.01 | 0.981 |
| Country V | 14.70% | 13.67% | 12.85% | <0.01 | <0.01 | 0.052 |
| Country W | 29.67% | 11.04% | 10.97% | 0.52 | 0.28 | 0.956 |
| Country X | 26.44% | 11.46% | 11.20% | 0.02 | <0.01 | 0.874 |
| Country Y | 31.18% | 10.78% | 11.01% | 0.23 | 0.63 | 0.933 |
| Country Z | 22.93% | 10.37% | 13.32% | <0.01 | <0.01 | 0.047 |
| Country AA | 28.11% | 10.89% | 11.05% | 0.18 | 0.08 | 0.912 |
| Country AB | 14.95% | 12.53% | 13.73% | <0.01 | <0.01 | 0.772 |
| Country AC | 16.74% | 13.37% | 12.83% | <0.01 | <0.01 | 0.099 |
| Country AD | 29.88% | 11.09% | 11.09% | 0.1 | 0.04 | 1.013 |
| Country AE | 15.14% | 12.60% | 12.92% | <0.01 | <0.01 | 0.515 |
| Country AF | 25.02% | 11.87% | 11.29% | <0.01 | <0.01 | 0.752 |
| Country AG | 28.51% | 11.36% | 11.04% | 0.31 | <0.01 | 0.817 |
| Country AH | 7.42% | 11.98% | 14.27% | <0.01 | <0.01 | 0.289 |
| Country AI | 26.86% | 11.23% | 11.09% | 0.22 | <0.01 | 0.922 |
| Country AJ | 25.59% | 10.84% | 11.12% | 0.12 | <0.01 | 0.712 |
| Country AK | 27.74% | 12.40% | 11.81% | <0.01 | <0.01 | 0.486 |
| Country AL | 31.24% | 10.74% | 10.97% | 0.31 | 0.97 | 0.985 |
| Country AM | 30.61% | 10.90% | 11.03% | 0.08 | 0.15 | 0.96 |
| Country AN | 28.31% | 11.36% | 11.15% | 0.04 | <0.01 | 1.023 |
| Country AO | 29.54% | 11.18% | 11.03% | 0.12 | <0.01 | 0.941 |
| Country AP | 31.54% | 10.68% | 10.97% | 0.43 | 0.34 | 0.979 |

| Variable: LGD | |
|---|---|
| Number of observations | 272 |
| Mean | 27% |
| Standard deviation | 0.35 |
| Median | 9.71% |

| Variable: LGD | |
|---|---|
| Number of observations | 18,580 |
| Mean | 28.45% |
| Standard deviation | 0.35 |
| Median | 7.47% |

| Country | Mean LGD | Standard Deviation of LGD | Median LGD |
|---|---|---|---|
| Country F | 10.56% | 0.25 | 0.54% |
| Country G | 30.31% | 0.39 | 6.71% |
| Country I | 29.19% | 0.38 | 4.28% |
| Country L | 27.00% | 0.35 | 9.71% |
| Country M | 27.48% | 0.33 | 10.94% |
| Country T | 12.61% | 0.26 | 1.08% |
| Country W | 27.37% | 0.38 | 1.96% |
| Country Y | 29.26% | 0.36 | 8.87% |
| Country AA | 21.41% | 0.31 | 3.31% |
| Country AL | 19.87% | 0.31 | 4.82% |
| Country AM | 23.32% | 0.37 | 4.03% |
| Country AP | 17.02% | 0.30 | 3.39% |

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kruger, C.; Schutte, W.D.; Verster, T.
Using Model Performance to Assess the Representativeness of Data for Model Development and Calibration in Financial Institutions. *Risks* **2021**, *9*, 204.
https://doi.org/10.3390/risks9110204
