Bayesian Logistic Regression for Credit Risk Modelling Among South African Loan Borrowers

Masekoameng, John Lehlaka; Mbona, Sizwe Vincent; Ananth, Anisha; Chifurira, Retius

doi:10.3390/jrfm19050358


¹ Department of Statistics, Faculty of Applied Sciences, Durban University of Technology, Durban 4001, South Africa

² School of Agriculture and Sciences, University of KwaZulu-Natal, Durban 4001, South Africa

* Author to whom correspondence should be addressed.

J. Risk Financial Manag. 2026, 19(5), 358; https://doi.org/10.3390/jrfm19050358

Submission received: 4 February 2026 / Revised: 22 February 2026 / Accepted: 25 February 2026 / Published: 15 May 2026

(This article belongs to the Section Banking and Finance)


Abstract

Credit risk management is critical in developing economies where high default rates threaten financial stability. This study compares traditional logistic regression (TLR) and Bayesian logistic regression (BLR) for predicting loan default using anonymized National Credit Regulator (NCR) data from 5000 South African loan borrowers (2018–2022). The NCR data included both bank and non-bank lenders. The findings indicate that the BLR model outperformed TLR, achieving an average precision of 0.94. Loan terms, inflation rates, and income bands of R5000–R10,000 and R20,000–R50,000 were associated with higher default risk, whereas higher credit scores and personal loan products were associated with lower default risk. Model performance improved when focusing on these predictors rather than all variables. Using a 0.5 probability threshold, BLR classified 94.5% of borrowers as high risk. Findings highlight the practical value of BLR for identifying key predictors and improving borrower risk classification. These insights can inform targeted strategies such as enhanced screening for long-term loans, monitoring during inflationary periods, and tailored repayment plans for vulnerable income groups, supporting responsible lending and portfolio stability.

Keywords:

banking sector; Bayesian logistic regression; credit risk; loan default; predictors; traditional logistic regression

1. Introduction

Credit risk, the possibility that a borrower fails to meet repayment obligations, remains a key source of vulnerability for banks and the wider financial system (Moody’s, 2025). In emerging markets such as South Africa, the combination of heterogeneous borrower behaviour, institutional differences across lenders, and shifting macroeconomic conditions complicates default prediction and risk control. Robust application-focused modelling is therefore central to prudent lending and financial stability, particularly under conditions of data scarcity and market uncertainty. Credit default patterns have become more complex because of geopolitical tensions, economic shocks, and evolving regulatory landscapes. Recent assessments suggest that while credit conditions remain broadly supportive, regional divergence and investor risk aversion shape default trends (S&P Global, 2025). Downturns amplify credit losses across sectors, and banks adjust loss estimates using macroeconomic forecasts and sector-specific data (Global Credit Data, 2020; International Monetary Fund, 2025). Interconnected risks, such as inflation and shifts in interest rates, now sit at the centre of credit risk modelling (Moody’s, 2025).

The South African market is characterized by high unemployment, income inequality, and substantial informal economic activity, which affect repayment capacity and exposure to shocks. These features challenge imported scorecard approaches and motivate context-sensitive models tailored to local data (Kimetto, 2023). Non-performing loans depress profitability for smaller institutions, while larger banks partially offset losses through diversification and stronger capital buffers (Lawrence et al., 2024). Macro-financial shocks, such as interest rate changes or fiscal stress, also affect default rates across South African borrowers (Saliba et al., 2023). The pandemic highlighted differences in shock transmission, with interest-based lending proving more sensitive than risk-sharing contracts used in Islamic banking (Ahmed et al., 2022; Butt & Chamberlain, 2025).

Traditional logistic regression (TLR) remains a widely used benchmark due to its interpretability, computational efficiency, and regulatory familiarity (Seitshiro & Govender, 2024). It has been extensively applied to binary outcomes in medicine, business, finance, and the social sciences because of its simplicity and ease of interpretation, particularly through expressing regression coefficients as odds ratios (Dey et al., 2025). Logistic regression is most commonly used in cross-sectional and case–control studies, where parameters are estimated using maximum likelihood estimation (MLE) (Hosmer et al., 2013). Because the model is nonlinear, inference is grounded in the maximum likelihood framework. Despite its widespread application, TLR has several limitations. The MLE approach can yield biased estimates in small samples, relies on assumptions of independence and homogeneity across observations, and has limited capacity to accommodate grouped data structures. In addition, it does not naturally incorporate prior information or fully capture uncertainty in parameter estimates, which restricts its flexibility when applied to complex or heterogeneous datasets (Lewis & Battey, 2024).

Bayesian logistic regression (BLR) offers a key advantage over TLR by estimating model parameters from their posterior distributions, which combine prior information with the likelihood function (Aydin, 2021). As a result, Bayesian models provide a flexible framework that allows the incorporation of prior knowledge and multiple levels of predictors such as borrower, bank, and macroeconomic-level factors, while explicitly quantifying parameter uncertainty (Moolchandani, 2024; Kyeong & Shin, 2022). In addition, BLR facilitates the identification of predictors associated with elevated default risk and illustrates how these signals can inform loan screening, inflation-aware monitoring, and the design of tailored repayment strategies for vulnerable income groups (Lawrence et al., 2024; Ahmed et al., 2022). Recent studies further demonstrate that Bayesian approaches can enhance predictive performance and support more robust decision-making in emerging market contexts, where historical data are often limited or noisy (Principa, 2025).

Despite the growing literature on credit risk in South Africa, studies that jointly model borrower-level predictors and bank-level predictors within a Bayesian framework remain limited. Prior work often imports methods developed for advanced economies or focuses on single-level models that do not account for variability across institutions or time-varying macroeconomic exposures (Kimetto, 2023; Lawrence et al., 2024; Saliba et al., 2023). To address this gap, we analyze anonymized NCR records for 5000 loan borrowers from three South African banks and non-bank lenders for the period 2018 to 2022. The study compares TLR with BLR and evaluates whether Bayesian modelling improves the identification of key default predictors and the classification of high-risk borrowers under realistic portfolio conditions.

This study positions the applied problem of predicting default in South African consumer lending at the centre of the analysis, aligning with the emphasis on real-world statistical practice. It quantifies borrower- and bank-level effects while propagating uncertainty through a BLR framework, contrasting performance and interpretability with a TLR baseline (Moolchandani, 2024). In addition, it identifies a set of predictors associated with elevated default risk and demonstrates how these signals can inform screening for long-term loans, inflation-aware monitoring, and tailored repayment strategies for vulnerable income bands (Lawrence et al., 2024; Ahmed et al., 2022).

2. Materials and Methods

2.1. Data Collection

The data used in this study were obtained from anonymized records provided by the National Credit Regulator (NCR) and covered borrowers from three different banks across all provinces in South Africa. The dataset included customers who held at least one loan (credit card, mortgage, personal loan, store credit, or vehicle finance) between 2018 and 2022. Only borrowers with complete and suitable anonymized records within this period were included, resulting in a final sample of 5000 borrowers. Records were excluded if the loans fell outside the 2018–2022 period, involved loan types outside the study scope, contained missing or incomplete key variables, or represented duplicate borrower entries across banks.

In this study, a customer was considered to have defaulted after missing three consecutive loan repayments. This definition helps to identify early signs of financial difficulty and is common in research and real-world practice, especially in short-term consumer lending and microfinance settings. Previous studies and industry guidelines support this approach; for example, Siddiqi's Credit Risk Scorecards explains that behavioural definitions of default based on two or three missed payments are widely accepted and routinely used in retail credit risk datasets (Siddiqi, 2012).
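The three-missed-payments rule counts consecutive misses rather than total misses, which is easy to get wrong when coding a default flag. The sketch below is illustrative only: it assumes each borrower's repayment history is available as a chronologically ordered list of flags (True = instalment missed), a hypothetical layout rather than the structure of the NCR records, and `is_behavioural_default` is a hypothetical helper name.

```python
from typing import Sequence


def is_behavioural_default(missed: Sequence[bool], run_length: int = 3) -> bool:
    """Return True if the history contains `run_length` consecutive missed payments.

    `missed` is a hypothetical, chronologically ordered list in which True marks
    a missed instalment; the study's definition corresponds to run_length=3.
    """
    streak = 0
    for was_missed in missed:
        # Extend the current run of misses, or reset it after any payment.
        streak = streak + 1 if was_missed else 0
        if streak >= run_length:
            return True
    return False


# Two separate misses do not trigger default; three in a row do.
print(is_behavioural_default([True, False, True, True, False]))  # False
print(is_behavioural_default([False, True, True, True]))         # True
```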

2.2. The Traditional Logistic Regression (TLR) Model

The TLR model is commonly used to model the relationship between a binary outcome (response) variable and predictor variables, whether categorical or continuous. The TLR is relatively simple because only the regression parameters are estimated; no separate variance term is estimated (Hassan, 2020). Suppose that the outcome variable is binary, with probability of success $p$ and probability of failure $1 - p$; then the TLR model is defined as (Agresti, 2015):

$$\operatorname{logit}(p) = \log\left(\frac{p}{1 - p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p \qquad (1)$$

where $x_1, x_2, \dots, x_p$ are $p$ predictor variables, $\beta_0$ is the intercept, and $\beta_1, \beta_2, \dots, \beta_p$ are the unknown regression parameters to be estimated. For the $n$ independent Bernoulli observations $y_i$ $(i = 1, 2, \dots, n)$, the predicted probability of success $P(y_i = 1)$ is given by (Hosmer et al., 2013):

$$P(y_i = 1) = \frac{\exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi})}{1 + \exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi})} \qquad (2)$$

where $y_i$ indicates the presence ($y_i = 1$) or absence ($y_i = 0$) of the event for subject $i$.
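Equations (1) and (2) are two views of the same model: (1) is linear on the log-odds scale, and (2) maps the linear predictor back to a probability through the inverse logit. A minimal Python sketch of Equation (2) for a single borrower follows; the function name is hypothetical, and the branch on the sign of the linear predictor only guards against floating-point overflow and is not part of the model.

```python
import math
from typing import Sequence


def predicted_probability(beta0: float, betas: Sequence[float], x: Sequence[float]) -> float:
    """P(y_i = 1) from Equation (2): the inverse logit of the linear predictor."""
    if len(betas) != len(x):
        raise ValueError("need one coefficient per predictor")
    eta = beta0 + sum(b * xj for b, xj in zip(betas, x))
    # exp(eta) / (1 + exp(eta)), rearranged so exp() never receives a large positive value.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


# A linear predictor of log(3), i.e. odds of 3 to 1, gives a probability of about 0.75.
print(predicted_probability(math.log(3), [], []))
```

Any coefficient values passed to this function here are placeholders, not estimates from this study.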

The hypotheses tested for each coefficient are:

H₀: $\beta_j = 0$

H₁: $\beta_j \neq 0$, where $j = 1, 2, \dots, p$.

That is, the test determines whether each predictor variable has a significant effect on the outcome variable. The Wald test statistic is:

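$$W = \left(\frac{\hat{\beta}_j}{\operatorname{se}(\hat{\beta}_j)}\right)^2, \qquad W \sim \chi^2(1) \text{ approximately under } H_0 \qquad (3)$$

The null hypothesis $H_0$ is rejected if $W > \chi^2_{1,\,1-\alpha}$ (3.84 at $\alpha = 0.05$), or equivalently if $|\hat{\beta}_j / \operatorname{se}(\hat{\beta}_j)| > z_{1-\alpha/2}$ (1.96 at $\alpha = 0.05$), which indicates that the predictor variable significantly affects the outcome variable.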
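Because $W$ follows a $\chi^2(1)$ distribution under $H_0$, its p-value equals the two-sided normal tail probability of $\hat{\beta}_j / \operatorname{se}(\hat{\beta}_j)$, so no statistics library is needed. The following is a sketch under that large-sample assumption; `wald_test` is a hypothetical name, and the coefficient and standard error would come from a fitted TLR model.

```python
import math


def wald_test(beta_hat: float, se: float, alpha: float = 0.05) -> tuple[float, float, bool]:
    """Wald statistic W = (beta_hat / se)^2, its chi-square(1) p-value, and the decision.

    For one degree of freedom, P(chi2_1 > W) = P(|Z| > sqrt(W)) = erfc(sqrt(W / 2)).
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    w = (beta_hat / se) ** 2
    p_value = math.erfc(math.sqrt(w / 2.0))
    return w, p_value, p_value < alpha


# A z-ratio of 1.96 gives W of about 3.84 and a p-value just under 0.05.
print(wald_test(1.96, 1.0))
```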

2.3. Bayesian Logistic Regression (BLR) Model

The Bayesian logistic regression (BLR) model bases inference on three components: the prior distribution, the likelihood function, and the posterior distribution. The effect size for each parameter is estimated from the posterior distribution, which combines prior information with the likelihood of the data (Chen & Nandram, 2023). The posterior is obtained by multiplying the likelihood function by the prior distribution and normalizing the result, and all parameter estimates are derived from it. Bayes' rule formally combines these three components as:

$$\text{posterior distribution} \propto \text{likelihood function} \times \text{prior distribution} \qquad (4)$$

The posterior distribution contains all information about model parameters. As shown in (4), the information contained in the sample (likelihood function) is combined with information from other sources (prior distribution) to obtain the posterior distribution (Loredo & Wolpert, 2024).

2.4. Likelihood Function

The likelihood function used in Bayesian inference is the same as that used in frequentist inference. Given the probability of success (which in logistic regression varies from one subject to another, depending on their covariates), the likelihood contribution from each subject is Bernoulli, that is, binomial with a single trial (Bolarinwa et al., 2023):

$$\text{likelihood}_i = [\pi(x_i)]^{y_i}\,[1 - \pi(x_i)]^{1 - y_i} \qquad (5)$$

In (5), $\pi(x_i)$ is the probability of the event for subject $i$, who has covariate vector $x_i$. As in the TLR, $\pi(x_i)$ is given by:

$$\pi(x) = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p)} \qquad (6)$$

Now, the likelihood contribution from the $i$th subject is:

$$\text{likelihood}_i = \left[\frac{\exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi})}{1 + \exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi})}\right]^{y_i} \left[1 - \frac{\exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi})}{1 + \exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi})}\right]^{1 - y_i} \qquad (7)$$

Since subjects are assumed to be independent of each other, the likelihood function for $n$ subjects is given by:

$$\text{likelihood} = \prod_{i=1}^{n} \left[\frac{\exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi})}{1 + \exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi})}\right]^{y_i} \left[1 - \frac{\exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi})}{1 + \exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi})}\right]^{1 - y_i} \qquad (8)$$
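With $n = 5000$ borrowers, the product in Equation (8) underflows in double precision, so in practice it is evaluated on the log scale, where it becomes a sum. A minimal sketch, assuming the predictors are supplied as a plain list of rows (a hypothetical layout) and using $\log \pi = \eta - \log(1 + e^{\eta})$ and $\log(1 - \pi) = -\log(1 + e^{\eta})$ for the linear predictor $\eta$:

```python
import math
from typing import Sequence


def log_likelihood(beta: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Logarithm of Equation (8) for beta = (beta_0, beta_1, ..., beta_p).

    Each row of X holds (x_1i, ..., x_pi); beta[0] is the intercept.
    """
    total = 0.0
    for row, yi in zip(X, y):
        eta = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
        # softplus(eta) = log(1 + exp(eta)), written so that it cannot overflow.
        softplus = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
        # y_i * log(pi) + (1 - y_i) * log(1 - pi) simplifies to y_i * eta - softplus(eta).
        total += yi * eta - softplus
    return total
```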

2.5. Prior Distribution

Bayesian inference on the unknown parameters requires a prior distribution, which may be informative or non-informative, and choosing it is a vital step in the analysis. Informative prior distributions are employed when information about the values of the unknown parameters is available. However, if little or nothing is known about the unknown parameters, or if one wants to ensure that prior information does not play a large role in the analysis (that is, the data are allowed to dominate), non-informative priors are applied; this is sometimes known as objective Bayesian analysis. In this paper, the researchers chose non-informative priors for the regression coefficients because they had no prior knowledge about the parameters. In many statistical software packages, including R (version 4.5.2), a default normal or diffuse prior with large variance is often used when no specific prior information is provided, making it a common choice for general-purpose Bayesian modelling (Newman et al., 2025). This study used a diffuse (approximately flat) Normal (Gaussian) prior to estimate the regression coefficients. This prior is simpler than alternatives such as the Laplace or Cauchy priors (Gelman et al., 2019). The Gaussian prior distribution is given by:

$$P(\beta_j \mid \mu_j, \sigma_j^2) = \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left[-\frac{1}{2\sigma_j^2}(\beta_j - \mu_j)^2\right] \qquad (9)$$

In this paper, the default values for the hyperparameters $\mu_j$ and $\sigma_j^2$ are chosen as $\mu_j = 0$ and $\sigma_j = 1000$ (that is, $\sigma_j^2 = 10^6$), which is large enough to give non-informative priors and thereby allows the likelihood to dominate posterior inference (Pham et al., 2025). Although domain knowledge exists in credit risk modelling, this study adopts non-informative priors to avoid imposing subjective assumptions that may not be transferable across the heterogeneous lenders, borrower segments, and time periods represented in the NCR data. Given the diversity of institutions and loan products included in the sample, specifying informative priors could bias parameter estimates toward patterns observed in specific contexts rather than allowing the data to drive inference.

2.6. Posterior Distribution

Bayes' theorem combines the likelihood of the data with prior beliefs about the model parameters: the posterior distribution is obtained by multiplying the full likelihood function by the prior distribution, up to a normalizing constant that does not depend on the parameters. For the BLR with a Normal prior distribution, the posterior distribution of the unknown parameters is therefore proportional to (Bolarinwa et al., 2023):

$$\text{posterior} \propto \prod_{i=1}^{n} \left[\frac{\exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi})}{1 + \exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi})}\right]^{y_i} \left[1 - \frac{\exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi})}{1 + \exp(\beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi})}\right]^{1 - y_i} \times \prod_{j=0}^{p} \frac{1}{\sqrt{2\pi\sigma_j^2}} \exp\left[-\frac{1}{2\sigma_j^2}(\beta_j - \mu_j)^2\right] \qquad (10)$$
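On the log scale, Equation (10) is the log-likelihood plus the sum of the log prior densities from Equation (9), up to an additive constant. The sketch below uses hypothetical function names and a plain list of rows for the predictors; it also shows numerically why $\sigma_j = 1000$ is effectively non-informative, since moving a coefficient from 0 to 5 changes the log prior by only $25/(2 \times 10^6)$.

```python
import math
from typing import Sequence


def log_normal_prior(beta: Sequence[float], mu: float = 0.0, sigma: float = 1000.0) -> float:
    """Sum over coefficients of the log Normal(mu, sigma^2) density in Equation (9)."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * sigma ** 2) - (b - mu) ** 2 / (2.0 * sigma ** 2)
        for b in beta
    )


def log_posterior(beta: Sequence[float], X: Sequence[Sequence[float]], y: Sequence[int],
                  mu: float = 0.0, sigma: float = 1000.0) -> float:
    """Log of Equation (10) up to an additive constant: log-likelihood plus log-prior."""
    loglik = 0.0
    for row, yi in zip(X, y):
        eta = beta[0] + sum(b * x for b, x in zip(beta[1:], row))
        # y_i * eta - log(1 + exp(eta)), using an overflow-safe softplus.
        loglik += yi * eta - (max(eta, 0.0) + math.log1p(math.exp(-abs(eta))))
    return loglik + log_normal_prior(beta, mu, sigma)


# Moving one coefficient from 0 to 5 barely changes the diffuse prior.
print(log_normal_prior([5.0]) - log_normal_prior([0.0]))  # approximately -1.25e-05
```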

2.7. Markov Chain Monte Carlo (MCMC) Algorithm

Due to the complexity of the posterior distribution in Bayesian models, it is generally not analytically tractable. Markov Chain Monte Carlo (MCMC) methods are therefore used to estimate parameters by drawing samples from the posterior distribution. Common MCMC algorithms include the Gibbs sampler and the Metropolis-Hastings (MH) algorithm, which enable sampling from complex posterior distributions when direct sampling is infeasible. In this study, the Metropolis-Hastings algorithm was used to generate random samples from the underlying posterior distribution. The basic steps of this algorithm involve proposing candidate values and accepting or rejecting them based on the posterior density to approximate the marginal posterior distributions of model parameters (Li, 2021; Atchadé & Wang, 2023):

Step 1: Choose an initial value $θ_{0}$ for the parameters and set $j = 1$. The starting values can be obtained via MLE.
Step 2: Draw a candidate value $θ^{*} \sim q(θ \mid θ_{j-1})$ from a proposal distribution $q$.
Step 3: Compute the acceptance ratio $R = \min\left(1, \frac{p(θ^{*} \mid X, y)\, q(θ_{j-1} \mid θ^{*})}{p(θ_{j-1} \mid X, y)\, q(θ^{*} \mid θ_{j-1})}\right)$, where $p(\cdot \mid X, y)$ is the posterior density.
Step 4: Draw $u \sim U(0,1)$. If $u < R$, set $θ_{j} = θ^{*}$; otherwise, set $θ_{j} = θ_{j-1}$.
Step 5: Set $j = j + 1$ and repeat Steps 2 to 4 until enough draws are obtained.
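The steps above can be illustrated with a deliberately small example. This sketch is not the study's sampler: it assumes a symmetric Normal random-walk proposal (the proposal distribution used in the study is not stated), targets an intercept-only logistic model rather than the full BLR with bank intercepts, and uses hypothetical counts of 81 defaults among 100 borrowers.

```python
import math
import random
from typing import Callable


def log_post_intercept(b0: float, k: int, n: int, sigma: float = 1000.0) -> float:
    """Unnormalized log posterior of an intercept-only logistic model with
    k defaults among n borrowers and a Normal(0, sigma^2) prior on b0."""
    softplus = max(b0, 0.0) + math.log1p(math.exp(-abs(b0)))  # log(1 + e^b0)
    return k * b0 - n * softplus - b0 ** 2 / (2.0 * sigma ** 2)


def random_walk_metropolis(log_target: Callable[[float], float], theta0: float,
                           n_draws: int, step: float = 0.5, seed: int = 1):
    """Metropolis-Hastings with a symmetric Normal random-walk proposal.

    Because q(a | b) = q(b | a) for this proposal, the q terms cancel in R.
    Returns the list of draws and the acceptance rate.
    """
    rng = random.Random(seed)
    theta, lp = theta0, log_target(theta0)              # Step 1: initial value
    chain, accepted = [], 0
    for _ in range(n_draws):
        candidate = rng.gauss(theta, step)              # Step 2: propose theta*
        lp_cand = log_target(candidate)
        r = math.exp(min(0.0, lp_cand - lp))            # Step 3: R = min(1, ratio)
        if rng.random() < r:                            # Step 4: accept if u < R
            theta, lp = candidate, lp_cand
            accepted += 1
        chain.append(theta)                             # otherwise theta_j = theta_{j-1}
    return chain, accepted / n_draws


k, n = 81, 100  # hypothetical counts: 81 defaults among 100 borrowers
chain, acc_rate = random_walk_metropolis(lambda b: log_post_intercept(b, k, n), 0.0, 20000)
draws = chain[2000:]                                    # discard burn-in
posterior_mean = sum(draws) / len(draws)
print(round(posterior_mean, 2), round(acc_rate, 2))
```

With a symmetric proposal, the Metropolis-Hastings ratio reduces to the posterior ratio. Because the diffuse prior contributes almost nothing, the posterior mean of the intercept should land close to logit(0.81), about 1.45. The step size controls the acceptance rate; rates far outside roughly 0.2 to 0.5 usually mean the proposal scale needs tuning.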

2.8. Convergence Assessment Using the $\hat{R}$ Diagnostic

Convergence of the MCMC algorithm was evaluated using the Gelman–Rubin statistic ($\hat{R}$) (Gelman et al., 2019). The diagnostic compares the between-chain and within-chain variances across multiple parallel chains. Formally,

$$\hat{R} = \sqrt{\frac{\hat{V}}{W}}, \qquad (11)$$

where

$$\hat{V} = \frac{n - 1}{n} W + \frac{1}{n} B, \qquad (12)$$

$n$ is the number of iterations per chain, $W$ is the mean within-chain variance, and $B/n$ is the variance of the chain means; that is, with $m$ chains, chain means $\bar{\theta}_{\cdot k}$, and overall mean $\bar{\theta}_{\cdot\cdot}$, $B = \frac{n}{m - 1}\sum_{k=1}^{m}(\bar{\theta}_{\cdot k} - \bar{\theta}_{\cdot\cdot})^2$. Convergence is indicated when $\hat{R} \approx 1$, meaning the chains are sampling from the same posterior distribution. Values of $\hat{R} > 1.05$ signal insufficient mixing and the need for additional iterations or model re-specification, while values above 1.10 imply serious non-convergence (Vehtari et al., 2021). The $\hat{R}$ statistic therefore provides a rigorous and widely used measure for verifying the stability of Bayesian posterior estimates.
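Equations (11) and (12) translate directly into code. The sketch below computes the classic Gelman–Rubin $\hat{R}$ for one parameter from $m$ chains of equal length; `gelman_rubin_rhat` is a hypothetical name. Packages that follow Vehtari et al. (2021) report a rank-normalized split-$\hat{R}$, so their values can differ somewhat from this classic version.

```python
from statistics import fmean, variance
from typing import Sequence


def gelman_rubin_rhat(chains: Sequence[Sequence[float]]) -> float:
    """Classic potential scale reduction factor, Equations (11)-(12).

    chains: m >= 2 chains of equal length n >= 2 for a single parameter.
    """
    m, n = len(chains), len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    means = [fmean(c) for c in chains]
    w = fmean(variance(c) for c in chains)   # W: mean of the sample within-chain variances
    b = n * variance(means)                  # B = n/(m-1) * sum of squared deviations of chain means
    v_hat = (n - 1) / n * w + b / n
    return (v_hat / w) ** 0.5
```

With very short, well-mixed chains this classic estimator can fall slightly below 1, because $\hat{V}$ scales $W$ by $(n - 1)/n$; chains centred on different values push it well above 1.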

2.9. Model Evaluation Metrics and Classification Performance Assessment

To evaluate the predictive performance of the TLR and BLR models, classification performance was assessed using precision–recall (PR) analysis and average precision. Precision–recall curves illustrate the trade-off between precision (the proportion of correctly predicted defaulters among all predicted defaulters) and recall (the proportion of actual defaulters correctly identified) across different classification thresholds (Saito & Rehmsmeier, 2015). A model whose PR curve lies closer to the upper-right region of the plot demonstrates superior performance, as it achieves high precision and recall simultaneously. Precision–recall curves are particularly suitable for imbalanced datasets such as loan default data, where one outcome class substantially outnumbers the other; in typical loan portfolios, non-default cases dominate. Unlike receiver operating characteristic (ROC) curves, which can present overly optimistic performance by incorporating true negatives that dominate imbalanced data, PR curves focus exclusively on the positive class (here, defaulters) and therefore provide a more informative assessment of a model's ability to identify defaulters (Fischer & Wollstadt, 2024).

In addition, average precision (AP), defined as the area under the precision–recall curve (PR AUC), was used to summarize model performance across all classification thresholds. The AP value ranges from 0 to 1, with higher values indicating better model performance and stronger discriminative ability, whereas lower values indicate poor precision across recall levels (Richardson et al., 2024).
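Average precision can be computed directly from predicted probabilities without tracing the curve explicitly. This sketch uses the step-wise definition $\text{AP} = \sum_k (R_k - R_{k-1}) P_k$, where $P_k$ and $R_k$ are precision and recall at the $k$-th distinct threshold; this is the definition used by scikit-learn's `average_precision_score`, although the estimator used in the study's R code is not stated. Defaulters are coded 1, and `average_precision` is a hypothetical name.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve for the positive class (1)."""
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("average precision is undefined without positive cases")
    # Rank cases from highest to lowest predicted probability of default.
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    ap, prev_recall, i = 0.0, 0.0, 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Cases with tied scores enter together at the same threshold.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        precision = tp / (tp + fp)
        recall = tp / n_pos
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap
```

A classifier that ranks borrowers at random has an expected AP close to the share of positive cases, so when most borrowers are defaulters the uninformative baseline is correspondingly high rather than 0.5.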

2.10. Data Analysis

Both TLR and BLR models were compared, and the better-performing model was used to identify the key predictors of loan default. To account for limited temporal variation in macroeconomic indicators between 2018 and 2022, we explicitly modelled time by including year-level effects. Macroeconomic variables were specified at the annual level and assigned to borrowers based on the year of loan origination. This approach ensures that macroeconomic effects are estimated using between-year variation rather than between-borrower differences, thereby reducing the risk of ecological bias. Before fitting the models, we checked whether the continuous predictor variables were highly related to each other using variance inflation factor (VIF) statistics and correlation matrices. No strong relationships were found, and all VIF values were small, indicating that multicollinearity was not a problem in the analysis. The dataset was also checked for missing values across all variables; none were found, so no imputation was needed. Data analysis was carried out using R statistical software version 4.5.2. Statistical significance was determined at a p-value of less than 0.05.

3. Results

Among the 5000 borrowers included in the study, the overall median (interquartile range, IQR) credit score at loan origination was 601 (560–640). Borrowers who eventually defaulted had a lower median credit score at loan origination [594 (555–632)] than non-defaulting borrowers [633 (590–668)], indicating weaker initial credit profiles among defaulters. The median (IQR) bank governance score was 3.19 (3.11–3.22) and was identical across default and non-default groups, reflecting that governance indicators were measured at the bank level and therefore varied little across individual borrowers within a bank. Furthermore, the median (IQR) original interest rate was 6.25% (4.75–6.75), the median unemployment rate was 30.8% (29.1–33.9), and the median inflation rate was 4.4% (4.1–4.5). These indicators showed identical medians between default and non-default groups because they were defined at the national or bank–year level rather than at the individual borrower level. As such, they primarily capture shared economic conditions faced by borrowers at the time of loan origination and are not intended to explain within-group variation in default outcomes at the descriptive level. The largest share of borrowers came from Bank A (44.3%), followed by Bank B (36.1%) and Bank C (19.5%). Default rates were similar across banks: 81.0% (95% CI: 79.4–82.6) for Bank A, 80.7% (95% CI: 78.9–82.5) for Bank B, and 81.9% (95% CI: 79.4–84.3) for Bank C. In addition, default rates consistently exceeded non-default rates from 2018 to 2022 (Figure 1). The sample was relatively balanced by gender, with 49.7% females, 48.5% males, and 1.8% other. Default rates were comparable between females, 80.5% (95% CI: 79.0–82.1), and males, 81.7% (95% CI: 80.1–83.2) (Table 1).

Approximately 32.1% of borrowers were aged 26–35 years, followed by those aged 36–45 years (27.8%), 46–60 years (20.7%), 18–25 years (11.8%), and over 60 years (7.7%). Default rates ranged from 78.8% (95% CI: 76.3–81.3) in the 46–60 group to 84.1% (95% CI: 80.4–87.7) among those over 60 years. Borrowers were evenly distributed across all provinces of South Africa among the three selected banks. Among the loan products analyzed, personal loans accounted for the largest share of the sample (34.4%), followed by credit cards (25.2%), vehicle finance (20.0%), mortgages (10.5%), and store credit (9.9%). Default rates varied across product types and were highest for credit card and store credit products, both at 82.4% (95% CI: 80.3–84.5 and 79.0–85.7, respectively). In contrast, personal loans exhibited the lowest default rate at 79.7% (95% CI: 77.8–81.6). Income band analysis revealed an increasing trend in default rates with higher income, from 78.1% (95% CI: 75.5–80.6) in the <R5000 group to 86.9% (95% CI: 82.9–91.0) in the ≥R50,000 group. The sample was predominantly composed of African borrowers (62.5%). Default rates were broadly consistent across racial groups: 81.0% (95% CI: 79.7–82.4) for African borrowers, 80.1% (95% CI: 77.5–82.8) for Coloured borrowers, 82.1% (95% CI: 78.4–85.7) for Indian borrowers, and 81.9% (95% CI: 78.7–85.0) for White borrowers. These findings suggest minimal variation in default behaviour across racial categories within the sampled population. Education levels were distributed as follows: diploma (34.0%), bachelor's degree (30.6%), some college (16.5%), high school (9.9%), and postgraduate (9.0%). Default rates were highest among postgraduates, 83.3% (95% CI: 79.8–86.7), and lowest among those with some college, 80.3% (95% CI: 77.6–83.0). The sample comprised 40.4% married, 27.4% single, 22.2% divorced, and 10.0% widowed individuals. Default rates were comparable across marital status categories, with single borrowers recording the highest rate at 82.3% (95% CI: 80.3–84.3), followed by married 80.9% (95% CI: 79.2–82.6), divorced 80.3% (95% CI: 77.9–82.6), and widowed borrowers 79.8% (95% CI: 76.3–83.4). These results suggest that marital status had a limited differential impact on default behaviour within the study population.

Both the TLR and BLR models were fitted to the standardized dataset (continuous predictors standardized and categorical variables encoded as indicators relative to a reference category). The Bayesian model included a random intercept for each bank to capture institution-level heterogeneity in default risk, following modern modelling practices (McElreath, 2018; Gelman et al., 2020).

The precision–recall curves in Figure 2 compare the classification performance of the TLR and BLR models in predicting credit default. Both curves exhibit a similar declining pattern, indicating that the two models have comparable ability to distinguish between defaulters and non-defaulters across decision thresholds. However, the Bayesian model's curve lies consistently slightly above that of the traditional model, suggesting a modest improvement in classification performance. The average precision (AP) values are 0.9368 for the TLR model and 0.9381 for the BLR model. These high AP values indicate strong discriminative ability for both models, with the Bayesian model showing a marginal advantage of 0.0013. Since AP values closer to 1 indicate superior performance, both models perform well overall, and the Bayesian model separates defaulters from non-defaulters marginally better.

Importantly, precision–recall analysis focuses on the positive class (defaulters), which in this sample is also the majority class, making it suitable for this imbalanced classification problem. The BLR model consistently achieves slightly higher precision at equivalent recall levels, indicating improved identification of high-risk borrowers. This advantage is especially relevant in the South African banking context, where accurate credit risk assessment is critical for responsible lending and regulatory compliance. Overall, the results suggest that the BLR model is slightly better suited for credit default prediction in this imbalanced dataset.

The regression results for both models are presented in Table 2. All continuous predictors were standardized to have a mean of zero and a standard deviation of one before fitting the BLR model. Consequently, the reported odds ratios (ORs) represent the multiplicative change in the odds of default associated with a one-standard-deviation increase in each predictor rather than a one-unit increase on the original scale. All model parameters showed good convergence ($\hat{R} = 1$), which satisfies current recommendations for robust Bayesian computation (Vehtari et al., 2021) and indicates that the posterior estimates are stable. The Bayesian framework can further reduce potential instability arising from multicollinearity through partial pooling. Since the BLR outperformed the TLR, results from the BLR were used for interpretation (Table 2).

The BLR results indicate that several factors are significantly associated with the likelihood of loan default. Loan term was positively associated with default risk: a one-standard-deviation increase in loan term was associated with a 2.32-fold increase in the odds of default (OR = 2.32, 95% credible interval [CrI]: 2.15–2.52), indicating that longer loan durations are associated with a higher probability of default, holding other variables constant. Credit score at loan origination was inversely associated with default; a one-standard-deviation increase in credit score was associated with a 53% reduction in the odds of default (OR = 0.47, 95% CrI: 0.44–0.51), suggesting that higher creditworthiness substantially lowers default risk. The inflation rate was positively associated with default, with a one-standard-deviation increase in inflation corresponding to a 16% increase in the odds of default (OR = 1.16, 95% CrI: 1.06–1.27). For loan product type, borrowers with personal loans had lower odds of default than those with credit cards (OR = 0.79, 95% CrI: 0.64–0.96). Income band was also significantly associated with default: borrowers earning between R5000 and R10,000 had 2.03 times higher odds of default (OR = 2.03, 95% CrI: 1.35–3.11), and those earning between R20,000 and R50,000 had 1.75 times higher odds (OR = 1.75, 95% CrI: 1.29–2.34), than borrowers earning less than R5000 (Table 2). Overall, these findings indicate that borrower characteristics (credit score and income), loan characteristics (loan term and product type), and macroeconomic conditions (inflation) all play a significant role in determining default risk, with the BLR model providing stable and interpretable estimates.
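Because the BLR coefficients are on the log-odds scale, each odds ratio in Table 2 is $\exp(\beta)$ for a standardized predictor, and the percentage change in odds is $(\text{OR} - 1) \times 100$. The sketch below assumes access to posterior draws of a single coefficient (a hypothetical input) and summarizes them with a posterior median and a 95% equal-tailed credible interval, which is one common choice; the study does not state whether equal-tailed or highest-density intervals were reported.

```python
import math
from statistics import median, quantiles
from typing import Sequence


def odds_ratio_summary(beta_draws: Sequence[float]) -> tuple[float, float, float]:
    """Posterior median odds ratio and 95% equal-tailed credible interval.

    beta_draws are posterior draws of one standardized log-odds coefficient.
    exp() is monotone, so quantiles of the OR are exp() of quantiles of beta.
    """
    cuts = quantiles(beta_draws, n=40)   # 39 cut points at 2.5%, 5%, ..., 97.5%
    return math.exp(median(beta_draws)), math.exp(cuts[0]), math.exp(cuts[-1])


def percent_change_in_odds(odds_ratio: float) -> float:
    """Percentage change in the odds per one-SD increase: (OR - 1) x 100."""
    return (odds_ratio - 1.0) * 100.0
```

Applied to the estimates above, OR = 2.32 corresponds to a 132% increase in the odds, OR = 0.47 to a 53% reduction, and OR = 1.16 to a 16% increase.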

The calibration plot in Figure 3 shows that the model’s predicted probabilities closely match the observed default rates across the 10 quantile groups. The calibration curve stays near the 45-degree reference line, indicating good agreement between predicted and observed probabilities. There is a small upward deviation in the middle probability range, suggesting a slight underestimation of default risk in that range, but this difference is minimal. Overall, the model shows good calibration, especially in the higher-risk groups where predicted probabilities closely reflect actual default outcomes.
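A calibration table of the kind summarized in Figure 3 can be built by sorting borrowers by predicted probability, splitting them into 10 groups of roughly equal size, and comparing the mean predicted probability with the observed default rate in each group. The sketch below assumes equal-count groups (the exact binning used for Figure 3 is not stated), and `calibration_table` is a hypothetical name.

```python
from statistics import fmean
from typing import Sequence


def calibration_table(p_hat: Sequence[float], y: Sequence[int], groups: int = 10):
    """(mean predicted probability, observed default rate) for each quantile group.

    Borrowers are sorted by predicted probability and split into `groups`
    groups of (nearly) equal size.
    """
    if len(p_hat) != len(y) or len(p_hat) < groups:
        raise ValueError("need matching inputs with at least one borrower per group")
    order = sorted(range(len(p_hat)), key=lambda i: p_hat[i])
    n = len(order)
    table = []
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        table.append((fmean(p_hat[i] for i in idx), fmean(y[i] for i in idx)))
    return table
```

Plotting the observed rate against the mean predicted probability for each group gives the calibration curve; points above the 45-degree line mark groups whose default risk is underestimated.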

Results presented in Table 3 show the percentage of borrowers classified as being at risk of default using a predicted default probability threshold of 0.5 to define high risk. Overall, the Bayesian model identified approximately 94.5% of borrowers as high risk. The model further classified about 96.0% of borrowers with store credit loans, 97.6% of borrowers in the R50,000 and above income band, and 95.4% of borrowers from Bank B as being at high risk of loan default.
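The classification in Table 3 amounts to thresholding each borrower's predicted default probability and tabulating the share at or above the cut-off within each subgroup. A minimal sketch, with a hypothetical function name and the assumption that a probability exactly equal to 0.5 counts as high risk:

```python
from collections import defaultdict
from typing import Hashable, Sequence


def high_risk_share(p_hat: Sequence[float], group: Sequence[Hashable], threshold: float = 0.5):
    """Percentage of borrowers classified as high risk, overall and per group.

    Assumption: a predicted probability at or above `threshold` counts as high risk.
    """
    if not p_hat or len(p_hat) != len(group):
        raise ValueError("need one group label per predicted probability")
    counts = defaultdict(lambda: [0, 0])   # group label -> [n_high_risk, n_total]
    for p, g in zip(p_hat, group):
        counts[g][0] += int(p >= threshold)
        counts[g][1] += 1
    by_group = {g: 100.0 * high / total for g, (high, total) in counts.items()}
    overall = 100.0 * sum(p >= threshold for p in p_hat) / len(p_hat)
    return overall, by_group
```

The `group` labels can be any subgrouping, such as product type, income band, or bank.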

4. Discussion

This study analyzed a large dataset of 5000 borrowers from three selected banks across all provinces of South Africa to identify key predictors of loan default and to identify borrowers at high risk of default. The BLR model was preferred because it identified loan defaults marginally better than the TLR model: its average precision was 0.9381, slightly higher than that of the TLR model (0.9368), in line with the findings of Tham et al. (2023). The proportion of loan default in the sample was 81.1% (95% CI: 80.0–82.2), consistent with the high levels of consumer credit stress and elevated default behaviour reported in South African credit markets, where defaults remain a major concern for lenders (TransUnion, 2025). This high default rate should be interpreted in light of the behavioural definition of default applied here. Customers were considered to have defaulted after missing three consecutive loan payments, which captures early signs of financial stress rather than legal default or loan write-off. This approach is widely used in credit risk research and practice, particularly in short-term consumer lending and microfinance, and typically yields higher observed default rates than the portfolio-wide rates reported by lenders (Siddiqi, 2012). The default rate in the NCR data also appears high because the data include both bank and non-bank lenders, such as unsecured and short-term credit providers, which typically show higher default levels. In conventional credit risk research and industry practice, default rates for broad consumer loan portfolios are much lower because they rest on standard definitions, such as delinquency of more than 90 days or a charge-off. Delinquency beyond 90 days is the usual operational definition of default in empirical consumer credit research (Kim et al., 2018), and recent 2025 Federal Reserve data on U.S. consumer loans report delinquency rates of around 2.7% (Adams et al., 2025). Portfolio-level default rates under such standard definitions are therefore not directly comparable to the high rate observed in this study, which is based on three consecutive missed payments.
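Average precision (AP) summarises the precision–recall curve in a single number. The sketch below implements the common step-wise definition, AP = sum over thresholds of (recall_k - recall_(k-1)) * precision_k with tied scores grouped, which is the definition used by libraries such as scikit-learn's `average_precision_score`; the paper does not state which implementation it used, and the example labels and scores are made up. It also assumes default is treated as the positive class.

```python
def average_precision(y_true, scores):
    """Step-wise average precision: sum over thresholds of
    (recall_k - recall_{k-1}) * precision_k, with tied scores grouped."""
    pairs = sorted(zip(scores, y_true), key=lambda t: -t[0])
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("AP is undefined without positive cases")
    tp = fp = 0
    prev_recall = ap = 0.0
    i = 0
    while i < len(pairs):
        score = pairs[i][0]
        # Move the threshold past every case sharing this score.
        while i < len(pairs) and pairs[i][0] == score:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        ap += (recall - prev_recall) * (tp / (tp + fp))
        prev_recall = recall
    return ap


print(average_precision([1, 0, 1, 1], [0.9, 0.8, 0.7, 0.6]))  # 29/36, about 0.8056
```

Because a classifier that ranks borrowers at random has an expected AP equal to the prevalence of the positive class, the 81.1% default rate gives a baseline of roughly 0.81 against which AP values of 0.9368 and 0.9381 can be read.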

Using a predicted default probability threshold of 0.5, approximately 94.5% of borrowers were classified as being at high risk of loan default. The BLR model identified loan term, credit score at origination, inflation rate, product type (personal loan), and income bands of R5000–R10,000 and R20,000–R50,000 as significant predictors of loan default.

Longer loan terms were positively associated with default risk, consistent with evidence that extended repayment periods increase exposure to macroeconomic shocks and weaken repayment discipline (Tham et al., 2023). Higher credit scores at origination were strongly associated with lower default probability, reaffirming their role in risk-based pricing and underwriting (Kimetto, 2023). Inflation emerged as a significant macroeconomic predictor of default, whereas interest rates and unemployment showed no notable effects, likely because of their limited variation over the study period and contractual loan features (Saliba et al., 2023).

Income effects were non-linear: relative to borrowers earning less than R5000, those in the R5000–R10,000 and R20,000–R50,000 bands had higher odds of default, whereas the R10,000–R20,000 band did not differ significantly. This pattern reflects affordability thresholds and exposure effects and is consistent with global evidence on middle-income vulnerability (Adams et al., 2025). Product type influenced risk, with unsecured credit and revolving facilities exhibiting higher default rates than structured loans, although these differences diminished after controlling for borrower characteristics and loan terms (Bhandary & Ghosh, 2025).
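The non-linear income pattern is possible because income band enters the model as a categorical variable with <R5000 as the reference (Table 2): each band receives its own coefficient, so the fitted effects need not rise or fall monotonically with income. A minimal sketch of this treatment (dummy) coding follows; the band labels and function names are hypothetical, and the odds ratios are the BLR point estimates from Table 2.

```python
import math

# Hypothetical labels for the Table 2 income bands; "<5000" is the reference level.
LEVELS = ["<5000", "5000-<10000", "10000-<20000", "20000-<50000", ">=50000"]
REFERENCE = "<5000"


def treatment_code(band):
    """One 0/1 indicator per non-reference band (all zeros for the reference)."""
    if band not in LEVELS:
        raise ValueError(f"unknown income band: {band}")
    return [int(band == level) for level in LEVELS if level != REFERENCE]


def odds_ratio_vs_reference(band, log_or_coefs):
    """Odds of default for `band` relative to the reference band."""
    linear = sum(b * d for b, d in zip(log_or_coefs, treatment_code(band)))
    return math.exp(linear)


# BLR point estimates from Table 2, converted to the log-odds scale.
coefs = [math.log(v) for v in (2.03, 1.22, 1.75, 1.16)]
for band in LEVELS:
    print(band, round(odds_ratio_vs_reference(band, coefs), 2))
```

The implied odds ratios (1.00, 2.03, 1.22, 1.75, 1.16) rise, fall, and rise again across bands, a shape that a single linear income term could not represent.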

The Bayesian model provided more stable estimates than the traditional approach (for example, a far narrower interval for the intercept; Table 2) and can capture borrower- and bank-level heterogeneity, consistent with evidence that Bayesian models outperform frequentist models in complex, volatile environments (Moolchandani, 2024). These findings highlight the value of Bayesian modelling for South African credit portfolios and support its adoption as a robust framework for risk analytics and management.

Study Strengths and Limitations

The study employed a BLR model, which allows simultaneous modelling of borrower- and institution-level variability, yielding more robust estimates of default risk than TLR approaches. The analysis was based on a large and diverse dataset that captures a wide range of borrower profiles and loan characteristics, enhancing the generalisability of the findings to the South African credit market. By incorporating macroeconomic variables such as inflation, interest rates, and unemployment, the study accounts for broader economic conditions that influence borrower repayment behaviour. However, the study also has limitations. Because macroeconomic variables were measured at the year level and varied little over the study period, the precision of their estimated effects may be limited; these effects should therefore be interpreted as reflecting broad economic conditions rather than individual borrower behaviour. Although the dataset is extensive, it may not capture all relevant borrower behaviours, such as informal credit usage, and may therefore underestimate some risk factors. In addition, although the BLR framework accounts for institutional structure through institution-level random effects, the full likelihood and detailed prior specifications for these random effects were not formally presented, and posterior summaries of the random-effect variance were not reported. The results may therefore be sensitive to modelling assumptions about prior choice and variance magnitude. Furthermore, priors with very large variance can weaken parameter identification or cause computational problems, and without a formal prior sensitivity analysis, the robustness of the results to alternative priors could not be fully evaluated. Finally, classification performance was evaluated at a fixed probability threshold rather than with economic or cost-sensitive loss functions, so the financial implications of misclassification were not quantified; these should be considered in future research.
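A prior sensitivity analysis of the kind described above can be sketched by refitting the same model under progressively vaguer priors and comparing posterior summaries. The sketch below assumes independent Normal(0, prior_sd²) priors on the coefficients and a random-walk Metropolis–Hastings sampler (MH appears among this paper's abbreviations, but the priors, step size, iteration counts, burn-in, and synthetic data here are illustrative assumptions, not the authors' settings, and the sketch omits the institution-level random effects).

```python
import math
import random


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def log_posterior(beta, X, y, prior_sd):
    """Bernoulli-logit log-likelihood plus independent Normal(0, prior_sd^2)
    log-prior (additive constants dropped)."""
    loglik = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        loglik += yi * eta - softplus(eta)
    return loglik - 0.5 * sum((b / prior_sd) ** 2 for b in beta)


def metropolis_logistic(X, y, prior_sd, n_iter=3000, step=0.1, seed=0):
    """Random-walk Metropolis-Hastings; returns draws and the acceptance rate."""
    rng = random.Random(seed)
    beta = [0.0] * len(X[0])
    current = log_posterior(beta, X, y, prior_sd)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        candidate = log_posterior(proposal, X, y, prior_sd)
        # Symmetric proposal, so the acceptance ratio is the posterior ratio.
        if rng.random() < math.exp(min(0.0, candidate - current)):
            beta, current = proposal, candidate
            accepted += 1
        draws.append(beta)
    return draws, accepted / n_iter


def posterior_means(draws, burn_in):
    kept = draws[burn_in:]
    return [sum(d[j] for d in kept) / len(kept) for j in range(len(kept[0]))]


# Synthetic data: intercept 1.0, slope 0.8 on a standardized predictor.
data_rng = random.Random(42)
X = [[1.0, data_rng.gauss(0.0, 1.0)] for _ in range(200)]
y = [1 if data_rng.random() < 1.0 / (1.0 + math.exp(-(1.0 + 0.8 * x[1]))) else 0
     for x in X]

# Prior sensitivity: refit under progressively vaguer priors and compare.
results = {}
for prior_sd in (1.0, 10.0, 100.0):
    draws, rate = metropolis_logistic(X, y, prior_sd)
    results[prior_sd] = posterior_means(draws, burn_in=1000)
    print(prior_sd, [round(m, 2) for m in results[prior_sd]], round(rate, 2))
```

If the posterior means barely move across prior scales, the data dominate the prior; large shifts would signal the prior dependence that the limitations paragraph warns about.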

5. Conclusions

This study shows that BLR is a valuable tool for credit risk modelling in South Africa's banking sector. The BLR framework also uncovers patterns across income groups and product types, highlighting higher risk among certain middle-income brackets and lower odds of default for personal loans than for credit cards. These findings emphasize the need to consider borrower diversity, product design, and affordability in risk assessment. Based on these results, banks should consider adopting Bayesian methods in credit risk management, monitor loan terms and credit scores closely, and stress-test borrowers' repayment capacity under economic shocks. Lending policies should account for income-related vulnerabilities and product-specific risk differences, and models should be routinely checked for calibration and MCMC convergence. Future research should examine behavioural, transactional, and alternative data sources to further improve predictive accuracy.

Author Contributions

Conceptualization, J.L.M.; methodology, J.L.M.; software, J.L.M.; validation, J.L.M., S.V.M., A.A. and R.C.; formal analysis, J.L.M.; investigation, J.L.M.; resources, J.L.M.; data curation, J.L.M.; writing—original draft preparation, J.L.M.; writing—review and editing, J.L.M., S.V.M., A.A. and R.C.; visualization, J.L.M.; supervision, S.V.M., A.A. and R.C.; project administration, J.L.M.; funding acquisition, South African Reserve Bank. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the South African Reserve Bank. No grant number is associated with this funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset used in this study is not publicly accessible but can be provided to reviewers upon request. Additional materials, code, and statistical analyses are available from the corresponding author upon reasonable request, and all analyses can be fully reproduced with the code used in this study. The analysis used anonymized National Credit Regulator (NCR) records for borrowers from three South African banks (2018–2022). No personal or identifiable information was used, and all data were supplied in anonymized form under access conditions that prevent public release.

Acknowledgments

We express our appreciation to the South African Reserve Bank for funding this study.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

AP	Average Precision
BLR	Bayesian Logistic Regression
CI	Confidence Interval/Credible Interval
IQR	Interquartile Range
MCMC	Markov Chain Monte Carlo
MH	Metropolis–Hastings
MLE	Maximum Likelihood Estimation
NCR	National Credit Regulator
PR	Precision–Recall
$\hat{R}$ (Rhat)	Gelman–Rubin Convergence Diagnostic
SE	Standard Error
TLR	Traditional Logistic Regression

References

Adams, R., Barnes, C., Bopst, C., & Sommer, K. (2025). A note on recent dynamics of consumer delinquency rates. Board of Governors of the Federal Reserve System.
Agresti, A. (2015). Foundations of linear and generalized linear models. John Wiley & Sons. Available online: https://www.oreilly.com/library/view/foundations-of-linear/9781118730058/ (accessed on 3 September 2025).
Ahmed, H. M., El-Halaby, S. I., & Soliman, H. A. (2022). The consequence of the credit risk on the financial performance in light of COVID-19: Evidence from Islamic versus conventional banks across MEA region. Future Business Journal, 8(1), 21.
Atchadé, Y., & Wang, L. (2023). A fast asynchronous Markov chain Monte Carlo sampler for sparse Bayesian inference. Journal of the Royal Statistical Society Series B: Statistical Methodology, 85(5), 1492–1516.
Aydin, S. (2021). Bayesian logistic regression inference: Posterior distribution based on prior and likelihood. BMC Public Health, 21, 10674.
Bhandary, R., & Ghosh, B. K. (2025). Credit card default prediction: An empirical analysis on predictive performance using statistical and machine learning methods. Journal of Risk and Financial Management, 18(1), 23.
Bolarinwa, F. A., Makinde, O. S., & Fasoranbaku, O. A. (2023). A new Bayesian ridge estimator for logistic regression in the presence of multicollinearity. World Journal of Advanced Research and Reviews, 20(3), 458–465.
Butt, U., & Chamberlain, T. (2025). Performance of Islamic banks during the COVID-19 pandemic: An empirical analysis and comparison with conventional banking. Journal of Risk and Financial Management, 18(6), 308.
Chen, L., & Nandram, B. (2023). Bayesian logistic regression model for sub-areas. Stats, 6(1), 209–231.
Dey, D., Haque, M. S., Islam, M. M., Aishi, U. I., Shammy, S. S., & Mayen, M. S. A. (2025). The proper application of logistic regression model in complex survey data: A systematic review. BMC Medical Research Methodology, 25, 15.
Fischer, L., & Wollstadt, P. (2024). Precision and recall reject curves for classification. arXiv, arXiv:2308.08381.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2019). Bayesian data analysis (3rd ed.). Chapman & Hall/CRC. Available online: https://www.routledge.com/Bayesian-Data-Analysis/Gelman-Carlin-Stern-Dunson-Vehtari-Rubin/p/book/9780429113079 (accessed on 27 July 2025).
Gelman, A., Vehtari, A., Simpson, D., Margossian, C. C., Carpenter, B., Yao, Y., Kennedy, L., Gabry, J., Bürkner, P.-C., & Modrák, M. (2020). Bayesian workflow. arXiv, arXiv:2011.01808.
Global Credit Data. (2020). Downturn LGD study 2020. Available online: https://globalcreditdata.org (accessed on 17 August 2025).
Hassan, M. M. (2020). A fully Bayesian logistic regression model for classification of ZADA diabetes dataset. Science Journal of University of Zakho, 8(3), 105–111.
Hosmer, D. W., Jr., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression. John Wiley & Sons.
International Monetary Fund (IMF). (2025). Global financial stability report, April 2025: Enhancing resilience amid uncertainty. Available online: https://www.imf.org/en/Publications/GFSR/Issues/2025/04/22/global-financial-stability-report-april-2025 (accessed on 10 October 2025).
Kim, H., Cho, H., & Ryu, D. (2018). An empirical study on credit card loan delinquency. Economic Systems, 42(3), 437–449.
Kimetto, G. J. (2023). Adapting a developed market credit risk model for the understanding and estimation of consumer credit losses in South Africa [Doctoral dissertation, University of South Africa].
Kyeong, S., & Shin, J. (2022). Two-stage credit scoring using Bayesian approach. Journal of Big Data, 9, 106.
Lawrence, B., Doorasamy, M., & Sarpong, P. (2024). The impact of credit risk on performance: A case of South African commercial banks. Global Business Review, 25, S151–S164.
Lewis, R. M., & Battey, H. S. (2024). On inference in high-dimensional logistic regression models with separated data. Biometrika, 111(3), 989–1011.
Li, Z. (2021). A review of Bayesian posterior distribution based on MCMC methods. In Computing and data science (pp. 204–213). Springer.
Loredo, T. J., & Wolpert, R. L. (2024). Bayesian inference: More than Bayes's theorem. Frontiers in Astronomy and Space Sciences, 11, 1326926.
McElreath, R. (2018). Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman and Hall/CRC.
Moody's. (2025). Credit risk insights. Available online: https://www.moodys.com (accessed on 13 November 2025).
Moolchandani, S. (2024). Exploring Bayesian hierarchical models for multi-level credit risk assessment: Detailed insights. International Journal of Computer Science & Information Technology, 16(3), 67–74.
Newman, K. B., Villa, C., & King, R. (2025). Logistic regression models: Practical induced prior specification. arXiv, arXiv:2501.18106.
Pham, H. T., Pham, H., & Siong Yow, K. (2025). Applying non-informative G-prior for logistic regression models with different patterns of data points. Monte Carlo Methods and Applications, 31(4), 343–356.
Principa. (2025). Leveraging Bayesian models for financial inclusion in South Africa. Principa Insights. Available online: https://principa.co.za/how-to-use-alternative-data-to-improve-credit-risk-models-in-south-africa/ (accessed on 5 September 2025).
Richardson, E., Trevizani, R., Greenbaum, J. A., Carter, H., & Nielsen, M. (2024). The receiver operating characteristic curve accurately assesses imbalanced datasets and interprets precision–recall behaviour. Patterns, 5, 100994.
Saito, T., & Rehmsmeier, M. (2015). The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE, 10(3), e0118432.
Saliba, C., Farmanesh, P., & Athari, S. A. (2023). Does country risk impact the banking sectors' non-performing loans? Evidence from BRICS emerging economies. Financial Innovation, 9(1), 86.
Seitshiro, M. B., & Govender, S. (2024). Credit risk prediction with and without weights of evidence using quantitative learning models. Cogent Economics & Finance, 12(1), 2338971.
Siddiqi, N. (2012). Credit risk scorecards: Developing and implementing intelligent credit scoring. John Wiley & Sons. Available online: https://onlinelibrary.wiley.com/doi/book/10.1002/9781119201731?msockid=244f0f0cb968683e351f1a84b8a16920 (accessed on 19 August 2025).
S&P Global. (2025). Global credit outlook 2025. Available online: https://www.spglobal.com (accessed on 7 November 2025).
Tham, A. W., Kakamu, K., & Liu, S. (2023). Bayesian statistics for loan default. Journal of Risk and Financial Management, 16(3), 203.
TransUnion. (2025). Industry insights report: South African consumer credit and delinquency trends. TransUnion South Africa. Available online: https://transunion.co.za (accessed on 1 January 2026).
Vehtari, A., Gelman, A., Simpson, D., Carpenter, B., & Bürkner, P.-C. (2021). Rank normalization, folding, and localization: An improved $\hat{R}$ for assessing convergence of MCMC (with discussion). Bayesian Analysis, 16(2), 667–718.

Figure 1. Percentage of default vs. non-default events by year (2018–2022).

Figure 2. Comparison of Precision–Recall Performance Between Traditional and Bayesian Logistic Regression Models.

Figure 3. Calibration plot of predicted default probabilities using 10 quantile bins.

Table 1. Descriptive statistics of borrower and loan characteristics (n = 5000).

Variable	Category	Total, n (%)	Default, % (95% CI)	Not Default, % (95% CI)
Credit score at origin, median (IQR)		601 (560–640)	594 (555–632)	633 (590–668)
Bank governance, median (IQR)		3.19 (3.11–3.22)	3.19 (3.11–3.22)	3.19 (3.11–3.22)
Interest rate at origin, median (IQR)		6.25 (4.75–6.75)	6.25 (4.75–6.75)	6.25 (4.75–6.75)
Unemployment rate at origin, median (IQR)		30.8 (29.1–33.9)	30.8 (29.1–33.9)	30.8 (29.1–33.9)
Inflation rate at origin, median (IQR)		4.4 (4.1–4.5)	4.4 (4.1–4.5)	4.4 (4.1–4.5)
Interest rate at event, median (IQR)		6.25 (4.75–6.75)	6.25 (4.75–6.75)	6.25 (4.75–6.75)
Unemployment rate at event, median (IQR)		30.8 (29.1–33.9)	30.8 (29.1–33.9)	30.8 (29.1–33.9)
Inflation rate at event, median (IQR)		4.4 (4.1–4.5)	4.4 (4.1–4.5)	4.4 (4.1–4.5)
Bank	Bank A	2217 (44.3)	81.0 (79.4–82.6)	19.0 (17.4–20.6)
	Bank B	1807 (36.1)	80.7 (78.9–82.5)	19.3 (17.5–21.1)
	Bank C	976 (19.5)	81.9 (79.4–84.3)	18.1 (15.7–20.6)
Gender	Female	2487 (49.7)	80.5 (79.0–82.1)	19.5 (17.9–21.0)
	Male	2424 (48.5)	81.7 (80.1–83.2)	18.3 (16.8–19.9)
	Other	89 (1.8)	78.7 (70.1–87.2)	21.3 (12.8–29.9)
Age group (years)	18–25	588 (11.8)	80.3 (77.1–83.5)	19.7 (16.5–22.9)
	26–35	1605 (32.1)	82.7 (80.9–84.6)	17.3 (15.4–19.1)
	36–45	1391 (27.8)	80.3 (78.2–82.4)	19.7 (17.6–21.8)
	46–59	1033 (20.7)	78.8 (76.3–81.3)	21.2 (18.7–23.7)
	60+	383 (7.7)	84.1 (80.4–87.7)	15.9 (12.3–19.6)
Product type	Credit Card	1262 (25.2)	82.4 (80.3–84.5)	17.6 (15.5–19.7)
	Mortgage	525 (10.5)	81.7 (78.4–85.0)	18.3 (15.0–21.6)
	Personal Loan	1718 (34.4)	79.7 (77.8–81.6)	20.3 (18.4–22.2)
	Store Credit	493 (9.9)	82.4 (79.0–85.7)	17.6 (14.3–21.0)
	Vehicle Finance	1002 (20.0)	80.7 (78.3–83.2)	19.3 (16.8–21.7)
Income band (rands)	<5000	1030 (20.6)	78.1 (75.5–80.6)	21.9 (19.4–24.5)
	5000–<10,000	1713 (34.3)	80.6 (78.7–82.4)	19.4 (17.6–21.3)
	10,000–<20,000	1261 (25.2)	80.6 (78.4–82.8)	19.4 (17.2–21.6)
	20,000–<50,000	728 (14.6)	85.2 (82.6–87.7)	14.8 (12.3–17.4)
	≥50,000	268 (5.4)	86.9 (82.9–91.0)	13.1 (9.0–17.1)
Race	African	3127 (62.5)	81.0 (79.7–82.4)	19.0 (17.6–20.3)
	Coloured	870 (17.4)	80.1 (77.5–82.8)	19.9 (17.2–22.5)
	Indian	424 (8.5)	82.1 (78.4–85.7)	17.9 (14.3–21.6)
	White	579 (11.6)	81.9 (78.7–85.0)	18.1 (15.0–21.3)
Education level	High School	497 (9.9)	82.3 (78.9–85.6)	17.7 (14.4–21.1)
	Some college	827 (16.5)	80.3 (77.6–83.0)	19.7 (17.0–22.4)
	Diploma	1698 (34.0)	80.8 (78.9–82.7)	19.2 (17.3–21.1)
	Bachelor	1529 (30.6)	80.7 (78.7–82.7)	19.3 (17.3–21.3)
	Postgraduate	449 (9.0)	83.3 (79.8–86.7)	16.7 (13.3–20.2)
Marital status	Single	1369 (27.4)	82.3 (80.3–84.3)	17.7 (15.7–19.7)
	Married	2019 (40.4)	80.9 (79.2–82.6)	19.1 (17.4–20.8)
	Divorced	1111 (22.2)	80.3 (77.9–82.6)	19.7 (17.4–22.1)
	Widowed	501 (10.0)	79.8 (76.3–83.4)	20.2 (16.6–23.7)

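n = number, % = percentage, CI = confidence interval, IQR = interquartile range. Default and not-default percentages are row percentages within each category; for continuous variables, all three columns report the median (IQR).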
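The 95% confidence intervals in Table 1, and the overall default rate of 81.1% (95% CI: 80.0–82.2) reported in the Discussion, are consistent with the normal-approximation (Wald) interval for a proportion, p ± 1.96·sqrt(p(1 − p)/n). The paper does not name its interval method, so the sketch below is an assumption that happens to reproduce the reported values.

```python
import math


def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return p - half_width, p + half_width


# Overall default rate reported in the Discussion: 81.1% of 5000 borrowers.
lo, hi = wald_ci(0.811, 5000)
print(f"{100 * lo:.1f}-{100 * hi:.1f}")  # 80.0-82.2

# Bank A row of Table 1: 81.0% default among 2217 borrowers.
lo_a, hi_a = wald_ci(0.810, 2217)
print(f"{100 * lo_a:.1f}-{100 * hi_a:.1f}")  # 79.4-82.6
```

For small categories or rates close to 0% or 100%, a Wilson interval is usually preferred because the Wald interval can undercover or extend beyond [0, 1].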

Table 2. Traditional vs. Bayesian Logistic Regression.

Variable		Traditional Logistic Regression		Bayesian Logistic Regression
Variable		SE	OR (95% CI)	SE	OR (95% CI)	$\hat{R}$ (Rhat)
Intercept		5.17	79.18 (0–1,944,304.75)	2.77	1.70 (0.01–380.79)	1
Term of loan (months)		4.62 × 10⁻³	1.10 (1.09–1.11) *	0.04	2.32 (2.15–2.52) *	1
Loan amount (rands)		6.44 × 10⁻⁷	1.00 (1.00–1.00)	0.04	0.95 (0.88–1.02)	1
Credit Score at origin		7.29 × 10⁻⁴	0.99 (0.99–0.99) *	0.04	0.47 (0.44–0.51) *	1
Bank governance at origin		1.51	3.60 (0.19–70.38)	0.86	1.48 (0.28–7.94)	1
Interest rate		0.10	0.89 (0.73–1.08)	0.12	0.87 (0.68–1.11)	1
Unemployment rate		0.05	0.95 (0.87–1.04)	0.13	0.88 (0.69–1.13)	1
Inflation rate		0.04	1.13 (1.05–1.22) *	0.05	1.16 (1.06–1.27) *	1
Bank	A (Ref.)
	B	0.17	1.14 (0.82–1.59)	0.12	1.05 (0.82–1.33)	1
	C	0.13	1.00 (0.78–1.28)	0.12	1.04 (0.83–1.30)	1
Gender	Female (Ref.)
	Male	0.08	1.10 (0.94–1.29)	0.08	1.11 (0.94–1.29)	1
	Other	0.30	0.70 (0.40–1.29)	0.29	0.73 (0.43–1.32)	1
Age group (years)	18–25 (Ref.)
	26–35	0.19	1.26 (0.87–1.83)	0.20	1.30 (0.87–1.91)	1
	36–45	0.26	0.94 (0.57–1.54)	0.27	1.01 (0.59–1.66)	1
	46–59	0.26	0.95 (0.58–1.57)	0.27	1.03 (0.60–1.70)	1
	60 and above	0.26	1.57 (0.94–2.63)	0.26	1.56 (0.93–2.61)	1
Product type	Credit Card (Ref.)
	Mortgage	0.15	0.89 (0.67–1.19)	0.14	0.89 (0.68–1.19)	1
	Personal loan	0.10	0.78 (0.64–0.96) *	0.10	0.79 (0.64–0.96) *	1
	Store credit	0.15	1.12 (0.83–1.52)	0.15	1.13 (0.83–1.52)	1
	Vehicle finance	0.12	0.87 (0.69–1.11)	0.12	0.89 (0.71–1.13)	1
Income band (rands)	<5000 (Ref.)
	5000–<10,000	0.11	1.17 (0.95–1.44)	0.21	2.03 (1.35–3.11) *	1
	10,000–<20,000	0.11	1.19 (0.95–1.49)	0.13	1.22 (0.94–1.56)	1
	20,000–<50,000	0.14	1.72 (1.31–2.27) *	0.15	1.75 (1.29–2.34) *	1
	≥50,000	0.21	2.09 (1.39–3.21) *	0.11	1.16 (0.93–1.43)	1
Race	African (Ref.)
	Coloured	0.11	0.92 (0.74–1.13)	0.13	0.86 (0.67–1.11)	1
	Indian	0.15	1.10 (0.83–1.48)	0.18	1.14 (0.81–1.61)	1
	White	0.13	1.00 (0.78–1.30)	0.15	0.98 (0.72–1.32)	1
Education level	High school (Ref.)
	Some college	0.19	0.84 (0.58–1.22)	0.10	0.92 (0.75–1.12)	1
	Diploma	0.20	1.03 (0.69–1.54)	0.21	0.88 (0.59–1.34)	1
	Bachelor	0.22	1.13 (0.74–1.73)	0.17	1.15 (0.82–1.63)	1
	Postgraduate	0.26	1.30 (0.78–2.18)	0.15	0.74 (0.56–1.00)	1
Marital status	Single (Ref.)
	Married	0.15	0.85 (0.64–1.13)	0.12	0.93 (0.74–1.17)	1
	Divorced	0.19	0.92 (0.64–1.33)	0.18	1.07 (0.75–1.51)	1
	Widowed	0.22	0.74 (0.48–1.13)	0.17	0.81 (0.58–1.14)	1
AP		0.9368		0.9381

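SE = standard error (log-odds scale); OR = odds ratio; CI = confidence interval (TLR) or credible interval (BLR); AP = average precision; Ref. = reference category. For the BLR model, odds ratios for continuous predictors such as loan term, credit score, and inflation correspond to a one-standard-deviation increase. * Statistically significant: p-value less than 0.05 (TLR) or 95% credible interval excluding 1 (BLR).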
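Table 2 reports $\hat{R}$ = 1 (rounded) for every BLR parameter. As an illustration of what this diagnostic measures, the sketch below computes the classic (non-split) Gelman–Rubin potential scale reduction factor from several chains of one parameter: it compares between-chain variance B with within-chain variance W, so values near 1 indicate that the chains agree. This is the basic formula, not necessarily the variant the authors used; Vehtari et al. (2021) propose a rank-normalized split version and recommend $\hat{R}$ < 1.01. The chains below are synthetic.

```python
import random
import statistics


def gelman_rubin_rhat(chains):
    """Classic (non-split) Gelman-Rubin potential scale reduction factor.

    chains: list of m equal-length lists of draws for one parameter."""
    n = len(chains[0])
    if any(len(c) != n for c in chains):
        raise ValueError("chains must have equal length")
    means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])  # W
    between = n * statistics.variance(means)  # B
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5


rng = random.Random(3)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
stuck = mixed[:3] + [[rng.gauss(3.0, 1.0) for _ in range(1000)]]
print(round(gelman_rubin_rhat(mixed), 3))  # close to 1
print(round(gelman_rubin_rhat(stuck), 3))  # well above 1.01: one chain is stuck elsewhere
```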

Table 3. Percentage of borrowers classified as high risk of default (predicted default probability threshold of 0.5).

Variable	Category	High risk (%)
Product type	Credit card	95.7
	Mortgage	92.6
	Personal loan	93.0
	Store credit	96.0
	Vehicle finance	95.7
Income band (rands)	<5000	93.8
	5000–<10,000	93.8
	10,000–<20,000	94.4
	20,000–<50,000	95.9
	50,000 and above	97.6
Bank	A	94.1
	B	95.4
	C	93.7
Overall		94.5

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Masekoameng, J.L.; Mbona, S.V.; Ananth, A.; Chifurira, R. Bayesian Logistic Regression for Credit Risk Modelling Among South African Loan Borrowers. J. Risk Financial Manag. 2026, 19, 358. https://doi.org/10.3390/jrfm19050358

AMA Style

Masekoameng JL, Mbona SV, Ananth A, Chifurira R. Bayesian Logistic Regression for Credit Risk Modelling Among South African Loan Borrowers. Journal of Risk and Financial Management. 2026; 19(5):358. https://doi.org/10.3390/jrfm19050358

Chicago/Turabian Style

Masekoameng, John Lehlaka, Sizwe Vincent Mbona, Anisha Ananth, and Retius Chifurira. 2026. "Bayesian Logistic Regression for Credit Risk Modelling Among South African Loan Borrowers" Journal of Risk and Financial Management 19, no. 5: 358. https://doi.org/10.3390/jrfm19050358

APA Style

Masekoameng, J. L., Mbona, S. V., Ananth, A., & Chifurira, R. (2026). Bayesian Logistic Regression for Credit Risk Modelling Among South African Loan Borrowers. Journal of Risk and Financial Management, 19(5), 358. https://doi.org/10.3390/jrfm19050358
