Article

A Credit Risk Identification Model Based on the Minimax Probability Machine with Generative Adversarial Networks

Yutong Zhang, Xiaodong Zhao and Hailong Huang
1 School of Statistics and Data Science, Shanghai University of International Business and Economics, Shanghai 201620, China
2 School of Management, Shanghai University of International Business and Economics, Shanghai 201620, China
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(20), 3345; https://doi.org/10.3390/math13203345
Submission received: 20 September 2025 / Revised: 16 October 2025 / Accepted: 18 October 2025 / Published: 20 October 2025
(This article belongs to the Section E2: Control Theory and Mechanics)

Abstract

In the context of industrial transitions and tariff frictions, financial markets are experiencing frequent defaults, emphasizing the urgency of upgrading credit scoring methodologies. A novel credit risk identification model integrating generative adversarial networks (GAN) and the minimax probability machine (MPM) is proposed. GAN generates realistic augmented samples to alleviate class imbalance in the credit scoring dataset, while the MPM optimizes the classification hyperplane by reformulating probability constraints into second-order cone problems via the multivariate Chebyshev inequality. Numerical experiments conducted on the South German Credit dataset, which represents individual (consumer) credit risk, demonstrate that the proposed generative adversarial network minimax probability machine (GAN-MPM) model achieves 76.13%, 60.93%, 71.78%, and 72.03% for accuracy, F1-score, sensitivity, and AUC, respectively, significantly outperforming support vector machines, random forests, and XGBoost. Furthermore, SHAP analysis reveals that the installment rate in percentage of disposable income, housing type, duration in months, and status of existing checking accounts are the most influential features. These findings demonstrate the effectiveness and interpretability of the GAN-MPM model, offering a more accurate and reliable tool for credit risk management.

1. Introduction

1.1. Background

In recent years, credit loans have become a key financing instrument for supporting enterprise transformation and stimulating household consumption, thereby playing an essential role in maintaining economic growth. However, the expansion of credit lending requires precise risk management, which underscores the central role of credit scoring systems. Credit scoring quantifies the credit risk of individuals and firms, enabling financial institutions to optimize resource allocation in credit markets. For enterprises, such systems channel funds toward high-potential firms, supporting technological innovation and resilience against trade barriers. For individuals, differentiated credit scoring fosters consumption and enhances market vitality. Hence, investigating credit scoring under complex economic conditions is of theoretical and practical relevance for improving financial services and promoting sustainable economic development.

1.2. Theoretical Foundations of Credit Scoring

Early credit scoring models were represented by the Z-score model [1], which employed financial ratios for discriminant analysis to predict corporate bankruptcy probability. Subsequently, logistic regression gradually became the mainstream approach in credit scoring, with advantages in probabilistic modeling of default events, interpretability, and robustness [2]. From an economic perspective, the theoretical foundation of credit scoring lies in information asymmetry theory. Long [3] argued that in the “market for lemons,” borrower quality cannot be fully observed, leading to adverse selection. Cable [4] further analyzed that under information asymmetry, banks cannot effectively distinguish borrower types by adjusting interest rates, thereby underscoring the importance of credit scoring systems. Empirical research has also quantified these theoretical mechanisms: DeFusco et al. [5] estimated the welfare cost of asymmetric information in consumer credit markets, showing that misallocation due to hidden borrower characteristics imposes significant efficiency losses. Complementarily, Ioannidou et al. [6] demonstrated that collateral can serve as an effective contract design tool to mitigate adverse selection and moral hazard in lending markets, reinforcing the role of credit scoring as part of a broader set of information mechanisms. As a screening mechanism in the credit market, scoring quantifies borrowers’ historical behavior, financial status, and demographic information to estimate default probability, alleviating both adverse selection and moral hazard in credit allocation. Furthermore, behavioral economics provides a micro-level basis for credit scoring. Together, these theories form the foundation of modern credit scoring systems.

1.3. Behavioral Economics and Traditional Scorecard Systems

Wang et al. [7] found that borrowers’ time preferences are significantly correlated with default probability, indicating that behavioral factors also influence credit decisions beyond traditional financial variables. Here, a borrower’s time preference reflects the degree of present bias—that is, the tendency to favor immediate consumption over future benefits—rather than the requested loan duration, which may affect repayment discipline and overall credit risk. In practical financial applications, credit risk assessment has long relied on credit scorecard systems, which assign weighted scores to key borrower attributes to estimate the probability of default [8,9,10]. Typically based on logistic regression, scorecards are valued for their transparency, stability, and compliance with Basel and IFRS regulations. In the U.S. credit market, the FICO scoring system represents the dominant commercial implementation of credit scoring [11]. According to the official FICO explanation, a FICO Score is a three-digit number derived from key factors such as payment history, amounts owed, length of credit history, new credit, and credit mix, which helps lenders assess a borrower’s likelihood of repayment. While this approach enhances predictive reliability, it also involves extensive access to personal financial information, raising concerns regarding data privacy and interpretability. Nevertheless, the linear and additive structure of traditional scorecards limits their capacity to capture complex feature interactions and often results in degraded performance under imbalanced data conditions. Recent World Bank guidelines emphasize that, while explainability remains essential, modern credit risk models should also incorporate advanced learning frameworks to address data imbalance and nonlinear relationships without sacrificing interpretability.

1.4. Machine Learning Approaches Under Imbalanced Data

Research shows that machine learning models generally outperform traditional statistical models in credit scoring tasks. Gambacorta et al. [12] compared machine learning with traditional models in the Chinese fintech context and found that machine learning models perform better during credit tightening, capturing nonlinear relationships more effectively. Suhadolnik et al. [13] evaluated ten machine learning algorithms and reported that XGBoost achieved the best performance with an AUC of 0.7185. Mushava et al. [14] proposed new loss functions for gradient-boosted decision trees, improving performance under class imbalance conditions. Wahab et al. [15] further compared multiple machine learning and deep learning approaches, including decision tree, AdaBoost, and artificial neural networks, for credit card default prediction, and found that ensemble-based models achieved the highest overall accuracy. This work highlights the growing trend of integrating deep learning architectures into credit risk modeling, providing a methodological complement to our GAN-MPM framework. The SMOTE algorithm proposed by Chawla et al. [16] is a widely used oversampling technique that generates synthetic minority class samples to address class imbalance problems in machine learning. In a comprehensive survey, Chen et al. [17] reviewed recent advances in imbalanced learning, highlighting the integration of deep learning and ensemble strategies as promising directions to handle skewed data distributions, which are particularly relevant in credit scoring applications. Likewise, Dal Pozzolo [18] emphasized that in highly imbalanced scenarios, evaluation metrics such as the precision–recall curve and AUC-PR should be prioritized over accuracy, and further discussed the role of concept drift and temporal validation in sustaining model reliability for credit risk and fraud detection. Doshi-Velez et al. [19] highlight the need for a rigorous scientific foundation in interpretable machine learning, advocating for formal frameworks to evaluate and compare interpretability methods. These methods provide valuable preprocessing frameworks for subsequent model development and enable financial institutions to build high-precision credit risk prediction models.

1.5. Model Interpretability and Explainable AI in Credit Scoring

Model interpretability is a critical requirement in credit scoring. Hjelkrem et al. [20] used SHapley Additive exPlanations (SHAP) to explain deep learning models trained on open banking data and found that models trained from scratch outperformed BERT-based transfer learning models. Talaat et al. [21] combined deep learning with explainable AI techniques to develop a credit card default prediction system, achieving an accuracy of 83.50% while also delivering meaningful feature explanations. It should be noted that their study focused on credit card debt, which is fundamentally different from the South German Credit dataset, as credit card default can often be avoided by making only the minimum required payments. In contrast, the South German Credit dataset represents installment-based personal loans, where missed payments directly indicate default. Ribeiro et al. [22] introduced LIME, a local interpretable model-agnostic explanation method, which provides human-understandable justifications for individual classifier predictions by learning interpretable surrogate models. Aljadani et al. [23] proposed a unified mathematical framework incorporating the LIME explainer, which showed strong performance across multiple datasets. Building on this direction, Lundberg and Lee [24] proposed SHAP, a unified framework based on Shapley values from cooperative game theory, offering consistent and theoretically grounded explanations of feature contributions across a wide range of machine learning models. Beyond individual methods, Wang et al. [25] provided a comprehensive survey of interpretable machine learning in credit scoring, categorizing explanation techniques and emphasizing their role in enhancing trust, fairness, and regulatory compliance in financial decision-making. At a broader level, Arrieta et al. [26] outlined the conceptual foundations of explainable artificial intelligence (XAI), proposing taxonomies and highlighting both opportunities and challenges toward building responsible AI systems. These contributions together underscore that interpretability is not only a technical necessity but also a cornerstone for trustworthy and accountable credit scoring practices.

1.6. Motivation and Overview of the Proposed GAN-MPM Framework

Despite significant advances in interpretable and data-driven credit scoring models, there remains a pressing need for methods that can simultaneously improve predictive performance and preserve transparency under imbalanced and high-default conditions. In this paper, to address these challenges, a generative adversarial network minimax probability machine (GAN-MPM) framework for credit risk identification is proposed. The key idea of the proposed approach is to integrate data-level augmentation with model-level robustness. The GAN component is employed to generate realistic synthetic samples of the minority (default) class, thus alleviating class imbalance and enhancing the representativeness of rare default patterns. The MPM component, in turn, formulates the credit classification problem as a probabilistic optimization task, where the separating hyperplane is derived under explicit probability constraints based on the multivariate Chebyshev inequality. This formulation provides rigorous probabilistic guarantees and improved generalization on small, noisy datasets.
The rest of this paper is organized as follows. Section 2 reviews the generative adversarial network and minimax probability machine. Section 3 gives the algorithm details of the proposed model. Section 4 shows the numerical experiments on the South German credit dataset and illustrates the comparison performance of the proposed method with other methods. Section 5 provides some conclusions and future work.

2. Related Work

2.1. Generative Adversarial Network

The Generative Adversarial Network (GAN) [27] consists of two neural networks: a Generator (G) and a Discriminator (D). During training, these two networks are engaged in an adversarial process and optimized simultaneously. The generator attempts to capture the distribution of real data in order to produce synthetic samples that resemble genuine ones, while the discriminator seeks to distinguish whether the input samples come from the true data distribution or are generated by G. This interaction forms a minimax game between the two components. The basic loss function is defined as:
min_{G} max_{D} V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))],   (1)
where x denotes a real sample, z represents a random noise vector fed into the generator (typically drawn from a uniform or Gaussian distribution), G(z) is the synthetic sample produced by the generator, and D(·) ∈ (0, 1) indicates the discriminator’s confidence that the input is a real sample. Figure 1 illustrates the surface of the loss function V(G, D) with respect to the parameter spaces of the generator and discriminator. Different points on the surface correspond to different states of the generator–discriminator pair during training. The specific meaning is as follows:
- Current loss point V(G, D): the loss value corresponding to the current parameter settings of the generator and discriminator in the given training iteration.
- Max V(G, D): the maximum discriminative capability achievable by D when the generator G is fixed.
- Min V(G, D): the minimum loss achievable by G when the discriminator D is fixed, representing the generator’s ability to “fool” the discriminator.
- Global optimum: the state in which the distribution of generated samples perfectly matches the real data distribution, such that the discriminator cannot distinguish between the two and always outputs D(x) = 0.5.
Figure 1. Generative adversarial network minimax loss surface.
This structure is essentially a zero-sum game system. In theory, it can be proven that when the training reaches a Nash equilibrium, the distribution learned by the generator becomes identical to the real data distribution. To overcome the problem of gradient vanishing in the early stages of training, the generator’s objective function is often replaced with the following non-adversarial form:
min_{G} E_{z∼p_z(z)}[−log D(G(z))].   (2)
Such an alternative loss function can enhance the gradient signal during the initial phase, enabling the generator to learn more effectively and approximate the true distribution more rapidly.
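To make the alternating optimization concrete, the following sketch implements one discriminator update and one generator update in PyTorch, using the non-saturating generator loss of Equation (2). The network sizes, optimizer, and learning rate are illustrative assumptions and are not taken from this paper.

```python
# Minimal sketch of one adversarial training step (assumed architecture and hyperparameters).
import torch
import torch.nn as nn

n_features, noise_dim = 20, 16  # assumed dimensions for tabular credit features
G = nn.Sequential(nn.Linear(noise_dim, 64), nn.ReLU(), nn.Linear(64, n_features))
D = nn.Sequential(nn.Linear(n_features, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

def gan_step(x_real):
    m = x_real.size(0)
    # Discriminator: ascend log D(x) + log(1 - D(G(z))), i.e., the inner max in Equation (1)
    z = torch.randn(m, noise_dim)
    x_fake = G(z).detach()
    loss_d = bce(D(x_real), torch.ones(m, 1)) + bce(D(x_fake), torch.zeros(m, 1))
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()
    # Generator: non-saturating loss of Equation (2), i.e., descend -log D(G(z))
    z = torch.randn(m, noise_dim)
    loss_g = bce(D(G(z)), torch.ones(m, 1))
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```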

2.2. Minimax Probability Machine

This section provides a brief introduction to the minimax probability machine. For more details, please refer to [28].
Define the binary classification training set
T_x = {(x_1, +1), (x_2, +1), …, (x_{N_1}, +1)},   T_y = {(y_1, −1), (y_2, −1), …, (y_{N_2}, −1)},   (3)
where x_i ∈ R^d is the i-th positive sample, i = 1, 2, …, N_1, y_j ∈ R^d is the j-th negative sample, j = 1, 2, …, N_2, and N_1, N_2 denote the numbers of positive and negative samples, respectively.

2.2.1. Linear Minimax Probability Machine

The minimax probability machine is a binary classification method obtained by minimizing the worst-case probability of misclassification. It does not require assumptions about the data distribution and provides a lower bound on the classification accuracy.
Define the classification hyperplane of MPM as
g(x) = w^T x − b = 0,   (4)
where w R d , b R .
Based on the Minimax Probability Machine (MPM) framework [28], the optimization problem can be formulated as follows.
max_{α, w, b} α   s.t.   inf_{x∼(μ_1, Σ_1)} Pr{w^T x ≥ b} ≥ α,   inf_{y∼(μ_2, Σ_2)} Pr{w^T y ≤ b} ≥ α,   (5)
where x is a positive sample with mean μ_1 and covariance matrix Σ_1, and y is a negative sample with mean μ_2 and covariance matrix Σ_2. As shown in Equation (5), MPM maximizes the lower bound α on both the probability that positive samples lie above the hyperplane and the probability that negative samples lie below it. To solve Equation (5) efficiently, Lemma 1, which is derived from the multivariate Chebyshev inequality [29], is adopted.
Lemma 1.
If w is not equal to 0, the following condition
inf_{x∼(μ, Σ)} Pr{w^T x − b ≤ 0} ≥ α,
holds if and only if
max(−w^T μ + b, 0) ≥ κ(α) sqrt(w^T Σ w).
That is, when w^T μ − b ≤ 0:
−w^T μ + b ≥ κ(α) sqrt(w^T Σ w),
where κ(α) = sqrt(α / (1 − α)).
Applying Lemma 1 to the two constraints in Equation (5) with means μ_1, μ_2 and covariances Σ_1, Σ_2, Equation (5) can be reformulated as follows:
max_{α, w, b} κ(α)   s.t.   w^T μ_1 − b ≥ κ(α) sqrt(w^T Σ_1 w),   −w^T μ_2 + b ≥ κ(α) sqrt(w^T Σ_2 w),   (6)
The following optimization problem is obtained by eliminating b in the constraints
max_{α, w} κ(α)   s.t.   w^T μ_2 + κ(α) sqrt(w^T Σ_2 w) ≤ w^T μ_1 − κ(α) sqrt(w^T Σ_1 w).   (7)
When the constraint in Equation (7) reaches equality, the optimal solution satisfies
w^T(μ_1 − μ_2) = κ(α) (sqrt(w^T Σ_1 w) + sqrt(w^T Σ_2 w)),   (8)
which defines the boundary condition between the two classes.
Since MPM seeks to maximize α, and κ(α) = sqrt(α / (1 − α)) increases monotonically with α, a larger κ(α) corresponds to a tighter probabilistic bound and thus a more conservative, robust separating hyperplane. Because the hyperplane is invariant to a rescaling of w, the left-hand side of Equation (8), w^T(μ_1 − μ_2), can be normalized to one without loss of generality; maximizing κ(α) is then equivalent to minimizing the right-hand side term sqrt(w^T Σ_1 w) + sqrt(w^T Σ_2 w). Consequently, Equation (7) can be formulated as the following second-order cone programming problem.
min_w ‖Σ_1^{1/2} w‖_2 + ‖Σ_2^{1/2} w‖_2   s.t.   w^T(μ_1 − μ_2) = 1.   (9)
Here, ‖Σ_i^{1/2} w‖_2 denotes the Euclidean (ℓ_2) norm, defined as
‖Σ_i^{1/2} w‖_2 = sqrt(w^T Σ_i w),   (10)
where Σ_i represents the covariance matrix of class i (i = 1, 2).
With the optimal value of w , the optimal value of b can be calculated as
b = w^T μ_1 − ‖Σ_1^{1/2} w‖_2 / (‖Σ_1^{1/2} w‖_2 + ‖Σ_2^{1/2} w‖_2).   (11)
The class label of a new data point x R d can be assigned by calculating
f(x) = sign(w^T x − b).   (12)
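For illustration, the SOCP in Equation (9) can be handed directly to an off-the-shelf conic solver. The sketch below uses CVXPY and SciPy (an assumption of this presentation, not the authors’ Matlab implementation); a small diagonal jitter keeps the covariance square roots well defined, and the bias and decision rule follow Equations (11) and (12).

```python
# Linear MPM as a second-order cone program (Equation (9)); illustrative sketch.
import numpy as np
import cvxpy as cp
from scipy.linalg import sqrtm

def fit_linear_mpm(X_pos, X_neg):
    d = X_pos.shape[1]
    mu1, mu2 = X_pos.mean(axis=0), X_neg.mean(axis=0)
    S1 = sqrtm(np.cov(X_pos, rowvar=False) + 1e-6 * np.eye(d)).real  # Sigma_1^{1/2}
    S2 = sqrtm(np.cov(X_neg, rowvar=False) + 1e-6 * np.eye(d)).real  # Sigma_2^{1/2}
    w = cp.Variable(d)
    objective = cp.Minimize(cp.norm(S1 @ w, 2) + cp.norm(S2 @ w, 2))
    cp.Problem(objective, [w @ (mu1 - mu2) == 1]).solve()
    w_opt = w.value
    n1, n2 = np.linalg.norm(S1 @ w_opt), np.linalg.norm(S2 @ w_opt)
    b = w_opt @ mu1 - n1 / (n1 + n2)             # Equation (11)
    return w_opt, b

def predict_linear_mpm(w, b, X):
    return np.sign(X @ w - b)                     # Equation (12)
```

CVXPY dispatches this problem to an interior-point conic solver (e.g., ECOS or Clarabel, depending on the installation), which is consistent with the interior-point approach described later in Section 3.2.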

2.2.2. Nonlinear Minimax Probability Machine

When the dataset is nonlinearly separable, a nonlinear mapping φ(x) can be used to project the data from a low-dimensional space to a high-dimensional feature space. In the new space, the positive samples φ(x) have the mean μ_{φ(x)} and covariance matrix Σ_{φ(x)}, while the negative samples φ(y) have the mean μ_{φ(y)} and covariance matrix Σ_{φ(y)}. To capture nonlinear separability, the mapping φ(·) corresponds to a transformation into a high-dimensional reproducing kernel Hilbert space (RKHS), where the MPM is expressed in terms of kernel functions, according to the kernel extension of MPM proposed by Lanckriet et al. [28]. Typical kernel choices include the linear kernel k(x_i, x_j) = x_i^T x_j and the radial basis function (RBF) kernel k(x_i, x_j) = exp(−γ ‖x_i − x_j‖^2), which enable flexible nonlinear decision boundaries in the transformed feature space.
By defining the classification hyperplane a T φ ( z ) = b in the new feature space, the classification model can be formulated as
min_a sqrt(a^T Σ_{φ(x)} a) + sqrt(a^T Σ_{φ(y)} a)   s.t.   a^T(φ̄(x) − φ̄(y)) = 1.   (13)
Define the kernel function K(x, y) = φ(x)^T φ(y), and
a = sum_{i=1}^{N_1} α_i φ(x_i) + sum_{j=1}^{N_2} β_j φ(y_j).   (14)
Construct
K_x ∈ R^{N_1×(N_1+N_2)}, [K_x]_{ji} = K(x_j, z_i),   K_y ∈ R^{N_2×(N_1+N_2)}, [K_y]_{ji} = K(y_j, z_i),
k̃_x ∈ R^{N_1+N_2}, [k̃_x]_i = (1/N_1) sum_{j=1}^{N_1} K(x_j, z_i),   k̃_y ∈ R^{N_1+N_2}, [k̃_y]_i = (1/N_2) sum_{j=1}^{N_2} K(y_j, z_i),   (15)
K̃_x = K_x − 1_{N_1} k̃_x^T,   K̃_y = K_y − 1_{N_2} k̃_y^T,   (16)
where z i is defined as:
z_i = x_i for i = 1, 2, …, N_1;   z_i = y_{i−N_1} for i = N_1 + 1, N_1 + 2, …, N_1 + N_2.   (17)
Equation (13) can be rewritten as:
min_γ (1/sqrt(N_1)) ‖K̃_x γ‖_2 + (1/sqrt(N_2)) ‖K̃_y γ‖_2   s.t.   γ^T(k̃_x − k̃_y) = 1,   (18)
where γ = (α_1, α_2, …, α_{N_1}, β_1, β_2, …, β_{N_2})^T. The constraint γ^T(k̃_x − k̃_y) = 1 serves as a normalization condition that fixes the scale of the kernel discriminant vector γ, thereby preventing trivial zero solutions and ensuring that the optimization problem remains convex and well-posed.
For the new sample x , the class label can be assigned by calculating
f(x) = sign( sum_{i=1}^{N_1+N_2} γ_i K(x, z_i) − b ).   (19)
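A corresponding sketch of the kernel MPM is given below, again with CVXPY and an assumed RBF kernel and bandwidth. The kernel blocks follow Equations (15)-(17), the objective follows Equation (18), the bias is computed by analogy with Equation (11), and the decision rule is Equation (19).

```python
# Kernel (nonlinear) MPM sketch with an assumed RBF kernel; illustrative only.
import numpy as np
import cvxpy as cp

def rbf_kernel(A, B, gamma=0.1):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def fit_kernel_mpm(X_pos, X_neg, gamma=0.1):
    N1, N2 = len(X_pos), len(X_neg)
    Z = np.vstack([X_pos, X_neg])                      # z_i as in Equation (17)
    Kx, Ky = rbf_kernel(X_pos, Z, gamma), rbf_kernel(X_neg, Z, gamma)
    kx_bar, ky_bar = Kx.mean(axis=0), Ky.mean(axis=0)  # k~_x, k~_y in Equation (15)
    Kx_c, Ky_c = Kx - kx_bar, Ky - ky_bar              # centered blocks, Equation (16)
    g = cp.Variable(N1 + N2)
    obj = cp.Minimize(cp.norm(Kx_c @ g, 2) / np.sqrt(N1) +
                      cp.norm(Ky_c @ g, 2) / np.sqrt(N2))
    cp.Problem(obj, [g @ (kx_bar - ky_bar) == 1]).solve()
    gamma_opt = g.value
    n1 = np.linalg.norm(Kx_c @ gamma_opt) / np.sqrt(N1)
    n2 = np.linalg.norm(Ky_c @ gamma_opt) / np.sqrt(N2)
    b = gamma_opt @ kx_bar - n1 / (n1 + n2)            # by analogy with Equation (11)
    return Z, gamma_opt, b

def predict_kernel_mpm(Z, gamma_opt, b, X_new, gamma=0.1):
    return np.sign(rbf_kernel(X_new, Z, gamma) @ gamma_opt - b)   # Equation (19)
```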

3. A Proposed Model for Credit Risk Identification

3.1. The GAN-MPM Framework

In real-world credit risk modeling, datasets often exhibit structural imbalance between default and non-default cases, as the number of borrowers with good credit behavior typically exceeds those who default. This imbalance can make models more prone to learning non-default patterns, thereby strengthening their ability to identify low-risk customers, whereas financial institutions are often more concerned with accurately recognizing potential defaulters. To alleviate this issue, CTGAN is employed to generate realistic synthetic samples for both default and non-default classes. Unlike traditional oversampling methods such as SMOTE, CTGAN models the conditional joint distribution of tabular credit data via adversarial learning, thereby preserving nonlinear dependencies between demographic and financial features and ensuring that synthetic borrowers remain statistically consistent with real-world credit populations.
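A sketch of this augmentation step using the open-source ctgan package is shown below. The package choice, file name, and column list are assumptions made for illustration (the paper does not name a specific CTGAN implementation); the epoch count and batch size mirror the settings reported in Section 4, and only the default (minority) class is augmented, as in Algorithm 1.

```python
# Illustrative CTGAN-based augmentation of the minority (default) class.
import pandas as pd
from ctgan import CTGAN

train_df = pd.read_csv("south_german_credit_train.csv")  # hypothetical training split
minority = train_df[train_df["credit_risk"] == 0].drop(columns=["credit_risk"])  # default class (coded 0, Table 1)

discrete_cols = ["status", "credit_history", "purpose", "housing"]  # illustrative subset of categorical fields
synth_model = CTGAN(epochs=300, batch_size=500)           # settings mirror Section 4
synth_model.fit(minority, discrete_cols)

synthetic = synth_model.sample(300)                       # synthetic default borrowers
synthetic["credit_risk"] = 0
augmented_train = pd.concat([train_df, synthetic], ignore_index=True)
```

Because CTGAN models categorical columns explicitly through conditional sampling, the generated rows keep valid category levels for fields such as status and housing rather than interpolated values, which is the property emphasized above relative to SMOTE.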
Based on the augmented dataset, our approach extends the MPM model (5) to a framework based on credit default risk evaluation. The extended MPM is referred to as GAN-MPM. In credit scoring, GAN-MPM aims to construct a classifier that maximizes the worst-case probability of correct classification. Instead of focusing on maximizing the geometric margin, as in support vector machines, MPM directly optimizes a probabilistic guarantee. By introducing the parameter α , the model ensures that both default (positive) and non-default (negative) classes achieve at least probability α of being correctly classified. In this way, α represents the maximum worst-case accuracy that the classifier can guarantee, providing a more robust decision boundary under distributional uncertainty in credit scoring datasets.

3.2. Algorithm

As described above, the GAN-MPM method ultimately requires solving a second-order cone programming (SOCP) problem. The algorithm constructs this SOCP problem and employs an interior-point method to obtain the optimal hyperplane direction vector. The bias term is then computed based on the means of the two classes of samples. Since the adversarial training process does not guarantee strict convergence, a practical termination criterion based on a fixed number of epochs T or stable loss values is adopted to ensure training stability. Finally, the algorithm outputs the optimal classification hyperplane parameters γ and b. The detailed procedure of the GAN-MPM algorithm is as follows (Algorithm 1):
Algorithm 1 GAN-MPM algorithm
Input: Positive samples x, negative samples y, kernel parameters, generator G(z; θ_g), discriminator D(x; θ_d), learning rates η_g, η_d, batch size m, number of epochs T, tolerance tol.
Output: The label of a new sample.
     Step 1: Use the GAN to generate synthetic samples:
        Step 1.1: Sample minibatches of real positive samples x and noise z ∼ p_z(z).
        Step 1.2: Generate synthetic positive samples x̃ = G(z; θ_g).
        Step 1.3: Update the discriminator parameters θ_d by maximizing
            L_D = log D(x) + log(1 − D(x̃)).
        Step 1.4: Update the generator parameters θ_g by minimizing
            L_G = log(1 − D(G(z; θ_g))).
        Step 1.5: Repeat for a fixed number of epochs T or until the discriminator and generator losses satisfy |ΔL_D| < tol and |ΔL_G| < tol.
     Step 2: Construct the augmented dataset by combining the real samples (x, y) with the GAN-generated synthetic positive samples x̃.
     Step 3: Calculate k̃_x, k̃_y, K̃_x, K̃_y according to Equations (15) and (16).
     Step 4: Compute γ and b by solving problem (18).
     Step 5: Determine the label of a new sample using Equation (19).

3.3. Model Performance Evaluation Metrics

In the field of credit scoring, model performance is commonly evaluated using metrics such as accuracy (Acc), F1-score, Sensitivity, Specificity, and AUC.
The AUC is employed to measure the model’s ability to distinguish between positive and negative classes, and it is particularly suitable for imbalanced datasets or scenarios where the effect of varying classification thresholds needs to be comprehensively assessed. The AUC corresponds to the area under the receiver operating characteristic (ROC) curve, with values ranging from 0 to 1.
Accuracy reflects the proportion of correctly classified credit cases at a specific decision threshold, while AUC indicates the overall discriminative ability of the model across all possible thresholds. Additionally, AUC remains robust under class imbalance, since it simultaneously considers both true positive and false positive rates. A higher AUC usually implies a better achievable accuracy after threshold optimization.
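As a reference, the metrics above can be computed as follows; the sketch assumes scikit-learn, labels coded with the positive class as 1, and placeholder arrays y_true, y_pred, and y_score (continuous decision values such as w^T x − b).

```python
# Evaluation metrics used in Section 4 (illustrative scikit-learn sketch).
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             recall_score, roc_auc_score)

def credit_metrics(y_true, y_pred, y_score):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": accuracy_score(y_true, y_pred),
        "f1": f1_score(y_true, y_pred),               # F1 of the positive class
        "sensitivity": recall_score(y_true, y_pred),  # TP / (TP + FN)
        "specificity": tn / (tn + fp),                # TN / (TN + FP)
        "auc": roc_auc_score(y_true, y_score),        # threshold-independent ranking quality
    }
```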

4. Experiments

In this section, numerical experiments are conducted to verify the feasibility and effectiveness of the proposed GAN-MPM method. To eliminate the impact of differences in variable scales on model training, all numerical features were standardized to have a mean of 0 and a variance of 1, thereby providing a solid data foundation for efficient training and robust prediction of the subsequent classifiers. The experiments are performed on the South German Credit dataset, comparing Random Forest, XGBoost, the support vector machine (SVM), MPM, and GAN-MPM, where the latter two are evaluated with both a linear kernel (lin) and a Gaussian kernel (rbf).
The experimental process is set as follows: 70% of the data points are randomly selected for training, and the remaining 30% are used for testing. The training and testing process is repeated 10 times, and the mean and standard deviation over the 10 experiments are recorded. Then, 5-fold cross-validation is adopted to choose the optimal parameters. In addition, to ensure comparability across models, the GAN is trained only on the training portion to generate additional synthetic samples, which are then combined with the original training data to form the augmented training set, while the test set remains completely separate. The regularization parameter C for SVM is selected in the range of {2^{−5}, …, 2^{5}}. The kernel parameters are chosen in the range of {2^{−5}, 2^{0}, 2^{5}} for SVM and the kernel-based MPM models. The n_estimators parameter of both Random Forest and XGBoost is set to 100. For XGBoost, the evaluation metric is logloss, the maximum tree depth is fixed at 5, and the learning rate η is set to 0.1. For GAN training, the number of epochs is set to 300, with both the generator and discriminator learning rates configured as 2 × 10^{−4}, and the batch size is fixed at 500. The average classification Acc, F1-score, Sensitivity, Specificity, and AUC, together with the corresponding standard deviations, are recorded. All of these methods are implemented in Matlab 2025a on a laptop running the Windows 10 operating system with an Intel(R) Core(TM) i7 processor and 16 GB of RAM.
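The evaluation protocol can be summarized by the following sketch, which uses an RBF-kernel SVM as a stand-in classifier; the same loop applies to the other models. Library choices and the exact split mechanics are assumptions for illustration.

```python
# Ten random 70/30 splits with scaling and 5-fold CV fitted on the training part only.
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

def repeated_holdout(X, y, n_repeats=10):
    grid = {"svc__C": [2.0 ** k for k in range(-5, 6)],    # C in {2^-5, ..., 2^5}
            "svc__gamma": [2.0 ** k for k in (-5, 0, 5)]}  # kernel parameter grid
    scores = []
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=seed)
        pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))])
        model = GridSearchCV(pipe, grid, cv=5).fit(X_tr, y_tr)  # 5-fold CV on the training split
        scores.append(accuracy_score(y_te, model.predict(X_te)))
    return float(np.mean(scores)), float(np.std(scores))
```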

4.1. The South German Credit Dataset

The South German credit dataset (Credit-g) from the UCI Machine Learning Repository, available at https://archive.ics.uci.edu/dataset/573/south+german+credit+update (accessed on 20 September 2025), is used in this experiment. In this study, credit risk specifically refers to individual (consumer) loan default risk, as represented by the South German Credit dataset, which characterizes personal borrowers rather than corporate or commercial lending. The dataset contains a total of 1000 samples, with each sample consisting of 20 feature variables.
The attributes describe the demographic, financial, and credit-related information of each applicant. Specifically, status (A1) indicates the condition of the applicant’s existing checking account, reflecting short-term liquidity. Duration (A2) denotes the credit duration in months, corresponding to the loan term. Credit_history (A3) records past repayment performance and existing credit behavior. Purpose (A4) specifies the intended use of the loan, such as car purchase, furniture, appliances, or education. Amount (A5) represents the total requested credit amount in Deutsche Marks (DM). Savings (A6) describes the average balance of the applicant’s savings account or bonds, serving as an indicator of asset stability. Employment_duration (A7) measures the number of years the applicant has been employed, reflecting job stability. Installment_rate (A8) represents the percentage of the applicant’s monthly installment relative to disposable income, indicating repayment burden rather than loan term or payment frequency. Personal_status_sex (A9) combines marital status and gender information. Other_debtors (A10) indicates whether the applicant has co-applicants or guarantors. Present_residence (A11) denotes the number of years the applicant has lived at their current address. Property (A12) identifies the type of property ownership, such as real estate, savings agreements, or other assets. Age (A13) represents the applicant’s age in years. Other_installment_plans (A14) indicates the presence of other installment agreements, such as with banks or stores. Housing (A15) refers to the applicant’s housing type—own, rent, or free—reflecting financial stability and asset possession. Number_credits (A16) represents the number of existing credits at the bank. Job (A17) describes the type of occupation and qualification level. People_liable (A18) specifies the number of people financially dependent on the applicant. Telephone (A19) indicates whether the applicant has a registered telephone line. Foreign_worker (A20) identifies whether the applicant is a foreign worker. Finally, the target variable credit risk (Class) is binary and classifies applicants as either good (1) or bad (0) credit risks based on historical repayment outcomes. A brief description of these variables is provided in Table 1.
The number of non-default samples (700, 70.00%) is larger than that of default samples (300, 30.00%), leading to structural imbalance. This imbalance makes models more prone to learning non-default patterns, enhancing their capacity to approve low-risk applicants, whereas in practice, financial institutions place greater emphasis on accurately identifying potential defaulters so as to reduce financial risk while maintaining reasonable approval rates. After GAN training, the number of default (positive) samples increases from 300 to 600, resulting in an enhanced dataset with a ratio of approximately 0.86:1 (default : non-default), which provides a far more balanced basis for training the classifier.

4.2. The Results Based on the ACC, F1-Score, Sensitivity, Specificity, and AUC

The comparative results are summarized in Table 2. Overall, the introduction of GAN-generated samples yields consistent improvements across all key evaluation metrics. In terms of accuracy, GAN–MPM (rbf) achieves the highest performance of 76.13%, outperforming traditional MPM, Random Forest, and XGBoost. For positive-class recognition, GAN–MPM (lin) attains the best F1-score (60.93%) and Sensitivity (71.78%), demonstrating its superior capability in identifying default (minority) cases. Meanwhile, Random Forest and XGBoost exhibit relatively high Specificity, with XGBoost reaching 94.77%, indicating a stronger bias toward correctly classifying non-default (majority) samples. These results highlight that the proposed GAN–MPM framework effectively balances predictive accuracy and positive-class recall, providing a more comprehensive solution for credit risk assessment under imbalanced data conditions.
In addition, the AUC results provide a more comprehensive view of each model’s discriminative capability on the Credit-g dataset. Despite the limited sample size and imbalance, GAN–MPM (lin) achieves the highest AUC value of 72.03%, demonstrating its superior ability to distinguish between default and non-default applicants. XGBoost delivers a competitive AUC of 70.82%, followed by MPM (rbf) with 69.81%, indicating that both models maintain stable performance under imbalanced conditions. In contrast, SVM models exhibit lower and more variable AUC values, reflecting weaker separability between creditworthy and defaulting applicants.

4.3. Feature Importance Analysis and Interpretability Study of the GAN-MPM Model Based on SHAP

SHapley Additive exPlanations (SHAP) is a method used to interpret the prediction results of machine learning models. It is based on the theory of Shapley values and decomposes the prediction outcome into contributions from individual features, thereby providing both global and local interpretability. SHAP has been widely applied in feature importance ranking. Through SHAP values, one can intuitively observe the extent to which each feature influences the model’s predictions.
In the field of machine learning and deep learning, although complex models such as deep neural networks and ensemble models often demonstrate superior predictive performance, they are typically regarded as “black boxes,” making it difficult to interpret their internal decision-making processes. SHAP addresses this issue by assigning importance values to features, thereby offering a powerful tool to explain model outputs.
In this experiment, the SHAP method is employed to perform interpretability analysis of the GAN-MPM model. By computing the Shapley value of each feature, the contribution and directional effect of the top 10 features on model predictions are revealed, as shown in Figure 2. Each bar represents the mean absolute SHAP value of a feature, while the overlaid scatter points (colored from blue to red) indicate the SHAP value of each sample, where red denotes higher feature values and blue denotes lower ones. Points distributed to the right of zero contribute positively to the predicted probability of approval (non-default), whereas those on the left indicate higher predicted default risk.
The feature “installment rate” shows the largest mean SHAP magnitude, indicating its dominant influence. Most red points for this variable are located on the left side of zero, while blue points are concentrated on the right. This pattern implies that higher installment rates (red, representing a larger ratio of monthly payment to income) contribute negatively to loan approval by increasing repayment burden and default risk, whereas lower installment rates (blue) contribute positively by reducing financial stress and improving the likelihood of approval. The “housing” variable also exhibits strong importance, with red points (property ownership) mainly concentrated on the right side, reflecting a stabilizing effect of home ownership on creditworthiness. Conversely, blue points (rented or free accommodation) appear more dispersed toward the negative SHAP region, suggesting less financial stability and a higher default tendency. Variables such as “credit history,” “employment duration,” and “savings” display wider horizontal spreads and mixed color patterns, indicating heterogeneous effects across borrowers. For example, while a strong credit history (red) generally reduces default risk, certain applicants with limited employment duration or low savings may still be assigned higher risk due to combined socio-economic factors. These interactions lead to broader SHAP distributions even among features with substantial average importance. Overall, the SHAP analysis highlights that repayment capacity and financial stability indicators (installment rate and housing) have the strongest overall impact on credit risk in this dataset, complementing traditional variables such as credit history.
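For completeness, the sketch below shows how such a SHAP analysis can be produced for a kernel-based, black-box scorer. It reuses the hypothetical kernel-MPM helpers from the sketch in Section 2.2.2 and treats X_train, X_test, and feature_names as placeholders; KernelExplainer is used because it only requires a prediction function and background data.

```python
# Illustrative SHAP analysis of the fitted (kernel) GAN-MPM decision function.
import shap

def gan_mpm_score(X):
    # hypothetical wrapper: signed decision value sum_i gamma_i K(x, z_i) - b (Equation (19))
    return rbf_kernel(X, Z, gamma=0.1) @ gamma_opt - b

background = shap.sample(X_train, 100)                      # background data for the explainer
explainer = shap.KernelExplainer(gan_mpm_score, background)
shap_values = explainer.shap_values(X_test, nsamples=200)   # per-feature contributions

# Beeswarm-style summary of global importance and per-sample effects, as in Figure 2
shap.summary_plot(shap_values, X_test, feature_names=feature_names)
```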

5. Conclusions

In this paper, we focus on the South German Credit dataset and construct a series of classification models for loan default prediction. In particular, a GAN-MPM credit risk identification model is proposed to improve overall performance under imbalanced sample conditions. Evaluations across multiple metrics demonstrate that the proposed model outperforms traditional models such as SVM and XGBoost, thereby validating its effectiveness and significant advantages in financial risk control tasks. The GAN-MPM model achieves competitive accuracy while providing enhanced robustness and interpretability under such imbalanced, small-sample conditions. Furthermore, SHAP is introduced to conduct a detailed feature contribution analysis, providing deeper insights into the specific role of each feature in model predictions, enhancing model transparency, and enabling the accurate identification of key decision-making features.
The interpretability analysis of the GAN–MPM model shows that features reflecting repayment capacity (installment rate), asset stability (housing and savings), and behavioral reliability (credit history and employment duration) dominate credit risk prediction. These results align with previous studies emphasizing repayment burden and property ownership as indicators of financial stability [1,10,13]. Model interpretation studies [21,24] have also shown that credit history interacts with other socio-economic variables, resulting in heterogeneous effects across borrowers. Unlike traditional statistical models, the GAN–MPM framework captures non-linear dependencies between demographic and financial features, slightly altering feature rankings. Overall, the findings remain economically interpretable and broadly consistent with established credit risk literature, while demonstrating the advantage of SHAP-based interpretability in complex generative models. In addition, future work will aim to quantify the cross-fold variance of SHAP values to provide a more formal measure of interpretability robustness and to statistically validate the consistency of feature contributions under data resampling.
In the future, research can be carried out in the following directions. First, since MPM assumes that both classes are equally important in decision-making, future work may attempt to integrate GAN with the minimum error maximum–minimum probability machine (MEMPM) for credit risk identification by introducing a weighting parameter to assign different levels of attention to different classes. Second, the applicability of the proposed model can be further explored in other financial domains, such as fraud detection and customer segmentation, to enhance the personalization and controllability of financial services. Third, future studies may investigate the integration of deep feature extraction techniques in complex feature interaction spaces or combine transfer learning strategies to build high-performance credit scoring models suitable for cross-regional and multi-source data scenarios, thereby providing stronger technical support for the development of financial risk management.

Author Contributions

Methodology, Y.Z.; software, Y.Z.; validation, Y.Z. and H.H.; writing—original draft, Y.Z.; writing—review and editing, H.H. and X.Z.; supervision, H.H.; project administration, H.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Natural Science Foundation of China (72293583, 72293580).

Data Availability Statement

The original data presented in the study are openly available in the UCI Machine Learning Repository at https://archive.ics.uci.edu/dataset/573/south+german+credit+update (accessed on 20 September 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Bui, T.H.; Truong, T.T.D.; Tran, T.P.T. Financial Ratios as Indicators in Bankruptcy Prediction: A Comparative Analysis of Statistical and Machine Learning Models. Res. Sq. 2025. [Google Scholar] [CrossRef]
  2. Kozodoi, N.; Jacob, J.; Lessmann, S. Fairness in Credit Scoring: Assessment, Implementation and Profit Implications. Eur. J. Oper. Res. 2022, 297, 1083–1094. [Google Scholar] [CrossRef]
  3. Long, R. The Market for Lemons and the Regulator’s Signalling Problem. arXiv 2023, arXiv:2312.10896. [Google Scholar]
  4. Cable, J.; Turner, P. Asymmetric Information and Credit Rationing: Another Economic View Problem of Industrial Bank Lending and Britain’s. In Advances in Monetary Economics; Routledge: London, UK, 2021; pp. 207–220. [Google Scholar]
  5. DeFusco, A.A.; Tang, H.; Yannelis, C. Measuring the Welfare Cost of Asymmetric Information in Consumer Credit Markets. J. Financ. Econ. 2022, 146, 821–840. [Google Scholar] [CrossRef]
  6. Ioannidou, V.; Ongena, S.; Peydró, J.L.; van Horen, N. Collateral and Asymmetric Information in Lending Markets. J. Financ. Econ. 2022, 143, 875–902. [Google Scholar] [CrossRef]
  7. Wang, S.; St John, J. Present Bias, Payday Borrowing, and Financial Literacy. 2023. Available online: https://commons.stmarytx.edu/rsc25pres/14/ (accessed on 8 July 2025).
  8. Martin, J.; Akhavan-Abdollahian, M.; Taheri, S.; Akman, D. Optimal Credit Scorecard Model Selection Using Costs Arising from Both False Positives and False Negatives. SSRN Electron. J. 2022. [Google Scholar] [CrossRef]
  9. Huang, E.; Scott, C. Credit Risk Scorecard Design, Validation and User Acceptance—A Lesson for Modellers and Risk Managers; Credit Research Centre, University of Edinburgh Business School: Edinburgh, UK, 2007. [Google Scholar]
  10. World Bank Group. Credit Scoring Approaches Guidelines; World Bank: Washington, DC, USA, 2020. [Google Scholar]
  11. FICO. What Is a FICO Score? Fair Isaac Corporation: Minneapolis, MN, USA, 2025. [Google Scholar]
  12. Gambacorta, L.; Huang, Y.; Qiu, H.; Wang, J. How Do Machine Learning and Non-Traditional Data Affect Credit Scoring? New Evidence from a Chinese Fintech Firm. J. Financ. Stab. 2024, 73, 101284. [Google Scholar] [CrossRef]
  13. Suhadolnik, N.; Ueyama, J.; Da Silva, S. Machine Learning for Enhanced Credit Risk Assessment: An Empirical Approach. J. Risk Financ. Manag. 2023, 16, 496. [Google Scholar] [CrossRef]
  14. Mushava, J.; Murray, M. Flexible Loss Functions for Binary Classification in Gradient-Boosted Decision Trees: An Application to Credit Scoring. Expert Syst. Appl. 2024, 238, 121876. [Google Scholar] [CrossRef]
  15. Wahab, F.; Khan, I.; Sabada, S. Credit Card Default Prediction Using ML and DL Techniques. Internet Things Cyber-Phys. Syst. 2024, 4, 100008. [Google Scholar] [CrossRef]
  16. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  17. Chen, Y.; Zhang, H.; Wang, J.; Li, X. Imbalanced Learning: Progress and Challenges with Deep Learning and Ensemble Methods. Artif. Intell. Rev. 2024, 57, 2105–2136. [Google Scholar]
  18. Dal Pozzolo, A.; Boracchi, G.; Caelen, O.; Alippi, C.; Bontempi, G. Credit Card Fraud Detection: A Realistic Modeling and a Novel Learning Strategy. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 3784–3797. [Google Scholar] [CrossRef] [PubMed]
  19. Doshi-Velez, F.; Kim, B. Towards a rigorous science of interpretable machine learning. arXiv 2017, arXiv:1702.08608. [Google Scholar] [CrossRef]
  20. Hjelkrem, L.O.; Lange, P.E. Explaining Deep Learning Models for Credit Scoring with SHAP: A Case Study Using Open Banking Data. J. Risk Financ. Manag. 2023, 16, 221. [Google Scholar] [CrossRef]
  21. Talaat, F.M.; Aljadani, A.; Badawy, M.; Elhosseini, M. Toward Interpretable Credit Scoring: Integrating Explainable Artificial Intelligence with Deep Learning for Credit Card Default Prediction. Neural Comput. Appl. 2024, 36, 4847–4865. [Google Scholar] [CrossRef]
  22. Ribeiro, M.T.; Singh, S.; Guestrin, C. “Why should I trust you?” Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1135–1144. [Google Scholar]
  23. Aljadani, A.; Alharthi, B.; Farsi, M.A.; Balaha, H.M.; Badawy, M.; Elhosseini, M.A. Mathematical Modeling and Analysis of Credit Scoring Using the LIME Explainer: A Comprehensive Approach. Mathematics 2023, 11, 4055. [Google Scholar] [CrossRef]
  24. Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. In Proceedings of the Advances in Neural Information Processing Systems 30 (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  25. Wang, Y.; Chen, X.; Li, Y. Interpretable Machine Learning in Credit Scoring: A Survey. Expert Syst. Appl. 2023, 213, 118849. [Google Scholar]
  26. Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; Garcia, S.; Gil-Lopez, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, Taxonomies, Opportunities and Challenges toward Responsible AI. Inf. Fusion 2020, 58, 82–115. [Google Scholar] [CrossRef]
  27. Fiore, U.; De Santis, A.; Perla, F.; Zanetti, P.; Palmieri, F. Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Inf. Sci. 2019, 479, 448–455. [Google Scholar] [CrossRef]
  28. Lanckriet, G.R.G.; El Ghaoui, L.; Bhattacharyya, C.; Jordan, M.I. Minimax Probability Machine. In Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic; MIT Press: Cambridge, MA, USA, 2001; pp. 801–807. [Google Scholar]
  29. Marshall, A.W.; Olkin, I. Multivariate Chebyshev Inequalities. Ann. Math. Stat. 1960, 31, 1001–1014. [Google Scholar] [CrossRef]
Figure 2. SHAP beeswarm plot of the GAN-MPM model.
Table 1. Description of feature variables on the Credit-g dataset.
Attribute | Field Name | Value Coding
A1 | status | Ordinal (4 levels)
A2 | duration | Real value (months)
A3 | credit_history | Ordinal (5 levels)
A4 | purpose | Categorical (multi-class)
A5 | amount | Real value (DM)
A6 | savings | Ordinal (5 levels)
A7 | employment_duration | Ordinal (5 levels)
A8 | installment_rate | Ordinal (4 levels)
A9 | personal_status_sex | Categorical (4 levels)
A10 | other_debtors | Categorical (3 levels)
A11 | present_residence | Real value (years)
A12 | property | Ordinal (4 levels)
A13 | age | Real value (years)
A14 | other_installment_plans | Categorical (3 levels)
A15 | housing | Categorical (3 levels)
A16 | number_credits | Real value (count)
A17 | job | Ordinal (4 levels)
A18 | people_liable | Binary (1/2)
A19 | telephone | Binary (0/1)
A20 | foreign_worker | Binary (0/1)
Class | credit_risk | Binary (0/1)
Table 2. The mean and standard deviation of ACC, F1-score, Sensitivity, Specificity, and AUC on the Credit-g dataset in each model.
Model | ACC | F1-Score | Sensitivity | Specificity | AUC
Random Forest | 69.93 ± 2.31 | 21.03 ± 7.72 | 20.99 ± 5.67 | 90.89 ± 2.06 | 62.81 ± 3.84
XGBoost | 72.47 ± 2.29 | 25.40 ± 5.36 | 20.31 ± 4.13 | 94.77 ± 1.45 | 70.82 ± 1.70
SVM (lin) | 70.10 ± 1.99 | 29.36 ± 20.11 | 42.08 ± 25.92 | 63.63 ± 30.18 | 52.85 ± 10.94
SVM (rbf) | 71.17 ± 3.42 | 27.56 ± 15.44 | 22.45 ± 13.66 | 74.14 ± 4.07 | 58.30 ± 4.87
MPM (lin) | 71.47 ± 1.74 | 44.06 ± 2.65 | 37.56 ± 3.81 | 86.00 ± 3.30 | 61.43 ± 0.45
MPM (rbf) | 72.13 ± 2.10 | 57.44 ± 3.43 | 62.89 ± 6.33 | 76.10 ± 3.36 | 69.81 ± 0.46
GAN-MPM (lin) | 72.37 ± 2.18 | 60.93 ± 2.06 | 71.78 ± 4.13 | 72.62 ± 3.92 | 72.03 ± 0.36
GAN-MPM (rbf) | 76.13 ± 1.83 | 56.74 ± 3.04 | 52.22 ± 3.95 | 86.38 ± 2.54 | 69.26 ± 0.45