Relevant prior work includes studies of credit scoring models and of algorithms that handle distributional shifts in credit scoring.
2.1. Credit Scoring Models
Existing credit scoring models can be classified into two main categories: single-structure models and fusion-structure models. Early studies focused on simple analyses of individual creditworthiness using descriptive and exploratory statistical methods. Over time, however, research has shifted towards the integration of machine learning techniques to enhance scoring performance.
The origins of credit scoring can be traced back to 1956, when Bill Fair introduced the FICO method. As an early single-structure model, it scores an individual’s credit as a quantitatively weighted score over features selected by a credit expert [11]. The FICO methodology laid the foundation for the idea of credit scoring through multidimensional weighted evaluation, which is widely used in traditional bank credit scoring practice. However, the results of the FICO model depend on the credit experts’ choice of features and therefore lack fairness [12]. To mitigate this issue, many researchers have employed more advanced statistical tools, along with machine learning and deep learning techniques, to minimize the influence of human factors.
In terms of statistical tools, credit scoring models based on methods such as logistic regression, naive Bayes, and MDP have been developed [
4,
13]. These models are simple in structure, offer good interpretability and robustness, and meet the regulatory needs of the financial sector, but their accuracy leaves room for improvement. Machine learning models, such as decision trees, support vector machines, and BP neural networks, can uncover nonlinear relationships in data and tend to generalize better than traditional statistical models [14]. However, with the exception of decision trees, these models are less interpretable and less computationally efficient than methods like logistic regression. Deep learning, a representation learning method, has achieved significant success with image, speech, and text data, but its structure often requires adjustment before it can effectively analyze the tabular data used in credit scoring. Moreover, the multilayered, nonlinear nature of deep models makes their decision-making process difficult to interpret.
Different single-structure models each have their own advantages and disadvantages. Fusion structure models combine multiple types of sub-models to leverage the strengths of each and improve overall model performance [
15,
16]. Based on how the sub-models are combined, fusion models can be categorized into integration-type models and non-integration-type models.
Integration-type models typically involve constructing multiple independent sub-models, which may be of the same or different types. The sub-models’ predictions are then combined (e.g., by averaging or voting) to generate a comprehensive prediction that outperforms a single model [17] (a minimal soft-voting sketch follows this paragraph). For example, Xiao et al. [
7] designed a semi-supervised integration method for credit scoring. They first trained an initial ensemble consisting of multiple sub-models on the labeled samples. The ensemble then labels the unlabeled data by voting, and these samples are added to the training set together with their predicted labels. The model is updated on the enlarged training set, and a cost-sensitive neural network is constructed to complete the credit scoring process. Liu et al. [
16] designed a heterogeneous deep forest model based on random forests. This model integrates multiple tree-based ensemble learners at each level, increasing complexity as the dataset size grows and avoiding isomorphic predictions in ensemble frameworks. Runchi et al. [
3] employed an integrated approach to improve the logistic regression model. By adjusting the balance ratio of the sub-datasets used to train the sub-models and applying dynamic weighting, their model handled imbalanced credit data more effectively.
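As a concrete illustration of the integration idea described above, the following minimal sketch combines several independent sub-models by soft voting, i.e., averaging their predicted probabilities. The synthetic data and the particular sub-models are illustrative assumptions, not any of the cited designs.

```python
# Minimal sketch of an integration-type fusion model: independent sub-models
# whose predicted probabilities are averaged (soft voting).
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced, credit-like data (assumption for illustration only).
X, y = make_classification(n_samples=1000, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

ensemble = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)),
                ("dt", DecisionTreeClassifier(max_depth=5)),
                ("rf", RandomForestClassifier(n_estimators=100))],
    voting="soft",  # average the sub-models' predicted default probabilities
)
ensemble.fit(X_tr, y_tr)
print("ensemble accuracy:", ensemble.score(X_te, y_te))
```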
Non-integration-type models focus on combining different types of models or different feature representations to create a new model with superior overall performance. This typically involves processing the outputs of the different models (e.g., through weighting, summing, or concatenation) to generate the final predictions. Many models of this type have shown strong performance in credit scoring. For instance, Shen et al. [
18] combined a three-stage decision module with an unsupervised transfer module to align the selected sample distribution with the accepted sample distribution while also selecting the rejected samples. Roseline et al. [
19] proposed an LSTM-RNN model that uses a recurrent neural network to fit the samples and an LSTM to extract complex interrelated features from sequential data. Their experimental results showed that the model outperformed single-structure models. To address the lack of spatial local correlation in tabular data for credit scoring, Qian et al. [20] used soft reordering to adaptively reorganize the one-dimensional tabular data. This gave the data spatially correlated structure, enhancing the CNN’s ability to process it effectively.
Based on the preceding discussion of credit scoring models, it can be observed that fusion models exhibit higher prediction accuracy and robustness compared to single models in credit scoring applications, although their complex structure impairs interpretability. Additionally, most existing studies have concentrated on enhancing the predictive performance of credit scoring models without addressing the shift in credit data distribution between the source and target domains.
2.2. Algorithms for Handling Distributional Shifts in the Credit Scoring Problem
Credit scoring models are essential tools for banks to evaluate loan applicants’ qualifications and determine the amount of credit to extend, and their accuracy depends heavily on the consistency of the data distribution. Unlike data in many other domains, credit data often suffer from missing rejection samples and imbalanced sample distributions, which significantly exacerbate the distributional discrepancy between the credit data used to train the model (the source domain) and the actual credit data encountered when the model is applied (the target domain), ultimately reducing the model’s accuracy [
21].
The reasons for distributional shifts in credit assessment can be classified into two categories: sample selection bias and changes in application scenarios. As banks and financial institutions lack access to information about unapproved loan applicants, most data used for constructing credit scoring models are derived from approved customers, leading to a sample selection bias in relation to the true distribution [
22]. Researchers often treat this sample selection bias as a missing data problem: applicants’ insufficient credit history or partial data loss is regarded as missing at random (MAR), while the missing rejected samples are regarded as missing not at random (MNAR), and the process of addressing these missing data is referred to as rejection inference [
23].
Under the MAR assumption, Li et al. [
24] employed a semi-supervised SVM model for rejection inference to determine the support hyperplane by incorporating both accepted and rejected samples, with results outperforming traditional supervised credit scoring methods. Kang et al. [
25] introduced the label spreading method into credit scoring and employed a graph-based semi-supervised learning technique with SMOTE for rejection inference, improving the model’s performance in handling unbalanced data. Concurrently, deep generative methods have also been incorporated into rejection inference. Mancisidor et al. [
26] combined Gaussian mixture models and auxiliary variables within a semi-supervised framework with generative models, utilizing efficient stochastic gradient optimization to enhance the model’s capacity to handle large datasets. This approach is advantageous, as model performance improves progressively with increased training data. Shen et al. [
21] employed a semi-supervised approach based on joint distribution adaptation applied to a supervised model, reducing the distributional discrepancy between the accepted and rejected sample sets and achieving state-of-the-art performance on the semi-supervised rejection-inference credit assessment task.
In summary, existing studies have used a variety of rejection inference techniques to address the data distribution shift that results from sample selection bias. However, existing research does not consider the distribution shift that credit scoring models face when the application scenario changes.
With the advent of the big data era, the impact of shifting application scenarios on the accuracy of credit scoring has become increasingly significant. Compared to traditional bank lending practices, diversified online lending products cater to different audiences, which often results in a distributional shift between the training data and actual data. As a result, it is difficult to directly apply a credit scoring model developed for one product to other products.
Domain adaptation is a primary approach for addressing distribution bias; it applies when the source and target tasks are the same but the data distributions of the source and target domains differ. Unlike traditional credit scoring models, domain adaptation models make use of a small amount of unlabeled target-domain data.
Figure 1 illustrates the classification of domain adaptation methods. Depending on whether intermediate domains are constructed during transfer, domain adaptation methods can be divided into single-step and multi-step approaches.
In cases where the shift between the source and target domains is substantial (e.g., when the source domain contains text data and the target domain contains image data), the transfer task may need to be broken into multiple mappings for multi-step domain adaptation. For instance, in situations where the distribution shift in an image classification task is large, Xiang et al. [
27] designed a multi-step domain-adaptive image classification network with an attention mechanism to complete the domain adaptation task in two steps. The first step uses the attention mechanism to merge the source and target domain data, while the second step aligns the source and target domains at both the pixel and global levels. This method effectively mitigates model performance degradation caused by significant distributional differences between the source and target domains in image classification tasks.
Furthermore, single-step domain adaptation can be divided into homogeneous domain adaptation (where the data spaces are consistent but the distributions differ) and heterogeneous domain adaptation (where both the data spaces and the distributions differ), depending on the source and target domain data [28]. In heterogeneous domain adaptation, features are difficult to align automatically in the neural networks used for transfer, so a feature converter must be introduced before mapping the source and target domain features. The feature converter uses the distributional information of the source and target domains to achieve generalized feature alignment. Building on this concept, Gupta et al. [
29] designed the Cross Modal Distillation method, which first learns the structure of the feature converter from a large labeled modality (the source domain) and then uses it to extract features from an unlabeled modality (the target domain), facilitating transfer supervision between images from different modalities.
For credit scoring tasks, which typically rely on tabular data with similar features, only single-step domain adaptation is necessary. Since different credit datasets usually have different data spaces, a feature converter needs to be designed to resolve the heterogeneity between the data domains. For example, AghaeiRad et al. [
30] designed an unsupervised transfer model based on the self-organising map (SOM), which clusters the knowledge discovered by the SOM and passes it into the FNN to achieve more accurate credit scoring.
The core of domain adaptation modeling lies in measuring and adjusting the distributional differences between the source and target domains. Early research relied on the assumption that the distributional bias could be represented by a linear mapping [
31]. Over time, nonlinear representations (e.g., neural network-based models) were introduced to improve the handling of distributional differences, particularly through the development of robust representation principles in the denoising autoencoder paradigm [
32]. More recently, many unsupervised methods have been applied to address distributional bias. Zhang et al. [
33] reweighted the sample data to adjust the distributions, but this approach struggled with variations in the feature space. Zhang et al. developed a method to learn feature transformation matrices online in the original feature space by measuring the similarities between distributions. They then mapped the original features to a kernel space using Online Single Kernel Feature Transformation (OSKFT), thus learning nonlinear feature transformations. Compared to sample reweighting, this method, which matches the kernel mean embeddings of the distributions in a reproducing kernel Hilbert space, can handle variations in the feature space. However, without further improvements, its mapping is unidirectional.
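To make the kernel-based distribution matching concrete, the sketch below computes a maximum mean discrepancy (MMD) between source and target features with an RBF kernel, i.e., the distance between their kernel mean embeddings. It is a generic illustration of the principle, not an implementation of OSKFT; the synthetic features and the bandwidth value are assumptions.

```python
import numpy as np

def mmd_rbf(Xs, Xt, gamma=1.0):
    """Squared maximum mean discrepancy between source Xs and target Xt with an
    RBF kernel: the distance between the kernel mean embeddings of the two
    samples in the reproducing kernel Hilbert space."""
    def rbf(A, B):
        sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq)
    return rbf(Xs, Xs).mean() + rbf(Xt, Xt).mean() - 2 * rbf(Xs, Xt).mean()

# Hypothetical usage: a learned feature transformation would be chosen so that
# the transformed source and target features are close under this metric.
rng = np.random.default_rng(0)
Xs = rng.normal(0.0, 1.0, size=(200, 10))   # source-domain features (synthetic)
Xt = rng.normal(0.5, 1.2, size=(200, 10))   # shifted target-domain features
print("MMD^2 before alignment:", mmd_rbf(Xs, Xt))
```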
Our approach also aims to match the spatial distributions of features between the source and target domains. However, unlike sample reweighting or direct mapping from the source domain to the target domain, it leverages the concept of domain-adversarial training [
34]. Specifically, we use the Wasserstein metric between features and the discriminability between domains as criteria for classifying feature subsets. We then adversarially learn a mapping such that, after passing through it, data from the source and target domains become indistinguishable to the domain discriminator.
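The following is a minimal, generic sketch of this domain-adversarial idea with a Wasserstein-style domain critic: the critic is trained to separate source and target features, while the shared feature mapping is trained, alongside the classification loss, to make them indistinguishable. The network sizes, optimizer settings, weight-clipping Lipschitz approximation, loss weight, and random batches are illustrative assumptions, not the architecture proposed in this paper.

```python
import torch
import torch.nn as nn

feat = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 16))    # shared mapping
clf = nn.Sequential(nn.Linear(16, 2))                                     # credit classifier
critic = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 1))    # domain critic

opt_main = torch.optim.Adam(list(feat.parameters()) + list(clf.parameters()), lr=1e-3)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-3)
ce = nn.CrossEntropyLoss()

xs, ys = torch.randn(64, 20), torch.randint(0, 2, (64,))  # labeled source batch (synthetic)
xt = torch.randn(64, 20)                                   # unlabeled target batch (synthetic)

for step in range(200):
    # 1) Train the critic to estimate the Wasserstein distance between the
    #    source and target feature distributions.
    for _ in range(5):
        fs, ft = feat(xs).detach(), feat(xt).detach()
        w_dist = critic(fs).mean() - critic(ft).mean()
        opt_critic.zero_grad()
        (-w_dist).backward()                  # maximize the critic's estimate
        opt_critic.step()
        for p in critic.parameters():         # crude Lipschitz constraint via clipping
            p.data.clamp_(-0.01, 0.01)
    # 2) Train the feature mapping (and classifier) so that source and target
    #    features become indistinguishable to the critic.
    fs, ft = feat(xs), feat(xt)
    loss = ce(clf(fs), ys) + 0.1 * (critic(fs).mean() - critic(ft).mean())
    opt_main.zero_grad()
    loss.backward()
    opt_main.step()
```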
Figure 2 illustrates the difference between our method and the previous methods.