Computers
  • Article
  • Open Access

18 December 2025

FastTree-Guided Genetic Algorithm for Credit Scoring Feature Selection

1 College of Information Technology, University of Bahrain, Zallaq 1054, Bahrain
2 Beacom College of Computer and Cyber Sciences, Dakota State University, Madison, SD 57042, USA
* Author to whom correspondence should be addressed.
This article belongs to the Section AI-Driven Innovations

Abstract

Feature selection is pivotal in enhancing the efficiency of credit scoring predictions, where misclassifications are critical because they can result in financial losses for lenders and exclusion of eligible borrowers. While traditional feature selection methods can improve accuracy and class separation, they often struggle to maintain consistent performance aligned with institutional preferences across datasets of varying size and imbalance. This study introduces a FastTree-Guided Genetic Algorithm (FT-GA) that combines gradient-boosted learning with evolutionary optimization to prioritize class separability and minimize false-risk exposure. In contrast to traditional approaches, FT-GA provides fine-grained search guidance by acknowledging that false positives and false negatives carry disproportionate consequences in high-stakes lending contexts. By embedding domain-specific weighting into its fitness function, FT-GA favors separability over raw accuracy, reflecting practical risk sensitivity in real credit decision settings. Experimental results show that FT-GA achieved similar or higher AUC values ranging from 76% to 92% while reducing the average feature set by 21% when compared with the strongest baseline techniques. It also demonstrated strong performance on small to moderately imbalanced datasets and more resilience on highly imbalanced ones. These findings indicate that FT-GA offers a risk-aware enhancement to automated credit assessment workflows, supporting lower operational risk for financial institutions while showing potential applicability to other high-stakes domains.

1. Introduction

In the continued pursuit of economic growth, financial institutions provide consumer credit facilities as one of their core business functions. Facilities such as mortgages or personal loans provide borrowers with the funds needed to purchase various products. Credit scoring plays a pivotal role in deciding whether a loan should be extended to an applicant: applicants must undergo a rigorous risk assessment that evaluates their financial stability before a facility is approved and disbursed. Creditworthiness is commonly measured from the financial factors of the applicant, which include income, credit history, and collateral [1]. From this stance, failing to adhere to the risk assessment process places any financial institution at risk, especially when loans are expected to default. This underscores the need for a balanced, low-risk approach that ensures equal opportunities while minimizing financial losses.
For several decades, financial institutions have relied extensively on traditional approaches to assess the eligibility of applicants for loans. These approaches vary between judgmental and statistical [2]. Judgmental approaches favor qualitative factors, such as interviews with applicants or the evaluation of a business plan by financial experts. The approval of loan applications is then subject to the feedback and impressions of these experts, which ultimately vary from person to person. Statistical approaches, on the other hand, utilize quantitative data analysis to determine an individual's eligibility. This includes ratio analysis and the use of credit scoring models such as the logit model, linear discriminant analysis, and logistic regression [3]. Both approaches, i.e., statistical and judgmental, were found ill-suited to the complexity of eligibility assessment and prone to biased decisions. For instance, judgmental approaches rely heavily on human expertise and subjective interpretation, leading to inconsistent decisions and potential bias across credit assessors [4]. Statistical methods tend to reduce this subjectivity, yet they depend on historical behavioral data that are often unavailable for thin-file applicants, and they often assume linear relationships that rarely capture real-world borrower repayment patterns and evolving feature interactions [5]. Consequently, this exposes institutions to inconsistent and unreliable lending outcomes. These weaknesses motivated the need for data-driven approaches that can generalize informative credit predictors beyond predefined modeling assumptions.
With the advent of machine learning, predicting creditworthiness to determine applicant eligibility has become possible through a data-driven prediction process that requires minimal human intervention and domain expertise [6,7,8]. Before prediction, applicants undergo a rigorous process known as Know Your Customer (KYC), required by lending institutions to capture personal and credit history details as part of their official procedures. This information is then used to train machine learning models so that their predictions generalize to new applicants, ultimately supporting decisions for thin-file clients. As a result, the adoption of machine learning has grown steadily in leading markets such as the United States, the United Kingdom, and China [9].
A critical step in the Machine Learning (ML) pipeline is feature engineering, which involves identifying influential variables that contribute significantly to the model's accuracy and its ability to distinguish between good customers and defaulters. This process not only ensures better performance but also promotes simplicity by eliminating unnecessary features that contribute minimally to the final outcome [10]. Conversely, including irrelevant or redundant features in the training phase can degrade the model's precision and worsen its ability to distinguish between classes. Further, identifying interdependent relationships requires a thorough mathematical understanding of the problem, particularly of linearity and indirect relations among features, which is less rigidly predefined in evolutionary feature selection techniques. These challenges reinforce the need for a feature selection strategy that can explore non-linear interactions, minimize costly misclassifications, and generalize across heterogeneous datasets.
This paper proposes a FastTree-Guided Genetic Algorithm (FT-GA) that embeds FastTree, a gradient-boosting model, to guide the evolutionary search toward more promising regions of the feature space. By incorporating risk-sensitive weighting and separability feedback, FT-GA is designed to maximize predictive accuracy while reducing false-positive and false-negative rates in lending environments. Empirical results demonstrate that FT-GA achieves competitive or superior AUC performance across benchmark credit datasets, particularly under low and moderate imbalance conditions, while reducing feature dimensionality and improving decision separability. The contributions of this paper are summarized as follows:
  • The development of a FastTree-Guided Genetic Algorithm (FT-GA) that leverages boosting-based separability feedback to direct evolutionary feature selection toward more informative regions of the search space.
  • A demonstration that evolutionary methods can uniformly address the class imbalance issue across different imbalance rates, whereas conventional approaches often require pairing the best selection and learning techniques to achieve comparable results.
  • The introduction of risk-sensitive weighting within the fitness calculation to prioritize class separability over raw accuracy, reflecting the unequal impact of false approvals and false rejections in lending environments.
  • Validation of the proposed FT-GA through a comprehensive experimental study using four benchmark credit scoring datasets of varying size and imbalance, showing higher or competitive performance.
The remainder of this paper is structured as follows. Section 2 provides an overview of the topic and reviews the techniques most commonly used for feature selection. Section 3 formulates the problem and details the steps involved in generating the results of our proposed method. Finally, the performance of FT-GA is evaluated in Section 4, with traditional and modern feature selection techniques used as baselines to benchmark our results.

3. Proposed Method

The main aim of this study is to propose a hybrid feature selection approach that incorporates FastTree (Version 3.0.1), a proprietary Microsoft implementation, as a fitness function to guide the GA search. The objective of the search is to select the feature subset that ensures high accuracy while also accounting for data imbalance and separability between classes. FastTree is a scalable gradient-boosting algorithm built on decision trees and based on the Multiple Additive Regression Trees (MART) algorithm [32]. The FT algorithm was initially developed to rank web pages and was later adopted for classification and regression tasks. With its ability to handle large, high-dimensional datasets that often exceed memory, FT targets production-grade scalability and robustness, often overlooked by traditional algorithms, making it better suited to real-world datasets [33]. Unlike decision trees that partition data recursively to form a hierarchical structure, the FT algorithm uses an ensemble of shallow decision trees added sequentially to minimize the logistic loss function given in (1). This is achieved through a stage-wise additive approach in which each successive tree corrects the residuals of the previous ensemble, allowing better generalization and class separation [34].
L = \sum_{i=1}^{n} \log\left(1 + e^{-y_i F(x_i)}\right) \quad (1)
where n is the number of training examples, y_i ∈ {−1, +1} denotes the true label of example i, and F(x_i) is the predicted score. To formalize FT-GA, as depicted in Figure 1, let D = {(f_i, y_i), i = 1, …, n} be the dataset, where f_i ∈ R^m is the feature vector of a particular instance and n is the number of instances. Since the anticipated output is binary, denoting applicant eligibility, the class label y_i ∈ {0, 1} has only two possible outcomes, i.e., eligible or not. To measure the performance of a given subset, let c_p = [c_{p1}, c_{p2}, …, c_{pm}] ∈ {0, 1}^m be the binary chromosome representing the selection of features, where m is the total number of features in the dataset and p ∈ {0, …, S} is the generation to which the chromosome belongs; S is the total number of iterations, fixed to a maximum of 50 due to early convergence. A chromosome vector c_p thus encodes a subset of the features f_i, and the selected feature set for chromosome c_p is given by (2).
F_c = \left\{ f_i[c_p] \mid i = 1, \dots, n;\ p \in \{0, \dots, S\} \right\} \quad (2)
where f_i[c_p] is the result of applying the chromosome mask c_p to each instance f_i to retain the selected features. Each chromosome is used to train the FT classifier and is validated using a 30–70% hold-out split to generate the accuracy and AUC. Although this split enables rapid fitness evaluation inside the evolutionary loop, it constitutes only the internal scoring mechanism of FT-GA. To ensure that reported performance is statistically reliable and not dependent on a single train–test partition, final results for FT-GA and all baselines were additionally evaluated under 10-fold stratified cross-validation, decoupling optimization speed from generalization validity by assessing stability through mean performance and variance measures reported later. Considering the trade-off between accuracy and the model's ability to separate classes, particularly in binary classification problems [35], two static weights α and β are multiplied by accuracy and AUC, respectively, to balance the two, as shown in (3), which determines the fitness of F_c. The values assigned to these two static weights reflect the domain-specific risk preference.
\text{Fitness}(c_p) = \alpha \times \text{Accuracy}(F_c) + \beta \times \text{AUC}(F_c); \quad \alpha, \beta \in [0, 1],\ \alpha + \beta = 1 \quad (3)
where the values of α and β are subject to domain specifications and institutional risk tolerance. In this work, we grounded our choice in the principle stated by Hofmann in the dataset documentation [36]: "It is worse to class a customer as good when they are bad, than it is to class a customer as bad when they are good.", therefore emphasizing AUC to minimize false positives, which are riskier in credit assessment. Accordingly, weights of 20% and 80% were assigned to accuracy and AUC, respectively, to favor class separation over opportunity capture, acknowledging that false positives pose a greater financial risk than false negatives and making our approach particularly well suited to conservative financial environments. This decision is supported by previous studies, such as [37], which explicitly highlight the importance of prioritizing AUC over other metrics to achieve better separability between classes. Algorithm 1 shows the steps involved in converging to solutions that prioritize class separation over raw accuracy.
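To make the fitness computation concrete, the sketch below evaluates Equation (3) for a single chromosome. Since FastTree ships with ML.NET rather than Python, scikit-learn's GradientBoostingClassifier is used here as a stand-in boosted-tree learner, and the 30–70% split is interpreted as a 30% hold-out; the function and variable names are illustrative, not taken from the paper's implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split

ALPHA, BETA = 0.2, 0.8  # risk-sensitive weights: favor AUC (separability)

def fitness(chromosome, X, y):
    """Evaluate one binary chromosome: train a boosted-tree model on the
    selected feature subset and return the weighted accuracy/AUC score."""
    mask = chromosome.astype(bool)
    if not mask.any():                      # empty subsets are infeasible
        return 0.0
    X_sub = X[:, mask]                      # apply the chromosome mask (Eq. 2)
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_sub, y, test_size=0.3, stratify=y, random_state=42)
    model = GradientBoostingClassifier().fit(X_tr, y_tr)  # FastTree stand-in
    acc = accuracy_score(y_te, model.predict(X_te))
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    return ALPHA * acc + BETA * auc         # Equation (3)
```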
Figure 1. Overview of the proposed FT-GA method.
To ensure proper hyperparameterization of our proposed method, various GA configuration parameters were considered to encourage better feature-subset exploration. Population sizes of 10, 30, and 50 chromosomes were tested to account for computational cost and convergence speed. Two crossover points were considered, with crossover rates (r_c) varying between 0.6 and 0.8, together with gene- and chromosome-level mutation at a mutation rate (r_m) of 0.05 to introduce diversity into each population. Parent chromosomes were selected via tournament and roulette-wheel selection, with tournament sizes varying between 2 and 5. To retain quality solutions, an elitism rate (r_e) of 0.1 was used to preserve 10% of the elite chromosomes. All parameter-sensitivity runs were executed for 50 iterations to observe convergence speed across populations. To tune these parameters, each parameter value was perturbed independently while holding the others constant, enabling observation of marginal performance shifts in AUC and accuracy under the fitness weighting considered. This resulted in 36 configurations, from which we selected settings that consistently balanced accuracy and separability improvements while avoiding excessive runtime or premature convergence. Table 1 shows the GA configuration parameters used for each dataset, chosen after this sensitivity analysis.
Algorithm 1: Feature Selection with FastTree
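Algorithm 1 appears as an image in the original article; as a hedged reconstruction, the sketch below implements the evolutionary loop described in the surrounding text (tournament selection, two-point crossover, bit-flip mutation at r_m = 0.05, 10% elitism, up to 50 generations), reusing the fitness function sketched above. All names and structural details beyond those stated parameters are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def evolve(X, y, pop_size=30, generations=50, r_c=0.8, r_m=0.05,
           elite_rate=0.1, tournament_k=3):
    m = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, m))   # random binary chromosomes
    for _ in range(generations):
        scores = np.array([fitness(c, X, y) for c in pop])
        order = np.argsort(scores)[::-1]
        n_elite = max(1, int(elite_rate * pop_size))
        next_pop = [pop[i].copy() for i in order[:n_elite]]   # elitism (r_e)

        def tournament():
            idx = rng.choice(pop_size, tournament_k, replace=False)
            return pop[idx[np.argmax(scores[idx])]]           # fittest of k

        while len(next_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            child = p1.copy()
            if rng.random() < r_c:                 # two-point crossover
                a, b = sorted(rng.choice(m, size=2, replace=False))
                child[a:b] = p2[a:b]
            flips = rng.random(m) < r_m            # bit-flip mutation
            child[flips] ^= 1
            next_pop.append(child)
        pop = np.array(next_pop)
    scores = np.array([fitness(c, X, y) for c in pop])
    return pop[np.argmax(scores)].astype(bool)     # best feature mask
```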
Table 1. Parameter Configuration for FT-GA.

3.1. Datasets Description & Preprocessing

To evaluate the performance of our proposed solution, four real-world credit scoring datasets retrieved from UCI and Kaggle were selected, varying in dataset size, number of features in the final subset, and class imbalance, so as to assess the generalizability of FT-GA. The datasets included in our experiment were the German credit data [36], Australian credit approval [38], Japanese credit scoring [39], and Taiwanese default of credit card clients [40]. This variety is intended to promote generalizability and credibility across diverse borrower populations. These datasets were intentionally selected because they are widely used benchmarks in credit scoring research, publicly available, and varied in size and imbalance ratio, allowing FT-GA to be evaluated under realistic operational conditions. Moreover, each dataset contains the core risk factors traditionally used in lending decisions, i.e., demographics and repayment history, so their indicators align with theoretical and regulatory practice in credit risk modeling.
The number of instances varied between datasets, from a minimum of 690 in the Australian dataset to more than 30,000 in the Taiwanese dataset. This allows our method to be evaluated across small, medium, and large-scale scenarios to assess its scalability. Moreover, the selected datasets provide varying ratios of class imbalance, with the Taiwanese dataset exhibiting the highest imbalance ratio and the Japanese and Australian datasets being relatively balanced. Table 2 shows the distribution of defaults and non-defaults for the four datasets.
Table 2. Distribution of Classes per Dataset.
The number of features varied significantly between datasets, and only light preprocessing was applied to each, as listed below (a minimal code sketch follows this list):
  • Label encoding of categorical features given in the German credit score dataset.
  • Handling mixed data types, mostly strings and numbers, given in the Australian credit approval dataset. Label encoding was applied to non-numeric features to ensure that the models could process them.
  • Replacing missing categorical values with the mode and missing continuous values with the mean in the Japanese dataset before encoding, and converting the target variable labels to binary for consistency with the other datasets.
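A minimal sketch of these preprocessing steps, assuming pandas/scikit-learn equivalents of the operations listed above; the function name and the column-wise handling are illustrative rather than the paper's exact pipeline.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

def preprocess(df: pd.DataFrame, target_col: str):
    """Mode-impute categoricals, mean-impute numerics, then label-encode
    every non-numeric column (including a string-valued target)."""
    df = df.copy()
    for col in df.columns:
        if df[col].dtype == object:
            df[col] = df[col].fillna(df[col].mode()[0])      # mode imputation
            df[col] = LabelEncoder().fit_transform(df[col])  # label encoding
        else:
            df[col] = df[col].fillna(df[col].mean())         # mean imputation
    X = df.drop(columns=[target_col]).to_numpy()
    y = df[target_col].to_numpy()                            # binary 0/1 labels
    return X, y
```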
Both German and Taiwanese datasets explicitly provide feature names, while features included in Japanese and Australian datasets were anonymized for privacy purposes. Table 3 and Table 4 show all attributes included in the German and Taiwanese datasets with their data types, respectively.
Table 3. Attributes of the German Credit Score Dataset.
Table 4. Attributes of the Taiwanese Credit Card Defaults Dataset.

3.2. Baseline ML Models

To assess the performance of FT-GA, a set of feature selection techniques has been considered to train a set of well-known ML models. The results given by each pair, i.e., selection technique to train the ML model, will serve as a baseline to benchmark the results of our proposed method. Traditional and modern feature selection techniques highlighted in the literature were used to select a subset of features to train the following ML models.
1. Logistic Regression (LR): a simple statistical model in which a linear regression formula is substituted into the logistic (sigmoid) function [6]. It aims to optimally determine the coefficients (β) of the linear formula, thereby predicting the output for a given set of inputs. The formula for logistic regression is shown in (4).
\sigma = \frac{1}{1 + e^{-y}}; \quad y = \sum_{i=0}^{n} \beta_i X_i + \varepsilon \quad (4)
where β_i is the optimized coefficient for the data point X_i, and the value of σ ∈ [0, 1] determines the probability, classified against a 0.5 threshold. LR is mostly efficient for linearly separable problems only.
2. Naïve Bayes (NB): a supervised machine learning model that operates based on Bayes' theorem, under the assumption that the input features are conditionally independent given the predicted class [41]. Despite its simplicity, it is particularly efficient for classification problems where this independence assumption approximately holds. By multiplying the probability of the evidence given the outcome P(X|y) by the prior probability of that outcome P(y), and dividing by the overall probability of the evidence P(X), the posterior probability of a result given the observed evidence P(y|X) is obtained, as shown in (5).
P(y \mid X) = \frac{P(X \mid y) \cdot P(y)}{P(X)} \quad (5)
3. Random Forest (RF): an ensemble learning model that produces multiple decision trees to formulate its prediction. The algorithm constructs a 'forest' of Decision Trees (DTs) and aggregates their results, thereby improving accuracy, and it mitigates overfitting by relying on identically distributed decision trees that reach a majority vote. The steps involved include creating bootstrap samples, forming random subsets of data and features, training decision trees on those subsets, and predicting the classification from the votes of all generated trees [30]. The core concept of RF lies in minimizing the Gini impurity, which measures the probability of incorrectly classifying a randomly chosen sample, as shown in (6).
\text{Gini}(D) = 1 - \sum_{k=1}^{n} P_k^2, \quad P_k = \frac{\text{samples in class } k}{\text{total samples in node}} \quad (6)
4. Artificial Neural Networks (ANN): inspired by how the human brain works, an ANN comprises neural nodes that each receive an input, process it, and pass an output signal to subsequent nodes. It consists of three main layer types: an input layer receiving the inputs; one or more hidden layers that adjust weights and biases to capture hidden patterns; and an output layer that provides the classification [42] based on a chosen transfer function. Our baseline considers only a feed-forward NN with a sigmoid transfer function to provide the logistic prediction.
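For orientation, the four baselines map naturally onto scikit-learn estimators, sketched below under the assumption of Gaussian Naive Bayes and a single-hidden-layer feed-forward network with a logistic (sigmoid) activation; the hyperparameters shown are illustrative defaults, not the paper's exact configurations.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

baseline_models = {
    "LR": LogisticRegression(max_iter=1000),          # sigmoid over linear score
    "NB": GaussianNB(),                               # conditional independence
    "RF": RandomForestClassifier(n_estimators=100),   # majority vote over trees
    "NN": MLPClassifier(hidden_layer_sizes=(16,),     # feed-forward network
                        activation="logistic",        # sigmoid transfer
                        max_iter=500),
}
```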

3.3. Performance Metrics & Evaluation

Since the problem presented in this research is modeled as a binary classification problem, a set of well-known performance metrics, including accuracy, AUC, recall, and precision [43] have been adopted to assess the performance of our proposed method along with the baseline solutions. For each best-performing pair comprising the feature selection method used and the ML model to provide the prediction, all metrics highlighted in Equations (7)–(10) have been used to determine the suitability of our method. To achieve a lower risk approach, the AUC metric was prioritized over other metrics that concern positive classes, including precision and recall. The AUC will determine the effectiveness of our proposed method in separating class labels, and its value can be between 0 and 1 whereby an efficient classifier would have an AUC value close to 1 [44].
\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \quad (7)
\text{Recall} = \frac{TP}{TP + FN} \quad (8)
\text{Precision} = \frac{TP}{TP + FP} \quad (9)
\text{F1-Score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (10)
  • True Positive (TP): the classifier predicts that an applicant is eligible, and the applicant is in reality eligible. It measures the classifier's capability to capture actual positive cases.
  • False Positive (FP): the classifier incorrectly predicts an applicant to be eligible when in reality they are not. From the perspective of a lending institution, this could lead to a financial loss due to a possible default.
  • True Negative (TN): the classifier correctly predicts that an applicant is not eligible, and in reality they are not. It measures the classifier's capability to correctly capture actual negative cases corresponding to non-eligible applicants.
  • False Negative (FN): the classifier incorrectly predicts that an applicant is not eligible when they actually are. From the perspective of a lending institution, this constitutes a potential opportunity loss.
To compare performance between all pairs, in addition to accuracy and AUC, other key performance metrics, namely recall, precision, and F1 score, have also been considered to reflect overall performance. Equations (7)–(10) provide the formulas used to determine all metrics from the confusion matrix. For FT-GA and all baseline models, performance reporting follows 10-fold stratified cross-validation rather than a single split, in order to mitigate sampling variability and ensure statistical robustness. For each best-performing pair, mean values and standard deviations were computed for accuracy and AUC, alongside the Kolmogorov–Smirnov (KS) distance as a separability indicator, strengthening comparability by evaluating stability across folds rather than isolated test outcomes.
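This evaluation protocol can be sketched as follows, assuming the KS distance is computed as the two-sample Kolmogorov–Smirnov statistic between the predicted-score distributions of the two classes, which is its standard formulation in credit scoring; the fold count follows the text, while the seed and function names are illustrative.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import StratifiedKFold

def evaluate(model, X, y, folds=10):
    """10-fold stratified CV reporting mean/std of accuracy, AUC, and KS."""
    accs, aucs, kss = [], [], []
    skf = StratifiedKFold(n_splits=folds, shuffle=True, random_state=42)
    for tr, te in skf.split(X, y):
        model.fit(X[tr], y[tr])
        proba = model.predict_proba(X[te])[:, 1]
        accs.append(accuracy_score(y[te], model.predict(X[te])))
        aucs.append(roc_auc_score(y[te], proba))
        # KS: max gap between the score distributions of the two classes
        kss.append(ks_2samp(proba[y[te] == 1], proba[y[te] == 0]).statistic)
    return {"acc": (np.mean(accs), np.std(accs)),
            "auc": (np.mean(aucs), np.std(aucs)),
            "ks":  (np.mean(kss), np.std(kss))}
```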

4. Results & Discussion

This work aims to provide a low-risk approach balancing the trade-off between accuracy and separation between classes. The experimental results were generated in two phases. First, features were selected using the techniques highlighted previously: Chi-squared (Chi2), Mutual Information (MI), Recursive Feature Elimination (RFE), Logistic Regression with L1 regularization (L1 LogReg), and Random Forest (RF). Then, each subset given by a feature selection technique was used to train four ML models: Logistic Regression (LR), Naïve Bayes (NB), Random Forest (RF), and Neural Network (NN). The results of all pairs were later used to compare against and benchmark those given by FT-GA. For simplicity, only the best-performing pairs have been considered, ensuring that our proposed solution is compared against the best possible configurations. Table 5, Table 6, Table 7 and Table 8 show the feature subsets determined by the different methods for each dataset. Notably, FT-GA tends to converge on compact subsets that do not focus on reducing dimensions but rather preserve a domain-meaningful weighting between accuracy and AUC. This suggests that the separability-driven objective pursued by the GA implicitly favors features relevant to risk signaling rather than generic feature reduction or accuracy-focused selection. This observation is evidenced by the varying number of features retained across the four datasets and suggests that FT-GA adapts to each dataset rather than applying a uniform feature importance pattern, thereby reinforcing its suitability for heterogeneous credit scoring contexts.
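For reference, the five conventional selectors used in phase one correspond to standard scikit-learn utilities; the sketch below is one plausible realization, with k = 10 mirroring the filter-method subset sizes reported in Tables 5–8, while the thresholds used for the embedded L1 and RF selectors are assumptions.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif, RFE
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

def baseline_subsets(X, y, k=10):
    """Return a boolean feature-support mask per selection technique.
    Note: chi2 assumes non-negative features (e.g., after min-max scaling)."""
    subsets = {
        "Chi2": SelectKBest(chi2, k=k).fit(X, y).get_support(),
        "MI": SelectKBest(mutual_info_classif, k=k).fit(X, y).get_support(),
        "RFE": RFE(LogisticRegression(max_iter=1000),
                   n_features_to_select=k).fit(X, y).get_support(),
    }
    # Embedded selectors: non-zero L1 coefficients / above-average importances
    l1 = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y)
    subsets["L1_LogReg"] = np.abs(l1.coef_[0]) > 0
    rf = RandomForestClassifier(n_estimators=100).fit(X, y)
    subsets["RF"] = rf.feature_importances_ > rf.feature_importances_.mean()
    return subsets
```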
Table 5. Feature Subsets Determined By Different Methods for the Australian Credit Dataset.
Table 6. Feature Subsets Determined By Different Methods for the Japanese Credit Dataset.
Table 7. Feature Subsets Determined By Different Methods for the Taiwanese Credit Dataset.
Table 8. Feature Subsets Determined By Different Methods for the German Credit Dataset.
Table 9 provides the overall performance of the best-performing pairs retrieved from Table A1, Table A2, Table A3 and Table A4 in Appendix A, and Figure 2a–d visualize their AUC across all datasets. Overall, FT-GA demonstrated competitive results, especially against those given by RF, and better adaptability across different datasets. This is evidenced by the fact that the baselines required numerous feature selection–model combinations to achieve comparably close results, even when only top-performing baselines are considered. This positions FT-GA as a robust and generalizable choice for balancing feature reduction against contextual objectives, which in our case are accuracy and separability between classes. The same applies from a risk-assessment standpoint, where our proposed method achieved the highest AUC values on the relatively smaller Australian and Japanese datasets while maintaining a good balance between offering opportunities to applicants and flagging potential defaulters, as demonstrated by recall values exceeding 0.90 and 0.88 for the Australian and Japanese datasets, respectively. It is also worth noting that variation in model accuracy across datasets is expected, as performance is inherently influenced by dataset complexity, particularly differences in sample size and class imbalance. Accordingly, datasets with milder imbalance and clearer signal structure yielded higher predictive ceilings, whereas highly imbalanced or larger ones imposed lower attainable accuracy levels for all competing methods. Nonetheless, the results suggest that FT-GA performs best when minority-class representation is sufficient for the boosting component to learn discrimination boundaries, whereas traditional methods tend to overfit majority behavior. In general, this reflects a higher level of flexibility in guiding a selection method toward promising, domain-specific results, thereby aligning with practical needs.
Table 9. Overall performance between the best-performing pairs.
Figure 2. AUC for all solutions visualized by dataset: (a) Australian, (b) German, (c) Japanese, (d) Taiwanese.
Across all datasets, the AUC for FT-GA shown in Figure 2a–d consistently exhibited superior or comparable separation capability between the two classes relative to the baseline pairs, confirming its ability to minimize false-positive results. In addition, FT-GA demonstrated its capability to handle imbalanced data in smaller datasets, particularly the German dataset, where the data were moderately imbalanced. The AUC in Figure 2b serves as evidence that FT-GA provides better separation between the two classes, thereby ensuring less risky decisions. Moreover, FT-GA also delivered the best results on the German dataset in terms of accuracy, precision, and recall, with values of 0.84, 0.88, and 0.93, respectively. This indicates that FT-GA managed to distinguish between defaulters and non-defaulters while also providing fair opportunities to eligible applicants with no compromise. These findings emphasize the stability and generalization of FT-GA across small, moderately imbalanced datasets. Interestingly, FT-GA achieved higher recall without requiring aggressive resampling, a common preprocessing requirement for imbalanced datasets, implying that the separability weighting in its objective function compensates for imbalance by rewarding clearer minority-class splits.
To further validate robustness beyond single train–test performance, a 10-fold cross-validation experiment was conducted across all datasets, and the aggregated results are reported in Table 10. These results provide additional evidence of the stability of FT-GA relative to competing feature selection–model pairs. Notably, FT-GA achieved the highest AUC and KS statistics in the German and Taiwanese datasets, while maintaining competitive performance in the Australian and Japanese cases despite slightly lower mean accuracy. The relatively lower standard deviation in AUC observed for FT-GA across folds, particularly in the German dataset, supports its generalization capacity, indicating that the prioritized separability objective yields more stable class boundary learning than accuracy-based approaches. These outcomes reinforce the earlier observation that FT-GA performs more effectively where minority class representation and imbalance conditions provide meaningful discrimination cues.
Table 10. 10-fold cross-validation results across all datasets.
Despite the compelling results, all techniques fell short in handling extreme data imbalance, and it is worth noting that the true strength of FT-GA lies in handling small to medium-sized datasets with moderate imbalance ratios. Even under extreme imbalance, as in the Taiwanese dataset, the proposed solution still outperformed the baseline methods not only in AUC but also in all other metrics, including accuracy, precision, and recall, as evidenced in the German and Taiwanese results. Nevertheless, although FT-GA showed comparatively better resilience and delivered the best results across all performance metrics, the results suggest that integrating sampling techniques in the preprocessing phase could further enhance the process and mitigate the effect of larger imbalance ratios. This signifies an opportunity to extend the current method with hybrid strategies to make it more resilient across a broader range of imbalance scenarios; it also suggests that, while FT-GA remains resilient, its risk-guided objective function relies on a minimum signal from the minority class, and when imbalance becomes extreme, the boosting guidance weakens.
In addition, the comparative results show that FT-GA achieves performance that is broadly comparable to state-of-the-art solutions, despite fundamentally addressing the feature selection scope of the scoring pipeline, as shown in Table 11. In cases where FT-GA trails deep learning or boosting methods, such as gcForest and AugBoost-ELM, the accuracy gap remains below 4%, which is consistent with expectations given that these methods learn internal feature representations rather than selecting transparent subsets as in our approach. More importantly, FT-GA delivers similar or better AUC performance across multiple studies, aligning with its objective of maximizing class separability rather than maximizing raw accuracy. Unlike pipeline-based or ensemble models that incorporate sampling, stacking, multi-layer transformations, or meta-parameter tuning, FT-GA is a single-stage optimization mechanism that neither resamples nor augments representations and yet attains comparable predictive capability.
Table 11. Comparison between published results and FT-GA expressed in delta performance.
When relating FT-GA to prior studies, three performance patterns emerge. First, on the German dataset, which reflects moderate imbalance and resembles realistic credit risk distributions, FT-GA consistently outperformed many sophisticated architectures, including hybrid ensembles and deep-learning–inspired designs, across both accuracy and AUC. The only exception was gcForest, where FT-GA fell behind marginally by −1.37% in terms of AUC, a gap attributable to gcForest’s hierarchical multi-layer forest structure that learns latent representations far beyond what conventional feature selectors are intended to capture. Second, FT-GA demonstrated substantial separability advantages relative to traditional and clustering-based approaches such as K-means + SVDD, where AUC gains exceeded 19 percentage points, and remained favorable even against SVM–RF hybrids (+0.77% AUC). Conversely, FT-GA’s largest performance deficits occurred on small balanced datasets such as Australian and Japanese credit scoring, with gcForest again exhibiting the largest advantage (−3.79% AUC). These behaviors reinforce that FT-GA is better aligned with operational scenarios characterized by moderate imbalance and meaningful minority representation, whereas deep/hybrid pipelines gain modeling capacity on compact datasets through representational learning.
Third, a similar behavior was observed in accuracy comparisons, where FT-GA outperformed most published methods, often by up to 12%, though it trailed slightly behind more sophisticated boosting and deep learning architectures. This slight deficit, however, must be interpreted in light of methodological scope: FT-GA is a simple, purely evolutionary feature selector requiring no representation learning, resampling, stacked architectures, or deep composition layers. Unlike gcForest and hybrid ensembles, FT-GA performs no internal augmentation or multi-stage inference, yet its performance gaps remain consistently small, typically between 2% and 4%. This suggests that FT-GA offers a favorable accuracy–efficiency trade-off, achieving competitive risk separation while remaining structurally lightweight, interpretable, and operationally less complicated. Accordingly, integration with resampling or representation-aware components represents a promising extension to maximize its performance under extreme imbalance or small-data regimes.
From an operational perspective, deploying FT-GA within financial institutions aligns with established risk governance practices. Although its evolutionary search is computationally heavier than filter or wrapper methods, most of the incurred cost appears during model development rather than real-time scoring. Once the optimal subset is identified, the trained model operates without additional computational overhead relative to conventional models. This makes FT-GA particularly suitable for institutions that periodically retrain or validate credit models within regulatory review cycles or under changing risk-weighting criteria. In other words, organizations can benefit from periodic retraining and weight readjustment, especially when class imbalance becomes extreme or when economic dynamics necessitate revisiting risk sensitivities. Moreover, this tuning burden is not unique to FT-GA; parameter sensitivity is a well-established characteristic of evolutionary optimization methods, where tuning population size, mutation rate, and crossover strategy is typically required to achieve stable convergence. Nonetheless, FT-GA benefits from its guided separability objective, which helped the search converge within roughly 30 generations in three of the four datasets in our experiments, yielding measurable gains in class separability without the exhaustive evolutionary search required by unguided approaches.

5. Conclusions

In this work, we proposed a FastTree-Guided Genetic Algorithm (FT-GA) that optimizes class separation to enhance feature selection for credit scoring applications. Compared to the best-performing baseline pairs combining traditional feature selection and training models, FT-GA consistently achieved superior or comparable results, particularly in terms of AUC and accuracy. Notably, it scored the highest AUC on all four datasets, with an improvement of up to 11.5% over traditional approaches in the German dataset. Additionally, the proposed technique reduced the feature set by an average of 21% while preserving its superior separation between classes. These findings provide evidence that FT-GA is a competitive alternative that outperforms many established methods. It also demonstrated strong adaptability across datasets of varying size and imbalance ratio, further emphasizing its generalization. More importantly, FT-GA achieved a balanced trade-off between offering opportunities to eligible applicants and prioritizing lending risk mitigation, making it highly suitable for real-world credit risk assessment. It is worth mentioning that, while class separation performance degraded under extreme data imbalance, FT-GA's true strength lies in handling small to medium-sized datasets with moderate imbalance ratios without requiring preprocessing resampling techniques. Despite its degraded performance on highly imbalanced datasets, it remains a more resilient option than the other baselines and architectures reported in the literature. Finally, the separability-driven nature of FT-GA suggests potential applicability beyond credit scoring, particularly in other high-stakes decision domains such as insurance or healthcare, where misclassification risks are also asymmetric. Exploring such extensions forms a promising avenue for future research while allowing this study to remain centered on credit scoring evidence and validation.

Author Contributions

Conceptualization, R.B.; methodology, R.B., N.H. and Y.H.; software, R.B.; validation, N.H. and Y.H.; formal analysis, R.B.; investigation, R.B.; resources, Y.H.; data curation, R.B.; writing—original draft preparation, R.B.; writing—review and editing, N.H. and Y.H.; visualization, R.B.; supervision, N.H. and Y.H.; project administration, N.H. and Y.H.; funding acquisition, Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by Dakota State University through the funding program “Rising II for Faculty Retention” under grant number 81R203.

Data Availability Statement

The datasets used in this study are publicly available from open-access sources. The German and Australian credit datasets are available at the UCI Machine Learning Repository https://archive.ics.uci.edu/ (accessed on 14 February 2025); the Taiwanese credit dataset is available on Kaggle https://www.kaggle.com/datasets/uciml/default-of-credit-card-clients-dataset (accessed on 17 April 2025); and the Japanese credit dataset is available from prior published research studies. The source code is available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Results for the Australian Dataset.
| Method | Features | Model | Accuracy | Precision | Recall | F1 Score | AUC |
|---|---|---|---|---|---|---|---|
| Chi2 | 10 | Logistic Regression | 0.8599 | 0.8819 | 0.8889 | 0.8854 | 0.9217 |
| | | Naive Bayes | 0.8261 | 0.8125 | 0.9286 | 0.8667 | 0.9048 |
| | | Random Forest | 0.8599 | 0.8943 | 0.8730 | 0.8835 | 0.9215 |
| | | Neural Network | 0.8068 | 0.8583 | 0.8175 | 0.8374 | 0.8503 |
| Mutual Info | 10 | Logistic Regression | 0.8599 | 0.8760 | 0.8968 | 0.8863 | 0.9216 |
| | | Naive Bayes | 0.8309 | 0.8182 | 0.9286 | 0.8699 | 0.9068 |
| | | Random Forest | 0.8647 | 0.8889 | 0.8889 | 0.8889 | 0.9254 |
| | | Neural Network | 0.7440 | 0.8120 | 0.7540 | 0.7819 | 0.7858 |
| RFE | 10 | Logistic Regression | 0.8502 | 0.8800 | 0.8730 | 0.8765 | 0.9128 |
| | | Naive Bayes | 0.8647 | 0.8712 | 0.9127 | 0.8915 | 0.8935 |
| | | Random Forest | 0.8744 | 0.8846 | 0.9127 | 0.8984 | 0.9209 |
| | | Neural Network | 0.8551 | 0.8636 | 0.9048 | 0.8837 | 0.8979 |
| L1_LogReg | 14 | Logistic Regression | 0.8502 | 0.8682 | 0.8889 | 0.8784 | 0.9135 |
| | | Naive Bayes | 0.8261 | 0.8169 | 0.9206 | 0.8657 | 0.9029 |
| | | Random Forest | 0.8696 | 0.8898 | 0.8968 | 0.8933 | 0.9291 |
| | | Neural Network | 0.7440 | 0.7626 | 0.8413 | 0.8000 | 0.7403 |
| Random Forest | 7 | Logistic Regression | 0.8599 | 0.8943 | 0.8730 | 0.8835 | 0.9200 |
| | | Naive Bayes | 0.8019 | 0.7891 | 0.9206 | 0.8498 | 0.9019 |
| | | Random Forest | 0.8744 | 0.9032 | 0.8889 | 0.8960 | 0.9253 |
| | | Neural Network | 0.7585 | 0.7969 | 0.8095 | 0.8031 | 0.7965 |
| GA | 12 | Fast Tree | 0.8760 | 0.8830 | 0.9022 | 0.8925 | 0.9287 |
| | | Logistic Regression | 0.8647 | 0.9224 | 0.8492 | 0.8843 | 0.9166 |
| | | Naive Bayes | 0.8357 | 0.8286 | 0.9206 | 0.8722 | 0.9006 |
| | | Random Forest | 0.8647 | 0.9016 | 0.8730 | 0.8871 | 0.9271 |
| | | Neural Network | 0.7826 | 0.8522 | 0.7778 | 0.8133 | 0.8475 |
Table A2. Results for the German Dataset.
| Method | Features | Model | Accuracy | Precision | Recall | F1 Score | AUC |
|---|---|---|---|---|---|---|---|
| Chi2 | 10 | Logistic Regression | 0.7233 | 0.7561 | 0.8900 | 0.8176 | 0.7724 |
| | | Naive Bayes | 0.7400 | 0.7631 | 0.9091 | 0.8297 | 0.7683 |
| | | Random Forest | 0.7567 | 0.7787 | 0.9091 | 0.8389 | 0.7641 |
| | | Neural Network | 0.6967 | 0.6967 | 1.0000 | 0.8212 | 0.4788 |
| Mutual Info | 10 | Logistic Regression | 0.7267 | 0.7471 | 0.9187 | 0.8240 | 0.7773 |
| | | Naive Bayes | 0.7267 | 0.7749 | 0.8565 | 0.8136 | 0.7753 |
| | | Random Forest | 0.7367 | 0.7754 | 0.8756 | 0.8225 | 0.7761 |
| | | Neural Network | 0.6967 | 0.6967 | 1.0000 | 0.8212 | 0.4462 |
| RFE | 10 | Logistic Regression | 0.7133 | 0.7510 | 0.8804 | 0.8106 | 0.7712 |
| | | Naive Bayes | 0.6967 | 0.8554 | 0.6794 | 0.7573 | 0.7979 |
| | | Random Forest | 0.6767 | 0.7545 | 0.7943 | 0.7739 | 0.6903 |
| | | Neural Network | 0.7200 | 0.7803 | 0.8325 | 0.8056 | 0.7330 |
| L1_LogReg | 21 | Logistic Regression | 0.7300 | 0.7623 | 0.8900 | 0.8212 | 0.7573 |
| | | Naive Bayes | 0.7467 | 0.8093 | 0.8325 | 0.8208 | 0.7830 |
| | | Random Forest | 0.7600 | 0.7729 | 0.9282 | 0.8435 | 0.7697 |
| | | Neural Network | 0.5900 | 0.7945 | 0.5550 | 0.6535 | 0.6631 |
| Random Forest | 11 | Logistic Regression | 0.7133 | 0.7552 | 0.8708 | 0.8089 | 0.7530 |
| | | Naive Bayes | 0.7167 | 0.7480 | 0.8947 | 0.8148 | 0.7577 |
| | | Random Forest | 0.7600 | 0.7773 | 0.9187 | 0.8421 | 0.7662 |
| | | Neural Network | 0.7067 | 0.7037 | 1.0000 | 0.8261 | 0.7190 |
| GA | 6 | Fast Tree | 0.8336 | 0.8642 | 0.9429 | 0.9018 | 0.8306 |
| | | Logistic Regression | 0.7533 | 0.7778 | 0.9043 | 0.8363 | 0.7783 |
| | | Naive Bayes | 0.7033 | 0.8191 | 0.7368 | 0.7758 | 0.7791 |
| | | Random Forest | 0.7267 | 0.7797 | 0.8469 | 0.8119 | 0.7352 |
| | | Neural Network | 0.7400 | 0.7718 | 0.8900 | 0.8267 | 0.7812 |
Table A3. Results for the Japanese Dataset.
| Method | Features | Model | Accuracy | Precision | Recall | F1 Score | AUC |
|---|---|---|---|---|---|---|---|
| Chi2 | 10 | Logistic Regression | 0.8357 | 0.7890 | 0.8866 | 0.8350 | 0.9033 |
| | | Naive Bayes | 0.7295 | 0.7971 | 0.5670 | 0.6627 | 0.8645 |
| | | Random Forest | 0.8599 | 0.8542 | 0.8454 | 0.8497 | 0.9202 |
| | | Neural Network | 0.6667 | 0.6750 | 0.5567 | 0.6102 | 0.7034 |
| Mutual Info | 10 | Logistic Regression | 0.8406 | 0.7963 | 0.8866 | 0.8390 | 0.9030 |
| | | Naive Bayes | 0.7536 | 0.8286 | 0.5979 | 0.6946 | 0.8807 |
| | | Random Forest | 0.8261 | 0.8081 | 0.8247 | 0.8163 | 0.9146 |
| | | Neural Network | 0.6715 | 0.6986 | 0.5258 | 0.6000 | 0.7362 |
| RFE | 10 | Logistic Regression | 0.8406 | 0.7909 | 0.8969 | 0.8406 | 0.9054 |
| | | Naive Bayes | 0.8213 | 0.8409 | 0.7629 | 0.8000 | 0.8913 |
| | | Random Forest | 0.8551 | 0.8384 | 0.8557 | 0.8469 | 0.8927 |
| | | Neural Network | 0.8406 | 0.8265 | 0.8351 | 0.8308 | 0.8886 |
| L1_LogReg | 15 | Logistic Regression | 0.8309 | 0.7870 | 0.8763 | 0.8293 | 0.8973 |
| | | Naive Bayes | 0.7585 | 0.8219 | 0.6186 | 0.7059 | 0.8556 |
| | | Random Forest | 0.8744 | 0.8660 | 0.8660 | 0.8660 | 0.9237 |
| | | Neural Network | 0.5942 | 0.6066 | 0.3814 | 0.4684 | 0.5320 |
| Random Forest | 8 | Logistic Regression | 0.8261 | 0.7748 | 0.8866 | 0.8269 | 0.8902 |
| | | Naive Bayes | 0.7150 | 0.7879 | 0.5361 | 0.6380 | 0.8560 |
| | | Random Forest | 0.8551 | 0.8317 | 0.8660 | 0.8485 | 0.9065 |
| | | Neural Network | 0.5845 | 0.5545 | 0.5773 | 0.5657 | 0.5779 |
| GA | 7 | Fast Tree | 0.8554 | 0.8077 | 0.8832 | 0.8438 | 0.9223 |
| | | Logistic Regression | 0.8454 | 0.7982 | 0.8969 | 0.8447 | 0.9078 |
| | | Naive Bayes | 0.7391 | 0.8772 | 0.5155 | 0.6494 | 0.8653 |
| | | Random Forest | 0.8792 | 0.8750 | 0.8660 | 0.8705 | 0.9302 |
| | | Neural Network | 0.7246 | 0.7083 | 0.7010 | 0.7047 | 0.7951 |
Table A4. Results for the Taiwanese Dataset.
| Method | Features | Model | Accuracy | Precision | Recall | F1 Score | AUC |
|---|---|---|---|---|---|---|---|
| Chi2 | 10 | Logistic Regression | 0.7819 | 0.2000 | 0.0005 | 0.0010 | 0.6550 |
| | | Naive Bayes | 0.3590 | 0.2405 | 0.9005 | 0.3796 | 0.6289 |
| | | Random Forest | 0.7847 | 0.5129 | 0.2230 | 0.3108 | 0.6971 |
| | | Neural Network | 0.7818 | 0.0000 | 0.0000 | 0.0000 | 0.5047 |
| Mutual Info | 10 | Logistic Regression | 0.7822 | 0.0000 | 0.0000 | 0.0000 | 0.6332 |
| | | Naive Bayes | 0.4607 | 0.2611 | 0.8066 | 0.3945 | 0.6813 |
| | | Random Forest | 0.8014 | 0.5713 | 0.3536 | 0.4368 | 0.7337 |
| | | Neural Network | 0.7691 | 0.4042 | 0.1270 | 0.1933 | 0.6319 |
| RFE | 10 | Logistic Regression | 0.7822 | 0.5000 | 0.0005 | 0.0010 | 0.6606 |
| | | Naive Bayes | 0.3572 | 0.2403 | 0.9026 | 0.3795 | 0.6295 |
| | | Random Forest | 0.7873 | 0.5271 | 0.2281 | 0.3184 | 0.6985 |
| | | Neural Network | 0.6423 | 0.2598 | 0.3474 | 0.2973 | 0.5658 |
| L1_LogReg | 23 | Logistic Regression | 0.7821 | 0.0000 | 0.0000 | 0.0000 | 0.6601 |
| | | Naive Bayes | 0.3778 | 0.2437 | 0.8827 | 0.3819 | 0.6691 |
| | | Random Forest | 0.8133 | 0.6230 | 0.3617 | 0.4577 | 0.7549 |
| | | Neural Network | 0.7506 | 0.2572 | 0.0770 | 0.1186 | 0.5428 |
| Random Forest | 10 | Logistic Regression | 0.7820 | 0.0000 | 0.0000 | 0.0000 | 0.6580 |
| | | Naive Bayes | 0.4152 | 0.2487 | 0.8337 | 0.3831 | 0.6464 |
| | | Random Forest | 0.8131 | 0.6287 | 0.3464 | 0.4467 | 0.7406 |
| | | Neural Network | 0.7772 | 0.2152 | 0.0087 | 0.0167 | 0.6557 |
| GA | 11 | Fast Tree | 0.8109 | 0.6353 | 0.3527 | 0.4536 | 0.7669 |
| | | Logistic Regression | 0.7822 | 0.0000 | 0.0000 | 0.0000 | 0.6155 |
| | | Naive Bayes | 0.7948 | 0.6336 | 0.1367 | 0.2249 | 0.6784 |
| | | Random Forest | 0.8040 | 0.5826 | 0.3526 | 0.4393 | 0.7368 |
| | | Neural Network | 0.7450 | 0.3650 | 0.2311 | 0.2830 | 0.6314 |

References

  1. Adegoke, T.; Ofodile, O.; Ochuba, N.; Akinrinol, O. Evaluating the fairness of credit scoring models: A literature review on mortgage accessibility for under-reserved populations. GSC Adv. Res. Rev. 2024, 18, 189–199. [Google Scholar] [CrossRef]
  2. Zanke, P. Machine Learning Approaches for Credit Risk Assessment in Banking and Insurance. Internet Things Edge Comput. J. 2023, 3, 29–47. [Google Scholar]
  3. Gambacorta, L.; Huang, Y.; Qiu, H.; Wang, J. How do machine learning and non-traditional data affect credit scoring? New evidence from a Chinese fintech firm. J. Financ. Stab. 2024, 73, 101284. [Google Scholar] [CrossRef]
  4. Bunn, D.; Wright, G. Interaction of judgemental and statistical forecasting methods: Issues & analysis. Manag. Sci. 1991, 37, 501–518. [Google Scholar] [CrossRef]
  5. Abdou, H.A.; Pointon, J. Credit scoring, statistical techniques and evaluation criteria: A review of the literature. Intell. Syst. Account. Financ. Manag. 2011, 18, 59–88. [Google Scholar] [CrossRef]
  6. Dumitrescu, E.; Hué, S.; Hurlin, C.; Tokpavi, S. Machine Learning for Credit Scoring: Improving Logistic Regression with Non-Linear Decision-Tree Effects. Eur. J. Oper. Res. 2022, 297, 1178–1192. [Google Scholar] [CrossRef]
  7. Kanaparthi, V. Credit Risk Prediction using Ensemble Machine Learning Algorithms. In Proceedings of the 2023 International Conference on Inventive Computation Technologies (ICICT), Lalitpur, Nepal, 26–28 April 2023; pp. 41–47. [Google Scholar] [CrossRef]
  8. Mestiri, S.; Hiboun, S.M. Credit Scoring Using Machine Learning and Deep Learning-Based Models. Data Sci. Financ. Econ. 2024, 2, 236–248. [Google Scholar] [CrossRef]
  9. Biju, A.K.V.N.; Thomas, A.S.; Thasneem, J. Examining the Research Taxonomy of Artificial Intelligence, Deep Learning & Machine Learning in the Financial Sphere—A Bibliometric Analysis. Qual. Quant. 2024, 58, 849–878. [Google Scholar] [CrossRef]
  10. Chen, Y.; Calabrese, R.; Martin-Barragán, B. Interpretable Machine Learning for Imbalanced Credit Scoring Datasets. Eur. J. Oper. Res. 2024, 312, 357–372. [Google Scholar] [CrossRef]
  11. Mane, M.N.S.; Joshi, P. Role of AI based E-Wallets in Business and Financial Transactions. Int. Res. J. Humanit. Interdiscip. Stud. (IRJHIS) 2023, 77–82. Available online: https://www.researchgate.net/publication/377780543_Role_of_AI_based_E-Wallets_in_Business_and_Financial_Transactions (accessed on 22 April 2025).
  12. Challoumis, C. The Landscape of AI in Finance. In Proceedings of the XVII International Scientific Conference, London, UK, 5–6 September 2024; pp. 109–144. [Google Scholar]
  13. Cao, L.; Yang, Q.; Yu, P.S. Data Science and AI in FinTech: An Overview. Int. J. Data Sci. Anal. 2021, 12, 81–99. [Google Scholar] [CrossRef]
  14. Bozanic, Z.; Kraft, P.; Tillet, A. Qualitative Disclosure and Credit Analysts’ Soft Rating Adjustments. Accepted for publication in Accounting and Business Research. 2022. Available online: https://ssrn.com/abstract=2962491 (accessed on 22 April 2025).
  15. Muñoz-Cancino, R.; Bravo, C.; Ríos, S.A.; Graña, M. On the dynamics of credit history and social interaction features, and their impact on creditworthiness assessment performance. Expert Syst. Appl. 2023, 218, 119599. [Google Scholar] [CrossRef]
  16. Chatla, S.; Shmueli, G. Linear Probability Models (LPM) and Big Data: The Good, the Bad, and the Ugly. SSRN Working Paper. 2016. Available online: https://ssrn.com/abstract=2353841 (accessed on 5 December 2025).
  17. Xu, J.; Cheng, Y.; Wang, L.; Xu, K.; Li, Z. Credit Scoring Models Enhancement Using Support Vector Machines; ResearchGate: Berlin, Germany, 2024. [Google Scholar]
  18. Wang, C.; Han, D.; Liu, Q.; Luo, S. A deep learning approach for credit scoring of peer-to-peer lending using attention mechanism LSTM. IEEE Access 2018, 7, 2161–2168. [Google Scholar] [CrossRef]
  19. Wah, Y.B.; Ibrahim, N.; Hamid, H.A.; Abdul-Rahman, S.; Fong, S. Feature Selection Methods: Case of Filter and Wrapper Approaches for Maximising Classification Accuracy. Pertanika J. Sci. Technol. 2018, 26, 291–310. [Google Scholar]
  20. Yang, D.; Xiao, B. Feature Enhanced Ensemble Modeling with Voting Optimization for Credit Risk Assessment. IEEE Access 2024, 12, 115124–115136. [Google Scholar] [CrossRef]
  21. Zhao, Z.; Cui, T.; Ding, S.; Li, J.; Bellotti, A.G. Resampling Techniques Study on Class Imbalance Problem in Credit Risk Prediction. Mathematics 2024, 12, 701. [Google Scholar] [CrossRef]
  22. Cao, B.; Liu, Y.; Hou, C.; Fan, J.; Zheng, B.; Yin, J. Expediting the Accuracy-Improving Process of SVMs for Class Imbalance Learning. IEEE Trans. Knowl. Data Eng. 2021, 33, 3550–3567. [Google Scholar] [CrossRef]
  23. Datta, S.; Nag, S.; Das, S. Boosting with Lexicographic Programming: Addressing Class Imbalance without Cost Tuning. IEEE Trans. Knowl. Data Eng. 2020, 32, 883–897. [Google Scholar] [CrossRef]
  24. Jemai, J.; Zarrad, A. Feature Selection Engineering for Credit Risk Assessment in Retail Banking. Information 2023, 14, 200. [Google Scholar] [CrossRef]
  25. Chandrashekar, G.; Sahin, F. A Survey on Feature Selection Methods. Comput. Electr. Eng. 2014, 40, 16–28. [Google Scholar] [CrossRef]
  26. Bouaguel, W. Efficient Multi-Classifier Wrapper Feature-Selection Model: Application for Dimension Reduction in Credit Scoring. Comput. Sci. 2022, 23, 133–155. [Google Scholar] [CrossRef]
  27. Qin, C.; Zhang, Y.; Bao, F.; Zhang, C.; Liu, P.; Liu, P. XGBoost Optimized by Adaptive Particle Swarm Optimization for Credit Scoring. Math. Probl. Eng. 2021, 2021, 6655510. [Google Scholar] [CrossRef]
  28. Krishna, G.J.; Ravi, V. Feature Subset Selection Using Adaptive Differential Evolution: An Application to Banking. In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data (CoDS-COMAD), Kolkata, India, 3–5 January 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 157–163. [Google Scholar] [CrossRef]
  29. Liu, H.; Zhou, M.; Liu, Q. An Embedded Feature Selection Method for Imbalanced Data Classification. IEEE/CAA J. Autom. Sin. 2019, 6, 703–715. [Google Scholar] [CrossRef]
  30. Zhang, X.; Yang, Y.; Zhou, Z. A Novel Credit Scoring Model Based on Optimized Random Forest. In Proceedings of the 2018 IEEE 8th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 8–10 January 2018; IEEE: New York, NY, USA, 2018; pp. 60–65. [Google Scholar] [CrossRef]
  31. Shofiyah, F.; Sofro, A. Split and Conquer Method in Penalized Logistic Regression with LASSO (Application on Credit Scoring Data). J. Phys. Conf. Ser. 2018, 1108, 012107. [Google Scholar] [CrossRef]
  32. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  33. Ahmed, Z.; Amizadeh, S.; Bilenko, M.; Carr, R.; Chin, W.S.; Dekel, Y.; Dupré, X.; Eksarevskiy, V.; Erhardt, E.; Eseanu, C.; et al. Machine Learning at Microsoft with ML.NET. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2448–2458. [Google Scholar] [CrossRef]
  34. Microsoft Docs. FastTree Binary Classifier. 2022. Available online: https://learn.microsoft.com/en-us/dotnet/machine-learning/algorithms/fasttree (accessed on 3 April 2025).
  35. Shawe-Taylor, J. Classification Accuracy Based on Observed Margin. Algorithmica 1998, 22, 157–172. [Google Scholar] [CrossRef]
  36. Dua, D.; Graff, C. Statlog (German Credit Data) Dataset; UCI Machine Learning Repository: Irvine, CA, USA, 2019. [Google Scholar]
  37. Xu, S.; Ding, Y.; Wang, Y.; Luo, J. FAUC-S: Deep AUC Maximization by Focusing on Hard Samples. Neurocomputing 2024, 571, 127172. [Google Scholar] [CrossRef]
  38. Dua, D.; Graff, C. Australian Credit Approval Dataset; UCI Machine Learning Repository: Irvine, CA, USA, 2019. [Google Scholar]
  39. Yamane, T. Japanese Credit Scoring Dataset; Kaggle: San Francisco, CA, USA, 2020. [Google Scholar]
  40. Yeh, I.C. Default of Credit Card Clients Dataset; UCI Machine Learning Repository: Irvine, CA, USA, 2009. [Google Scholar] [CrossRef]
  41. Khatir, A.A.H.A.; Almustfa, A.; Bee, M. Machine Learning Models and Data-Balancing Techniques for Credit Scoring: What Is the Best Combination? Risks 2022, 10, 169. [Google Scholar] [CrossRef]
  42. Talaat, F.M.; Aljadani, A.; Badawy, M.; Elhosseini, M. Toward Interpretable Credit Scoring: Integrating Explainable Artificial Intelligence with Deep Learning for Credit Card Default Prediction. Neural Comput. Appl. 2024, 36, 4847–4865. [Google Scholar] [CrossRef]
  43. Rainio, O.; Teuho, J.; Klén, R. Evaluation Metrics and Statistical Tests for Machine Learning. Sci. Rep. 2024, 14, 6086. [Google Scholar] [CrossRef]
  44. Li, J. Area Under the ROC Curve Has the Most Consistent Evaluation for Binary Classification. PLoS ONE 2024, 19, e0316019. [Google Scholar] [CrossRef]
  45. Yao, J.R.; Chen, J.R. A New Hybrid Support Vector Machine Ensemble Classification Model for Credit Scoring. J. Inf. Technol. Res. (JITR) 2019, 12, 77–88. [Google Scholar] [CrossRef]
  46. Goh, R.Y.; Lee, L.S.; Seow, H.V.; Gopal, K. Hybrid Harmony Search–Artificial Intelligence Models in Credit Scoring. Entropy 2020, 22, 989. [Google Scholar] [CrossRef]
  47. Li, G.; Ma, H.D.; Liu, R.Y.; Shen, M.D.; Zhang, K.X. A Two-Stage Hybrid Default Discriminant Model Based on Deep Forest. Entropy 2021, 23, 582. [Google Scholar] [CrossRef]
  48. Zhang, W.; Yang, D.; Zhang, S. A new hybrid ensemble model with voting-based outlier detection and balanced sampling for credit scoring. Expert Syst. Appl. 2021, 174, 114744. [Google Scholar] [CrossRef]
  49. Bao, W.; Ning, L.; Yue, K. Integration of unsupervised and supervised machine learning algorithms for credit risk assessment. Expert Syst. Appl. 2019, 128, 301–315. [Google Scholar] [CrossRef]
  50. Yuan, K.; Chi, G.; Zhou, Y.; Yin, H. A novel two-stage hybrid default prediction model with k-means clustering and support vector domain description. Res. Int. Bus. Financ. 2022, 59, 101536. [Google Scholar] [CrossRef]
  51. Jin, Y.; Liu, Y.; Zhang, W.; Zhang, S.; Lou, Y. A novel multi-stage ensemble model with multiple k-means-based selective undersampling: An application in credit scoring. J. Intell. Fuzzy Syst. 2021, 40, 9471–9484. [Google Scholar] [CrossRef]
  52. Jiao, W.; Hao, X.; Qin, C. The image classification method with CNN–XGBoost model based on adaptive particle swarm optimization. Information 2021, 12, 156. [Google Scholar] [CrossRef]
  53. Rofik, R.; Aulia, R.; Musaadah, K.; Ardyani, S.S.F.; Hakim, A.A. The optimization of credit scoring model using stacking ensemble learning and oversampling techniques. J. Inf. Syst. Explor. Res. 2024, 2, 11–20. [Google Scholar] [CrossRef]
  54. Liu, W.; Fan, H.; Xia, M. Multi-grained and multi-layered gradient boosting decision tree for credit scoring. Appl. Intell. 2021, 51, 10643–10661. [Google Scholar] [CrossRef]
  55. Zou, Y.; Gao, C. Extreme learning machine enhanced gradient boosting for credit scoring. Algorithms 2022, 15, 149. [Google Scholar] [CrossRef]
  56. Yotsawat, W.; Wattuya, P.; Srivihok, A. Improved credit scoring model using XGBoost with Bayesian hyper-parameter optimization. Int. J. Electr. Comput. Eng. 2021, 11, 5477–5487. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
