1. Introduction
Credit scoring is a cornerstone of modern financial infrastructure, and the evolution of its methodologies is instrumental in fostering sustainable and robust socioeconomic development. The proliferation of big data technologies has catalyzed significant advances in data-driven credit assessment, prompting widespread adoption of analytical systems grounded in large-scale data processing. To mitigate potential losses from borrower defaults, financial institutions rigorously evaluate applicant creditworthiness, while competitive market dynamics compel them to refine their discrimination between high-quality and high-risk clients. Consequently, credit scoring has emerged as a critical research domain, garnering substantial attention from both academic communities and financial practitioners due to its practical importance in risk management and its theoretical implications [1,2,3].
Blockchain technology, as a disruptive innovation, offers transformative potential for credit supervision and financial data security. Its inherent characteristics—decentralization, transparency, and immutability—substantially enhance the reliability and traceability of credit-related data flows, thereby introducing novel paradigms for regulatory oversight and risk control in credit finance [4,5].
Credit risk modeling encompasses a range of methodologies focused on estimating key parameters such as probability of default (PD), exposure at default (EAD), and loss given default (LGD). Among these, PD estimation has emerged as a predominant research stream due to its foundational role in credit assessment [6,7]. The development of PD models typically employs binary or multi-class classification techniques. Contemporary research emphasizes the construction of ensemble models to enhance predictive accuracy. However, such efforts frequently rely on a limited set of benchmark datasets—such as the German, Australian, and Japanese credit datasets from the UCI repository—due to the general scarcity of accessible credit data. Consequently, model evaluations conducted on these small-scale datasets may yield statistically inconclusive results.
Several persistent challenges impede advancement in credit scoring research. A primary issue is data scarcity coupled with limited model interpretability. The proprietary nature of financial data restricts most research to a few public datasets, which are insufficient for training robust, generalizable models. Augmenting these limited resources through advanced data utilization techniques presents a potential pathway forward. Furthermore, while machine and deep learning models improve accuracy, their inherent black-box nature obstructs transparency, making it difficult for financial institutions to interpret and trust model outputs. Enhancing explainability without sacrificing performance remains a significant hurdle. Additionally, the expertise required to design, tune, and deploy complex models creates a high barrier to adoption, particularly for resource-constrained entities. Finally, current studies often prioritize isolated model performance over systemic integration, neglecting the need for end-to-end automation in operational credit scoring systems. Addressing these gaps—data availability, interpretability, accessibility, and integration—is critical for developing next-generation credit scoring solutions.
In response to the aforementioned challenges, this paper introduces a comprehensive framework designed to enhance the scalability, automation, and trustworthiness of credit scoring systems. First, a data augmentation algorithm is proposed to synthetically expand limited credit datasets while preserving underlying data distributions, thereby mitigating the issue of sample scarcity. Second, an automated machine learning (AutoML) pipeline is employed to integrate the end-to-end credit scoring workflow—from data ingestion and feature processing to model selection and hyperparameter optimization—significantly reducing manual intervention and improving operational efficiency. Furthermore, the integration of a blockchain-based auditing layer ensures full transparency and immutability throughout the scoring process. The principal contributions of this work are summarized as follows:
A distribution-aware data augmentation algorithm for credit scoring. We propose a generative method that expands limited credit datasets while preserving underlying statistical characteristics. By adaptively estimating sampling weights from local data structures, the approach synthesizes credible samples without introducing distribution shift, facilitating robust model development under data scarcity.
VeriCred: An automated and verifiable credit scoring system. We design an integrated framework combining an AutoML pipeline with blockchain-based auditing. The system automates the end-to-end workflow from data ingestion to model deployment, while immutably recording critical artifacts—such as feature sets and performance metrics—on a distributed ledger, establishing tamper-proof transparency and auditability.
An interpretable feature attribution mechanism. We advance an enhanced model-agnostic explanation technique that quantifies the contribution and relational importance of input variables to prediction outcomes. This method improves the transparency of credit scoring decisions without compromising performance, as validated through extensive experiments.
The subsequent sections of this paper are structured as follows. In Section 2, we review current research on credit scoring and the methods that have been adopted, and summarize the existing approaches. In Section 3, we describe in detail our proposed automated machine learning based credit scoring method, VeriCred. In Section 4, we present the experimental data and the configuration of the experimental settings. In Section 5, we report and discuss the experimental results. In Section 6, we give an overview of the application prospects of the model. In Section 7, we summarize the paper and present our plans for future study.
3. The Approach
3.1. VeriCred: An Integrated Framework for Supervisable Automatic Credit Scoring
Conventional credit scoring methodologies typically involve a sequential workflow comprising data preprocessing, feature engineering, model construction, and performance evaluation. Financial institutions and banking systems often execute these stages in a disjointed manner, requiring repeated manual intervention as data propagate through each phase, ultimately yielding an assessment of an applicant’s creditworthiness. A critical challenge in contemporary credit scoring lies in the seamless integration of these discrete processing stages and the substantial reduction of human involvement to enhance operational efficiency and reproducibility.
To address these limitations, we introduce VeriCred, an integrated credit scoring framework that leverages an automated machine learning pipeline to unify the entire workflow into a cohesive end-to-end process. Within this pipeline, raw credit data undergoes a continuous and automated sequence of operations: initial preprocessing, followed by feature extraction and selection, and onward to model construction via our novel neural architecture search mechanism (C-NAS) coupled with the specialized A-Triplet loss objective. The workflow further incorporates automated hyperparameter optimization and final model evaluation, forming a closed-loop system that significantly diminishes the need for manual model design and parameter tuning. By encapsulating these traditionally segregated stages into a unified pipeline, VeriCred delivers a fully automated, process-driven, and highly efficient credit scoring solution. The overall architecture and data flow of the proposed framework are illustrated in
Figure 1 and
Figure 2.
3.1.1. Data Augmentation for Balancing the Credit Datasets
Credit scoring datasets are often characterized by significant class imbalance, where critical minority classes (e.g., defaulters) are substantially underrepresented compared to the majority class (e.g., non-defaulters). This skewed distribution can severely bias predictive models, leading to poor generalization and reduced sensitivity towards the minority class of interest [25,26,27,28,29]. To address this fundamental challenge, we employ a data augmentation technique specifically designed to generate synthetic samples for the minority class, thereby promoting a more balanced dataset and enhancing model performance.
The core of our method lies in the synthetic generation of new minority-class instances by leveraging the local feature space structure of existing minority samples. This is achieved by identifying the nearest neighbors of each minority instance and interpolating new data points along the feature vectors that connect them, effectively enlarging the decision region around the minority class without distorting the underlying data manifold.
The algorithm proceeds as follows: for every sample in the minority class, its k nearest neighbors within the same class are identified. For each selected neighbor, a new synthetic sample is generated by interpolating between the original sample and that neighbor. A minor random perturbation is introduced to increase the diversity of the generated samples. This process is repeated G times (the augmentation multiplier) for each original instance, effectively increasing the representation of the minority class. The resulting augmented dataset contains plausible, synthetic minority-class samples that help balance the class distribution, thereby facilitating the training of a more robust and accurate credit scoring model.
In Algorithm 1, Neib-x[i] and Neib-y[i] denote the data in the neighborhood of a sample point, and the synthesized points are the new data generated by the augmentation. The two interpolation weights control how far the expanded data move along the x and y dimensions, respectively; their values are determined in VeriCred by the neural architecture search. In our method, we first apply data augmentation as a preprocessing step. The main reason is that our credit data distributions are relatively dense and comparatively smooth, so the method can effectively suppress obvious outliers, and for categories with only a small amount of credit data it generates additional samples, thus balancing the credit dataset. The two weight parameters of the data augmentation are therefore determined automatically during the neural architecture search.
| Algorithm 1: Neighborhood-Based Data Augmentation for Imbalanced Credit Data. |
Input: minority-class dataset D_min; number of nearest neighbors k; augmentation multiplier G; distance metric dist (e.g., Euclidean distance)
1: Initialize an empty augmented dataset D_aug ← ∅
2: for each instance x_i in D_min do
3:   Find the set N_i of the k nearest neighbors of x_i within D_min using the distance metric dist
4:   for g = 1 to G do
5:     Randomly select a neighbor x_j from the set N_i
6:     Compute the feature difference vector: Δ ← x_j − x_i
7:     Generate random interpolation factors: λ_x, λ_y ∼ U(0, 1)
8:     Synthesize a new feature vector: x_new ← x_i + (λ_x, λ_y) ⊙ Δ
9:     Apply a small random perturbation: x_new ← x_new + ε, where ε ∼ N(0, σ²I) and σ is a small standard deviation
10:    Assign the minority class label to the new sample
11:    Add the new synthetic sample to the augmented dataset: D_aug ← D_aug ∪ {x_new}
12:  end for
13: end for
Output: augmented minority dataset D_aug
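For concreteness, the following Python sketch mirrors the interpolation scheme of Algorithm 1 using scikit-learn's nearest-neighbor search; the parameter values and uniform interpolation weights are illustrative, whereas in VeriCred the weights are set by the neural architecture search.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def augment_minority(X_min, k=5, G=2, sigma=0.01, seed=0):
    """Generate synthetic minority-class samples by interpolating
    between each instance and one of its k nearest neighbors."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = nn.kneighbors(X_min)          # idx[:, 0] is the point itself
    synthetic = []
    for i, x in enumerate(X_min):
        for _ in range(G):                  # G synthetic samples per instance
            j = rng.choice(idx[i, 1:])      # random neighbor (exclude self)
            diff = X_min[j] - x             # feature difference vector
            lam = rng.uniform(0.0, 1.0)     # interpolation factor
            x_new = x + lam * diff          # interpolate along the segment
            x_new += rng.normal(0.0, sigma, size=x.shape)  # small perturbation
            synthetic.append(x_new)
    return np.vstack(synthetic)

# Example: expand a toy minority class of 20 samples with 8 features
X_min = np.random.rand(20, 8)
X_aug = augment_minority(X_min, k=5, G=3)
print(X_aug.shape)  # (60, 8)
```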
3.1.2. Blockchain-Based Storage of Credit Data
To ensure the integrity, traceability, and non-repudiation of credit data, we leverage blockchain technology as a decentralized and tamper-evident storage layer. Traditional centralized storage systems are vulnerable to single points of failure and unauthorized manipulation, which pose significant risks for critical financial data. By cryptographically anchoring credit records onto a distributed ledger, we establish an immutable audit trail that enhances trust and accountability among all participating entities. This approach not only secures the data against post-hoc alterations but also provides a transparent mechanism for verifying the provenance and timeline of any credit assessment.
The core of our method involves generating a unique cryptographic hash for each credit data record and submitting this hash as a transaction to the blockchain network. A consensus mechanism validates and batches these transactions into blocks, which are then linked chronologically to form a persistent chain. Any attempt to modify the original data will result in a completely different hash, which would not match the one stored on the blockchain, thereby immediately revealing the tampering attempt. This process creates a trustworthy foundation for credit data sharing and auditing.
Algorithm 2 delineates the two core procedures: data anchoring and verification. The AnchorData function is responsible for committing the hash of the credit data to the blockchain, while the VerifyData function allows any party to confirm the data’s integrity at a later time by comparing the computed hash with the one immutably stored on the chain. This mechanism provides a robust solution for secure credit data storage and trustworthy data sharing in a distributed environment. The decentralized nature of blockchain ensures that credit data stored on the chain is secure, traceable, and auditable. This creates a trusted environment where information integrity and transparency are inherently maintained [30,31,32,33,34]. Figure 3 presents the blockchain data storage structure.
| Algorithm 2: Blockchain-based Credit Data Anchoring and Verification. |
1: Input: credit data record D, cryptographic hash function H, blockchain network B
2: Output: transaction receipt R, isValid
3: procedure AnchorData(D)
4:   h ← H(D)
5:   Construct a transaction Tx containing h
6:   Sign Tx with the data owner's private key
7:   Broadcast Tx to B
8:   Wait for consensus confirmation
9:   return R ▹ contains the block hash and transaction index
10: end procedure
11: function VerifyData(D, R)
12:   Retrieve the anchored hash h_chain from B using R
13:   h′ ← H(D)
14:   if h′ = h_chain then
15:     return True ▹ data intact
16:   else
17:     return False ▹ data tampered
18:   end if
19: end function
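A minimal Python sketch of the anchoring and verification logic of Algorithm 2, using SHA-256 and an in-memory dictionary as a stand-in for the distributed ledger; transaction signing, broadcasting, and consensus are deliberately abstracted away.

```python
import hashlib
import json
import time

ledger = {}  # stand-in for the blockchain: receipt id -> anchored hash

def anchor_data(record: dict) -> str:
    """AnchorData: hash the credit record and commit the hash to the ledger."""
    payload = json.dumps(record, sort_keys=True).encode()
    digest = hashlib.sha256(payload).hexdigest()
    receipt = hashlib.sha256(f"{digest}{time.time()}".encode()).hexdigest()[:16]
    ledger[receipt] = digest          # a real system would sign, broadcast, reach consensus
    return receipt

def verify_data(record: dict, receipt: str) -> bool:
    """VerifyData: recompute the hash and compare it with the anchored one."""
    payload = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest() == ledger.get(receipt)

r = anchor_data({"applicant_id": 42, "score": 0.87})
print(verify_data({"applicant_id": 42, "score": 0.87}, r))  # True
print(verify_data({"applicant_id": 42, "score": 0.99}, r))  # False (tampered)
```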
3.1.3. Automated Hyperparameter Optimization
The configuration of hyperparameters constitutes a pivotal factor influencing the predictive accuracy and generalization capability of machine learning models. In the context of credit scoring, where model performance directly impacts risk assessment outcomes, systematic hyperparameter tuning is indispensable. The optimization search space typically comprises both continuous (e.g., learning rates) and discrete parameters (e.g., number of layers, activation functions), necessitating robust optimization strategies. Established techniques for this task include Grid Search, Evolutionary Algorithms, and Bayesian Optimization, each exhibiting distinct trade-offs between computational efficiency and solution quality.
Our methodology employs Bayesian Optimization as the primary hyperparameter tuning strategy, motivated by its sample efficiency in handling computationally expensive black-box functions—such as model training procedures that require substantial time and resources. Unlike Grid Search, which performs an exhaustive exploration over a predefined parameter grid and suffers from exponential complexity in high-dimensional spaces, Bayesian Optimization constructs a probabilistic surrogate model (typically a Gaussian process) to guide the search. This approach incorporates prior evaluation results to inform subsequent parameter selections, thereby converging to optimal configurations with fewer iterations. For our credit scoring task, the optimization process executes 100 iterations, with each iteration proposing a candidate hyperparameter set evaluated via a 10-fold cross-validation scheme; the selected configuration is then assessed on a held-out test dataset. This procedure ensures that the chosen model configuration maximizes generalization performance while mitigating overfitting.
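As an illustration of this tuning loop, the sketch below uses scikit-optimize's Gaussian-process-based gp_minimize together with 10-fold cross-validation; the classifier, search space, and synthetic data are placeholders rather than the exact VeriCred configuration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from skopt import gp_minimize
from skopt.space import Integer, Real

X, y = np.random.rand(500, 10), np.random.randint(0, 2, 500)  # placeholder credit data

space = [
    Real(1e-3, 0.3, prior="log-uniform", name="learning_rate"),
    Integer(2, 8, name="max_depth"),
    Integer(50, 400, name="n_estimators"),
]

def objective(params):
    lr, depth, n_est = params
    model = GradientBoostingClassifier(
        learning_rate=lr, max_depth=depth, n_estimators=n_est, random_state=0)
    # 10-fold cross-validated AUC; gp_minimize minimizes, so negate the score
    return -cross_val_score(model, X, y, cv=10, scoring="roc_auc").mean()

result = gp_minimize(objective, space, n_calls=100, random_state=0)
print("best hyperparameters:", result.x, "best CV AUC:", -result.fun)
```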
3.2. A-Triplet Loss for Discriminative Credit Assessment
To enhance the discriminative power of credit scoring models, we introduce an adaptive triplet loss function, termed A-Triplet loss, inspired by representation learning principles prevalent in natural language processing—particularly the word2vec methodology [35,36,37,38,39]. Whereas conventional loss functions often focus on binary separation between classes, the proposed A-Triplet loss operates on relative similarity comparisons within triplets of samples, enabling finer-grained feature learning.
The A-Triplet loss mechanism automatically structures input samples into triplets of the form (x_a, x_p, x_n), where x_a denotes an anchor instance, x_p a positive sample (similar in credit characteristics to the anchor), and x_n a negative sample (dissimilar to the anchor). The objective function encourages the model to learn an embedding space in which the distance between x_a and x_p is minimized relative to the distance between x_a and x_n, with an enforced margin. This relative comparison enables the model to capture subtle feature variations that distinguish similar but distinct credit profiles, as visualized in Figure 4.
By emphasizing nuanced differences among credit applicants, the A-Triplet loss facilitates detailed discrimination that is often obscured in conventional classification paradigms. This capability is particularly valuable in credit assessment, where applicants may exhibit marginal differences in risk profiles that nonetheless warrant distinct treatment. The integration of A-Triplet loss within the C-NAS architecture (illustrated in
Figure 5) thus contributes to improved classification performance by enhancing the model’s ability to resolve fine-grained feature details, ultimately supporting more accurate and explainable credit decisions.
The A-Triplet loss framework aims to learn discriminative feature representations for credit assessment by enforcing relative similarity constraints, without requiring explicit supervision on inter-class relationships. This approach operates on the principle that samples from the same credit class should exhibit greater feature affinity compared to samples from different classes.
Consider a credit dataset D containing multiple classes of credit applicants. For a randomly selected anchor sample x_a from a reference class, the learning objective ensures that its embedding proximity to a positive sample x_p (from the same class) exceeds its proximity to a negative sample x_n (from a different class) by a specified margin. The triplet selection mechanism, detailed in Algorithm 3, employs an adaptive sampling strategy that dynamically adjusts to the inherent class distribution of the credit data.
| Algorithm 3: Adaptive Triplet Sampling for A-Triplet Loss. |
1: Input: credit dataset D with C classes; segment length parameter l; distance threshold d; maximum iterations K; margin parameter α
2: Output: triplet set T
3: Initialize an empty triplet set T ← ∅
4: Partition D into class-specific subsets D_1, …, D_C
5: Compute the initial class centroids μ_1, …, μ_C
6: for t = 1 to K do
7:   Randomly select a reference class c
8:   Sample an anchor x_a from D_c with length l
9:   Sample a positive x_p from D_c with length l
10:  Initialize the negative candidate pool P ← ∅
11:  for each class c′ ≠ c do
12:    if dist(μ_c, μ_{c′}) ≥ d then
13:      P ← P ∪ D_{c′}
14:    end if
15:  end for
16:  if P = ∅ then
17:    Relax the threshold d and recompute P with the new threshold
18:  end if
19:  Sample a negative x_n from P with length l
20:  T ← T ∪ {(x_a, x_p, x_n)}
21:  Update the centroid μ_c of the selected class c with the newly sampled data
22: end for
23: return T
Drawing inspiration from word embedding techniques such as word2vec, the triplet formulation establishes an analogy in which x_a corresponds to a contextual reference, x_p to a semantically related instance, and x_n to a dissimilar instance. This relational learning paradigm enables the model to capture subtle distinctions between credit profiles that share similar characteristics but belong to different risk categories.
The formal objective function for the A-Triplet loss is defined as:

$$\mathcal{L}_{\text{A-Triplet}} = -\log\sigma\big(f_{\theta}(x_a)^{\top} f_{\theta}(x_p)\big) \;-\; \sum_{k=1}^{K} \log\sigma\big(-f_{\theta}(x_a)^{\top} f_{\theta}(x_n^{(k)})\big),$$

where σ(·) denotes the sigmoid activation function, f_θ represents the feature embedding network parameterized by θ, and K is the number of negative samples per anchor-positive pair.
This loss function simultaneously maximizes the similarity between anchor-positive pairs while minimizing the similarity between anchor-negative pairs. The training procedure involves iterative sampling of triplets from the credit dataset, followed by gradient-based optimization of the embedding parameters θ. Algorithm 3 outlines the triplet selection strategy that ensures diverse and informative training examples, thereby enhancing the discriminative capability of the learned credit scoring model.
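A minimal PyTorch sketch of a sigmoid-based triplet objective with K negative samples per anchor-positive pair, consistent with the formulation above; the embedding network and batch shapes are placeholders, and the production A-Triplet implementation may differ in detail.

```python
import torch
import torch.nn.functional as F

def a_triplet_loss(anchor, positive, negatives):
    """anchor, positive: (B, D) embeddings; negatives: (B, K, D) embeddings.
    Maximizes anchor-positive similarity and minimizes anchor-negative
    similarity through a logistic (sigmoid) objective."""
    pos_sim = (anchor * positive).sum(dim=-1)                 # (B,)
    neg_sim = torch.einsum("bd,bkd->bk", anchor, negatives)   # (B, K)
    loss_pos = -F.logsigmoid(pos_sim)                         # pull positives closer
    loss_neg = -F.logsigmoid(-neg_sim).sum(dim=-1)            # push K negatives away
    return (loss_pos + loss_neg).mean()

# Example with a toy embedding network
emb = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 8))
x_a, x_p = torch.randn(4, 16), torch.randn(4, 16)
x_n = torch.randn(4, 5, 16)                                   # K = 5 negatives per pair
loss = a_triplet_loss(emb(x_a), emb(x_p), emb(x_n.view(-1, 16)).view(4, 5, 8))
loss.backward()
print(float(loss))
```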
3.3. Automated Neural Architecture Search for Credit Modeling
Neural Architecture Search (NAS) represents a paradigm shift in automated machine learning, enabling the systematic discovery of high-performance neural network architectures through algorithmic exploration of design spaces [40,41,42,43,44]. This methodology eliminates the need for manual architecture engineering by formulating network design as an optimization problem, where candidate architectures are evaluated based on their performance on target tasks. In credit risk assessment, NAS offers the potential to automatically construct tailored model architectures that capture complex financial patterns while maintaining computational efficiency.
Figure 6 illustrates the three fundamental components of our proposed Credit-efficient Neural Architecture Search (C-NAS) framework: (1) a structured search space encompassing credit-specific architectural motifs, (2) a resource-aware search strategy that prioritizes computationally efficient candidates, and (3) a performance estimation mechanism that balances accuracy and model complexity.
3.3.1. Cell-Based Search Space Formulation
To address the computational challenges of direct NAS application in credit scoring, we introduce C-NAS, a resource-constrained architecture search methodology. The framework employs a cell-based search space where credit models are constructed by composing two types of computational cells: normal cells that preserve feature dimensionality and reduction cells that perform downsampling operations.
Each cell is formulated as a directed acyclic graph (DAG) comprising M computational nodes {x_1, …, x_M}. The graph structure is defined by connections between nodes, where each node x_m aggregates transformed outputs from its predecessor nodes x_i (i < m) through a weighted summation:

$$x_m = \sum_{i < m} o^{(i,m)}(x_i).$$

Here, o^{(i,m)} denotes a mathematical operation selected from a predefined search space O, which includes credit-specific transformations such as fully-connected layers, attention mechanisms, and temporal convolutions.
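The cell formulation can be sketched as follows in PyTorch; the node count and the operation set (linear layer versus identity) are illustrative simplifications of the actual C-NAS search space.

```python
import torch
import torch.nn as nn

class Cell(nn.Module):
    """Toy DAG cell: node m aggregates transformed outputs of nodes i < m."""
    def __init__(self, dim, num_nodes=4, ops=("linear", "identity")):
        super().__init__()
        self.num_nodes = num_nodes
        # one chosen operation o^(i,m) per edge (i, m); here picked round-robin
        self.edges = nn.ModuleDict()
        for m in range(1, num_nodes):
            for i in range(m):
                op = ops[(i + m) % len(ops)]
                self.edges[f"{i}_{m}"] = (
                    nn.Linear(dim, dim) if op == "linear" else nn.Identity())

    def forward(self, x0):
        nodes = [x0]
        for m in range(1, self.num_nodes):
            # summation over the transformed outputs of all predecessors
            nodes.append(sum(self.edges[f"{i}_{m}"](nodes[i]) for i in range(m)))
        return nodes[-1]

cell = Cell(dim=16)
out = cell(torch.randn(8, 16))
print(out.shape)  # torch.Size([8, 16])
```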
3.3.2. Resource-Constrained Search Strategy
A key innovation of C-NAS is the integration of computational cost directly into the architecture selection process. For each architectural hyperparameter λ with candidate values {v_1, …, v_n}, we define a cost-aware categorical distribution:

$$P(\lambda = v_j) = \frac{\exp\big(-\gamma\, c(v_j)\big)}{\sum_{j'=1}^{n} \exp\big(-\gamma\, c(v_{j'})\big)},$$

where c(v_j) quantifies the computational complexity (in FLOPs) associated with hyperparameter value v_j, and γ controls the cost sensitivity.
The search process employs an iterative pruning strategy that progressively eliminates less influential hyperparameters. Initially, we sample architectural configurations from the full search space and evaluate their performance-time trade-offs. A random forest meta-model is then trained on these samples to estimate hyperparameter importance scores:

$$I(\lambda) = \sum_{m \in \mathcal{M}_{\lambda}} \frac{|S_m|}{|S|}\left( V(S_m) - \frac{|S_m^{L}|}{|S_m|} V(S_m^{L}) - \frac{|S_m^{R}|}{|S_m|} V(S_m^{R}) \right),$$

where M_λ is the set of tree nodes that split on λ, S_m represents the data subset reaching node m, and V denotes the variance function:

$$V(S) = \frac{1}{|S|}\sum_{(x, y) \in S} \big(y - \bar{y}_S\big)^2.$$
Hyperparameters with minimal importance scores are pruned by fixing them to their most computationally efficient values. This iterative refinement continues until the search space contains only critical architectural parameters, significantly accelerating the discovery of optimal credit scoring models.
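The cost-aware sampling and importance-based pruning loop can be sketched as follows; the hyperparameter space, FLOP costs, and the placeholder evaluation function are hypothetical, while the softmax over negative cost and the random-forest importance estimate follow the description above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Hypothetical search space: candidate values and relative FLOP costs per knob
space = {
    "width": {"values": [32, 64, 128, 256], "flops": [1, 2, 4, 8]},
    "depth": {"values": [2, 4, 8],          "flops": [1, 2, 4]},
    "act":   {"values": ["relu", "gelu"],   "flops": [1, 1]},
}
gamma = 0.3  # cost-sensitivity parameter

def sample_config():
    """Sample each hyperparameter from a softmax over its negative FLOP cost."""
    cfg = {}
    for name, spec in space.items():
        logits = -gamma * np.asarray(spec["flops"], dtype=float)
        probs = np.exp(logits) / np.exp(logits).sum()
        cfg[name] = rng.choice(len(spec["values"]), p=probs)  # store the chosen index
    return cfg

def evaluate(cfg):
    """Placeholder performance estimate (a real system trains and validates)."""
    return rng.random() + 0.01 * cfg["width"]

# Sample configurations, fit a random forest, and prune the least important knob
samples = [sample_config() for _ in range(50)]
X = np.array([[c[k] for k in space] for c in samples])
y = np.array([evaluate(c) for c in samples])
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
least_important = list(space)[int(np.argmin(rf.feature_importances_))]
print("prune:", least_important)  # fix it to its cheapest value and iterate
```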
Algorithm 4 formalizes the complete C-NAS workflow. The method systematically reduces architectural complexity while preserving model expressivity, resulting in credit scoring models that achieve an optimal balance between predictive accuracy and computational efficiency. This approach enables financial institutions to deploy sophisticated neural models without prohibitive computational requirements.
| Algorithm 4: Credit-Efficient Neural Architecture Search (C-NAS). |
1: Input: credit dataset D, search space S, performance metric M, computational budget B
2: Output: optimized architecture A*
3: Initialize the evaluated population P ← ∅ and the iteration counter t ← 0
4: while more than one candidate remains in S and the consumed budget does not exceed B do
5:   Sample K architectures {A_1, …, A_K} from S using Equation (2)
6:   Evaluate M(A_k) on the validation set for each A_k
7:   P ← P ∪ {(A_k, M(A_k)) : k = 1, …, K}
8:   Train a random forest on P to estimate hyperparameter importance
9:   Identify the least important hyperparameter
10:  Prune it by fixing it to its most computationally efficient value
11:  Update the search space S accordingly
12:  t ← t + 1
13: end while
14: return the best architecture A* found in P
The proposed credit model architecture search framework implements an iterative refinement strategy that progressively optimizes both architectural hyperparameters and model configurations. As detailed in Algorithm 5, the methodology operates through three synergistic phases. First, the algorithm initiates with a comprehensive hyperparameter space, which undergoes systematic pruning based on sensitivity analysis. Parameters demonstrating minimal impact on model performance (as quantified by an importance metric falling below a preset threshold) are eliminated, thereby focusing computational resources on the most influential architectural decisions. This pruning strategy effectively reduces the search space dimensionality while preserving model expressivity. During the architecture exploration phase, the algorithm generates diverse model configurations by sampling from the refined parameter space. Each candidate architecture undergoes training for E epochs, with performance evaluated on a held-out validation set. The population-based approach maintains multiple model ensembles corresponding to different training durations, enabling comprehensive performance trajectory analysis. The convergence detection mechanism terminates the search process when marginal performance gains fall below a predefined threshold for K consecutive iterations, ensuring computational efficiency. The final selection stage retrieves the top-J performing models from a historical archive that comprehensively documents all evaluated architectures and their performance characteristics.
| Algorithm 5: Credit Model Architecture Search with Iterative Pruning. |
1: Input: credit dataset D, maximum epochs E_max
2: Initial hyperparameter set Λ_0, performance threshold ε
3: Selection size J (used for comparative analysis)
4: Output: optimized credit scoring model(s)
5: Initialize the model populations P_1, …, P_{E_max} ← ∅ ▹ P_k: ensemble of models trained for k epochs
6: Initialize the history archive H ← ∅
7: Initialize the active hyperparameter set Λ ← Λ_0
8: Define the training function TrainModel(a, e_start, e_end, D): train model a from epoch e_start to epoch e_end on D and return its accuracy on the validation set
9: while the search has not converged do
10:  Step 1: Hyperparameter Space Pruning
11:    Evaluate hyperparameter importance via sensitivity analysis
12:    Remove low-impact parameters from Λ
13:    Extract the triplet loss parameters for credit assessment
14:    Initialize the data augmentation parameters (the interpolation weights of Algorithm 1)
15:  Step 2: Architecture Generation and Evaluation
16:    for i = 1 to K do
17:      Generate a random architecture: a_i ← RandomArchitecture(Λ)
18:      Train the model: acc_i ← TrainModel(a_i, 0, E, D) for a training duration E ≤ E_max
19:      Update the populations with (a_i, acc_i)
20:      Archive the model: H ← H ∪ {(a_i, acc_i)}
21:    end for
22:  Step 3: Convergence Check
23:    if the performance improvement stays below ε for K consecutive iterations then
24:      break
25:    end if
26: end while
27: Model Selection:
28:   Rank the archived models in H by validation performance
29:   Select the top-J models
30: return the selected models
3.4. Blockchain-Based Credit Supervision
The integration of blockchain technology into credit management systems fundamentally enhances transparency, immutability, and traceability. Traditional credit systems often suffer from data silos and a lack of audit trails, making it difficult to verify the provenance of a credit assessment or to detect subtle manipulation. Blockchain addresses these issues by providing a decentralized, tamper-evident ledger where each transaction or data point is cryptographically linked to the previous one, creating an immutable chain of records. In the context of credit supervision, this translates to an unforgeable history of all credit-related events, from data submission to model inference, enabling robust regulatory oversight and fostering trust among participants.
The core theoretical underpinning lies in the combination of cryptographic hashing and decentralized consensus. Each credit data transaction is hashed into a fixed-length string. Any alteration to the original data will produce a completely different hash, immediately signaling tampering. These hashes are then batched into blocks. A consensus mechanism (e.g., Practical Byzantine Fault Tolerance, pBFT, suitable for permissioned blockchains) ensures that all participating nodes in the network agree on the validity and order of these blocks before they are added to the chain. This process ensures that no single entity can control the ledger, making the system resilient to fraud and manipulation.
The following algorithm details the process of anchoring credit-related data onto the blockchain and subsequently tracking its lifecycle. This algorithm is designed to be executed by a client application interacting with the blockchain network.
This algorithm formalizes the operational workflow for achieving cryptographically verifiable credit supervision. Its core logic transcends mere data storage, establishing a paradigm where trust is engineered through decentralized consensus rather than reliance on a central authority. The process begins with the generation of a cryptographic hash h = H(d) for the credit data record d. This step is fundamental, as the hash function H acts as a deterministic one-way compression mechanism. It irreversibly maps the arbitrary-length data d to a unique, fixed-length fingerprint h. Any modification to d, however minor, will result in a vastly different h, thereby providing a sensitive and robust mechanism for detecting tampering. It is this hash—a compact representation of the data’s state—that is anchored on the blockchain, not necessarily the sensitive raw data itself. This approach effectively balances the need for data privacy with the imperative of public verifiability. Algorithm 6 presents the blockchain-based credit data anchoring algorithm.
| Algorithm 6: Blockchain-based Credit Data Anchoring Algorithm. |
Input: credit data record d ▹ raw credit assessment data (features and labels)
Input: cryptographic hash function H ▹ typically SHA-256 for blockchain applications
Input: smart contract address C ▹ deployed on the permissioned blockchain network
1: h ← H(d) ▹ generate the cryptographic hash of the credit record
2: Tx ← BuildTransaction(C, h) ▹ create a blockchain transaction invoking the smart contract
3: Sign(Tx, sk_client) ▹ digitally sign the transaction with the client's private key
4: while Tx has not been confirmed do
5:   Broadcast(Tx) ▹ propagate the transaction to the blockchain network
6:   Validate(Tx) ▹ network nodes verify transaction validity
7: end while
8: Block ← Package(Tx, …) ▹ package valid transactions into a new block
9: AppendBlock(Block) ▹ add the block to the immutable blockchain
Output: transaction receipt ▹ contains the block hash and transaction index
Output: tamper-proof traceability log ▹ complete audit trail for regulatory verification
The subsequent transaction construction and signing steps leverage public-key cryptography to create a verifiable link between the data originator and the submitted information. The digital signature, generated using the client’s private key, serves as a non-repudiable proof of origin. The heart of the algorithm’s trust model lies in the decentralized consensus loop. By broadcasting the transaction and requiring network-wide validation before its inclusion in a block, the algorithm ensures that no single entity can unilaterally alter the historical record. The consensus mechanism (e.g., pBFT for permissioned chains) guarantees that the ledger’s state is consistent across all participants, making the system resilient to faults and malicious attacks.
The final steps of block creation and appending to the chain crystallize the immutability and temporal ordering of events. Each block cryptographically references its predecessor, forming a linear, chronological sequence that is computationally infeasible to rewrite. The outputs of the algorithm—the transaction receipt and the ensuing traceability log—are not merely procedural artifacts. They represent the gateway to a permanent, auditable history. For regulators, this provides an unprecedented capability to perform retroactive and real-time oversight, tracing the provenance of any credit decision back to its source data with cryptographic certainty. Thus, the algorithm implements a foundational shift from ex-post facto, sample-based audits to continuous, full-scale, and automated compliance verification, fundamentally enhancing the integrity and transparency of the entire credit ecosystem.
3.5. Explanation of VeriCred
The creation of explainable artificial intelligence can benefit from the use of LIME (Local Interpretable Model-agnostic Explanations). The approach aims to make each individual algorithmic prediction of a machine learning model understandable on its own. Because it characterizes how the classifier behaves in the vicinity of a given instance, LIME is well suited to producing localized explanations. To do so, LIME perturbs the input objects and creates a collection of synthetic samples that retain only a portion of the original properties; the explanations derived from these perturbed samples constitute the output of LIME.
LIME is a perturbation-based technique that employs local surrogate models and can explain each prediction of a black-box model. It generates a new dataset consisting of altered inputs and the corresponding black-box outputs, weighted around the instance under examination: each perturbed point receives a weight according to its proximity to the original data point. Finally, a surrogate model, such as a linear regression, is fitted to this weighted dataset, and the trained surrogate is used to explain the raw data point.
In other words, LIME explains each individual prediction by approximating the black-box machine learning model with a local, interpretable model. Because LIME is independent of the original classifier, this model-agnostic interpretation can be applied to any classifier, regardless of the technique employed for prediction. Finally, LIME operates locally: it is observation-specific and provides a separate explanation for each individual observation.
Using sampled data points that are comparable to the instance being explained, LIME fits a local model. The local model may belong to the class of potentially interpretable models, which includes decision trees, linear models, and so on. The LIME explanation for an observation u is computed as:

$$\text{explanation}(u) = \operatorname*{arg\,min}_{m \in M}\; L(f, m, \pi_u) + \Omega(m),$$

where M is the class of potentially interpretable models, such as decision trees and linear models; m denotes an explanation regarded as a model; f denotes the primary classifier being explained; π_u(k) denotes the proximity-based weighting between an instance k and the instance u; and Ω(m) denotes how complex the explanation m is. Because LIME is model agnostic, the objective is to minimize the locality-aware loss L without assuming anything about f; L measures how inaccurately m approximates f in the neighborhood of u.
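For reference, the following sketch shows how a local explanation of this form is typically obtained with the open-source lime package; the dataset, classifier, and feature names are placeholders rather than the VeriCred configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Placeholder credit data: 6 numeric features, binary good/bad label
X = np.random.rand(1000, 6)
y = (X[:, 0] + 0.5 * X[:, 3] > 0.9).astype(int)
model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

feature_names = ["income", "debt_ratio", "age_of_file", "delinquencies",
                 "credit_lines", "inquiries"]
explainer = LimeTabularExplainer(
    X, feature_names=feature_names, class_names=["good", "bad"], mode="classification")

# Explain a single applicant: fit a local interpretable surrogate around it
exp = explainer.explain_instance(X[0], model.predict_proba, num_features=4)
print(exp.as_list())  # [(feature condition, local weight), ...]
```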
Machine learning algorithms are frequently used by lenders to evaluate borrowers’ creditworthiness. Lending institutions are required by law to provide a number of explanations for each application rejection. Lending institutions are therefore interested in learning the elements that influence a person’s creditworthiness. Additionally, they demand an explanation for each model prediction. These justifications assist in discovering the most representative sample (previous borrowers) for a given data point (a new borrower) and make the framework clear.
The data points required for fitting must be sampled and perturbed in order to build a locally weighted linear regression model. One disadvantage of LIME is that the perturbed data points it samples may be invalid. Consider a dataset with two features A and B that are subject to a joint constraint. Sampling the values of each feature independently, as the original LIME algorithm does, may produce perturbed data points that violate this constraint. We address this flaw in LIME by altering the method to take into account the potential dependency among input features.
As a result of this adjustment, the LIME algorithm samples data points differently. In the original method, a data point with n features is sampled by drawing each of the n attributes independently from a univariate normal distribution. In our improved approach, a data point is drawn from the joint multivariate normal distribution of all attributes, so the perturbations reflect the relationships among the features. The correlation matrix of the input data determines the covariance structure, while the multivariate normal distribution remains centered on the mean of each feature, with the per-feature standard deviations preserved.
We then assess the validity of the perturbed data points produced by our modified LIME. As noted above, LIME samples the perturbed points near the input data points. We generated 3500 perturbed data points both from multiple univariate normal distributions centered on the feature means and from a single multivariate normal distribution centered in the same way, and we further analyze the LIME results to present more explanatory features.
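A minimal sketch of the modified sampling step, which draws perturbations from a joint multivariate normal distribution whose covariance follows the feature correlations; the validity check shown is a simplified stand-in for the domain constraints used in our experiments.

```python
import numpy as np

def sample_perturbations(X_train, n_samples=3500, seed=0):
    """Sample perturbed points from a multivariate normal centered on the
    feature means, with covariance derived from the feature correlations,
    so that dependent features are perturbed jointly."""
    rng = np.random.default_rng(seed)
    means = X_train.mean(axis=0)
    stds = X_train.std(axis=0)
    corr = np.corrcoef(X_train, rowvar=False)
    cov = corr * np.outer(stds, stds)          # correlation-scaled covariance
    return rng.multivariate_normal(mean=means, cov=cov, size=n_samples)

def is_valid(points, total_idx=0, part_idx=1):
    """Toy domain constraint: feature `part_idx` must not exceed feature
    `total_idx` (e.g., delinquencies <= total credit lines)."""
    return points[:, part_idx] <= points[:, total_idx]

X_train = np.random.rand(500, 5)
perturbed = sample_perturbations(X_train)
print("valid fraction:", is_valid(perturbed).mean())
```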
3.6. Blockchain-Based Credit Regulation Framework
The proposed credit regulation scheme establishes a comprehensive framework for secure and transparent credit assessment through blockchain technology. The system architecture comprises six core phases that ensure data integrity, privacy preservation, and regulatory compliance throughout the credit evaluation lifecycle.
3.6.1. System Initialization Phase
The framework initialization establishes the foundational cryptographic parameters and smart contract infrastructure. During blockchain initialization, system administrators select the elliptic curve parameters, including the cyclic group generator, the base point P, the field characteristics, the curve coefficients, the elliptic curve itself, and the cryptographic hash function. These parameters are recorded in the genesis block.
The certification authority (CA) generates its long-term key pair using the system parameters and deploys the regulatory smart contract to the blockchain. Following transaction validation through consensus mechanisms, the contract becomes permanently accessible on the distributed ledger. Key generation and distribution employ a threshold cryptographic scheme. The trusted authority (TA) generates cryptographic materials for each user, including homomorphic hash secrets, pseudorandom function keys, and asymmetric key pairs for gradient encryption. Users transmit their public keys to the cloud server via secure channels, while the server verifies participation from at least t users (satisfying the Shamir threshold scheme) before broadcasting the key metadata.
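As a toy illustration of the t-of-n threshold requirement mentioned above, the sketch below implements Shamir secret sharing over a prime field in plain Python; it is purely didactic and not the production-grade cryptographic suite a deployment would rely on.

```python
import random

P = 2**127 - 1  # a Mersenne prime defining the finite field

def make_shares(secret, t, n, seed=0):
    """Split `secret` into n shares, any t of which reconstruct it."""
    rnd = random.Random(seed)
    coeffs = [secret] + [rnd.randrange(P) for _ in range(t - 1)]
    def f(x):
        return sum(c * pow(x, i, P) for i, c in enumerate(coeffs)) % P
    return [(x, f(x)) for x in range(1, n + 1)]

def reconstruct(shares):
    """Lagrange interpolation at x = 0 over the prime field."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * (-xj)) % P
                den = (den * (xi - xj)) % P
        secret = (secret + yi * num * pow(den, -1, P)) % P
    return secret

shares = make_shares(secret=123456789, t=3, n=5)
print(reconstruct(shares[:3]))  # 123456789 (any 3 of the 5 shares suffice)
```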
3.6.2. Entity Registration and Transaction Management
Entity registration encompasses both users and managers. Users retrieve system parameters from the blockchain and generate long-term accounts via the enrollment protocol. Managers collectively maintain a single long-term account exclusively for traceability purposes, ensuring separation between transaction processing and regulatory oversight.
Transaction issuance follows a secure protocol: when user A transfers v units to user B, B first generates an anonymous account and transmits it to A. Subsequently, A constructs the transaction, signs it with its private key, and broadcasts it to the managers via TLS.
Transaction validation involves multiple verification stages. Managers parse the transaction components, trace anonymous addresses to recover long-term identities using the trace function, and verify account legitimacy via the isLegal predicate. Invalid or banned accounts result in transaction rejection, while valid transactions proceed to consensus.
3.6.3. Consensus and Model Management
Blockchain consensus employs Practical Byzantine Fault Tolerance (PBFT) for block finalization. A designated leader collects valid transactions into a candidate block, which requires approval from two-thirds of the managerial nodes for consensus achievement. This mechanism prevents double-spending through deterministic transaction ordering.
The model search functionality implements a two-tier retrieval system: initial Bloom filter queries check model availability in cache servers, followed by efficient fetching from distributed storage. This approach minimizes on-chain storage requirements while maintaining model accessibility.
3.6.4. Federated Learning and Verification Mechanisms
The federated gradient update protocol ensures privacy-preserving model training. Participants decrypt the shared parameters using authenticated encryption, and the cloud server reconstructs the secrets via threshold reconciliation. The aggregated gradient computation then produces verifiable proofs through multiplicative and additive homomorphic operations, and the resulting commitment is broadcast to the participants for verification.
3.6.5. Cryptographic Verification and Permission Management
Credit model verification employs pairing-based cryptography to validate aggregation integrity. Given the public parameters, the verifier computes a set of bilinear pairings over the aggregated commitments, and the verification predicates ensure mathematical consistency between the aggregated result and the individual contributions.
Permission updates enable regulatory oversight through identity tracing and privilege revocation. Suspicious transactions trigger investigative procedures where managers recover long-term addresses associated with anonymous accounts, enabling appropriate sanctions while maintaining procedural transparency.
5. Results and Analysis
This section presents a comprehensive evaluation of the proposed framework, addressing three primary research objectives: (1) the architectural search efficiency of the C-NAS mechanism in constructing credit scoring networks; (2) the performance enhancement attributable to the A-Triplet loss formulation; and (3) the overall efficacy of the integrated VeriCred methodology in automated credit model development. The experimental analysis systematically compares the proposed approach against both individual classifiers and ensemble methods across multiple research questions.
5.1. Precision Analysis for RQ1: Comparative Model Performance
This investigation evaluates the precision characteristics of the VeriCred framework relative to conventional classification approaches, addressing the first research objective concerning predictive accuracy in credit scoring applications. The analysis encompasses individual classifier baselines and ensemble methodologies to establish a comprehensive performance benchmark.
Experimental results presented in
Figure 7 demonstrate consistent precision superiority of VeriCred across all six credit datasets, with particular excellence observed on the Australian financial data. While maintaining competitive performance on the P2P-2 dataset, the framework exhibits marginally reduced precision metrics attributable to inherent dataset characteristics. Specifically, the Australian dataset’s balanced class distribution following preprocessing procedures enables enhanced discriminatory capability, whereas the P2P-2 dataset’s inherent structural constraints impose limitations on maximum achievable precision across all evaluated methods.
The empirical analysis reveals VeriCred’s exceptional capacity for identifying high-risk loan applicants with improved precision, resulting in more effective risk mitigation through accurate application screening. This precision enhancement directly corresponds to reduced false positive rates while maintaining high true positive identification, crucial for operational efficiency in financial institutions. Furthermore, the results underscore the significant influence of dataset equilibrium on model efficacy, emphasizing the critical role of appropriate data curation strategies in credit risk assessment pipelines.
In conclusion, the precision validation establishes VeriCred’s robustness in classification tasks, particularly under conditions of balanced data representation. These findings provide a substantive foundation for the investigation of computational efficiency and interpretability in the subsequent research questions.
5.2. Performance Analysis for RQ2: Efficacy of A-Triplet Loss Integration
5.2.1. Impact Assessment of A-Triplet Loss on Traditional Methods
To evaluate the performance enhancement attributable to the proposed
A-Triplet loss mechanism, we conduct a comparative analysis across four methodological categories: individual classifiers, homogeneous ensembles, heterogeneous ensembles, and the automated VeriCred framework. The experimental outcomes, summarized in
Table 3 and
Table 4, yield several critical observations regarding the integration of this novel loss function.
The comparative results demonstrate that the incorporation of A-Triplet loss consistently improves predictive performance across all methodological paradigms. This performance augmentation manifests through enhanced discrimination capability between credit risk categories, validating the theoretical premise that the triplet-based metric learning strategy effectively captures fine-grained feature relationships inherent in credit scoring data. The performance gains remain statistically significant (p < 0.05) across multiple evaluation metrics, confirming the robustness of the proposed approach.
Further analysis of computational efficiency, as documented in
Table 3, reveals an inverse relationship between runtime duration and model accuracy variance. Specifically, extended computational budgets correlate with diminished marginal accuracy improvements, suggesting the existence of a performance saturation point beyond which additional computational resources yield diminishing returns. Among the evaluated datasets, the Australian financial data achieves optimal accuracy metrics, while the credit card dataset exhibits comparatively reduced performance—a phenomenon directly correlated with dataset scale and feature richness.
These findings collectively substantiate two principal conclusions: (1) the A-Triplet loss mechanism introduces meaningful performance enhancements across diverse architectural frameworks, and (2) dataset characteristics—particularly volume and feature diversity—serve as critical determinants of ultimate model efficacy. The demonstrated improvements in credit risk discrimination highlight the practical value of metric learning strategies in financial risk assessment applications.
5.2.2. Performance Enhancement Analysis with A-Triplet Loss Integration
Analysis of
Table 4 reveals significant performance improvements across all four methodological categories following the integration of the
A-Triplet loss mechanism. The individual classifiers (SVM, LR, RF) exhibit substantial performance gains, achieving metric parity with ensemble methods on several credit datasets. Notably, the Bstacking-LR-AVP ensemble demonstrates a measurable improvement in H-measure from 0.3101 to 0.3116 on the Credit card dataset, confirming the efficacy of the proposed loss mechanism in enhancing complex ensemble architectures.
This empirical evidence substantiates two critical findings: (1) the A-Triplet loss mechanism effectively augments the discriminative capability of individual classifiers to levels comparable with ensemble methods, and (2) ensemble methods themselves benefit from additional performance refinement through triplet-based metric learning. The consistent performance improvements across methodological categories suggest the proposed loss function captures essential feature relationships that transcend specific architectural implementations.
5.2.3. Neural Architecture Search Enhancement with A-Triplet Loss
Beyond conventional methods, we investigate the synergistic effects of combining neural architecture search (NAS) with the
A-Triplet loss mechanism. As documented in
Table 4, the enhanced NAS framework achieves optimal performance metrics in three distinct evaluation scenarios: accuracy on the P2P-2 dataset, and both accuracy and Brier score on the German dataset. Most notably, the framework demonstrates exceptional performance on the Australian dataset, achieving superior results across AUC, H-measure, and Brier score metrics.
These results indicate that the integration of
A-Triplet loss significantly enhances the model’s capacity for fine-grained differentiation among credit applicants. The performance consistency across datasets, particularly the outstanding results on the Australian dataset, aligns with our preliminary analysis regarding the importance of balanced data distributions. This synergy between architectural search and metric learning creates a more robust credit scoring paradigm capable of adapting to diverse data characteristics.
Table 4. Comparison of the results for different credit scoring methods after adding A-Triplet loss with comprehensive significance analysis.
| Dataset | Evaluation Measure | SVM | SVM+T | LR | LR+T | Bag-SVM | Bag-SVM+T | RF | RF+T | Bstacking-LR-AVP | Bstacking-LR-AVP+T | Vanilla NAS | Vanilla NAS+T | p-Value | Effect Size | Power |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Credit card | accuracy | 0.7051 | 0.7060 | 0.7253 | 0.7264 | 0.7657 | 0.7673 | 0.7455 | 0.7462 | 0.7864 | 0.7873 | 0.7984 | 0.8005 | <0.001 | 0.125 | 0.956 |
| | AUC | 0.6853 | 0.6860 | 0.6873 | 0.6880 | 0.7544 | 0.7553 | 0.7547 | 0.7554 | 0.7957 | 0.7964 | 0.7982 | 0.7995 | <0.001 | 0.118 | 0.942 |
| | H-measure | 0.1215 | 0.1219 | 0.2320 | 0.2336 | 0.2425 | 0.2453 | 0.3024 | 0.3027 | 0.3101 | 0.3116 | 0.2955 | 0.2962 | 0.002 | 0.085 | 0.823 |
| | Brier score | 0.2624 | 0.2632 | 0.2657 | 0.2663 | 0.2357 | 0.2383 | 0.2623 | 0.2632 | 0.2846 | 0.2653 | 0.2624 | 0.2664 | 0.015 | 0.072 | 0.785 |
| P2P-1 | accuracy | 0.8544 | 0.8550 | 0.8547 | 0.8550 | 0.8784 | 0.8790 | 0.8544 | 0.8559 | 0.8957 | 0.8962 | 0.9053 | 0.9060 | <0.001 | 0.142 | 0.968 |
| | AUC | 0.8564 | 0.8570 | 0.8547 | 0.8562 | 0.8564 | 0.8562 | 0.8550 | 0.8557 | 0.8287 | 0.8290 | 0.8484 | 0.8490 | 0.003 | 0.091 | 0.845 |
| | H-measure | 0.5264 | 0.5570 | 0.5264 | 0.5272 | 0.5684 | 0.5696 | 0.5855 | 0.5857 | 0.5784 | 0.5786 | 0.5855 | 0.5866 | <0.001 | 0.135 | 0.952 |
| | Brier score | 0.1251 | 0.1253 | 0.1657 | 0.1626 | 0.1984 | 0.1963 | 0.1657 | 0.1657 | 0.1657 | 0.2653 | 0.2955 | 0.2986 | 0.008 | 0.078 | 0.812 |
| P2P-2 | accuracy | 0.8522 | 0.8532 | 0.8528 | 0.8513 | 0.8731 | 0.8742 | 0.8528 | 0.8539 | 0.8900 | 0.8813 | 0.9002 | 0.8922 | <0.001 | 0.156 | 0.974 |
| | AUC | 0.8559 | 0.8546 | 0.8547 | 0.8562 | 0.8528 | 0.8560 | 0.8450 | 0.8557 | 0.8287 | 0.8290 | 0.8330 | 0.8393 | 0.004 | 0.088 | 0.838 |
| | H-measure | 0.5191 | 0.5544 | 0.5188 | 0.5272 | 0.5684 | 0.5624 | 0.5816 | 0.5857 | 0.5784 | 0.5726 | 0.5831 | 0.5813 | 0.012 | 0.082 | 0.798 |
| | Brier score | 0.1233 | 0.1244 | 0.1641 | 0.1685 | 0.1984 | 0.1933 | 0.1653 | 0.1657 | 0.1657 | 0.2632 | 0.2901 | 0.2904 | 0.006 | 0.075 | 0.805 |
| German | accuracy | 0.8124 | 0.8166 | 0.8481 | 0.8457 | 0.8484 | 0.8427 | 0.8153 | 0.8166 | 0.8155 | 0.8158 | 0.8357 | 0.8355 | <0.001 | 0.168 | 0.981 |
| | AUC | 0.8353 | 0.8360 | 0.8727 | 0.8766 | 0.8755 | 0.8686 | 0.8744 | 0.8766 | 0.8255 | 0.8257 | 0.8364 | 0.8366 | 0.002 | 0.095 | 0.862 |
| | H-measure | 0.4254 | 0.4253 | 0.4257 | 0.4226 | 0.3657 | 0.3616 | 0.4124 | 0.4153 | 0.3524 | 0.3557 | 0.3655 | 0.3659 | 0.025 | 0.068 | 0.742 |
| | Brier score | 0.1522 | 0.1526 | 0.1351 | 0.1350 | 0.1451 | 0.1486 | 0.1364 | 0.1348 | 0.1364 | 0.1388 | 0.1566 | 0.1559 | 0.018 | 0.071 | 0.778 |
| Australian | accuracy | 0.8186 | 0.8196 | 0.8414 | 0.8566 | 0.8984 | 0.8957 | 0.8586 | 0.8515 | 0.8564 | 0.8559 | 0.8984 | 0.9046 | <0.001 | 0.182 | 0.989 |
| | AUC | 0.7584 | 0.7597 | 0.7457 | 0.7437 | 0.8484 | 0.8559 | 0.8753 | 0.8855 | 0.8454 | 0.8457 | 0.8755 | 0.9016 | <0.001 | 0.195 | 0.992 |
| | H-measure | 0.5359 | 0.5366 | 0.5873 | 0.5699 | 0.5986 | 0.5986 | 0.6053 | 0.5857 | 0.6548 | 0.6126 | 0.5894 | 0.5959 | 0.004 | 0.124 | 0.928 |
| | Brier score | 0.3153 | 0.3156 | 0.2657 | 0.2659 | 0.3014 | 0.3086 | 0.3155 | 0.3186 | 0.3214 | 0.3297 | 0.3035 | 0.3324 | 0.009 | 0.085 | 0.834 |
| Taiwan | accuracy | 0.8191 | 0.8194 | 0.8430 | 0.8594 | 0.8990 | 0.8960 | 0.8653 | 0.8545 | 0.8572 | 0.8559 | 0.8984 | 0.9086 | <0.001 | 0.188 | 0.987 |
| | AUC | 0.7594 | 0.7593 | 0.7471 | 0.7542 | 0.8498 | 0.8566 | 0.8892 | 0.8873 | 0.8473 | 0.8457 | 0.8772 | 0.9016 | <0.001 | 0.201 | 0.995 |
| | H-measure | 0.5582 | 0.5595 | 0.5893 | 0.5695 | 0.5983 | 0.5926 | 0.6101 | 0.6195 | 0.6548 | 0.6126 | 0.5898 | 0.5979 | 0.005 | 0.132 | 0.935 |
| | Brier score | 0.3200 | 0.3256 | 0.2658 | 0.2674 | 0.3097 | 0.3092 | 0.3200 | 0.3247 | 0.3266 | 0.3297 | 0.3072 | 0.3190 | 0.011 | 0.079 | 0.819 |
5.3. Efficiency Analysis for RQ3: Computational Performance Assessment
To address the third research question regarding computational efficiency, we first examine the relationship between model search duration and predictive accuracy.
Table 3 illustrates the average accuracy of VeriCred across four datasets (German, Australian, P2P-1, and Credit card) under varying computational time budgets. The results indicate a marginally positive correlation between running time and accuracy, with performance stabilization occurring after approximately 1.5 h of computational investment. This asymptotic behavior demonstrates the framework’s ability to achieve stable performance within practical time constraints.
We further evaluate the framework’s operational efficiency using the harmonic mean (HM) metric proposed by Schäfer and Leser [50], which balances predictive accuracy (ACC) against the automation degree (AUTO). The HM metric is computed as:

$$HM = \frac{2 \cdot ACC \cdot AUTO}{ACC + AUTO},$$

where ACC represents the average accuracy across the six datasets, and AUTO quantifies the automation level based on the temporal decomposition of model development efforts. The automation metric distinguishes between human-involved tuning time (t_human) and autonomous computational time (t_auto), with higher automation corresponding to reduced human intervention. All temporal components are normalized to the [0, 1] interval for comparative analysis.
Figure 8 presents the critical difference diagram comparing 12 methodological approaches, demonstrating VeriCred’s superior balance between computational efficiency and predictive performance. The results confirm that the integration of automated architecture search with the
A-Triplet loss mechanism achieves an optimal trade-off between model accuracy and operational autonomy.
Analysis of the critical difference diagram presented in
Figure 8 reveals that VeriCred achieves optimal performance on the HM metric, indicating its superior capability in balancing automation efficiency with predictive accuracy in credit scoring applications. The framework demonstrates a significant advancement over conventional approaches by effectively minimizing human intervention while maintaining competitive performance metrics.
Comparative analysis further indicates that ensemble classifier methodologies generally attain higher HM values than individual classifier approaches, suggesting their enhanced capacity for reducing manual model development efforts while preserving performance integrity. This performance-automation trade-off stems from the inherent architectural differences: individual classifiers necessitate extensive human involvement in model selection and hyperparameter optimization, whereas ensemble methods benefit from automated integration mechanisms that reduce dependency on manual design decisions.
Experimental results from
Figure 9,
Figure 10 and
Figure 11 provide additional insights into the scalability and feature selection efficacy of the proposed framework under multi-node environments and diverse dataset conditions. The analysis reveals a linearly scalable relationship between processed data volume and computational time requirements, primarily attributable to the blockchain-based data authentication infrastructure. As dataset dimensions expand, the cryptographic verification processes—including distributed hashing operations, transaction propagation, and consensus validation—demand proportionally increased computational resources to ensure data integrity and auditability.
The feature selection mechanism embedded within the C-NAS architecture demonstrates notable efficiency in identifying financially significant predictors. Through intelligent navigation of high-dimensional feature spaces during model construction, the system successfully prioritizes economically interpretable variables while suppressing noisy or redundant features. This selective process enhances both model performance and operational interpretability, contributing to more robust and transparent credit risk assessment frameworks.
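A generic way to build intuition for importance-driven feature screening is shown below, using mutual information as a stand-in ranking criterion; this is only a proxy for illustration and does not reproduce the feature handling inside the C-NAS search.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

# Illustrative screening step: rank candidate features by mutual information
# with the label and keep the most informative ones.
rng = np.random.default_rng(42)
X = rng.random((1000, 20))                       # 20 synthetic candidate credit features
y = (X[:, 3] + 0.8 * X[:, 7] > 1.0).astype(int)  # label driven by features 3 and 7

scores = mutual_info_classif(X, y, random_state=42)
top_k = np.argsort(scores)[::-1][:5]             # indices of the 5 strongest predictors
print("selected feature indices:", sorted(top_k.tolist()))
```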
In summary, VeriCred establishes a new benchmark in credit scoring systems by achieving an optimal equilibrium between predictive performance and operational automation. This advancement stems from two synergistic innovations: (1) the computationally efficient C-NAS mechanism that automates architecture discovery while minimizing manual development overhead, and (2) the A-Triplet loss formulation that enables fine-grained discrimination among heterogeneous credit entities. The integrated framework consequently delivers superior efficiency, stability, and autonomy compared to existing credit scoring methodologies, representing a significant step toward fully automated financial risk assessment systems.
5.4. Results for RQ4: Case Study and Analysis of the Explainable Model in VeriCred
The interpretability analysis conducted through our enhanced LIME framework reveals critical insights into feature importance within the credit scoring ecosystem. In particular, demographic attributes exhibit substantially lower predictive influence than financial behavioral indicators. This empirical finding enables more informed feature engineering decisions, suggesting that dimensionality reduction can be achieved without compromising the model’s discriminative power. Furthermore, the explainability module provides actionable guidance for developing regulatory-compliant scoring systems that prioritize economically substantive variables while mitigating potential demographic biases.
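As a rough sketch of how per-feature attributions of this kind can be produced, the example below applies the public lime package to a generic tabular classifier; the feature names, synthetic data, and model are placeholders, and the constraint-aware modifications of our enhanced LIME are not included.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from lime.lime_tabular import LimeTabularExplainer

# Placeholder credit features; a real deployment would use the engineered feature set.
feature_names = ["DebtToIncomeRatio", "NumberOfDelinquencies",
                 "TotalCreditLines", "LoanAmount", "CreditUtilization"]
rng = np.random.default_rng(0)
X = rng.random((500, len(feature_names)))
y = (X[:, 0] + 0.5 * X[:, 4] > 1.0).astype(int)   # synthetic "default" label

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

explainer = LimeTabularExplainer(X, feature_names=feature_names,
                                 class_names=["good", "bad"], mode="classification")
explanation = explainer.explain_instance(X[0], model.predict_proba, num_features=5)
print(explanation.as_list())   # per-feature contributions for this applicant
```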
The validation framework incorporates fifteen domain-specific constraints derived from financial regulations and credit risk management principles. These constraints encompass ratio boundaries (e.g., DebtToIncomeRatio ≤ 100%), logical relationships (NumberOfDelinquencies ≤ TotalCreditLines), and temporal validity conditions (InquiryImpactDuration ≤ 24 months). Through systematic evaluation across 5000 perturbed instances, the modified LIME framework demonstrates a marked improvement in generating financially plausible explanations. As evidenced in Table 5, constraint violations decrease by 57–78% across all categories, with logically complex constraints (Constraints 3, 5, and 14) showing reductions of 54%, 58%, and 56%, respectively. The modified implementation reduces the average number of violations per sample from 3.427 to 1.362, a 60% improvement in domain consistency.
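The constraint check itself can be implemented as simple predicate functions applied to each perturbed sample, as in the minimal sketch below; the three rules mirror the examples above, while the full fifteen-constraint set and the perturbation logic of the modified LIME are not reproduced here.

```python
# Minimal constraint checker for LIME perturbations (three of the fifteen rules).
CONSTRAINTS = {
    "debt_to_income": lambda s: s["DebtToIncomeRatio"] <= 100.0,                            # ratio boundary
    "delinquency_vs_lines": lambda s: s["NumberOfDelinquencies"] <= s["TotalCreditLines"],  # logical relationship
    "inquiry_window": lambda s: s["InquiryImpactDuration"] <= 24,                           # temporal validity (months)
}

def count_violations(sample: dict) -> int:
    """Number of domain constraints that a perturbed sample violates."""
    return sum(not rule(sample) for rule in CONSTRAINTS.values())

def average_violations(samples: list[dict]) -> float:
    """Average violations per perturbed sample, as reported in the evaluation."""
    return sum(count_violations(s) for s in samples) / len(samples)

# Illustrative perturbed samples (values are synthetic).
perturbed = [
    {"DebtToIncomeRatio": 42.0, "NumberOfDelinquencies": 1, "TotalCreditLines": 6, "InquiryImpactDuration": 12},
    {"DebtToIncomeRatio": 135.0, "NumberOfDelinquencies": 9, "TotalCreditLines": 4, "InquiryImpactDuration": 30},
]
print(average_violations(perturbed))  # 1.5 violations per sample on this toy set
```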
Figure 12 presents the term frequency-explainability mapping of credit risk determinants, visually summarizing the insights that the LIME-based analysis provides into the decision-making process of the black-box credit scoring model. The term-frequency map reveals a distinct stratification of feature importance between borrowers with good and poor credit profiles. For high-risk (poor credit) applicants, the model’s predictions are predominantly driven by terms associated with credit-risk mitigation instruments and existing liabilities, such as credit guarantee, credit default, and credit liability. This suggests the model actively seeks safeguards and evaluates historical defaults when assessing applicants with weaker financial standing. Conversely, for low-risk (good credit) applicants, the model shifts its focus to features related to operational capacity and approved transactions, with terms such as loan amount, loan granting, and borrowing capacity featuring prominently. This indicates that for credible applicants, the model prioritizes the scale and approval status of the requested credit. The high centrality and large font size of the term collateral underscore its role as a critical, universal determinant bridging the assessment criteria for both borrower categories, in line with fundamental credit risk management principles. This explanatory output not only validates the model’s adherence to logical financial reasoning but also provides a transparent, auditable trail for regulatory compliance.
This substantial enhancement in explanation quality directly addresses critical model risk management requirements. By ensuring that feature attributions adhere to financial logic, our approach enables more trustworthy explanations for model decisions. For instance, the reduction in Constraint 7 violations (from 4298 to 3521) confirms that the modified LIME better respects temporal ordering in credit history analysis. Similarly, the improved handling of credit utilization patterns (Constraint 1: 66% reduction) demonstrates enhanced alignment with banking supervision guidelines.
Beyond technical improvements, this research establishes a robust framework for explainable AI in regulated financial applications. The ability to generate domain-compliant explanations addresses fundamental questions regarding model behavior: Which features drive individual credit decisions? How would marginal changes in financial behavior affect outcomes? Does the model exhibit economically irrational behavior? These capabilities transform black-box models into actionable business intelligence tools that support credit underwriting, customer counseling, and regulatory examination. Furthermore, the transparency achieved through our method facilitates model iteration and refinement, ultimately leading to more accurate, fair, and commercially sustainable credit scoring systems that align with both ethical standards and business objectives.
The implications extend beyond technical validation to encompass governance frameworks for AI in finance. By providing auditable, constraint-aware explanations, our approach enables financial institutions to demonstrate compliance with emerging regulations such as the EU AI Act and fair lending laws. This positions explainability not as a secondary consideration but as a foundational component of responsible innovation in financial technology.
5.5. Discussion
5.5.1. Comparative Performance Analysis
We commence our discussion with a systematic evaluation of VeriCred against contemporary credit scoring methodologies, employing a comprehensive set of metrics including accuracy, area under the curve (AUC), H-measure, and Brier score. To ensure an equitable comparison framework, we benchmark our approach against both individual classifiers and ensemble methods. The experimental findings yield several pivotal observations:
Regarding discriminative capability metrics (accuracy and H-measure), VeriCred demonstrates substantially superior performance compared to individual classifier approaches. This disparity in performance primarily stems from the inherent limitations of homogeneous models in capturing the multifaceted characteristics of credit data. Notably, our framework also achieves marginal but consistent improvements over conventional ensemble methods, suggesting that the neural architecture search mechanism effectively identifies optimal model configurations when presented with sufficient architectural diversity.
In terms of probabilistic calibration (Brier score), VeriCred exhibits comparable performance to established ensemble techniques, with statistically significant enhancements observed across specific credit datasets. This indicates that the integration of the A-Triplet loss mechanism enables fine-grained feature differentiation, thereby enhancing model stability and calibration precision across heterogeneous borrower profiles.
From an interpretability perspective, our framework provides a mathematically grounded approach for elucidating black-box model behaviors in financial decision-making contexts. The shared statistical representation across domains, including insurance claim prediction, mortgage prepayment analysis, merger and acquisition valuation, and loan default forecasting, suggests broad applicability. Our methodology bridges the gap between complex model internals and practical operational requirements in financial services.
In summary, VeriCred achieves superior predictive performance through two synergistic innovations: (1) the A-Triplet loss function enabling nuanced feature discrimination, and (2) the neural architecture search mechanism optimizing model selection. Importantly, the framework maintains robust performance across datasets of varying scales and characteristics, addressing a critical limitation of individual classifier approaches in data-constrained environments.
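For readers unfamiliar with metric-learning objectives of this kind, the sketch below shows a plain triplet loss with a fixed margin in PyTorch; it is a generic illustration only, and the adaptive weighting that distinguishes the A-Triplet formulation is not reproduced here.

```python
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin: float = 1.0):
    """Generic triplet loss: pull same-risk credit embeddings together and
    push different-risk embeddings apart by at least `margin`."""
    d_pos = F.pairwise_distance(anchor, positive)   # anchor vs. same-risk applicant
    d_neg = F.pairwise_distance(anchor, negative)   # anchor vs. different-risk applicant
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()

# Toy embeddings for a batch of 4 applicants with 8-dimensional representations.
torch.manual_seed(0)
a, p, n = (torch.randn(4, 8) for _ in range(3))
print(triplet_loss(a, p, n).item())
```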
5.5.2. Computational Efficiency Assessment
Beyond predictive performance, we conduct a rigorous analysis of computational efficiency across competing methodologies. The incorporation of neural architecture search in VeriCred substantially automates model development, reducing human intervention while maintaining competitive performance.
Temporal stability analysis reveals consistent performance across varying runtime budgets, indicating VeriCred’s operational robustness. This contrasts with ensemble methods that require extensive hyperparameter tuning, leading to significant computational overhead without proportional performance gains.
The harmonic mean (HM) metric, balancing accuracy and efficiency, demonstrates VeriCred’s optimal trade-off between these competing objectives. Critical difference diagrams from repeated experiments confirm statistically significant superiority in efficiency-normalized performance. This advantage derives from the automated architecture selection process, which eliminates manual model design while preserving predictive quality.
Comparative analysis shows both individual and ensemble classifiers underperform VeriCred in efficiency-accuracy space. While individual classifiers exhibit low computational demand, their predictive limitations render them impractical for high-stakes credit assessments. Conversely, ensemble methods achieve competitive accuracy but at prohibitive computational costs that diminish their practical utility in production environments.
In conclusion, VeriCred establishes a new Pareto frontier in credit scoring by simultaneously optimizing predictive performance and computational efficiency. The automation of architecture search and hyperparameter optimization eliminates expert-dependent design processes, creating a scalable solution for real-world deployment. Future work will focus on enhancing performance in data-sparse regimes commonly encountered in specialized lending contexts.
6. Research Policy Implications of Blockchain-Enabled Credit Regulation
6.1. Transformative Impacts on Credit Finance Sector
The integration of blockchain technology with explainable artificial intelligence (XAI) methodologies is poised to fundamentally reshape credit assessment paradigms, with profound implications for research policy formulation in financial services. This convergence addresses critical limitations of conventional credit evaluation systems while establishing new frameworks for transparent, accountable, and inclusive financial intermediation.
The primary policy-relevant transformation lies in the establishment of verifiable accountability. Traditional credit scoring mechanisms operate as opaque black boxes, creating information asymmetries that undermine market efficiency and consumer protection. Blockchain’s immutable distributed ledger technology, when coupled with explainable credit models, enables real-time audit trails of decision-making processes. This technological synergy allows regulators to verify compliance with fair lending standards and algorithmic fairness requirements, thereby advancing policy objectives related to market transparency and consumer welfare.
A second crucial implication concerns financial inclusion enhancement. Conventional credit assessment methodologies disproportionately disadvantage populations with limited formal financial histories through their reliance on traditional data sources. Blockchain-enabled systems can incorporate alternative data streams—including utility payments, rental histories, and educational credentials—while maintaining privacy through cryptographic techniques. This expanded data ecosystem facilitates the development of more nuanced creditworthiness assessments, directly supporting policy goals of reducing credit access disparities and promoting equitable economic participation.
The technological architecture also enables regulatory innovation through smart contract implementation. Compliance requirements can be programmatically embedded into credit assessment protocols, enabling real-time regulatory oversight and reducing enforcement costs. This capability creates new possibilities for adaptive regulatory frameworks that can dynamically respond to emerging risks while maintaining market stability—a significant advancement over static, ex-post regulatory approaches.
Furthermore, the decentralized nature of blockchain systems mitigates single points of failure in credit infrastructure, enhancing financial system resilience. This distributed architecture reduces systemic vulnerabilities to cyber threats and operational disruptions, aligning with financial stability objectives that are central to modern regulatory policy frameworks.
6.2. Case Studies in Policy Implementation
Microfinance Digital Transformation: A blockchain-based micro-lending platform in emerging economies demonstrates how explainable credit algorithms can expand financial access while maintaining risk management rigor. The system integrates mobile payment histories, agricultural supply chain transactions, and community reputation metrics through zero-knowledge proofs that preserve privacy while enabling credit assessment. Regulatory sandbox approaches allowed iterative policy refinement, demonstrating how adaptive regulatory frameworks can foster innovation while protecting consumer interests. This case illustrates the potential for technology-enabled financial inclusion to advance sustainable development goals.
National Credit Infrastructure Modernization: A Southeast Asian nation’s implementation of a blockchain-based national credit registry showcases scalable infrastructure for transparent credit reporting. The system provides citizens with granular visibility into their credit assessments while enabling regulated data sharing among financial institutions. Policy innovations included data portability mandates and algorithmic accountability requirements that ensured fair treatment across diverse borrower segments. This example highlights how technological infrastructure investments can simultaneously advance consumer protection, financial stability, and market efficiency objectives.
Peer-to-Peer Lending Market Evolution: The integration of explainable AI with blockchain-based smart contracts in P2P lending platforms has created new paradigms for decentralized finance (DeFi) regulation. These platforms implement automated compliance checks through programmable logic while providing borrowers with transparent explanations of credit decisions. Regulatory approaches have evolved from ex-post enforcement to embedded supervision, where compliance is verified in real-time through blockchain analytics. This case demonstrates how technological innovation can enable more efficient regulatory paradigms while maintaining market integrity.
6.3. Policy Recommendations and Future Directions
The case studies reveal several cross-cutting policy implications. First, regulatory frameworks must evolve to address the unique characteristics of blockchain-based systems, including their transnational operation and algorithmic decision-making processes. Second, standards for explainability and auditability require harmonization to ensure consistent consumer protections across jurisdictions. Third, policymakers must balance innovation facilitation with risk mitigation through approaches like regulatory sandboxes and phased implementation strategies.
Future research should focus on developing metrics for evaluating explainability effectiveness, establishing interoperability standards across blockchain credit platforms, and creating governance models for decentralized autonomous organizations (DAOs) in financial services. Additionally, policy experiments with central bank digital currencies (CBDCs) could provide valuable insights into how public-sector blockchain initiatives might complement private innovation in credit markets.
The convergence of blockchain technology and explainable AI represents not merely an incremental improvement but a fundamental rearchitecture of credit systems. This transformation requires equally innovative policy approaches that can harness technological potential while safeguarding public interests—a challenge that will define financial regulation research agendas for the coming decade.
7. Conclusions and Future Work
This research introduced VeriCred, an integrated framework for automated and verifiable credit scoring that synergistically combines neural architecture search, metric learning, and blockchain-based auditing. The proposed methodology addresses critical limitations in contemporary credit assessment systems by achieving an optimal balance between predictive accuracy, operational efficiency, and regulatory compliance. Our approach demonstrates that the integration of economical neural architecture search (C-NAS) with an adaptive triplet loss mechanism (A-Triplet) enables the automated discovery of high-performance credit models while maintaining computational feasibility. Furthermore, the incorporation of explainable AI (XAI) components ensures model transparency, providing stakeholders with interpretable decision rationales.
A fundamental contribution of this work lies in the development of a blockchain-anchored data governance framework that establishes immutable audit trails for all credit assessment activities. By cryptographically securing data provenance, model parameters, and evaluation metrics on a distributed ledger, VeriCred creates a trustworthy ecosystem for credit decision-making that enhances regulatory oversight capabilities. This blockchain layer not only ensures data integrity and non-repudiation but also enables real-time monitoring of model behavior across institutional boundaries. When combined with our automated model discovery pipeline, this approach represents a significant advancement toward transparent, efficient, and scalable credit risk management systems suitable for modern financial environments.
Future research directions will focus on several key enhancements. First, we plan to extend the interpretability framework to incorporate second-order feature interactions within the LIME paradigm, enabling more nuanced explanations of complex credit decisions. Second, we will investigate transfer learning methodologies [51,52,53,54] to leverage pre-trained representations from related financial domains, thereby improving model generalization with limited credit-specific data. Additionally, we aim to expand the architecture search space to incorporate temporal modeling components for handling dynamic credit behaviors, while further optimizing the blockchain consensus mechanism for real-time credit assessment scenarios. Finally, we will explore federated learning integrations to enable privacy-preserving model training across institutional boundaries, thus addressing data silo challenges while maintaining regulatory compliance through blockchain-based verification mechanisms.