A Structure-Invariant Transformer for Cross-Regional Enterprise Delisting Risk Identification

Li, Kang; Li, Xinyang

doi:10.3390/su18010397


by Kang Li ¹,*,† and Xinyang Li ²,†

¹ School of Economics and Management, Zhejiang University of Water Resources and Electric Power, Hangzhou 310018, China
² School of Economics and Management, Zhejiang Shuren University, Shaoxing 312028, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.

Sustainability 2026, 18(1), 397; https://doi.org/10.3390/su18010397

Submission received: 28 October 2025 / Revised: 8 December 2025 / Accepted: 29 December 2025 / Published: 31 December 2025


Abstract

Cross-regional enterprise financial distress can undermine long-term corporate viability, weaken regional industrial resilience, and amplify systemic risk, making robust early-warning tools essential for sustainable financial governance. This study investigates the problem of cross-regional enterprise delisting-related distress identification under heterogeneous economic structures and highly imbalanced risk samples. We propose a cross-domain learning framework that aims to deliver stable, interpretable, and transferable risk signals across regions without requiring access to labeled data from the target domain. Using a multi-source empirical dataset covering Beijing, Shanghai, Jiangsu, and Zhejiang, we conduct leave-one-domain-out evaluations that simulate real-world regulatory deployment. The results demonstrate consistent improvements over representative sequential and graph-based baselines, indicating stronger cross-regional generalization and more reliable identification of borderline and noisy cases. By linking cross-domain stability with uncertainty-aware risk screening, this work contributes a practical and economically meaningful solution for sustainable corporate oversight, offering actionable value for policy-oriented financial supervision and regional economic sustainability.

Keywords: enterprise risk identification; domain generalization; transformer; structure-invariant modeling; uncertainty-aware learning

1. Introduction

In recent years, with the rapid accumulation of multi-source financial data and the increasing complexity of regional economic structures, enterprise risk identification has played an increasingly crucial role in financial regulation and macro-level decision-making [1]. Enterprises across different regions exhibit significant differences in industrial structure, financing models, and policy environments, leading to pronounced regional heterogeneity in risk characteristics. Such differences not only increase the difficulty of risk prediction but also impose higher requirements on the cross-domain generalization ability of models. How to construct a stable, interpretable, and transferable risk identification model under multi-domain scenarios has become a key research direction in the field of intelligent financial analysis.

However, existing studies still face several challenges in handling regional heterogeneity and data uncertainty [2]. On the one hand, traditional machine learning methods rely on the assumption of a single data distribution, making it difficult to adapt effectively to variations in sample distributions across regions [3]. On the other hand, although deep learning models possess strong representation capabilities, they are prone to structural drift and overfitting under cross-domain environments, leading to insufficient feature alignment and unstable predictions [4]. Furthermore, most existing studies fail to adequately consider the uncertainty inherent in risk data, resulting in reduced robustness when facing anomalous or boundary samples.

To address these issues, this paper proposes an enterprise risk identification framework that integrates hierarchical structure-invariant modeling with uncertainty-aware decoding. In this study, enterprise risk specifically refers to the delisting-related financial distress risk indicated by abnormal operating conditions and regulatory warning signals, which may ultimately lead to delisting outcomes. From a sustainability perspective, accurately identifying such delisting risk is economically meaningful because it reflects the breakdown of long-term corporate viability, weakens regional industrial resilience, and can trigger broader supply chain and employment disruptions. The proposed Hierarchical Structure-Invariant Encoder (HSIE) achieves cross-domain structural alignment in multi-level feature spaces, effectively mitigating semantic drift caused by regional discrepancies. Meanwhile, the Uncertainty-aware Domain-Robust Decoder (UDR-T) dynamically adjusts decoding weights based on sample variance, balancing the influence of high- and low-confidence samples during prediction. This framework maintains structural consistency while significantly enhancing the model’s cross-domain robustness and risk discrimination accuracy for delisting-risk identification across heterogeneous regional financial environments.

The main contributions of this paper are as follows:

(1): We propose a hierarchical structure-invariant modeling framework for multi-domain enterprise risk identification, where enterprise risk refers to delisting-related financial distress risk. The framework is designed to align multi-level financial feature structures across regions with heterogeneous industrial bases and financing patterns, so that the learned risk signals remain economically comparable and stable when a new region is treated as the target domain.
(2): We design an uncertainty-aware decoding mechanism to adaptively calibrate cross-domain predictive distributions. From an economic and regulatory perspective, this mechanism explicitly distinguishes high-confidence broad-based distress signals from ambiguous boundary cases (e.g., firms with temporary liquidity stress versus firms approaching delisting warnings), thereby improving the reliability of early-warning decisions under noisy or incomplete financial disclosures.
(3): We construct a multi-regional empirical dataset covering four representative Chinese regions (Beijing, Shanghai, Jiangsu, and Zhejiang) to support a transparent evaluation of cross-regional delisting-risk generalization. This setting enables a practical leave-one-domain-out validation that mimics real regulatory deployment: a model trained on several regions can be directly tested for risk screening in an unseen region with different economic structures.
(4): We conduct visualization and ablation experiments to clarify the logic of how each proposed module contributes to risk identification in economic terms. Specifically, the analyses demonstrate how hierarchical structure alignment stabilizes regional risk factor representations, while uncertainty-aware decoding enhances the robustness of delisting-risk discrimination for anomalous or borderline firms, offering interpretable evidence for multi-domain financial supervision and sustainable corporate governance.

To provide a clear roadmap for readers, the remainder of this paper is organized as follows. Section 2 reviews the related literature on multi-domain risk identification, structure-invariant representation learning, and uncertainty-aware modeling. Section 3 presents the proposed framework, detailing the design of HSIE and UDR-T as well as the overall optimization objective. Section 4 introduces the multi-regional dataset, experimental settings, baselines, and evaluation metrics. Section 5 reports the main results along with ablation and visualization analyses to validate the effectiveness and robustness of each module. Finally, Section 6 concludes the paper and discusses future research directions. Its detailed settings are shown in Table 1.

2. Related Work

2.1. Transformer-Based Discriminative Modeling in Financial Scenarios

The Transformer architecture has emerged as a powerful paradigm in financial discrimination tasks and has been extensively applied to domains such as corporate default prediction, credit risk assessment, and financial fraud detection. Korangi et al. [5] proposed a Transformer-based model for default prediction in mid-cap enterprises, which leverages the self-attention mechanism to capture temporal dependencies and improve classification performance. Wang and Xiao [6] designed an embedded feature Transformer network for credit scoring, significantly enhancing the learning efficiency of traditional deep learning methods under high-dimensional feature settings. Yang et al. [7] developed the FinChain-BERT model, integrating natural language processing with financial text to achieve high-precision automated fraud detection. Tang and Liu [8] introduced a distributed knowledge distillation framework based on Transformer for large-scale financial fraud identification, maintaining strong model consistency in distributed environments. Furthermore, An et al. [9] proposed the Finsformer model, incorporating a clustered attention mechanism into Transformer to enhance the model’s ability to distinguish complex financial attack patterns. Li and Xu [10] combined graph Transformer structures to adaptively filter malicious relationships, thereby improving fraud detection accuracy in social networks.

In addition, several studies have focused on improving the interpretability and robustness of Transformer models. Ashiq et al. [11] implemented an efficient hybrid Transformer architecture for textual financial risk identification, demonstrating the potential of Transformer models in multilingual financial scenarios. Schwab and Kriebel [12] investigated the adversarial robustness of Transformer-based models in credit scoring and proposed mitigation strategies to improve model security. Wu et al. [13] analyzed the performance of Transformer models in predicting credit default using both human-written and AI-generated texts, validating the strong representational capacity of textual features under the Transformer framework. Babaei and Giudici [14] applied GPT-like large language models to credit classification tasks, demonstrating the transferability and generalization capability of large-scale models in financial discrimination problems. Overall, existing research has confirmed the strong performance of Transformer models in financial discrimination. However, most studies remain limited to single-domain data or fixed time windows and lack systematic modeling of cross-domain generalization and dynamic risk features. These limitations motivate the design choices made in this study.

2.2. Domain Generalization for Financial Risk Prediction

In recent years, the field of financial risk prediction has gradually shifted from single-domain modeling toward cross-domain generalization and robust learning, aiming to enhance model robustness and transferability under distributional shifts. Zhang et al. [15] proposed a domain-adaptive multi-stage ensemble learning framework for credit risk assessment, which significantly improved model stability across multiple data sources. Mushava and Murray [16] constructed a comprehensive credit scoring dataset covering out-of-sample (OOS), out-of-time (OOT), and out-of-universe (OOU) settings, providing a standardized benchmark for validating the robustness of cross-domain models. Nikolaidis and Doumpos [17] introduced a drift-adaptive approach based on local competence regions for dynamic adjustment in credit scoring, effectively addressing the problem of concept drift. Suryanto et al. [18] applied transfer learning and domain adaptation mechanisms to enable knowledge transfer for credit risk modeling in small-sample scenarios. Polo et al. [19] proposed a unified dataset shift diagnosis framework (DetectShift) capable of identifying sources of distributional change across various financial contexts. Hjelkrem and De Lange [20] examined the interpretability and robustness of deep learning models using open banking data, revealing the critical role of out-of-domain data in enhancing credit scoring performance. Furthermore, Hjelkrem et al. [21] quantitatively assessed the added value of heterogeneous banking data for cross-institutional credit scoring, providing empirical evidence for financial domain generalization.

Meanwhile, researchers have also explored cross-domain risk prediction from the perspectives of feature-level modeling and fairness. Pang et al. [22] proposed an interpretable three-way decision model for credit risk prediction, achieving robust classification under uncertainty. Li et al. [23] investigated grouped effects and feature selection mechanisms in credit attributes using a regularized diagonal distance metric learning approach, thereby improving model generalization across heterogeneous domains. Chang et al. [24] conducted a systematic comparison of various machine learning and deep learning methods for credit risk modeling, verifying the importance of cross-domain evaluation. Garcia et al. [25] further emphasized the role of fairness-aware modeling in mitigating domain bias, highlighting the need for balanced generalization in financial risk prediction across diverse populations and markets.

3. Method

3.1. Problem Definition

In this study, we formalize the enterprise risk prediction problem as a cross-domain classification task. Suppose there are $N$ enterprise samples, where each enterprise $i$ is represented by a feature vector $x_{i} \in \mathbb{R}^{d}$ and a corresponding risk label $y_{i} \in \{0, 1\}$, where $y_{i} = 1$ indicates that the enterprise exhibits potential risk during the evaluation period, and $y_{i} = 0$ indicates that the enterprise is operating normally. All samples are collected from four distinct geographic domains: Beijing ($D_{BJ}$), Shanghai ($D_{SH}$), Jiangsu ($D_{JS}$), and Zhejiang ($D_{ZJ}$), each containing data from enterprises registered within the respective region. For the $k$-th domain, the dataset can be denoted as $D_{k} = \{(x_{i}, y_{i})\}_{i = 1}^{n_{k}}$, where the feature distribution $p_{k}(x)$ exhibits significant domain shift, i.e., $p_{i}(x) \neq p_{j}(x)$, reflecting differences in regional economic structures and industrial environments.

Under the domain generalization setting, our goal is to learn a discriminative function $f_{\theta}(x) \to \hat{y}$, whose parameters $\theta$ can maintain stable predictive performance on previously unseen domains. During training, any three of the four domains are selected as source domains $\{D_{s_{1}}, D_{s_{2}}, D_{s_{3}}\}$ for model optimization and parameter learning, while the remaining domain $D_{t}$ is treated as the target test domain, which is excluded from training and used solely for evaluation. Formally, the training and evaluation processes can be defined as follows:

$$\theta^{*} = \arg\min_{\theta} \sum_{k = 1}^{3} \mathbb{E}_{(x, y) \sim D_{s_{k}}} \left[\ell(f_{\theta}(x), y)\right], \tag{1}$$

$$\text{Evaluate}: \ \mathbb{E}_{(x, y) \sim D_{t}} \left[\ell(f_{\theta^{*}}(x), y)\right], \tag{2}$$

where $\ell(\cdot)$ denotes the classification loss function. This setting emphasizes the model’s ability to generalize to the unseen domain $D_{t}$, rather than merely fitting the source-domain distributions. By alternately testing across the four regions (Beijing, Shanghai, Jiangsu, and Zhejiang), the model’s robustness and domain-invariant feature learning capability can be comprehensively evaluated under varying geographic, economic, and industrial structures.
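The rotation of source and target domains described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `leave_one_domain_out` and the plain region strings are assumptions introduced here.

```python
from typing import Iterator, List, Tuple

# Region names taken from the text; representing them as strings is an assumption.
REGIONS = ["Beijing", "Shanghai", "Jiangsu", "Zhejiang"]

def leave_one_domain_out(domains: List[str]) -> Iterator[Tuple[List[str], str]]:
    """Yield (source_domains, target_domain), holding out each domain exactly once."""
    for target in domains:
        sources = [d for d in domains if d != target]
        yield sources, target

splits = list(leave_one_domain_out(REGIONS))
for sources, target in splits:
    # Eq. (1) would be optimized on `sources`; Eq. (2) would be evaluated on `target`.
    print(f"train on {sources} -> evaluate on {target}")
```

Each of the four splits trains on three regions and evaluates on the held-out one, so the target region never contributes labeled data to training.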

3.2. Transformer Model Architecture

In this study, the Transformer architecture is adopted as the feature extractor for the enterprise risk classification task, aiming to model the global dependencies among high-dimensional and heterogeneous enterprise features under multi-domain settings. The model architecture is shown in Figure 1.

In the input layer, each enterprise sample’s raw feature vector $x_{i} \in \mathbb{R}^{d}$ is projected into the model’s latent space through a linear mapping to obtain the initial embedding representation:

$$h_{i}^{(0)} = W_{e} x_{i} + b_{e}, \tag{3}$$

where $W_{e} \in \mathbb{R}^{d_{h} \times d}$ is a trainable projection matrix, $b_{e}$ denotes the bias term, and $d_{h}$ represents the hidden dimension. The obtained embedding is then normalized and positionally encoded before being fed into the multi-head self-attention layers to capture dynamic dependencies among enterprise features.

In the $l$-th Transformer encoder layer, a Multi-Head Self-Attention (MHSA) mechanism is employed to reconstruct the feature representations. For each attention head $m$, the query, key, and value matrices are computed through linear transformations as follows:

$$Q_{m} = H^{(l - 1)} W_{m}^{Q}, \quad K_{m} = H^{(l - 1)} W_{m}^{K}, \quad V_{m} = H^{(l - 1)} W_{m}^{V}, \tag{4}$$

and the semantic relationships among features are obtained using the scaled dot-product attention mechanism:

$$\text{Attention}_{m}(Q_{m}, K_{m}, V_{m}) = \text{softmax}\left(\frac{Q_{m} K_{m}^{\top}}{\sqrt{d_{k}}}\right) V_{m}, \tag{5}$$

where $d_{k}$ denotes the dimensionality of the key vectors and is used as a scaling factor to stabilize the gradients. The outputs of all attention heads are concatenated and linearly transformed, followed by a Feed-Forward Network (FFN) to produce the layer output representation:

$$H^{(l)} = \text{FFN}\left(\text{Concat}[\text{Attention}_{1}, \dots, \text{Attention}_{M}] \, W^{O}\right), \tag{6}$$

where $W^{O}$ represents the output weight matrix and $M$ is the number of attention heads. The final-layer output $H^{(L)}$ serves as the high-level feature representation of each enterprise, integrating multi-domain information and cross-feature dependencies to provide a robust input for the subsequent risk discrimination module. Under the domain generalization setting, this feature extractor shares parameters across regional datasets, enabling the learning of domain-invariant representations and achieving unified modeling of enterprise risk patterns.
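To make Equations (4) to (6) concrete, the following NumPy sketch computes multi-head self-attention for a small random input. It is an illustration under stated assumptions: the rows of `H` are treated as the tokens attended over, the weights are random rather than trained, the FFN of Equation (6) is omitted, and all shapes and names are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(H, W_q, W_k, W_v, W_o):
    """H: (n, d_h); W_q, W_k, W_v: (M, d_h, d_k); W_o: (M * d_k, d_h)."""
    heads = []
    for Wq, Wk, Wv in zip(W_q, W_k, W_v):
        Q, K, V = H @ Wq, H @ Wk, H @ Wv            # Eq. (4), one head
        scores = Q @ K.T / np.sqrt(K.shape[-1])     # scaled dot product
        heads.append(softmax(scores) @ V)           # Eq. (5)
    # Concatenate heads and apply the output projection (Eq. (6) without the FFN).
    return np.concatenate(heads, axis=-1) @ W_o

rng = np.random.default_rng(0)
n, d_h, M, d_k = 5, 8, 2, 4                          # illustrative sizes only
H = rng.normal(size=(n, d_h))
W_q, W_k, W_v = (rng.normal(size=(M, d_h, d_k)) for _ in range(3))
W_o = rng.normal(size=(M * d_k, d_h))
out = multi_head_self_attention(H, W_q, W_k, W_v, W_o)
print(out.shape)
```

The output keeps the input shape `(n, d_h)`, which is what allows encoder layers to be stacked.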

3.3. Hierarchical Structure-Invariant Encoding

To address the structural shift caused by differences in enterprise data distributions across regions, this study introduces a Hierarchical Structure-Invariant Encoding (HSIE) mechanism. Its module architecture is shown in Figure 2.

HSIE is designed to achieve multi-level semantic alignment and structural regularization within the Transformer framework. Unlike traditional approaches that perform domain adaptation within a single feature space, HSIE constructs hierarchical representation mappings between encoder layers, ensuring that enterprise features from different regions maintain topological consistency in the latent space. Let the input representation of the $l$-th layer be $H^{(l - 1)} \in \mathbb{R}^{n \times d}$, which is updated through a linear transformation and layer normalization as follows:

$$Z^{(l)} = \text{LayerNorm}\left(H^{(l - 1)} W^{(l)} + b^{(l)}\right), \tag{7}$$

where $W^{(l)} \in \mathbb{R}^{d \times d}$ denotes the learnable parameter matrix. To capture higher-order correlations among enterprises, the model computes a structural affinity matrix using the multi-head self-attention mechanism:

$$A^{(l)} = \text{softmax}\left(\frac{Z^{(l)} (Z^{(l)})^{\top}}{\sqrt{d}}\right), \tag{8}$$

and aggregates contextual embeddings through weighted fusion:

$$H^{(l)} = A^{(l)} Z^{(l)}. \tag{9}$$
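Equations (7) to (9) can be traced step by step with a small NumPy sketch. It assumes a single affinity matrix per layer, exactly as Equation (8) is written, a layer normalization without learned scale and shift, and random weights; the function name `hsie_layer` is hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-5):
    """Normalize each row to zero mean and unit variance (no learned parameters)."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def hsie_layer(H_prev, W, b):
    """One affinity-based update: returns (H, A) for an input of shape (n, d)."""
    Z = layer_norm(H_prev @ W + b)                  # Eq. (7)
    A = softmax(Z @ Z.T / np.sqrt(Z.shape[-1]))     # Eq. (8): (n, n) affinity matrix
    return A @ Z, A                                 # Eq. (9)

rng = np.random.default_rng(0)
n, d = 6, 4                                         # illustrative sizes only
H0 = rng.normal(size=(n, d))
W = rng.normal(size=(d, d))
b = np.zeros(d)
H1, A = hsie_layer(H0, W, b)
print(H1.shape, A.shape)
```

Each row of `A` is a probability distribution over the `n` enterprises in the batch, so every updated embedding is a convex combination of the normalized embeddings.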

To align the feature spaces across multiple source domains, HSIE introduces a cross-domain projection function $\Phi^{(l)}(\cdot)$ in intermediate layers, which constrains the geometric relationship among different source-domain representations. Let $\{D_{s}^{(i)}\}_{i = 1}^{M}$ denote the feature distributions of $M$ source domains (here $M$ counts source domains and is unrelated to the number of attention heads in Equation (6)). The structural consistency constraint is defined as minimizing the pairwise discrepancy between the projected representations of all source domains:

$$\min_{\Phi^{(l)}} \sum_{i < j} \left\| \Phi^{(l)}(H_{s_{i}}^{(l)}) - \Phi^{(l)}(H_{s_{j}}^{(l)}) \right\|_{2}^{2}, \tag{10}$$

where $H_{s_{i}}^{(l)}$ denotes the feature representation of the $i$-th source domain at layer $l$. To further ensure statistical alignment and improve the robustness of the learned feature space, a covariance alignment constraint is introduced as

$$\sum_{i < j} \left\| \text{Cov}\left(\Phi^{(l)}(H_{s_{i}}^{(l)})\right) - \text{Cov}\left(\Phi^{(l)}(H_{s_{j}}^{(l)})\right) \right\|_{F}^{2} < \epsilon, \tag{11}$$

where $\text{Cov}(\cdot)$ denotes the covariance matrix and $\epsilon$ is a small tolerance constant (the structural relaxation threshold). This formulation enforces consistent second-order statistics among all source domains, allowing the model to learn domain-invariant structural representations without relying on target-domain information.
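The two alignment quantities in Equations (10) and (11) can be computed as follows. This sketch makes two assumptions that the text does not state: the projected representations of every source domain come from mini-batches of equal shape (otherwise the difference in Equation (10) is not defined), and the projection $\Phi^{(l)}$ has already been applied. The function names are hypothetical.

```python
import numpy as np
from itertools import combinations

def structural_discrepancy(reps):
    """Eq. (10): sum over domain pairs of the squared L2 distance between
    projected representations (equal batch shapes assumed)."""
    return sum(float(np.sum((a - b) ** 2)) for a, b in combinations(reps, 2))

def covariance_discrepancy(reps):
    """Left-hand side of Eq. (11): sum over domain pairs of the squared
    Frobenius distance between feature covariance matrices."""
    covs = [np.cov(r, rowvar=False) for r in reps]
    return sum(float(np.sum((ca - cb) ** 2)) for ca, cb in combinations(covs, 2))

rng = np.random.default_rng(0)
# Three source domains, each with a batch of 16 projected embeddings of size 4.
reps = [rng.normal(loc=shift, size=(16, 4)) for shift in (0.0, 0.5, 1.0)]
print(structural_discrepancy(reps), covariance_discrepancy(reps))
```

Both quantities are zero when all source domains produce identical projected batches and grow as the domains drift apart, which is the behavior the constraints are meant to penalize.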

To further capture hierarchical dependencies, HSIE defines an inter-layer transition function:

$$H^{(l + 1)} = f_{trans}\left(H^{(l)}, H^{(l - 1)}\right), \tag{12}$$

which performs progressive feature aggregation via a gated fusion mechanism:

$$f_{trans}\left(H^{(l)}, H^{(l - 1)}\right) = \sigma\left(U H^{(l)} + V H^{(l - 1)}\right) \odot H^{(l)} + \left(1 - \sigma(\cdot)\right) \odot H^{(l - 1)}, \tag{13}$$

where $\sigma(\cdot)$ represents the Sigmoid function, the abbreviated $\sigma(\cdot)$ in the second term denotes the same gate value as in the first term, and $\odot$ denotes element-wise multiplication. This structure enables hierarchical feature preservation and dynamic fusion, thereby enhancing the model’s cross-domain stability and representational continuity.
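The gated fusion in Equation (13) can be illustrated with a short NumPy sketch. One layout assumption is made here: because the rows of $H$ are samples, the linear maps are applied on the right (`H @ U`), whereas the equation writes them on the left. The weights are random and the function name is hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_transition(H_cur, H_prev, U, V):
    """Gated fusion of two consecutive layer outputs, following Eq. (13).
    H_cur, H_prev: (n, d); U, V: (d, d)."""
    gate = sigmoid(H_cur @ U + H_prev @ V)          # element-wise gate in (0, 1)
    return gate * H_cur + (1.0 - gate) * H_prev     # convex combination per entry

rng = np.random.default_rng(0)
n, d = 3, 4                                         # illustrative sizes only
H_cur = rng.normal(size=(n, d))
H_prev = rng.normal(size=(n, d))
U = rng.normal(size=(d, d))
V = rng.normal(size=(d, d))
fused = gated_transition(H_cur, H_prev, U, V)
print(fused.shape)
```

Because the gate lies strictly between 0 and 1, every fused entry lies between the corresponding entries of the two inputs, so the transition interpolates between layers rather than overwriting the earlier representation.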

Finally, the latent representations from all layers are projected into a shared structural embedding $S$, defined as

$$S = \frac{1}{L} \sum_{l = 1}^{L} \Phi^{(l)}\left(H^{(l)}\right), \tag{14}$$

where $L$ is the number of Transformer encoder layers. The embedding $S$ serves as a global structure-invariant representation that preserves hierarchical feature correlations while maintaining geometric consistency across regions. This mechanism allows the model to sustain a unified risk discrimination logic despite regional economic variations, providing enhanced generalization robustness and structural stability for enterprise risk prediction tasks.

3.4. Uncertainty-Aware Domain-Robust Transformer Decoder

To further enhance the cross-domain robustness of the model in enterprise risk prediction tasks, this study designs an Uncertainty-Aware Domain-Robust Transformer Decoder (UDR-T) in the decoding stage. Unlike conventional Transformer decoders, UDR-T introduces both sample-level and domain-level uncertainty modeling to dynamically adapt to input distribution shifts. The architectural structure of this module is illustrated in Figure 3.

Assume that the inputs are drawn from multiple source domains $\{D_{s_{1}}, D_{s_{2}}, \dots, D_{s_{N}}\}$ and that the target domain is denoted as $D_{t}$ (in this subsection, $N$ denotes the number of source domains). The embedding vector of each sample can be represented as

$$E_{s_{i}} = f_{emb}(x_{s_{i}}) + p_{s_{i}}, \quad i = 1, \dots, N, \tag{15}$$

where $f_{emb}(\cdot)$ denotes the input embedding layer and $p_{s_{i}}$ represents the positional encoding. To characterize domain discrepancies, a domain embedding projection matrix $W_{d}$ is introduced, defined as

$$\tilde{E}_{s_{i}} = E_{s_{i}} W_{d} + b_{d}, \tag{16}$$

where $W_{d} \in \mathbb{R}^{d \times d}$ captures the linear shifts in enterprise features across different regions and $b_{d}$ is the corresponding bias term.

Within the multi-head attention mechanism, a dynamic uncertainty modulation factor $\sigma_{m}$ is introduced for each attention head $m$ to balance the attention strength across domains. The computations are formulated as follows:

$$\alpha_{m} = \text{softmax}\left(\frac{Q_{m} K_{m}^{\top}}{\sqrt{d_{k}} \, \sigma_{m}}\right), \quad \sigma_{m} = f_{\psi}\left(\text{Var}(H^{(l - 1)})\right), \tag{17}$$

where $f_{\psi}(\cdot)$ denotes a lightweight uncertainty estimation network that dynamically adjusts the attention temperature based on the variance of input features, thereby reducing attention strength on high-uncertainty samples.
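The effect of the modulation factor in Equation (17) can be seen in a small NumPy sketch: a larger $\sigma_{m}$ acts as a higher softmax temperature and flattens the attention weights. The estimator `toy_sigma` below is a hypothetical stand-in for the network $f_{\psi}$, whose architecture is not specified in the text; it only reproduces the qualitative behavior that higher input variance yields a larger temperature.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def toy_sigma(H):
    """Hypothetical stand-in for f_psi: 1 plus the mean per-feature variance."""
    return 1.0 + float(H.var(axis=0).mean())

def modulated_attention(Q, K, sigma_m):
    """Attention weights of Eq. (17) for one head."""
    scores = Q @ K.T / (np.sqrt(K.shape[-1]) * sigma_m)
    return softmax(scores)

def mean_row_entropy(p):
    """Average Shannon entropy of the attention rows (higher means flatter)."""
    return float(-(p * np.log(p + 1e-12)).sum(axis=-1).mean())

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
sharp = modulated_attention(Q, K, sigma_m=1.0)
flat = modulated_attention(Q, K, sigma_m=5.0)
print(mean_row_entropy(sharp), mean_row_entropy(flat))
```

With `sigma_m=5.0` the rows are closer to uniform than with `sigma_m=1.0`, so no single key dominates the aggregation when the estimated uncertainty is high.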

When aggregating features from multiple source domains, a weighted domain fusion mechanism is defined as

$$Z = \sum_{i = 1}^{N} \omega_{i} \tilde{E}_{s_{i}}, \quad \omega_{i} = \frac{\exp(-\tau \bar{\Delta}_{i})}{\sum_{j = 1}^{N} \exp(-\tau \bar{\Delta}_{j})}, \tag{18}$$

where $\bar{\Delta}_{i} = \frac{1}{N - 1} \sum_{j \neq i} \left\| \mu_{s_{i}} - \mu_{s_{j}} \right\|_{2}$ measures the average feature distribution discrepancy between the $i$-th source domain and the remaining source domains, and $\tau$ is a temperature parameter controlling the sharpness of the weighting distribution.
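The fusion weights of Equation (18) can be computed directly from per-domain mean vectors. The sketch below assumes that $\mu_{s_{i}}$ is the mean feature vector of the $i$-th source domain; the numeric means and the function name are invented for illustration.

```python
import numpy as np

def domain_fusion_weights(means, tau):
    """Weights omega_i of Eq. (18). means: (N, d) per-domain mean vectors."""
    N = means.shape[0]
    # Pairwise L2 distances between domain means, shape (N, N), zero diagonal.
    dists = np.linalg.norm(means[:, None, :] - means[None, :, :], axis=-1)
    delta_bar = dists.sum(axis=1) / (N - 1)         # average over j != i
    logits = -tau * delta_bar
    logits = logits - logits.max()                  # stabilize the exponentials
    w = np.exp(logits)
    return w / w.sum()

# Two nearby domains and one outlying domain (illustrative values only).
means = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 0.0]])
w = domain_fusion_weights(means, tau=1.0)
print(w)
```

In this toy case the outlying third domain receives the smallest weight, and setting `tau=0` recovers uniform weighting across the source domains.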

This design enables the model to adaptively emphasize source domains that exhibit closer inter-domain similarity, thereby enhancing the stability of feature fusion and improving generalization to unseen target domains. After stacking multiple attention and feed-forward layers, the final decoder output is obtained as

$$H_{dec}^{(L)} = \text{FFN}\left(\text{MultiHead}(Z, H^{(L - 1)})\right), \tag{19}$$

and the enterprise risk prediction probability is computed through a linear mapping followed by a Softmax classifier:

$$\hat{y} = \text{Softmax}\left(W_{c} H_{dec}^{(L)} + b_{c}\right), \tag{20}$$

where $\hat{y} \in [0, 1]$ represents the predicted enterprise risk probability. To mitigate uncertainty shifts among regions, the model parameters are updated during training through multi-domain parallel backpropagation:

$$\theta \leftarrow \theta - \eta \sum_{i = 1}^{N} \frac{1}{|D_{s_{i}}|} \sum_{(x, y) \in D_{s_{i}}} \nabla_{\theta} \ell(f_{\theta}(x), y), \tag{21}$$

where $\eta$ is the learning rate and $\ell(\cdot)$ denotes the loss function.

This decoding architecture explicitly models inter-domain uncertainty relationships during information propagation. Through dynamic attention modulation and distribution-weighted fusion, the model achieves stable and generalizable risk discrimination across enterprise datasets from different regions, effectively addressing the prediction bias caused by regional economic heterogeneity.

3.5. Training Objective

In the cross-domain generalization framework for enterprise risk prediction, the training objective aims to jointly optimize the structure-invariant representation and the domain-robust decoding mechanism, thereby achieving stable risk discrimination performance on unseen target domains. Let the model be parameterized by $\theta = \{\theta_{e}, \theta_{d}\}$, where $\theta_{e}$ represents the parameters of the Hierarchical Structure-Invariant Encoder (HSIE) and $\theta_{d}$ denotes those of the Uncertainty-Aware Domain-Robust Transformer Decoder (UDR-T). During training, sample pairs $\{(x_{i}, y_{i})\}$ are drawn from multiple source domains $\{D_{s_{1}}, D_{s_{2}}, D_{s_{3}}\}$, and the objective is to minimize both the cross-domain classification error and structural discrepancy. The overall optimization objective is defined as

$$L_{total} = L_{cls} + \lambda_{1} L_{struct} + \lambda_{2} L_{unc}, \tag{22}$$

where $L_{cls}$ denotes the classification loss, $L_{struct}$ constrains the structural consistency among source domains in the embedding space, and $L_{unc}$ achieves domain-robust alignment through uncertainty modeling. The coefficients $\lambda_{1}$ and $\lambda_{2}$ control the balance among the three components. This objective enables the encoder and decoder to be jointly updated in an end-to-end manner, ensuring that the model preserves structural consistency while mitigating domain feature shifts.

Specifically, the classification term is defined as the expected prediction error across domains:

$$L_{cls} = \mathbb{E}_{(x, y) \sim D_{s}} \left[-\log p_{\theta}(y \mid x)\right]. \tag{23}$$

The structural invariance constraint is constructed based on the geometric similarity of multi-layer encoder outputs:

$$L_{struct} = \frac{1}{L} \sum_{l = 1}^{L} \sum_{i < j} \left\| \Phi^{(l)}(H_{s_{i}}^{(l)}) - \Phi^{(l)}(H_{s_{j}}^{(l)}) \right\|_{2}^{2}. \tag{24}$$

The uncertainty regularization term is introduced to balance the influence of high-variance samples in the attention weighting process, and is formulated as

$$L_{unc} = \mathbb{E}_{x \sim D_{s}} \left[\sigma(x) \cdot \left\| H_{pred} - H_{true} \right\|_{2}^{2}\right], \tag{25}$$

where $\sigma(x)$ denotes the uncertainty estimated from the input variance. By jointly optimizing $L_{cls}$, $L_{struct}$, and $L_{unc}$, the model maintains both inter-domain feature consistency and robust risk prediction during training. The final learned parameters $\theta^{*}$ satisfy

$$\theta^{*} = \arg\min_{\theta} L_{total}, \tag{26}$$

thereby achieving unified enterprise risk modeling across multiple domains and improving out-of-domain generalization capability.
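The way the three terms combine in Equation (22) can be sketched as follows. Only the classification term of Equation (23) is computed from data here; the structural and uncertainty terms are passed in as precomputed numbers, and the coefficient values are placeholders, since the text does not report the values of $\lambda_{1}$ and $\lambda_{2}$ at this point.

```python
import numpy as np

def cross_entropy(probs, labels):
    """Eq. (23): mean negative log-likelihood.
    probs: (n, 2) class probabilities; labels: (n,) integers in {0, 1}."""
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.log(picked + 1e-12).mean())

def total_loss(l_cls, l_struct, l_unc, lam1, lam2):
    """Eq. (22): weighted sum of the three training terms."""
    return l_cls + lam1 * l_struct + lam2 * l_unc

probs = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]])
labels = np.array([0, 1, 1])
l_cls = cross_entropy(probs, labels)
# l_struct, l_unc, lam1 and lam2 below are placeholder values for illustration.
loss = total_loss(l_cls, l_struct=2.0, l_unc=3.0, lam1=0.1, lam2=0.01)
print(l_cls, loss)
```

Setting either coefficient to zero removes the corresponding regularizer, which is one simple way to reproduce an ablation of that term.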

4. Dataset Introduction

This study selects corporate annual reports and publicly available online data from four regions—Beijing, Shanghai, Jiangsu, and Zhejiang—as the primary data sources, with an explicit focus on economically interpretable firm-level indicators that are relevant to delisting-related financial distress screening. Through web scraping and regulated corporate disclosure channels, we collected structured variables covering firms’ operational conditions, financial performance, governance characteristics, and audit-related signals, together with the supervision label STPT. To ensure clarity from an economic perspective, the main input variables used in this study are summarized and grouped in Table 2.

Concretely, the model inputs include the following:

(i): industry and firm-type attributes (e.g., Industry name, Finance-industry dummy, Manufacturing-industry dummy, SOE indicator),
(ii): firm size and capital structure indicators (e.g., Size, Leverage),
(iii): profitability and earnings-quality indicators (e.g., ROA, ROE, Gross profit, Net profit growth, Earnings management),
(iv): liquidity and cash-flow indicators (e.g., Current ratio, Quick ratio, Cash flow),
(v): growth and value-related indicators (e.g., Growth, Asset growth, Book-to-market, Price-to-book, Tobin’s Q, SA index and its absolute value),
(vi): governance and ownership structure indicators (e.g., CEO duality, Board size, Independent director ratio, Managerial shareholding, Female director ratio, Institutional ownership, Top1/Top3 ownership concentration, ownership Herfindahl indices, balance-of-power measures),
(vii): audit and disclosure signals (e.g., Audit opinion, Audit fee, Big4 indicator).

These variables are standard inputs for distress and delisting-risk assessment in empirical corporate finance and regulatory early-warning practices, and provide a transparent economic basis for cross-regional risk modeling.

The label column STPT serves as the main supervision signal to determine whether a company has potential operational risks. In this study, STPT is used as a proxy label for delisting-related financial distress risk based on abnormal status and regulatory warning information disclosed for listed firms. Specifically, a label value of 0 indicates a normal business status, while a label value of 1 indicates that the enterprise is at risk. In the dataset statistics, normal firms are referred to as positive samples and at-risk firms as negative samples; this naming is the reverse of the evaluation convention in Section 5.1, where risky enterprises ($y = 1$) form the positive class.

To ensure representativeness and consistency across regions, all datasets underwent standardized preprocessing, including feature normalization, missing value imputation, and outlier detection. The distribution of positive and negative samples for each region is shown in Table 3.

As shown in the table, all regional datasets exhibit a noticeable class imbalance problem, with risk samples (negative samples) generally accounting for less than 3% of the total. This poses typical challenges of small-sample learning and skewed distributions during model training. To mitigate the impact of class imbalance on domain generalization performance, random over-sampling and stratified splitting strategies were adopted in subsequent experiments. Furthermore, to ensure fair cross-domain generalization evaluation, a Leave-One-Domain-Out (LODO) validation strategy was employed, where one region is alternately treated as the target domain while the remaining three serve as source domains for training and validation, thereby objectively assessing the model’s generalization capability across regions. This setting mirrors a practical regulatory deployment scenario, where a model trained on several economically distinct regions is expected to provide stable early-warning signals for an unseen region without relying on its historical labeled distress records.
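Random over-sampling of the rare risk class can be sketched as below. The text does not specify the exact procedure, so this is an assumption-laden illustration: minority rows are duplicated at random until both classes have equal counts, and the resampling is meant for source-domain training data only, leaving the held-out target region untouched. The toy data and the function name are hypothetical.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate minority-class rows at random until all classes match the
    size of the largest class. Intended for training (source) data only."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    parts = []
    for c, n_c in zip(classes, counts):
        idx_c = np.flatnonzero(y == c)
        extra = rng.choice(idx_c, size=target - n_c, replace=True)
        parts.append(np.concatenate([idx_c, extra]))
    idx = np.concatenate(parts)
    return X[idx], y[idx]

# Toy data with a 3% risk rate, mirroring the imbalance described in the text.
X = np.arange(200, dtype=float).reshape(100, 2)
y = np.array([1] * 3 + [0] * 97)
X_bal, y_bal = random_oversample(X, y)
print(X_bal.shape, int((y_bal == 1).sum()), int((y_bal == 0).sum()))
```

Over-sampling only duplicates existing risky firms and adds no new information, so evaluation on the untouched target region remains the meaningful check of generalization.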

5. Experimental Results and Analysis

5.1. Evaluation Metric

To comprehensively evaluate the model’s performance in enterprise risk prediction, five commonly used classification metrics are adopted: Precision, Recall, Accuracy, F1-Score, and AUC. These metrics reflect the model’s recognition capability and stability from different perspectives, especially under imbalanced data conditions. The specific definitions are as follows.

In this study, risky enterprises are defined as the positive class ($y = 1$) and normal enterprises as the negative class ($y = 0$). Accordingly, all evaluation metrics treat the risky class as the positive category.

(1) Precision. Precision measures the proportion of enterprises predicted as risky that are actually risky. A higher Precision indicates fewer false alarms among the flagged enterprises and therefore more reliable risk flags. It is defined as

$$\mathrm{Precision} = \frac{TP}{TP + FP}, \tag{27}$$

where $TP$ denotes the number of true positives (risky enterprises correctly identified as risky) and $FP$ denotes the number of false positives (normal enterprises incorrectly identified as risky). Precision thus reflects how trustworthy the model’s positive (risky) predictions are.

(2) Recall. Recall measures the proportion of all truly risky enterprises that are correctly identified by the model. A higher Recall indicates fewer missed detections and stronger coverage of potential risks. It is defined as

$$\mathrm{Recall} = \frac{TP}{TP + FN}, \tag{28}$$

where $FN$ represents the number of false negatives (risky enterprises incorrectly classified as normal). Recall reflects the model’s sensitivity to potentially risky enterprises.

(3) Accuracy. Accuracy measures the overall proportion of correctly predicted samples among all samples and is the most intuitive indicator of overall performance. Although Accuracy may be dominated by the majority class in imbalanced datasets, it still serves as a summary measure of the model’s prediction ability. It is defined as

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{29}$$

where $TN$ denotes the number of true negatives (normal enterprises correctly identified as normal). Accuracy reflects the model’s overall discriminative ability across both classes.

(4) F1-Score. The F1-Score is the harmonic mean of Precision and Recall, balancing the accuracy and the completeness of the model’s predictions. A higher F1-Score indicates a better trade-off between false alarms and missed detections. It is computed as

$$\mathrm{F1\text{-}Score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \tag{30}$$

The F1-Score provides a robust reflection of overall classification performance on imbalanced datasets and serves as a key metric in risk identification tasks.

(5) AUC. The Area Under the ROC Curve (AUC) measures the model’s overall discriminative capability across different classification thresholds. A value closer to 1 indicates a stronger ability to distinguish risky enterprises from normal ones. It is defined as

$$\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR}(\mathrm{FPR}) \, d(\mathrm{FPR}), \tag{31}$$

where $\mathrm{TPR}$ and $\mathrm{FPR}$ denote the true positive rate and the false positive rate, respectively. As a threshold-independent metric, AUC objectively reflects the model’s stability and robustness under cross-domain generalization conditions.
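As a minimal sketch of Equations (27) to (31), the snippet below computes the four count-based metrics from confusion counts and the AUC from scores. The function names are illustrative and not taken from the paper. AUC is computed through the equivalent rank formulation (the probability that a randomly chosen risky firm receives a higher score than a randomly chosen normal firm, ties counted as one half) rather than by numerical integration. The worked example uses the Beijing counts from Table 3 to show why Accuracy alone is uninformative under this imbalance: a classifier that labels every firm as normal reaches about 98.3% Accuracy with zero Recall.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) with the risky class (label 1) as positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Equations (27)-(30); empty denominators are reported as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall,
            "accuracy": accuracy, "f1": f1}


def auc_rank(y_true, scores):
    """AUC via pairwise ranking of positives against negatives (O(P*N))."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Beijing in Table 3: 4488 normal firms and 77 risky firms.
# An "always normal" classifier yields TP=0, FP=0, TN=4488, FN=77.
trivial = classification_metrics(tp=0, fp=0, tn=4488, fn=77)
print({k: round(v, 4) for k, v in trivial.items()})

# Tiny AUC example with made-up scores.
print(auc_rank([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The pairwise AUC is quadratic in the sample size and is meant only to make the definition concrete; a sort-based implementation would be preferable on datasets of the size reported in Table 3.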

5.2. Experimental Setup

All experiments were run on an NVIDIA RTX 4090D GPU, with model training and evaluation implemented in PyTorch. To ensure the reproducibility and fairness of the cross-domain experiments, identical training hyperparameters and model configurations were applied across all regional datasets. The model was trained for 200 epochs using the AdamW optimizer, whose decoupled weight decay helps stabilize generalization. The initial learning rate was set to $1 \times 10^{-4}$ and adjusted during training with a cosine annealing schedule. The batch size and regularization parameters were kept identical across domains so that the results remain comparable. The detailed experimental setup is summarized in Table 4.
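To make the learning-rate schedule concrete, the sketch below evaluates the standard closed form of cosine annealing with the initial rate and epoch count given in Table 4. Several details are assumptions because the text does not state them: a minimum rate of 0, a single cosine period spanning all 200 epochs (no warm restarts), and one scheduler step per epoch. The function name is illustrative.

```python
import math


def cosine_annealing_lr(epoch, base_lr=1e-4, total_epochs=200, min_lr=0.0):
    """Closed-form cosine annealing:

    lr(t) = min_lr + 0.5 * (base_lr - min_lr) * (1 + cos(pi * t / T))

    base_lr and total_epochs follow Table 4; min_lr = 0 is an assumption.
    """
    cosine = math.cos(math.pi * epoch / total_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + cosine)


for epoch in (0, 50, 100, 150, 200):
    print(f"epoch {epoch:3d}: lr = {cosine_annealing_lr(epoch):.3e}")
```

In PyTorch, `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=200` and `eta_min=0` follows the same closed form when stepped once per epoch.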

5.3. Comparison of Experimental Results with Other Models

To comprehensively evaluate the domain generalization capability of the proposed model, a series of comparative experiments were conducted across four target domains, namely Beijing, Shanghai, Jiangsu, and Zhejiang. Multiple representative baseline methods, including traditional sequential models and graph-based architectures, were selected for benchmarking under identical experimental conditions. The results obtained from these comparative analyses effectively demonstrate the robustness and stability of the proposed approach across diverse regional domains. The experimental results are shown in Table 5.

As shown in Table 5, with each of the four regions (Beijing, Shanghai, Jiangsu, and Zhejiang) treated in turn as the target domain, the proposed model achieves the highest score on all five evaluation metrics: Precision, Recall, Accuracy, F1-Score, and AUC. Traditional deep learning models such as 1DCNN, LSTM, and BiLSTM exhibit limitations in dynamic feature modeling and struggle to capture semantic shifts among cross-domain samples. Although the graph attention network (GAT), the Transformer-based BERT, and the state-space model Mamba possess strong structural or sequence modeling capabilities, their generalization performance remains constrained under substantial inter-regional distributional discrepancies. In contrast, the proposed Hierarchical Structure-Invariant Encoder (HSIE) preserves hierarchical associations among enterprise features while suppressing local distributional noise. Combined with the Uncertainty-Aware Domain-Robust Transformer Decoder (UDR-T), the model achieves enhanced robustness and accuracy in domain transfer scenarios.
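The margins behind this comparison can be recomputed directly from Table 5. The sketch below does so for F1-Score and AUC, taking CloudScale as the reference because it is the strongest baseline in every block of the table. Which baseline the paper uses when quoting relative improvements is not stated, so that choice is an assumption; the helper name `relative_gain` is illustrative.

```python
# (F1-Score, AUC) in percent, copied from Table 5.
TABLE5 = {
    "Beijing":  {"CloudScale": (83.68, 86.89), "Ours": (83.87, 87.22)},
    "Shanghai": {"CloudScale": (83.04, 85.58), "Ours": (84.88, 87.10)},
    "Jiangsu":  {"CloudScale": (83.52, 86.67), "Ours": (85.97, 88.05)},
    "Zhejiang": {"CloudScale": (82.83, 85.54), "Ours": (84.72, 87.42)},
}


def relative_gain(ours, baseline):
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


for region, rows in TABLE5.items():
    for metric, idx in (("F1", 0), ("AUC", 1)):
        base = rows["CloudScale"][idx]
        ours = rows["Ours"][idx]
        print(f"{region:9s} {metric:3s} "
              f"+{ours - base:.2f} points, "
              f"{relative_gain(ours, base):.2f}% relative")
```

Reporting both the absolute difference in percentage points and the relative change avoids ambiguity, since the two differ noticeably when the baseline is near 85%.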

Further analysis shows that the size of the improvement varies by region. Relative to the strongest baseline in Table 5 (CloudScale), F1-Score improves by roughly 1.8 to 2.5 percentage points and AUC by roughly 1.4 to 1.9 percentage points in Shanghai, Jiangsu, and Zhejiang, whereas the margin in Beijing is below 0.4 points on both metrics. Since risk samples make up less than 3% of every regional dataset (Table 3), these gains suggest that the proposed approach adapts well to class imbalance and cross-regional distributional shifts. Notably, the model maintains a well-balanced relationship between Recall and Precision, suggesting that it can identify high-risk enterprises while avoiding the excessive false alarms caused by over-prediction. Overall, these results support the effectiveness of integrating multi-task semantic modeling with domain generalization mechanisms in enterprise risk identification, highlighting the model’s adaptability and generalization capability in complex financial environments.

Figure 4 plots the training and validation loss against the training epoch for each target domain.

As shown in Figure 4, when different regions are used as target domains, both training loss and validation loss exhibit a stable downward trend, indicating that the model achieves good convergence and stability during cross-domain training. Overall, the loss curves for each region drop rapidly within the first 50 epochs and then gradually stabilize, with only a small gap between the validation and training losses. This suggests that the model does not show significant overfitting in any domain. These results demonstrate that the proposed structure-invariant encoding and uncertainty modeling mechanisms effectively suppress performance fluctuations caused by inter-domain distribution shifts, enabling the model to maintain a consistent optimization direction and robust risk identification capability throughout multi-source learning.

5.4. Ablation Experiment Results

To further verify the effectiveness of each proposed module, ablation experiments were conducted by selectively removing or modifying key components of the model. This analysis aims to assess the individual contribution of the hierarchical structure-invariant encoder and the uncertainty-aware decoder to the overall performance. All experiments were carried out under consistent training settings to ensure fair and reliable comparisons. The experimental results are shown in Table 6.

As shown in Table 6, for every target domain each variant improves on the baseline Transformer on all five metrics, and the complete model achieves the best result throughout. The baseline Transformer shows clear limitations in cross-domain risk identification, as it fails to capture the structural differences of enterprise features across regions. After incorporating the Hierarchical Structure-Invariant Encoder (HSIE), the model learns more consistent domain-shared representations within multi-level feature spaces, thereby mitigating semantic shifts caused by regional economic heterogeneity. This indicates that structure-invariant modeling enhances feature alignment in domain generalization scenarios and provides a more stable representational foundation for subsequent uncertainty modeling.

The Uncertainty-Aware Domain-Robust Transformer Decoder (UDR-T) likewise improves on the baseline Transformer on every metric. In Beijing, Shanghai, and Jiangsu, the +UDR-T row also exceeds the +HSIE row on all metrics; in Zhejiang, it is higher on Recall and Accuracy but lower on Precision, F1-Score, and AUC, so the relative contribution of the two modules is region dependent. The UDR-T module dynamically adjusts the temperature coefficient of the predictive distribution, adaptively balancing the weights of high-confidence and low-confidence samples. This enables the model to maintain reliable decision-making even when risk samples are scarce. The complete model integrating both HSIE and UDR-T achieves the best performance across all four target domains, confirming that the two modules are complementary in structural robustness and uncertainty modeling. These results indicate that the proposed cross-domain risk identification framework not only alleviates regional heterogeneity but also achieves more robust prediction and stronger out-of-domain generalization on real-world financial data.
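The effect of a temperature coefficient on a predictive distribution can be illustrated with a generic temperature-scaled softmax. The sketch below is not the UDR-T formulation, which is defined in the method section of the paper; in particular, the linear mapping from an uncertainty estimate to a temperature (`temperature_from_uncertainty`) and its `alpha` parameter are hypothetical choices made only for illustration.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Softmax of logits / temperature; larger temperatures flatten the output."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [z / temperature for z in logits]
    shift = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def temperature_from_uncertainty(uncertainty, alpha=1.0):
    """Hypothetical mapping T = 1 + alpha * u for an uncertainty u >= 0."""
    return 1.0 + alpha * uncertainty


logits = [2.0, 0.0]  # made-up scores for (risky, normal)
for u in (0.0, 1.0, 2.0):
    t = temperature_from_uncertainty(u)
    probs = softmax_with_temperature(logits, t)
    print(f"uncertainty={u:.1f} temperature={t:.1f} "
          f"probs={[round(p, 3) for p in probs]}")
```

Under this illustrative mapping, a sample with higher estimated uncertainty receives a softer prediction, so it contributes a smaller gradient signal than a confidently classified sample with the same logits.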

5.5. Confusion Matrix Experiment Results

As an essential visualization tool for evaluating the performance of classification models, the confusion matrix intuitively illustrates the model’s recognition accuracy and error distribution across different categories. By comparing the predicted results with the ground truth labels, it is possible to further assess the model’s discriminative ability and identify easily confused categories. Moreover, the confusion matrix provides targeted insights for subsequent model optimization, helping to better understand the model’s behavior under varying data distributions. The experimental results are shown in Figure 5.

As shown in Figure 5, when different regions are used as target domains, the proposed method exhibits a higher concentration along the diagonal of the confusion matrix, indicating a more stable discriminative ability across various sample categories. Compared with the baseline Transformer model, the proposed approach achieves more accurate differentiation between positive and negative samples, with a significantly reduced number of misclassified instances. Notably, even in regions with large discrepancies in sample distributions (such as Jiangsu and Zhejiang), the model maintains a high level of recognition consistency, demonstrating strong cross-domain generalization capability. These results suggest that the proposed structure-invariant feature modeling and uncertainty-aware mechanisms effectively enhance the model’s robustness and discriminative power, providing a more reliable decision basis for enterprise risk identification tasks.

5.6. AUROC Experimental Results

In enterprise risk identification tasks, the influence of imbalanced sample distribution and category disparity often makes it difficult for metrics such as accuracy or recall under a single threshold to comprehensively reflect model performance. To address this, the receiver operating characteristic curve is further introduced to intuitively evaluate the overall discriminative capability of the model under varying decision thresholds. By observing the trend of the area under the curve (AUC), we can objectively analyze the stability and robustness of the model in distinguishing between high-risk and low-risk enterprises under cross-domain generalization conditions. The experimental results are shown in Figure 6.

In the ROC curve experiments shown in Figure 6, the proposed method achieves larger areas under the curve and steeper rising trends across different regions as target domains, indicating stronger risk identification capability under cross-domain conditions. Compared with the baseline Transformer model, the proposed hierarchical structure-invariant encoding and uncertainty-aware decoding mechanisms effectively alleviate feature drift caused by regional data distribution discrepancies, leading to more stable classification boundaries. Furthermore, during multi-source feature fusion, the model adaptively adjusts inter-domain weights, enhancing the discriminative contribution of high-confidence samples and further improving generalization robustness and risk differentiation capability in complex financial environments.
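For readers who want to see how a ROC curve of this kind is traced, the sketch below sweeps the decision threshold over the distinct score values, records one (FPR, TPR) point per threshold, and integrates the curve with the trapezoidal rule. The labels and scores are made up for illustration, and the function names are not taken from the paper.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points obtained by lowering the threshold step by step.

    A sample is predicted risky when its score is >= the threshold.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing flagged
    for thr in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thr and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


y_true = [0, 0, 1, 1]             # 1 = risky, 0 = normal (toy data)
scores = [0.1, 0.4, 0.35, 0.8]    # made-up risk scores
curve = roc_points(y_true, scores)
print(curve)
print(trapezoid_area(curve))
```

A curve that rises steeply near the origin corresponds to thresholds at which many risky firms are flagged before any normal firm is, which is the regime that matters most when only a small share of firms can be reviewed.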

5.7. t-SNE Experimental Results

Figure 7 presents t-SNE visualizations of the learned representations for the four target regions.

Figure 7 shows that, for every target domain, the proposed model yields clearer boundaries between high-risk samples (red points) and normal samples (blue points) in the t-SNE visualization, with higher intra-class compactness and greater inter-class separation. This suggests that the Hierarchical Structure-Invariant Encoder (HSIE) achieves stable structural alignment across multi-domain feature spaces, mitigating feature drift caused by discrepancies in regional economic structure and forming more consistent risk representations in the latent space.

Meanwhile, compared with the baseline Transformer model, the Uncertainty-Aware Domain-Robust Transformer Decoder (UDR-T) further refines the cross-domain sample mapping, placing boundary and high-variance samples more plausibly in the low-dimensional space. This enhances the model’s discriminative capability for uncertain samples in complex financial scenarios. These results are consistent with the robustness and generalization ability of the proposed model in multi-source heterogeneous environments, and they indicate that jointly optimizing structure-invariant modeling and uncertainty regularization yields more interpretable and stable enterprise risk identification.

5.8. Model Stability Experiment

We also examine the stability of the model by adding noise of different magnitudes to the features extracted from the data. The results are shown in Figure 8.

Figure 8 reports the feature noise sensitivity of the model for each regional target domain. As the noise level decreases from 0.1 to 0, all evaluation metrics rise consistently; equivalently, performance declines steadily rather than erratically as the perturbation grows, which we interpret as robustness and stability against input feature perturbations. The changes in AUC and accuracy (Acc) are the most prominent, suggesting that the model retains a high level of risk discrimination capability under feature space disturbances. Meanwhile, the balance between precision and recall is preserved without significant shifts or fluctuations, indicating that the proposed structure-invariant feature modeling and uncertainty-aware mechanisms suppress the feature drift induced by noise.

Furthermore, a comparison among the four regions reveals that the model performs most stably in Jiangsu and Zhejiang, where the AUC improvement is the most significant. This finding indicates that even in regions with substantial economic structural disparities, the model retains strong cross-domain generalization ability. Overall, the results demonstrate that the proposed framework can maintain feature consistency and decision robustness under noisy multi-source financial data, providing a more reliable predictive foundation and theoretical support for enterprise risk identification. In summary, the feature noise sensitivity experiment validates the high robustness and anti-interference capability of the proposed method in complex financial environments.
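A noise sensitivity sweep of this kind can be sketched as follows. The setup is hypothetical throughout: the text does not specify the noise distribution, so zero-mean Gaussian noise with standard deviation equal to the noise level is assumed; `toy_scorer` stands in for a trained model; and the data are synthetic. Only the range of noise levels (0 to 0.1) comes from the text.

```python
import random


def add_gaussian_noise(features, sigma, rng):
    """Add zero-mean Gaussian noise (std = sigma) to every feature value."""
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in features]


def auc(y_true, scores):
    """Pairwise-ranking AUC with the risky class (label 1) as positive."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def toy_scorer(row):
    """Hypothetical stand-in for a trained model: the mean feature value."""
    return sum(row) / len(row)


def noise_sweep(features, labels, sigmas, scorer, seed=42):
    """Evaluate AUC after perturbing the features at each noise level."""
    rng = random.Random(seed)
    results = {}
    for sigma in sigmas:
        noisy = add_gaussian_noise(features, sigma, rng)
        results[sigma] = auc(labels, [scorer(row) for row in noisy])
    return results


# Synthetic standardized features: risky firms are shifted upward slightly.
data_rng = random.Random(0)
labels = [1] * 20 + [0] * 380
features = [[data_rng.gauss(0.3 if y == 1 else 0.0, 0.2) for _ in range(8)]
            for y in labels]

for sigma, value in noise_sweep(features, labels,
                                [0.0, 0.02, 0.05, 0.1], toy_scorer).items():
    print(f"noise std={sigma:.2f}  AUC={value:.4f}")
```

Averaging each noise level over several random seeds would give a more reliable curve than the single draw used here.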

6. Conclusions

This study tackles cross-regional enterprise risk identification in multi-source heterogeneous financial environments by developing a Transformer-based framework that jointly emphasizes hierarchical structure-invariant representation learning and uncertainty-aware decision refinement. By incorporating the Hierarchical Structure-Invariant Encoder (HSIE) and the Uncertainty-aware Domain-Robust Decoder (UDR-T), the proposed approach enables cross-domain structural consistency and produces more stable and comparable risk representations across diverse regional feature spaces. Extensive leave-one-domain-out evaluations on four target domains (Beijing, Shanghai, Jiangsu, and Zhejiang) indicate that our method consistently surpasses representative sequential and graph-based baselines, highlighting improved generalization capacity and stronger resistance to regional distribution shifts. In addition, the noise robustness and t-SNE analyses further suggest that the learned embeddings preserve discriminative structure under feature perturbations and sample-level uncertainty, supporting the reliability of the proposed uncertainty-modulated attention mechanism. Collectively, these findings strengthen the methodological linkage between cross-domain structural alignment and uncertainty-aware learning, and provide an interpretable and transferable intelligent tool for enterprise financial risk identification. The framework is expected to facilitate earlier and more accurate risk warning and practical decision support for financial regulators, credit institutions, and corporate risk managers.

Looking ahead, as macroeconomic dynamics and enterprise operating conditions continue to evolve, risk identification systems will need to address more intricate challenges spanning temporal non-stationarity, structural variation, and multimodal information fusion. Future work may be extended along three promising directions: (1) integrating richer multimodal signals, such as textual disclosures, macroeconomic indicators, and supply chain- or inter-firm network structures, to capture complementary risk drivers; (2) introducing adaptive transfer, continual learning, or meta-learning strategies to enhance rapid calibration and resilience in unseen or fast-shifting regions; and (3) coupling large language models with knowledge graphs to build transparent, evidence-grounded risk reasoning pipelines that improve both explainability and regulatory trust. Overall, this study offers a coherent and extensible foundation for constructing generalized, interpretable, and robust enterprise financial risk identification systems, and provides actionable methodological insights for future fintech governance and intelligent decision-making.

Author Contributions

Methodology, K.L.; Software, X.L.; Data curation, X.L.; Writing—original draft, K.L. All authors have read and agreed to the published version of the manuscript.

Funding

The research was supported by the following fund: Annual Regular Project of Zhejiang Provincial Philosophy and Social Science Planning: Research on the Impact Effect and Mechanism of Identity Construction on the Social Entrepreneurial Performance of New Rural Entrepreneurs (26NDJC241YB).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data used in the paper are available upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

[1] Liu, Z.; Zhang, G.; Lu, J. Semi-supervised heterogeneous domain adaptation for few-sample credit risk classification. Neurocomputing 2024, 596, 127948.
[2] Chen, L.; Fan, X. Financial risk forecasting with RGCT-prerisk: A relational graph and cross-temporal contrastive pretraining framework. J. King Saud Univ. Comput. Inf. Sci. 2025, 37, 143.
[3] Chen, P.; Ji, M. Deep learning-based financial risk early warning model for listed companies: A multi-dimensional analysis approach. Expert Syst. Appl. 2025, 283, 127746.
[4] Sun, H. Research on financial risk assessment algorithm based on graph neural network. In Proceedings of the 2024 4th International Conference on Big Data, Artificial Intelligence and Risk Management, Chengdu, China, 28–30 June 2024; pp. 932–937.
[5] Korangi, K.; Mues, C.; Bravo, C. A transformer-based model for default prediction in mid-cap corporate markets. Eur. J. Oper. Res. 2023, 308, 306–320.
[6] Wang, C.; Xiao, Z. A deep learning approach for credit scoring using feature embedded transformer. Appl. Sci. 2022, 12, 10995.
[7] Yang, X.; Zhang, C.; Sun, Y.; Pang, K.; Jing, L.; Wa, S.; Lv, C. FinChain-BERT: A high-accuracy automatic fraud detection model based on NLP methods for financial scenarios. Information 2023, 14, 499.
[8] Tang, Y.; Liu, Z. A distributed knowledge distillation framework for financial fraud detection based on transformer. IEEE Access 2024, 12, 62899–62911.
[9] An, H.; Ma, R.; Yan, Y.; Chen, T.; Zhao, Y.; Li, P.; Li, J.; Wang, X.; Fan, D.; Lv, C. Finsformer: A novel approach to detecting financial attacks using transformer and cluster-attention. Appl. Sci. 2024, 14, 460.
[10] Li, L.; Xu, J. Graph transformer-based self-adaptive malicious relation filtering for fraudulent comments detection in social network. Knowl.-Based Syst. 2023, 280, 111005.
[11] Ashiq, W.; Kanwal, S.; Rafique, A.; Waqas, M.; Khurshaid, T.; Montero, E.C.; Alonso, A.B.; Ashraf, I. Roman urdu hate speech detection using hybrid machine learning models and hyperparameter optimization. Sci. Rep. 2024, 14, 28590.
[12] Schwab, B.; Kriebel, J. Mitigating adversarial attacks on transformer models in credit scoring. Eur. J. Oper. Res. 2025, 328, 309–323.
[13] Wu, Z.; Dong, Y.; Li, Y.; Shi, B. Unleashing the power of text for credit default prediction: Comparing human-generated and AI-generated texts. SSRN 2023, 4601317.
[14] Babaei, G.; Giudici, P. GPT classifications, with application to credit lending. Mach. Learn. Appl. 2024, 16, 100534.
[15] Zhang, X.; Yu, L.; Yin, H. Domain adaptation-based multistage ensemble learning paradigm for credit risk evaluation. Financ. Innov. 2025, 11, 27.
[16] Mushava, J.; Murray, M. Comprehensive credit scoring datasets for robust testing: Out-of-sample, out-of-time, and out-of-universe evaluation. Data Brief 2024, 54, 110262.
[17] Nikolaidis, D.; Doumpos, M. Credit scoring with drift adaptation Using Local Regions of competence. Oper. Res. Forum 2022, 3, 67.
[18] Suryanto, H.; Mahidadia, A.; Bain, M.; Guan, C.; Guan, A. Credit risk modeling using transfer learning and domain adaptation. Front. Artif. Intell. 2022, 5, 868232.
[19] Polo, F.M.; Izbicki, R.; Lacerda, E.G., Jr.; Ibieta-Jimenez, J.P.; Vicente, R. A unified framework for dataset shift diagnostics. Inf. Sci. 2023, 649, 119612.
[20] Hjelkrem, L.O.; Lange, P.E.d. Explaining deep learning models for credit scoring with SHAP: A case study using Open Banking Data. J. Risk Financ. Manag. 2023, 16, 221.
[21] Hjelkrem, L.O.; De Lange, P.E.; Nesset, E. The value of open banking data for application credit scoring: Case study of a Norwegian bank. J. Risk Financ. Manag. 2022, 15, 597.
[22] Pang, M.; Wang, F.; Li, Z. Credit risk prediction based on an interpretable three-way decision method: Evidence from Chinese SMEs. Appl. Soft Comput. 2024, 157, 111538.
[23] Li, T.; Kou, G.; Peng, Y.; Yu, P.S. Feature selection and grouping effect analysis for credit evaluation via regularized diagonal distance metric learning. INFORMS J. Comput. 2024, 37, 1391–1412.
[24] Chang, V.; Sivakulasingam, S.; Wang, H.; Wong, S.T.; Ganatra, M.A.; Luo, J. Credit risk prediction using machine learning and deep learning: A study on credit card customers. Risks 2024, 12, 174.
[25] Garcia, A.C.B.; Garcia, M.G.P.; Rigobon, R. Algorithmic discrimination in the credit domain: What do we know about it? AI Soc. 2024, 39, 2059–2098.
[26] Du, X. Financial text analysis using 1D-CNN: Risk classification and auditing support. In Proceedings of the 2025 International Conference on Artificial Intelligence and Computational Intelligence, Kuala Lumpur, Malaysia, 14–16 February 2025; pp. 515–520.
[27] Li, J.; Xu, C.; Feng, B.; Zhao, H. Credit risk prediction model for listed companies based on CNN-LSTM and attention mechanism. Electronics 2023, 12, 1643.
[28] Song, Y.; Chiangpradit, M.; Busababodhin, P. Hyperband-Optimized CNN-BiLSTM with Attention Mechanism for Corporate Financial Distress Prediction. Appl. Sci. 2025, 15, 5934.
[29] Li, M.; Walsh, J. FedGAT-DCNN: Advanced Credit Card Fraud Detection Using Federated Learning, Graph Attention Networks, and Dilated Convolutions. Electronics 2024, 13, 3169.
[30] Gou, R.; Sohaib, O. RiskMamba: A Lightweight and Efficient Model for Enterprise Financial Risk Prediction with Multi-Scale Temporal Modeling. J. Organ. End User Comput. 2025, 37, 1–20.
[31] Liao, L.; Yang, C. Enterprise Risk Information Extraction Based on BERT. In Proceedings of the 2022 7th International Conference on Intelligent Computing and Signal Processing (ICSP), Xi’an, China, 15–17 April 2022; IEEE: New York, NY, USA, 2022; pp. 1453–1458.
[32] Vilella, S.; Capozzi, A.; Fornasiero, M.; Moncalvo, D.; Ricci, V.; Ronchiadin, S.; Ruffo, G. Weirdnodes: Centrality based anomaly detection on temporal networks for the anti-financial crime domain. Appl. Netw. Sci. 2025, 10, 1–29.
[33] Gourgoulias, K.; Ibraimoski, S.; Cazan, A.P.; Ayub, M.; Frade, A.; Rios, L.M.; Moran, S. ADkit: A Framework for Anomaly Detection from Natural Language. In Proceedings of the 2025 IEEE/ACM 2nd Workshop on Software Engineering Challenges in Financial Firms (FinanSE), Ottawa, ON, Canada, 29 April 2025; IEEE: New York, NY, USA, 2025; pp. 15–16.
[34] Zhang, S.; Zhu, C.; Xin, J. CloudScale: A lightweight AI framework for predictive supply chain risk management in small and medium manufacturing enterprises. Spectr. Res. 2024, 4.

Figure 1. Overall architecture of the proposed Transformer-based domain generalization model for enterprise risk prediction. The encoder integrates the Hierarchical Structure-Invariant Encoding module to align multi-level enterprise representations across regions, while the decoder employs the Uncertainty-Aware Domain-Robust Transformer module to enhance stability and adaptivity under inter-regional distribution shifts.

Figure 2. Comparison of the previous Transformer-based risk prediction framework and the proposed hierarchical encoding structure. The upper part illustrates the conventional model that fuses multi-domain features directly, while the lower part shows our hierarchical representation process that performs structured fusion and feedback optimization to enhance generalization on the target domain.

Figure 3. Architecture of the Uncertainty-Aware Domain-Robust Transformer Decoder for enterprise risk prediction. The model incorporates multi-head attention and uncertainty-aware feed-forward layers to dynamically adjust cross-domain representations, thereby mitigating distribution gaps between multiple source domains and the target domain.

Figure 4. Loss curves across epochs when different regions are used as target domains (Source: Authors’ experiments).

Figure 5. Confusion matrix comparison between the proposed method and the Transformer baseline across target domains (Source: Authors’ experiments).

Figure 6. ROC curve comparison between the proposed method and the Transformer baseline across four target domains (Source: Authors’ experiments).

Figure 7. t-SNE visualization comparing different methods across four regions (Source: Authors’ experiments).

Figure 8. Noise sensitivity results across different regions (Source: Authors’ experiments).

Table 1. Organization of this paper.

Section	Description
Section 2	Summarizes prior studies on enterprise financial risk identification, domain generalization, structure-invariant representation learning, and uncertainty-aware modeling.
Section 3	Introduces the proposed sustainability-oriented structure-invariant transformer framework, describing the Hierarchical Structure-Invariant Encoder (HSIE), the Uncertainty-aware Domain-Robust Decoder (UDR-T), and the overall training objective.
Section 4	Describes the multi-regional empirical dataset, the leave-one-domain-out protocol, implementation details, baselines, and evaluation metrics.
Section 5	Presents quantitative comparisons, ablation studies, and visualization analyses to assess cross-domain robustness, structural stability, and uncertainty-awareness.
Section 6	Concludes the work, highlights key findings, and outlines potential future extensions for multi-domain intelligent financial analysis.

Table 2. Economically interpretable input variables for cross-regional delisting-risk identification.

Category	Representative Variables (All Used as Model Inputs)
Industry and firm-type attributes	Industry name; Finance-industry dummy; Manufacturing-industry dummy; SOE indicator.
Size and capital structure	Size; Leverage.
Profitability and earnings quality	ROA; ROE; Gross profit; Net profit growth; Earnings management.
Liquidity and cash-flow conditions	Current ratio; Quick ratio; Cash flow.
Growth and market/value signals	Growth; Asset growth; Book-to-market; Price-to-book; Tobin’s Q; SA index; absolute SA index.
Governance and ownership structure	CEO duality; Board size; Independent director ratio; Managerial shareholding; Female director ratio; Employee scale; Institutional ownership; Top1/Top3 ownership concentration; ownership Herfindahl indices; balance-of-power measures.
Audit and disclosure signals	Audit opinion; Audit fee; Big4 indicator.
Regional identifiers	East/West/Mid indicators; Listing year; Establishment year.

Table 3. Statistical summary of enterprise risk datasets from four regions.

Region	Normal Samples (0)	Risk Samples (1)	Total
Beijing	4488	77	4565
Jiangsu	5487	124	5611
Shanghai	4095	122	4217
Zhejiang	5830	94	5924
Total	19,900	417	20,317

Table 4. Experimental Environment and Hyperparameter Configuration.

Item	Setting
GPU Device	NVIDIA RTX 4090D (24 GB VRAM)
Deep Learning Framework	PyTorch 2.1.0
Batch Size	64
Training Epochs	200
Optimizer	AdamW
Initial Learning Rate	$1 \times 10^{-4}$
Learning Rate Schedule	Cosine Annealing
Weight Decay	0.01
Activation Function	ReLU
Validation Strategy	Leave-One-Domain-Out
Random Seed	42

Table 5. Comparison results of different models across target domains; all values in % (Source: Authors’ experiments).

Method	Precision	Recall	Acc	F1-Score	AUC
Target Domain: Beijing
1DCNN [26]	78.11	77.56	79.02	77.82	83.12
LSTM [27]	79.35	78.89	80.11	79.11	84.05
BiLSTM [28]	80.02	79.61	81.03	80.12	84.77
GAT [29]	81.44	80.92	82.22	81.26	85.13
Mamba [30]	82.31	81.73	83.14	82.09	85.73
BERT [31]	82.95	82.48	83.63	82.73	86.01
Weirdnodes [32]	83.01	82.72	83.77	82.93	86.32
ADkit [33]	83.47	83.12	84.21	83.41	86.55
CloudScale [34]	83.88	83.35	84.63	83.68	86.89
Ours	84.24	83.51	85.12	83.87	87.22
Target Domain: Shanghai
1DCNN	77.46	76.73	78.41	77.12	82.04
LSTM	78.93	78.26	79.65	78.61	83.02
BiLSTM	79.88	79.15	80.34	79.42	83.48
GAT	80.97	80.36	81.42	80.67	83.91
Mamba	81.56	81.04	82.03	81.33	84.37
BERT	82.11	81.55	82.67	81.92	84.66
Weirdnodes	82.54	81.99	82.99	82.24	85.02
ADkit	82.87	82.33	83.45	82.69	85.33
CloudScale	83.15	82.61	83.78	83.04	85.58
Ours	85.02	84.71	85.26	84.88	87.10
Target Domain: Jiangsu
1DCNN	78.33	77.65	79.03	77.98	83.26
LSTM	79.24	78.71	80.04	79.13	84.03
BiLSTM	80.16	79.46	80.89	80.04	84.51
GAT	81.09	80.38	81.77	80.98	84.88
Mamba	81.85	81.07	82.51	81.66	85.33
BERT	82.57	81.84	83.08	82.34	85.79
Weirdnodes	83.01	82.42	83.56	82.77	86.01
ADkit	83.37	82.76	83.93	83.11	86.35
CloudScale	83.81	83.09	84.25	83.52	86.67
Ours	86.02	85.73	86.21	85.97	88.05
Target Domain: Zhejiang
1DCNN	77.52	76.73	78.24	77.11	82.13
LSTM	78.86	78.15	79.52	78.56	83.04
BiLSTM	79.75	79.01	80.32	79.43	83.51
GAT	80.68	80.04	81.26	80.41	83.96
Mamba	81.25	80.69	81.88	81.09	84.28
BERT	81.89	81.35	82.53	81.66	84.63
Weirdnodes	82.41	81.88	82.91	82.14	84.97
ADkit	82.76	82.22	83.24	82.49	85.22
CloudScale	83.04	82.58	83.51	82.83	85.54
Ours	85.33	84.95	85.67	84.72	87.42
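
The margin of the proposed model over the strongest baseline in Table 5 (CloudScale in every target domain) can be read off with a small calculation. The F1-Score and AUC values below are copied from Table 5; the helper name `margins` and the data layout are illustrative assumptions.

```python
# (F1-Score, AUC) copied from Table 5: target -> {method: (F1, AUC)}.
RESULTS = {
    "Beijing": {"CloudScale": (83.68, 86.89), "Ours": (83.87, 87.22)},
    "Shanghai": {"CloudScale": (83.04, 85.58), "Ours": (84.88, 87.10)},
    "Jiangsu": {"CloudScale": (83.52, 86.67), "Ours": (85.97, 88.05)},
    "Zhejiang": {"CloudScale": (82.83, 85.54), "Ours": (84.72, 87.42)},
}


def margins(results, baseline="CloudScale", proposed="Ours"):
    """Return target -> (F1 gain, AUC gain) in percentage points."""
    out = {}
    for target, rows in results.items():
        f1_gain = round(rows[proposed][0] - rows[baseline][0], 2)
        auc_gain = round(rows[proposed][1] - rows[baseline][1], 2)
        out[target] = (f1_gain, auc_gain)
    return out


if __name__ == "__main__":
    for target, (f1_gain, auc_gain) in margins(RESULTS).items():
        print(f"{target}: F1 +{f1_gain:.2f}, AUC +{auc_gain:.2f}")
```

The F1 gain is smallest when Beijing is the target domain (0.19 points) and ranges from 1.84 to 2.45 points for the other three regions; the corresponding AUC gains lie between 0.33 and 1.88 points.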

Table 6. Ablation study results of different modules across target domains (Source: Authors’ experiments).

Method	Precision (%)	Recall (%)	Accuracy (%)	F1-Score (%)	AUC (%)
Target Domain: Beijing
Transformer	80.31	79.44	81.24	80.05	83.42
+HSIE	82.41	81.33	83.02	82.12	85.16
+UDR-T	83.42	82.54	84.26	83.01	86.54
Ours	84.24	83.51	85.12	83.87	87.22
Target Domain: Shanghai
Transformer	81.24	80.33	81.87	81.09	83.21
+HSIE	82.86	81.92	83.15	82.31	84.58
+UDR-T	83.74	82.68	84.21	83.06	85.67
Ours	85.02	84.71	85.26	84.88	87.10
Target Domain: Jiangsu
Transformer	82.34	81.55	83.03	82.01	84.12
+HSIE	84.08	83.26	84.71	83.72	85.91
+UDR-T	85.21	84.48	85.93	84.92	87.06
Ours	86.02	85.73	86.21	85.97	88.05
Target Domain: Zhejiang
Transformer	81.92	81.04	82.11	81.36	83.54
+HSIE	83.37	82.58	83.67	82.93	85.06
+UDR-T	82.47	82.61	84.86	82.54	84.31
Ours	85.33	84.95	85.67	84.72	87.42
