From Unstructured Text to Automated Insights: An Explainable AI Approach to Internal Control in Banking Systems

Liu, Ya; Li, Xinqiu; Su, Congli

doi:10.3390/systems14030234

by Ya Liu ¹,*, Xinqiu Li ¹ and Congli Su ²

¹ China School of Banking and Finance, University of International Business and Economics, Beijing 100105, China

² School of Public Finance and Economics, Shanxi University of Finance and Economics, Taiyuan 030012, China

* Author to whom correspondence should be addressed.

Systems 2026, 14(3), 234; https://doi.org/10.3390/systems14030234

Submission received: 13 January 2026 / Revised: 12 February 2026 / Accepted: 22 February 2026 / Published: 25 February 2026

(This article belongs to the Special Issue Business Intelligence and Data Analytics in Enterprise Systems)

Abstract

Internal control in commercial banks continues to grow more complex, while the relevant reports suffer from notable lag and templated language. In response to the demand to transform unstructured disclosures into actionable insights, this study proposes an “augmented Business Intelligence (BI) framework” that integrates a text-based internal control quality assessment system, a dual-validation process, and the resulting Intelligent Internal Control Decision Support System (IIC-DSS). By combining large language models with neural-symbolic models of regulatory prototypes, a quality evaluation system for internal control based on complex text is constructed, using a mixed probability mechanism to reduce interference from defensive disclosures. The dual-validation process pairs Partial Least Squares Structural Equation Modeling (PLS-SEM) with Extreme Gradient Boosting (XGBoost): PLS-SEM verification confirms the construct validity of this evaluation system, while XGBoost verification indicates that internal control quality has incremental predictive ability for asset quality deterioration. The IIC-DSS uses SHapley Additive exPlanations (SHAP) to explain XGBoost outputs, quantifying the marginal contribution of each control factor to the predicted risk. Overall, this study advances internal-control measurement by establishing a neural-symbolic, text-to-indicator representation within an augmented BI architecture and empirically demonstrating its utility in improving predictive power for bank asset quality deterioration and in enhancing decision transparency via explainable AI.

Keywords:

internal control quality; augmented business intelligence; decision support systems; text analytics; explainable AI

1. Introduction

As a complex system integrating IT, operations, and culture, internal control in commercial banks plays a pivotal role in determining performance and risk governance [1,2]. The growing complexity of the internal control system is mainly attributable to two factors: the continuous expansion of fintech, which has broadened the boundaries and connotations of internal control [3,4], and the increasingly strict regulatory constraints of the post-crisis era, especially the significantly enhanced compliance requirements under China’s “dual-pillar” framework [5,6]. To address this situation, banks must build their internal control systems against dual standards: deeply integrating domestic regulations and international practices at the compliance level, while balancing efficient operation against the constraints of core risk indicators, such as the capital adequacy ratio, at the business level. Nevertheless, the lack of transparency in internal control mechanisms leaves investors and regulators dependent on restricted disclosure channels, such as assessment reports, to obtain a direct picture of the system. Such reports often exhibit delayed disclosure, incomplete coverage of information, inconsistent statements, templated language, and a lack of unified evaluation standards [7]. A representative illustration can be found in the recent Internal Control Evaluation Report of a leading state-owned commercial bank. Its key conclusions are primarily delivered through binary selection formats, for instance, checking “No” for “major deficiencies in financial reporting” and “Effective” for the overall conclusion. The narrative then proceeds to a broad, standardized assertion that the bank has maintained effective controls “in all material respects,” while specific operational defects are often generalized as “rectified general deficiencies” without elaborating on the underlying risks. Consequently, critical details about control processes and risk response mechanisms are often omitted.

With the development and widespread application of text analytics and artificial intelligence technologies [8,9,10], new ways of mitigating these predicaments have emerged. Corporate annual reports and Environmental, Social, and Governance (ESG) reports provide vast, heterogeneous, and more specific information sources with higher information entropy and more dispersed semantics. The narrative sections in annual reports (such as corporate governance, comprehensive risk management and compliance, business description, and major events and the rectification of regulatory penalties) present rich insights into internal control elements in a multi-dimensional and decentralized manner, forming a more complete evidence chain of “how internal control is implemented and improved in business processes”. ESG reports further document compliance culture and risk awareness at the governance level, providing additional semantics for identifying the operational status of internal controls. Processing publicly disclosed annual bank reports and ESG reports with a combination of these technologies, converting their large-scale, complex, unstructured content into computable, structured indicators, and integrating these indicators into Business Intelligence (BI) decision support systems therefore offers an operational solution.

Nevertheless, significant challenges persist in deriving quantitative internal control metrics from unstructured text and embedding them into decision-support frameworks. A major hurdle is the “coarse-grained” nature of current metric construction. Since internal controls rely on multiple coupled elements, a single aggregated metric rarely captures the system’s full complexity. The text data itself presents further difficulties. Banks often use templated “defensive disclosures” that obscure specific risks in vague language, so traditional keyword matching captures noise rather than substance. High data dimensionality exacerbates this problem, making the modeling process even more challenging. Beyond these technical issues, there is a functional disconnect between data and decision-making. Most text mining stops at generating indicators without linking them to decision systems. This leaves regulators and managers unable to readily identify sources of risk or translate insights into action.

Our study focuses on three core research questions:

(RQ1) How can unstructured text in bank reports be turned into a multidimensional quantitative framework that captures the layered structure of internal controls?

(RQ2) Does a model that includes text-mined internal-control variables predict outcomes significantly better than models that use other internal-control variables?

(RQ3) How can text-driven indicators be operationalized to support model interpretation and risk prioritization in the banking sector?

To address these issues and make the results usable for BI decision support, we developed a workflow that links text, indicators, models, and dashboards. We leverage high-performance embedding models, such as the BAAI General Embedding (BGE) family from the Beijing Academy of Artificial Intelligence, along with a dual regulatory-semantic knowledge base to map disclosure texts to vector spaces. By computing hybrid probabilities against regulatory prototypes, we filter out noise and convert raw text into a rigorous, five-element internal control quality indicator system (IC-5Q). Validating this structure requires a two-step approach: we first use Partial Least Squares Structural Equation Modeling (PLS-SEM) to test construct validity, and then pair the indicators with Extreme Gradient Boosting (XGBoost) to evaluate their out-of-sample predictive performance for asset quality risk. To ensure practical utility, we embed SHapley Additive exPlanations (SHAP)-based explainable AI into BI dashboards, thereby creating an Intelligent Internal Control Decision Support System (IIC-DSS). This system visualizes the marginal contribution of each element, providing managers with intuitive risk assessments that directly support governance decisions.

The remainder of this paper is organized as follows. Section 2 conducts a critical review of the literature on internal control measurement, text mining in internal control, and the technological integration of business intelligence, machine learning, and explainable artificial intelligence. Section 3 explains how to extract the internal control indicator system from unstructured disclosure texts and presents an empirical validation framework for the index system: using PLS-SEM to test the construct validity, evaluating the out-of-sample prediction performance of asset quality risks based on models such as XGBoost, and introducing SHAP to provide traceable explanations at the internal control component level. Section 4 describes the dataset construction process, reports verified empirical results, and presents the application scenario via a business intelligence dashboard. Section 5 summarizes the research conclusions and proposes directions for future research.

2. Literature Review

2.1. Evaluation Methods for Internal Control Systems

There are various indices for internal control assessment, which can be classified into two categories: goal-oriented and process-oriented [11].

The goal-oriented indices measure quality by checking how well a company meets its targets in strategy, operations, reporting, and compliance [12,13]. Nevertheless, internal controls can provide only a reasonable level of assurance and cannot, by themselves, ensure the attainment of goals. As a result, these indices tend to reflect a company’s overall strength but struggle to pinpoint specific weaknesses, making diagnosis and targeted improvement harder.

Process-oriented indices adopt a different perspective by examining the system’s configuration, integrity, and efficiency. They assess quality mainly through disclosed deficiencies and the five standard components of internal control [14,15,16]. The downside is that they rely heavily on disclosure quality, and because scoring often depends on subjective expert judgment, results can vary significantly across different studies [17,18].

To better evaluate complex internal control systems, diagnose their shortcomings, and guide improvement, this study develops an optimized process-oriented index. On the one hand, it supplements existing appraisal indices with abundant unstructured “soft information.” On the other hand, it uses a weighting method that combines subjective and objective approaches to adjust the weight coefficients and reduce the influence of human judgment on the appraisal indices.

2.2. Evolution of Text Analysis in Internal Control

The application of text analysis to internal controls has deepened with advances in natural language processing technology in recent years. Early research in this field mainly depended on lexicon-based methods and automated content analysis. For example, Boritz et al. [19] identified words in audit reports related to IT weakness by building a lexicon. Rich et al. [20] argued that unstructured text provides clues about not only the control environment but also the text’s tone, which is strongly correlated with the quality of future internal control. With the advancement of text analysis tools, text mining and machine learning techniques have been increasingly adopted. For instance, Boskou et al. [21] extracted internal audit value by building a classification model using specific terminology and n-gram syntax, which improved performance. Similarly, Liu et al. [22] confirmed that text analysis based on Python, combined with machine learning, effectively measures internal control intent—a useful approach for controlling earnings management behavior.

Nowadays, innovations in deep learning are driving text analysis tools from conventional approaches to large language models (LLMs). Huang et al. [23] and Yang et al. [24] developed FinBERT and FinGPT, respectively, capable of analyzing unstructured information in the financial sector more deeply. Chiu and Hung [25] further advanced this line of research and developed a finance-specific LLaMA-2 model enhanced with an AI-driven summarization process. The results demonstrated superior performance in sentiment analysis and return prediction compared to existing approaches.

Currently, there is very little work on using large language models for deep semantic analysis of vast amounts of Chinese texts in the field of internal control. In particular, the question of how to systematically map and quantify multidimensional textual features onto specific elements of internal control requires further attention.

2.3. Business Intelligence, Machine Learning, and Explainable Artificial Intelligence in Enterprise Systems

BI is a cornerstone of management decision-making, and its theoretical framework and practical value have been extensively discussed. Chen et al. [26] systematically clarified the evolutionary trajectory and current state of application of BI. Visinescu et al. [27] further examined decision effectiveness and, by constructing a simplified model, revealed the internal mechanism by which BI enhances decision quality, thereby providing crucial theoretical support for understanding the relationship between BI and decision quality.

Compared with traditional BI analysis, the introduction of machine learning has endowed enterprise systems with deeper insights and is gradually being internalized as a core tool. The application scenarios of this technology are increasingly diverse: Ji and Li [28] combined gradient boosting decision trees with dynamic indicator selection to construct an enterprise financial risk prediction system that achieves high accuracy in identifying potential risks; Duan et al. [29] examined the internal audit process and constructed an evaluation model integrating machine learning and process mining, enriching the technical means of internal control through quantitative analysis of transaction anomalies.

However, the enhancement of model capabilities has also brought about trust and compliance pressures caused by “black boxes”, and explainable artificial intelligence (XAI) has thus become an important supplement. Barredo Arrieta et al. [30] reviewed XAI research, emphasizing the need to improve explainability to reduce concerns about AI applications and promote its implementation. Subsequently, XAI rapidly penetrated vertical fields: Weber [31] summarized different XAI paths in financial scenarios; Lu and Lin [32] integrated XGBoost with SHAP techniques to explore the determinants of voluntary disclosure, enhancing the interpretability of financial disclosure prediction; Kou et al. [33] further introduced large language models and XAI into annual report text analysis, proposing new digital measurement ideas to make complex text information more traceable and interpretable.

BI systems have been actively integrating machine learning and AI capabilities to deliver stronger end-to-end insights and decision support. Rane et al. [34] and Ebule [35] both note that embedding technologies such as NLP and computer vision into BI systems can substantially improve an organization’s capacity to generate actionable insights and automate decision-making. Chebrolu’s [36] review supports this with empirical data, showing that AI-driven automation reduces manual data processing by approximately 70% and improves prediction accuracy by 35–50%. These efficiency gains directly strengthen an enterprise’s ability to manage risk and make strategic moves.

In summary, existing research indicates that business intelligence has evolved from traditional data analysis tools into an intelligent decision-support system powered by two engines: machine learning and explainable artificial intelligence. This paper establishes a complete technical loop, “regulatory disclosure text → semantically enhanced quantification → IC-5Q indicator system → XGBoost machine learning risk prediction validation → SHAP-driven explainable BI dashboard (IIC-DSS)”, to organically integrate business intelligence, machine learning, and explainable AI within risk early-warning scenarios of internal control systems. This approach offers new research perspectives and practical pathways for the integrated application of these emerging technologies in internal control.

3. Methodology

This section follows a single thread, moving from unstructured disclosures to interpretable, verifiable, and actionable internal control measurement and risk governance, and proposes an end-to-end methodological framework for implementing the IIC-DSS (Figure 1). Additionally, as shown in Figure S1 in the Supplementary Materials, a roadmap designed for non-technical readers is available. In plain terms, the workflow proceeds in four stages: (i) Quantification, that is, turning disclosure text into structured indicators; (ii) Validation, that is, verifying that the indicators jointly define COSO-based internal control constructs and forming ICI; (iii) Prediction, that is, using ICI to forecast asset quality risk; and (iv) Diagnosis, that is, using explainable AI to identify which control elements drive each risk signal. To clarify the logical connection between the research objectives and the technical implementation, Table 1 maps each research question to its corresponding methodological section and the key technical approaches employed.

3.1. Developing a “Regulation–Semantics Dual-Driven” Internal Control Indicator System

To address RQ1, the following section proposes the “dual-driven regulatory-semantic” internal control indicator system, which forms the formative index IC-5Q and the composite index ICI. The indicator system is constructed using a bottom-up strategy, initially based on third-level indicators. Using knowledge-enhanced corpus modelling and detailed element mapping, the indices are ultimately aggregated layer by layer. Our disclosure corpus includes both annual reports and ESG reports. ESG reports are incorporated because their governance narratives often disclose internal-control arrangements, such as compliance culture, audit mechanisms, and risk governance. Additionally, ESG reports include numeric governance KPIs that can serve as “hard evidence” during feature construction.

3.1.1. Building a “Regulation–Semantics Dual-Driven” Knowledge Base

The “Regulation–Semantics Dual-Driven” knowledge base aims to accumulate a rich regulatory corpus while clarifying “what constitutes knowledge and under what conditions it is permitted to enter the repository.” To achieve this, the evaluation of textual information is grounded in two complementary knowledge bases. The first starts from institutionally defined conceptual boundaries: internal control frameworks and banking regulatory requirements issued by the Committee of Sponsoring Organizations of the Treadway Commission (COSO), the Basel Committee on Banking Supervision (BCBS), and the China Banking and Insurance Regulatory Commission (CBIRC) are used as references (the document titles and sources are listed in the Supplementary Materials), and these references delineate the core meaning and scope of the five components of internal control. In parallel, semantic “prototypes” are constructed from sentence embeddings to capture semantic equivalence in banking disclosures under synonym substitution, syntactic rewriting, and shifts in writing style, enabling stable identification despite differences in expression.

The regulatory-driven component translates institutional provisions into sentence-level evidence and actionable rules. First, we construct an internal control element mapping pattern library to serve as an anchor for coarse annotation of regulatory corpora and for rule generation. The tag space is limited to the five COSO elements, and Chinese and English expressions and abbreviations are standardized. Next, seed term clusters for each component (for example, governance structure, stress testing, segregation of duties, risk reporting, and internal audit rectification) are extracted from COSO/regulatory texts and domain glossaries. Multiple expressions within the same semantic cluster are merged into regular-expression patterns. Pilot runs on sample regulatory sentences record hits, conflicts, and omissions; overly broad patterns are narrowed, and high-frequency omissions are supplemented. After stabilization, the patterns are consolidated into a version-maintainable mapping table. Once regulatory documents are parsed into sentences, a “regulatory sentence–element” correspondence can be generated. In the final phase of constructing the rule repository K^lex, we introduce embedding-space validation: a candidate rule is formally added to the repository and archived under the five elements only when its cosine similarity with its source regulatory sentence exceeds a threshold τ.

The semantic-driven component incorporates prototype theory to mitigate discrepancies between standard terminology and banking disclosure expressions. A semantic prototype vector is constructed for each of the five elements. Regulatory sentences are first mapped to elements E ∈ {CE, RA, CA, IC, MA}, and the seed set S_E^seed for each element is then filtered. Filtering criteria include rule-matching strength and explicit action-verb characteristics (e.g., “establish, implement, monitor, rectify, audit”). The Chinese sentence-embedding model fine-tuned for finance (BGE) embeds each seed sentence, and the centroid of these embeddings gives the prototype vector v_E^proto (Equation (1)), which is incorporated into the knowledge base alongside K^lex.

$$v_{E}^{\mathrm{proto}} = \frac{1}{\lvert S_{E}^{\mathrm{seed}} \rvert} \sum_{s \in S_{E}^{\mathrm{seed}}} \mathrm{Embed}(s) \tag{1}$$
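The centroid in Equation (1) can be illustrated with a minimal sketch. The three-dimensional vectors below are toy stand-ins for the output of Embed(·), and the names `centroid`, `seed_embeddings`, and `prototypes` are hypothetical; a real pipeline would obtain the vectors from the BGE model.

```python
def centroid(vectors):
    """Element-wise mean of equal-length vectors, as in Equation (1)."""
    if not vectors:
        raise ValueError("seed set must not be empty")
    dim = len(vectors[0])
    if any(len(v) != dim for v in vectors):
        raise ValueError("all embeddings must share one dimension")
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

# Toy embeddings standing in for Embed(s); illustrative values only.
seed_embeddings = {
    "CE": [[0.9, 0.1, 0.0], [0.7, 0.3, 0.0]],
    "RA": [[0.1, 0.8, 0.1], [0.3, 0.6, 0.1]],
}

# One prototype vector per element: v_E^proto.
prototypes = {element: centroid(vecs) for element, vecs in seed_embeddings.items()}
print(prototypes)
```

Averaging the seed embeddings keeps each prototype in the same space as the sentence embeddings, so later similarity computations need no extra projection.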

3.1.2. Sentence–Component Mapping via Hybrid Probabilistic Constraints

To handle the interwoven nature of disclosure texts, we use a knowledge-based ‘neural-symbolic’ strategy. Appendix B illustrates how we process raw text into probability scores using a specific example. Each sentence i is represented as a membership-probability vector over the five internal control components, allowing a single sentence to load on multiple components simultaneously.

The method combines two probability measures to handle both implicit context and explicit rules. For the embedding-based semantic probability P_i,E^embed, we L2-normalize the sentence embedding e_i and the component prototype centroids c_E (the prototype vectors v_E^proto from Equation (1)), compute dot-product similarities s_i,E, and then apply a numerically stable row-wise Softmax (shifted by the row maximum m_i) to obtain a valid distribution:

$$s_{i,E} = \frac{e_{i}}{\lVert e_{i} \rVert_{2}} \cdot \frac{c_{E}}{\lVert c_{E} \rVert_{2}}, \qquad P_{i,E}^{\mathrm{embed}} = \frac{\exp(s_{i,E} - m_{i})}{\sum_{k=1}^{5} \exp(s_{i,k} - m_{i})}, \qquad m_{i} = \max_{k \in \{1,2,3,4,5\}} s_{i,k} \tag{2}$$
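As a minimal sketch of Equation (2), the following computes the embedding-based probabilities for one sentence. The prototype vectors and the sentence vector are toy values, and the function names are hypothetical.

```python
from math import exp, sqrt

def l2_normalize(v):
    """Scale a vector to unit Euclidean length."""
    norm = sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]

def embed_probabilities(sentence_vec, prototypes):
    """Return P^embed over the components for one sentence (Equation (2))."""
    e = l2_normalize(sentence_vec)
    # Dot product of unit vectors = cosine similarity s_{i,E}.
    sims = {
        name: sum(a * b for a, b in zip(e, l2_normalize(proto)))
        for name, proto in prototypes.items()
    }
    m = max(sims.values())  # row maximum m_i, subtracted for numerical stability
    exps = {name: exp(s - m) for name, s in sims.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}

# Toy prototypes for the five components (illustrative values only).
toy_prototypes = {
    "CE": [1.0, 0.0, 0.0],
    "RA": [0.0, 1.0, 0.0],
    "CA": [0.0, 0.0, 1.0],
    "IC": [1.0, 1.0, 0.0],
    "MA": [0.0, 1.0, 1.0],
}
p_embed = embed_probabilities([0.1, 0.9, 0.2], toy_prototypes)
print(p_embed)
```

Because cosine similarities lie in [−1, 1], the resulting distribution is fairly flat, which lets one sentence load on several components. The text mentions no temperature parameter, so none is used here.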

To capture a more direct regulatory consistency signal, we derive the dictionary-based rule probability P_i,E^lex by grouping K^lex into weighted sub-items for each component. Hit weights are accumulated into Score_i,E, where we apply a saturation map defined as 1 − 1/(1 + Score_i,E) to keep values within the [0,1) range before row normalization. The process also validates regular expressions and automatically falls back to fixed-string matching. We combine the two distributions into a mixed-membership probability P_i,E^mix, using the mixing weight α, as shown in Equation (3). The value of α is tuned by grid search over (0,0.6] for each year. In the normalized embedding space, we measure sentence dissimilarity with cosine distance d = 1 − cos(⋅). We select the fusion weight α as the value that yields the highest silhouette score for the clusters. To validate this method, we compared it against fixed α values ranging from 0.0 to 0.8. As shown in Appendix C, the model is robust. Specifically, when α is between 0.2 and 0.6, the rankings remain highly correlated, and the top-tier classifications stay consistent. However, performance drops at 0.8. This decline confirms that we should cap the weight at 0.6 to prevent semantic patterns from overpowering clear regulatory signals.

$$P_{i,E}^{\mathrm{mix}} = (1 - \alpha)\, P_{i,E}^{\mathrm{embed}} + \alpha\, P_{i,E}^{\mathrm{lex}}, \qquad \alpha \in (0, 0.6] \tag{3}$$
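The rule-based probability and the mixture in Equation (3) can be sketched as follows. The patterns and weights in `K_LEX` are hypothetical stand-ins for the actual K^lex, and the uniform fallback for sentences with no rule hits is an assumption, since the text does not say how such rows are handled.

```python
import re

COMPONENTS = ["CE", "RA", "CA", "IC", "MA"]

# Hypothetical weighted patterns standing in for K^lex (illustrative only).
K_LEX = {
    "CE": [(r"governance structure|board of directors", 1.0)],
    "RA": [(r"stress test(?:ing)?", 1.5), (r"risk assessment", 1.0)],
    "CA": [(r"segregation of duties", 1.5)],
    "IC": [(r"risk report(?:ing)?", 1.0)],
    "MA": [(r"internal audit", 1.0), (r"rectif(?:y|ied|ication)", 1.0)],
}

def count_hits(pattern, text):
    """Count regex matches; use fixed-string matching if the pattern is invalid."""
    try:
        return len(re.findall(pattern, text, flags=re.IGNORECASE))
    except re.error:
        return text.lower().count(pattern.lower())

def lex_probabilities(text, rules=K_LEX):
    """Return P^lex: saturated hit scores, then row normalization."""
    saturated = {}
    for comp in COMPONENTS:
        score = sum(w * count_hits(p, text) for p, w in rules.get(comp, []))
        saturated[comp] = 1.0 - 1.0 / (1.0 + score)  # maps [0, inf) into [0, 1)
    total = sum(saturated.values())
    if total == 0.0:
        # Assumption: with no hits at all, fall back to a uniform row.
        return {comp: 1.0 / len(COMPONENTS) for comp in COMPONENTS}
    return {comp: value / total for comp, value in saturated.items()}

def mix_probabilities(p_embed, p_lex, alpha):
    """Equation (3): convex combination of the two distributions."""
    if not 0.0 < alpha <= 0.6:
        raise ValueError("alpha must lie in (0, 0.6]")
    return {c: (1.0 - alpha) * p_embed[c] + alpha * p_lex[c] for c in COMPONENTS}

sentence = "The bank conducted stress testing and completed internal audit rectification."
p_lex = lex_probabilities(sentence)
p_uniform = {c: 0.2 for c in COMPONENTS}  # placeholder for a real P^embed row
p_mix = mix_probabilities(p_uniform, p_lex, alpha=0.4)
print(p_mix)
```

Since both inputs are valid distributions and the combination is convex, each mixed row also sums to one without further normalization.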

Considering the prevalence of “model sentences” and cross-year reuse in regulatory texts, Equation (4) further transforms the mixed probability into the final contribution weight. The quality term ϕ_i^qual integrates three constraints. Min-wise Independent Permutations Locality-Sensitive Hashing (MinHash-LSH) provides a non-duplication coefficient to penalize highly similar or cross-year-reused statements. PDF document tree reconstruction provides chapter position weights, giving greater importance to core sections such as risk management and internal control self-assessment. Numeric features, combined with strong action verbs, constitute an evidence-enhancing term, increasing the contribution of sentences containing quantitative information and substantive actions. In simple terms, this step acts as a “quality filter.” It penalizes vague, “boilerplate” language (sentences that look like copy-pasted templates) while rewarding specific, verifiable evidence (such as numbers or hard deadlines). This ensures that the final index reflects the substance of internal control rather than the mere volume of text.

$$w_{i,E} = \frac{P_{i,E}^{\mathrm{mix}} \cdot \phi_{i}^{\mathrm{qual}}}{\sum_{j=1}^{n} P_{j,E}^{\mathrm{mix}} \cdot \phi_{j}^{\mathrm{qual}} + \varepsilon} \tag{4}$$
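A minimal sketch of the quality filter and Equation (4) follows, under several stated assumptions. Exact Jaccard similarity on token sets is swapped in for MinHash-LSH, the three constraints are combined by multiplication, and the penalty (0.5), evidence bonus (1.2), and duplicate threshold (0.8) are hypothetical values. The evidence term here checks only for digits, not action verbs.

```python
import re

def jaccard(a, b):
    """Exact Jaccard similarity on token sets (stand-in for MinHash-LSH)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def quality_terms(sentences, section_weights, prior_year, dup_threshold=0.8):
    """Return phi_i^qual per sentence; the multiplicative form is an assumption."""
    phis = []
    for text, section_w in zip(sentences, section_weights):
        reused = any(jaccard(text, old) >= dup_threshold for old in prior_year)
        non_dup = 0.5 if reused else 1.0                    # assumed reuse penalty
        evidence = 1.2 if re.search(r"\d", text) else 1.0   # assumed numeric bonus
        phis.append(non_dup * section_w * evidence)
    return phis

def contribution_weights(p_mix, phi, eps=1e-12):
    """Equation (4): quality-adjusted weights for one component E."""
    raw = [p * q for p, q in zip(p_mix, phi)]
    denom = sum(raw) + eps
    return [r / denom for r in raw]

sentences = [
    "The bank has maintained effective internal control in all material respects.",
    "Internal audit completed 12 branch inspections and closed 95% of findings by 30 June.",
]
prior_year = ["The bank has maintained effective internal control in all material respects."]
phi = quality_terms(sentences, section_weights=[1.0, 1.0], prior_year=prior_year)
weights = contribution_weights([0.5, 0.5], phi)
print(phi, weights)
```

With equal mixed probabilities, the reused boilerplate sentence receives a smaller share of the weight than the sentence carrying numeric evidence, which is the behaviour the quality filter is meant to produce.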

3.1.3. Hierarchical Formative Index Construction and Aggregation

After completing the knowledge base construction and component mapping, we followed the bottom-up index construction logic and set the starting point for extracting and constructing text information at the third level of the index.

Disclosure texts contain many marketing statements and much macro-level noise, and “disclosure overload” causes their length and writing style to fluctuate from year to year. To keep this noise from diluting effective signals and biasing indicator construction, we first preprocessed the raw texts using Python (v3.10) and then applied sentence-level screening. Specifically, for each sentence, we calculate its relevance score w_i,E,t under year t and the corresponding internal control element E. Instead of overwhelming the model with hundreds of repetitive keywords, we consolidate them into six distinct themes (such as “Disclosure Quality” or “Hard Evidence”). This reduces noise and ensures that the indicators are robust across different writing styles. Then, based on the empirical distribution of this score across samples from that year, we use the Otsu dynamic threshold method [37] to determine the segmentation point τ_{t,E}^Otsu that distinguishes between relevant and irrelevant sentences. Because this threshold may be too low when the signal is weak, we imposed a minimum threshold constraint: the effective threshold for each year–element pair is the higher of the Otsu-derived cutoff and the prespecified lower bound τ_min. Finally, we retained all sentences whose relevance scores met or exceeded this threshold to construct the representative sentence subset for bank b in year t under element E. The screening rule and the resulting subset are detailed in Equation (5).

$$R_{b,t,E} = \{\, s_{i} : i \in b,\ w_{i,E,t} \geq \tau_{t,E} \,\}, \qquad \tau_{t,E} = \max\left(\tau_{t,E}^{\mathrm{Otsu}},\ \tau_{\min}\right) \tag{5}$$
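The screening rule in Equation (5) can be sketched with a histogram-based Otsu threshold. The bin count (64), the floor value `tau_min=0.05`, and the toy scores are hypothetical; the source does not report these settings.

```python
def otsu_threshold(scores, bins=64):
    """Histogram-based Otsu cut that maximizes between-class variance."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return lo
    width = (hi - lo) / bins
    hist = [0] * bins
    for s in scores:
        hist[min(int((s - lo) / width), bins - 1)] += 1
    centers = [lo + (k + 0.5) * width for k in range(bins)]
    total = len(scores)
    total_sum = sum(h * c for h, c in zip(hist, centers))
    best_var, best_cut = -1.0, lo
    w0, sum0 = 0, 0.0
    for k in range(bins - 1):
        w0 += hist[k]
        sum0 += hist[k] * centers[k]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mu0, mu1 = sum0 / w0, (total_sum - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_var, best_cut = between, lo + (k + 1) * width
    return best_cut

def select_sentences(scored, tau_min=0.05):
    """Equation (5): keep sentences at or above max(Otsu cut, tau_min)."""
    tau = max(otsu_threshold([w for _, w in scored]), tau_min)
    return [sid for sid, w in scored if w >= tau], tau

# Toy relevance scores for one year-element pair (illustrative only).
scored = [("s1", 0.01), ("s2", 0.02), ("s3", 0.03), ("s4", 0.02), ("s5", 0.01),
          ("s6", 0.30), ("s7", 0.35), ("s8", 0.40)]
kept, tau = select_sentences(scored, tau_min=0.05)
print(kept, tau)
```

The floor matters when the relevant and irrelevant scores are poorly separated: the Otsu cut can then fall very low, and `tau_min` keeps near-zero scores out of the representative subset.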

We built the Level 3 system using “general” and “specific” dimensions. This dual approach evaluates both the format’s credibility and the content’s substance.

The general dimensions are designed to filter out purely formal noise, thereby keeping the indicators anchored in meaningful content for every internal control component. Here, we use relative attention and semantic coverage to gauge disclosure intensity and relevance. Additionally, we strengthened the “Hard Measures” dimension by extracting quantitative ESG data. Specifically, we scan governance sections in ESG reports for numeric values, such as audit frequency and board meeting counts. These figures serve as verifiable evidence.

The specific dimensions strictly correspond to the heterogeneity logic of the five COSO elements. That is, the control environment focuses on governance structure and culture; risk assessment emphasizes data quantification and foresight; control activities revolve around process automation and separation of duties; information communication examines the effectiveness of communication channels; and monitoring activities focus on the implementation of audit independence and the closure of rectification. All indicators and their calculation methods are shown in Appendix Table A1.

After constructing the third-level indicators (L3), we propose a two-stage weighting scheme that balances data distribution characteristics and theoretical priors by combining subjective and objective weighting. To make the L3 indicators more informative when they are rolled up to second-level indicators (L2), and to produce a stable IC-5Q index when L2 is further aggregated to first-level indexes (L1), we adopt two weighting steps that address different needs. Because third-level indicators can be correlated, we apply the CRITIC method [38] in the L3-to-L2 mapping to reflect both the comparative strength and the degree of conflict among standardized indicators, thereby ensuring the resulting weights better reflect the distinguishability and value of each piece of information. When aggregating from L2 to L1 elements and constructing the IC-5Q index, we introduce a game-theoretic combinatorial weighting model. The data-driven weights are combined with a uniformly distributed (subjective) prior weight vector that serves as an uninformed baseline, and the combination coefficients are chosen by minimizing the sum of squared deviations between the candidate weight vectors, yielding the final weighting scheme (Equation (6)).

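$$\begin{aligned}
\min_{\lambda_{1}, \lambda_{2}} \quad & \lVert \lambda_{1} \omega_{\mathrm{unif}} + \lambda_{2} \omega_{\mathrm{CRITIC}} - \omega_{\mathrm{CRITIC}} \rVert_{2}^{2} + \lVert \lambda_{1} \omega_{\mathrm{unif}} + \lambda_{2} \omega_{\mathrm{CRITIC}} - \omega_{\mathrm{unif}} \rVert_{2}^{2} \\
\text{s.t.} \quad & \lambda_{1} + \lambda_{2} = 1, \quad \lambda_{1} \geq 0, \quad \lambda_{2} \geq 0, \\
\text{yielding} \quad & \omega = \lambda_{1} \omega_{\mathrm{unif}} + \lambda_{2} \omega_{\mathrm{CRITIC}}, \quad \lambda_{1} = \lambda_{2} = 0.5
\end{aligned} \tag{6}$$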
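The two weighting steps can be sketched as follows. The CRITIC weights use the common formulation (standard deviation times summed correlation conflict), and a grid search over λ₁ checks numerically that the objective in Equation (6) is minimized at λ₁ = λ₂ = 0.5. The indicator columns are toy values assumed to be already standardized, and the function names are hypothetical.

```python
from math import sqrt

def pstdev(xs):
    """Population standard deviation."""
    mean = sum(xs) / len(xs)
    return sqrt(sum((x - mean) ** 2 for x in xs) / len(xs))

def pearson(xs, ys):
    """Pearson correlation; assumes neither column is constant."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))

def critic_weights(columns):
    """CRITIC: contrast intensity (std) times conflict (sum of 1 - r)."""
    info = []
    for col in columns:
        conflict = sum(1.0 - pearson(col, other) for other in columns)
        info.append(pstdev(col) * conflict)
    total = sum(info)
    return [c / total for c in info]

def objective(lam1, w_unif, w_critic):
    """Objective of Equation (6) with lam2 = 1 - lam1."""
    lam2 = 1.0 - lam1
    combo = [lam1 * u + lam2 * c for u, c in zip(w_unif, w_critic)]
    dev_critic = sum((x - c) ** 2 for x, c in zip(combo, w_critic))
    dev_unif = sum((x - u) ** 2 for x, u in zip(combo, w_unif))
    return dev_critic + dev_unif

# Toy standardized indicator columns (illustrative values only).
columns = [
    [0.0, 0.5, 1.0, 0.25],
    [0.1, 0.4, 0.9, 0.3],
    [1.0, 0.2, 0.0, 0.6],
]
w_critic = critic_weights(columns)
w_unif = [1.0 / len(columns)] * len(columns)

# Grid search over lam1 in {0.00, 0.01, ..., 1.00}.
best_lam1 = min((k / 100 for k in range(101)),
                key=lambda lam: objective(lam, w_unif, w_critic))
final = [best_lam1 * u + (1.0 - best_lam1) * c for u, c in zip(w_unif, w_critic)]
print(w_critic, best_lam1, final)
```

Substituting λ₂ = 1 − λ₁ reduces the objective to (λ₁² + λ₂²)‖ω_unif − ω_CRITIC‖², which is why the minimizer is 0.5 whenever the two weight vectors differ, independent of the data.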

3.2. Multi-Level Validation Framework: From Construct Validity to Predictive Power

We use a progressive, multilevel validation framework to examine measurement validity (whether the indicator system forms the intended construct), criterion validity (whether ICI relates to an established benchmark), and predictive validity (whether ICI explains future credit risk) to address RQ2.

3.2.1. Measurement and Criterion Validity via Formative PLS-SEM

We estimate the PLS-SEM model following standard hierarchical procedures [39,40]. Rather than entering high-dimensional L3 textual items directly, we consolidate them into six L2 dimensions per element: Disclosure Breadth, Quality, Distinctiveness, Regulatory Alignment, Hard Measures, and Specific Measures. These dimensions are treated as formative indicators because together they define the components of internal control rather than merely reflecting them. Each dimension captures a distinct aspect of disclosure, such as breadth of coverage or strength of supporting evidence, and these aspects are not interchangeable. Removing any single dimension would therefore inappropriately narrow the scope of the construct. To handle multicollinearity, we look beyond simple Variance Inflation Factor (VIF) values. We use CRITIC-based weighting during the aggregation phase to strictly reduce the impact of redundant data. We also ensure the stability of results by examining the dispersion of bootstrap weights. If diagnostics indicate potential overlap, we re-estimate the model using alternative specifications.

Structurally, the path model groups L2 dimensions into first-order elements (E_k), which then combine to form the second-order composite index (ICI). For element k in year t, the formative measurement model is defined as:

E_{k,t} = \sum_{d=1}^{6} γ_{k,d} \cdot L_{k,d,t} + ζ_{k,t}

(7)

where γ_{k,d} represents the formative weight and ζ_{k,t} the disturbance. Convergent validity is strongly supported by the redundancy analysis. The SEM-derived latent constructs are nearly identical to their corresponding aggregate targets (Target_{E_k}), with path coefficients consistently close to 1.0 and high R² values. This indicates that abstracting the six disclosure dimensions into first-order internal control elements results in minimal information loss, validating the reliability of the hierarchical structure.

At the second-order level, ICI is formed as:

ICI_{t} = \sum_{k=1}^{5} ω_{k} E_{k,t} + ξ_{t}

(8)

where ω_k is the weight and ξ_t the residual. We validate ICI by assessing its association with the DIB Internal Control Index (ICDI) and conducting supplementary panel regressions (Equation (9)) to ensure the index preserves benchmark ranking logic after controlling for firm characteristics.

ICDI_{i,t} = β \, ICI_{i,t} + Γ^{'} X_{i,t} + ε_{i,t}

(9)

3.2.2. Out-of-Sample Predictive Validity of the Internal Control Index (ICI)

To validate the predictive capability of the internal control index (ICI) for future credit risk, this section compares out-of-sample forecasting performance across multiple models. Specifically, we discretize the non-performing loan (NPL) change rate into a binary risk-transition indicator that represents future credit risk (Equation (11)). To avoid look-ahead bias when setting the decision threshold, we adopt the data-driven adaptive rule in Equation (10), which uses only NPL changes observed up to year t − 1. To capture the tail risk of asset quality deterioration, we set the quantile parameter q to the upper quartile of historical data (q = 0.75). Higher quantiles would better isolate extreme crises, but in small datasets they leave too few positive samples for the model to be trained reliably. The 75th percentile, in contrast, captures the early stages of asset deterioration while retaining enough positive cases for training. In addition, the bounds [τ_min, τ_abs] keep the threshold from falling below τ_min, which filters out minor noise during stable periods, and from rising above τ_abs, which preserves sensitivity to crises.

τ_{t} = \min \{ τ_{abs}, \max [ τ_{min}, Q_{q}(\{ ΔNPL_{\cdot, \leq t-1} \}) ] \}, \quad q = 0.75

(10)

Y_{i,t+1} = \mathbb{I} \{ ΔNPL_{i,t+1} \geq τ_{t} \}

(11)
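Equations (10) and (11) translate almost directly into code. The sketch below is illustrative only: the function names, the bound values, and the toy NPL changes are hypothetical, and `np.quantile` with its default linear interpolation stands in for whichever quantile estimator the authors used.

```python
import numpy as np


def adaptive_threshold(history, tau_min, tau_abs, q=0.75):
    """tau_t = min(tau_abs, max(tau_min, Q_q(history))), as in Equation (10).

    `history` must contain only NPL changes observed up to year t - 1.
    """
    q_hist = float(np.quantile(np.asarray(history, dtype=float), q))
    return min(tau_abs, max(tau_min, q_hist))


def jump_labels(delta_npl_next, tau_t):
    """Y_{i,t+1} = 1 when the next-period NPL change reaches tau_t (Equation (11))."""
    return (np.asarray(delta_npl_next, dtype=float) >= tau_t).astype(int)


# Hypothetical numbers chosen only to exercise the clipping logic.
past_changes = [0.0, 0.1, 0.2, 0.3, 0.4]     # pooled Delta NPL up to t - 1
tau_capped = adaptive_threshold(past_changes, tau_min=0.05, tau_abs=0.25)
tau_free = adaptive_threshold(past_changes, tau_min=0.05, tau_abs=0.50)
labels = jump_labels([0.10, 0.30, 0.50], tau_capped)
```

In the first call the historical 75th percentile exceeds the cap, so the threshold is held at τ_abs; in the second call the cap is loose and the quantile itself is used.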

Once the non-performing loan change rate was transformed into a binary risk-transition indicator and the threshold criteria were defined, out-of-sample forecasting was performed using XGBoost as the primary model. Unlike traditional linear regression, which assumes risk factors act independently, XGBoost allows us to capture complex interactions. For instance, it can detect that a weak control environment becomes critically dangerous only when combined with rapid asset expansion, a nuance that simpler models would likely miss.

Rolling-window cross-validation was used to obtain a reliable assessment of the model’s predictive performance. For each prediction window between 2017 and 2023, the model was trained only on data observed before year T and tested on data from year T. Because the risk events studied here are highly imbalanced, performance evaluation relies on PR-AUC and ROC-AUC for discrimination, Best F1 for the precision–recall trade-off, and the Brier score alongside the Top-K capture rate to quantify calibration and high-risk detection accuracy.
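A minimal pure-Python sketch of this evaluation protocol follows. It assumes that every year before T enters the training set (an expanding window; a fixed-length window would simply truncate `train_years`) and that Top-K capture is the share of all positive events found among the highest-scored fraction of banks. All function names and demo values are hypothetical.

```python
def expanding_window_splits(years, first_test_year, last_test_year):
    """Yield (train_years, test_year): train strictly before T, test at T."""
    ordered = sorted(set(years))
    for test_year in ordered:
        if first_test_year <= test_year <= last_test_year:
            train_years = [y for y in ordered if y < test_year]
            if train_years:
                yield train_years, test_year


def brier_score(y_true, p_pred):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, p_pred)) / len(y_true)


def top_k_capture(y_true, p_pred, frac=0.10):
    """Share of all positive events found among the top `frac` of scores."""
    n_top = max(1, int(round(frac * len(y_true))))
    ranked = sorted(zip(p_pred, y_true), key=lambda pair: pair[0], reverse=True)
    hits = sum(y for _, y in ranked[:n_top])
    total = sum(y_true)
    return hits / total if total else float("nan")


splits = list(expanding_window_splits(range(2015, 2019), 2017, 2018))
y_demo = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
p_demo = [0.01, 0.02, 0.90, 0.03, 0.01, 0.02, 0.05, 0.04, 0.01, 0.02]
capture = top_k_capture(y_demo, p_demo)      # top 10% of 10 banks = 1 bank
```

In the demo, the single highest-scored bank is one of the two true events, so the capture rate is one half.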

3.3. The IIC-DSS Framework: SHAP-Based Diagnosis and Decision Support

By applying the TreeSHAP algorithm to the XGBoost framework, we isolate the marginal impact of the five internal control components on the risk of sudden NPL increases. This step essentially translates the model’s complex mathematical output into a human-readable explanation, identifying the specific why behind each risk prediction. On this basis, using probability calibration and natural language generation technologies, elaborate mathematical results are transformed into visual indicators and diagnostic reports within the business intelligence (BI) dashboard, and, ultimately, an internal control decision support system (IIC-DSS) integrating “prediction—interpretation—presentation” is constructed.

TreeSHAP decomposes the model output in log-odds space into an additive form of “base value + feature contributions”, which is then mapped to the final jump probability through the logistic function σ(⋅). The predicted risk of bank i at time t satisfies:

p_{i, t + 1} = σ (ϕ_{0} + \sum_{j} ϕ_{i, j})

(12)

In Equation (12), ϕ_0 is the base value, and ϕ_{i,j} quantifies the marginal effect of feature j for bank i. A positive SHAP value indicates that the feature increases risk; a negative value quantifies the buffering effect of effective internal control on risk.
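The additive structure of Equation (12) can be checked with a few lines of Python. The base value and contributions below are hypothetical log-odds figures for a single bank, not values from the paper's tables, and the function names are illustrative.

```python
import math


def sigmoid(z):
    """Logistic function sigma(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def risk_probability(base_value, contributions):
    """Equation (12): p = sigma(phi_0 + sum_j phi_ij), all terms in log-odds."""
    return sigmoid(base_value + sum(contributions.values()))


# Hypothetical log-odds values for one bank.
phi_0 = -4.0
phi = {"CE": 0.30, "RA": -0.10, "CA": -0.40, "IC": 0.25, "MA": -0.05}
p_bank = risk_probability(phi_0, phi)

# Zeroing the positive CE contribution lowers the predicted probability.
p_without_ce = risk_probability(phi_0, {k: v for k, v in phi.items() if k != "CE"})
```

Because the contributions add up in log-odds space, their effect on the probability itself is nonlinear: the same SHAP value moves the probability more when the bank is already close to the decision boundary.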

During the empirical process, NPL jump labels in the training set are generated using the dynamic hybrid thresholds, and the XGBoost model is trained on these labels. After training, TreeSHAP performs the attribution analysis and outputs the SHAP contribution matrix, from which the corresponding risk probability is calculated, making the decomposition of each prediction easier to interpret. The SHAP importance of the five internal control elements is then summarized, and its mean and 95% confidence interval are obtained through bootstrap resampling, giving a robust characterization of “which type of internal control subsystem is more critical”. A unique SHAP contribution vector is also generated for each bank for risk diagnosis.

After completing the attribution analysis, we convert the calibrated risk probabilities and attribution results into decision-support information and integrate it into the BI system. Because jumps in the non-performing loan ratio are rare and predicted probabilities are easily distorted by class imbalance, a robust calibration mechanism is applied after the SHAP values are obtained. Specifically, Platt scaling based on logistic regression [41] is the preferred calibration method; if its results are unstable, a prior correction strategy is used instead, in which a log-odds transformation aligns the predicted probabilities with the training set’s overall distribution.
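The two calibration routes can be sketched as follows. This is an assumption-laden illustration: the prior correction is written as the standard log-odds shift between two base rates, which is one common way to realize the alignment described above, and the Platt parameters `a` and `b` are taken as given (in practice they are fitted by logistic regression on held-out scores). None of the names come from the paper.

```python
import math


def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def inv_logit(z):
    """Inverse of logit (the logistic function)."""
    return 1.0 / (1.0 + math.exp(-z))


def platt_scale(score, a, b):
    """Platt scaling: a sigmoid of an affine function of the raw score.

    The parameters a and b are assumed to be fitted elsewhere.
    """
    return inv_logit(a * score + b)


def prior_correct(p_raw, model_rate, target_rate):
    """Shift a probability in log-odds space from one base rate to another."""
    return inv_logit(logit(p_raw) - logit(model_rate) + logit(target_rate))


# Hypothetical example: the model behaves as if events occur 50% of the time,
# while the reference event rate is 10%.
p_shifted = prior_correct(0.5, model_rate=0.5, target_rate=0.1)
```

When the two base rates coincide, the shift vanishes and the probability is returned unchanged, so the correction only acts when the model's implied event rate drifts from the reference rate.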

The visual interaction platform consists of three core functional modules. The summary display module presents the calibrated risk probability distribution and constructs the “System Resilience Intensity” by summing the negative SHAP values of the five internal control elements, quantifying each element’s offsetting contribution to risk. The diagnostic analysis module performs a global importance assessment with bootstrap resampling and presents the differences in contribution across internal control elements through error-bar plots with confidence intervals. To support tiered management, the intervention recommendation module employs a dynamic threshold classification technique that searches the probability quantile matrix to determine the F1-optimal threshold and a safety floor. Based on these two thresholds, banks are segmented into three tiers: high, medium, and low risk. Building on these tiering results, the framework uses natural language generation to produce differentiated reports that expose the main weaknesses of high-risk banks together with their SHAP contributions, flag emerging risks for medium-risk banks, and present the principal strengths of low-risk banks. In this way, the IIC-DSS framework translates the outputs of complex statistical models into internal control governance measures that can be implemented directly.
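The tiering and resilience logic can be sketched in pure Python. The sketch assumes that the F1-optimal threshold is searched over the observed probabilities, that the safety floor acts as the lower cut between medium and low risk, and that the buffering effect is the plain sum of negative SHAP values. All names, thresholds, and data are hypothetical.

```python
def f1_at(y_true, p_pred, threshold):
    """F1 score when banks with p >= threshold are flagged as positive."""
    pairs = list(zip(y_true, p_pred))
    tp = sum(1 for y, p in pairs if p >= threshold and y == 1)
    fp = sum(1 for y, p in pairs if p >= threshold and y == 0)
    fn = sum(1 for y, p in pairs if p < threshold and y == 1)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def best_f1_threshold(y_true, p_pred):
    """Search the observed probabilities for the F1-maximizing cut-off."""
    return max(sorted(set(p_pred)), key=lambda t: f1_at(y_true, p_pred, t))


def assign_tier(p, high_threshold, safety_floor):
    """Three tiers: high at or above the upper cut, low below the floor."""
    if p >= high_threshold:
        return "high"
    if p >= safety_floor:
        return "medium"
    return "low"


def buffering_contribution(shap_by_element):
    """Sum of negative SHAP values, i.e., the total risk-offsetting effect."""
    return sum(v for v in shap_by_element.values() if v < 0)


y_demo = [0, 0, 1, 0, 1]
p_demo = [0.01, 0.02, 0.30, 0.05, 0.40]
cut = best_f1_threshold(y_demo, p_demo)
tiers = [assign_tier(p, cut, safety_floor=0.05) for p in p_demo]
```

Keeping the floor separate from the F1-optimal cut means that a bank just below the high-risk threshold is still flagged for monitoring instead of being grouped with clearly safe banks.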

4. Results and Discussion

4.1. Dataset and Descriptive Statistics

The text data are derived from the annual and ESG reports of commercial banks listed on China’s A-share market. Table A2 in Appendix A summarizes the step-by-step preprocessing pipeline that converts raw PDF annual/ESG reports into a sentence-level, section-tagged corpus with quality weights. At the numerical level, key financial and risk variables from the Wind database are merged with the Internal Control Index (ICDI) provided by the DIB Internal Control Index database.

To address the small number of missing values in the sample, we evaluated candidate imputation methods using a combination of rolling time-window cross-validation and ground-truth masking. The candidates were panel means and medians, k-Nearest Neighbors (k-NN), Random Forests, and MICE with Predictive Mean Matching (MICE-PMM). The evaluation rolls the training and validation sets forward year by year and randomly masks known observations in the validation set before reconstructing them. The normalized root mean squared error (NRMSE) and normalized mean absolute error (NMAE) were calculated between imputed and actual values. Because it minimized both NRMSE and NMAE, the random forest algorithm was ultimately used for imputation. The index construction and validation procedures described above were implemented in R (v4.1.0). Descriptive statistics for the processed data are presented in Table 2.
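The masking-based comparison can be illustrated with a small NumPy sketch (Python is used here for illustration, although the authors worked in R). It assumes that errors are normalized by the range of the observed values, which the text does not specify, and it uses a median imputer as a stand-in for any of the candidate methods. The yearly rolling of training and validation sets is omitted, and all names are hypothetical.

```python
import numpy as np


def nrmse(actual, imputed, scale):
    """Root mean squared error divided by a normalizing scale."""
    err = np.asarray(imputed, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.sqrt(np.mean(err ** 2)) / scale)


def nmae(actual, imputed, scale):
    """Mean absolute error divided by a normalizing scale."""
    err = np.asarray(imputed, dtype=float) - np.asarray(actual, dtype=float)
    return float(np.mean(np.abs(err)) / scale)


def median_impute(column):
    """Replace NaN entries with the median of the observed entries."""
    column = np.asarray(column, dtype=float)
    return np.where(np.isnan(column), np.nanmedian(column), column)


def masked_evaluation(values, impute_fn, mask_frac=0.2, seed=0):
    """Hide a random share of known values, impute them, and score the result."""
    values = np.asarray(values, dtype=float)
    rng = np.random.default_rng(seed)
    n_mask = max(1, int(round(mask_frac * values.size)))
    idx = rng.choice(values.size, size=n_mask, replace=False)
    masked = values.copy()
    masked[idx] = np.nan                     # ground-truth masking
    filled = impute_fn(masked)
    scale = values.max() - values.min()      # assumed normalization: value range
    return (nrmse(values[idx], filled[idx], scale),
            nmae(values[idx], filled[idx], scale))


scores = masked_evaluation(np.arange(10.0), median_impute)
```

Running `masked_evaluation` once per candidate imputer on the same masked positions gives directly comparable NRMSE and NMAE pairs, from which the lowest-error method can be selected.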

4.2. Construct Validity and External Consistency of the Index System

Based on the methodology, we applied PLS-SEM to verify the construct validity and external consistency of the internal control index system constructed from complex textual content. Figure 2 illustrates the hierarchical formative path model used for this validation. The specific verification results are shown in Table 3.

The first-order measurement model results for Stage 1 show that the outer weights for the six process quality dimensions are all significantly positive (***), and the bias-corrected and accelerated (BCa) confidence intervals do not include zero, establishing the statistical significance of the indicators. The collinearity diagnosis shows that the variance inflation factor (VIF) of all indicators is below the critical value of 3.0, which rules out problematic multicollinearity. This confirms that attributes such as disclosure breadth and consistency provide independent and non-redundant information and effectively constitute the five elements of internal control: control environment (CE), risk assessment (RA), control activities (CA), information and communication (IC), and monitoring activities (MA). The weight ranges vary across elements. For example, the L2 dimension weights range from 0.302 to 0.541 for the control environment (CE) and from 0.244 to 0.517 for control activities (CA), indicating that the marginal contributions of the process dimensions differ across governance semantics.

Similarly, at the second-order structural level (Stage 2), the five elements, as formative indicators of the internal control index (ICI), are also significant. The weight ranking shows that information and communication (IC, 0.319) contributes most to ICI, followed by monitoring activities (MA, 0.258), control activities (CA, 0.222), and risk assessment (RA, 0.218), while the control environment (CE, 0.162) contributes the least. Convergent validity is supported by the redundancy analysis: each construct’s path coefficient to its global single-item target variable is close to 1.0, and R² ranges from 0.959 to 0.994 (ICI: 0.988), indicating that the mapping from text features to latent construct scores exhibits no material information distortion.

4.3. Out-of-Sample Predictive Performance

To assess the incremental predictive value of the textual Internal Control Index (ICI), we employed an optimized XGBoost model interacting ICI with proxies for organizational complexity (lnAssets), risk vulnerability (NPL_lag1), and performance incentives (ROE). Other controls (CAR, leverage, and LDR) primarily reflect regulatory buffers or balance-sheet structure; treating them as main effects already absorbs important financial differences, while interacting ICI with all controls would substantially increase feature dimensionality and can reduce stability and interpretability under the very low base rate of NPL jumps.

Table 4 shows that the best-performing XGBoost specification is Controls + ESG + ICI + ICI × (lnAssets, NPL_lag1, ROE), achieving ROC-AUC = 0.909 and PR-AUC = 0.0909, with the strongest overall classification quality (Best F1 = 0.167) and strong tail-event prioritization (Top-10 capture = 0.667). Importantly, adding ICI provides incremental value beyond ESG ratings: the Controls + ESG model captures only 33.3% of actual jump events in the top decile, whereas the ESG + ICI model captures 66.7%. This is practically meaningful in a rare-event setting (base rate ≈ 0.31%): it means that when regulators or risk managers can intensively review only the top 10% of banks flagged by the model, incorporating ICI doubles the yield of true distressed cases relative to relying on financial controls and ESG scores alone. This superior tail-risk sensitivity confirms that incorporating textual internal control quality enables the detection of nonlinear risk precursors that linear models and general governance scores fail to capture.

4.4. From Explanation to Action: SHAP Diagnostics and IIC-DSS Application

We integrated the SHAP attribution mechanism into the optimal XGBoost model, aiming to identify the core elements driving the risk jump and convert them into governance diagnostic bases under the IIC-DSS framework.

The IIC-DSS is operationalized as a deployable business intelligence platform with a streamlined user workflow. Users (regulators, risk managers, or investors) upload bank PDF reports through a drag-and-drop web interface. Once a report is uploaded, the backend automatically runs the full analysis pipeline. The results appear in an interactive “Risk Diagnosis” panel. This panel shows the calibrated risk probability, SHAP force plots that break down each element’s contribution, and auto-generated remediation suggestions in plain language. An offline snapshot of the dashboard interface is available in the Supplementary Materials. The summary results indicate that the average calibrated predicted probability of an NPL jump is approximately 0.93%. Based on the aggregated SHAP contributions of the five internal control elements, the “system resilience strength” is approximately 88.3%, indicating that, in the vast majority of sample banks, the current internal control system exerts a net inhibitory effect on credit risk and buffers potential risk exposure.

In the diagnostic analysis module, the global importance assessment based on bootstrap resampling (Table 5) reveals differences in the contributions of internal control elements. The Control Environment (CE) has the highest weight (mean |SHAP| = 0.592), followed by Information & Communication (IC, 0.463) and Control Activities (CA, 0.422).

In the intervention recommendation module, TreeSHAP generates a contribution profile for each sample bank. By decomposing the prediction results, the model quantifies the marginal driving or buffering effect of each internal control element on the risk probability. On this basis, the IIC-DSS implements a three-tier strategy with “dynamic threshold as the main approach, and quantile distribution as the auxiliary approach”. To ensure sensitivity to tail risks, the system employs an optimal F1 threshold, combined with a head-protective mechanism, to jointly identify high-risk groups, thereby achieving adequate coverage of the top 10% of risk samples. For the remaining banks, the model separates medium from low risk based on the quartile points of the probability distribution. The system then generates differentiated attribution diagnoses that identify the main governance weaknesses and the direction of their contributions, giving management, regulatory authorities, and external investors targeted and actionable improvement recommendations. Representative sample results are shown in Table 6.

We use China Minsheng Bank to illustrate how algorithm results can turn into governance advice. The model estimates a 12.95% chance of an NPL jump and labels the bank as “High Risk.” SHAP then shows which factors increase the risk. The biggest drivers are the Control Environment (SHAP +0.356) and Information and Communication (SHAP +0.351). This points to governance culture and internal information sharing as the core problems, not day-to-day operating errors. Based on this, the IIC-DSS does not suggest adding broad capital buffers. Instead, it recommends specific governance fixes. For example, the bank can redesign internal reporting to reduce information silos and increase board-level monitoring. Conversely, the model supports a “maintenance” strategy for the low-risk Bank of Ningbo. A negative score for “Control Activities” (−0.592) confirms that current procedures effectively reduce risk, meaning no remediation is needed.

5. Conclusions and Limitations

This paper proposes a set of procedures for quantifying complex textual information to evaluate the internal control quality of Chinese listed banks and integrates them with business intelligence to develop a visual, intelligent internal control decision support system (IIC-DSS). The PLS-SEM and XGBoost validation results indicate that this indicator system exhibits good construct validity and performs well in predicting the probability of an increase in non-performing loans. Furthermore, the system dashboard integrates interpretable tools such as TreeSHAP, enabling the model to analyze the marginal contributions of internal control elements and automatically generate diagnostic reports for individual banks, helping them identify governance weaknesses and clarify directions for improvement.

Several limitations remain, mainly related to data coverage, regulatory dependence, and external validity. The analysis uses Chinese A-share-listed banks because their annual reports are standardized and consistently accessible; as a result, the model may not fully capture the risk patterns of non-listed banks. In addition, the textual feature engineering was developed around China’s Basic Standard for Enterprise Internal Control and guidance from the National Financial Regulatory Administration. While COSO and Basel principles are widely applicable, their linguistic realization varies by jurisdiction, so applying the model under regimes such as the U.S. Sarbanes–Oxley Act or the European Banking Authority Guidelines would require revalidating the semantic dictionary. Cross-regional transfer is therefore not plug-and-play: parameters estimated from Chinese disclosures cannot be directly used for EU or U.S. banks, although the modular design supports adaptation. Components that transfer relatively well include the COSO five-element structure, preprocessing logic, the XGBoost framework, and SHAP-based interpretation. By contrast, regulatory seed terms, section-tagging rules, and semantic prototype vectors need to be rebuilt using local regulations and disclosure corpora. Generalizability may also be constrained by institutional differences, including the role of state ownership in China’s banking sector. Future work will broaden validation across regulatory settings, incorporate process-mining logs, and explore more advanced NLP methods to automate governance diagnostics further.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/systems14030234/s1, Figure S1: Methodological roadmap; Table S1: Detailed diagnostic results for the remaining sample banks; Table S2: Regulatory Documents Related to Internal Control of Commercial Banks (International and China); File S1: IIC-DSS Intelligent Internal Control Decision System (Offline Snapshot).

Author Contributions

Conceptualization, Y.L. and X.L.; methodology, Y.L. and X.L.; software, X.L.; validation, Y.L., X.L. and C.S.; formal analysis, Y.L. and X.L.; investigation, X.L.; resources, C.S.; data curation, X.L. and C.S.; writing—original draft preparation, X.L.; writing—review and editing, Y.L. and C.S.; visualization, X.L.; supervision, Y.L.; project administration, C.S.; funding acquisition, C.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Program for the Philosophy and Social Sciences Research of Higher Learning Institutions of Shanxi (No. 2024W066).

Data Availability Statement

The data presented in this study are available from the corresponding author upon request, as they are part of ongoing research.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Structure of the Internal Control Index System.

L1 Element	L2 Dimension	L3 Indicator	Operational Definition & Computation Method
Common Indicators (Applied to all 5 Elements)	Disclosure Breadth	Focus_E	Relative Attention: Sum of hybrid weights w_i,E, normalized by total sentence count; measures the intensity of disclosure for element E.
		Coverage_E	Semantic Coverage: Overlap between bank sentences and element-specific sub-theme embeddings, calculated via dynamic thresholds and sigmoid smoothing.
		Topic Entropy_E	Thematic Diversity: Normalized Shannon entropy of embedding clusters; measures the diversity of topics in element E.
	Disclosure Quality	Readability_E	Linguistic Readability: Weighted average (w_i,E) of sentence readability scores (Gaussian-smoothed sentence length).
		Commit_E	Implementation Strength: Weighted frequency of explicit action verbs (e.g., “establish”, “enforce”), penalizing vague/hedging expressions.
	Distinctiveness	Spec_E	Peer Divergence: Jensen–Shannon Divergence (JSD) between the bank’s topic distribution vector and the peer group average; measures idiosyncrasy.
	Reg. Alignment	Align_E	Prototype Affinity: Weighted cosine similarity between bank sentences and the Regulatory Semantic Prototype (vproto) of element E.
		RegCover_E	Regulatory Breadth: The proportion of regulatory corpus sentences (from the RD database) semantically “covered” (Top-k similarity) by bank disclosures.
	Hard Measures	Meas_E	Quantitative Density: Density of general quantitative tokens (numbers, percentages, currency) per 1000 characters within element E.
CE Control Environment	Specific Measures	GovScore_CE	Governance Structure: Weighted hit rate of corporate governance entities (e.g., “Board”, “Supervisory Board”, “Three Lines of Defense”).
		Ethics_CE	Culture & Ethics: Weighted density of terms related to “integrity”, “code of conduct”, “compliance culture”, and “anti-corruption”.
		Whistle_CE	Whistleblowing Mechanism: Intensity of disclosures regarding reporting channels, whistleblower protection, and anonymous reporting.
RA Risk Assessment	Specific Measures	RDQ_RA	Risk Data Quantification: Density of specific risk metrics (e.g., NPL ratio, LCR, VaR, stress test results) defined in risk dictionaries.
		RiskClass_RA	Risk Taxonomy: The count of distinct risk types (e.g., Credit, Market, Liquidity, Climate) mentioned, normalized by the total risk taxonomy size.
		Foresight_RA	Forward-looking Capability: Weighted frequency of future-oriented modal words and terms like “stress scenario”, “sensitivity analysis”.
CA Control Activities	Specific Measures	CAD_CA	Control Descriptors: Weighted density of procedural control terms (e.g., “approval”, “reconciliation”, “verification”, “limit management”).
		Seg_CA	Segregation of Duties: Intensity of disclosures related to “incompatible posts”, “separation of duties (SoD)”, and “checks and balances”.
		AutoCtrl_CA	Automated Controls: Density of terms related to IT General Controls (ITGC), RPA, system constraints, and rigid control embedding.
		ChgMgmt_CA	Change Management: Hit rate of terms concerning system changes, UAT testing, code review, and version control.
IC Info & Comm	Specific Measures	ChannelDF_IC	Channel Diversity: Weighted summation of distinct communication channel mentions (e.g., “hotline”, “portal”, “app”, “matrix”).
		DataGov_IC	Data Governance: Intensity of terms related to “data quality”, “data lineage”, “standardization”, and “privacy protection”.
		ITInfra_IC	IT Infrastructure Depth: Product of IT infrastructure term density (e.g., “cloud”, “data lake”) and their entropy (diversity).
MA Monitoring Activities	Specific Measures	Assure_MA	Independent Assurance: Weighted presence of external audit terms, assurance opinions, and “unqualified opinion” declarations.
		ContMon_MA	Continuous Monitoring: Density of terms related to “real-time monitoring”, “early warning”, “automatic detection”, and “continuous audit”.
		Remedy_MA	Remediation Loop: Intensity of disclosures regarding “rectification”, “defects”, “tracking”, and “closed-loop management”.
		MonFreq_MA	Monitoring Frequency: Weighted score based on temporal frequency keywords (Real-time = 5 > Daily = 4 > ... > Annual = 1).
		ExtCons_MA	External Constraint: Hit rate of signals regarding regulatory inspections, notifications, and external supervision feedback.

Table A2. Step-by-Step Preprocessing Pipeline for Bank Disclosure Documents.

Step	Technical Implementation	Purpose & Output
Step 1: Format Unification & Parsing	Engine: Primary parsing via PyMuPDF; fallback to pdfminer or pdfplumber OCR: Selective Tesseract OCR is applied only when the text density Filter: Pages containing keywords like “Contents” or “Index” are identified as TOC and discarded.	Converts heterogeneous PDF formats (scanned/digital) into a unified text stream while removing non-substantive navigation pages.
Step 2: Structural Cleaning	Header/Footer Removal: Frequency-based detection; lines appearing on >60% of pages (excluding page numbers) are stripped. Noise Removal: Cleaning of control characters and normalization of Unicode.	Eliminates recurring page artifacts that create false duplicates and inflate noise levels.
Step 3: Segmentation & Normalization	Split: Hybrid sentence segmentation using Regex (handling quotes/brackets) + HanLP NLP toolkit. Min-Length: Sentences < 6 characters are dropped. Conversion: Global conversion of Traditional Chinese to Simplified Chinese (via OpenCC/ZhConv) to unify script variants.	Transforms continuous text blocks into discrete, grammatically complete, and standardized sentence units for analysis.
Step 4: Numeric & Keyword Tagging	We apply regex matching to flag the following: Hard Evidence: Numeric units (%, yuan, times, dates). Boilerplate: High-frequency template phrases (e.g., “The Board guarantees truthfulness”). Vague Terms: Hedging words (e.g., “basically”, “to a certain extent”).	Pre-computation step: Annotates each sentence with binary flags. Note: The specific weighting logic using these tags (e.g., penalties/bonuses) is detailed in Section 3.1.2.
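Steps 2 to 4 of Table A2 can be sketched with the Python standard library alone. This is a simplified illustration, not the authors' pipeline: sentence splitting uses only full-width Chinese terminators in place of the regex-plus-HanLP hybrid, the Chinese hedging terms and unit patterns are hypothetical stand-ins for the paper's dictionaries, and PDF parsing, OCR, and script conversion are omitted.

```python
import re
from collections import Counter

PAGE_NUMBER = re.compile(r"^\s*\d+\s*$")
SENTENCE_END = re.compile(r"(?<=[。！？；])")        # split after a terminator
HARD_EVIDENCE = re.compile(r"\d+(?:\.\d+)?\s*(?:%|元|倍)")
VAGUE_TERMS = ("基本", "一定程度")                   # illustrative hedging words


def strip_repeated_lines(pages, threshold=0.6):
    """Step 2: drop lines that occur on more than `threshold` of all pages."""
    counts = Counter()
    for page in pages:
        lines = {ln.strip() for ln in page.splitlines() if ln.strip()}
        # Page numbers are excluded from the frequency count.
        counts.update(ln for ln in lines if not PAGE_NUMBER.match(ln))
    repeated = {ln for ln, c in counts.items() if c / len(pages) > threshold}
    return ["\n".join(ln for ln in page.splitlines() if ln.strip() not in repeated)
            for page in pages]


def split_sentences(text, min_len=6):
    """Step 3: split on sentence terminators and drop very short fragments."""
    parts = (p.strip() for p in SENTENCE_END.split(text))
    return [p for p in parts if len(p) >= min_len]


def tag_sentence(sentence):
    """Step 4: annotate a sentence with binary evidence and vagueness flags."""
    return {
        "hard_evidence": bool(HARD_EVIDENCE.search(sentence)),
        "vague": any(term in sentence for term in VAGUE_TERMS),
    }


pages = [
    "年度报告 2023\n本行董事会每年审查内部控制体系的有效性。\n1",
    "年度报告 2023\n不良贷款率为1.5%。基本完成。\n2",
    "年度报告 2023\n本行建立了举报人保护机制。\n3",
]
cleaned = strip_repeated_lines(pages)
sentences = split_sentences(cleaned[1])
flags = tag_sentence(sentences[0])
```

In this toy input the repeated header is removed from every page, and the minimum-length filter then discards both the leftover page number and the five-character fragment.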

Appendix B. Examples of Disclosure Assessment Based on the COSO Framework

Appendix B.1. High-Specificity Disclosure (Source: Annual Report)

We analyze the following sentence: “The Board of Directors annually reviews the effectiveness of the internal control system, approves the bank’s risk appetite statement, and ensures that material deficiencies identified by the internal audit department are remediated within 90 days.” The model begins by extracting key signals from the text. Phrases like “Board of Directors” point to the Control Environment, while “risk appetite” maps to Risk Assessment. The specific mention of “remediated within 90 days” supports Monitoring Activities. Additionally, the “internal audit department” indicates a clear reporting channel. Next, the embedding step compares the text to standard categories. It finds the strongest match with Control Environment (0.82), followed by Risk Assessment and Monitoring. The model then mixes these rule-based and embedding results. This hybrid approach produces a primary score focused on the Control Environment. The model also evaluates the quality of the writing. The sentence is readable and precise. Using concrete terms like “90 days” avoids vagueness and earns a quality bonus. As a result, the sentence retains its full weight. Ultimately, the analysis treats this as strong evidence of oversight, which improves the scores for Control Environment and Monitoring.

Appendix B.2. Low-Specificity Disclosure (Source: ESG Report)

We then evaluate the following sentence: “The Bank continuously improves its internal control management system and strives to ensure strict compliance with relevant national laws and regulations to support sustainable development.” In the first stage, rule-based keyword matching produces only weak signals. “Internal control management system” registers a mild hit under Control Activities, but the term is generic and lacks specificity. Similarly, “compliance” loosely maps to the Control Environment, though it reads more as a broad aspiration than a concrete governance mechanism. In the embedding check, the sentence sits close to a boilerplate pattern (0.85). It is far from the COSO component prototypes, with all similarity scores below 0.35, so the topic focus is unclear. The hybrid step combines both sources, resulting in a scattered set of probabilities in which no single component stands out. In the quality step, the base score is 0.90. The algorithm applies a −35% boilerplate penalty because the phrasing follows a common template, such as “continuously improves…”. It also applies a −25% vagueness penalty due to soft words such as “strives to,” “relevant,” and “support,” as well as the lack of concrete actions or targets. There is no numeric or verifiable detail, so no bonus is added. The final weight is 0.36 (0.90 × (1 − 0.35 − 0.25)). As a result, the sentence is down-weighted by 64% and adds little to the index. It is treated as defensive wording rather than solid evidence, which helps prevent score inflation from vague disclosure.
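The weight arithmetic in this example can be reproduced with a short Python sketch. It encodes only the additive-penalty form shown in the text, 0.90 × (1 − 0.35 − 0.25); how quality bonuses enter the formula is not specified here, so they are left out. The function name is hypothetical.

```python
def sentence_weight(base_score, penalties=()):
    """Quality weight = base score times (1 - sum of penalties), floored at zero."""
    factor = max(0.0, 1.0 - sum(penalties))
    return base_score * factor


# Values from the low-specificity example: base 0.90, boilerplate penalty 0.35,
# vagueness penalty 0.25.
w_low = sentence_weight(0.90, penalties=(0.35, 0.25))
down_weighting = 1.0 - w_low
```

The resulting weight of about 0.36 corresponds to the 64% down-weighting reported for this sentence.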

Appendix C. Sensitivity Analysis of Hyperparameter α

Table A3. Robustness of the Internal Control Index (ICI) under Different α Values.

Comparison Scenario	Spearman’s ρ	Top 20% Overlap	Mean Absolute Rank Change	Quartile Change Rate
α = 0.0 vs. Baseline	0.319	0.340	11.34	0.677
α = 0.2 vs. Baseline	0.976	0.888	1.84	0.187
α = 0.4 vs. Baseline	0.976	0.843	1.85	0.175
α = 0.6 vs. Baseline	0.966	0.822	2.31	0.215
α = 0.8 vs. Baseline	0.941	0.778	2.89	0.256
Overall Consistency (Kendall’s W)	0.793

Notes: This table compares the ICI derived from fixed α values against the baseline (dynamically optimized α). Metrics are averages across the sample period. Kendall’s W is calculated across all scenarios (including baseline) to measure global agreement.
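The rank-comparison metrics in Table A3 can be sketched in pure Python. The sketch assumes untied scores (so the closed-form Spearman formula applies) and defines the top-20% overlap as the shared share of the top-ranked set, which may differ from the authors' exact definition. The scores below are hypothetical and do not reproduce the table.

```python
def ranks(scores):
    """Rank 1 = highest score; assumes there are no ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for position, i in enumerate(order, start=1):
        result[i] = position
    return result


def spearman_rho(a, b):
    """Spearman rank correlation via 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(a)
    d2 = sum((x - y) ** 2 for x, y in zip(ranks(a), ranks(b)))
    return 1.0 - 6.0 * d2 / (n * (n ** 2 - 1))


def mean_abs_rank_change(a, b):
    """Average absolute difference between the two rankings."""
    return sum(abs(x - y) for x, y in zip(ranks(a), ranks(b))) / len(a)


def top_share_overlap(a, b, frac=0.20):
    """Share of the top `frac` units under `a` that are also top under `b`."""
    k = max(1, int(round(frac * len(a))))
    top_a = {i for i, r in enumerate(ranks(a)) if r <= k}
    top_b = {i for i, r in enumerate(ranks(b)) if r <= k}
    return len(top_a & top_b) / k


baseline = [0.9, 0.8, 0.7, 0.6, 0.5]         # hypothetical ICI under the baseline
fixed_alpha = [0.8, 0.9, 0.7, 0.6, 0.5]      # hypothetical ICI under a fixed alpha
rho = spearman_rho(baseline, fixed_alpha)
marc = mean_abs_rank_change(baseline, fixed_alpha)
overlap = top_share_overlap(baseline, fixed_alpha)
```

Swapping only the top two banks leaves the overall rank correlation high while the top-20% overlap drops to zero, which illustrates why the table reports several complementary metrics.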

Table A4. Structural Stability of Internal Control Components.

Panel A: Global Consistency Across All Scenarios
Element	Kendall’s W	Coefficient of Variation	Stability Assessment
Control Activities (CA)	0.899	0.045	Very High
Control Environment (CE)	0.825	0.077	High
Risk Assessment (RA)	0.800	0.057	High
Monitoring Activities (MA)	0.749	0.117	Moderate
Information & Comm (IC)	0.705	0.119	Moderate
Panel B: Boundary Testing (α = 0.8 vs. Baseline)
Element	Spearman’s ρ	MARC (Rank Change)	Interpretation of α = 0.8 Impact
Control Activities (CA)	0.961	2.36	Robust to high semantic weight.
Control Environment (CE)	0.928	3.23	Benefits from high semantic weight.
Risk Assessment (RA)	0.881	3.88	Degraded: Semantic drift dilutes governance rules.
Monitoring Activities (MA)	0.778	5.79	Degraded: Audit outcomes require strict rule matching.
Information & Comm (IC)	0.736	6.73	Degraded: Risk metrics require precise quantification.

Note: This table assesses whether the five internal control elements remain stable. Panel A reports the global consistency (Kendall’s W) across all α scenarios. Panel B highlights the degradation of specific elements (RA, MA, and IC) when α exceeds the 0.6 cap.
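For readers unfamiliar with Kendall’s W, the coefficient of concordance used in Tables A3 and A4, the following sketch shows the standard formula without tie correction. Treating each α scenario as one "rater" that ranks the same set of banks is an assumed reading of the table notes, and the toy rankings are invented.

```python
def kendalls_w(rank_rows):
    """Kendall's coefficient of concordance, without tie correction.

    rank_rows: one list of ranks (1..n, no ties) per scenario, all over
    the same n items. W = 12 * S / (m^2 * (n^3 - n)), where S is the sum
    of squared deviations of the per-item rank totals from their mean.
    """
    m = len(rank_rows)
    n = len(rank_rows[0])
    totals = [sum(row[i] for row in rank_rows) for i in range(n)]
    mean_total = m * (n + 1) / 2
    s = sum((t - mean_total) ** 2 for t in totals)
    return 12 * s / (m ** 2 * (n ** 3 - n))


# Hypothetical rankings of four banks under three scenarios.
perfect = kendalls_w([[1, 2, 3, 4], [1, 2, 3, 4], [1, 2, 3, 4]])
partial = kendalls_w([[1, 2, 3, 4], [2, 1, 3, 4], [1, 3, 2, 4]])
print(perfect, round(partial, 3))
```

W ranges from 0 (no agreement) to 1 (identical rankings in every scenario), so values near 0.8 indicate rankings that largely, but not fully, agree.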

Figure 1. End-to-end methodological framework of the IIC-DSS. The architecture integrates “unstructured disclosure text → structured evidence → formative index → validation and prediction → executable interventions” into a traceable analytical chain. Layer 1 parses PDF reports to build a structure-aware corpus, then constructs a dual-driven knowledge base by combining a rule-based regulatory dictionary with embedding-based semantic prototype vectors. Layer 2 applies an optimized hybrid membership-probability algorithm and a quality filter to convert unstructured disclosures into a quality-weighted component evidence matrix, mapping sentences to internal control components. Layer 3 uses adaptive Otsu thresholding to extract representative evidence, then aggregates micro-level indicators into a hierarchical formative index, IC-5Q, via the CRITIC method and a game-theoretic combined-weighting scheme. Layer 4 integrates measurement validity testing using PLS-SEM with rolling-window predictive validity testing using XGBoost; TreeSHAP attribution is used for diagnostic explanation, yielding actionable governance intervention recommendations. A control-theoretic feedback loop (red dashed line) feeds these insights back to guide future refinement of disclosure and internal control improvement.

Figure 2. Hierarchical Formative PLS-SEM Path Model. * p < 0.05, *** p < 0.001.

Table 1. Mapping of Research Questions to Methodological Framework.

Research Question (RQ)	Corresponding Section	Key Methods & Techniques
(RQ1) How can unstructured text in bank reports be turned into a multidimensional quantitative framework?	Section 3.1	Knowledge Base Construction: Merging regulatory rules with embedding-based semantic prototypes. Neural-Symbolic Mapping: Using hybrid membership probability to map sentences to internal control components. Index Aggregation: Constructing the hierarchical formative index (IC-5Q) via CRITIC and game-theoretic weighting.
(RQ2) Does a model that includes text-mined internal-control variables predict outcomes significantly better?	Section 3.2	Construct Validity: Formative PLS-SEM to verify the structural relationship of indicators. Predictive Validity: Out-of-sample rolling-window forecasting using XGBoost to test the incremental predictive power of the Internal Control Index (ICI) on asset quality.
(RQ3) How can text-driven indicators be operationalized to support model interpretation and risk prioritization?	Section 3.3	Explainable AI: Applying TreeSHAP to isolate marginal contributions of control factors. Risk Calibration: Using Platt scaling and dynamic thresholding for tiered risk management. Decision Support: Integrating Natural Language Generation into a BI dashboard (IIC-DSS) for actionable governance interventions.

Table 2. Descriptive Statistics of Main Variables.

Variable	Definition	Mean	SD	Min	Median	Max
Panel A: Internal Control Indices
ICI	Composite Internal Control Index (0–100)	47.644	9.275	20.338	48.800	69.201
CE	Control Environment Index	52.112	8.631	23.485	52.921	81.123
RA	Risk Assessment Index	51.566	10.741	19.514	52.293	84.664
CA	Control Activities Index	42.397	12.365	13.345	43.153	74.968
IC	Information & Communication Index	47.907	11.465	15.270	48.623	77.893
MA	Monitoring Activities Index	45.267	10.644	18.032	45.817	72.196
Panel B: Benchmark & Outcome Variables
ICDI	DIB Internal Control Index	39.892	4.984	5.530	39.884	54.190
NPL	Non-Performing Loan Ratio	0.014	0.004	0.007	0.014	0.025
Panel C: Control Variables
lnAssets	Natural Log of Total Assets	27.934	1.686	24.992	27.730	31.519
ROE	Return on Equity (decimal fraction)	0.116	0.029	0.034	0.113	0.264
CAR	Capital Adequacy Ratio (decimal fraction)	0.139	0.019	0.105	0.136	0.339
LDR	Loan-to-Deposit Ratio	0.740	0.107	0.349	0.755	1.052
Leverage	Leverage Ratio	0.066	0.010	0.036	0.066	0.097

Notes: The sample consists of 420 bank-year observations representing 42 banks over a 10-year period. All Internal Control Indices in Panel A are standardized to a scale of 0 to 100.

Table 3. PLS-SEM Validation Results for the IC-5Q/ICI System.

Stage/Aspect	Construct/Path	Weight/Coeff. (β)	SE	95% BCa CI	R²
Stage 1: First-Order Constructs
Formative Weights (Range)	L2 → CE (6 items)	0.302–0.541 ***	0.018–0.048	All Positive	–
	L2 → RA (6 items)	0.269–0.511 ***	0.019–0.049	All Positive	–
	L2 → CA (6 items)	0.244–0.517 ***	0.012–0.030	All Positive	–
	L2 → IC (6 items)	0.014–0.471	0.017–0.039	[−0.021, 0.552]	–
	L2 → MA (6 items)	0.214–0.434 ***	0.013–0.043	All Positive	–
Redundancy Analysis (Convergent Validity)	CE → Target_CE	0.995 ***	0.001	[0.992, 0.996]	0.989
	RA → Target_RA	0.989 ***	0.002	[0.986, 0.992]	0.979
	CA → Target_CA	0.995 ***	0.001	[0.994, 0.996]	0.990
	IC → Target_IC	0.979 ***	0.004	[0.969, 0.986]	0.959
	MA → Target_MA	0.997 ***	0.0005	[0.996, 0.998]	0.994
Stage 2: Second-Order ICI
Formative Weights	CE → ICI	0.162 ***	0.032	[0.093, 0.216]	–
	RA → ICI	0.218 ***	0.031	[0.155, 0.274]	–
	CA → ICI	0.222 ***	0.026	[0.180, 0.284]	–
	IC → ICI	0.319 ***	0.023	[0.270, 0.361]	–
	MA → ICI	0.258 ***	0.035	[0.199, 0.333]	–
Redundancy Analysis	ICI → Target_ICI	0.994 ***	0.001	[0.991, 0.995]	0.988
Criterion Validity	ICI → ICDI	0.244 *	0.087	[0.060, 0.397]	0.060

Notes: * p < 0.05, *** p < 0.001. Standard errors and confidence intervals are obtained from 800 cluster bootstrap samples, with “bank” as the clustering unit and a BCa adjustment.
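The resampling idea behind a cluster bootstrap with "bank" as the clustering unit can be sketched as below. This is a simplified illustration: it uses a plain percentile interval instead of the BCa adjustment reported in the notes, the function name and toy panel are hypothetical, and the statistic is a simple mean, not a PLS-SEM path coefficient.

```python
import random
from statistics import mean


def cluster_bootstrap_ci(data_by_bank, statistic, n_boot=800, alpha=0.05, seed=0):
    """Percentile confidence interval from resampling whole banks.

    Banks are drawn with replacement and all of a bank's observations
    move together, so within-bank dependence is preserved.
    """
    rng = random.Random(seed)
    banks = list(data_by_bank)
    estimates = []
    for _ in range(n_boot):
        sample = []
        for bank in rng.choices(banks, k=len(banks)):
            sample.extend(data_by_bank[bank])
        estimates.append(statistic(sample))
    estimates.sort()
    lower = estimates[int((alpha / 2) * (n_boot - 1))]
    upper = estimates[int((1 - alpha / 2) * (n_boot - 1))]
    return lower, upper


# Hypothetical toy panel: bank -> yearly index values (not the paper's data).
toy = {
    "bank_a": [48.0, 50.5, 52.0],
    "bank_b": [39.0, 41.5, 40.0],
    "bank_c": [55.0, 57.5, 56.0],
    "bank_d": [45.0, 44.0, 46.5],
}
ci_low, ci_high = cluster_bootstrap_ci(toy, mean)
print(round(ci_low, 2), round(ci_high, 2))
```

Resampling banks and not individual bank-years avoids understating standard errors when observations from the same bank are correlated over time.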

Table 4. Out-of-Sample Prediction Performance for NPL Jumps (2020–2023 Rolling CV).

Model	Specification	ROC-AUC	PR-AUC	Brier	Best F1	Top-10 Capture
XGBoost	Controls + ESG + ICI + ICI × (lnAssets, NPL_lag1, ROE)	0.909	0.0909	0.179	0.167	0.667
XGBoost	Controls + ICI + ICI × (lnAssets, NPL_lag1, ROE)	0.755	0.0577	0.161	0.129	0.667
XGBoost	Controls + ESG	0.758	0.0596	0.155	0.133	0.333
XGBoost	Controls Only	0.752	0.0559	0.167	0.125	0.333
XGBoost	Controls + G-score + ICI + ICI × (lnAssets, NPL_lag1, ROE)	0.830	0.0508	0.333	0.097	0.333
XGBoost	Controls + ICDI + ICDI × (lnAssets, NPL_lag1, ROE)	0.570	0.0241	0.202	0.056	0.000

Notes: All metrics are computed from out-of-sample predictions obtained via annual rolling cross-validation for 2020–2023 and then pooled. Control variables include lnAssets, capital adequacy ratio (CAR), return on equity (ROE), lagged NPL ratio (NPL_lag1), leverage ratio, and loan-to-deposit ratio (LDR). ICI is the newly constructed composite internal control index; ICDI is the DIB internal control index. “XGBoost” denotes gradient-boosted decision trees. The Top-10 capture rate is, for each year, the proportion of all actual NPL jump events that fall within the top 10% of observations ranked by predicted jump probability.
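The Top-10 capture rate defined in the notes can be sketched per year as follows. Rounding the size of the top 10% upward, skipping years without any jump event, and the function and variable names are assumptions for illustration; the toy records are invented.

```python
import math
from collections import defaultdict


def top_decile_capture(records, share=0.10):
    """Per-year share of actual jump events ranked in the top `share`
    of observations by predicted probability.

    records: iterable of (year, predicted_probability, jumped) tuples,
    with jumped equal to 1 for an NPL jump and 0 otherwise.
    """
    by_year = defaultdict(list)
    for year, prob, jumped in records:
        by_year[year].append((prob, jumped))

    rates = {}
    for year, rows in by_year.items():
        total_events = sum(j for _, j in rows)
        if total_events == 0:
            continue  # the rate is undefined when no event occurred
        # Assumption: the top group size is rounded up; round() guards
        # against floating-point noise such as 0.1 * 30 = 3.0000000000000004.
        k = math.ceil(round(share * len(rows), 9))
        top = sorted(rows, key=lambda r: r[0], reverse=True)[:k]
        rates[year] = sum(j for _, j in top) / total_events
    return rates


# Hypothetical records: ten banks in 2020 with two events, none in 2021.
probs = [0.90, 0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.05, 0.02]
jumps = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
toy_records = [(2020, p, j) for p, j in zip(probs, jumps)]
toy_records += [(2021, 0.5, 0), (2021, 0.1, 0)]
rates = top_decile_capture(toy_records)
print(rates)
```

In the toy data one of the two 2020 events falls in the top 10% of predictions, so the capture rate for that year is 0.5. How the yearly rates are combined into the single figure in Table 4 is not specified here and is left open.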

Table 5. SHAP Importance of Internal Control Elements (Bootstrap 95% CI).

Element	Mean \|SHAP\|	95% CI Lower	95% CI Upper
Control Environment (CE)	0.592	0.471	0.726
Risk Assessment (RA)	0.346	0.302	0.394
Control Activities (CA)	0.422	0.367	0.475
Information & Communication (IC)	0.463	0.421	0.505
Monitoring Activities (MA)	0.336	0.282	0.388

Table 6. IIC-DSS intelligent reporting: summary of diagnostic results for representative banks.

Bank	Predicted Jump Probability	Risk Level	Diagnostic Summary (Based on Local SHAP Attribution)
China Minsheng Bank	12.947%	High	Classified as HIGH RISK. Key weaknesses: Control Environment (CE) (contribution +0.356); Information & Communication (IC) (contribution +0.351).
Xi’an Bank	14.196%	High	Classified as HIGH RISK. Key weaknesses: Information & Communication (IC) (contribution +0.293).
China Merchants Bank	0.124%	Low	Classified as LOW RISK. Key strength: Control Environment (CE) (contribution −0.530).
Bank of Shanghai	0.294%	Low	Classified as LOW RISK. Key strength: Control Environment (CE) (contribution −0.618).
Industrial and Commercial Bank of China	0.035%	Low	Classified as LOW RISK. Key strength: Control Environment (CE) (contribution −1.252).
Shanghai Pudong Development Bank	1.671%	Medium	Classified as MEDIUM ATTENTION. Main potential issue: Control Environment (CE) (contribution +0.423).
Bank of Ningbo	0.081%	Low	Classified as LOW RISK. Key strength: Control Activities (CA) (contribution −0.592).

Notes: The banks listed in this table are representative examples; the complete results for the whole sample are provided in the Supplementary Materials.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, Y.; Li, X.; Su, C. From Unstructured Text to Automated Insights: An Explainable AI Approach to Internal Control in Banking Systems. Systems 2026, 14, 234. https://doi.org/10.3390/systems14030234

