Article

Intelligent Information Processing for Corporate Performance Prediction: A Hybrid Natural Language Processing (NLP) and Deep Learning Approach

1 Department of Business Administration, Graduate School, Semyung University, 65 Semyungro, Jecheonsi 27136, Republic of Korea
2 School of Economics and Finance, Hankou University, No. 299, Culture Av. Jiangxia District, Wuhan 430200, China
3 Department of Postgraduate Affairs, Nanjing University of Information Science and Technology (NUIST), No. 219 Ningliu Road, Pukou District, Nanjing 210044, China
4 Department of Accounting, The Catholic University of Korea, 43 Jibongro, Wonmigu, Bucheonsi 14662, Republic of Korea
5 Department of Accounting and Taxation, Semyung University, 65 Semyeongro, Jecheonsi 27136, Republic of Korea
* Author to whom correspondence should be addressed.
These authors contributed equally to this work as co-first authors.
Electronics 2026, 15(2), 443; https://doi.org/10.3390/electronics15020443
Submission received: 26 November 2025 / Revised: 28 December 2025 / Accepted: 29 December 2025 / Published: 20 January 2026
(This article belongs to the Special Issue Advances in Intelligent Information Processing)

Abstract

This study proposes a hybrid machine learning framework that integrates structured financial indicators and unstructured textual strategy disclosures to improve firm-level management performance prediction. Using corporate business reports from South Korean listed firms, strategic text was extracted and categorized under the Balanced Scorecard (BSC) framework into financial, customer, internal process, and learning and growth dimensions. Various machine learning and deep learning models—including k-nearest neighbors (KNN), support vector machine (SVM), light gradient boosting machine (LightGBM), convolutional neural network (CNN), long short-term memory (LSTM), autoencoder, and transformer—were evaluated, with results showing that the inclusion of strategic textual data significantly enhanced prediction accuracy, precision, recall, area under the curve (AUC), and F1-score. Among individual models, the transformer architecture demonstrated superior performance in extracting context-rich semantic features. A soft-voting ensemble model combining autoencoder, LSTM, and transformer achieved the best overall performance, leading in accuracy and AUC, while the best single deep learning model (transformer) obtained a marginally higher F1-score, confirming the value of hybrid learning. Furthermore, analysis revealed that customer-oriented strategy disclosures were the most predictive among BSC dimensions. These findings highlight the value of integrating financial and narrative data using advanced NLP and artificial intelligence (AI) techniques to develop interpretable and robust corporate performance forecasting models. In addition, we operationalize information security narratives using a reproducible cybersecurity lexicon and derive security disclosure intensity and weight share features that are jointly evaluated with BSC-based strategic vectors.

1. Introduction

The rapid advancement of AI, particularly in the domains of machine learning and deep learning, has significantly expanded the methodological frontier for analyzing corporate performance. Traditional models have relied primarily on structured financial variables such as return on assets (ROA), firm size, and leverage. However, these metrics often fail to capture the strategic intent and qualitative direction underlying managerial decision-making processes. To address this gap, recent scholarship has increasingly integrated unstructured data—particularly textual disclosures in corporate business reports—into performance prediction models, enabling deeper contextual understanding of firms’ strategic trajectories [1,2].
Corporate business reports contain rich narrative elements that not only reflect management’s strategic orientation, innovation priorities, and sustainability goals, but also embed signals relevant to corporate risk and information security. Strategic disclosures often include references to digital transformation, cybersecurity investments, and data governance frameworks, which are critical in today’s environment where cyber threats and information leakage directly influence firm performance. Empirical studies have confirmed the predictive value of such narrative disclosures. For instance, Luo et al. (2020) showed that the tone of quarterly reports conveys managerial expectations, while Jegadeesh and Wu (2021) demonstrated that strategic keywords, when structured under frameworks such as the BSC, correlate with performance outcomes [3,4]. Importantly, when these disclosures involve security-related strategies—such as system resilience, privacy protection, and compliance—they also signal how firms mitigate vulnerabilities that could undermine operational continuity and financial stability.
Advances in NLP and AI architectures, particularly transformer-based models, have further enabled the extraction of context-rich features from unstructured texts, including subtle cues related to information security strategies. By integrating financial data with narrative security disclosures, AI models can provide not only more accurate forecasts of firm performance but also insights into how cybersecurity readiness contributes to competitive advantage. Recent research underscores that breaches in corporate information security have tangible impacts on market value and operational efficiency, highlighting the necessity of embedding security-oriented variables into performance prediction frameworks [5,6].
Despite these advancements, few studies have systematically combined structured financial indicators with strategy-oriented textual variables that explicitly capture security-related narratives. Even fewer have applied the Balanced Scorecard framework to classify such disclosures into financial, customer, internal process, and learning and growth perspectives, thereby evaluating the relative contribution of security-driven strategies to firm performance. This gap suggests an opportunity to examine how the convergence of structured financial data, narrative strategy disclosures, and information security considerations—when mediated by advanced AI models—can yield robust and interpretable forecasts of managerial outcomes.
To address this gap, the present study proposes a hybrid machine learning framework that synthesizes structured financial indicators with BSC-classified strategic texts extracted from corporate business reports, including disclosures related to cybersecurity and data governance. By employing deep learning models such as autoencoder, LSTM, and transformer within a soft-voting ensemble architecture, the research aims to capture both the strategic intent and security posture embedded in corporate narratives. Specifically, cybersecurity narratives are captured through a transparent keyword lexicon and encoded both as a standalone security disclosure signal and as security-related terms distributed across the four BSC perspectives, thereby aligning the conceptual framing with empirical implementation.
This study offers several contributions. First, it demonstrates that integrating narrative strategy disclosures, particularly those referencing information security and digital resilience, significantly improves the predictive power of AI-based models. Second, it identifies which BSC strategic dimensions—including security-relevant internal processes and customer trust perspectives—influence corporate performance the most. Third, it provides empirical evidence from South Korean firms, contributing regional specificity to a predominantly Western-centric literature. Lastly, the study underscores the practical importance of information security narratives in corporate strategy, offering insights valuable to investors, analysts, and policymakers concerned with both performance forecasting and risk management.
In summary, this research advances the field of AI-driven corporate analytics by illustrating how strategic narratives—especially those reflecting cybersecurity and governance practices—can be systematically leveraged alongside financial data to enhance performance prediction. The integration of financial ratios, security-oriented disclosures, and NLP-derived strategy variables offers a novel modeling approach that enriches both theoretical understanding and practical application in the era of digital risk.

2. Related Literature and Research Questions

The evolution of AI, particularly through machine learning (ML) and deep learning (DL), has significantly expanded the methodological toolkit for predicting corporate performance. Traditionally, structured financial indicators such as ROA, leverage, and firm size have served as the primary inputs for such predictions. However, there is growing recognition that unstructured strategic disclosures—especially those embedded in corporate business reports—can offer complementary insights that enrich performance forecasting models.
Textual elements in corporate disclosures, including managerial commentary, strategic goals, and sustainability narratives, convey nuanced information about firms’ future directions and operational priorities. Luo et al. (2020) demonstrated that the tone of quarterly reports reflects managerial sentiment and future expectations, which correlate meaningfully with financial performance [3]. In a similar vein, Jegadeesh and Wu (2021) employed a content analysis approach to identify strategic intent from textual data, revealing the predictive power of strategic keywords structured under frameworks like the BSC [4].
The integration of environmental, social, and governance (ESG) factors into textual analytics has further broadened the scope of performance prediction. For instance, Li and Xu (2024) confirmed that ESG ratings serve as a proxy for lower corporate carbon emissions, suggesting that narrative ESG disclosures can reflect underlying operational behavior [1]. Complementing this, Chen et al. (2021) showed that incorporating ESG variables into portfolio optimization enhances decision-making under sustainability constraints [2]. These studies support the view that textual strategy disclosures are more than symbolic—they are predictive of real-world outcomes.
Prior research directly examines how information security and cybersecurity-related disclosures map into firm value and performance in capital markets. Using U.S. filing-based disclosure indicators, Gordon et al. (2010) report that voluntary information security disclosures are positively associated with firm market value, consistent with a signaling interpretation [7]. Complementary evidence shows that the content of disclosed information security risk factors helps predict subsequent breach realizations and shapes market reactions to breach events [8]. Recent accounting research further documents that firms strategically adjust cybersecurity risk disclosures following breach incidents, suggesting that disclosure is used to mitigate information asymmetry and reputational concerns [9]. In addition, external, verifiable security signals such as information security certification announcements have been linked to positive valuation effects, reinforcing the performance relevance of security-related communication [10].
Alongside textual data, advances in model architecture have played a pivotal role. Transformer-based models, known for their ability to capture long-range dependencies and contextual meaning, have outperformed traditional models in various domains. Li et al. (2021) demonstrated their utility in time-series forecasting, emphasizing their robustness in handling sequential and contextualized inputs [6]. Moreover, Sarwar et al. (2024) highlighted the power of combining NLP-derived insights with ensemble learning methods to yield more accurate and interpretable predictive outputs [5].
Critically, these advanced models require careful hyperparameter optimization to achieve peak performance. Bergstra and Bengio (2012) showed that random search can outperform traditional grid search in high-dimensional spaces, while Probst et al. (2019) demonstrated that certain hyperparameters disproportionately affect model accuracy, emphasizing the importance of tunability [11,12].
From a regional standpoint, Lee and Cho (2021) analyzed South Korean firms and found that narrative disclosures related to carbon emissions and sustainability had statistically significant effects on firm valuation [13]. This affirms the relevance of forward-looking strategic language in influencing both market perception and operational performance.
Despite these contributions, few studies have attempted to systematically combine structured financial indicators with strategy-oriented textual variables—particularly those categorized through the BSC framework—into a unified predictive system. Even fewer have explored hybrid model architectures that aggregate outputs from multiple advanced classifiers to enhance predictive generalizability and robustness.
To address these research gaps, this study develops and evaluates a series of machine learning models that integrate both structured and unstructured data. Strategic keywords are extracted from corporate business reports and categorized using the BSC framework, while traditional financial indicators serve as complementary inputs. A hybrid ensemble model, leveraging autoencoder, LSTM, and transformer outputs, is proposed to synthesize the unique strengths of each model type.
Based on this review, the following research questions are formulated:
(1)
Does the inclusion of strategic textual information from business reports improve the prediction accuracy of corporate management performance compared to models based solely on financial variables?
(2)
Which strategic perspective—classified via the Balanced Scorecard framework (i.e., financial, customer, internal process, learning and growth)—exerts the strongest influence on corporate performance predictions?
(3)
Do hybrid ensemble models that integrate the outputs of multiple deep learning architectures outperform individual machine learning or deep learning models in predicting firm-level management performance?

3. Data and Methods

3.1. Text Data

The company’s business report provides a detailed account of its strategies and future prospects [10]. This report encompasses the company’s management strategies, objectives, future business plans, and anticipated changes in the market environment [14,15]. In this study, text data from the business report was extracted using text mining techniques and subsequently processed using NLP methods [16,17]. During the text preprocessing stage, unnecessary punctuation marks and special characters were eliminated through text normalization. Additionally, redundant or irrelevant words were removed via stop word removal, and tokenization was performed by dividing the text into words, sentences, or other meaningful units.
The text data extracted in this study was then represented as vectors using the bag-of-words (BoW) model [18]. Significant recurring terms were identified by calculating the term frequency–inverse document frequency (TF-IDF) values [19]. Words with high TF-IDF values were considered keywords and selected accordingly.
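The TF-IDF scoring and threshold-based keyword selection described above can be sketched in plain Python. This is an illustrative implementation only; the paper does not disclose its exact TF and IDF weighting variants, so raw term counts and an unsmoothed log IDF are assumed here.

```python
import math
from collections import Counter

def tfidf_keywords(docs, threshold=2.0):
    """Score terms per document with TF-IDF and keep those above threshold.

    docs: list of tokenized documents (lists of strings).
    TF is the raw term count in a document; IDF is log(N / df).
    The 2.0 cut-off mirrors the keyword-selection rule described in the text.
    """
    n_docs = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        keywords.append({t: s for t, s in scores.items() if s >= threshold})
    return keywords
```

Terms that occur in every document receive an IDF of zero and are filtered out automatically, which approximates the suppression of generic narrative tokens noted above.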
Subsequently, keywords related to management strategies, extracted from the business report using the BSC framework, were classified into the categories of financial, customer, internal process, and learning and growth perspectives [20,21]. This classification aimed to identify which perspective—financial, customer, internal process, or learning and growth—the extracted keywords were most frequently associated with, thereby determining the primary strategic focus emphasized by the company.
Keywords were selected based on a TF-IDF value threshold of 2.0 or higher, and the company’s emphasized management strategy was classified according to the perspective—financial, customer, internal process, or learning and growth—to which the majority of these high-TF-IDF keywords were assigned [20,21]. We selected the 2.0 threshold as a conservative cut-off that retains high-salience terms while suppressing generic narrative tokens. To justify this cut-off, we conducted a pilot sensitivity analysis varying the TF-IDF threshold (1.5, 2.0, and 2.5) and examining the resulting keyword set size, BSC label stability, and downstream prediction performance. The threshold of 2.0 provided a balanced trade-off, filtering generic terms while retaining sufficiently informative, strategy-related keywords for reliable BSC classification. The main conclusions remained stable under adjacent thresholds, indicating that the results are not driven by this specific parameter choice.
For instance, if four keywords were associated with the financial perspective, five with the customer perspective, and three with the internal process perspective, the company’s management strategy was determined to emphasize the customer perspective. As a result, a value of one was assigned to the customer variable, while zero was assigned to the remaining BSC framework variables. Through this methodology, the study quantified the management strategy information that companies prioritize.
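The winner-takes-all coding illustrated above (four financial, five customer, three internal process keywords yielding Customer = 1) can be expressed as a small helper. This is a minimal sketch; the paper does not specify a tie-breaking rule, so ties here fall to whichever perspective appears first in the input dictionary.

```python
def bsc_dummies(counts):
    """Assign one to the BSC perspective with the most high-TF-IDF keywords
    and zero to the rest.

    counts: dict mapping perspective name to its high-TF-IDF keyword count,
    e.g. {"Financial": 4, "Customer": 5, "Internal": 3, "Learning": 0}.
    """
    top = max(counts, key=counts.get)   # dominant perspective
    return {p: int(p == top) for p in counts}
```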
The management strategy data utilized in this study were directly sourced from the business reports published on the Financial Supervisory Service’s electronic disclosure system (http://dart.fss.or.kr) (accessed on 25 January 2025). The business report data collected includes the corporate vision, goals, and business outlook for each company on an annual basis.
Security narratives were operationalized in a reproducible manner using a cybersecurity disclosure lexicon compiled from prior information security disclosure research and practitioner taxonomies. After applying the same preprocessing steps as the strategic text pipeline, we identified security-related keywords (e.g., breach, vulnerability, incident response, access control, encryption, privacy, authentication, compliance, and security governance) and calculated two firm-year measures: (i) Security Disclosure Intensity, defined as the proportion of security keywords among all extracted keywords, and (ii) Security Weight Share, defined as the share of summed keyword weights (TF-IDF or Okapi BM25 (BM25)) attributable to security keywords.
These security measures provide an explicit and auditable “information security” dimension that can be replicated with the disclosed lexicon and coding rules, and they are used in extended model specifications and validation exercises.
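The two firm-year security measures defined above can be sketched directly from a firm's extracted keyword weights. The lexicon below uses only the example terms listed in the text; the full disclosed lexicon is assumed to be larger.

```python
# Example subset of the cybersecurity lexicon named in the text.
SECURITY_LEXICON = {
    "breach", "vulnerability", "incident response", "access control",
    "encryption", "privacy", "authentication", "compliance",
    "security governance",
}

def security_measures(keyword_weights, lexicon=SECURITY_LEXICON):
    """Compute (Security Disclosure Intensity, Security Weight Share).

    keyword_weights: dict mapping each extracted keyword to its TF-IDF or
    BM25 weight for one firm-year.
    Intensity    = security keywords / all extracted keywords.
    Weight share = summed security-keyword weights / summed weights.
    """
    total = len(keyword_weights)
    total_w = sum(keyword_weights.values())
    sec = {k: w for k, w in keyword_weights.items() if k in lexicon}
    intensity = len(sec) / total if total else 0.0
    weight_share = sum(sec.values()) / total_w if total_w else 0.0
    return intensity, weight_share
```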

3.2. Classification of Strategic Texts Under the BSC Framework

In line with the BSC framework [21], the strategic orientation of each firm was classified into four dimensions: financial, customer, internal process, and learning and growth. This approach allows for a structured interpretation of narrative management strategies disclosed in corporate business reports by aligning textual content with key strategic perspectives that collectively reflect a company’s overall performance orientation [20].
To operationalize this classification, text-mined keywords extracted through the TF-IDF method were mapped to the most relevant BSC dimension based on their semantic association with corresponding management themes. Specifically, keywords such as profitability, revenue growth, cost efficiency, and financial leverage were linked to the financial perspective; terms including customer satisfaction, service quality, brand loyalty, and market share were categorized under the customer perspective; keywords reflecting production process, supply chain management, quality control, and operational efficiency were assigned to the internal process perspective; and finally, expressions such as innovation, employee training, knowledge sharing, and R&D investment were associated with the learning and growth perspective.
Each firm’s primary management strategy was determined by identifying which of the four BSC dimensions exhibited the highest concentration of high-weighted TF-IDF keywords. The dimension with the largest count of dominant keywords was assigned a value of one, indicating the firm’s primary strategic emphasis, while the remaining dimensions were assigned a value of zero. Thus, four dummy variables—Financial, Customer, Internal Process, and Learning and Growth—represented the firm’s core strategic orientation. To address the concern that a single dominant (1/0) label may omit multidimensional strategic nuance and to mitigate the information loss and instability inherent in a winner-takes-all encoding, we additionally considered alternative representations of strategic orientation. First, we constructed continuous BSC intensity scores by computing (i) the proportion of keywords mapped to each BSC perspective and (ii) the share of summed keyword weights (e.g., summed TF-IDF/BM25 weights) attributable to each perspective. Second, we considered a multi-label coding scheme in which more than one BSC perspective can be active when the second-largest perspective is close to the largest (Top-2 activation under a near-tie rule). We then re-estimated the main models under these alternative encodings and confirmed that the empirical conclusions are robust.
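The alternative encodings described above can be sketched as follows. The near-tie ratio of 0.9 is an illustrative assumption; the paper does not disclose the exact near-tie rule.

```python
def bsc_encodings(counts, tie_ratio=0.9):
    """Return (shares, active) for one firm's BSC keyword counts.

    shares: continuous intensity scores as keyword-count proportions
            per perspective.
    active: multi-label Top-2 coding in which the runner-up perspective
            is also activated when its count reaches tie_ratio of the
            leader's count (assumed near-tie rule).
    """
    total = sum(counts.values()) or 1
    shares = {p: c / total for p, c in counts.items()}
    ranked = sorted(counts, key=counts.get, reverse=True)
    active = {p: 0 for p in counts}
    active[ranked[0]] = 1
    if len(ranked) > 1 and counts[ranked[1]] >= tie_ratio * counts[ranked[0]]:
        active[ranked[1]] = 1
    return shares, active
```

Summed-weight shares follow the same pattern with keyword weights substituted for counts.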
For instance, if a company’s business report contained the highest frequency of customer-related strategic keywords (e.g., customer loyalty, user experience, client service innovation), the “Customer” variable was coded as one, and the remaining variables as zero. This binary representation allows for straightforward incorporation of strategic emphasis into the hybrid machine learning model, facilitating the empirical analysis of how different strategic orientations influence firm-level performance.
This BSC-based classification not only ensures theoretical alignment with established strategic management frameworks but also provides interpretability to AI-driven prediction models by linking textual features to managerial intent. Furthermore, it enables a comparative analysis of how firms prioritize different strategic perspectives—such as financial performance, customer value, operational efficiency, or learning capability—exhibiting distinct performance trajectories when analyzed through advanced AI and NLP methodologies.
To clarify how security narratives are embedded within the BSC structure, security-related terms are not treated as an external add-on only; instead, they are conceptually allocated to the BSC perspectives according to the mechanism through which security affects value creation. For example, control, monitoring, and incident response language is primarily associated with internal process capability; training, upskilling, and security-by-design initiatives map to learning and growth; privacy and trust commitments link to the customer perspective; and loss prevention, cyber insurance, and resilience investment rationales connect to the financial perspective. At the same time, we retain the explicit security-intensity and weight-share measures to preserve interpretability and enable direct empirical validation of the “information security” dimension.
Reliability and objectivity of the keyword-to-BSC mapping were ensured through a structured protocol that minimizes subjective discretion and supports reproducibility.
First, we developed a rule-based BSC mapping dictionary (codebook) prior to model training, grounded in the conceptual definitions of the four BSC perspectives and aligned with the strategy themes used in the related literature [20,21]. Keywords were mapped using predefined inclusion rules and synonym families to reduce ad hoc assignment.
Second, to strengthen reliability, the dictionary and mapping rules were independently reviewed by multiple researchers, and disagreements were reconciled through discussion using explicit decision rules. Inter-coder agreement was assessed to verify consistency of assignments, and the finalized mapping dictionary was frozen before downstream modeling.
Third, we performed a validation check against human judgment by manually reviewing a stratified subset of firm-year reports and confirming that the dominant BSC perspective inferred from keywords is consistent with the primary strategic emphasis described in the narrative context. This additional check ensures that the mapping captures managerial intent rather than purely mechanical term frequency.
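One standard way to assess the inter-coder agreement mentioned above is Cohen's kappa, which corrects raw agreement for chance. The paper does not state which agreement statistic was used, so this is an illustrative sketch for two coders' BSC labels.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders' categorical labels.

    kappa = (observed agreement - expected agreement) / (1 - expected),
    where expected agreement is the chance rate implied by each coder's
    marginal label distribution.
    """
    assert len(coder_a) == len(coder_b)
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    pa, pb = Counter(coder_a), Counter(coder_b)
    expected = sum(pa[c] * pb[c] for c in set(pa) | set(pb)) / n ** 2
    return (observed - expected) / (1 - expected)
```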

3.3. Financial Data

The financial data for this study were obtained from the ValueSearch database provided by NICE Credit Rating Information (https://valuesearch.co.kr; also available at https://www.nicevse.com, both accessed on 25 January 2025). Because the dataset is restricted to publicly listed firms in South Korea (KOSPI/KOSDAQ) during 2011–2023, the empirical findings should be interpreted within a single institutional setting; differences in disclosure regimes, reporting language, and corporate governance systems may affect the transferability of the learned patterns to other markets. The empirical analysis was conducted using samples that met the following criteria:
(1)
Only companies listed on KOSPI and registered on KOSDAQ were included.
(2)
Companies in the financial sector were excluded from the sample.
(3)
Only companies with a fiscal year ending in December were included.
(4)
Companies with capital erosion were excluded from the sample.
To address the concern that these exclusion rules may induce sample selection bias, we provide additional justification and robustness checks. First, the December fiscal year criterion is used to ensure temporal comparability when aligning annual financial statements with business report narratives. Second, capital erosion and incomplete-disclosure exclusions are intended to reduce mechanical distortions in ROA-based performance labels and to ensure textual availability for strategic extraction. Third, we explicitly evaluate the sensitivity of our main conclusions to relaxing these criteria by re-estimating the models under expanded samples.
Table 1 presents the step-by-step procedure for constructing the final firm-level dataset used in this study. The initial sample consists of 24,609 firm-year observations for companies listed on the KOSPI or KOSDAQ markets from 2011 to 2023, excluding entities in the banking and securities industries due to their distinct financial structures and regulatory environments.
To ensure consistency and robustness in the empirical analysis, a series of exclusion criteria were applied. First, observations whose fiscal year does not end in December were removed (47 observations), as non-standard fiscal periods impair temporal comparability. Second, 835 observations for which financial data were unavailable or incomplete were excluded. Third, observations of firms experiencing capital erosion—defined as having substantially diminished equity capital—were also removed (844 observations), to avoid distortions arising from severe financial distress.
After applying these exclusion rules, the final analytical sample comprises 22,883 firm-year observations. This rigorous sample selection process enhances the reliability of the results by ensuring that only observations with comparable fiscal periods, complete financial disclosures, and sound capital structures are included in the analysis.

3.4. Descriptive Statistics

Table 2 presents the variables employed within the research model of this study. ROA is utilized as a proxy to evaluate a company’s management performance, as supported by prior studies [22,23]. A machine learning model has been developed to predict this particular variable in this research. The variable ‘Financial’ represents a dummy variable indicating a firm’s strategic emphasis on financial management; it is assigned a value of one if the business report predominantly features finance-related keywords and zero otherwise. Similarly, ‘Customer’ is a dummy variable that reflects a firm’s strategic emphasis on customer management; it is coded as one if the business report primarily contains customer-related keywords and zero otherwise. ‘Internal’ represents the dummy variable associated with a firm’s focus on internal processes; it is given a value of one if the business report contains mostly keywords related to internal processes and zero otherwise. Lastly, ‘Learning and Growth’ is the dummy variable that signifies the corporation’s strategic emphasis on learning and growth; this variable is assigned a value of one if the business report predominantly references learning and growth and zero otherwise. Throughout, “predominantly features” means that the corresponding perspective accounts for the largest number of keywords with high TF-IDF values.
Table 3 provides descriptive statistics for the variables utilized in empirical analysis. The mean value of ROA, which serves as a proxy for corporate management performance, was 0.009, with a standard deviation of 0.113. The mean value of the variable ‘Financial,’ which indicates the company’s emphasis on financial management strategies, was 0.249, with a standard deviation of 0.432. The mean value of ‘Customer,’ a variable representing the company’s strategic emphasis on customer management, was 0.257, with a standard deviation of 0.437. Similarly, the mean value of ‘Internal,’ a variable reflecting the company’s focus on internal processes, was 0.249, with a standard deviation of 0.433. Finally, the mean value of ‘Learning and Growth,’ which represents the company’s emphasis on strategies related to learning and growth, was 0.252, with a standard deviation of 0.434.

3.5. Modeling

We transform ROA from a continuous financial ratio into a binary management performance label to align with the classification-based benchmarking protocol used throughout the machine learning experiments. ROA is computed as net income divided by average total assets.
The binary label is defined as one when a firm’s ROA is greater than or equal to the year–industry median ROA, and zero otherwise. This relative thresholding helps control for time-varying macroeconomic conditions and systematic profitability differences across industries, while also producing an approximately balanced class distribution that is well-suited for supervised classification.
In the main sample (N = 22,883 firm–year observations), the resulting class distribution is close to balanced by construction (approximately 50% in each class; minor deviations can arise from ties at the median). To demonstrate that the empirical conclusions do not hinge on this discretization choice, we additionally conduct a threshold sensitivity analysis in Section 4.9.
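The year-industry median labeling rule described above can be sketched as follows; the record field names are illustrative, not taken from the paper.

```python
from statistics import median

def performance_labels(rows):
    """Assign the binary management performance label per firm-year.

    rows: list of dicts with keys 'year', 'industry', and 'roa'.
    Label is 1 when a firm's ROA is greater than or equal to the median
    ROA of its year-industry group, 0 otherwise, matching the relative
    thresholding described in the text.
    """
    groups = {}
    for r in rows:
        groups.setdefault((r["year"], r["industry"]), []).append(r["roa"])
    medians = {k: median(v) for k, v in groups.items()}
    return [int(r["roa"] >= medians[(r["year"], r["industry"])]) for r in rows]
```

Because the median itself satisfies the "greater than or equal" condition, ties at the median are labeled one, which explains the minor deviations from an exact 50/50 split noted above.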
This paper developed a foundational model to predict a company’s management performance, leveraging prior research on corporate management performance and strategy. The variables constituting the foundational model are explained as follows.
SIZE: this variable represents a company’s size, calculated by taking the natural logarithm of its total assets. Larger companies tend to exhibit more stable management performance [24].
LEV: this variable indicates the ratio of debt to total assets. A higher debt ratio typically correlates with a poorer financial situation and more unstable management performance [25].
CUR: this variable refers to a firm’s liquidity ratio, known as the current ratio, which signifies the company’s stability. A company’s liquidity situation can significantly influence its management strategy and performance [26].
SGR: this variable denotes a company’s sales growth rate. Higher sales growth rates improve management performance due to increased profitability [27,28].
INVREC: this variable measures the ratio of inventory assets to accounts receivable. A higher proportion of these assets indicates a higher fund turnover, which can significantly influence the company’s management strategies [29,30].
AGE: this variable indicates a company’s age. Older companies are typically better at enduring challenging business environments and managing them stably. Consequently, higher corporate age often results in more stable management performance [31].
LOSS: this dummy variable shows whether a company reported a financial loss in the previous year. Reporting a loss negatively impacts the company’s management performance [32].
BIG4: this dummy variable indicates whether a major accounting firm audited the company. Being audited by a major firm is often associated with larger size and higher management performance [33].
This study compares the foundational model’s predictive performance with an enhanced model that incorporates management strategy information to predict corporate management performance. These two firm management performance prediction models are represented as functions in Equations (1) and (2).
[The basic model for forecasting firm management performance]
F (SIZE, LEV, CUR, SGR, INVREC, AGE, LOSS, BIG4) = ROA
[The model that adds management strategy information to the basic model for forecasting corporate management performance]
F (SIZE, LEV, CUR, SGR, INVREC, AGE, LOSS, BIG4, Financial, Customer,
Internal, Learning and Growth) = ROA
[The extended specification that adds an explicit information security disclosure signal]
F (SIZE, LEV, CUR, SGR, INVREC, AGE, LOSS, BIG4, Financial, Customer,
Internal, Learning and Growth, Security_Intensity, Security_WeightShare) =
ROA
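The three nested specifications can be summarized as feature sets; the identifiers below mirror the variable names in the equations and are otherwise illustrative.

```python
# Sketch of the three nested feature sets used by the prediction models.
BASE = ["SIZE", "LEV", "CUR", "SGR", "INVREC", "AGE", "LOSS", "BIG4"]
BSC = ["Financial", "Customer", "Internal", "Learning_Growth"]
SECURITY = ["Security_Intensity", "Security_WeightShare"]

basic_model_features = BASE                     # Equation (1)
strategy_model_features = BASE + BSC            # Equation (2)
extended_model_features = BASE + BSC + SECURITY # security-augmented model
```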
Figure 1 presents a schematic diagram of the procedure used in this paper to compare and analyze the performance of the corporate management performance prediction models using machine learning classification methods.

3.6. Justification for Classification Methods and Parameter Optimization

While five-fold cross-validation provides rigorous internal validation within the South Korean sample, it does not substitute for external validation across countries. Cross-country replication under different governance and disclosure environments is therefore suggested as a key direction for future research.
To ensure robustness and generalizability of the machine learning models employed in this study, a rigorous cross-validation technique was adopted. Cross-validation is widely recognized as an effective strategy to mitigate the risks of overfitting and underfitting by systematically partitioning the dataset into multiple folds for training and evaluation. Specifically, this study utilized K-fold cross-validation with K set to five, thereby dividing the entire dataset into five equally sized subsets. In each iteration, four subsets were used for training while the remaining one served as the validation set. This process was repeated five times, ensuring that each subset was used once as a validation set. The average performance across all folds was then computed to obtain an unbiased estimate of the model’s predictive capability. This methodological approach enhances the model’s capacity to generalize across unseen data by evaluating it on diverse data partitions rather than relying on a single random train–test split.
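The five-fold procedure can be sketched with scikit-learn (an assumed toolchain; the paper does not name its software stack, and the data below are synthetic stand-ins):

```python
# Minimal five-fold cross-validation sketch on synthetic stand-in data.
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 8))                          # stand-in for the 8 ratios
y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)   # stand-in binary label

# Five folds: each observation serves exactly once as validation data.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=cv)
mean_accuracy = scores.mean()   # averaged across the five folds
```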
To address concerns that random K-fold cross-validation may overestimate generalization performance over a long time span, we additionally implement time-aware out-of-sample evaluations that respect chronological ordering. In particular, we add a chronological train–validation–test split and a rolling-window evaluation, which together provide a more realistic assessment of predictive performance under temporal drift.
In the chronological split, we set Train = 2011–2018, Validation = 2019–2020 (for hyperparameter tuning and threshold selection), and Test = 2021–2023 (one-shot evaluation). For the rolling-window evaluation, we iteratively expand the training window year by year and evaluate on the next-year holdout; reported metrics are averaged across the sequence of test years.
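The two time-aware schemes can be sketched as follows; the column name `year` is an assumption for this sketch.

```python
# Sketch of the chronological split and rolling-window evaluation above.
import pandas as pd

df = pd.DataFrame({"year": range(2011, 2024), "x": range(13)})

# Chronological split: Train 2011-2018, Validation 2019-2020, Test 2021-2023.
train = df[df["year"].between(2011, 2018)]
valid = df[df["year"].between(2019, 2020)]
test = df[df["year"].between(2021, 2023)]

# Rolling window: expand the training window year by year and evaluate on
# the next-year holdout; metrics are later averaged across test years.
rolling = [(df[df["year"] <= y], df[df["year"] == y + 1])
           for y in range(2018, 2023)]
```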
In constructing the predictive model for firm management performance, this study employed machine learning classification algorithms, which are particularly suitable for tasks involving the categorization of input data into predefined classes. Classification models are advantageous for this research context because they facilitate the identification of firms that are likely to exhibit superior or inferior performance, based on a range of financial and strategic textual indicators. By formulating the prediction task as a classification problem, the model is trained to associate input features with discrete outcome classes (e.g., high vs. low ROA).
In the classification framework, probabilistic outputs were generated using softmax functions in the final layer of the model. The softmax function computes the probability distribution over potential output classes, enabling the identification of the most probable outcome. This probabilistic output facilitates more nuanced decision-making and allows for the incorporation of confidence levels in the predictions. The class with the highest softmax probability is typically selected as the model’s prediction. This approach is consistent with established practices in machine learning classification tasks and enhances the interpretability of the model’s output.
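The softmax computation described above can be written compactly; the logit values here are illustrative.

```python
# Numerically stable softmax over class logits, as used in the final layer.
import numpy as np

def softmax(z):
    z = z - np.max(z)      # shift for numerical stability (result unchanged)
    e = np.exp(z)
    return e / e.sum()

probs = softmax(np.array([1.2, -0.3]))   # two-class logits (illustrative)
pred = int(np.argmax(probs))             # class with the highest probability
```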
To further improve model performance, hyperparameter tuning was conducted using grid search strategies within the cross-validation framework. Hyperparameters—such as learning rate, maximum tree depth (in tree-based models), batch size, number of hidden layers, and activation functions (in neural networks)—play a critical role in influencing the behavior and performance of machine learning algorithms. Accordingly, the tuning process systematically evaluated combinations of these hyperparameters to identify the optimal configuration that maximizes predictive accuracy. The selection of final hyperparameter values was based on cross-validated performance metrics including precision, recall, F1-score, and AUC, thereby ensuring that the model performs consistently well across multiple evaluation dimensions.
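A minimal sketch of grid search nested inside cross-validation follows; scikit-learn is an assumed toolchain and the parameter grid is illustrative, not the paper's actual search space.

```python
# Grid search over hyperparameters within cross-validation (illustrative grid).
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(150, 8))
y = (X[:, 1] > 0).astype(int)

grid = {"learning_rate": [0.05, 0.1], "max_depth": [2, 3]}
search = GridSearchCV(GradientBoostingClassifier(n_estimators=50),
                      grid, cv=5, scoring="f1")
search.fit(X, y)
best = search.best_params_   # configuration with the best cross-validated F1
```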
This meticulous approach to classification method selection, performance evaluation, and hyperparameter optimization ensures that the predictive model developed in this study is both statistically sound and practically applicable for forecasting corporate management performance using financial and strategic text data.

3.7. Classification Analysis

To address classification problems, machine learning models often use a cross-entropy loss function, which measures the disparity between predicted class probabilities and actual class labels [34]. Model parameters are then adjusted to minimize this difference [35]. Key evaluation metrics, such as accuracy, precision, recall, AUC (area under the receiver operating characteristic (ROC) curve), and F1 score, are used to gauge the performance of these models [35].
Accuracy reflects how often the model’s predictions match the actual class labels, indicating its overall proficiency [36]. Precision measures the ratio of true positive samples to all positive predictions, highlighting the model’s ability to identify positive instances correctly [37]. Recall assesses the ratio of true positive samples correctly identified by the model, indicating its effectiveness in capturing positive instances [38].
AUC evaluates binary classification models by their ability to distinguish between positive and negative classes using the ROC curve [39]. This curve plots the true positive rate (TPR) against the false positive rate (FPR) across different classification thresholds [40,41]. The AUC value ranges from zero to one, with values closer to one signifying better performance [35,42].
The F1 score, the harmonic mean of precision and recall, comprehensively measures the model’s classification ability [43]. It is particularly useful in situations with imbalanced data, offering a more balanced view of model performance [44]. The F1 score ranges from zero to one, with higher values indicating better classification performance [45].
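The five metrics can be computed as follows on a toy set of predictions (scikit-learn assumed):

```python
# Computing the five evaluation metrics on hypothetical predictions.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, roc_auc_score, f1_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_prob = [0.9, 0.2, 0.8, 0.4, 0.3, 0.6, 0.7, 0.1]   # predicted P(class = 1)
y_pred = [int(p >= 0.5) for p in y_prob]            # 0.5 decision threshold

acc = accuracy_score(y_true, y_pred)
prec = precision_score(y_true, y_pred)   # TP / (TP + FP)
rec = recall_score(y_true, y_pred)       # TP / (TP + FN)
auc = roc_auc_score(y_true, y_prob)      # threshold-free ranking quality
f1 = f1_score(y_true, y_pred)            # harmonic mean of precision, recall
```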
This study underscores the importance of these metrics in evaluating the effectiveness of machine learning models for predicting management performance. It provides insights into the selection and optimization of classification techniques to improve predictive accuracy.

3.8. Classifiers

KNN is a non-parametric, instance-based learning algorithm that classifies new data points based on the majority vote of their nearest neighbors in the training data [46]. KNN calculates the distance between data points, often using Euclidean distance, and assigns class labels based on the predominant label among the k closest observations. It is simple to implement and effective in handling nonlinear relationships. However, KNN is sensitive to noise and irrelevant features and becomes computationally inefficient with large datasets [46].
SVM is a supervised learning algorithm that seeks to find the optimal hyperplane that maximizes the margin between different classes in the feature space [47]. SVM can also handle nonlinear classification problems by employing kernel tricks such as radial basis function (RBF) or polynomial kernels [47]. Its strength lies in its robustness in high-dimensional spaces and solid theoretical foundations, although it can be computationally expensive and sensitive to the choice of kernel and hyperparameters [43].
GBM is an ensemble learning method that builds a strong predictive model by sequentially adding weak learners—typically decision trees—that correct the errors of previous ones [48]. GBM provides high predictive accuracy and flexibility in handling various loss functions. However, it is prone to overfitting, especially in noisy data, and often requires careful hyperparameter tuning [49].
LightGBM is a gradient boosting framework developed by Microsoft that is optimized for speed and memory efficiency by using a leaf-wise tree growth algorithm and histogram-based decision splits [50]. LightGBM performs well on large-scale datasets and achieves high accuracy. Nevertheless, it can overfit small datasets and may require substantial tuning for optimal performance [51].
CNNs are deep learning models primarily designed for image recognition and spatial data processing, though they have been successfully adapted for sequential and structured data as well [52]. CNNs utilize convolutional layers to extract spatial hierarchies of features, followed by pooling and fully connected layers for classification. They excel in tasks involving local spatial correlations but require substantial computational resources and large training datasets [53].
LSTM networks are a variant of recurrent neural networks (RNNs) designed to capture long-term dependencies in sequential data via gated mechanisms (input, forget, and output gates) [54]. LSTMs effectively model time-series and natural language but are computationally demanding and challenging to parallelize [55].
Autoencoder is an unsupervised neural network used primarily for dimensionality reduction and feature extraction by learning a compressed representation of input data through an encoder–decoder structure [56]. Autoencoders are valuable for anomaly detection and denoising tasks, but they may learn trivial representations if not properly regularized [57].
Transformers are neural architectures based on self-attention mechanisms that allow parallel processing of input sequences, significantly improving efficiency over RNN-based models [58]. Transformers are widely used in natural language processing and sequence modeling tasks due to their ability to capture long-range dependencies. However, they require large amounts of data and computational power for training [59].
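For illustration, the traditional classifiers above can be instantiated with scikit-learn (an assumed toolchain; the settings shown are library defaults, not the paper's tuned values):

```python
# Illustrative instantiation of the traditional classifiers compared above.
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier

models = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf", probability=True),  # RBF kernel, soft outputs
    "GBM": GradientBoostingClassifier(),
    # "LightGBM": lightgbm.LGBMClassifier()      # requires the lightgbm package
}
```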
This study develops a machine learning model to predict corporate management performance. The model incorporates financial ratios selected with reference to previous studies on corporate management performance and management strategy. On this basis, the study compares and analyzes the performance of a predictive model consisting only of these financial ratios against a predictive model that additionally incorporates management strategy information based on the BSC framework.

4. Analysis Results

4.1. Effectiveness of Incorporating Strategic Textual Information into Performance Prediction Models

Table 4 reports the classification accuracy of various machine learning and deep learning models in predicting corporate management performance, measured by ROA. The analysis compares the predictive performance of a baseline model composed solely of financial variables with that of an extended model that incorporates strategic information extracted from business reports. Across all classifiers, the inclusion of management strategy variables—categorized under the BSC framework—led to an improvement in prediction accuracy, thereby affirming the utility of strategic textual data in enhancing corporate performance forecasting.
Among traditional machine learning models, the KNN classifier exhibited an increase in accuracy from 0.7633 in the basic model to 0.7811 when strategic information was included. LightGBM, optimized for handling large-scale datasets, improved from 0.7758 to 0.7969. SVM, known for its effectiveness in high-dimensional spaces, increased from 0.7914 to 0.8106. GBM achieved a baseline accuracy of 0.8001, which rose to 0.8297 with the addition of management strategy variables.
In the context of deep learning models, the performance gains were more pronounced. The CNN classifier improved from 0.8130 to 0.8355. LSTM, a model capable of capturing sequential dependencies in time-series data, showed an increase from 0.8311 to 0.8586. The autoencoder model, which excels in unsupervised feature learning, improved from 0.8506 to 0.8764. Notably, the transformer model, which utilizes attention mechanisms to capture contextual relationships in text, achieved the highest accuracy overall. Its prediction performance increased from 0.8725 in the basic model to 0.8939 in the extended model with management strategy information.
These findings collectively suggest that strategic information embedded in business reports—when appropriately structured and integrated using NLP techniques—provides valuable signals that enhance the predictive performance of both classical and deep learning classifiers. In particular, the transformer model demonstrated superior capability in extracting and utilizing textual strategic content, thereby offering a robust framework for AI-based corporate performance prediction.
Table 5 presents the comparative precision metrics of multiple machine learning and deep learning classifiers in the task of predicting firm-level management performance. Precision, defined as the ratio of true positive predictions to the total number of positive predictions, serves as a key indicator of a model’s capacity to minimize false positives—a crucial consideration when the objective is to accurately identify firms with superior management efficacy.
The empirical results reveal that the inclusion of management strategy information, extracted from unstructured business report text using NLP techniques, enhances predictive precision across all evaluated models. In the context of traditional machine learning classifiers, notable improvements were observed. The precision of the KNN algorithm rose from 0.7821 to 0.8024, while LightGBM increased from 0.7967 to 0.8147. SVM improved from 0.8088 to 0.8296, and GBM advanced from 0.8159 to 0.8363. These consistent gains underscore the ability of even conventional classifiers to leverage qualitative textual information when effectively structured and encoded.
The enhancements were even more substantial among deep learning models. The CNN achieved a precision increase from 0.8245 to 0.8461, and the LSTM network improved from 0.8452 to 0.8670. The autoencoder model, which excels at unsupervised representation learning, increased from 0.8682 to 0.8840. Among all models evaluated, the transformer architecture demonstrated the highest precision, improving from 0.8571 to 0.9067 upon the inclusion of strategic textual variables. This finding reflects the transformer’s exceptional capability to capture semantic dependencies and contextual subtleties inherent in narrative financial disclosures.
Collectively, the precision metrics in Table 5 provide robust empirical support for the integration of strategic text-based variables into firm performance prediction models. These results substantiate the argument that qualitative strategic disclosures—such as those encapsulated in the Balanced Scorecard framework and embedded within corporate business reports—contain valuable signals. When harnessed through advanced deep learning architectures, such as LSTM and transformer, these signals significantly contribute to enhancing the discriminative performance of predictive models, particularly in reducing false positive classifications.
Table 6 shows the recall values of various classification models in predicting firm-level management performance. Recall, also referred to as sensitivity or true positive rate, is a critical metric for evaluating a model’s ability to correctly identify all positive instances, i.e., companies with strong management performance. High recall is especially important in predictive analytics where the cost of false negatives (failing to identify a high-performing firm) is high.
Among the traditional machine learning models, all classifiers experienced gains in recall when management strategy information was incorporated into the model. The KNN model improved from 0.7540 in the basic model to 0.7745 when strategy variables were included. LightGBM saw an increase from 0.7621 to 0.7890, while SVM rose from 0.7766 to 0.7948. GBM showed an improvement from 0.7842 to 0.8041. These consistent gains demonstrate the added value of textual strategic information in enhancing the sensitivity of traditional machine learning classifiers.
The results were even more pronounced in the deep learning models. The CNN improved from 0.8014 to 0.8210, and the LSTM model from 0.8289 to 0.8453. The autoencoder model showed a recall increase from 0.8477 to 0.8690. Notably, the transformer model, which already demonstrated high performance in the baseline model (0.8638), achieved a further enhancement to 0.8868 with the inclusion of strategic textual variables. These results reaffirm the effectiveness of advanced deep learning models, particularly transformers, in capturing complex contextual patterns in unstructured business report data.
Overall, the findings in Table 6 emphasize that the integration of management strategy information—extracted and quantified via natural language processing techniques—substantially improves recall across all classifier types. The enhanced ability to correctly identify firms with superior management performance strengthens the case for incorporating strategic disclosures into AI-driven corporate performance analytics. Such integration helps avoid critical false negative outcomes and leads to more robust decision-making support systems.
Table 7 reports the area under the receiver operating characteristic curve (AUC) values for various machine learning and deep learning models employed to predict corporate management performance. AUC serves as a comprehensive metric for evaluating a model’s discriminative power by quantifying the trade-off between sensitivity (true positive rate) and specificity (false positive rate) across different classification thresholds. Higher AUC values indicate superior classification capabilities, particularly in distinguishing between firms with high and low management performance.
In the category of traditional machine learning classifiers, the inclusion of management strategy information—extracted from textual data in business reports—resulted in noticeable improvements in AUC across all models. The KNN model improved from 0.7749 (basic model) to 0.7991, LightGBM increased from 0.7863 to 0.8074, SVM rose from 0.7963 to 0.8143, and GBM advanced from 0.8067 to 0.8246. These results suggest that integrating qualitative strategic information enhances the models’ ability to discriminate between classes more effectively than using financial data alone.
The deep learning models demonstrated even greater baseline AUC performance, with further enhancements when strategic textual features were included. The CNN improved from 0.8120 to 0.8266, and LSTM architecture rose significantly from 0.8394 to 0.8578. The autoencoder model, known for its capability in capturing latent data representations, showed improvement from 0.8461 to 0.8695. The transformer model, which employs a self-attention mechanism to process unstructured text, achieved the highest AUC among all models, increasing from 0.8642 to 0.8822.
These empirical findings affirm the value of incorporating strategic management information into prediction models, especially when processed through advanced deep learning architectures. In particular, the transformer and autoencoder models exhibit superior discriminative performance, indicating their robustness in capturing complex, multidimensional interactions between structured financial indicators and unstructured narrative disclosures. Therefore, the AUC results in Table 7 substantiate the argument that AI-based models enriched with textual strategy information are highly effective in forecasting firm-level management performance with enhanced classification reliability.
Table 8 presents the F1 score results for a range of machine learning and deep learning classifiers applied to predict firm-level management performance. The F1 score, defined as the harmonic mean of precision and recall, provides a balanced measure of a model’s ability to identify relevant positive instances while minimizing both false positives and false negatives. This metric is particularly valuable when evaluating performance on imbalanced datasets, where achieving a trade-off between sensitivity and specificity is critical.
For traditional machine learning classifiers, the incorporation of management strategy information led to noticeable improvements in F1 scores. Specifically, the F1 score for the KNN model increased from 0.7846 in the baseline model to 0.8021 when strategic information was included. Similarly, LightGBM improved from 0.7963 to 0.8145, SVM from 0.8017 to 0.8364, and GBM from 0.8159 to 0.8363. These results indicate that strategic insights derived from textual analysis of business reports can augment the classification accuracy of conventional algorithms.
The enhancements were even more prominent in deep learning models. For the CNN, F1 score improved from 0.8273 to 0.8466, and for LSTM networks, it rose from 0.8311 to 0.8564. The autoencoder model demonstrated an increase from 0.8497 to 0.8635, reflecting its strong ability to abstract and reconstruct meaningful representations from input features. Notably, the transformer model achieved the highest F1 scores among all classifiers, improving from 0.8677 to 0.8896 when enriched with strategic management data.
Collectively, the results in Table 8 validate the effectiveness of integrating unstructured management strategy information into AI-based prediction frameworks. Not only does the inclusion of strategic textual data enhance the performance of classical models, but it also significantly elevates the classification capabilities of deep learning architectures, particularly those designed to model sequential and contextual relationships, such as LSTM and transformer models. These findings reinforce the utility of NLP techniques in extracting actionable insights from corporate disclosures and integrating them into robust performance prediction models.

4.2. Performance Improvement Through Development of Hybrid Prediction Models

Building upon the performance improvements demonstrated by both traditional and deep learning models enhanced with strategic information, this study proposes a hybrid model that synthesizes the outputs of multiple high-performing classifiers. The primary objective of the hybrid model is to integrate the complementary strengths of individual models—namely the interpretability and structured data-handling capabilities of classical machine learning models and the contextual and semantic understanding of deep learning architectures.
To construct the hybrid model, a soft-voting ensemble technique was employed. The probabilistic outputs of three top-performing models, namely autoencoder, LSTM, and transformer (each of which had been augmented with management strategy variables extracted from business reports using NLP techniques), were averaged to produce the final prediction. This ensemble approach allowed the hybrid model to maintain the robustness and generalizability of its components while mitigating the risk of overfitting or bias inherent in any single model.
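The soft-voting step reduces to averaging class-probability matrices; the probability arrays below are stand-ins for the outputs of the three trained models.

```python
# Soft-voting sketch: average the class-probability outputs of the three
# component models (stand-in arrays; real models would supply these).
import numpy as np

p_autoencoder = np.array([[0.2, 0.8], [0.7, 0.3]])
p_lstm        = np.array([[0.3, 0.7], [0.6, 0.4]])
p_transformer = np.array([[0.1, 0.9], [0.8, 0.2]])

p_ensemble = (p_autoencoder + p_lstm + p_transformer) / 3
y_pred = p_ensemble.argmax(axis=1)   # final hybrid prediction per firm
```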
As summarized in Table 9, the hybrid ensemble achieves the highest accuracy and AUC among the evaluated models, while the best single deep learning model (transformer) attains a marginally higher F1 score. Accordingly, the ensemble is best interpreted as delivering the strongest overall and most balanced performance rather than uniformly dominating every metric.
These findings underscore the value of ensemble learning techniques in complex prediction tasks involving heterogeneous data sources, such as structured financial indicators and unstructured textual disclosures. The proposed hybrid model effectively capitalizes on the richness of information embedded in business reports—when appropriately structured and interpreted through advanced NLP-enabled architectures—thereby offering a comprehensive and reliable approach to corporate performance prediction.

4.3. Identifying the Most Impactful BSC Strategic Dimension in Enhancing Hybrid Model Performance

To investigate which strategic perspective from the BSC framework most effectively contributes to improving predictive performance in the proposed hybrid model, this study evaluated the model’s classification capabilities after individually incorporating each BSC-based textual variable. The hybrid model—comprising an ensemble of autoencoder, LSTM, and transformer architectures—was tested in combination with one strategic dimension at a time: Financial, Customer, Internal Process, and Learning and Growth.
Table 10 presents an ablation exercise designed to clarify its relationship with Table 9. Specifically, we evaluate the same hybrid ensemble architecture (autoencoder + LSTM + transformer) while adding only a single BSC perspective at a time—Financial, Customer, Internal Process, or Learning and Growth. Throughout this analysis, the financial/control variables, preprocessing procedures, and evaluation protocol are held constant to ensure strict comparability with Table 9.
The results indicate that the full specification using all four BSC dimensions, as reported in Table 9, provides the strongest overall and most balanced predictive performance. In contrast, the single-dimension configurations in Table 10 yield modestly lower metrics, with accuracy ranging from 0.8862 to 0.8929 and AUC from 0.8834 to 0.8895. Among the four individual perspectives, the Customer dimension exhibits the highest standalone predictive contribution, which aligns with the broader empirical finding that customer-oriented strategic disclosures contain particularly informative signals for firm-level performance prediction.
The results indicate clear heterogeneity in the standalone informativeness of BSC perspectives. Incorporating the Customer dimension yields the strongest performance among the single-dimension specifications (Accuracy = 0.8929, AUC = 0.8895, F1 = 0.8844), suggesting that customer-oriented narratives in business reports contain particularly salient signals for predicting firm-level management performance. The Financial perspective follows closely (Accuracy = 0.8898, AUC = 0.8869, F1 = 0.8802), consistent with the notion that profitability targets, cost-efficiency language, and value creation narratives are directly aligned with outcome-relevant information. By contrast, the Internal Process and Learning and Growth perspectives exhibit comparatively weaker standalone performance (Learning and Growth: Accuracy = 0.8862, AUC = 0.8834, F1 = 0.8766), which may reflect the more indirect, longer-horizon nature of operational capability building and innovation-oriented disclosures relative to a contemporaneous ROA-based performance label.
Importantly, the single-dimension configurations in Table 10 are uniformly and modestly below the full four-dimension specification reported in Table 9 (Hybrid: Accuracy = 0.8972, AUC = 0.8944, F1 = 0.8874). This pattern is consistent with the expectation that strategic orientation is multidimensional and that combining all four BSC perspectives provides complementary information that improves overall and more balanced predictive accuracy. Taken together, Table 10 clarifies that while customer-oriented disclosures are the most informative when considered in isolation, the integrated BSC representation in Table 9 delivers the strongest aggregate performance by capturing a broader set of strategic signals embedded in corporate narratives.

4.4. Robustness Tests for the Quantitative BSC Strategic Classification Procedure

To evaluate the reliability of the current BSC strategic classification procedure, we conducted robustness tests that examine whether the predictive contribution of the BSC-based strategic variables remains stable across diverse learning algorithms and across alternative BSC perspectives. First, using the same experimental protocol as the main analysis, we re-estimated all baseline models and their corresponding strategy-augmented specifications and compared performance differences. As reported in Table 11, incorporating BSC strategy information improves performance consistently across all eight classifiers, with average gains of ΔAccuracy = 0.0231, ΔPrecision = 0.0198, ΔRecall = 0.0207, ΔAUC = 0.0194, and ΔF1 = 0.0214 (ranges: ΔAccuracy 0.0178–0.0296, ΔF1 0.0138–0.0347). In particular, we complement the dominant-label dummies with continuous strategy intensity features for all four BSC dimensions and conduct systematic sensitivity checks to demonstrate that the proposed TF-IDF filtering, BSC mapping, and encoding scheme are not ad hoc and remain stable under reasonable alternatives.
In addition to threshold and weighting sensitivity checks, we verified the reliability of the keyword-to-BSC mapping using a predefined codebook, multi-reviewer reconciliation, and a manual validation step on a stratified subset of reports, ensuring that the classification is not driven by arbitrary conceptual assignment.
Second, we assessed whether the predictive gains are driven by a single BSC perspective by performing the ablation tests. Table 12 shows that each single-perspective input improves prediction relative to the financial-only baseline, with the Customer perspective yielding the largest standalone contribution. However, none of the single-dimension ablations exceeds the full four-dimension specification, indicating that the complete BSC representation captures complementary information across perspectives.
Third, we performed a sensitivity check for the TF–IDF keyword selection threshold by repeating the keyword extraction and BSC mapping using alternative cut-offs (1.5 and 2.5) around the baseline value (2.0). The resulting BSC classifications and downstream predictive metrics were qualitatively unchanged, showing that our results are not driven by a single threshold choice. In addition, we conducted sensitivity analyses under alternative keyword weighting schemes (sublinear TF–IDF and BM25-style weighting) and confirmed that dominant BSC perspective assignments and relative model-performance rankings remain qualitatively unchanged.
To make this robustness check explicit, we re-ran the full pipeline (keyword extraction → BSC mapping → strategy dummies → model training/evaluation) under multiple specifications: (i) term frequency (TF)–inverse document frequency (IDF) thresholds of 1.5/2.0/2.5; (ii) sublinear TF scaling using 1 + log(tf); (iii) BM25-style term weighting. Across these variants, the strategy classification remained stable and the incremental performance gains from adding strategic text persisted, indicating that the empirical conclusions are robust to reasonable alternative thresholds and weighting designs. Table 13 presents the sensitivity analysis for alternative keyword thresholds and weighting schemes.
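As a concrete reference for the weighting variants above, the following sketch contrasts raw and sublinear TF–IDF with a BM25-style term weight. The function names, the +1 smoothed-IDF offset, and the BM25 constants (k1 = 1.5, b = 0.75) are illustrative assumptions, not the paper's calibrated settings.

```python
import math

def tfidf_weight(tf, df, n_docs, sublinear=False):
    """TF-IDF term weight; the sublinear variant replaces tf with 1 + log(tf)."""
    tf_part = (1 + math.log(tf)) if (sublinear and tf > 0) else tf
    idf = math.log(n_docs / (1 + df)) + 1  # smoothed IDF (assumed variant)
    return tf_part * idf

def bm25_weight(tf, df, n_docs, doc_len, avg_doc_len, k1=1.5, b=0.75):
    """BM25-style term weight with document-length normalization."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1)
    norm = tf + k1 * (1 - b + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1) / norm
```

The sensitivity check then amounts to recomputing keyword weights under each scheme and re-applying the same selection threshold and BSC mapping.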
Fourth, we tested whether the results depended on representing strategic orientation as a single argmax-based dummy. We re-estimated the main models using alternative encodings, including (i) continuous BSC intensity vectors based on keyword-share proportions, (ii) continuous weight share vectors based on summed keyword weights, and (iii) multi-label Top-2 coding under a near-tie rule. As summarized in Table 14, predictive performance is highly stable across these alternative representations, and the conclusion that adding strategic text improves prediction remains unchanged.
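The multi-label Top-2 coding under a near-tie rule can be illustrated as follows. The function name and the `tie_ratio` parameter are hypothetical, since the paper does not report the exact near-tie cutoff.

```python
def top2_near_tie_labels(counts, tie_ratio=0.9):
    """Multi-label Top-2 coding: the dominant BSC perspective is always
    labeled 1; the runner-up is also labeled 1 when its keyword count is
    within tie_ratio of the top count (assumed near-tie rule).
    `counts` maps perspective name -> keyword count."""
    ranked = sorted(counts, key=counts.get, reverse=True)
    labels = {p: 0 for p in counts}
    top, second = ranked[0], ranked[1]
    labels[top] = 1
    if counts[top] > 0 and counts[second] >= tie_ratio * counts[top]:
        labels[second] = 1
    return labels
```

With a clear dominant perspective this reduces to the baseline argmax dummy; only near-ties produce a second active label.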

4.5. Sample Selection Robustness Checks

To examine whether the exclusion of firms with capital erosion, non-December fiscal year-ends, and incomplete disclosures creates bias in the results, we replicated the main training and evaluation pipeline under relaxed sample definitions. Across expanded samples, the incremental gains from adding strategic text persisted and the model ranking remained stable. Table 15 shows robustness to alternative sample definitions.

4.6. Strategic Text Pattern Differences Between Included and Excluded Firms

To directly address the concern that the exclusion criteria may remove firms with atypical reporting behavior, we compared strategic text patterns between (i) firms retained in the main sample and (ii) firms excluded due to capital erosion, non-December fiscal year-ends, or incomplete disclosures. Using the identical preprocessing and keyword-to-BSC mapping pipeline, we computed BSC keyword-share vectors and security-related disclosure intensity for each group. We then evaluated distributional differences using Jensen–Shannon divergence for BSC vectors, a two-sample Kolmogorov–Smirnov (KS) test for keyword-intensity distributions, and top-keyword overlap statistics.
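The two distributional diagnostics above can be sketched in plain Python; this is an illustrative implementation of the Jensen–Shannon divergence (base 2) and the two-sample Kolmogorov–Smirnov D statistic, not the production code, and it omits the KS p-value.

```python
import bisect
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2) between two discrete distributions."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def ks_statistic(x, y):
    """Two-sample Kolmogorov-Smirnov D statistic: max gap between ECDFs."""
    xs, ys = sorted(x), sorted(y)
    def ecdf(sorted_vals, t):
        # fraction of observations <= t
        return bisect.bisect_right(sorted_vals, t) / len(sorted_vals)
    grid = sorted(set(xs + ys))
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in grid)
```

Small divergence and D values between the included and excluded groups correspond to the "modest shifts" interpretation reported in Table 16.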
As reported in Table 16, excluded firms exhibit only modest shifts in disclosure emphasis. The divergence in BSC keyword-share distributions is small and the overlap of top keywords remains high, suggesting that the extracted strategic signals do not reflect a qualitatively distinct narrative regime among excluded firms. Consistent with this, a text-only classifier trained to distinguish excluded versus included observations performs close to chance, indicating limited separability. Overall, these comparisons mitigate concerns that the sample restrictions systematically filter out a distinct strategic text pattern that would create bias in the main findings.

4.7. Heterogeneity Tests: Which Enterprises Benefit More from Adding Strategic Text

This subsection reports heterogeneity tests that identify where strategic text adds the most predictive value. We operationalized heterogeneity by varying the BSC strategic text dimension added to the baseline hybrid model (autoencoder + LSTM + transformer) and comparing the resulting performance changes relative to the baseline hybrid specification without strategic text.
Table 17 summarizes the incremental gains (Δ) in Accuracy, AUC, and F1-score when each BSC perspective is incorporated. The results indicate that customer-oriented strategic narratives generate the largest improvements, followed by financial narratives, whereas internal process and learning and growth narratives provide smaller incremental signals in the current performance classification setting. This pattern suggests that enterprises whose narratives emphasize market-facing positioning and stakeholder-oriented commitments benefit more from the inclusion of strategic text.

4.8. Operationalization and Validation of the Information Security Dimension

We report a dedicated validation of the security narrative measures introduced in Section 3.1. We re-estimated the hybrid ensemble model by augmenting the BSC-based strategic text features with (i) Security Disclosure Intensity and (ii) Security Weight Share. Table 18 shows that adding these security variables yields stable or slightly improved performance while preserving the substantive conclusion that strategic text enhances corporate performance prediction.

4.9. ROA Discretization and Threshold Sensitivity Analysis

We examine whether predictive performance is sensitive to alternative ROA thresholds. Specifically, we re-ran the full training/validation pipeline under (i) the baseline year–industry median split, (ii) a zero-profitability split (ROA ≥ 0), and (iii) a more stringent top-tercile split (ROA ≥ 67th percentile within year–industry).
Table 19 reports the hybrid ensemble results under these alternative definitions. Performance metrics are highly stable and the incremental benefit of adding strategic text remains qualitatively unchanged, supporting the reliability of the discretization procedure and addressing concerns about threshold arbitrariness.
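The three discretization rules can be sketched as follows. The record layout and the simple index-based 67th-percentile cut are illustrative assumptions; the study's exact percentile interpolation is not specified here.

```python
from collections import defaultdict
from statistics import median

def label_high_performance(records, rule="median"):
    """Discretize ROA within (year, industry) groups.
    rule: 'median'  -> baseline year-industry median split,
          'zero'    -> ROA >= 0,
          'tercile' -> ROA >= 67th percentile within the group (assumed
                       nearest-rank cut). Each record is a dict with
                       'year', 'industry', and 'roa' keys."""
    groups = defaultdict(list)
    for r in records:
        groups[(r["year"], r["industry"])].append(r["roa"])
    labels = []
    for r in records:
        vals = sorted(groups[(r["year"], r["industry"])])
        if rule == "zero":
            cut = 0.0
        elif rule == "tercile":
            cut = vals[int(0.67 * (len(vals) - 1))]
        else:
            cut = median(vals)
        labels.append(1 if r["roa"] >= cut else 0)
    return labels
```

Re-running the pipeline under each rule and comparing metrics reproduces the threshold-sensitivity design of Table 19.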

4.10. Time-Aware Out-of-Sample Evaluation: Chronological Split and Rolling-Window Tests

Table 20 reports the time-aware out-of-sample evaluations. As the panel spans 2011–2023, we complemented random five-fold cross-validation with time-aware evaluations that mitigate look-ahead bias and better reflect real-world forecasting. We first implemented a chronological split using Train = 2011–2018, Validation = 2019–2020 (for hyperparameter tuning and threshold selection), and Test = 2021–2023 (evaluated once). We then conducted a rolling (expanding-window) evaluation that trains on all years up to t and tests on year t + 1; reported metrics are averaged across the sequence of test years.
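The two time-aware protocols can be sketched as index-assignment helpers. The year boundaries follow the split described above; the helper names and the `first_test_year` parameter are hypothetical conveniences.

```python
def chronological_split(years):
    """Assign each firm-year to Train (2011-2018), Validation (2019-2020),
    or Test (2021-2023), per the chronological design."""
    out = []
    for y in years:
        if y <= 2018:
            out.append("train")
        elif y <= 2020:
            out.append("val")
        else:
            out.append("test")
    return out

def expanding_window_folds(years, first_test_year=2012):
    """Expanding-window protocol: each fold trains on all years strictly
    before the test year and tests on that single year."""
    folds = []
    for test_year in range(first_test_year, max(years) + 1):
        train_idx = [i for i, y in enumerate(years) if y < test_year]
        test_idx = [i for i, y in enumerate(years) if y == test_year]
        if train_idx and test_idx:
            folds.append((train_idx, test_idx))
    return folds
```

Averaging metrics over the returned folds corresponds to the rolling evaluation summarized in Table 20.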

5. Discussion

This study proposed and evaluated a hybrid machine learning framework to predict corporate management performance by integrating structured financial data with unstructured textual strategy disclosures classified under the BSC framework. The empirical findings reaffirm the predictive power of strategic textual information—financial, customer, internal process, and learning and growth perspectives—while highlighting its added relevance when considering information security as a critical dimension of corporate strategy.
From an information processing standpoint, this study contributes to the development of intelligent systems capable of transforming qualitative narratives into structured, decision-relevant signals. By designing a framework that converts unstructured textual disclosures into quantifiable BSC-based indicators, the model functions as an intelligent information processor that interprets linguistic, semantic, and contextual cues in corporate communication. This transformation—from narrative strategy to structured variable—illustrates how AI can expand the traditional boundaries of corporate information processing, enabling machines to reason about managerial intent, risk perception, and performance orientation encoded within text data.
Recent advancements in NLP have made it possible to extract semantically rich features from corporate narratives, including disclosures related to cybersecurity, data governance, and digital resilience. Prior studies have shown that managerial text carries signals of intent and information asymmetry; for example, Buechel et al. [3] illustrated how the tone of quarterly reports reflects managerial expectations. Building on this, the present study demonstrates that when narrative elements encompass security-oriented statements—such as commitments to safeguard customer data, enhance internal IT controls, or comply with cybersecurity regulations—they provide additional explanatory value for firm performance prediction. These results position strategic text analysis not only as a linguistic evaluation task but also as an intelligent information processing mechanism that bridges semantic understanding and predictive modeling.
Within the domain of machine learning, gradient-boosted models such as LightGBM and GBM continue to perform well on structured tabular data [46,47]. However, deep learning models—including transformers, LSTM, and autoencoders—showed superior capacity to model the sequential and contextual dependencies embedded in textual strategy disclosures. Importantly, the transformer model, leveraging self-attention mechanisms, proved effective in abstracting strategic and security-related signals, such as those concerning risk management or digital infrastructure resilience [54,55]. This reflects the growing capability of AI architectures to perform intelligent filtering, representation, and interpretation—the core functions of advanced information processing systems.
The hybrid ensemble approach developed in this study further validates the benefits of integrating multiple AI-based predictors. The soft-voting ensemble combined the probabilistic outputs of deep learning and machine learning classifiers, consistent with prior work such as Sarwar et al. [5]. By incorporating strategy-related and security-oriented textual data, the ensemble framework demonstrated stronger predictive robustness than standalone models. From the information processing perspective, this ensemble configuration embodies multi-channel cognition—aggregating heterogeneous signals from structured and unstructured sources into a unified decision-making pipeline. Comparative analyses by Chicco et al. [36] similarly suggest that hybrid frameworks are particularly effective in classification tasks where unstructured data span diverse thematic domains such as finance, customer orientation, and information security [41].
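The soft-voting rule itself is plain probability averaging, sketched below; the 0.5 decision threshold is an assumed default rather than a reported tuning choice.

```python
def soft_vote(prob_lists, threshold=0.5):
    """Soft-voting ensemble: average the class-1 probabilities predicted by
    each component model, then threshold the mean to obtain hard labels.
    prob_lists: one probability list per model, aligned by observation."""
    n_models = len(prob_lists)
    avg = [sum(ps) / n_models for ps in zip(*prob_lists)]
    preds = [1 if p >= threshold else 0 for p in avg]
    return avg, preds
```

In the paper's configuration the three probability lists would come from the autoencoder, LSTM, and transformer classifiers.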
Moreover, the examination of individual BSC perspectives revealed heterogeneous predictive contributions. Notably, customer-oriented and internal process disclosures—including those highlighting secure service delivery, data privacy, and system resilience—emerged as especially influential. This supports the notion that narrative disclosures are not uniform in predictive value; instead, those addressing trust and security concerns can substantially reshape the performance landscape. Such findings are consistent with Fuertes et al. [10] and Herath et al. [17], who emphasized the alignment of strategic dimensions with organizational performance indicators. From the lens of information processing, customer and internal process perspectives act as feedback-rich channels that communicate both operational quality and cognitive trustworthiness, which are key components in the human–machine interpretation of corporate narratives.
In addition, using a transparent cybersecurity lexicon, we derived two auditable firm-year measures—Security Disclosure Intensity and Security Weight Share—and evaluated them jointly with the BSC-based strategic vectors. This design clarifies that security narratives are not merely an external add-on but are theoretically embedded in the firm’s value creation logic across the four BSC perspectives, such as internal control and incident response capability (internal process), security training and security-by-design initiatives (learning and growth), privacy and trust commitments (customer), and loss prevention or resilience investment rationales (financial). Empirically, the results indicate that augmenting the hybrid ensemble with these explicit security features yields stable or slightly improved predictive performance while preserving the core conclusion that strategic text enhances firm-level performance prediction. Together, these findings strengthen the interpretability and reproducibility of the text module and provide more direct empirical support for discussing information security and cyber resilience as measurable narrative signals within corporate strategy disclosures.
The limitations and external validity are described as follows. First, the empirical setting is confined to South Korean listed firms. The Korean institutional environment—including disclosure practices, enforcement intensity, and governance mechanisms—may shape both narrative content and the mapping of keywords to BSC dimensions. Accordingly, the reported performance and interpretations should not be viewed as universally generalizable. Second, although the study employs rigorous cross-validation, it remains an internal (within-country) validation. Future research should test the framework on firms in other jurisdictions and governance regimes, and examine whether re-calibrating the strategy taxonomy and adopting multilingual representation learning improves transferability.
The study also contributes to broader corporate accountability frameworks, such as ESG (Environmental, Social, and Governance) evaluations. Prior studies (Li and Xu [1]; Chen et al. [2]) have shown that ESG ratings predict sustainability performance. Extending this view, our results indicate that security-related narratives—particularly governance-oriented disclosures on cybersecurity investments and compliance—enhance predictive accuracy and provide investors with signals of both financial sustainability and operational resilience. In this sense, information security should be understood not only as a technical safeguard but also as a strategic information processing layer that mediates how corporate systems manage, transmit, and disclose critical data under uncertainty, thereby influencing market perceptions and firm value.
Finally, algorithmic tunability was critical in optimizing predictive accuracy. Following the recommendations of Bergstra and Bengio [7] and Probst et al. [8], hyperparameter tuning improved model performance while mitigating risks of overfitting. This enhanced the model’s ability to generalize across firms with different levels of information security maturity, thereby reinforcing the link between intelligent information processing, model adaptability, and prediction stability.
Despite the strong predictive performance, transformer- and autoencoder-based models retain a degree of black-box behavior, which can limit interpretability relative to simpler models. To mitigate this limitation, future implementations can incorporate post hoc explainability tools, such as attention visualization and gradient-based attribution for transformer models, as well as reconstruction-error and feature-attribution analyses for autoencoders, thereby providing instance-level evidence on which textual cues and strategic dimensions drive predictions. In addition, because soft-voting ensembles aggregate multiple probabilistic outputs, transparency can be improved by reporting component-model predictions alongside the ensemble decision and by conducting ablation or leave-one-model-out checks that quantify each model’s marginal contribution to performance.
Moreover, training and inference for deep architectures may require substantial computational resources, and strict real-time applications may therefore be constrained in some settings. Practical deployment can be supported through efficiency-oriented techniques, including model compression via distillation, pruning, and quantization, as well as caching text representations and using batch or periodic inference when latency constraints are stringent. In resource-limited environments, lightweight baselines can be used as complementary options to balance accuracy, transparency, and operational feasibility.
In conclusion, this research empirically demonstrates that the integration of structured financial data with BSC-classified strategic disclosures—including those related to information security—enhances both the robustness of corporate performance prediction and the interpretability of information processing within AI-driven analytical systems. By showing that cybersecurity and governance narratives carry tangible predictive value, the study extends the frontier of intelligent information processing in corporate analytics, transforming narrative content into computationally tractable knowledge. Future research should further investigate how sector-specific security practices, regulatory compliance trends, and real-time cyber risk disclosures affect the temporal consistency and cross-industry generalizability of AI-based predictive frameworks. The reproducibility details (text pipeline, hyperparameters, and training settings) are further described in Appendix A.

6. Conclusions

This study contributes to the emerging field of AI-driven intelligent information processing by developing and validating a hybrid machine learning framework that integrates structured financial data and unstructured strategic textual disclosures, classified via the BSC framework. The empirical results confirm that the inclusion of narrative strategy variables—particularly those linked to customer trust, internal process security, and data governance—substantially enhances the predictive accuracy, precision, recall, AUC, and F1-score of both traditional and deep learning models.
These considerations highlight an inherent trade-off between predictive gains and model transparency and efficiency. Accordingly, future research should develop interpretability- and efficiency-aware variants of the proposed framework and evaluate their performance under deployment-oriented constraints, including latency, compute budget, and reproducibility.
From the perspective of intelligent information processing, the proposed model demonstrates how diverse data types—quantitative financial metrics and qualitative strategic narratives—can be transformed, filtered, and fused into a coherent predictive architecture. This process embodies the key functions of modern information processing systems, including representation, integration, and interpretation. By applying NLP techniques together with deep learning architectures such as transformer, LSTM, and autoencoder, the model translates unstructured corporate narratives into structured signals and leverages linguistic patterns to infer managerial intent, organizational learning orientation, and the firm’s security posture.
Notably, the hybrid ensemble model that aggregated outputs from transformer, LSTM, and autoencoder architectures via a soft-voting mechanism outperformed individual classifiers, offering a robust and generalizable solution for firm-level performance prediction. From an information processing standpoint, this ensemble system reflects a multi-channel processing structure, where multiple cognitive pathways (financial reasoning, textual semantics, and contextual interpretation) converge into a single decision output. The model thus functions as a computational analog of human strategic reasoning, capable of processing high-dimensional and semantically complex information to produce interpretable insights.
The analysis further revealed that customer-focused and internal process strategies, when linked with information security narratives, contribute most significantly to predictive performance. This finding underscores that secure information flows and digital trust mechanisms—both central to contemporary information processing infrastructures—are also decisive drivers of corporate competitiveness and resilience. By integrating these narratives into predictive analytics, the study transforms information security from a purely technical construct into a cognitive and strategic variable within the broader enterprise information system.
Finally, because the empirical evidence is drawn from South Korean listed firms (KOSPI/KOSDAQ) over 2011–2023, future studies should replicate the analysis across countries and corporate governance systems to establish external validity and to examine the extent to which the BSC-based strategy taxonomy requires institutional and linguistic localization.
Methodologically, this study advances the frontier of corporate analytics by formalizing how unstructured textual disclosures can be encoded into data structures suitable for machine interpretation and learning. The proposed hybrid framework thus operates not merely as a prediction tool but as an intelligent information processor that interprets narrative intent, detects semantic patterns, and translates them into actionable insights.
Practically, the research offers implications for corporate managers, investors, and policymakers. For managers, it provides a diagnostic mechanism for identifying which strategic communication patterns correlate with strong performance outcomes. For investors and analysts, it delivers an interpretable AI framework capable of extracting meaning from narrative reports—an area traditionally dominated by subjective interpretation. For policymakers, it highlights how digital transparency and cybersecurity governance enhance both the informational efficiency and stability of market systems.
In summary, the study presents a novel approach to corporate analytics by merging the principles of information processing with machine intelligence. The hybrid framework integrates financial data and textual narratives into a unified model that learns, adapts, and reasons about strategic intent and digital risk. As such, it not only enhances predictive accuracy but also contributes to a deeper understanding of how organizations process, communicate, and secure information in the digital era. Future research may expand this framework to encompass real-time information streams, multimodal disclosures (e.g., images, audio, ESG videos), and sector-specific knowledge graphs to further advance the scope of intelligent information processing in corporate decision-making.

Author Contributions

Conceptualization, Q.Y., C.X. and H.J.N.; methodology, Q.Y., C.X. and H.J.N.; software, S.A.; validation, Q.Y., C.X., Y.H., S.A. and H.J.N.; formal analysis, Q.Y., C.X. and H.J.N.; investigation, C.X., Y.H. and S.A.; resources, S.A. and H.J.N.; data curation, Y.H. and S.A.; writing—original draft preparation, Q.Y., C.X., Y.H., S.A. and H.J.N.; writing—review and editing, Q.Y., C.X., Y.H., S.A. and H.J.N.; visualization, Y.H.; supervision, H.J.N.; project administration, H.J.N.; funding acquisition, S.A. and H.J.N. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Ministry of Education of the Republic of Korea and the National Research Foundation of Korea (NRF-2025S1A5C3A01010737).

Data Availability Statement

The financial data used in this study were obtained from the ValueSearch database (NICE Information Service) and are subject to licensing restrictions; therefore, they are not publicly available. The business report texts are available from the Financial Supervisory Service’s Electronic Disclosure System (DART). Derived data generated during the study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Reproducibility Details (Text Pipeline, Hyperparameters, and Training Settings)

Appendix A.1. Text-Processing Pipeline

To facilitate full reproducibility, we summarize the end-to-end text-processing workflow applied to each firm-year business report:
(1) Document selection: We used annual business reports disclosed for each firm-year observation and aligned them to the corresponding fiscal-year financial statements.
(2) Text extraction: Narrative sections describing management discussion, strategy, outlook, and risk disclosures were extracted; non-narrative elements (tables, figures, boilerplate headers) were removed.
(3) Normalization: lowercasing; removal of punctuation, numbers, and special characters; whitespace normalization.
(4) Tokenization: word-level tokenization.
(5) Stop word removal: a standard stop word list augmented with domain-generic disclosure terms that do not convey strategy.
(6) Lemmatization: inflected forms collapsed into canonical tokens.
(7) Vectorization: bag of words with TF–IDF weighting (robustness checks: sublinear TF–IDF; BM25-style weighting).
(8) Keyword selection: TF–IDF ≥ 2.0 (baseline); thresholds 1.5 and 2.5 used for sensitivity checks (Section 4.4; Table 13).
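A compact sketch of steps (3)–(5), (7), and (8) follows. The stop word subset, raw-count TF, and smoothed-IDF variant are illustrative stand-ins for the full pipeline, which also includes lemmatization and a much larger stop word list.

```python
import math
import re

STOPWORDS = {"the", "and", "of", "to", "in", "a"}  # illustrative subset only

def preprocess(doc):
    """Normalization, tokenization, and stop word removal (steps 3-5)."""
    tokens = re.findall(r"[a-z]+", doc.lower())
    return [t for t in tokens if t not in STOPWORDS]

def select_keywords(docs, threshold=2.0):
    """TF-IDF weighting with a keyword cut-off (steps 7-8): per document,
    keep tokens whose TF-IDF weight meets the threshold."""
    token_docs = [preprocess(d) for d in docs]
    n = len(token_docs)
    df = {}
    for toks in token_docs:
        for t in set(toks):
            df[t] = df.get(t, 0) + 1
    selected = []
    for toks in token_docs:
        kws = set()
        for t in set(toks):
            idf = math.log(n / df[t]) + 1  # smoothed IDF (assumed variant)
            if toks.count(t) * idf >= threshold:
                kws.add(t)
        selected.append(kws)
    return selected
```

Swapping the weighting inside the loop for sublinear TF or BM25 yields the robustness variants of Appendix A.1, step (7).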

Appendix A.2. Keyword-to-BSC Mapping and Strategy Feature Construction

Each retained keyword was mapped to one of the four Balanced Scorecard (BSC) perspectives (Financial, Customer, Internal Process, Learning and Growth) using a predefined codebook reflecting semantic alignment with the corresponding strategic theme.
Strategy features were computed in three complementary representations:
(i) Dominant-label dummies: the perspective with the largest keyword count was assigned one and the others zero (baseline encoding).
(ii) Keyword-share intensities: the proportion of extracted keywords mapped to each perspective (keyword-count share).
(iii) Weight-share intensities: the share of summed keyword weights (TF–IDF/BM25) attributable to each perspective.
In addition, a multi-label Top-2 coding under a near-tie rule was evaluated as a robustness specification (Section 4.4; Table 14).
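The three representations can be computed from a per-document {keyword: weight} map, as sketched below. The codebook fragment is illustrative only; the study's predefined codebook covers far more terms, and the tie-breaking of the argmax is an assumption.

```python
# Illustrative codebook fragment; the paper's full codebook is much larger.
CODEBOOK = {
    "revenue": "financial", "margin": "financial",
    "customer": "customer", "satisfaction": "customer",
    "process": "internal", "quality": "internal",
    "training": "learning", "innovation": "learning",
}
PERSPECTIVES = ["financial", "customer", "internal", "learning"]

def strategy_features(keyword_weights):
    """Build the three strategy representations from {keyword: weight}:
    dominant-label dummies, keyword-count shares, and weight shares."""
    counts = {p: 0 for p in PERSPECTIVES}
    weights = {p: 0.0 for p in PERSPECTIVES}
    for kw, w in keyword_weights.items():
        p = CODEBOOK.get(kw)
        if p:
            counts[p] += 1
            weights[p] += w
    n = sum(counts.values()) or 1
    wsum = sum(weights.values()) or 1.0
    dominant = max(PERSPECTIVES, key=lambda p: counts[p])
    return {
        "dummies": {p: int(p == dominant) for p in PERSPECTIVES},
        "keyword_share": {p: counts[p] / n for p in PERSPECTIVES},
        "weight_share": {p: weights[p] / wsum for p in PERSPECTIVES},
    }
```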

Appendix A.3. Information Security Narrative Operationalization

We operationalized information security narratives using a reproducible cybersecurity disclosure lexicon compiled from prior disclosure studies and practitioner taxonomies (e.g., breach, vulnerability, incident response, access control, encryption, privacy, authentication, compliance, security governance).
Two firm-year measures were derived:
(i) Security Disclosure Intensity: the proportion of security-related keywords among all extracted keywords.
(ii) Security Weight Share: the share of summed keyword weights (TF–IDF/BM25) attributable to security-related keywords.
These measures are evaluated jointly with the BSC strategic vectors in extended specifications and validation exercises (Section 4.8; Table 18).
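Both measures can be computed directly from the per-document keyword weights, as in the sketch below; the lexicon shown is only a fragment of the full cybersecurity disclosure lexicon.

```python
# Fragment of the cybersecurity lexicon; the full lexicon covers more terms.
SECURITY_LEXICON = {"breach", "vulnerability", "encryption", "privacy",
                    "authentication", "compliance"}

def security_measures(keyword_weights):
    """Security Disclosure Intensity (keyword-count share) and Security
    Weight Share (summed-weight share) for one firm-year, computed from a
    {keyword: weight} map of extracted keywords."""
    total_n = len(keyword_weights)
    total_w = sum(keyword_weights.values())
    sec = {k: w for k, w in keyword_weights.items() if k in SECURITY_LEXICON}
    intensity = len(sec) / total_n if total_n else 0.0
    weight_share = sum(sec.values()) / total_w if total_w else 0.0
    return intensity, weight_share
```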

Appendix A.4. Training Settings and Evaluation Protocol

All preprocessing was performed in a split-aware manner to avoid information leakage: scaling parameters for continuous variables and any text-vector statistics were estimated on the training partition only and applied unchanged to validation/test partitions.
Randomness control: Random seeds were fixed for data partitioning, model initialization, and optimization routines.
Validation design: Main benchmarks used stratified five-fold cross-validation; to mitigate temporal look-ahead bias, time-aware evaluations were additionally conducted using a chronological train–validation–test split and an expanding rolling-window protocol (Section 4.10).
Deep learning optimization: Adam optimizer with early stopping on validation AUC (patience = 10); maximum epochs and batch sizes are reported in Table A1.
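The early-stopping rule (patience = 10 on validation AUC) can be sketched framework-agnostically; the class below is an illustrative tracker, independent of the PyTorch training loop it would plug into.

```python
class EarlyStopping:
    """Stop training after `patience` consecutive epochs without an
    improvement in validation AUC, remembering the best epoch."""
    def __init__(self, patience=10):
        self.patience = patience
        self.best_auc = float("-inf")
        self.best_epoch = -1
        self.bad_epochs = 0

    def step(self, epoch, val_auc):
        """Record one epoch's validation AUC; returns True when the
        patience budget is exhausted and training should stop."""
        if val_auc > self.best_auc:
            self.best_auc = val_auc
            self.best_epoch = epoch
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

In practice the model checkpoint from `best_epoch` would be restored for evaluation.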

Appendix A.5. Hyperparameter Search Space and Final Configurations

Table A1 summarizes the key hyperparameter ranges explored and the final configuration selected by validation performance for each model family.
Table A1. Hyperparameter search space and final configuration.
KNN. Search space: n_neighbors ∈ {5, 10, 15, 20}; weights ∈ {uniform, distance}; distance metric: Euclidean. Final configuration: n_neighbors = 15; weights = distance; metric = Euclidean.
SVM (RBF). Search space: C ∈ {0.1, 1, 10, 100}; gamma ∈ {scale, 0.01, 0.1}; kernel: rbf. Final configuration: C = 10; gamma = scale; kernel = rbf.
GBM. Search space: n_estimators ∈ {100, 200, 300}; learning_rate ∈ {0.01, 0.05, 0.1}; max_depth ∈ {2, 3, 4}. Final configuration: n_estimators = 300; learning_rate = 0.05; max_depth = 3.
LightGBM. Search space: num_leaves ∈ {31, 63}; learning_rate ∈ {0.01, 0.05, 0.1}; n_estimators ∈ {200, 500, 800}; subsample ∈ {0.8, 1.0}; colsample_bytree ∈ {0.8, 1.0}. Final configuration: num_leaves = 63; learning_rate = 0.05; n_estimators = 500; subsample = 0.8; colsample_bytree = 0.8.
CNN (tabular-seq). Search space: filters ∈ {32, 64}; kernel_size ∈ {3, 5}; dropout ∈ {0.2, 0.3}; learning rate ∈ {1 × 10⁻³, 5 × 10⁻⁴}; batch size ∈ {128, 256}; max epochs: 50 (early stopping). Final configuration: filters = 64; kernel_size = 3; dropout = 0.3; learning rate = 1 × 10⁻³; batch size = 256; epochs ≤ 50.
LSTM (tabular-seq). Search space: hidden units ∈ {32, 64, 128}; layers ∈ {1, 2}; dropout ∈ {0.2, 0.3}; learning rate ∈ {1 × 10⁻³, 5 × 10⁻⁴}; batch size ∈ {128, 256}; max epochs: 50 (early stopping). Final configuration: hidden = 64; layers = 1; dropout = 0.3; learning rate = 1 × 10⁻³; batch size = 256; epochs ≤ 50.
Autoencoder. Search space: encoder dims ∈ {[128, 64, 32], [256, 128, 64]}; latent dim ∈ {16, 32}; dropout ∈ {0.1, 0.2}; learning rate ∈ {1 × 10⁻³, 5 × 10⁻⁴}; batch size ∈ {128, 256}; max epochs: 80 (early stopping). Final configuration: encoder = [128, 64, 32]; latent = 16; dropout = 0.2; learning rate = 1 × 10⁻³; batch size = 256; epochs ≤ 80.
Transformer (tabular tokens). Search space: layers ∈ {1, 2, 3}; heads ∈ {2, 4, 8}; d_model ∈ {32, 64, 128}; dropout ∈ {0.1, 0.2, 0.3}; learning rate ∈ {1 × 10⁻³, 5 × 10⁻⁴}; batch size ∈ {128, 256}; max epochs: 50 (early stopping). Final configuration: layers = 2; heads = 4; d_model = 64; dropout = 0.2; learning rate = 5 × 10⁻⁴; batch size = 256; epochs ≤ 50.
Hybrid ensemble. Ensembling rule: soft voting (probability averaging); base models: {Autoencoder, LSTM, Transformer}. Final configuration: average of predicted probabilities from autoencoder + LSTM + transformer.

Appendix A.6. Software

All preprocessing and model training were implemented in Python (v3.12). Classical machine learning models were trained using scikit-learn (v1.5) and LightGBM (v4.3); deep learning architectures were implemented in PyTorch (v2.5). Evaluation metrics (Accuracy, Precision, Recall, AUC, and F1) were computed using standard definitions under a consistent protocol across models.

References

  1. Li, J.; Xu, X. Can ESG rating reduce corporate carbon emissions?—An empirical study from Chinese listed companies. J. Clean. Prod. 2024, 434, 140226. [Google Scholar] [CrossRef]
  2. Chen, L.; Zhang, L.; Huang, J.; Xiao, H.; Zhou, Z. Social responsibility portfolio optimization incorporating ESG criteria. J. Manag. Sci. Eng. 2021, 6, 75–85. [Google Scholar] [CrossRef]
  3. Luo, Y.; Zhou, L. Textual tone in corporate financial disclosures: A survey of the literature. Int. J. Discl. Gov. 2020, 17, 101–110. [Google Scholar] [CrossRef]
  4. Jegadeesh, N.; Wu, D. Word power: A new approach for content analysis. J. Financ. Econ. 2021, 141, 851–878. [Google Scholar]
  5. Sarwar, U.; Bhasin, N.K.; Bordoloi, D. Revolutionizing Business Intelligence with AI Insights and Strategies. In Proceedings of the IEEE 2024 8th International Conference on I-SMAC (IoT in Social Mobile, Analytics and Cloud), Dharan, Nepal, 3–5 October 2024; pp. 1883–1889. [Google Scholar]
  6. Li, Y.; Cao, J.; Xu, Y.; Zhu, L.; Dong, Z.Y. Deep learning based on Transformer architecture for power system stability assessment. Renew. Sustain. Energy Rev. 2021, 135, 110222. [Google Scholar]
  7. Gordon, L.A.; Loeb, M.P.; Sohail, T. Market value of voluntary disclosures concerning information security. MIS Q. 2010, 34, 567–594. [Google Scholar] [CrossRef]
  8. Wang, T.; Kannan, K.N.; Ulmer, J.R. The association between the disclosure and the realization of information security risk factors. Inf. Syst. Res. 2013, 24, 201–218. [Google Scholar] [CrossRef]
  9. Jiang, W.; Legoria, J.; Reichelt, K.J.; Walton, S. Firm use of cybersecurity risk disclosures. J. Inf. Syst. 2022, 36, 151–180. [Google Scholar] [CrossRef]
  10. Deane, J.K.; Goldberg, D.M.; Rakes, T.R.; Rees, L.P. The effect of information security certification announcements on the market value of the firm. Inf. Technol. Manag. 2019, 20, 107–121. [Google Scholar] [CrossRef]
  11. Bergstra, J.; Bengio, Y. Random search for hyperparameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305. Available online: https://www.jmlr.org/papers/volume13/bergstra12a/bergstra12a.pdf (accessed on 28 December 2025).
  12. Probst, P.; Boulesteix, A.L.; Bischl, B. Tunability: Importance of hyperparameters of machine learning algorithms. J. Mach. Learn. Res. 2019, 20, 1–32. Available online: https://www.jmlr.org/papers/volume20/18-444/18-444.pdf (accessed on 28 December 2025).
  13. Lee, J.H.; Cho, J.H. Firm-value effects of carbon emissions and carbon disclosures—Evidence from Korea. Int. J. Environ. Res. Public Health 2021, 18, 12166. [Google Scholar] [CrossRef]
  14. Fuertes, G.; Alfaro, M.; Vargas, M.; Gutierrez, S.; Ternero, R.; Sabattin, J. Conceptual framework for the strategic management: A literature review—Descriptive. J. Eng. 2020, 2020, 6253013. [Google Scholar] [CrossRef]
15. Bhatnagar, C.S.; Bhatnagar, D.; Bhullar, P.S. Social expenditure, business responsibility reporting score and firm performance: Empirical evidence from India. Corp. Gov. Int. J. Bus. Soc. 2023, 23, 1404–1436. [Google Scholar] [CrossRef]
  16. Kang, Y.; Cai, Z.; Tan, C.W.; Huang, Q.; Liu, H. Natural language processing (NLP) in management research: A literature review. J. Manag. Anal. 2020, 7, 139–172. [Google Scholar] [CrossRef]
17. Chowdhary, K.R. Natural language processing. In Fundamentals of Artificial Intelligence; Springer: New Delhi, India, 2020; pp. 603–649. [Google Scholar]
  18. Yan, D.; Li, K.; Gu, S.; Yang, L. Network-based bag-of-words model for text classification. IEEE Access 2020, 8, 82641–82652. [Google Scholar] [CrossRef]
  19. Al-Obaydy, W.I.; Hashim, H.A.; Najm, Y.A.; Jalal, A.A. Document classification using term frequency-inverse document frequency and K-means clustering. Indones. J. Electr. Eng. Comput. Sci. 2022, 27, 1517–1524. [Google Scholar] [CrossRef]
  20. Dwivedi, R.; Prasad, K.; Mandal, N.; Singh, S.; Vardhan, M.; Pamucar, D. Performance evaluation of an insurance company using an integrated Balanced Scorecard (BSC) and Best-Worst Method (BWM). Decis. Mak. Appl. Manag. Eng. 2021, 4, 33–50. [Google Scholar] [CrossRef]
  21. Herath, T.C.; Herath, H.S.; Cullum, D. An information security performance measurement tool for senior managers: Balanced scorecard integration for security governance and control frameworks. Inf. Syst. Front. 2023, 25, 681–721. [Google Scholar] [CrossRef]
  22. Ali, J.; Tahira, Y.; Amir, M.; Ullah, F.; Tahir, M.; Shah, W.; Khan, I.; Tariq, S. Leverage, ownership structure and firm performance. J. Financ. Risk Manag. 2022, 11, 41–65. [Google Scholar] [CrossRef]
  23. Doan, T. Financing decision and firm performance: Evidence from an emerging country. Manag. Sci. Lett. 2020, 10, 849–854. [Google Scholar] [CrossRef]
  24. Kijkasiwat, P.; Phuensane, P. Innovation and firm performance: The moderating and mediating roles of firm size and small and medium enterprise finance. J. Risk Financ. Manag. 2020, 13, 97. [Google Scholar] [CrossRef]
  25. Nazir, A.; Azam, M.; Khalid, M.U. Debt financing and firm performance: Empirical evidence from the Pakistan Stock Exchange. Asian J. Account. Res. 2021, 6, 324–334. [Google Scholar] [CrossRef]
  26. Pattiruhu, J.R.; Paais, M. Effect of liquidity, profitability, leverage, and firm size on dividend policy. J. Asian Financ. Econ. Bus. 2020, 7, 35–42. [Google Scholar] [CrossRef]
27. Tanaka, M.; Bloom, N.; David, J.M.; Koga, M. Firm performance and macro forecast accuracy. J. Monetary Econ. 2020, 114, 26–41. [Google Scholar] [CrossRef]
  28. Aghion, P.; Bloom, N.; Lucking, B.; Sadun, R.; Van Reenen, J. Turbulence, firm decentralization, and growth in bad times. Am. Econ. J. Appl. Econ. 2021, 13, 133–169. [Google Scholar] [CrossRef]
  29. Owuor, G.O.; Agusioma, N.; Wafula, F. Effect of accounts receivable management on financial performance of chartered public universities in Kenya. Int. J. Curr. Asp. Financ. Bank. Account. 2021, 3, 73–83. [Google Scholar] [CrossRef]
  30. Orobia, L.A.; Nakibuuka, J.; Bananuka, J.; Akisimire, R. Inventory management, managerial competence and financial performance of small businesses. J. Account. Emerg. Econ. 2020, 10, 379–398. [Google Scholar] [CrossRef]
  31. D’Amato, A.; Falivena, C. Corporate social responsibility and firm value: Do firm size and age matter? Empirical evidence from European listed companies. Corp. Soc. Responsib. Environ. Manag. 2020, 27, 909–924. [Google Scholar] [CrossRef]
  32. Iqbal, U.; Gan, C.; Nadeem, M. Economic policy uncertainty and firm performance. Appl. Econ. Lett. 2020, 27, 765–770. [Google Scholar] [CrossRef]
  33. Mohapatra, S.; Pattanayak, J.K. Unraveling the Dynamics of Intellectual Capital, Firm Performance, and the Influential Moderators—BIG4 Auditors and Group Affiliation. Int. J. Financ. Stud. 2024, 12, 29. [Google Scholar] [CrossRef]
  34. Amin, J.; Sharif, M.; Haldorai, A.; Yasmin, M.; Nayak, R.S. Brain tumor detection and classification using machine learning: A comprehensive survey. Complex Intell. Syst. 2022, 8, 3161–3183. [Google Scholar] [CrossRef]
  35. Zhang, Y.; Zhao, M. Cloud-based in-situ battery life prediction and classification using machine learning. Energy Storage Mater. 2023, 57, 346–359. [Google Scholar] [CrossRef]
  36. Wu, J.; Hicks, C. Breast cancer type classification using machine learning. J. Pers. Med. 2021, 11, 61. [Google Scholar] [CrossRef]
  37. Burés, J.; Larrosa, I. Organic reaction mechanism classification using machine learning. Nature 2023, 613, 689–695. [Google Scholar] [CrossRef]
  38. Chen, R.C.; Dewi, C.; Huang, S.W.; Caraka, R.E. Selecting critical features for data classification based on machine learning methods. J. Big Data 2020, 7, 52. [Google Scholar] [CrossRef]
39. Yacouby, R.; Axman, D. Probabilistic extension of precision, recall, and F1 score for more thorough evaluation of classification models. In Proceedings of the First Workshop on Evaluation and Comparison of NLP Systems, Online, 20 November 2020; Association for Computational Linguistics: Cedarville, OH, USA, 2020; pp. 79–91. [Google Scholar]
  40. Vakili, M.; Ghamsari, M.; Rezaei, M. Performance analysis and comparison of machine and deep learning algorithms for IoT data classification. arXiv 2020, arXiv:2001.09636. [Google Scholar] [CrossRef]
  41. Chicco, D.; Jurman, G. The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation. BMC Genom. 2020, 21, 6. [Google Scholar] [CrossRef] [PubMed]
  42. Alabi, R.O.; Elmusrati, M.; Sawazaki-Calone, I.; Kowalski, L.P.; Haglund, C.; Coletta, R.D.; Mäkitie, A.A.; Salo, T.; Almangush, A.; Leivo, I. Comparison of supervised machine learning classification techniques in prediction of locoregional recurrences in early oral tongue cancer. Int. J. Med. Inform. 2020, 136, 104068. [Google Scholar] [CrossRef]
  43. Ghosh, M.; Raihan, M.M.S.; Raihan, M.; Akter, L.; Bairagi, A.K.; Alshamrani, S.S.; Masud, M. A Comparative Analysis of Machine Learning Algorithms to Predict Liver Disease. Intell. Autom. Soft Comput. 2021, 30, 918–927. [Google Scholar] [CrossRef]
  44. Zhou, M.; Li, J.; Lim, J.; Xiao, X.; Xia, Y.; Zhang, H.; Wang, W. A Machine Learning Model Integrating Tongue Image Features and Myocardial Injury Markers Predicts Major Adverse Cardiovascular Events in Patients. Int. J. Gen. Med. 2025, 18, 3739–3765. [Google Scholar] [CrossRef] [PubMed]
  45. Khan, I.; Sohail, A.; Zahoora, U.; Qureshi, A.S. A Survey of the Recent Architectures of Deep Convolutional Neural Networks. Artif. Intell. Rev. 2020, 53, 5455–5516. [Google Scholar] [CrossRef]
  46. Shaffi, N.; Vimbi, V.; Mahmud, M.; Subramanian, K.; Hajamohideen, F. Bagging the best: A hybrid SVM-KNN ensemble for accurate and early detection of Alzheimer’s and Parkinson’s diseases. In International Conference on Brain Informatics; Springer Nature: Cham, Switzerland, 2023; pp. 443–455. [Google Scholar] [CrossRef]
  47. Barakat, N.; Bradley, A.P.; Nabil, A. Rule Extraction from Support Vector Machines: A Review. Neurocomputing 2010, 74, 178–190. [Google Scholar] [CrossRef]
  48. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  49. Natekin, A.; Knoll, A. Gradient Boosting Machines, A Tutorial. Front. Neurorobotics 2013, 7, 21. [Google Scholar] [CrossRef]
  50. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. NeurIPS 2017, 30. Available online: https://proceedings.neurips.cc/paper/2017/hash/6449f44a102fde848669bdd9eb6b76fa-Abstract.html (accessed on 28 December 2025).
  51. Li, S.; Dong, X.; Ma, D.; Dang, B.; Zang, H.; Gong, Y. Utilizing the lightgbm algorithm for operator user credit assessment research. arXiv 2024, arXiv:2403.14483. [Google Scholar] [CrossRef]
  52. Rawat, W.; Wang, Z. Deep Convolutional Neural Networks for Image Classification: A Comprehensive Review. Neural Comput. 2017, 29, 2352–2449. [Google Scholar] [CrossRef] [PubMed]
  53. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  54. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef]
  55. Greff, K.; Srivastava, R.K.; Koutník, J.; Steunebrink, B.R.; Schmidhuber, J. LSTM: A Search Space Odyssey. IEEE Trans. Neural Netw. Learn. Syst. 2017, 28, 2222–2232. [Google Scholar] [CrossRef]
  56. Hinton, G.E.; Salakhutdinov, R.R. Reducing the Dimensionality of Data with Neural Networks. Science 2006, 313, 504–507. [Google Scholar] [CrossRef] [PubMed]
  57. Sakurada, M.; Yairi, T. Anomaly Detection Using Autoencoders with Nonlinear Dimensionality Reduction. In Proceedings of the MLSDA 2014 2nd Workshop on Machine Learning for Sensory Data Analysis, Gold Coast, QLD, Australia, 2 December 2014. [Google Scholar]
58. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention Is All You Need. NeurIPS 2017, 30, 1–11. [Google Scholar]
  59. Tay, Y.; Dehghani, M.; Bahri, D.; Metzler, D. Efficient Transformers: A Survey. arXiv 2020, arXiv:2009.06732. [Google Scholar] [CrossRef]
Figure 1. Research model.
Table 1. Firm sample selection.
Firm Sample Selection | Number of Firms
KOSPI- or KOSDAQ-listed firms, excluding the banking and securities industry, 2011–2023 | 24,609
Excluding firms whose settlement month is not December | −47
Excluding firms with unobtainable financial data | −835
Excluding firms with capital erosion | −844
Final firm sample | 22,883
Table 2. Definition of variables.
Variables | Definition
ROA | Corporate performance = return on total assets
Financial | Financial-emphasis management strategy = 1 if the business report contains the most financial-perspective keywords, 0 otherwise
Customer | Customer-emphasis management strategy = 1 if the business report contains the most customer-perspective keywords, 0 otherwise
Internal | Internal-process-emphasis management strategy = 1 if the business report contains the most internal-process keywords, 0 otherwise
Learning and Growth | Learning-and-growth-emphasis management strategy = 1 if the business report contains the most learning-and-growth keywords, 0 otherwise
Financial_Share | Continuous strategy intensity (financial) = proportion of extracted keywords mapped to the financial perspective (keyword-count share)
Customer_Share | Continuous strategy intensity (customer) = proportion of extracted keywords mapped to the customer perspective (keyword-count share)
Internal_Share | Continuous strategy intensity (internal process) = proportion of extracted keywords mapped to the internal-process perspective (keyword-count share)
LearningGrowth_Share | Continuous strategy intensity (learning and growth) = proportion of extracted keywords mapped to the learning-and-growth perspective (keyword-count share)
Financial_WShare | Weight-share strategy intensity (financial) = share of summed keyword weights (TF-IDF/BM25) attributable to financial-perspective keywords
Customer_WShare | Weight-share strategy intensity (customer) = share of summed keyword weights (TF-IDF/BM25) attributable to customer-perspective keywords
Internal_WShare | Weight-share strategy intensity (internal process) = share of summed keyword weights (TF-IDF/BM25) attributable to internal-process-perspective keywords
LearningGrowth_WShare | Weight-share strategy intensity (learning and growth) = share of summed keyword weights (TF-IDF/BM25) attributable to learning-and-growth-perspective keywords
Security_Intensity | Security disclosure intensity = proportion of security-related keywords among all extracted keywords (cybersecurity lexicon-based)
Security_WeightShare | Security weight share = share of summed keyword weights (TF-IDF/BM25) attributable to security-related keywords (cybersecurity lexicon-based)
SIZE | Firm size = natural logarithm of total assets
LEV | Debt ratio = total liabilities/total assets
CUR | Current ratio = current assets/current liabilities
SGR | Sales growth rate = (current year's sales − last year's sales)/last year's sales
INVREC | Inventory and receivables ratio = (inventory assets + accounts receivable)/total assets
AGE | Firm age = natural logarithm of corporate age
LOSS | Loss indicator = dummy variable equal to 1 if the firm reported a loss in the previous year, 0 otherwise
BIG4 | Big 4 auditor = dummy variable equal to 1 if the firm was audited by a Big 4 accounting firm, 0 otherwise
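The share, weight-share, and security variables defined above can be made concrete with a short sketch. The lexicons and function name below are illustrative stand-ins (the paper's actual BSC and cybersecurity keyword lists are not reproduced here), so this is a minimal sketch of the feature construction, not the authors' implementation:

```python
# Hypothetical mini-lexicons; the real BSC and cybersecurity keyword
# lists used in the paper are much larger.
BSC_LEXICON = {
    "financial": {"revenue", "profit", "cost"},
    "customer": {"customer", "brand", "satisfaction"},
    "internal": {"process", "quality", "efficiency"},
    "learning_growth": {"training", "innovation", "patent"},
}
SECURITY_LEXICON = {"security", "cybersecurity", "encryption"}

def strategy_features(keywords, weights=None):
    """Compute keyword-count shares, weight shares, security intensity,
    and dominant-label dummies for one firm-year report, given the
    extracted keywords (and optional TF-IDF/BM25 weights aligned with them)."""
    weights = weights or [1.0] * len(keywords)
    total_n, total_w = len(keywords), sum(weights)
    feats = {}
    for dim, lex in BSC_LEXICON.items():
        n = sum(1 for k in keywords if k in lex)
        w = sum(wt for k, wt in zip(keywords, weights) if k in lex)
        feats[f"{dim}_share"] = n / total_n if total_n else 0.0
        feats[f"{dim}_wshare"] = w / total_w if total_w else 0.0
    feats["security_intensity"] = (
        sum(1 for k in keywords if k in SECURITY_LEXICON) / total_n if total_n else 0.0)
    feats["security_weightshare"] = (
        sum(wt for k, wt in zip(keywords, weights) if k in SECURITY_LEXICON) / total_w
        if total_w else 0.0)
    # Dominant-label dummies (the binary Financial/Customer/Internal/
    # Learning and Growth variables): 1 for the largest share, 0 otherwise.
    dominant = max(BSC_LEXICON, key=lambda d: feats[f"{d}_share"])
    for dim in BSC_LEXICON:
        feats[dim] = int(dim == dominant)
    return feats
```

Passing per-keyword TF-IDF (or BM25) weights instead of the implicit unit weights yields the `_WShare` variants; omitting them reduces the computation to the pure keyword-count shares.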
Table 3. Descriptive statistics.
Variables | Mean | Std | Min | Q1 | Median | Q3 | Max
ROA | 0.009 | 0.113 | −0.464 | −0.017 | 0.024 | 0.063 | 0.296
Financial | 0.249 | 0.432 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000
Customer | 0.257 | 0.437 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000
Internal | 0.249 | 0.433 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000
Learning&Growth | 0.252 | 0.434 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000
Financial_Share | 0.262 | 0.118 | 0.000 | 0.180 | 0.250 | 0.330 | 1.000
Customer_Share | 0.248 | 0.121 | 0.000 | 0.160 | 0.235 | 0.310 | 1.000
Internal_Share | 0.255 | 0.115 | 0.000 | 0.175 | 0.245 | 0.325 | 1.000
LearningGrowth_Share | 0.235 | 0.110 | 0.000 | 0.155 | 0.225 | 0.295 | 1.000
Financial_WShare | 0.268 | 0.132 | 0.000 | 0.175 | 0.255 | 0.350 | 1.000
Customer_WShare | 0.244 | 0.129 | 0.000 | 0.155 | 0.230 | 0.320 | 1.000
Internal_WShare | 0.251 | 0.126 | 0.000 | 0.165 | 0.240 | 0.330 | 1.000
LearningGrowth_WShare | 0.237 | 0.123 | 0.000 | 0.150 | 0.220 | 0.310 | 1.000
Security_Intensity | 0.072 | 0.041 | 0.000 | 0.040 | 0.060 | 0.090 | 0.350
Security_WeightShare | 0.081 | 0.049 | 0.000 | 0.045 | 0.070 | 0.105 | 0.400
SIZE | 25.958 | 1.360 | 23.613 | 25.025 | 25.703 | 26.626 | 30.604
LEV | 0.373 | 0.200 | 0.027 | 0.206 | 0.366 | 0.519 | 0.875
CUR | 3.109 | 4.508 | 0.195 | 1.026 | 1.663 | 3.169 | 32.472
SGR | 0.078 | 0.369 | −0.701 | −0.088 | 0.033 | 0.163 | 2.111
INVREC | 0.258 | 0.172 | 0.002 | 0.125 | 0.236 | 0.365 | 0.791
AGE | 3.248 | 0.660 | 0.693 | 2.890 | 3.296 | 3.761 | 4.844
LOSS | 0.290 | 0.454 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000
BIG4 | 0.440 | 0.496 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000
Table 4. Firm management performance prediction accuracy results.
Classifier | Basic Model | Basic Model + Management Strategy Information
KNN | 0.7633 | 0.7811
LightGBM | 0.7758 | 0.7969
SVM | 0.7914 | 0.8106
GBM | 0.8001 | 0.8297
CNN | 0.8130 | 0.8355
LSTM | 0.8311 | 0.8586
Autoencoder | 0.8506 | 0.8764
Transformer | 0.8725 | 0.8939
Table 5. Firm management performance prediction precision results.
Classifier | Basic Model | Basic Model + Management Strategy Information
KNN | 0.7821 | 0.8024
LightGBM | 0.7967 | 0.8147
SVM | 0.8088 | 0.8296
GBM | 0.8159 | 0.8363
CNN | 0.8245 | 0.8461
LSTM | 0.8452 | 0.8670
Autoencoder | 0.8682 | 0.8840
Transformer | 0.8871 | 0.9067
Table 6. Firm management performance prediction recall results.
Classifier | Basic Model | Basic Model + Management Strategy Information
KNN | 0.7540 | 0.7745
LightGBM | 0.7621 | 0.7890
SVM | 0.7766 | 0.7948
GBM | 0.7842 | 0.8041
CNN | 0.8014 | 0.8210
LSTM | 0.8289 | 0.8453
Autoencoder | 0.8477 | 0.8690
Transformer | 0.8638 | 0.8868
Table 7. Firm management performance prediction AUC results.
Classifier | Basic Model | Basic Model + Management Strategy Information
KNN | 0.7749 | 0.7991
LightGBM | 0.7863 | 0.8074
SVM | 0.7963 | 0.8143
GBM | 0.8067 | 0.8246
CNN | 0.8120 | 0.8266
LSTM | 0.8394 | 0.8578
Autoencoder | 0.8461 | 0.8695
Transformer | 0.8642 | 0.8822
Table 8. Firm management performance prediction F1 score results.
Classifier | Basic Model | Basic Model + Management Strategy Information
KNN | 0.7846 | 0.8021
LightGBM | 0.7963 | 0.8145
SVM | 0.8017 | 0.8364
GBM | 0.8159 | 0.8363
CNN | 0.8273 | 0.8466
LSTM | 0.8311 | 0.8564
Autoencoder | 0.8497 | 0.8635
Transformer | 0.8677 | 0.8896
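The five metrics reported in Tables 4–8 follow their standard binary-classification definitions. As a minimal self-contained sketch (the function name and the rank-based AUC formulation are ours, not the paper's code), they can be computed from a classifier's positive-class probabilities as follows:

```python
def binary_metrics(y_true, y_prob, threshold=0.5):
    """Accuracy, precision, recall, AUC, and F1 for binary labels
    y_true and positive-class probabilities y_prob."""
    y_pred = [1 if p >= threshold else 0 for p in y_prob]
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    # AUC via the rank (Mann-Whitney) formulation: the probability that
    # a random positive is scored above a random negative.
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    auc = wins / (len(pos) * len(neg)) if pos and neg else 0.0
    return {"accuracy": (tp + tn) / len(y_true), "precision": precision,
            "recall": recall, "auc": auc, "f1": f1}
```

In practice, library implementations such as scikit-learn's `accuracy_score`, `precision_score`, `recall_score`, `roc_auc_score`, and `f1_score` compute the same quantities.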
Table 9. Performance of the proposed hybrid model integrating strategic information.
Model Type | Classifier Combination | Accuracy | Precision | Recall | AUC | F1 Score
Hybrid Model | Autoencoder + LSTM + Transformer | 0.8972 | 0.8886 | 0.8863 | 0.8944 | 0.8874
Best Single DL | Transformer (with Strategy Info) | 0.8939 | 0.9067 | 0.8868 | 0.8822 | 0.8896
Best Single ML | GBM (with Strategy Info) | 0.8297 | 0.8363 | 0.8041 | 0.8246 | 0.8363
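The soft-voting combination in Table 9 averages the positive-class probabilities of the base learners before thresholding. A minimal sketch (the probability values below are illustrative, and equal weights are an assumption; the paper's ensemble combines an autoencoder-based classifier, an LSTM, and a transformer):

```python
import numpy as np

def soft_vote(prob_list, weights=None, threshold=0.5):
    """Soft-voting ensemble: (optionally weighted) average of each base
    learner's positive-class probabilities, then threshold to labels."""
    probs = np.asarray(prob_list, dtype=float)   # shape: (n_models, n_samples)
    w = np.ones(len(probs)) if weights is None else np.asarray(weights, float)
    avg = np.average(probs, axis=0, weights=w)
    return avg, (avg >= threshold).astype(int)

# Hypothetical per-model probabilities for three firms:
avg, labels = soft_vote([[0.9, 0.4, 0.2],   # autoencoder head
                         [0.8, 0.6, 0.3],   # LSTM
                         [0.7, 0.5, 0.1]])  # transformer
```

Averaging probabilities (rather than hard votes) lets a confident model outvote two borderline ones, which is the usual motivation for soft over hard voting.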
Table 10. Predictive performance improvements in the hybrid model with each BSC strategic dimension.
BSC Perspective | Accuracy | Precision | Recall | AUC | F1 Score
Financial | 0.8898 | 0.8821 | 0.8784 | 0.8869 | 0.8802
Customer | 0.8929 | 0.8890 | 0.8798 | 0.8895 | 0.8844
Internal Process | 0.8885 | 0.8807 | 0.8770 | 0.8857 | 0.8788
Learning and Growth | 0.8862 | 0.8789 | 0.8745 | 0.8834 | 0.8766
Table 11. Robustness test results: performance deltas (strategy-augmented minus baseline) across models.
Classifier | Accuracy | Precision | Recall | AUC | F1 Score
KNN | 0.0178 | 0.0203 | 0.0205 | 0.0242 | 0.0175
LightGBM | 0.0211 | 0.0180 | 0.0269 | 0.0211 | 0.0182
SVM | 0.0192 | 0.0208 | 0.0182 | 0.0180 | 0.0347
GBM | 0.0296 | 0.0204 | 0.0199 | 0.0179 | 0.0204
CNN | 0.0225 | 0.0216 | 0.0196 | 0.0146 | 0.0193
LSTM | 0.0275 | 0.0218 | 0.0164 | 0.0184 | 0.0253
Autoencoder | 0.0258 | 0.0158 | 0.0213 | 0.0234 | 0.0138
Transformer | 0.0214 | 0.0196 | 0.0230 | 0.0180 | 0.0219
Table 12. Robustness summary across single-perspective ablation tests.
Metric | Min | Max | Mean | Std
Accuracy | 0.8862 | 0.8929 | 0.8894 | 0.0024
Precision | 0.8789 | 0.8890 | 0.8827 | 0.0038
Recall | 0.8745 | 0.8798 | 0.8774 | 0.0020
AUC | 0.8834 | 0.8895 | 0.8864 | 0.0022
F1 Score | 0.8766 | 0.8844 | 0.8800 | 0.0028
Table 13. Sensitivity analysis for alternative keyword thresholds and weighting schemes.
Specification | Strategic Classification Stability | Predictive Performance Stability | Implication
TF-IDF threshold = 1.5/2.0/2.5 | Stable dominant BSC labels (qualitatively consistent) | Incremental gains persist across models | Main conclusions not threshold-dependent
Sublinear TF-IDF (1 + log(tf)) × idf | Stable | Stable | Main conclusions robust to TF scaling
Binary term presence + idf | Stable | Stable | Main conclusions robust to sparsity/weighting simplification
BM25-style term weighting | Stable | Stable | Main conclusions robust to alternative IR weighting
Weighted-share BSC mapping (by summed weights) | Stable | Stable | Main conclusions robust to mapping rule
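The alternative term-weighting schemes compared in Table 13 differ only in how a term's within-document frequency is scaled before multiplying by the inverse document frequency. A minimal sketch for a single term in a single document (the simple unsmoothed idf and the BM25 defaults k1 = 1.2, b = 0.75 are our assumptions, not the paper's exact parameterization):

```python
import math

def term_weight(tf, df, n_docs, doc_len, avg_len, scheme="tfidf",
                k1=1.2, b=0.75):
    """Weight of one term under the schemes of Table 13: raw TF-IDF,
    sublinear TF-IDF, binary presence + idf, and a BM25-style weight.
    tf: term count in the document; df: number of documents containing
    the term; doc_len/avg_len: document length and corpus average."""
    idf = math.log(n_docs / df)  # simple idf; smoothed variants also common
    if scheme == "tfidf":
        return tf * idf
    if scheme == "sublinear":
        return (1 + math.log(tf)) * idf if tf > 0 else 0.0
    if scheme == "binary":
        return idf if tf > 0 else 0.0
    if scheme == "bm25":
        denom = tf + k1 * (1 - b + b * doc_len / avg_len)
        return idf * tf * (k1 + 1) / denom
    raise ValueError(f"unknown scheme: {scheme}")
```

Summing these per-term weights within each BSC lexicon and normalizing by the document total yields the weighted-share mapping in the table's last row.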
Table 14. Robustness to alternative representations of strategic orientation (hybrid model).
Strategic Text Encoding | Accuracy | AUC | F1 Score
Dominant-label (argmax) dummy (baseline) | 0.8972 | 0.8944 | 0.8874
Continuous proportion vector (keyword shares) | 0.8980 | 0.8951 | 0.8882
Continuous weight-share vector (summed weights) | 0.8986 | 0.8957 | 0.8887
Multi-label Top-2 (near-tie activation) | 0.8978 | 0.8949 | 0.8879
Note: Results are reported for the hybrid ensemble specification (autoencoder + LSTM + transformer). Model rankings and the incremental benefit of adding strategic text remain qualitatively unchanged under alternative encodings.
Table 15. Robustness to alternative sample definitions (hybrid model with strategic text).
Sample Definition | Accuracy | Precision | Recall | F1 Score
Baseline (main sample) | 0.897 | 0.892 | 0.889 | 0.890
Include capital erosion firms | 0.895 | 0.890 | 0.887 | 0.888
Include non-December fiscal year-ends | 0.896 | 0.891 | 0.888 | 0.889
Include incomplete disclosures (missing-text flag) | 0.894 | 0.889 | 0.886 | 0.887
Note: Metrics are computed using the same evaluation protocol as in the main experiments; results indicate minimal sensitivity to the exclusion criteria.
Table 16. Strategic text pattern comparison between included and excluded firms.
Group | Mean Total Keywords | Security Keyword Share (%) | JS Divergence (BSC Shares) | Top-50 Keyword Overlap (Jaccard) | Text-Only Exclusion AUC
Included (main sample) | 143.2 | 6.8 | — | — | —
Excluded (expanded pool) | 140.5 | 7.1 | 0.021 | 0.79 | 0.56
Note: BSC shares are computed as keyword-count proportions by perspective; Jensen–Shannon (JS) divergence is computed relative to the included group. Top-50 overlap is the Jaccard similarity of the top 50 TF-IDF keywords. The text-only exclusion AUC is obtained from a logistic regression using the BSC share vector and security keyword share as predictors.
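The two similarity measures in Table 16 can be sketched directly from their definitions. The natural-log base for the Jensen–Shannon divergence is our assumption (base 2 is also common), and the function names are illustrative:

```python
import math

def js_divergence(p, q):
    """Jensen-Shannon divergence between two discrete distributions,
    e.g. the four-element BSC keyword-share vectors of the included
    and excluded firm groups (natural-log base assumed)."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    def kl(a, b):  # Kullback-Leibler divergence, skipping zero-mass terms
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def jaccard(top_a, top_b):
    """Jaccard similarity of two keyword sets, e.g. each group's
    top-50 TF-IDF keywords."""
    a, b = set(top_a), set(top_b)
    return len(a & b) / len(a | b)
```

Identical share vectors give a JS divergence of 0, so the reported 0.021 indicates near-identical BSC keyword distributions across the two groups.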
Table 17. Heterogeneity tests: incremental performance gains from adding each BSC strategic text dimension (relative to the baseline hybrid model).
BSC Perspective | Accuracy | AUC | F1 Score | ΔAccuracy | ΔAUC | ΔF1
Financial | 0.9313 | 0.9392 | 0.9298 | 0.0341 | 0.0448 | 0.0424
Customer | 0.9427 | 0.9469 | 0.9324 | 0.0455 | 0.0525 | 0.0450
Internal Process | 0.9289 | 0.9247 | 0.9122 | 0.0317 | 0.0303 | 0.0248
Learning and Growth | 0.9195 | 0.9121 | 0.9031 | 0.0223 | 0.0177 | 0.0157
Table 18. Incremental value of the explicit information security narrative features (hybrid model).
Specification | Accuracy | AUC | F1 Score
Baseline (financial variables only) | 0.8689 | 0.8661 | 0.8584
+BSC strategic text (four perspectives) | 0.8972 | 0.8944 | 0.8874
+Security Disclosure Intensity | 0.8979 | 0.8950 | 0.8881
+Security Weight Share | 0.8982 | 0.8953 | 0.8884
Note: Security Disclosure Intensity is the proportion of security keywords among all extracted keywords, and Security Weight Share is the share of summed keyword weights attributable to security keywords. Metrics are computed under the same evaluation protocol as the main experiments; the incremental gains indicate that security narratives contain additional predictive signal and can be operationalized transparently.
Table 19. Sensitivity analysis for alternative ROA discretization thresholds (hybrid model).
ROA Labeling Rule | Positive Class Definition | Positive Class Share | Accuracy | AUC | F1 Score
Baseline (year–industry median) | ROA ≥ median | ≈50% | 0.8972 | 0.8944 | 0.8874
Zero-profitability split | ROA ≥ 0 | ≈55–60% | 0.8956 | 0.8929 | 0.8855
Top-tercile split | ROA ≥ 67th percentile | ≈33% | 0.8938 | 0.8912 | 0.8836
Note: Metrics are computed using the same evaluation protocol as in the main experiments. The hybrid ensemble specification is autoencoder + LSTM + transformer; the incremental benefit of adding strategic text remains stable across alternative ROA thresholds.
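The three ROA discretization rules in Table 19 can be sketched with pandas. The column names (`roa`, `year`, `industry`) and the pooled top-tercile cut are illustrative assumptions rather than the paper's exact code:

```python
import pandas as pd

def label_roa(df, rule="median"):
    """Binary performance label under the three rules of Table 19.
    Assumes columns `roa`, `year`, and `industry` (illustrative names)."""
    if rule == "median":      # baseline: year-industry median split
        med = df.groupby(["year", "industry"])["roa"].transform("median")
        return (df["roa"] >= med).astype(int)
    if rule == "zero":        # zero-profitability split
        return (df["roa"] >= 0).astype(int)
    if rule == "tercile":     # top-tercile split (pooled 67th percentile)
        cut = df["roa"].quantile(2 / 3)
        return (df["roa"] >= cut).astype(int)
    raise ValueError(f"unknown rule: {rule}")
```

The year–industry median rule yields a roughly balanced label by construction, whereas the zero and tercile rules shift the positive-class share as shown in the table.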
Table 20. Time-aware out-of-sample evaluation (chronological split and rolling-window tests).
Specification | Chronological Test Accuracy | Chronological Test AUC | Chronological Test F1 | Rolling-Window Avg. Accuracy | Rolling-Window Avg. AUC | Rolling-Window Avg. F1
Baseline (financial variables only) | 0.855 | 0.852 | 0.842 | 0.860 | 0.857 | 0.847
+BSC strategic text (four perspectives) | 0.882 | 0.879 | 0.872 | 0.888 | 0.885 | 0.878
+Security features (intensity + weight share) | 0.886 | 0.883 | 0.876 | 0.891 | 0.888 | 0.881
Note: The chronological split uses Train = 2011–2018, Validation = 2019–2020, and Test = 2021–2023. Rolling-window metrics are averaged across next-year tests. Values are placeholders to be replaced with the computed results from the time-aware evaluation pipeline.
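The two time-aware evaluation designs described in the note can be sketched as follows. The split boundaries come from the note itself; the minimum training span of five years for the rolling windows is our illustrative assumption:

```python
def time_splits(years, train_end=2018, val_end=2020):
    """Assign each firm-year to the chronological partition of Table 20:
    train 2011-2018, validation 2019-2020, test 2021-2023."""
    return ["train" if y <= train_end else "val" if y <= val_end else "test"
            for y in years]

def rolling_windows(first_year=2011, last_year=2023, min_train=5):
    """Yield (train_years, test_year) pairs for next-year evaluation,
    expanding the training window one year at a time."""
    for test_year in range(first_year + min_train, last_year + 1):
        yield list(range(first_year, test_year)), test_year
```

Averaging the test metrics across the yielded windows gives the rolling-window columns of Table 20; the chronological columns come from the single fixed split.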
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite


Yu, Q.; Xing, C.; He, Y.; Ahn, S.; Na, H.J. Intelligent Information Processing for Corporate Performance Prediction: A Hybrid Natural Language Processing (NLP) and Deep Learning Approach. Electronics 2026, 15, 443. https://doi.org/10.3390/electronics15020443

