Article

Artificial Intelligence’s Role in Predicting Corporate Financial Performance: Evidence from the MENA Region

1
Department of Accounting and Information Systems, Faculty of International Business and Humanities, Egypt-Japan University of Science and Technology, Alexandria 21934, Egypt
2
Department of Accounting, Faculty of Business, Alexandria University, Alexandria 21526, Egypt
*
Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2026, 19(1), 51; https://doi.org/10.3390/jrfm19010051
Submission received: 11 November 2025 / Revised: 28 December 2025 / Accepted: 2 January 2026 / Published: 8 January 2026

Abstract

This study classifies corporate financial performance in countries in the Middle East and North Africa (MENA) region, addressing the critical need for accurate and early identification of high-, moderate-, and low-performance companies. The MENA region was selected for its significant economic growth, diverse market structures, and increasing attractiveness for foreign investment, all of which make accurate financial performance assessment important. Despite the growing interest in AI applications for corporate financial performance, a research gap persists. Existing studies focus primarily on bankruptcy and financial distress prediction in developed countries, with limited work on multi-class financial performance classification in the MENA region. In particular, the literature lacks a robust, comparative evaluation of advanced deep learning (DL) techniques against conventional machine learning (ML) methods for multi-class corporate financial performance prediction using high-dimensional data. This study employs a design science research (DSR) approach by developing an evaluation analytics artifact that integrates structured preprocessing, dimensionality reduction, and comparative ML and DL modeling, following the relevance, design, and rigor cycles. The dataset was drawn from the Compustat database and comprises 7971 firm-year observations from 2013 to 2024. A rigorous dimensionality reduction process, including pairwise correlation filtering, resulted in a final set of 15 key classification features. The study compared three machine learning techniques, namely random forests (RFs), support vector machines (SVMs), and eXtreme Gradient Boosting (XGBoost), against one deep learning technique, deep neural networks (DNNs), for classifying the corporate financial performance of MENA-region companies.
The models were trained to classify corporations into three performance classes (low, moderate, and high), using earnings per share (EPS) as the target variable. The empirical findings indicate that all four algorithms achieved meaningful predictive performance in classifying EPS-based corporate performance. Among the benchmark models, the support vector machine (SVM) and random forest (RF) classifiers produced stable and competitive results, indicating strong generalization capabilities across firms and periods. XGBoost consistently outperformed these traditional machine learning models, delivering the highest overall classification accuracy and superior discriminatory power, highlighting its effectiveness in capturing nonlinear relationships and complex feature interactions. Similarly, the deep neural network improved classification performance relative to the benchmark models and exhibited results comparable to XGBoost, especially in modeling high-dimensional data. This superior performance can substantially enhance earnings performance classification through early identification of performance deterioration and improvement, allowing more proactive strategic and operational decisions.

1. Introduction

Over the past few decades, predicting corporate financial performance has gained increased scholarly attention, particularly following the stock market crash and the Great Depression (Aljawazneh et al., 2021). Prior research characterizes the global business environment as complex, dynamic, and competitive, shaped by growing globalization, market fluctuations, and emerging technologies. As a result, companies are motivated to enhance their financial performance in order to maintain competitiveness and respond effectively to evolving market demands. Financial performance is also closely linked to the timing of strategic decision making, which in turn affects competitiveness and growth within the current fast-changing business environment (Hezam et al., 2025). In addition, it serves as a control mechanism for a company’s strategy implementation and is considered an essential indicator of its overall health and operational efficiency (Abdellatif et al., 2023). Accordingly, a thorough analysis and evaluation of financial performance is essential for understanding the strengths and weaknesses of companies, as it enables stakeholders to make more informed decisions and identify potential opportunities for improvement (Omar et al., 2025).
Moreover, accurately classifying and predicting corporate financial performance is a fundamental concern for investors, creditors, and regulators, as it directly affects capital allocation, credit decisions, and financial stability. Consequently, improved prediction and classification of corporate financial performance based on financial statement information can enhance external monitoring, reduce information asymmetry, mitigate agency conflicts, and support more informed investment, lending, and regulatory decisions, particularly in emerging markets such as the MENA region.
A company’s financial reports serve as the primary (if not the only) source of information that presents and explains the financial health of a company, along with its operational activities and cash flow in a given accounting period (Manogna & Mishra, 2021). In this regard, the analysis of financial reports has been a keystone of stakeholder decision making, enabling an enhanced understanding of the overall corporate financial health and position (Zaini & Mahmuddin, 2019). For a long time, decision makers mainly relied on financial ratios and indicators derived from these reports, a widely used approach to evaluating and predicting the financial performance of the company (Delen et al., 2013; Gregova et al., 2020). Specifically, a variety of traditional statistical methodologies, such as linear regression, ANOVA, and factor analysis, have typically been used in earlier studies to construct predictive models of corporate financial performance (Delen et al., 2013), leveraging historical and present data to forecast a company’s future financial position with the aim of minimizing risk and maximizing benefits (Abdellatif et al., 2023).
Despite the existence of a significant body of research that employed the traditional statistical prediction models relying on these financial indicators (Lee et al., 2017), the increasing volume and complexity of modern financial data present significant challenges. This data is often characterized by high dimensionality, nonlinear relationships, and subtle patterns that are difficult for traditional statistical methods to capture effectively (Rundo et al., 2019; Billios et al., 2024). Consequently, there has been an urgent need to use more advanced analytical techniques. Today’s technological advancements, including artificial intelligence (AI), the Internet of Things (IoT), and big data (Omar et al., 2025), have revolutionized the complexity of the performance evaluation process (Hezam et al., 2025). As suggested by the prior research of Moubarak (2024), the application of AI techniques offers a powerful solution, providing the capability to navigate these complexities and uncover the complicated predictive relationships within financial data, thereby enabling more accurate and reliable predictions.
AI is among the breakthroughs shaping the digital era and has emerged as one of the most trending and prominent technologies (Kureljusic & Karger, 2024). AI is a branch of computer science that is mainly focused on creating intelligent computer systems capable of performing complex tasks that require human intelligence (Sarker, 2022). Recently, scholars and managers have shown their great interest and focused attention on AI technology (Brock & von Wangenheim, 2019), particularly in the accounting field, as it is used for prediction tasks such as classification, clustering, regression, and ranking (Xia et al., 2013; Kureljusic & Karger, 2024). Therefore, the quantity of research conducted that uses AI in this area of accounting has grown progressively.
There are several AI techniques for predicting corporate financial performance, yet machine learning (ML) and deep learning (DL) are the most prominent and relevant practices (M. Zhou et al., 2022) which have recently attracted the attention of scholars and professionals in this area (Ozbayoglu et al., 2020; Aljawazneh et al., 2021). Although numerous ML and DL techniques for predicting financial performance have been widely employed in prior studies, the majority of these applications have been conducted in contexts outside the Middle East and North Africa (MENA) region. For example, scholars have extensively applied methods such as random forests (RFs), support vector machines (SVMs), eXtreme Gradient Boosting (XGBoost), and deep neural networks (DNNs) in Latin America, North America, Europe, and parts of Asia to capture determinants of firm performance (Dasilas & Rigani, 2024; Hamdi et al., 2024; Gajdosikova & Michulek, 2025).
However, the MENA region exhibits distinct institutional, cultural, and economic characteristics, including varying degrees of market maturity, regulatory frameworks, and socio-political conditions. Specifically, the MENA region is driven by its significant economic growth, diverse market structures, and increasing attractiveness for foreign investment, as it holds a strategic position in the global economy due to its abundant natural resources, particularly oil and gas, its rapidly diversifying markets, and its role as a hub linking Asia, Europe, and Africa. Predicting financial performance in this region is not only vital for local stakeholders but also has global significance, as fluctuations in MENA economies influence international energy markets, global trade flows, and investment strategies. Moreover, the region has been undergoing substantial institutional reforms and digital transformations, making it an important testing ground for theories of financial performance and corporate governance.
In this sense, based on the previous discussion, this research aims to answer the following research questions (RQs):
RQ1: To what extent do machine learning and deep learning models differ in their classification effectiveness and discriminatory power with regard to EPS-based performance classes?
RQ2: How does the importance of financial indicators in EPS-based classification differ across machine learning and deep learning?
This study aims to contribute to the existing literature by comparing the performance of different ML and DL classification models, namely RFs, SVMs, XGBoost, and DNNs, for accurately predicting the corporate financial performance of listed companies in 10 MENA-region countries (Egypt, Jordan, Morocco, Tunisia, Qatar, Oman, the UAE, Saudi Arabia, Bahrain, and Kuwait), using data obtained from the Compustat database. Additionally, by generating MENA region-specific evidence, this study addresses a gap in the literature and offers insights that can enhance the accuracy of global comparative studies, improve risk assessment for multinational investors, and support policymakers in designing contextually relevant economic strategies, all of which make accurate financial performance assessment and prediction crucial. It is important to clarify that this research adopts a contemporaneous classification framework rather than a strict inter-temporal forecasting design. Specifically, the objective is to classify firms into low-, moderate-, and high-performance categories based on financial indicators observed in the same fiscal year, rather than to forecast future performance across time. Furthermore, this study provides a foundation for future research and contributes to the ongoing academic debate by pointing out important topics for later investigation. Moreover, by applying and comparing four AI algorithms (RFs, SVMs, XGBoost, and DNNs) to classify and predict corporate financial performance in the under-researched MENA region, it provides methodological insights into model accuracy and class-specific behavior while offering empirical evidence that deep learning yields the most reliable classification results.
The remainder of the study is organized as follows. Section 2 provides a review of the prior literature, while Section 3 presents the methodology of this research, which includes the used dataset, the features selected for analysis, the preprocessing techniques, as well as the classification algorithms employed to construct the classification models for predicting corporate financial performance. Moreover, Section 4 presents the analysis of results, explaining the research findings. Finally, the conclusion of the study, limitations, contributions, recommendations for future research, and implications are all provided in Section 5.

2. Literature Review

2.1. AI and Corporate Financial Performance Prediction

In recent studies, the shortcomings of the traditional methods used for predicting corporate financial performance, such as their sensitivity to multicollinearity and their reliance on multivariate normality, have been increasingly recognized (Jabeur et al., 2021). Hence, the emergence of several digital technologies, such as AI, has led to a shift away from traditional techniques in order to avoid their inadequacies, resulting in a paradigm change that provides a better understanding of market trends and dynamics (Jabeur et al., 2021). Borges et al. (2021) stated that the term AI was first used by McCarthy in the 1950s, when it was referred to as “the science and engineering of making intelligent machines”, evolving from two separate paradigms, namely the rationalist approach, where mathematics and engineering are integrated, and the human-centered approach, where hypotheses are formulated and experiments are conducted to validate them. Moreover, AI is a key technology of the Industry 4.0 revolution and is considered the umbrella that encompasses ML, DL, and data mining, which are the main components of AI (Hassoun et al., 2023; Omar et al., 2025).
As described by Riskiyadi (2024), ML is a domain of AI that focuses on developing intelligent systems that learn independently and automatically from existing data to make decisions without explicit user intervention. ML creates automatic actions to determine the most appropriate decision by learning from experience and adapting to new inputs. DL, in turn, is an advanced sub-class of ML based on the deep neural network algorithms within AI technologies (J. Huang et al., 2020). The term “deep” in deep neural network refers to the multiple layers of information-processing stages arranged in a hierarchical structure of input, hidden, and output layers, in which every two adjacent layers are fully connected (J. Li & Sun, 2023), leading to the boosted pattern analysis and classification performance of DL models.
As technology continues its rapid growth, AI has become a transformative force across various areas, radically altering how information is examined and predictions are generated. Accordingly, these AI-based techniques facilitate automation and intelligent decision making as they mimic human cognitive functions such as learning, reasoning, and perception, resulting in the development of adaptive systems across several domains such as health (B. Zhang et al., 2023), engineering (Yaseen et al., 2020), finance (Chang et al., 2024), insurance (Shamsuddin et al., 2023), supply chain (Ali et al., 2024), business (Madanchian, 2024), and accounting (Chi & Chu, 2021). In particular, a wide range of ML and DL techniques have rapidly replaced the traditional statistical methods in accounting research due to their enhanced prediction capabilities in diverse problems, such as bankruptcy (Gholampoor & Asadi, 2024), fraud (Moubarak, 2024; Sabry & Ibrahim, 2024), financial distress (Elhoseny et al., 2022), and business failure (Carmona et al., 2019).
ML is a domain of AI attempting to develop intelligent systems through learning from data rather than depending on established rules, and it includes using big data algorithms and analytics to learn about complicated facts and predict them (Lecun et al., 2015; Jabeur et al., 2021), while DL is a branch of ML which has significantly improved the process and refines the structure of learning from data (Ranta et al., 2023). According to Biju et al. (2024), DL integrates representation learning, allowing for the identification, prediction, and classification of sophisticated patterns and trends required for classification or detection from unstructured datasets, since its foundation is structured with many interconnected processing layers which assists the learning of different patterns and complex tasks. Thus, those networks duplicate human intelligence in their operations, which reduces the need for manual human interference. Additionally, DL is composed of multi-layered computational models that can extract sophisticated features, resulting in the minimal need for manual engineering and restricting it to some activities, such as changing the size and number of layers that achieve several levels of abstraction (S. F. Ahmed et al., 2023).
AI techniques provide numerous advantages, one of which is their dynamic functionality that facilitates continuous operation and enables real-time decision making for enterprises (Jabeur et al., 2021). In addition, AI involves the utilization of data and algorithms automatically without the need for direct programming (Biju et al., 2024). Furthermore, AI techniques facilitate automation and intelligent decision making as they mimic human cognitive functions such as learning, reasoning, and perception, resulting in the development of adaptive systems across several domains such as business, manufacturing, healthcare, and science (Sarker, 2022; Elahi et al., 2023). Generally, the incorporation of AI techniques enables faster, more efficient decision making and thereby accelerates advancements across different industries.
There has been a growing body of research that leverages diverse AI-based techniques to predict corporate financial performance using different financial indicators. Notably, W. Zhang et al. (2004) utilized univariate-linear, multivariate-linear, univariate-neural, and multivariate-neural network models for EPS-based performance prediction with a dataset of 283 companies, showing that the neural network techniques improved model accuracy in both the univariate and multivariate settings. Moreover, Barboza et al. (2017) applied eight ML techniques to bankruptcy prediction for American and Canadian firms over the period from 1985 to 2013, using 10,000 firm-year observations from the Compustat and Salomon Centre databases. The researchers employed RFs, artificial neural networks (ANNs), logistic regression (Logit), linear SVM (SVM-Lin), radial basis function SVM (SVM-RBF), multivariate discriminant analysis (MDA), boosting, and bagging models for prediction. The results show that RF, boosting, and bagging models outperformed all the other techniques, with accuracy levels of 87.06%, 86.65%, and 85.67%, respectively, while the remaining techniques were less accurate, with SVM-RBF reaching 75.03%, ANNs 69.29%, Logit 66.61%, SVM-Lin 63.56%, and MDA 50.16%. Although the SVM-RBF model produced low error rates, its overall performance was weaker than that of the other ML techniques used. Moreover, the ANN showed significant variability in its results, in contrast to the consistent results obtained by the other models.
Moving to France, Jabeur et al. (2021) assessed nine distinct AI techniques for predicting corporate performance from 2014 to 2016, including SVMs, RFs, logistic regression, gradient boosting machines (GBMs), XGBoost, categorical boosting (CatBoost), NNs, DNNs, and discriminant analysis. Furthermore, the researchers employed dependence plots to assess the significance of the financial features, developing a methodology to assist stakeholders in identifying and detecting early warning signals of financial performance failure. Also, in the study conducted by Mousa et al. (2022), they used a sample of 63 banks from Egypt, Jordan, and Gulf countries for corporate financial performance prediction, using the EPS as a metric for performance from 2008 to 2017 and utilizing RFs, quadratic discriminant analysis, and linear discriminant analysis, and the results show that RFs outperformed the other techniques.
Additionally, Shetty et al. (2022) conducted a study in Belgium for bankruptcy prediction from the year 2002 to 2012 using the financial data of 3728 SMEs. This study used SVMs, XGBoost, and six-layered deep feedforward neural network algorithms for bankruptcy prediction, and the prediction accuracy rates were 83%, 83%, and 82%, respectively. Moreover, Y. P. Huang and Yen (2019) compared the accuracy performance of SVM, XGBoost, hybrid associative memory with translation (HACT), hybrid GA-fuzzy clustering, deep belief network (DBN), and hybrid DBN-SVM models using 16 financial indicators for financial distress prediction from listed firms in Taiwan, and the results revealed that XGBoost achieved the highest accuracy.
Barboza and Altman (2024) compared the performance of logistic regression and RFs for predicting financial distress in Latin America. This study applied the ML models to 20 financial indicators of 808 companies obtained from Thomson Reuters Datastream from 2000 to 2020 across six Latin American countries (Brazil, Argentina, Mexico, Colombia, Peru, and Chile), with a total of 10,118 firm-year observations. The results showed that RFs consistently outperformed logistic regression in terms of predictive accuracy and forecast error levels. Consequently, the study established a stable set of key financial indicators for financial distress prediction over the long term. One notable finding was that the predictive capabilities of the models remained unchanged during the COVID-19 pandemic, indicating the robustness of the indicators and methodologies in forecasting distress even during substantial financial and economic disasters. Furthermore, Hamdi et al. (2024) compared the forecasting accuracies of ML and DL techniques for bankruptcy prediction by utilizing 25 financial ratios for 732 Tunisian companies from 2011 to 2017. The findings revealed that the DNN achieved the highest accuracy at 93.6%, surpassing the conventional ML techniques, namely RFs at 88.2%, logistic regression at 85.8%, SVMs at 84.8%, linear discriminant analysis (LDA) at 80.9%, and decision trees, which exhibited the lowest accuracy at 74.3%.
Finally, Gholampoor and Asadi (2024) conducted a study on bankruptcy prediction, applying eight AI techniques to 40 financial ratios of 1265 American healthcare companies based on Altman’s model. The post-tuning results showed that gradient boosting outperformed all other techniques, achieving an accuracy rate of 90.8%. This was followed by RFs at 90.6%, AdaBoost and SVMs both at 87.3%, decision trees at 86.4%, logistic regression at 86.3%, KNNs at 82.3%, and naïve Bayes models at 68%. When the modified Altman’s model was employed, the accuracy rates exhibited minor fluctuations: gradient boosting at 90.6% and RFs at 90.3% surpassed all the other techniques, followed by SVMs at 88.1%, decision trees at 87.4%, logistic regression at 87.3%, AdaBoost at 84.1%, KNNs at 83.3%, and naïve Bayes models at 66.7%. Meanwhile, under the Ohlson model, gradient boosting achieved the best performance at 90%, followed by RFs at 89.3%, decision trees at 88.2%, SVMs at 87.9%, logistic regression at 87.8%, KNNs at 80%, AdaBoost at 74.5%, and naïve Bayes models at 69.1%.
Based on the preceding discussion, this study utilized three ML techniques that are commonly used in the extant literature, namely RFs, XGBoost, and SVMs, and one deep learning technique (DNN) for corporate financial performance prediction. These algorithms were selected due to their remarkable predictive and classification performance as well as their ability to capture complex interactions among financial indicators. RFs and XGBoost are known for their robustness, high generalization capability, and ability to handle overfitting problems, which make them suitable for dealing with noisy financial indicators (Kristanti et al., 2024; Yang & Wang, 2025). Moreover, the SVM is widely known for its efficacy in separating multidimensional data using optimal hyperplanes, which enhances classification and prediction accuracy (Kim et al., 2020; Hamdi et al., 2024). On the other hand, DNNs are characterized by the inclusion of multiple hidden layers between the input and output layers and can automatically extract hierarchical patterns from large datasets, thereby improving prediction and classification in complex financial environments (Schmidhuber, 2015; T. Ahmed et al., 2025).
Consequently, the literature review presented above strongly suggests that AI-based tools exhibit remarkable performance and efficiency in predicting the financial health of companies across various contexts. This encompasses a wide spectrum of models, ranging from simple models to more sophisticated DL frameworks. In recent years, dedicated researchers have worked to advance the field of financial performance prediction. Their efforts have centered on refining these models by investigating a plethora of methodologies and approaches, each aimed at enhancing predictive accuracy and reliability. Despite these advancements, a significant gap remains in the research, particularly regarding the evaluation of ML and DL models in the context of the MENA region. This unique geographical area presents distinct economic and financial dynamics, underscoring the need for comprehensive studies that specifically address the nuances of financial performance prediction within this context. The scarcity of such research underscores the rich opportunities for exploration and the potential impact of tailored AI applications in enhancing financial performance assessment in the MENA region. In addition, the MENA region’s companies are characterized by diverse economic structures, reporting standards, and market dynamics. Hence, those algorithms are particularly appropriate, as their flexibility allows them to model various financial behaviors and adapt to variations across countries and industries. Thus, applying a comparative approach that involves both ML and DL algorithms offers a comprehensive assessment of their predictive capabilities in an emerging market setting.
Accordingly, this research aims to address the existing gap in the literature by employing a variety of ML and DL models to predict corporate financial performance. Specifically, the research will use RF, XGBoost, SVM, and DNN models. The analysis will encompass data from distinct countries in the MENA region, enabling a comprehensive examination of how these models perform in these special economic and regulatory environments.

2.2. Random Forests (RFs) and Corporate Financial Performance Prediction

The RF is a supervised ensemble algorithm that functions as a group of decision trees, forming a forest of numerous classification trees, each generated from a distinct subset of the original dataset (Barboza & Altman, 2024). This technique is similar to bagging because it builds several classification models by repeatedly creating subsets from the data (Barboza et al., 2017). According to Hamdi et al. (2024), the “random” in RFs refers to the use of random data samples and random feature subsets to construct each decision tree, where the data is divided into a training set for model construction and a testing set for performance evaluation. Moreover, the algorithm considers only a random subset of features for the split at every node in a tree in order to minimize overfitting and promote diversity across the trees.
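The two sources of randomness described above can be illustrated with a short Python sketch: each tree is grown on a bootstrap sample of the rows, each node considers only a random subset of the features, and the forest classifies by majority vote. This is a minimal illustration under stated assumptions, not the implementation used in this study. The function names, the two toy features, and the settings `n_trees`, `n_sub`, and `depth` are all hypothetical.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) that most reduces weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        vals = sorted({r[f] for r in rows})
        for lo, hi in zip(vals, vals[1:]):
            t = (lo + hi) / 2  # candidate threshold: midpoint of adjacent values
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, n_sub, depth, rng):
    """Grow one tree; every node sees only a random subset of n_sub features."""
    feats = rng.sample(range(len(rows[0])), n_sub)
    split = best_split(rows, labels, feats) if depth > 0 else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class
    f, t = split
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li], n_sub, depth - 1, rng),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri], n_sub, depth - 1, rng))

def predict_tree(tree, row):
    """Walk from the root to a leaf and return the leaf's class label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def fit_forest(rows, labels, n_trees=25, n_sub=1, depth=3, seed=0):
    """Train n_trees trees, each on its own bootstrap sample of the rows."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # sample rows with replacement
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], n_sub, depth, rng))
    return forest

def predict_forest(forest, row):
    """Majority vote across all trees."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

# Hypothetical toy data: two made-up financial ratios per firm-year.
rows = [[0.05, 0.10], [0.10, 0.05], [0.12, 0.15], [0.15, 0.08],
        [0.45, 0.50], [0.50, 0.55], [0.55, 0.48], [0.52, 0.45],
        [0.90, 0.85], [0.85, 0.95], [0.95, 0.90], [0.88, 0.92]]
labels = ["low"] * 4 + ["moderate"] * 4 + ["high"] * 4

forest = fit_forest(rows, labels)
print(predict_forest(forest, [0.10, 0.10]))
```

Restricting each node to a random feature subset means that individual trees make different mistakes, which is what allows the vote to reduce overfitting relative to a single tree.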

2.3. XGBoost and Corporate Financial Performance Prediction

The XGBoost algorithm is a supervised learning approach designed for regression and classification that works by integrating gradient boosting and tree ensemble techniques (Y. P. Huang & Yen, 2019). During the XGBoost training process, it builds a model by adding one tree at a time, and each new tree is built to correct the errors and residuals of the previous one, which improves the overall model, helps keep the model simple, and avoids any arising overfitting problems, leading to the final prediction being determined by combining the results of all the trees (Ben Jabeur et al., 2023). According to Carmona et al. (2019), XGBoost is known as a regularized version of gradient boosting as it includes a mechanism to control the importance and weights of the variables, and the regularization process pushes the weights of less important variables toward zero, effectively performing variable selection, which is particularly useful for problems with a large number of variables or high-dimensional problems.
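The additive, error-correcting idea described above can be sketched in a few lines of Python. The sketch below is plain gradient boosting with one-split trees on a squared-error regression target, with an L2 penalty `lam` that shrinks leaf values in the spirit of XGBoost's regularization. It is not the XGBoost library and not the configuration used in this study. The toy data and the values of `n_trees`, `eta`, and `lam` are assumptions chosen only for illustration.

```python
def fit_stump(xs, residuals, lam):
    """Fit a one-split tree to the residuals.

    Each leaf value is sum(residuals) / (count + lam), so a larger lam
    shrinks the leaf values toward zero.
    """
    best = None
    vals = sorted(set(xs))
    for lo, hi in zip(vals, vals[1:]):
        t = (lo + hi) / 2
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        # Regularized fit score: a larger value means the split explains more residual.
        score = sum(left) ** 2 / (len(left) + lam) + sum(right) ** 2 / (len(right) + lam)
        if best is None or score > best[0]:
            best = (score, t, sum(left) / (len(left) + lam), sum(right) / (len(right) + lam))
    _, t, w_left, w_right = best
    return t, w_left, w_right

def boost(xs, ys, n_trees=20, eta=0.3, lam=1.0):
    """Add one tree at a time; each tree is fit to the current residuals."""
    preds = [0.0] * len(xs)
    trees, history = [], []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, wl, wr = fit_stump(xs, residuals, lam)
        trees.append((t, wl, wr))
        # Shrink each tree's contribution by the learning rate eta.
        preds = [p + eta * (wl if x <= t else wr) for x, p in zip(xs, preds)]
        history.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
    return trees, history

def predict(trees, x, eta=0.3):
    """Final prediction: the sum of all (shrunken) tree outputs."""
    return sum(eta * (wl if x <= t else wr) for t, wl, wr in trees)

# Hypothetical toy data: one input feature and a step-like target.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

trees, history = boost(xs, ys)
print(history[0], history[-1])  # training error after the first and last tree
```

The training error recorded in `history` never increases from one round to the next, which reflects the statement that each new tree corrects the errors left by the previous ones.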

2.4. SVMs and Corporate Financial Performance Prediction

SVMs are a supervised learning method used for tasks such as classification and regression. Their main goal is to find the ideal hyperplane separating two classes, which is achieved through a kernel formulation split into two parts: the optimization problem and the decision function (Hamdi et al., 2024). The SVM optimization model uses a mathematical function (the kernel) to find the maximum distance between the most comparable observations that are classified in opposite directions (Zhao et al., 2024). The SVM algorithm selects a set of characteristic subsets from the training samples, so the training dataset is what enables the algorithm to learn (Barboza et al., 2017; Chao et al., 2019). Model performance can then be assessed using the validation set, which is independent of the training set, by comparing the predictions of the algorithm against the actual results (Barboza et al., 2017). Additionally, SVMs can be 100% accurate if the groups are separable, but this is rare in finance because real-world financial data contains noise and bias. For classification problems involving partially separable groups, SVMs can use a “margin of error” to still work effectively (L. Zhou et al., 2014).
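For reference, the two parts mentioned above are commonly written as follows in the standard soft-margin formulation for two classes. The notation is an assumption introduced here rather than taken from the cited studies: $x_i$ are the training observations with labels $y_i \in \{-1, +1\}$, $\phi$ is the feature map induced by the kernel $K$, $\xi_i$ are slack variables representing the margin of error, and $C > 0$ controls how heavily margin violations are penalized.

```latex
% Optimization problem (soft margin)
\min_{w,\, b,\, \xi} \quad \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \qquad
y_i \left( w^{\top} \phi(x_i) + b \right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, n

% Decision function (kernel form, with dual coefficients \alpha_i)
f(x) = \operatorname{sign}\left( \sum_{i=1}^{n} \alpha_i \, y_i \, K(x_i, x) + b \right),
\qquad K(x_i, x) = \phi(x_i)^{\top} \phi(x)
```

When the groups are perfectly separable, all $\xi_i$ can be zero and the problem reduces to the hard-margin case. A three-class task such as the one in this study is typically handled by combining several two-class SVMs, for example in a one-versus-rest or one-versus-one scheme.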

2.5. DNNs and Corporate Financial Performance Prediction

DL is a subfield of ML that has been used to address classification tasks across different domains, and its most popular algorithms are ANNs and DNNs. DNNs have gained extensive interest due to their outstanding performance over other ML techniques in several significant applications, and they have also been widely applied within the general reinforcement learning (RL) domain, which is characterized by the absence of a supervising teacher (Schmidhuber, 2015). Moreover, according to Ben Jabeur et al. (2023) and Hamdi et al. (2024), the DNN is an improved iteration of the traditional ANN that is fundamentally differentiated by its depth, that is, by the inclusion of multiple (at least two) hidden layers between the input and output layers. It is noteworthy that a neural network is composed of several interconnected processors known as neurons, each of which generates a series of real-valued activations. The activation process begins when input neurons are activated by sensors, and these neurons then activate others through a network of weighted connections (Schmidhuber, 2015).
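The flow of activations through weighted connections can be sketched as a forward pass. This is a minimal illustration assuming two hidden layers with ReLU activations and a softmax output over three classes. The weights, layer sizes, and input values are invented for the example and are not the study's fitted model.

```python
from math import exp

def relu(values):
    """Pass positive activations through and set negative ones to zero."""
    return [max(0.0, v) for v in values]

def dense(inputs, weights, biases):
    """One fully connected layer: each neuron is a weighted sum of inputs plus a bias."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def softmax(values):
    """Turn output scores into class probabilities that sum to one."""
    m = max(values)  # subtract the maximum for numerical stability
    exps = [exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def forward(x, layers):
    """Propagate x through the hidden layers (ReLU) and the softmax output layer."""
    *hidden, output = layers
    for weights, biases in hidden:
        x = relu(dense(x, weights, biases))
    return softmax(dense(x, *output))

# Hypothetical weights: 2 inputs -> 3 neurons -> 2 neurons -> 3 classes.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.3, -0.6, 0.2], [0.7, 0.1, -0.4]], [0.05, 0.0]),
    ([[1.0, -1.0], [0.0, 0.5], [-0.5, 0.8]], [0.0, 0.0, 0.0]),
]
probs = forward([1.2, 0.7], layers)
print([round(p, 3) for p in probs])
```

Training would adjust the weights and biases to reduce classification error; the sketch only shows how a fixed network maps inputs to class probabilities.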

3. Materials and Methods

3.1. Sample Selection

This research applied three distinct ML techniques (RFs, SVMs, and XGBoost) and one DL technique (a DNN) for predicting and classifying corporate financial performance in MENA-region countries, using 7971 firm-year observations extracted from the Compustat database for 2013 to 2024 via Visual Studio software and excluding financial sector firms.

3.2. Methodology

This research employed the design science research (DSR) methodology, which has been widely adopted recently in the accounting and information systems areas (Demirdöğen et al., 2020; Mousa et al., 2022; Kenetey & Popesko, 2024; Helal et al., 2025). DSR is a methodology that focuses on identifying more efficient and effective ways to address new real-world challenges or improve existing ones (Antony et al., 2024). According to Helal et al. (2025), the DSR methodology makes a substantial contribution to the information systems field by offering insights into database design and implementation, model preprocessing, alignment of information systems with business strategies, and utilizing data analytics to make effective decisions. As first introduced by Hevner (2007), the DSR framework comprises a view of three interconnected cycles, including relevance, design, and rigor. First, the relevance cycle identifies and contextualizes the problem that the study aims to solve through the designed models. Second, the design cycle involves the construction, design, and evaluation of the ML and DL models. As illustrated in Figure 1, this involves the four-step process constructed by the seminal work of Mousa et al. (2022), Moubarak (2024), and Helal et al. (2025). Finally, the rigor cycle utilizes the existing knowledge and establishes the foundations for building the validity of the designed models.
The relevance cycle establishes the practical utility and significance of the research by connecting it to real-world business problems and stakeholder needs. It identifies and contextualizes the problem that the study aims to solve. The problem is the need for accurate financial performance prediction within the dynamic economic landscape of the MENA region, namely for low-, moderate-, and high-performing companies (Classes A, B, and C, respectively). This is crucial for various stakeholders, including investors, creditors, financial analysts, and regulators, who require reliable forecasts for informed decision making, risk assessment, and policy formulation.
The design cycle presents the main component of the DSR methodology, as it outlines the iterative process of developing and evaluating the study. This cycle is systematically executed through a four-step process.
The four-step approach of the central design cycle begins with the identification of problem and solution objectives, which were addressed in greater detail in the Literature Review section. Step two of the DSR approach involves designing the models, which is structured into three distinct phases. The first phase consists of selecting the research sample, identifying data collection procedures, and determining the input features that will be used in building the ML and DL models. The second phase involves identifying the output target variable, which represents the company’s EPS. Lastly, the final phase involves data preprocessing, which is necessary to prepare the dataset for model development, including cleaning and handling missing values as well as extreme values. Step three of the DSR approach involves developing the ML and DL models, including RFs, XGBoost, SVMs, and DNNs. Finally, the DSR central cycle ends by evaluating the performance of these models based on a set of diverse evaluation metrics.
Step one involves a thorough definition of the research problem and clearly articulating the objectives of the solutions which were addressed in detail in the Literature Review section.
Step two, designing the ML and DL models, involves three phases.
Phase One: Research Sample, Data Collection, and Input Features. The research sample encompassed companies in the MENA countries. Financial data was collected from the Compustat database, covering the period from 2013 to 2024. This yielded a comprehensive dataset of 7971 firm-year observations. The input features for the models consisted of 15 distinct financial indicators selected from prior studies in the same context.
Phase Two: Identify Target Variable. The target variable was the earnings per share (EPS), which was transformed into three categorical performance classes—A (0), B (1), and C (2), using pooled quartile thresholds computed across the full cross-sectional sample. This design deliberately captures the absolute firm financial performance levels rather than year-relative rankings. Accordingly, the resulting classes reflect stable cross-sectional performance categories, which enable consistent comparison of firms across all firm-year observations. While the annual EPS transformation guaranteed a within-year 33–33–34% class allocation, the aggregated class distribution in the training and testing subsets was not expected to preserve this exact proportionality due to the panel structure of the data, cross-year pooling, and stratified random partitioning.
Phase Three: Data Preprocessing. This constitutes a significant stage in the application of AI techniques (Cordón et al., 2019). Notably, preparing and cleaning the input data is the primary goal of data preprocessing so that the data is appropriately used in regression and classification (Alexandropoulos et al., 2019). Since the majority of databases provide missing data, researchers are motivated to exclude observations that have missing values (Ben Jabeur et al., 2023). All preprocessing operations were implemented using a strict training–testing isolation protocol to prevent data leakage. Specifically, missing value imputation parameters (median) were estimated exclusively from the training subset and subsequently applied to the testing subset. Moreover, the k-nearest neighbors (KNN) imputation method was used to test the robustness of the results. Feature normalization and scaling operations followed the same training-based fitting and testing-based transformation procedures. This pipeline ensured that the classification models were not exposed to any distributional information from the testing data during training and preserved the integrity of the performance evaluation.
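The training–testing isolation described above can be sketched as a fit-then-transform pair. This is a minimal illustration, not the study's pipeline: it assumes z-score standardization as the scaling step (the text does not name the scaler), uses `None` for missing values, and runs on made-up numbers. The function names `fit_preprocessor` and `transform` are hypothetical.

```python
from statistics import mean, median, pstdev

def fit_preprocessor(train_rows):
    """Estimate medians, means, and standard deviations from training rows only."""
    n_cols = len(train_rows[0])
    medians = [median(row[j] for row in train_rows if row[j] is not None)
               for j in range(n_cols)]
    filled = [[medians[j] if v is None else v for j, v in enumerate(row)]
              for row in train_rows]
    means = [mean(row[j] for row in filled) for j in range(n_cols)]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    stds = [pstdev([row[j] for row in filled]) or 1.0 for j in range(n_cols)]
    return medians, means, stds

def transform(rows, params):
    """Apply the training-derived parameters to any subset (training or testing)."""
    medians, means, stds = params
    out = []
    for row in rows:
        filled = [medians[j] if v is None else v for j, v in enumerate(row)]
        out.append([(v - means[j]) / stds[j] for j, v in enumerate(filled)])
    return out

# Toy data, invented for illustration only.
train = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [None, 20.0]]
test = [[None, 100.0], [4.0, None]]

params = fit_preprocessor(train)      # the testing rows are never seen here
train_scaled = transform(train, params)
test_scaled = transform(test, params)
print(params[0])                      # medians learned from the training subset
```

Because `fit_preprocessor` only receives the training rows, a missing value in the testing subset is filled with the training median, so no distributional information from the testing data enters the fitted parameters.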
Consistent with the contemporaneous classification objective of this study, the dataset was randomly split into training (90%) and testing (10%) subsets across all firm-year observations in order to guard against overfitting. Overfitting is a common problem that cannot be fully prevented, and it occurs either because of the limitations of algorithms that are too complex and need many parameters or because of the limitations of the training data, which could contain a large amount of noise or be limited in size (Ying, 2019). Since the classification target (EPS class) and the explanatory variables were measured within the same fiscal year, the random split did not introduce inter-temporal look-ahead bias, as no future information was used to explain past outcomes. Regarding missing values, following Bertsimas et al. (2018) and Lin and Tsai (2020), median imputation was applied in the training dataset of this research for variables with less than 30% missing values, while variables with more than 30% missing values were eliminated.
To address redundancy, multicollinearity, and potential feature leakage issues in the dataset, a feature selection pipeline was applied. First, pairwise correlation filtering was employed, excluding features with |ρ| ≥ 0.55 to eliminate highly overlapping variables while maintaining relevant information, a strategy supported by efficiency-driven feature selection frameworks based on correlation-based elimination (Rickert et al., 2023). Then, variance inflation factor (VIF) analysis was performed to assess multicollinearity in the multivariate context. Together, these steps eliminated redundancy and multicollinearity and led to a final reduced set of 15 features (see Table 1).
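A pairwise correlation filter with the |ρ| ≥ 0.55 threshold can be sketched as follows. This is one possible implementation under stated assumptions: the choice of which variable in a correlated pair to retain is encoded as a hypothetical `priority` list, and the ratio values are invented. The VIF step is not shown.

```python
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient between two equally long value lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def correlation_filter(columns, priority, threshold=0.55):
    """Walk features in priority order; keep one only if its absolute
    correlation with every feature kept so far is below the threshold."""
    kept = []
    for name in priority:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Toy ratios, invented for illustration only.
columns = {
    "quick_ratio":    [0.8, 1.1, 1.5, 2.0, 2.6, 3.1],
    "current_ratio":  [1.0, 1.4, 1.9, 2.5, 3.0, 3.8],
    "asset_turnover": [0.9, 0.4, 1.2, 0.5, 1.1, 0.6],
}
# Hypothetical preference order: earlier names win when a pair is too correlated.
priority = ["quick_ratio", "current_ratio", "asset_turnover"]
kept = correlation_filter(columns, priority)
print(kept)
```

In this toy example the current ratio is dropped because it moves almost in lockstep with the quick ratio, while asset turnover is retained because it is nearly uncorrelated with it.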
Step three is developing the ML and DL models. The predictive framework is conceptualized as a multi-model classification system designed to predict financial performance, and it integrates both ML models (RFs, SVMs, and XGBoost) and a DL model (a DNN implemented as an MLP classifier). These models collectively form the predictive framework, which addresses the problem identified in the relevance cycle (the first step in the DSR).
Step four is the performance evaluation of the ML and DL models, which involves testing and validating the developed artifact. The predictive framework was evaluated using performance evaluation metrics such as model accuracy, Cohen’s kappa, recall, precision, specificity, and F1 score. Furthermore, a confusion matrix analysis was performed, which provided a detailed breakdown of correct and incorrect classifications for each class and offered insights into specific model strengths and weaknesses.
The Rigor Cycle. This utilizes the existing knowledge and establishes the foundations for building the research from the prior literature of corporate financial performance and the application of ML and DL models. Furthermore, the methodology employs a quantitative analysis approach to analyze the obtained financial data, in addition to the contributions of the study for generating new insights on the performance of the models and developing a generalized framework for prediction and classification of corporate financial performance for the MENA region market.

4. Results

4.1. Preliminary Analysis

The descriptive statistics provide an overview of the financial ratios, including liquidity, solvency, efficiency, profitability, and cash flow ratios, for non-financial firms in MENA-region countries for the period from 2013 to 2024 using data from Compustat database (see Table 2). Overall, the indicators exhibited considerable variability across firms, reflecting diverse financial structures and performance levels within the region. These statistics provide a foundational understanding of the dataset and justify the application of predictive models capable of capturing nonlinear relationships among financial ratios (see Table 3).
Pairwise correlation filtering (|ρ| ≥ 0.55) was applied in this study to reduce predictor redundancy and avoid multicollinearity problems in the dataset. For the highly correlated pairs, the variable exhibiting greater theoretical interpretability and reduced accounting noise was retained. This resulted in the exclusion of redundant indicators, namely the current ratio, cash ratio, net profit margin, and interest coverage, leaving 15 final indicators. Following the pairwise correlation filtering procedure, variance inflation factors (VIFs) were calculated as a final diagnostic to confirm the absence of multicollinearity among the retained predictors. All the values were below the conservative threshold of five, a value widely considered acceptable (Alauddin & Nghiemb, 2010) (see Table 4). Additionally, the correlation matrix heatmap showed low inter-feature correlations, which validates the combined filtering and VIF-based strategy as rigorous and practical for preparing reliable model inputs (see Figure 2).

4.2. Model Performance Evaluation

This research applied three distinct ML techniques—RFs, SVMs, and XGBoost—and a DL technique, which was a DNN, for predicting and classifying corporate financial performance in MENA-region countries. Each of these models was evaluated using their overall metrics (overall accuracy, overall error, Cohen’s Kappa measures, and correctly and incorrectly classified instances) and per-class performance indicators (recall or sensitivity, precision, specificity, and F measure), which provide a comprehensive interpretation of their predictive capabilities.
The performance evaluation of the RF, XGBoost, SVM, and DNN models employed the metrics outlined in Table 5, which reports the overall evaluation results, and Table 6, which shows the class-specific results. The evaluation was conducted using the following metrics.
Cohen’s Kappa is a statistical measure that considers the possibility of agreement by chance, and it is used as a measure of accuracy for prediction and classification, where higher values indicate that the model’s prediction closely aligned with the actual outcomes (Vieira et al., 2010; Bouke & Abdullah, 2023).
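For completeness, the standard definition of Cohen’s Kappa can be stated as follows. The notation is assumed here rather than taken from the cited sources: $n_{kk}$ is the number of instances of class $k$ predicted as class $k$, $n_{k+}$ and $n_{+k}$ are the actual and predicted totals for class $k$, and $N$ is the number of instances.

$$\kappa = \frac{p_o - p_e}{1 - p_e}, \qquad p_o = \frac{1}{N}\sum_{k} n_{kk}, \qquad p_e = \frac{1}{N^{2}}\sum_{k} n_{k+}\, n_{+k}$$

Here $p_o$ is the observed agreement (the overall accuracy) and $p_e$ is the agreement expected by chance given the class totals, so $\kappa = 0$ corresponds to chance-level agreement and $\kappa = 1$ to perfect agreement.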
Accuracy refers to the proportion of companies correctly classified by the developed ML and DL models and reflects the overall accuracy and effectiveness of the models in distinguishing between the three classes: A (0), B (1), and C (2). It was calculated as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FN + TN + FP}$$
Sensitivity, commonly known as recall, is a measure of the model’s ability to truly identify companies belonging to a certain class, which could be calculated as follows:
$$\mathrm{Sensitivity\ or\ Recall} = \frac{TP}{TP + FN}$$
Additionally, precision reveals the percentage of which companies were correctly classified within a specific class, which could be calculated as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
Moreover, specificity reflects the ability of the model to correctly identify companies that did not belong to a certain class, which was calculated as follows:
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
Type I error, also known as the false positive rate, measures the percentage of companies incorrectly classified as belonging to a class, which happened when a poorly performing company was classified as a higher-performing one. It was calculated as follows:
$$\mathrm{Type\ I\ Error} = \frac{FP}{TN + FP}$$
Type II error, or the false negative rate, shows the percentage of companies that truly belonged to a class but were not identified as such, which happened when a high-performing company was classified as a lower-performing one. It can be calculated as follows:
$$\mathrm{Type\ II\ Error} = \frac{FN}{TP + FN}$$
Lastly, the F measure is a metric that balances precision and sensitivity to provide a single score for evaluating the model performance, which was calculated as follows:
$$F\ \mathrm{measure} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

4.2.1. Random Forests

The RF model achieved an accuracy of approximately 73.6%, striking a strong balance between predictive power and generalizability, with an overall error rate of approximately 26.4%. Cohen’s Kappa was calculated to be 0.603389, showing a good level of agreement between the predicted and actual outcomes and thereby supporting the model’s reliability. Moreover, out of the overall results, 587 instances were correctly classified, while 211 were incorrectly classified.
For Class A (0), the result for the sensitivity, or commonly known as recall, was approximately 0.744, and this shows that the model can truly identify 74.4% of the Class A companies. Additionally, the precision result was about 0.896. This indicates that the model predicted a company belonging to Class A correctly 89.6% of the time, which makes the prediction and classification of this class by the model highly reliable. The specificity equaled 0.957, indicating that the model could correctly identify 95.7% of the companies that did not belong to Class A. The type I error, also known as the false positive rate, for Class A was approximately 0.043. In contrast, the type II error, also known as the false negative rate, for Class A nearly equaled 0.256. The F measure result was 0.813, showing a strong balance between precision and sensitivity in Class A.
The sensitivity or recall for Class B (1) was nearly 0.672. This indicates that the model could identify 67.2% of the Class B firms correctly. In addition, the precision for Class B was approximately 0.602, signifying that the model accurately predicted that a company belonged to Class B 60.2% of the time. Moreover, the specificity result for Class B was 0.786, suggesting that the model could correctly identify 78.6% of the companies that did not belong to Class B. The type I error for Class B almost equaled 0.213, and the type II error was about 0.328. Finally, the F measure for Class B was 0.635, demonstrating that there was a moderate balance between sensitivity and precision.
For Class C (2), the sensitivity was almost 0.787, which reveals that the model could truly identify 78.7% of the Class C companies. Additionally, the precision was about 0.746, denoting that the model predicted a company belonging to Class C accurately 74.6% of the time. Also, the specificity for Class C was 0.861, meaning that the model could correctly identify 86.1% of the companies that did not belong to Class C. Moreover, the type I error for Class C was about 0.14, while the type II error was nearly 0.212. As for the F measure for Class C, it was calculated to be approximately 0.766, and this explains the good balance between sensitivity and precision. However, these results show that Class B was more challenging for the model to predict compared with the results for Classes A and C.
The RF model correctly identified 198 Class A instances, while misclassifying 57 as Class B and 11 as Class C. For Class B, there were 174 correct classifications, with 23 instances misclassified as Class A and 62 as Class C. Class C had 215 correct classifications, with 0 misclassified as Class A and 58 misclassified as Class B. The high true positive rates and relatively low false positive and false negative rates across all classes highlight the effectiveness of the model. Generally, the RF model performed well at identifying both Classes A and C. However, it confused Class B with Class C, resulting in a significant number of false negatives for Class B and of false positives for Class B when the actual class was C (see Figure 3).
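The per-class figures reported for the RF model can be reproduced from these confusion matrix counts by treating each class one-vs-rest. The sketch below is an illustration rather than the study's code; the function names are hypothetical, and rows are taken to be actual classes and columns predicted classes.

```python
def per_class_metrics(cm, k):
    """One-vs-rest metrics for class k (rows = actual, columns = predicted)."""
    n = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                  # actual k, predicted as another class
    fp = sum(row[k] for row in cm) - tp   # predicted k, actually another class
    tn = n - tp - fn - fp
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    return {
        "recall": recall,
        "precision": precision,
        "specificity": tn / (tn + fp),
        "type_I": fp / (tn + fp),
        "type_II": fn / (tp + fn),
        "f_measure": 2 * precision * recall / (precision + recall),
    }

def accuracy(cm):
    """Share of instances on the diagonal of the confusion matrix."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)

def cohen_kappa(cm):
    """Agreement between predicted and actual classes, corrected for chance."""
    n = sum(sum(row) for row in cm)
    p_o = accuracy(cm)
    p_e = sum(sum(cm[i]) * sum(row[i] for row in cm) for i in range(len(cm))) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# RF counts as reported in the text, in the order A, B, C.
rf_cm = [
    [198, 57, 11],
    [23, 174, 62],
    [0, 58, 215],
]
class_a = per_class_metrics(rf_cm, 0)
print(round(accuracy(rf_cm), 3), round(cohen_kappa(rf_cm), 4))
print({name: round(value, 3) for name, value in class_a.items()})
```

The same functions can be applied to the counts reported for the other models to check their tabulated values.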

4.2.2. XGBoost

The XGBoost model achieved an accuracy of about 75.5%, demonstrating a strong predictive ability with only a 24.5% error rate. The Cohen’s Kappa for the XGBoost model was 0.631639, indicating a higher level of agreement between the predictions of the model and the actual outcomes compared with the RF model, which confirms its high reliability. In addition, the model accurately classified 602 instances, and only 196 were incorrectly classified.
For Class A, the sensitivity (recall) was nearly 0.752, which demonstrates that the model could accurately identify 75.2% of the Class A companies. Likewise, the result for the precision was approximately 0.901, which shows that the model predicted correctly companies that belonged to Class A 90.1% of the time, making the prediction and classification of this class highly reliable in the model. Moreover, the specificity result for Class A was nearly 0.96, meaning that the model could correctly identify 96% of the companies that were not in Class A. In addition, the type I error for Class A was only 0.041, with the type II error being almost 0.248. Furthermore, the result of the F measure for Class A was about 0.82, demonstrating that there was a robust balance between precision and sensitivity (recall).
For Class B, the sensitivity (recall) was 0.714, and this indicates that the model could truly identify 71.4% of the Class B companies. Similarly, the precision result was 0.631, which demonstrates that the model correctly predicted a company that belonged to Class B 63.1% of the time. Moreover, the specificity result for Class B was nearly 0.8 which means that the model could correctly identify 80% of the companies that were not in Class B. Additionally, the type I error for Class B was 0.2, while the type II error was 0.286. Finally, the F measure result for Class B was 0.67, showing a moderate balance between precision and sensitivity.
For Class C, the recall result was 0.795, which demonstrates that the model could truly identify 79.5% of the Class C companies. Additionally, the precision result was about 0.767. This indicates the model predicted a company belonging to Class C correctly 76.7% of the time. As for the specificity for Class C, it was 0.874, which means that the model could correctly identify 87.4% of the companies that did not belong to Class C. Furthermore, the type I error for this class was roughly 0.126, while the type II error was roughly 0.205. Lastly, the F measure result for Class C was 0.781, showing a strong balance between precision and sensitivity and an improvement over the RF model for this class.
The XGBoost model identified 200 instances correctly as Class A and misclassified 55 instances as Class B and 11 as Class C. Moreover, Class B had 185 correct classifications and misclassified 19 instances as Class A and 55 as Class C. However, Class C exhibited greater performance, with 217 instances correctly classified, 3 instances misclassified as Class A, and 53 instances misclassified as Class B. The results show that this model outperformed the other ML models, and the great true positive rates with low false positive and false negative rates across all classes underscore the model’s effectiveness. Overall, XGBoost demonstrated more balanced and accurate classification across all three classes, with fewer misclassifications between classes, particularly reducing the confusion between Class B and Class C that was more prominent in the RF model (see Figure 4).

4.2.3. SVMs

The SVM model achieved an overall accuracy of approximately 68%, which was considerably lower than the other ML models, with an overall error rate of approximately 32%. In addition, Cohen’s Kappa for the SVM model was calculated to be 0.521528, indicating only a relatively low level of agreement between the model’s predictions and the actual outcomes compared with the results of the other ML models, suggesting limited reliability. Also, out of the overall results, 543 instances were correctly classified, while 255 were incorrectly classified.
For Class A, the sensitivity or recall result was approximately 0.687, indicating that the model could truly identify 68.7% of the Class A companies. Also, the precision result was nearly 0.88, which indicates that the SVM model predicted a company belonging to Class A correctly 88% of the time. Adding to this, the specificity result for Class A was approximately 0.953, which depicts that the model could correctly identify 95.3% of the companies that were not in Class A. Moreover, the type I error for this class was about 0.047, while the type II error was approximately 0.312. Furthermore, the F measure result for Class A was 0.772, showing a moderate balance between precision and sensitivity.
As for Class B, its sensitivity or recall result was about 0.706, indicating that the SVM model could correctly identify 70.6% of the Class B companies. However, the precision result was notably lower at about 0.524, meaning that the SVM model was correct only 52.4% of the time when it predicted that a company belonged to Class B, which highlights the significant number of false positives. Additionally, the result for the specificity for Class B was about 0.692, showing that the model could accurately identify only 69.2% of the companies that did not belong to Class B, and this result was considerably lower compared with the results of the other ML models. Furthermore, the type I error for Class B was 0.308. This result is consistent with the low Cohen’s Kappa result for the SVM model. On the other hand, the type II error was approximately 0.293. It is noteworthy that a low Cohen’s Kappa for a predictive model implies that its classifications are not dependable, making it unsuitable for critical decision making, where the cost of misclassification (especially type II errors) is high (Ben-David, 2008). The result for the F measure for this class was approximately 0.602, which shows a relatively weak balance between precision and sensitivity.
For Class C, the sensitivity or recall was approximately 0.648, showing that the model could correctly identify 64.8% of the companies in Class C, considerably lower compared with the previous two ML models. Despite this, the precision result was about 0.734, showing that when the model predicted a company belonging to Class C, it was correct 73.4% of the time. Moreover, the specificity for this class was 0.878, which means the model could accurately identify 87.8% of the companies that did not belong to Class C. In addition, the type I error for this class was 0.122. In contrast, the type II error for this class was 0.352, which is consistent with the low Cohen’s Kappa result for the SVM model. Furthermore, the F measure result was about 0.689, which indicates a low balance between precision and sensitivity that occurred due to the low sensitivity result for this class.
The confusion matrix results for the SVM model show that it correctly identified 183 Class A instances and misclassified 72 as Class B and 11 as Class C. Class B had 183 correct classifications, with 23 instances misclassified into Class A and 53 misclassified into Class C. For Class C, 177 instances were classified correctly, while 2 instances were misclassified as Class A and 94 were misclassified as Class B. The results of this model show weaker performance than the RF and XGBoost models.
In brief, the SVM model encountered considerable difficulties with this multi-class problem. It misclassified a significant share of Class A and, in particular, Class C companies as Class B. This, in turn, resulted in inadequate recall for Class C and insufficient precision for Class B, making it unsuitable for this particular classification task in comparison with the other ML models (see Figure 5).

4.2.4. Deep Learning

The neural network architecture was selected using grid search with cross-validation, after which the final model was re-estimated on the training data using the optimal hyperparameters. Specifically, the DNN model corresponded to a multilayer perceptron architecture employing three hidden layers with 16 neurons, which was trained for a maximum of 400 iterations, and used ReLU activation functions with the Adam optimization algorithm.
The DNN model achieved an accuracy rate of approximately 72.1%, with an overall error rate of 27.9%. Additionally, Cohen’s Kappa for the DNN model was calculated to be 0.5806, indicating a moderate level of agreement between the model’s predictions and the actual outcomes, below that of the RF and XGBoost models but above that of the SVM model. It is worth noting that out of the overall results, 575 instances were correctly classified, while 223 were incorrectly classified.
For Class A, the result for the sensitivity or recall was approximately 0.737, indicating that the model could identify 73.7% of the Class A companies correctly. Moreover, the precision result was about 0.875, and this indicates the model predicted a company belonging to Class A correctly 87.5% of the time, making the prediction and classification of this class in the model highly reliable. Likewise, the specificity result for Class A was 0.947, meaning the model could correctly identify 94.7% of the companies that did not belong to Class A. Similarly, the result for the type I error for this class was nearly 0.053, and the type II error result was about 0.263. Finally, the F measure result for Class A was 0.8, showing a strong balance between precision and sensitivity.
For Class B, the sensitivity or recall result was nearly 0.622, demonstrating that the model could truly identify 62.2% of the Class B companies. Furthermore, the precision result was approximately 0.59, indicating that the model could predict a company belonging to Class B correctly 59% of the time. In addition, the specificity result for this class was 0.792, meaning that the model could correctly identify 79.2% of the companies that did not belong to Class B. Correspondingly, the type I error result for Class B was almost 0.208, and the type II error result was approximately 0.378. Finally, the F measure for Class B was 0.605, showing a moderate balance between precision and recall.
For Class C, the model achieved its highest recall. The sensitivity or recall was approximately 0.8, indicating that about 80% of the actual Class C companies were correctly identified. The precision was 0.724, meaning that 72.4% of the predictions for Class C were accurate. With a specificity of 0.842, the model also correctly identified 84.2% of the companies not belonging to Class C. Also, the type I error for this class was about 0.158, while the type II error was roughly 0.201. Furthermore, the F measure of 0.759 indicates a good balance between precision and sensitivity for this class.
The deep learning (DNN) model correctly identified 196 instances as Class A and misclassified only 58 instances as Class B, while only 12 instances were misclassified as Class C. For Class B, this model classified 161 instances correctly and misclassified 27 instances into Class A and 71 into Class C. Furthermore, Class C had 218 instances classified correctly, with only 1 misclassified instance in Class A and 54 in Class B (see Figure 6).

4.3. Performance Evaluation Using ROC-AUC, Macro- and Micro-Averaged ROC-AUC, and Average Precision

4.3.1. ROC Curve Results

The ROC curve results for the RF model show excellent classification capability among classes, with a high true positive rate indicating strong predictive power (see Figure 7). This confirms the strength of RFs in handling the financial data used and demonstrates stable performance across classes.
Furthermore, the ROC curve results for XGBoost show excellent classification capability among classes due to the high true positive rate, indicating strong predictive power. This suggests that this model maintained stable performance across all classes, which indicates its robustness (see Figure 8).
In addition, the results for the ROC curve for the SVM model demonstrate moderate classification capability among classes. The noticeably lower AUC results confirm that the SVM approach struggled to capture complex relationships among financial indicators compared with the other models (see Figure 9).
Finally, the ROC curve for the DNN model shows strong classification capability across all classes. Additionally, these results suggest that the model preserved stable performance across all classes, which indicates its reliability and robustness (see Figure 10).

4.3.2. Macro- and Micro-Averaged ROC-AUC and Average Precision Results

To provide a comprehensive evaluation of the classifier performance across multiclass data, both macro- and micro-averaged metrics for the ROC-AUC and average precision (AP) were calculated (see Table 7). The micro-averaged metrics aggregated the contributions of all classes and emphasized overall performance, particularly favoring majority classes. In contrast, the macro-averaged metrics treated all classes equally, offering insights into how the model performed across less frequent categories.
In this study, XGBoost and RFs significantly outperformed the other models, while DNNs and SVMs underperformed, particularly in terms of the macro-averaged AP, highlighting their difficulty in handling minority classes. These findings are consistent with the recommendations of Saito and Rehmsmeier (2015), who emphasized the usefulness of precision–recall curves over ROC analysis on imbalanced datasets, and Sokolova and Lapalme (2009), who demonstrated the importance of using both macro- and micro-averaging when evaluating multiclass classifiers.
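The difference between the two averaging schemes can be made concrete with a small one-vs-rest example. The sketch below is illustrative only: the six observations and their class probabilities are invented, the helper names are hypothetical, and the metrics are written out in plain Python rather than taken from the study's pipeline.

```python
def binary_auc(labels, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean precision at each positive, ranked by descending score (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank
    return total / hits

def macro_micro(metric, y_true, proba, n_classes):
    # One-vs-rest indicator labels: 1 where the sample belongs to class k.
    onehot = [[int(y == k) for k in range(n_classes)] for y in y_true]
    per_class = [
        metric([row[k] for row in onehot], [p[k] for p in proba])
        for k in range(n_classes)
    ]
    macro = sum(per_class) / n_classes        # every class weighs the same
    flat_labels = [v for row in onehot for v in row]
    flat_scores = [v for p in proba for v in p]
    micro = metric(flat_labels, flat_scores)  # every (sample, class) decision weighs the same
    return per_class, macro, micro

# Invented data: classes 0, 1, 2 stand for A, B, C; class 2 is the minority class.
y_true = [0, 0, 0, 1, 1, 2]
proba = [
    [0.70, 0.19, 0.11],
    [0.62, 0.26, 0.12],
    [0.41, 0.51, 0.08],
    [0.23, 0.60, 0.17],
    [0.52, 0.33, 0.15],
    [0.09, 0.30, 0.61],
]

auc_per_class, auc_macro, auc_micro = macro_micro(binary_auc, y_true, proba, 3)
ap_per_class, ap_macro, ap_micro = macro_micro(average_precision, y_true, proba, 3)
print("ROC-AUC per class:", auc_per_class, "macro:", auc_macro, "micro:", auc_micro)
print("AP per class:", ap_per_class, "macro:", ap_macro, "micro:", ap_micro)
```

The macro average is the plain mean of the per-class scores, so a weak minority class lowers it as much as a weak majority class would, whereas the micro average pools all decisions and is dominated by the larger classes. In practice, scikit-learn's `roc_auc_score` and `average_precision_score` expose the same choice through their `average` argument.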

4.3.3. Feature Importance

The feature importance comparison across the four models reveals consistent patterns in how financial ratios contribute to the classification and prediction of corporate financial performance within the MENA region dataset (see Figure 11). As the figure shows, ROA emerged as the most influential feature across all algorithms, especially in the DNN model, indicating that it is a strong determinant of performance classification. Moreover, ROE showed high importance across the models, reinforcing the significance of returns on shareholders’ equity.
Meanwhile, efficiency and liquidity indicators, such as the quick ratio, current ratio, and inventory turnover, as well as interest, showed moderate significance, suggesting that the short-term financial health and operational efficiency of a company play a meaningful but smaller role. It is noteworthy that the SVM model showed negative importance values for some features, which implies that those variables may introduce noise or inverse effects within a nonlinear decision boundary.
Overall, the results suggest that profitability ratios were consistently the strongest predictors across the models, while the liquidity, leverage, and efficiency measures offered additional but smaller predictive value. Furthermore, the consistency of top-ranking features across multiple algorithms strengthens the reliability of these findings.
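This section does not state how the importance scores were computed. One model-agnostic option that works for all four model types, and that can produce negative values, is permutation importance: the drop in a score after one feature column is shuffled. The sketch below is an assumption-laden illustration of that idea only; the toy data, the feature names, and the rule-based `toy_model` are all hypothetical stand-ins.

```python
import random

def accuracy_of(predict, X, y):
    """Share of rows for which the model's prediction equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(predict, X, y, n_repeats=20, seed=0):
    """Mean drop in accuracy after shuffling one feature column at a time."""
    rng = random.Random(seed)
    baseline = accuracy_of(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        scores = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the label
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            scores.append(accuracy_of(predict, X_perm, y))
        importances.append(baseline - sum(scores) / n_repeats)
    return importances

# Hypothetical toy data: columns are [ROA, quick ratio]; label 1 = higher performer.
X = [[0.12, 1.1], [0.09, 0.7], [0.08, 1.9], [0.07, 0.8],
     [0.03, 1.5], [0.02, 0.9], [0.01, 1.2], [-0.04, 1.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0]

# Hypothetical stand-in for a fitted classifier: it only looks at ROA.
def toy_model(row):
    return int(row[0] > 0.05)

roa_importance, quick_ratio_importance = permutation_importance(toy_model, X, y)
print("ROA:", roa_importance, "quick ratio:", quick_ratio_importance)
```

Here the feature the model relies on receives a positive score and the ignored feature scores zero. With a real fitted model, shuffling an uninformative feature can occasionally raise the score by chance, which yields a negative importance. scikit-learn provides a comparable routine as `sklearn.inspection.permutation_importance`.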

4.4. Robustness Check

Table A1 compares model performance under median and k-nearest neighbors (KNN) imputation. Overall, the results were stable across the two imputation strategies, with only mild changes in predictive performance. Accuracy differences between median and KNN imputation were small for all models, typically being within one or two percentage points, and did not alter the relative ranking of the algorithms. In general, median imputation yielded slightly higher accuracy, Cohen’s Kappa, macro-averaged ROC-AUC, and numbers of correctly classified firms. For instance, XGBoost achieved the highest accuracy under both imputation methods, with accuracy decreasing slightly from 0.754 under median imputation to 0.744 under KNN imputation and Cohen’s Kappa decreasing from 0.632 to 0.617. RFs, SVMs, and DNNs exhibited the same pattern, with slightly higher accuracy and Cohen’s Kappa under median imputation. Among the four models, RFs showed the smallest variation in accuracy and Cohen’s Kappa between the two imputation techniques (accuracy of 0.736 versus 0.733), demonstrating lower sensitivity to the handling of missing values. Importantly, the ROC-AUC values remained high and stable across all four models, which reinforces the robustness of the results. In summary, the findings confirm that the main results were not driven by the choice of imputation method.
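The size of the imputation effect can be checked directly from the accuracy and Cohen’s Kappa columns of Table A1. The short sketch below simply subtracts the KNN value from the median value for each model and compares the two accuracy rankings; the dictionary layout and function names are hypothetical.

```python
# Accuracy and Cohen's Kappa per model, copied from Table A1 as (median, KNN) pairs.
table_a1 = {
    "SVMs":    {"accuracy": (0.680451, 0.671679), "kappa": (0.521528, 0.508596)},
    "RFs":     {"accuracy": (0.735589, 0.733083), "kappa": (0.603389, 0.599650)},
    "XGBoost": {"accuracy": (0.754386, 0.744361), "kappa": (0.631639, 0.616617)},
    "DNNs":    {"accuracy": (0.720551, 0.709273), "kappa": (0.580633, 0.564025)},
}

def gaps(metric):
    """Median-imputation value minus KNN-imputation value, per model."""
    return {m: round(v[metric][0] - v[metric][1], 6) for m, v in table_a1.items()}

def ranking(metric, column):
    """Models ordered from best to worst on one metric; column 0 = median, 1 = KNN."""
    return sorted(table_a1, key=lambda m: table_a1[m][metric][column], reverse=True)

accuracy_gaps = gaps("accuracy")
kappa_gaps = gaps("kappa")
same_ranking = ranking("accuracy", 0) == ranking("accuracy", 1)

print("accuracy gaps:", accuracy_gaps)
print("kappa gaps:", kappa_gaps)
print("accuracy ranking unchanged:", same_ranking)
```

With the Table A1 values, every accuracy gap is positive and smaller than 0.012, and the accuracy ranking (XGBoost, RFs, DNNs, SVMs) is identical under both imputation methods.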

5. Discussion

This study provides important insights into the classification and evaluation of corporate financial performance in the MENA region using AI techniques. The results, particularly those of the XGBoost model, indicate that these models can support decision making in analyzing the financial performance of companies. Additionally, the findings suggest that the AI techniques used offer a reliable assessment of financial performance, which in turn enables managers, investors, financial analysts, and other stakeholders to make proactive decisions.
Although the pooled EPS-based classification ensures consistency of absolute performance categories across time, it does not control for industry-specific EPS scaling effects or annual distributional shifts. Future studies may extend this framework by employing industry-adjusted or year-normalized performance thresholds to capture purely relative intra-period rankings.
Moreover, the XGBoost model outperformed the other ML models, highlighting its suitability for classifying corporate financial performance. This result is similar to those of Carmona et al. (2019) and Son et al. (2019), where the XGBoost model achieved higher accuracy than the other ML models.
Furthermore, our results are broadly consistent with those of Hu et al. (2025), where the RF model achieved higher accuracy than the other ML models, including XGBoost and SVMs, as well as the neural network DL model. In addition, Chinonyerem et al. (2025) found that the predictive power of XGBoost was the highest, followed by the neural network DL model, RFs, and SVMs. Similarly, Y. P. Huang and Yen (2019) reported that the XGBoost prediction model outperformed all the other ML techniques used as well as deep belief networks.
On the other hand, the results of this study contradict those of Hamdi et al. (2024), who showed that DNNs outperformed all the other AI techniques used, including RFs and SVMs. In that study, despite the strong performance of the ML techniques, the DL techniques showed more promising results, which supports their wider adoption in future research and by stakeholders. Similar results were reported by Hosaka (2019), where the DL model (convolutional neural networks) outperformed traditional ML techniques.
Overall, the consistency of the ML and DNN model results with the current literature reveals that the models used serve as powerful tools for classifying complex and dispersed financial data. Conversely, the differences between the results of this study and those of the existing literature underscore that no single AI model dominates across all industries, environments, and data types. Instead, each model’s effectiveness depends on the structure and variability of the data used, which in turn emphasizes that careful model selection and regional customization are critical to achieving optimal predictive performance for corporate financial performance.

6. Conclusions

This study makes several contributions to the field of corporate financial performance prediction. First and foremost, it applied the DSR methodology, demonstrating how this structured methodology can effectively support financial data analysis in real-world business contexts by addressing problem relevance, leveraging existing knowledge carefully, designing innovative models, and evaluating the models comprehensively.
The classification of companies into low, moderate, and high performers (Classes A, B, and C) offers valuable insights for managers, investors, and analysts in the MENA region to support risk management, investment strategies, and credit decisions. Moreover, utilizing the popular Compustat database highlights the compatibility of the model with prevalent financial data systems, hence making it easier for financial institutions and scholars to adopt and replicate the model in their work.
The classification model developed in this study serves as a decision-support instrument for credit risk analysts, financial analysts, investors, and corporate managers by providing probabilistic evaluations of firms’ financial health. In doing so, it contributes to more informed financial planning and resource allocation. Additionally, this research enriches the growing literature on the MENA region, offering context-specific insights for policymakers and market participants operating in economies characterized by both opportunities and challenges.
The findings of this study align with the international literature highlighting the importance of AI in financial performance prediction while also extending the current knowledge by demonstrating that these techniques are effective within the distinct structures and regulations of the MENA region. While prior studies have primarily focused on Western or East Asian contexts, the present research provides evidence that advanced AI models can generalize effectively to markets characterized by emerging financial systems, varying disclosure quality, and evolving governance frameworks.
A few points warrant further investigation. The regional focus on MENA-listed firms may restrict the generalizability of the findings to other global markets, given differences in accounting practices, regulations, and market dynamics. Similarly, although the dataset spans 2013–2024 and is extensive, it may not fully capture recent or future structural changes in the economic environment. The performance of ML and DL models is also closely tied to data quality, and financial databases can contain missing or inconsistent information even after preprocessing. In addition, financial markets are affected by unexpected geopolitical events, changes in regulations, and economic shocks, which may not always be reflected in historical data-driven models, leading to lower predictive accuracy during times of instability and volatility.
Future research could extend this work by incorporating additional forms of data to enhance the model’s comprehensiveness. Examples include integrating macroeconomic indicators (e.g., GDP, inflation, and unemployment rates), industry-specific measures, and environmental, social, and governance (ESG) indicators to provide a broader understanding of the factors influencing corporate performance. Expanding the predictive framework to include emerging and developed markets would also allow for cross-regional validation and robustness testing. In addition, future research could incorporate firm-level industry classifications to examine whether predictive performance varies across sectors. Furthermore, incorporating alternative targets such as credit defaults and experimenting with underutilized algorithms such as probabilistic neural networks (PNNs) could enhance both the accuracy and interpretability of future predictive systems. Finally, combining structured financial data with textual information, such as narrative disclosures and sentiment analysis, may further improve the predictive power of future models (Omar et al., 2025).

7. Research Implications

This study has both practical and theoretical implications for academics and practitioners. On the theoretical side, it provides updated evidence that helps fill the gap in the existing literature on this research area. On the practical side, this research highlights the value of AI techniques in corporate financial performance assessment, since they provide insightful evaluations of financial performance and serve as a tool for risk assessment, helping decision makers and managers make timely corrective decisions to minimize the risks they face.
Similarly, this study sheds light on the role and importance of AI in finance, which could motivate policy makers to develop governance and regulatory frameworks that support the responsible use of AI in this area. In addition, creditors could reduce default risk by assessing the financial situation of a company. Moreover, financial analysts as well as consultants could employ AI to evaluate corporate financial performance, enabling them to make more informed trading and advisory decisions and identify good investment opportunities. Investors could also use AI techniques to better understand the companies they are investing in and to avoid investing in high-risk companies. Finally, governments could utilize AI techniques to analyze financial and economic indicators, which would allow them to evaluate and predict their institutions’ performance at both the macroeconomic and microeconomic scales, thereby fostering growth and economic stability.

Author Contributions

The authors contributed equally to this paper. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used for this study’s findings is not publicly available. Further inquiries can be directed toward the corresponding author.

Acknowledgments

The authors would like to thank Mohamed Gomaa, PhD (California State Polytechnic University) for his assistance and his insightful comments, which greatly enhanced the quality of this paper.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial intelligence
ML: Machine learning
DL: Deep learning
MENA: Middle East and North Africa
UAE: United Arab Emirates
DSR: Design science research
PCA: Principal component analysis
RF: Random forest
SVM: Support vector machine
XGBoost: eXtreme Gradient Boosting
DNNs: Deep neural networks
EPS: Earnings per share
IoT: Internet of Things
MDA: Multivariate discriminant analysis
ANN: Artificial neural networks
Logit: Logistic regression
SVM-Lin: Linear SVM
SVM-RBF: Radial basis function SVM
GBM: Gradient boosting machine
CatBoost: Categorical boosting
HACT: Hybrid associative memory with translation
DBN: Deep belief network
LDA: Linear discriminant analysis
RL: Reinforcement learning
EBIT: Earnings before interest and taxes
TP: True positive
TN: True negative
FP: False positive
FN: False negative
PNNs: Probabilistic neural networks

Appendix A

Table A1. Robustness results.
| Model | Accuracy (Median) | Accuracy (KNN) | Cohen’s Kappa (Median) | Cohen’s Kappa (KNN) | ROC AUC (Macro) (Median) | ROC AUC (Macro) (KNN) | Correctly Classified (Median) | Correctly Classified (KNN) |
|---|---|---|---|---|---|---|---|---|
| SVMs | 0.680451 | 0.671679 | 0.521528 | 0.508596 | 0.858428 | 0.852 | 543 | 536 |
| RFs | 0.735589 | 0.733083 | 0.603389 | 0.59965 | 0.906639 | 0.898 | 587 | 585 |
| XGBoost | 0.754386 | 0.744361 | 0.631639 | 0.616617 | 0.9056 | 0.9 | 602 | 594 |
| DNNs | 0.720551 | 0.709273 | 0.580633 | 0.564025 | 0.865629 | 0.86 | 575 | 566 |

References

  1. Abdellatif, E. M., Saleh, S. A. F., & Hamed, H. N. (2023). Corporate financial performance prediction using artificial intelligence techniques. In World conference on internet of things: Applications & future (pp. 25–32). Springer.
  2. Ahmed, S. F., Alam, M. S. B., Hassan, M., Rozbu, M. R., Ishtiak, T., Rafa, N., Mofijur, M., Shawkat Ali, A. B. M., & Gandomi, A. H. (2023). Deep learning modelling techniques: Current progress, applications, advantages, and challenges. In Artificial intelligence review (Vol. 56, Issue 11). Springer.
  3. Ahmed, T., Zehra, F., Tariq, S., Qureshi, S., Manzoor, G., Hussaini, W., Mustafa, M. L., Khalil, A., & Zeeshan, M. (2025). Advanced financial system architecture using deep neural networks for accurate risk assessment and high-value transaction prediction in modern banking. Journal of Management Science Research Review, 4(3), 698–732.
  4. Alauddin, M., & Nghiemb, H. S. (2010). Do instructional attributes pose multicollinearity problem? An empirical exploration. Economic Analysis and Policy, 40(3), 351–361.
  5. Alexandropoulos, S. A. N., Kotsiantis, S. B., & Vrahatis, M. N. (2019). Data preprocessing in predictive data mining. Knowledge Engineering Review, 34, e1.
  6. Ali, S. M., Rahman, A. U., Kabir, G., & Paul, S. K. (2024). Artificial intelligence approach to predict supply chain performance: Implications for sustainability. Sustainability, 16(6), 2373.
  7. Aljawazneh, H., Mora, A. M., Garcia-Sanchez, P., & Castillo-Valdivieso, P. A. (2021). Comparing the performance of deep learning methods to predict companies’ financial failure. IEEE Access, 9, 97010–97038.
  8. Antony, J., Sony, M., Lameijer, B., Bhat, S., Jayaraman, R., & Gutierrez, L. (2024). Towards a design science research (DSR) methodology for operational excellence (OPEX) initiatives. TQM Journal, 36(8), 2383–2397.
  9. Arora, P., & Saurabh, S. (2022). Predicting distress: A post insolvency and bankruptcy code 2016 analysis. Journal of Economics and Finance, 46(3), 604–622.
  10. Barboza, F., & Altman, E. (2024). Predicting financial distress in Latin American companies: A comparative analysis of logistic regression and random forest models. North American Journal of Economics and Finance, 72, 102158.
  11. Barboza, F., Kimura, H., & Altman, E. (2017). Machine learning models and bankruptcy prediction. Expert Systems with Applications, 83, 405–417.
  12. Ben Jabeur, S., Stef, N., & Carmona, P. (2023). Bankruptcy prediction using the XGBoost algorithm and variable importance feature engineering. Computational Economics, 61(2), 715–741.
  13. Ben-David, A. (2008). Comparison of classification accuracy using Cohen’s weighted kappa. Expert Systems with Applications, 34(2), 825–832.
  14. Bertsimas, D., Pawlowski, C., & Zhuo, Y. D. (2018). From predictive methods to missing data imputation: An optimization approach. Journal of Machine Learning Research, 18, 1–39.
  15. Biju, A. K. V. N., Thomas, A. S., & Thasneem, J. (2024). Examining the research taxonomy of artificial intelligence, deep learning & machine learning in the financial sphere—A bibliometric analysis. Quality and Quantity, 58(1), 849–878.
  16. Billios, D., Seretidou, D., & Stavropoulos, A. (2024). The power of numerical indicators in predicting bankruptcy: A systematic review. Journal of Risk and Financial Management, 17(10), 433.
  17. Borges, A. F. S., Laurindo, F. J. B., Spínola, M. M., Gonçalves, R. F., & Mattos, C. A. (2021). The strategic use of artificial intelligence in the digital era: Systematic literature review and future research directions. International Journal of Information Management, 57, 102225.
  18. Bouke, M. A., & Abdullah, A. (2023). An empirical study of pattern leakage impact during data preprocessing on machine learning-based intrusion detection models reliability. Expert Systems with Applications, 230, 120715.
  19. Brock, J. K. U., & von Wangenheim, F. (2019). Demystifying AI: What digital transformation leaders can teach you about realistic artificial intelligence. California Management Review, 61(4), 110–134.
  20. Carmona, P., Climent, F., & Momparler, A. (2019). Predicting failure in the U.S. banking sector: An extreme gradient boosting approach. International Review of Economics and Finance, 61, 304–323.
  21. Chang, V., Xu, Q. A., Chidozie, A., & Wang, H. (2024). Predicting economic trends and stock market prices with deep learning and advanced machine learning techniques. Electronics, 13(17), 3396.
  22. Chao, L., Zhipeng, J., & Yuanjie, Z. (2019). A novel reconstructed training-set SVM with roulette cooperative coevolution for financial time series classification. Expert Systems with Applications, 123, 283–298.
  23. Chi, D. J., & Chu, C. C. (2021). Artificial intelligence in corporate sustainability: Using LSTM and GRU for going concern prediction. Sustainability, 13(21), 1632.
  24. Chinonyerem, C. A., Olalemi, A. A., Paul, M., Nwabunike, O. T., Eniola, O. S., Benjamin, A. O., Ukeje, U., & Seigha, I. B. (2025). Leveraging machine learning and data analytics to predict corporate financial distress and bankruptcy in the United States. Asian Journal of Advanced Research and Reports, 19(6), 65–78.
  25. Cordón, I., Luengo, J., García, S., Herrera, F., & Charte, F. (2019). Smartdata: Data preprocessing to achieve smart data in R. Neurocomputing, 360, 1–13.
  26. Dasilas, A., & Rigani, A. (2024). Machine learning techniques in bankruptcy prediction: A systematic literature review. Expert Systems with Applications, 255, 124761.
  27. Delen, D., Kuzey, C., & Uyar, A. (2013). Measuring firm performance using financial ratios: A decision tree approach. Expert Systems with Applications, 40(10), 3970–3983.
  28. Demirdöğen, G., Işik, Z., & Arayici, Y. (2020). Lean management framework for healthcare facilities integrating BIM, BEPS and big data analytics. Sustainability, 12(17), 7061.
  29. Elahi, M., Afolaranmi, S. O., Martinez Lastra, J. L., & Perez Garcia, J. A. (2023). A comprehensive literature review of the applications of AI techniques through the lifecycle of industrial equipment. In Discover artificial intelligence (Vol. 3, Issue 1). Springer International Publishing.
  30. Elhoseny, M., Metawa, N., Sztano, G., & El-hasnony, I. M. (2022). Deep learning-based model for financial distress prediction. Annals of Operations Research, 345(2), 885–907.
  31. Gajdosikova, D., & Michulek, J. (2025). Artificial intelligence models for bankruptcy prediction in agriculture: Comparing the performance of artificial neural networks and decision trees. Agriculture, 15(10), 1077.
  32. Gajdosikova, D., Valaskova, K., & Lazaroiu, G. (2024). The relevance of sectoral clustering in corporate debt policy: The case study of Slovak enterprises. Administrative Sciences, 14(2), 26.
  33. Geng, R., Bose, I., & Chen, X. (2015). Prediction of financial distress: An empirical study of listed Chinese companies using data mining. European Journal of Operational Research, 241(1), 236–247.
  34. Gholampoor, H., & Asadi, M. (2024). Risk analysis of bankruptcy in the U.S. healthcare industries based on financial ratios: A machine learning analysis. Journal of Theoretical and Applied Electronic Commerce Research, 19(2), 1303–1320.
  35. Gregova, E., Valaskova, K., Adamko, P., Tumpach, M., & Jaros, J. (2020). Predicting financial distress of Slovak enterprises: Comparison of selected traditional and learning algorithms methods. Sustainability, 12(10), 3954.
  36. Hamdi, M., Mestiri, S., & Arbi, A. (2024). Artificial intelligence techniques for bankruptcy prediction of Tunisian companies: An application of machine learning and deep learning-based models. Journal of Risk and Financial Management, 17(4), 132.
  37. Hassoun, A., Aït-Kaddour, A., Abu-Mahfouz, A. M., Rathod, N. B., Bader, F., Barba, F. J., Biancolillo, A., Cropotova, J., Galanakis, C. M., Jambrak, A. R., Lorenzo, J. M., Måge, I., Ozogul, F., & Regenstein, J. (2023). The fourth industrial revolution in the food industry—Part I: Industry 4.0 technologies. Critical Reviews in Food Science and Nutrition, 63(23), 6547–6563.
  38. Helal, M. A., Ismail, M. A.-A., & Moubarak, H. (2025). Pioneer Jones vs the modifiers: Case of detecting accrual-based earnings management using advanced machine learning classifiers in an emerging economy. Journal of Financial Reporting and Accounting. Available online: https://www.emerald.com/jfra/article-abstract/doi/10.1108/JFRA-12-2024-0902/1275983/Pioneer-Jones-vs-the-modifiers-case-of-detecting?redirectedFrom=fulltext (accessed on 10 November 2025).
  39. Hevner, A. R. (2007). A three cycle view of design science research. Scandinavian Journal of Information Systems, 19(2), 4.
  40. Hezam, Y., Luong, H., & Anthonysamy, L. (2025). Machine learning in predicting firm performance: A systematic review. China Accounting and Finance Review, 27(3), 309–339.
  41. Hosaka, T. (2019). Bankruptcy prediction using imaged financial ratios and convolutional neural networks. Expert Systems with Applications, 117, 287–299.
  42. Hu, W., Shao, C., & Zhang, W. (2025). Predicting U.S. bank failures and stress testing with machine learning algorithms. Finance Research Letters, 75, 106802.
  43. Huang, J., Chai, J., & Cho, S. (2020). Deep learning in finance and banking: A literature review and classification. Frontiers of Business Research in China, 14(1), 13.
  44. Huang, Y. P., & Yen, M. F. (2019). A new perspective of performance comparison among machine learning algorithms for financial distress prediction. Applied Soft Computing Journal, 83, 105663.
  45. Jabeur, S. B., Gharib, C., Mefteh-Wali, S., & Arfi, W. B. (2021). CatBoost model and artificial intelligence techniques for corporate failure prediction. Technological Forecasting and Social Change, 166, 120658.
  46. Kenetey, G., & Popesko, B. (2024). Budgetary control and the adoption of consortium blockchain monitoring system in the Ghanaian local government. International Journal of Public Sector Management, 38(1), 12–29.
  47. Kim, H., Cho, H., & Ryu, D. (2020). Corporate default predictions using machine learning: Literature review. Sustainability, 12(16), 6325.
  48. Kristanti, F. T., Febrianta, M. Y., Salim, D. F., Riyadh, H. A., Sagama, Y., & Beshr, B. A. H. (2024). Advancing financial analytics: Integrating XGBoost, LSTM, and random forest algorithms for precision forecasting of corporate financial distress. Journal of Infrastructure, Policy and Development, 8(8), 4972.
  49. Kureljusic, M., & Karger, E. (2024). Forecasting in financial accounting with artificial intelligence—A systematic literature review and future research agenda. Journal of Applied Accounting Research, 25(1), 81–104.
  50. Lecun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.
  51. Lee, J., Jang, D., & Park, S. (2017). Deep learning-based corporate performance prediction model considering technical capability. Sustainability, 9(6), 899.
  52. Li, J., & Sun, Z. (2023). Application of deep learning in recognition of accrued earnings management. Heliyon, 9(3), E13664.
  53. Li, Z., Crook, J., & Andreeva, G. (2014). Chinese companies distress prediction: An application of data envelopment analysis. Journal of the Operational Research Society, 65(3), 466–479.
  54. Lin, W. C., & Tsai, C. F. (2020). Missing value imputation: A review and analysis of the literature (2006–2017). Artificial Intelligence Review, 53(2), 1487–1509.
  55. Madanchian, M. (2024). Generative AI for consumer behavior prediction: Techniques and applications. Sustainability, 16(22), 9963.
  56. Manogna, R. L., & Mishra, A. K. (2021). Measuring financial performance of Indian manufacturing firms: Application of decision tree algorithms. Measuring Business Excellence, 26(3), 288–307.
  57. Moubarak, H. M. R. (2024). Detecting the probability of fraud in interim financial statements using machine learning models: Do correlation-based analysis and principal component analysis for dimensionality reduction matter? Alexandria Journal of Accounting Research, 8(3), 87–138.
  58. Mousa, G. A., Elamir, E. A. H., & Hussainey, K. (2022). Using machine learning methods to predict financial performance: Does disclosure tone matter? International Journal of Disclosure and Governance, 19(1), 93–112.
  59. Omar, M. A., Gomaa, I. I., Moubarak, H., & Sabry, S. H. (2025). Predicting corporate financial performance in artificial intelligence era: A comprehensive bibliometric study. Journal of Financial Reporting and Accounting. Available online: https://www.emerald.com/jfra/article-abstract/doi/10.1108/JFRA-11-2024-0891/1262454/Predicting-corporate-financial-performance-in?redirectedFrom=fulltext (accessed on 10 November 2025).
  60. Ozbayoglu, A. M., Gudelek, M. U., & Sezer, O. B. (2020). Deep learning for financial applications: A survey. Applied Soft Computing Journal, 93, 106384.
  61. Ranta, M., Ylinen, M., & Järvenpää, M. (2023). Machine learning in management accounting research: Literature review and pathways for the future. European Accounting Review, 32(3), 607–636.
  62. Rickert, C. A., Henkel, M., & Lieleg, O. (2023). An efficiency-driven, correlation-based feature elimination strategy for small datasets. APL Machine Learning, 1(1), 016105.
  63. Riskiyadi, M. (2024). Detecting future financial statement fraud using a machine learning model in Indonesia: A comparative study. Asian Review of Accounting, 32(3), 394–422.
  64. Rundo, F., Trenta, F., di Stallo, A. L., & Battiato, S. (2019). Machine learning for quantitative finance applications: A survey. Applied Sciences, 9(24), 5574.
  65. Sabry, S. H., & Ibrahim, Y. (2024). Machine learning on trial: Assessing its efficacy in detecting financial statement fraud. International Journal of Auditing and Accounting Studies, 6(2), 159–186.
  66. Saito, T., & Rehmsmeier, M. (2015). The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE, 10(3), e0118432.
  67. Sarker, I. H. (2022). AI-based modeling: Techniques, applications and research issues towards automation, intelligent and smart systems. SN Computer Science, 3(2), 158.
  68. Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117.
  69. Shamsuddin, S. N., Ismail, N., & Nur-Firyal, R. (2023). Life insurance prediction and its sustainability using machine learning approach. Sustainability, 15(13), 737.
  70. Shetty, S., Musa, M., & Brédart, X. (2022). Bankruptcy prediction using machine learning techniques. Journal of Risk and Financial Management, 15(1), 35.
  71. Sokolova, M., & Lapalme, G. (2009). A systematic analysis of performance measures for classification tasks. Information Processing and Management, 45(4), 427–437.
  72. Son, H., Hyun, C., Phan, D., & Hwang, H. J. (2019). Data analytic approach for bankruptcy prediction. Expert Systems with Applications, 138, 112816.
  73. Vieira, S. M., Kaymak, U., & Sousa, J. M. C. (2010, July 18–23). Cohen’s kappa coefficient as a performance measure for feature selection. 2010 IEEE World Congress on Computational Intelligence, WCCI 2010, Barcelona, Spain.
  74. Xia, J. C., Xie, F., Zhang, Y., & Caulfield, C. (2013). Artificial intelligence and data mining: Algorithms and applications. Abstract and Applied Analysis, 2013, 524720.
  75. Yang, Y., & Wang, H. (2025). Random forest-based machine failure prediction: A performance comparison. Applied Sciences, 15(16), 8841.
  76. Yaseen, Z. M., Ali, Z. H., Salih, S. Q., & Al-Ansari, N. (2020). Prediction of risk delay in construction projects using a hybrid artificial intelligence model. Sustainability, 12(4), 1514.
  77. Ying, X. (2019). An overview of overfitting and its solutions. Journal of Physics: Conference Series, 1168(2), 022022.
  78. Zaini, B. J., & Mahmuddin, M. (2019). Classifying firms’ performance using data mining approaches. International Journal of Supply Chain Management, 8(1), 690–696.
  79. Zhang, B., Shi, H., & Wang, H. (2023). Machine learning and AI in cancer prognosis, prediction, and treatment selection: A critical approach. Journal of Multidisciplinary Healthcare, 16, 1779–1791.
  80. Zhang, W., Cao, Q., & Schniederjans, M. J. (2004). Neural network earnings per share forecasting models: A comparative analysis of alternative methods. Decision Sciences, 35(2), 205–237.
  81. Zhao, J., Ouenniche, J., & De Smedt, J. (2024). Survey, classification and critical analysis of the literature on corporate bankruptcy and financial distress prediction. Machine Learning with Applications, 15, 100527.
  82. Zhou, L., Lai, K. K., & Yen, J. (2014). Bankruptcy prediction using SVM models with a new approach to combine features selection and parameter optimisation. International Journal of Systems Science, 45(3), 241–253.
  83. Zhou, M., Liu, H., & Hu, Y. (2022). Research on corporate financial performance prediction based on self-organizing and convolutional neural networks. Expert Systems, 39(9), 1–17.
Figure 1. Design science research (DSR) cycle. Source: Authors’ own contribution.
Figure 2. Correlation matrix heat map. Note: *, **, and *** denote statistical significance at the 10%, 5%, and 1% levels, respectively.
Figure 3. Confusion matrix for RF model.
Figure 4. Confusion matrix for XGBoost model.
Figure 5. Confusion matrix for SVM model.
Figure 6. Confusion matrix for DNN model.
Figure 7. ROC curve for RF model.
Figure 8. ROC curve for XGBoost model.
Figure 9. ROC curve for SVM model.
Figure 10. ROC curve for DNN model.
Figure 11. Feature importance across models.
Table 1. Input features list.
| Feature | Definition | Data Source | Reference |
| --- | --- | --- | --- |
| Quick Ratio | (Current Assets − Inventory) ÷ Current Liabilities | Compustat | (Delen et al., 2013; Geng et al., 2015) |
| Debt-to-Equity Ratio | Total Debt ÷ Total Equity | Compustat | (Gajdosikova et al., 2024) |
| Return on Assets (ROA) | Net Income ÷ Total Assets | Compustat | (Delen et al., 2013; Z. Li et al., 2014) |
| Return on Equity (ROE) | Net Income ÷ Total Equity | Compustat | (Delen et al., 2013) |
| Working Capital Turnover | Sales ÷ (Current Assets − Current Liabilities) | Compustat | (Delen et al., 2013) |
| Current Assets Turnover | Sales ÷ Current Assets | Compustat | (Delen et al., 2013) |
| Fixed Assets Turnover | Sales ÷ Fixed Assets | Compustat | (Delen et al., 2013) |
| Assets Turnover | Sales ÷ Total Assets | Compustat | (Delen et al., 2013) |
| Equity Turnover | Sales ÷ Total Equity | Compustat | (Delen et al., 2013) |
| Operating Cash Flows Ratio | Operating Cash Flows ÷ Current Liabilities | Compustat | (Z. Li et al., 2014) |
| Operating Cash Flows to Interest | Operating Cash Flows ÷ Interest | Compustat | (Z. Li et al., 2014) |
| Operating Cash Flows to Total Assets | Operating Cash Flows ÷ Total Assets | Compustat | (Arora & Saurabh, 2022) |
| Short Term Debt Ratio | Current Liabilities ÷ Total Liabilities | Compustat | (Delen et al., 2013) |
| Inventory Turnover | COGS ÷ Average Inventory | Compustat | (Delen et al., 2013) |
| Operating Margin | EBIT ÷ Sales | Compustat | (Arora & Saurabh, 2022) |
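The definitions in Table 1 translate directly into arithmetic on balance-sheet and income-statement items. The following is a minimal Python sketch for a subset of the ratios, not the authors' code. The `FirmYear` field names and the sample figures are hypothetical; they are neither Compustat mnemonics nor values from the study's dataset.

```python
from dataclasses import dataclass


@dataclass
class FirmYear:
    """Hypothetical firm-year record; field names are illustrative only."""
    current_assets: float
    inventory: float
    current_liabilities: float
    total_debt: float
    total_equity: float
    net_income: float
    total_assets: float
    sales: float
    ebit: float


def safe_div(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator != 0 else float("nan")


def ratios(fy: FirmYear) -> dict:
    """Compute a subset of the Table 1 ratios from their stated definitions."""
    working_capital = fy.current_assets - fy.current_liabilities
    return {
        "quick_ratio": safe_div(fy.current_assets - fy.inventory, fy.current_liabilities),
        "debt_to_equity": safe_div(fy.total_debt, fy.total_equity),
        "roa": safe_div(fy.net_income, fy.total_assets),
        "roe": safe_div(fy.net_income, fy.total_equity),
        "working_capital_turnover": safe_div(fy.sales, working_capital),
        "assets_turnover": safe_div(fy.sales, fy.total_assets),
        "operating_margin": safe_div(fy.ebit, fy.sales),
    }


# Made-up figures for a single firm-year (assumption, for illustration only).
sample = FirmYear(
    current_assets=500, inventory=100, current_liabilities=200,
    total_debt=300, total_equity=600, net_income=60,
    total_assets=1200, sales=900, ebit=90,
)
r = ratios(sample)
for name, value in r.items():
    print(f"{name}: {value:.4f}")
```

Returning NaN for a zero denominator is one possible design choice; how the study handled such cases is not stated in this section.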
Table 2. Observation distribution by country.
| Country | Total Observations | Class A (0) | Class B (1) | Class C (2) |
| --- | --- | --- | --- | --- |
| Saudi Arabia | 1747 | 579 | 572 | 596 |
| Egypt | 1526 | 504 | 501 | 521 |
| Jordan | 1173 | 390 | 382 | 401 |
| Kuwait | 922 | 309 | 300 | 313 |
| Oman | 808 | 270 | 261 | 277 |
| UAE | 629 | 210 | 206 | 213 |
| Morocco | 387 | 129 | 125 | 133 |
| Tunisia | 319 | 108 | 101 | 110 |
| Qatar | 269 | 91 | 84 | 94 |
| Bahrain | 191 | 62 | 60 | 69 |
| Total | 7971 | 2652 | 2592 | 2727 |
Table 3. Descriptive statistics.
| Features | Mean | Std | Min | 25% | 50% | 75% | Max |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Quick Ratio | 2.244138 | 12.15191 | 0 | 0.642712 | 1.081713 | 1.880524 | 673.5 |
| Debt to Equity | 1.215335 | 11.4995 | −628.176 | 0.282101 | 0.697878 | 1.470877 | 584.7252 |
| Short-Term Debt Ratio | 0.699343 | 0.252202 | 0 | 0.51368 | 0.76055 | 0.919459 | 1.903832 |
| ROE | 0.044399 | 1.119285 | −50.3359 | 0.006718 | 0.071225 | 0.152326 | 39.15686 |
| ROA | 0.02497 | 0.224285 | −11.5648 | 0.000347 | 0.035413 | 0.078958 | 0.757847 |
| Operating Margin | −0.52777 | 17.53445 | −1017.25 | 0.005807 | 0.083638 | 0.18203 | 48.66667 |
| Asset Turnover | 0.625604 | 0.601206 | −0.7805 | 0.242694 | 0.501983 | 0.830838 | 13.1094 |
| Inventory Turnover | 78.00999 | 1248.304 | −49.2325 | 3.208888 | 6.310354 | 18.41748 | 62731.3 |
| Working Capital Turnover | 6.583517 | 656.9538 | −15931.1 | −0.00389 | 1.700715 | 4.180777 | 53928 |
| Current Assets Turnover | 1.59581 | 1.720355 | −4.96341 | 0.780757 | 1.268777 | 1.921866 | 70.66667 |
| Fixed Assets Turnover | 5.639975 | 274.4232 | −1.5226 | 0.359666 | 0.916303 | 2.238975 | 24403 |
| Equity Turnover | 1.462451 | 5.537002 | −206.216 | 0.361983 | 0.908203 | 1.734698 | 187.9949 |
| Operating CF Ratio | 0.451266 | 5.468825 | −152.463 | 0.026262 | 0.245525 | 0.630683 | 61.7 |
| Operating CF to Total Assets | 0.065632 | 0.16278 | −9.96545 | 0.008632 | 0.0608 | 0.121155 | 1.336859 |
| Operating CF to Interest | 85.33023 | 1746.904 | −31722 | 0.866035 | 4.967742 | 18.86171 | 108923 |
Table 4. Multicollinearity Results.
| Feature | VIF | Feature | VIF |
| --- | --- | --- | --- |
| Debt to Equity | 4.049229706 | Operating Margin | 1.031507546 |
| ROE | 2.893773344 | Operating CF Ratio | 1.026062391 |
| Equity Turnover | 2.033905781 | Quick Ratio | 1.019462332 |
| Asset Turnover | 1.602556387 | Fixed Assets Turnover | 1.008773601 |
| ROA | 1.547796170 | OCF Interest | 1.005475928 |
| OCF TA | 1.520446235 | Inventory Turnover | 1.001390774 |
| Current Assets Turnover | 1.316247099 | Working Capital Turnover | 1.000371071 |
| Short-Term Debt Ratio | 1.134683447 | | |
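For readers unfamiliar with the statistic in Table 4, the variance inflation factor of feature j is 1 / (1 − R_j²), where R_j² comes from regressing feature j on the remaining features; equivalently, it is the j-th diagonal element of the inverse of the feature correlation matrix. The sketch below illustrates that second formulation in plain Python. It is not the authors' implementation, and the three toy columns are invented for illustration.

```python
from statistics import mean, pstdev


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    standardized = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        standardized.append([(x - mu) / sigma for x in col])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in standardized]
            for ci in standardized]


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif(columns):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Toy data (hypothetical): x2 closely tracks x1, x3 is largely unrelated.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.2, 1.9, 3.4, 3.8, 5.1, 6.3]
x3 = [0.5, -0.2, 0.1, 0.4, -0.3, 0.0]
vifs = vif([x1, x2, x3])
print([round(v, 3) for v in vifs])
```

In this toy example the two near-collinear columns receive large VIF values while the unrelated column stays close to 1, which is the pattern Table 4 is checking for: every reported VIF is below 5.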
Table 5. Performance evaluation of classification models.
| Algorithm | Overall Accuracy | Overall Error | Cohen’s Kappa | Correctly Classified | Incorrectly Classified |
| --- | --- | --- | --- | --- | --- |
| RFs | 0.735589 | 0.264411 | 0.603389 | 587 | 211 |
| XGBoost | 0.754386 | 0.245614 | 0.631639 | 602 | 196 |
| SVMs | 0.680451 | 0.319549 | 0.521528 | 543 | 255 |
| DNNs | 0.720551 | 0.279449 | 0.580633 | 575 | 223 |
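The accuracy and Cohen's kappa values in Table 5 can be cross-checked against the per-class counts in Table 6. Kappa compares the observed agreement with the agreement expected by chance, which depends only on the actual class totals (TP + FN) and the predicted class totals (TP + FP). The sketch below applies the standard kappa formula to the XGBoost counts; the function name is an assumption, and this is an illustrative check rather than the authors' code.

```python
def cohens_kappa(tp, fp, fn):
    """Cohen's kappa from per-class TP, FP and FN counts of one classifier."""
    # Every test observation is either a TP or an FN for its own class.
    n = sum(tp) + sum(fn)
    observed = sum(tp) / n
    # Chance agreement: product of actual and predicted class shares, summed.
    expected = sum((t + f_n) * (t + f_p)
                   for t, f_p, f_n in zip(tp, fp, fn)) / n ** 2
    return (observed - expected) / (1 - expected)


# XGBoost counts for Classes A, B and C as reported in Table 6.
tp = [200, 185, 217]
fp = [22, 108, 66]
fn = [66, 74, 56]

accuracy = sum(tp) / (sum(tp) + sum(fn))
kappa = cohens_kappa(tp, fp, fn)
print(round(accuracy, 6), round(kappa, 6))
```

Both results agree with the XGBoost row of Table 5 to within rounding in the last reported digit.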
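Table 6. Model performance evaluation for classes.
| Classes | Algorithm | True Positives | False Positives | True Negatives | False Negatives | Type I Error | Type II Error | Recall or Sensitivity | Precision | Specificity | F Measure |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Class A | RFs | 198 | 23 | 509 | 68 | 0.043 | 0.256 | 0.744 | 0.896 | 0.957 | 0.813 |
| Class A | XGBoost | 200 | 22 | 510 | 66 | 0.041 | 0.248 | 0.752 | 0.901 | 0.959 | 0.819 |
| Class A | SVMs | 183 | 25 | 507 | 83 | 0.047 | 0.312 | 0.687 | 0.879 | 0.953 | 0.772 |
| Class A | DNNs | 196 | 28 | 504 | 70 | 0.053 | 0.263 | 0.737 | 0.875 | 0.947 | 0.800 |
| Class B | RFs | 174 | 115 | 424 | 85 | 0.213 | 0.328 | 0.672 | 0.602 | 0.786 | 0.635 |
| Class B | XGBoost | 185 | 108 | 431 | 74 | 0.200 | 0.286 | 0.714 | 0.631 | 0.799 | 0.670 |
| Class B | SVMs | 183 | 166 | 373 | 76 | 0.308 | 0.293 | 0.706 | 0.524 | 0.692 | 0.602 |
| Class B | DNNs | 161 | 112 | 427 | 98 | 0.208 | 0.378 | 0.622 | 0.589 | 0.792 | 0.605 |
| Class C | RFs | 215 | 73 | 452 | 58 | 0.139 | 0.212 | 0.787 | 0.746 | 0.861 | 0.766 |
| Class C | XGBoost | 217 | 66 | 459 | 56 | 0.126 | 0.205 | 0.795 | 0.767 | 0.874 | 0.781 |
| Class C | SVMs | 177 | 64 | 461 | 96 | 0.122 | 0.352 | 0.648 | 0.734 | 0.878 | 0.689 |
| Class C | DNNs | 218 | 83 | 442 | 55 | 0.158 | 0.201 | 0.799 | 0.724 | 0.842 | 0.759 |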
Note: For a given class, true positives (TPs) are observations of that class that the model assigns to it; true negatives (TNs) are observations of other classes that the model does not assign to it; false negatives (FNs) are observations of that class that the model assigns to another class; and false positives (FPs) are observations of other classes that the model assigns to it.
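The rate columns of Table 6 follow from the four counts in each row. The sketch below uses the standard one-vs-rest definitions, which are assumed here rather than quoted from the article, and applies them to the XGBoost Class A row (TP = 200, FP = 22, TN = 510, FN = 66).

```python
def class_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """One-vs-rest metrics for a single class from its confusion counts."""
    recall = tp / (tp + fn)        # sensitivity: share of the class that is found
    precision = tp / (tp + fp)     # share of predictions for the class that are right
    specificity = tn / (tn + fp)   # share of other classes correctly excluded
    return {
        "type_i_error": fp / (fp + tn),    # equals 1 - specificity
        "type_ii_error": fn / (fn + tp),   # equals 1 - recall
        "recall": recall,
        "precision": precision,
        "specificity": specificity,
        "f_measure": 2 * precision * recall / (precision + recall),
    }


# XGBoost, Class A, counts taken from Table 6.
xgb_a = class_metrics(tp=200, fp=22, tn=510, fn=66)
for name, value in xgb_a.items():
    print(f"{name}: {value:.3f}")
```

The computed values agree with the reported row up to a difference of one unit in the third decimal place, which is consistent with the table truncating rather than rounding some entries.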
Table 7. ROC-AUC and AP results.
| Model | ROC-AUC Macro | ROC-AUC Micro | AP Macro | AP Micro |
| --- | --- | --- | --- | --- |
| RFs | 0.906639 | 0.912199 | 0.827222 | 0.84518 |
| XGBoost | 0.9056 | 0.911213 | 0.822428 | 0.843206 |
| SVMs | 0.858428 | 0.869536 | 0.743314 | 0.776952 |
| DNNs | 0.865629 | 0.879947 | 0.739238 | 0.789038 |
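Table 7 reports macro- and micro-averaged scores. Under one common convention for multi-class problems (one-vs-rest), the macro average is the unweighted mean of the per-class ROC-AUC values, while the micro average pools every (class indicator, predicted probability) pair into a single binary problem before computing ROC-AUC. The plain-Python sketch below illustrates the difference. Whether the study used exactly this convention is an assumption, and the labels and probabilities are invented toy values.

```python
def auc(labels, scores):
    """Binary ROC-AUC: probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def macro_micro_auc(y_true, proba, n_classes):
    """Return (macro, micro) one-vs-rest ROC-AUC for class-probability rows."""
    per_class, pooled_labels, pooled_scores = [], [], []
    for k in range(n_classes):
        labels = [1 if y == k else 0 for y in y_true]
        scores = [row[k] for row in proba]
        per_class.append(auc(labels, scores))
        pooled_labels += labels
        pooled_scores += scores
    macro = sum(per_class) / n_classes
    micro = auc(pooled_labels, pooled_scores)
    return macro, micro


# Toy example (hypothetical): six observations, three classes.
y_true = [0, 0, 1, 1, 2, 2]
proba = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.3, 0.4, 0.3],
    [0.4, 0.35, 0.25],
    [0.1, 0.3, 0.6],
    [0.2, 0.5, 0.3],
]
macro, micro = macro_micro_auc(y_true, proba, n_classes=3)
print(round(macro, 4), round(micro, 4))
```

The pairwise comparison in `auc` is quadratic in the number of observations, which is acceptable for a small illustration but would normally be replaced by a rank-based computation on real data.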
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Omar, M.A.; Gomaa, I.I.; Sabry, S.H.; Moubarak, H. Artificial Intelligence’s Role in Predicting Corporate Financial Performance: Evidence from the MENA Region. J. Risk Financial Manag. 2026, 19, 51. https://doi.org/10.3390/jrfm19010051