Machine Learning Analysis of Financial Risk Dynamics in Micro-, Small, and Medium Enterprises

Božović, Dražen; Perović, Nataša; Aleksić, Marinko; Rašović, Ivana; Iker, Oto

doi:10.3390/risks13120240


by Dražen Božović ¹, Nataša Perović ¹,*, Marinko Aleksić ², Ivana Rašović ¹ and Oto Iker ²

¹ Faculty of Business Economics and Law, University “Adriatic”, Rista Lekića 16, 85000 Bar, Montenegro
² Faculty of Maritime Studies and Tourism, University “Adriatic”, 85000 Bar, Montenegro
* Author to whom correspondence should be addressed.

Risks 2025, 13(12), 240; https://doi.org/10.3390/risks13120240

Submission received: 27 October 2025 / Revised: 26 November 2025 / Accepted: 28 November 2025 / Published: 5 December 2025


Abstract

This study examines the use of artificial neural networks (ANNs) to classify financial risks in micro-, small-, and medium-sized enterprises (MSMEs) in Montenegro and the wider Western Balkan region. The economies in this region share structural similarities, such as a high concentration of MSMEs, limited access to finance, and vulnerability to macroeconomic volatility, which make financial risk assessment particularly challenging. Traditional statistical and econometric methods often fail to capture the complex, nonlinear interdependencies among financial and operational indicators, resulting in the inaccurate classification of high-risk MSMEs. By applying advanced machine learning (ML) techniques, neural networks (NNs) can identify intricate patterns in multidimensional financial data, significantly improving the accuracy and reliability of risk classification. In this research, a predictive model was developed using key financial and operational variables of MSMEs, enabling the accurate classification of MSMEs in terms of financial instability and insolvency. Empirical validation shows that NNs outperform conventional methods in accuracy, sensitivity, and generalisation. This approach offers tangible benefits for investors, credit institutions, and MSME managers, supporting improvements in early warning systems, optimisation of credit decision-making, and strengthening MSMEs’ financial resilience and sustainability. The methodology also advances risk quantification tools, providing robust indicators for strategic planning and resource management. By focusing the analysis on Montenegro and the Western Balkans, this study demonstrates that regional economic and structural similarities support the adaptation of NN models for precise financial risk classification, offering actionable insights to enhance MSME performance and regional economic stability.

Keywords: financial risk pattern; classification; neural networks; micro-, small-, and medium-sized enterprises

1. Introduction

Micro-, small-, and medium-sized enterprises (MSMEs) play a pivotal role as drivers of economic growth and innovation across various sectors. Despite their importance, MSMEs face dynamic challenges and high levels of uncertainty, which can significantly threaten their survival and success. The ability to identify, assess, and manage risks effectively is critical for maintaining continuity and competitiveness in an increasingly complex and unpredictable business environment. Risk management involves systematic identification, analysis, and appropriate response to threats and opportunities that may impact organisational objectives. While larger organisations often integrate risk management through dedicated teams and substantial resources, MSMEs face limitations in expertise, time, and capital. ISO 31000:2018 provides guidelines, principles, and frameworks adaptable to different enterprises, sectors, and types of risks throughout the organisational lifecycle, emphasising the importance of integrating risk management into decision-making processes at all organisational levels (ISO 2018). Furthermore, in the context of the Western Balkans, empirical evidence indicates that MSMEs often experience constrained access to finance, high financing costs, and limited financial literacy, which amplify their vulnerability and underscore the necessity of precise risk assessment (European Investment Bank [EIB] 2016; Ahmeti and Fetai 2021). Recent prospective analyses of MSMEs in the six Western Balkan countries highlight that both financial and non-financial obstacles continue to restrict enterprise growth, with the most effective support stemming from tailored financial instruments, equity participation, guarantees, and advisory services (Atanasijević et al. 2021).

In Montenegro, the classification of MSMEs is based on clearly defined economic criteria established by the Institute for Standardization of Montenegro (Institute for Standardization of Montenegro [ISM] 2025), in accordance with European practice and European Commission Recommendation 2003/361/EC (European Commission [EU] 2003). This categorisation relies on three main indicators: number of employees, annual turnover, and total asset value. According to current standards, micro-enterprises have up to 10 employees, an annual turnover of up to EUR 700,000, and assets not exceeding EUR 350,000. Small enterprises employ between 10 and 50 people, with an annual turnover from EUR 700,000 to EUR 8,000,000 and assets between EUR 350,000 and EUR 4,000,000. Medium-sized enterprises employ between 50 and 250 people, generate an annual turnover between EUR 8,000,000 and EUR 40,000,000, and have assets valued between EUR 4,000,000 and EUR 20,000,000. This classification forms the basis for differentiated economic policies, access to financial incentives and credit lines, and serves as a framework for monitoring the competitiveness and sustainability of the entrepreneurial sector in Montenegro.
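The thresholds quoted above can be expressed as a small lookup, which may help readers check how a given enterprise would be categorised. The sketch below is illustrative only: the function name `classify_size` is hypothetical, and the rule that all three criteria must be satisfied simultaneously is an assumption, since the combination rule is not stated here.

```python
# Upper bounds per class, taken from the thresholds quoted in the text
# (employees, annual turnover in EUR, total assets in EUR).
SIZE_LIMITS = [
    ("micro", 10, 700_000, 350_000),
    ("small", 50, 8_000_000, 4_000_000),
    ("medium", 250, 40_000_000, 20_000_000),
]


def classify_size(employees, turnover_eur, assets_eur):
    """Return the smallest class whose three upper bounds all hold.

    ASSUMPTION: all three criteria must be met at once; the text does not
    state how the criteria are combined. Enterprises exceeding every bound
    are labelled "large" (a label introduced here for completeness).
    """
    for label, max_emp, max_turn, max_assets in SIZE_LIMITS:
        if (employees <= max_emp
                and turnover_eur <= max_turn
                and assets_eur <= max_assets):
            return label
    return "large"


# Hypothetical enterprises, not taken from the dataset.
examples = [
    classify_size(8, 500_000, 200_000),
    classify_size(8, 900_000, 200_000),
    classify_size(120, 10_000_000, 5_000_000),
]
```

Under this assumed rule, an enterprise with few employees but a turnover above EUR 700,000 falls into the small class rather than the micro class.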

The application of advanced methods, including scenario analysis, SWOT analysis, historical data evaluation, and particularly machine learning (ML) techniques such as artificial neural networks (ANNs), enables MSMEs to gain a comprehensive understanding of potential risks.

This approach facilitates the identification of critical threats that may seriously affect business performance and long-term development. The objective of this study is to apply artificial intelligence (AI), with a focus on NNs, for risk assessment, classification, prediction, and mitigation, providing MSMEs with valuable insights to select optimal strategies in line with their capacities and business priorities.

Effective risk management requires not only the formalisation of processes and procedures but also the development of a risk-aware organisational culture that fosters proactive thinking among employees and key stakeholders, thereby enhancing overall organisational resilience.

Given the dynamic nature of risks, continuous monitoring, evaluation, and prediction are essential for sustaining stable operations.

As a case study, this research analyses data from 2478 MSMEs in Montenegro, aiming to apply ML tools within an AI framework to develop a tailored methodology for assessing, classifying, predicting, and mitigating risks effectively.

This approach enables proactive responses to potential threats and ensures the long-term survival and sustainable success of enterprises.

The first section of this paper reviews the relevant literature focusing on the application of NNs and other AI methods for risk assessment and management in MSMEs. This literature review identifies existing approaches and their advantages and limitations and provides a foundation for developing the methodological framework of this study. The second section details the dataset, methodological approach, and applied metrics, while the third section presents the research results. The fourth section discusses and interprets the findings, and the conclusion summarises this study’s limitations, practical implications, and recommendations for future research.

1.1. Application of NNs in Financial Modelling of MSMEs

Neural networks (NNs) are a core methodology in AI, and they are particularly well suited to assessing and classifying financial risks in MSMEs. These computational models are inspired by the structure and function of the human brain, consisting of interconnected processing units—neurons—that process information through adaptive learning procedures. In the context of MSMEs, NNs can identify complex patterns and interdependencies within financial and operational datasets, enabling accurate detection of firms at elevated risk of financial instability or insolvency. Their ability to model nonlinear and multidimensional relationships makes them especially effective in risk analysis for enterprises, where conventional statistical approaches often struggle to process the full range of relevant variables efficiently.

The use of NNs in MSMEs supports the predictive analysis of key financial indicators, including turnover, profitability, workforce size, and operational costs, while also allowing integration with categorical variables such as enterprise type and banking status.

Employing NNs enables detailed risk mapping, identification of vulnerable enterprises, and provision of empirically grounded recommendations for strategic decision-making.

Successful implementation of NNs requires careful data collection and preprocessing, selection of an appropriate network architecture, and optimisation of hyperparameters. Challenges such as overfitting, particularly common in small datasets, are addressed using techniques such as regularisation, cross-validation, and parameter tuning. Overall, the application of NNs in MSMEs enhances early warning systems, optimises financial decision-making, and strengthens enterprise resilience, thus contributing to sectoral sustainability and long-term development.

NNs are widely used in the literature to predict the profitability, growth, and financial stability of MSMEs. Models developed for estimating small and medium enterprise (SME) profits demonstrate the superiority of NN-based approaches over traditional statistical methods due to their ability to recognise nonlinear relationships in financial data (Batra and Chauhan 2025). Similarly, NNs have been applied to predict the growth of micro and small enterprises, highlighting their effectiveness in identifying factors directly influencing business development (Garcia Vidal et al. 2017).

The use of NN ensembles has been demonstrated for evaluating the economic conditions of SMEs, showing that combining multiple models enhances predictive accuracy and reduces classification errors (Burda et al. 2007). The integration of AI into business functions further enables improved real-time decision-making and process optimisation (Le Dinh et al. 2025), while deep learning (DL) algorithms are applied to SME management models to capture complex data patterns and support long-term risk forecasting (Wang 2021).

NNs have also been used to anticipate business crises within MSMEs in the Arab region, demonstrating their ability to identify financially unstable enterprises early, thereby enabling timely intervention (Rao 2018). Analyses of SME profitability using neural models confirm that NNs provide precise identification of critical financial variables influencing firm performance (Nastac et al. 2017).

1.2. Credit Risk Prediction and Analysis

Credit risk assessment is a critical area for the application of NNs in MSMEs. A genetic backpropagation (BP) NN model has been implemented for credit risk evaluation, demonstrating the algorithm’s ability to integrate multiple data sources and reduce classification errors for credit-risky firms (Chen et al. 2024). Incorporating soft information into neural models has been shown to enhance SME credit scoring systems (Li et al. 2021), while BP NNs have been used for precise risk classification and to support strategic planning (Li et al. 2022).

Fuzzy neural networks (FNNs) have been used for credit risk assessment, effectively addressing uncertainty and imprecision in SME financial datasets (Gong 2017). In addition, relational graph attention networks have been applied to credit risk prediction, leveraging enterprise connectivity data to improve model accuracy (Wang et al. 2024). These approaches highlight the growing trend of integrating advanced NNs into creditworthiness evaluation and risk assessment for micro and small enterprises.

1.3. Prediction of MSME Growth, Sustainability, and Performance

Predicting MSME growth often involves evaluating key factors such as innovation, strategic flexibility, and organisational agility. NNs have been shown to facilitate the analysis of the effects of innovation and strategic flexibility on sustainable MSME growth (Arsawan et al. 2022; Todri et al. 2020). Neural models have also been applied to identify value drivers in rural business contexts, highlighting regional and market-specific characteristics (Vrbka 2020).

Self-Organising Maps (SOMs) have been used to assess post-pandemic MSME performance, demonstrating that NNs enable the visualisation of trends and the identification of the most vulnerable enterprises (Martinez et al. 2023). Comparative analyses of NNs and linear regression algorithms in sales prediction confirm the superior performance of neural models in capturing nonlinear data patterns (Taufiqih and Ambarwati 2024).

The development of innovation platforms for SMEs based on social perception and NN algorithms further demonstrates the potential of integrating AI into digital services and innovation processes.

1.4. AI Implementation and Challenges in SMEs

The adoption of AI technologies in SMEs faces several challenges, including technical constraints, employee competencies, and alignment with organisational needs. Implementing AI requires adapting business processes and evaluating employee competencies in line with strategic objectives (Oldemeyer et al. 2024; Vezeteu and Năstac 2024).

Structural Equation Modelling (SEM)–NN analyses have been used to assess the impact of AI technologies on sustainable SME performance, confirming that NNs enhance operational efficiency and competitiveness (Soomro et al. 2025). Analyses of technological readiness in women-led SMEs show that NNs can identify factors influencing technology adoption and predict enterprise readiness for digital transformation (Silitonga et al. 2025).

Studies on the current state of AI adoption in SMEs reveal varying acceptance levels, with significant differences depending on available resources and sector, highlighting the need for tailored implementation strategies (Schwaeke et al. 2024). Factors facilitating or hindering AI adoption in MSMEs have been identified, particularly in the context of sustainable business practices (Jamal et al. 2025).

Reports on AI implementation trends and challenges emphasise that resource and expertise constraints are especially pronounced in micro- and small enterprises (MSEs), necessitating strategic planning for effective adoption (Spallone and Bandiera 2024; Bianchini and Michalkova 2019).

1.5. Systematic Reviews and Research Trends

Continuous growth in AI and NN applications within MSMEs has been documented, particularly in predicting financial risks, growth, and strategic decision-making (Pamungkas et al. 2023; Oldemeyer et al. 2024; Fajarika et al. 2024). The importance of DL algorithms and the integration of economic, technological, and organisational factors into neural models has been highlighted as a means to improve prediction accuracy and practical relevance in business environments (Wang 2021; Padilla-Ospina et al. 2021).

Unlike previous studies, this research introduces a comprehensive, empirically grounded framework for classifying financial risk and growth factors in MSMEs using advanced NNs. Earlier work has mainly focused on specific aspects such as credit scoring, profitability, or individual growth indicators (Batra and Chauhan 2025; Li et al. 2022), whereas this study integrates multi-dimensional business classification within a unified neural framework, addressing both risk and growth holistically.

The novelty of this work is demonstrated by a case study involving 2478 MSMEs from Montenegro, providing an extensive dataset for rigorous empirical validation and comparative analysis of the models in a regional context characterised by small, developing economies with shared structural challenges. The analysis employs two NN architectures—NNET (NNET 2025) and Multi-Layer Perceptron (MLP 2025)—with results presented comparatively, enabling both quantitative and qualitative evaluation of each model’s predictive accuracy, sensitivity, and robustness in classifying MSMEs according to financial risk and growth potential.

This methodology leverages heterogeneous data sources, including financial statements, organisational characteristics, and innovation-related variables, processed through advanced NN architectures and relational graph algorithms. This enables the precise identification of nonlinear relationships and latent patterns that conventional methods often fail to detect (Wang et al. 2024). Additionally, the framework explicitly links the technical execution of neural models with organisational and human factors, bridging the gap between analytical complexity and practical AI implementation in MSMEs. By enabling the simultaneous classification of financial risk, operational performance, and growth potential, the approach provides an innovative foundation for informed strategic decision-making and sustainable enterprise development (Oldemeyer et al. 2024; Vezeteu and Năstac 2024).

Importantly, this study addresses the challenges faced by small economies in the Western Balkans, where MSMEs dominate the business landscape but often operate with limited resources and financial vulnerability. The large-scale, multi-dimensional dataset used in this study ensures robust model training and validation, enhancing the reliability and generalisability of the results. By integrating economic, organisational, and innovation dimensions, the approach provides a novel, practical tool for policymakers, credit institutions, and enterprise managers, supporting early warning systems, targeted interventions, and regional economic resilience.

The independent variables in the dataset are defined as follows:

size—Categorical variable defining the size of the MSME (micro, small, and medium) according to the official classification.
crew—Credit rating assigned to the enterprise, derived using the methodology of CompanyWall Business and Solvent Rating d.o.o. The variable is classified into categories A1–E3, each reflecting a specific level of risk and probability of insolvency. This variable does not represent the number of employees nor any aspect of the workforce structure; it exclusively reflects the financial and economic standing of the enterprise.
abank—Company status according to the Central Registry of the Central Bank of Montenegro (CBCG); values: “yes” (active) or “no” (blocked).
turn—Total annual turnover in EUR.
net—Net profit in EUR.
emp—Number of employees in the MSME.
ebit—Earnings before interest and taxes.
coste—Fuel and energy expenses.
mcoste—Transport and maintenance expenses.

The credit rating (“crew”) is a set of objective, standardised data covering the entire business activity of a company. It is determined using a dedicated methodology that includes a financial and economic analysis of the company’s business activities over the past year.

According to CompanyWall Business (CompanyWall 2025), the credit rating is based on a combination of static and dynamic assessments. The static rating is derived from an analysis of key financial performance and liquidity indicators reported in the annual statement, while the dynamic rating captures changes over time and reflects the company’s ability to adapt and develop in the future.

The dynamic rating changes over time and is determined by factors such as the following:

The time the entity has spent under blockade;
The number and outcome of legal proceedings in which the subject is involved;
Tax and other debts;
Liquidity;
The ratio of income in the public and private sectors.

The credit rating enables faster, simpler, and more efficient decision-making when selecting a business partner, analysing competition, and assessing risks. It supports the qualitative analysis of the specifics of an individual MSME.

Solvent Rating d.o.o. Podgorica (Solvent Rating 2025) has developed a model for MSME credit rating (Table 1) based on European standards. It is supported by leading experts in the field, who have fully adapted the methodology to the needs of the Montenegrin market. The methodology considers publicly available data, the most important of which are as follows: the MSME’s financial transactions, indicators derived from the financial statements, the status of the MSME, data on foreclosures, and other data.

MSMEs are assigned to the classes and categories mentioned above on the basis of seven indicators, each of which contributes to the overall credit rating class with its own weight. Because the methodology is adapted to the Montenegrin market, these weights differ across indicators, which distinguishes this credit rating from others.

The indicators based on which the credit rating is calculated are as follows:

The current liquidity ratio (GLR, General Liquidity Ratio) is calculated using the formula Current assets/Current liabilities. Its weight in the MSME’s credit rating is 9%. This short-term solvency indicator shows the extent to which current assets cover current liabilities. It does not allow an assessment of the solvency of MSMEs but shows the factors that can influence it. The value of the indicator decreases when current liabilities grow faster than current assets, which may increase insolvency risk in the future.
The debt ratio (DR) is calculated using the formula (Total liabilities − Capital)/Total assets. Its weight in the overall rating is 15%. The indicator shows the proportion of assets that is financed by debt. The lower the value of the indicator, the more securely the MSME is financed.
The Ratio of Long-Term Sources and Fixed Assets and Inventories (RLTSFAI) is calculated using the following formula: (Capital + Non-current liabilities)/(Fixed assets + Inventories + Non-current assets held for sale and discontinued operations). Its weight in the overall credit rating class is 9%. This coefficient corresponds to the ratio between long-term debt and long-term assets and serves to verify the potential security of creditors, taking into account inventories as a possible means of settling long-term debt.
The solvent coefficient (SC) is calculated using the following formula: (Net comprehensive income + Other operating expenses (depreciation, amortization, provisions, and other operating expenses))/Long-term provisions and long-term liabilities. Its weight in the overall credit rating class is 9%.
The mean time of completion (in days) is calculated using the formula 365 × (Trade receivables/Change in value of finished goods and work in progress + Change in value of finished goods and work in progress + Income from own work capitalized and merchandise + Other income from operating activities). Its weight in the overall credit rating class is 15%. It takes a certain amount of time for the company’s receivables to be converted into cash; the indicator expresses the average number of days it takes to collect receivables.
Corporate profit represents operating profit. Its weight in the total credit rating is 19%. Profitability is interpreted as income before the deduction of income taxes and interest.
The information on previous MSME blockades covers the 365 days up to the date of the financial report. In addition to the total duration of the blockade, it includes the duration of the uninterrupted period during which the MSME was not under blockade. Its weight in the total credit rating class is 24%. Payment behaviour shows how MSMEs have settled their liabilities in the past, i.e., in the last 365 days. In line with its importance, this indicator carries the largest weight in the credit rating class of MSMEs.
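As a check on the indicator list above, the stated weights can be verified to sum to 100%, and the three simplest ratios can be computed directly from balance sheet items. The sketch below is a minimal illustration: the balance sheet figures are hypothetical, the function names are introduced here, and the denominator of the liquidity ratio is read as current liabilities, in line with the description of the indicator. How the individual indicator values are converted into rating points is not described in the text and is therefore not modelled.

```python
# Indicator weights in percent, as listed in the text.
WEIGHTS = {
    "current_liquidity_ratio": 9,
    "debt_ratio": 15,
    "long_term_sources_ratio": 9,
    "solvent_coefficient": 9,
    "mean_time_of_completion": 15,
    "corporate_profit": 19,
    "blockade_history": 24,
}


def current_liquidity_ratio(current_assets, current_liabilities):
    """Current assets / current liabilities."""
    return current_assets / current_liabilities


def debt_ratio(total_liabilities, capital, total_assets):
    """(Total liabilities - capital) / total assets."""
    return (total_liabilities - capital) / total_assets


def long_term_sources_ratio(capital, non_current_liabilities,
                            fixed_assets, inventories, held_for_sale):
    """(Capital + non-current liabilities) / long-term assets and inventories."""
    return (capital + non_current_liabilities) / (
        fixed_assets + inventories + held_for_sale
    )


# Hypothetical balance sheet in EUR (illustrative values only).
glr = current_liquidity_ratio(300_000, 200_000)
dr = debt_ratio(1_000_000, 400_000, 1_000_000)
rltsfai = long_term_sources_ratio(400_000, 250_000, 500_000, 150_000, 0)
total_weight = sum(WEIGHTS.values())
```

For these illustrative figures, current assets cover current liabilities one and a half times, and 60% of assets are debt-financed.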

Table 2 shows the data grouped according to the degree of the estimated “risk” (dependent variable) and the size of the MSMEs.

The data, grouped according to the degree of estimated “risk” and the credit rating class (crew), are shown in Table 3.

Table 4 shows the distribution of data grouped by the estimated level of risk and the status of MSMEs (active or blocked/abank), enabling a comparison of how enterprises are represented across different risk categories.

The minimum, maximum, mean, median, and standard deviation (SD) values for all independent numerical variables, grouped according to the dependent variable “risk”, are presented in Tables 5–10.

2. Materials and Methods

The methodological framework of this research is based on the application of advanced ML techniques implemented in the R programming language (The R Project for Statistical Computing 2025), with the aim of modelling and classifying financial risks among MSMEs. The process was conducted in clearly defined stages to ensure the accuracy, reliability, and replicability of the results.

The dataset comprises ten variables: four categorical or factor variables—“risk” (dependent), “crew”, “abank”, and “size”—and six numerical independent variables: “turn”, “net”, “emp”, “ebit”, “mcoste”, and “coste”. To assess potential redundancy and interrelationships among the numerical variables, Principal Component Analysis (PCA) was performed. PCA provides a robust framework for dimensionality reduction and exploratory data analysis, enabling the identification of underlying structures, visualisation of patterns, and selection of the most informative variables for subsequent classification using decision trees.

During preprocessing, data completeness and consistency were verified for each enterprise; no missing or inconsistent values were identified, so imputation was not required. The PCA results indicated that the explained variance did not justify the removal of any variables, as each contributed uniquely to the model’s explanatory power. Consequently, all original variables were retained, preserving information that is essential for model accuracy and reliability. This approach supports methodologically sound, interpretable, and stable NN results. Given this study’s focus and scope, a more detailed discussion of PCA procedures has been omitted to maintain emphasis on the core methodological and interpretative aspects.

Phase 1—Data Preparation and Processing: The collected data underwent thorough cleaning and preprocessing, including checks for missing and inconsistent values, standardisation of formats, and harmonisation of variable types. All numerical data were then normalised to eliminate the effects of differing measurement units and scales among variables. Normalisation is a crucial step in NN training, as it contributes to optimisation stability and prevents variables with larger numerical ranges from disproportionately influencing the final model weights.

All categorical variables retain their original coding from the official registries (CRPS 2025; CBCG; credit reporting systems), while the numerical variables in the model are not kept in their original scales. All numerical columns are normalised using the Min–Max transformation defined in Equation (1), ensuring a consistent value range and improving the stability of NN training. Normalisation is applied consistently to the training, validation, and test datasets:

x' = \frac{x - \min(x)}{\max(x) - \min(x)} \qquad (1)

This method ensures that all variables share a comparable scale and have equal influence within the model. Simultaneously, categorical (textual) variables were transformed into dummy (binary) format, enabling their numerical representation within the network. This step was essential, as NNs can process only numerical inputs, and dummy encoding ensures that each category is represented as an independent dimension, thereby avoiding erroneous interpretations of ordinal relationships among categories.
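The two preprocessing steps described above, Min–Max scaling and dummy encoding, can be sketched as follows. This is an illustrative sketch in plain Python rather than the R code used in the study; the helper names and the sample values are hypothetical. Estimating the minimum and maximum on the training portion and reusing them for held-out data is shown as one common way of applying the transformation consistently across subsets, and it is an assumption about the workflow rather than a documented detail.

```python
def fit_min_max(values):
    """Return the (min, max) pair estimated from a reference sample."""
    return min(values), max(values)


def apply_min_max(values, lo, hi):
    """Scale values with x' = (x - min) / (max - min).

    A constant column (max == min) is mapped to 0.0 to avoid division by zero.
    """
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def dummy_encode(values):
    """One binary column per category, so no ordering is implied."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Hypothetical sample: annual turnover in EUR and enterprise size.
turn_train = [120_000.0, 700_000.0, 2_500_000.0]
size_train = ["micro", "small", "medium"]

lo, hi = fit_min_max(turn_train)
turn_scaled = apply_min_max(turn_train, lo, hi)
levels, size_dummies = dummy_encode(size_train)

# A held-out value is scaled with the parameters learned from the training
# sample, so it can fall outside [0, 1] if it lies outside the training range.
turn_new_scaled = apply_min_max([3_000_000.0], lo, hi)
```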

Phase 2—Descriptive Statistics and Correlation Analysis: After data pre-processing, a detailed descriptive analysis (summary statistics) was performed, presenting the fundamental statistical parameters of each attribute.

Inter-variable dependencies were then analysed using correlation techniques. Cramér’s V coefficient was applied to assess associations between categorical variables, quantifying the strength of relationships among nominal features.

The association between numerical and categorical variables was evaluated using the Eta coefficient, which captures nonlinear relationships between variables of differing types. All correlation relationships were visualised using heat maps, providing an intuitive graphical representation of the degree of association among attributes and the target outcome. Correlations among numerical variables were analysed using the Pearson correlation coefficient, which best measures linear relationships among continuous numerical values.
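For readers unfamiliar with the three association measures named above, a compact sketch of their standard textbook definitions is given below: Cramér’s V from the chi-squared statistic of a contingency table, the Eta coefficient (correlation ratio) from between-group and total sums of squares, and the Pearson coefficient. The function names and toy inputs are hypothetical, and the sketch applies no bias correction to Cramér’s V; whether the study used a corrected variant is not stated.

```python
import math
from collections import Counter, defaultdict


def cramers_v(x, y):
    """Cramér's V = sqrt(chi2 / (n * (min(rows, cols) - 1)))."""
    n = len(x)
    rows, cols = Counter(x), Counter(y)
    joint = Counter(zip(x, y))
    chi2 = 0.0
    for r, n_r in rows.items():
        for c, n_c in cols.items():
            expected = n_r * n_c / n
            chi2 += (joint.get((r, c), 0) - expected) ** 2 / expected
    k = min(len(rows), len(cols)) - 1
    return math.sqrt(chi2 / (n * k)) if k > 0 else 0.0


def eta(categories, values):
    """Correlation ratio: sqrt(between-group SS / total SS)."""
    groups = defaultdict(list)
    for c, v in zip(categories, values):
        groups[c].append(v)
    grand = sum(values) / len(values)
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values()
    )
    return math.sqrt(ss_between / ss_total) if ss_total > 0 else 0.0


def pearson(a, b):
    """Pearson correlation between two equally long numeric sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    var_a = sum((u - ma) ** 2 for u in a)
    var_b = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(var_a * var_b)


# Toy inputs (hypothetical): a status flag, a risk class, and two numeric columns.
v_perfect = cramers_v(["yes", "yes", "no", "no"], ["A", "A", "E", "E"])
v_none = cramers_v(["yes", "yes", "no", "no"], ["A", "E", "A", "E"])
eta_perfect = eta(["A", "A", "E", "E"], [1.0, 1.0, 3.0, 3.0])
r_linear = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

In the toy inputs, the first pair of categorical vectors is perfectly associated and the second is independent, which gives the two extreme values of Cramér’s V.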

The target variable “risk” was defined as a multiclass categorical variable with five levels (A, B, C, D, and E), where class A represents the lowest and class E represents the highest financial risk. Based on the analysed relationships between attributes and the target variable, the input features for NN modelling were defined.

Phase 3—Neural Network Generation and Training: For modelling and classification, a robust approach was adopted using NNs. Specifically, two architectures were considered: “NNET” and “MLP”. Instead of a single fixed train–validation–test split, repeated stratified 5-fold cross-validation (three repetitions) was applied to better capture variance due to sampling and to provide uncertainty estimates for model performance. The input dataset was preprocessed, with the dummy encoding of categorical variables and min–max normalisation of numeric features. During training, multiple network configurations were tested, varying the number of hidden neurons (5, 6, and 7) and weight decay parameters (0.01 and 0.1), to identify the optimal balance between predictive accuracy and generalisation. For each fold, predictions were obtained on the corresponding validation set, and fold-wise performance metrics were computed. This procedure ensured that the selected model was robust to data partitioning and not dependent on a single arbitrary split.
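The resampling scheme and hyperparameter grid described above can be illustrated with a short sketch. It generates repeated stratified 5-fold splits (three repetitions, 15 train/validation pairs in total) and enumerates the six combinations of hidden-layer size and weight decay. The sketch is written in plain Python for illustration and is not the study’s R implementation; the function names, the random seed, and the toy label vector are assumptions, and the model fitting step is deliberately omitted.

```python
import random
from collections import defaultdict
from itertools import product


def stratified_folds(labels, k, rng):
    """Split indices into k folds while keeping class proportions similar."""
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    folds = [[] for _ in range(k)]
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)
        # Deal the shuffled indices of each class round-robin across folds.
        for pos, idx in enumerate(idxs):
            folds[pos % k].append(idx)
    return folds


def repeated_stratified_splits(labels, k=5, repeats=3, seed=0):
    """Yield (repetition, fold, train_indices, validation_indices)."""
    rng = random.Random(seed)
    for rep in range(repeats):
        folds = stratified_folds(labels, k, rng)
        for i, val in enumerate(folds):
            train = [idx for j, f in enumerate(folds) if j != i for idx in f]
            yield rep, i, train, val


# Grid from the text: hidden neurons (5, 6, 7) x weight decay (0.01, 0.1).
GRID = list(product([5, 6, 7], [0.01, 0.1]))

# Toy label vector (hypothetical), standing in for the five risk classes.
labels = ["A"] * 10 + ["B"] * 10 + ["C"] * 5
splits = list(repeated_stratified_splits(labels))
```

Each configuration in `GRID` would be fitted on every training portion and scored on the matching validation portion, so that a configuration is judged on 15 fold-wise results rather than on a single split.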

Phase 4—Model Evaluation and Result Analysis: In the evaluation phase, performance metrics were computed for each fold, including accuracy and the macro-F1 score. Fold-wise metrics were then aggregated to provide mean values with 95% confidence intervals, quantifying the uncertainty of the estimates. Confusion matrices were examined to understand misclassification patterns across risk classes. In addition to the NNs, two tabular baselines, regularised Multinomial Logistic Regression (MLR) and eXtreme Gradient Boosting (XGBoost), were trained on the full dataset. For XGBoost, probability calibration and reliability analysis were performed, including Brier scores and reliability diagrams. Ablation analysis was conducted by removing the “crew” variable to check whether model predictions depended on potential information leakage through this variable. Finally, SHapley Additive exPlanations (SHAP)-based interpretability analysis was carried out for XGBoost, providing both global feature importance summaries and local explanations for representative instances. This comprehensive evaluation supported a robust, interpretable, and reproducible assessment of model performance.
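The fold-wise metrics and their aggregation can be sketched as follows. Accuracy and macro-F1 follow their standard definitions; the 95% interval uses a normal approximation (mean ± 1.96 · SD/√n), which is an assumption, since the interval formula used in the study is not stated. The function names, labels, and fold scores below are hypothetical.

```python
import math
import statistics


def accuracy(y_true, y_pred):
    """Share of correctly classified observations."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred, classes):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def mean_ci95(values):
    """Mean with a normal-approximation 95% interval (ASSUMED formula)."""
    m = statistics.mean(values)
    half = 1.96 * statistics.stdev(values) / math.sqrt(len(values))
    return m, m - half, m + half


# Hypothetical predictions for one validation fold.
y_true = ["A", "A", "B", "B", "C"]
y_pred = ["A", "B", "B", "B", "C"]
fold_acc = accuracy(y_true, y_pred)
fold_f1 = macro_f1(y_true, y_pred, ["A", "B", "C"])

# Hypothetical fold-wise scores aggregated into a mean and interval.
mean_score, ci_low, ci_high = mean_ci95([0.8, 0.9, 1.0])
```

Because macro-F1 averages over classes without weighting by class size, errors on rare risk classes affect it as strongly as errors on frequent ones, which is why it is reported alongside accuracy.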

This approach enabled a comprehensive evaluation of predictive model quality. The importance of input variables for each model was visualised, identifying which features contributed most significantly to determining risk levels. NNs’ weight matrices and graphical representations were generated to illustrate the architecture and internal connections among model layers. Based on all metrics, confusion matrices, and visual analyses, the “NNET” and “MLP” models were compared to determine which achieved superior generalisation and stability in MSMEs’ financial risk classification. Furthermore, the execution time of both algorithms was analysed, as it is an important indicator of model efficiency and practical applicability. The comparative analysis of computational time enabled assessments of each approach’s computational complexity, contributing to a comprehensive evaluation of their operational value under real-world conditions.

During model training, neither the Synthetic Minority Over-sampling Technique (SMOTE) nor any form of oversampling or undersampling was applied, as the synthetic generation of minority class examples can distort the natural data structure and introduce unrealistic patterns, particularly in small and heterogeneous datasets. Instead, class weighting in the loss function, inversely proportional to class frequencies, was used to balance the contribution of each class without altering the true data distribution. This approach preserves the integrity of the original dataset, enhances training stability, and ensures that all evaluation metrics (BA, MCC, QWK, and Macro-PR-AUC) reliably reflect model performance on the natural class distribution.
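Class weighting that is inversely proportional to class frequency can be sketched as below. The normalisation n / (K · n_c) is one common convention and is an assumption here, as are the function names; the text does not state which scaling was used.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n / (K * n_c), so weights average to 1 over samples."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * cnt) for cls, cnt in counts.items()}

def weighted_log_loss(y_true, prob_rows, weights):
    """Cross-entropy in which each sample counts according to its class weight."""
    total = sum(weights[y] * -math.log(p[y]) for y, p in zip(y_true, prob_rows))
    return total / sum(weights[y] for y in y_true)

# Hypothetical two-class example: the rare class receives the larger weight.
weights = inverse_frequency_weights(["A"] * 10 + ["B"] * 40)
```

The data themselves are left untouched; only the contribution of each observation to the loss changes.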

To rigorously evaluate the performance of the “NNET” and “MLP” models, a repeated stratified k-fold cross-validation scheme (5 folds × 3 repetitions) was implemented. This robust framework quantifies performance variability arising from random sampling and yields reliable average metrics with confidence intervals. Data were partitioned into training, validation, and test sets while fully preserving the underlying class structure and distribution. The data-splitting logic and cross-validation workflow are summarised in a schematic diagram (Figure 1).

Performance evaluation was conducted using metrics sensitive to class imbalance and ordinal relationships—precision, balanced accuracy (BA), Matthews correlation coefficient (MCC), quadratic weighted kappa (QWK), and Macro-PR-AUC—with all metrics computed from out-of-fold predictions generated during the repeated stratified 5 × 3 cross-validation.
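Three of these metrics can be computed directly from a confusion matrix, as the following sketch shows (rows are actual classes, columns are predicted classes, and classes are assumed to be in ordinal order). The function names are illustrative. Macro-PR-AUC is omitted because it requires predicted class probabilities rather than a confusion matrix.

```python
import math

def _totals(cm):
    k = len(cm)
    actual = [sum(cm[i]) for i in range(k)]
    predicted = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    return k, sum(actual), actual, predicted

def balanced_accuracy(cm):
    """Mean per-class recall over classes that actually occur."""
    recalls = [cm[i][i] / sum(cm[i]) for i in range(len(cm)) if sum(cm[i]) > 0]
    return sum(recalls) / len(recalls)

def multiclass_mcc(cm):
    """Multiclass generalisation of the Matthews correlation coefficient."""
    k, s, t, p = _totals(cm)
    c = sum(cm[i][i] for i in range(k))
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(x * x for x in p)) * (s * s - sum(x * x for x in t)))
    return num / den if den else 0.0

def quadratic_weighted_kappa(cm):
    """Cohen's kappa with penalties growing as the squared class distance."""
    k, s, t, p = _totals(cm)
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            observed += w * cm[i][j]
            expected += w * t[i] * p[j] / s
    return 1.0 - observed / expected
```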

For benchmarking, baseline models, including MLR and XGBoost, were implemented. An ablation study was also conducted by excluding the “crew” variable to assess its specific contribution to predictive performance. Model interpretability was enhanced using SHAP, providing both local and global feature contribution visualisations.

This methodology ensures transparency, reproducibility, and a rigorous assessment of model robustness and interpretability across multiple evaluation dimensions.

The methodological approach is shown in Table 11.

Metrics

Correlation analysis was conducted using a threefold approach, tailored to the nature of the data and the types of relationships between variables. To assess associations among attributes, three complementary measures were employed: Cramér’s V (Akoglu 2018; Kuhn and Johnson 2013) for relationships between categorical variables, the Pearson correlation coefficient for relationships between numerical variables, and the Eta coefficient for evaluating links between numerical and categorical attributes. This combination provides comprehensive insight into the structure of interdependencies in the data, which is essential for preparing input variables and interpreting the results of classification models. To quantify the strength of association between nominal (categorical) variables, Cramér’s V coefficient (V) (Brownlee 2020) (Equation (2)) was used. This is a standardised measure based on the χ² (chi-square) test of independence. Unlike the χ² test itself, which only indicates the presence of a statistically significant association between variables, Cramér’s V allows additional interpretation regarding the strength of the association, independently of the sample’s size (Kearney 2017):

V = \sqrt{\frac{\chi^{2}}{n \cdot (k - 1)}}

(2)

where

n denotes the total number of observations;
k denotes the smaller of the number of rows and the number of columns in the contingency table.

Cramér’s V values range from 0 to 1, where 0 indicates no association and 1 indicates a perfect association between categories. Because the measure is bounded and standardised, it is suitable for comparing the strength of association across multiple pairs of categorical variables. In this study, it was used to examine relationships between discrete risk indicators, types of enterprises, business sectors, and other nominal attributes.
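As a worked illustration of Equation (2), the sketch below computes the χ² statistic from a contingency table of counts and converts it to Cramér’s V. The function name is illustrative, and the code assumes that no row or column total is zero.

```python
import math

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(map(sum, table))
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    chi2 = 0.0
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            expected = r * c / n          # count expected under independence
            chi2 += (table[i][j] - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals))
    return math.sqrt(chi2 / (n * (k - 1)))
```

A table with all counts on the diagonal gives V = 1, while a table whose rows are proportional to each other gives V = 0.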

For relationships between two continuous (numerical) variables, the Pearson correlation coefficient (r) (Equation (3)) was applied. This measure quantifies the strength and direction of the linear association between two variables measured at interval levels. Pearson’s r values range from −1 to +1, where negative values indicate an inversely proportional relationship, positive values indicate a directly proportional relationship, and values near zero indicate the absence of linear correlation.

The Pearson coefficient is particularly suitable for analysing relationships among financial indicators that have been normalised, as linear links among them provide insight into the interdependence of key performance indicators (e.g., relationships between profit and costs, turnover, and liquidity). The reliable interpretation of the Pearson correlation rests on the following assumptions: approximately normal distributions, linearity, homogeneity of variance, and absence of extreme values. The coefficient is computed as follows:

r = \frac{\sum (x_{i} - \bar{x}) (y_{i} - \bar{y})}{\sqrt{\sum {(x_{i} - \bar{x})}^{2} \sum {(y_{i} - \bar{y})}^{2}}}

(3)

where

r denotes the Pearson correlation coefficient;
x_i and y_i denote individual values of the variables;
$\bar{x}$ and $\bar{y}$ denote the arithmetic means of variables x and y;
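The numerator is the sum of cross-products of deviations from the means, which is proportional to the covariance between the variables;
The denominator is the square root of the product of the summed squared deviations, which is proportional to the product of their standard deviations.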

Alternatively, for precomputed statistical values, Pearson’s r can be expressed using Equation (4):

r = \frac{\operatorname{cov}(X, Y)}{\sigma_{X} \cdot \sigma_{Y}}

(4)

where cov(X, Y) denotes the covariance between variables X and Y, and σ_X and σ_Y denote the standard deviations (SD) of X and Y.
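The equivalence of Equations (3) and (4) can be checked numerically with a small sketch. The function names and sample values are illustrative; the second function uses the sample covariance and sample standard deviations, whose common factor of n − 1 cancels.

```python
import math
import statistics

def pearson_from_deviations(x, y):
    """Equation (3): sums of deviation products."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))
    return num / den

def pearson_from_moments(x, y):
    """Equation (4): covariance divided by the product of standard deviations."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)
    return cov / (statistics.stdev(x) * statistics.stdev(y))

# Hypothetical values for two indicators.
x = [1.0, 2.0, 4.0, 7.0, 9.0]
y = [2.0, 1.0, 5.0, 6.0, 10.0]
r_a, r_b = pearson_from_deviations(x, y), pearson_from_moments(x, y)
```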

The Eta coefficient (η) (Equation (5)) (Quantitative Methods in Public Administration 2025) is a statistical measure that quantifies the strength of a nonlinear relationship between a numerical and a categorical variable. Unlike Pearson’s coefficient, which measures only linear relationships between two continuous variables, Eta assesses the extent to which a categorical factor explains the variance of a numerical variable. Mathematically, Eta is calculated using ANOVA-based variance decomposition:

\eta = \sqrt{\frac{SS_{\text{between}}}{SS_{\text{total}}}}

(5)

where

$SS_{\text{between}}$ denotes the sum of squares between groups (variance explained by the factor);
$SS_{\text{total}}$ denotes the total sum of squares (total variance of the numerical variable).

Eta values range from 0 to 1: Values near 0 indicate a very weak influence of the categorical variable on the numerical variable; that is, minimal differences between groups. Values near 1 indicate a strong association, meaning that the categorical factor explains most of the variance in the numerical variable.
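A minimal sketch of Equation (5) follows. It groups the numerical values by category, forms the between-group and total sums of squares, and takes the square root of their ratio. The function name is illustrative, and the code assumes that the numerical variable is not constant.

```python
import math
from collections import defaultdict

def eta_coefficient(categories, values):
    """Eta: share of total variation explained by group membership (square root)."""
    grand_mean = sum(values) / len(values)
    groups = defaultdict(list)
    for cat, val in zip(categories, values):
        groups[cat].append(val)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups.values()
    )
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    return math.sqrt(ss_between / ss_total)
```

When every group has the same mean the result is 0; when values are identical within each group and differ only between groups the result is 1.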

In the context of ML and classification models, the Eta coefficient helps identify categorical variables with significant influence on numerical input features, assess the potential contribution of variables in predictive models, and detect variables with low predictive value that can be removed to reduce dimensionality and improve model efficiency.

The use of Eta is particularly valuable in datasets where numerical variables depend on discrete categories: for example, financial indicators divided by enterprise type, business sector, or risk level.

The correlation analysis results were visualised in R (version 4.5.1) using the “ggcorrplot” and “corrplot” packages, which enable the display of correlation matrices for numerical variables, categorical variables, and combinations of numerical and categorical variables.

This approach allows quick identification of potential multicollinearity and supports the selection of relevant predictors for inclusion in machine learning models (Oberg 2019).

This analysis served a dual purpose: (1) to provide a deeper understanding of the relationships among variables in the context of MSME risk classification and (2) to detect variables that potentially contribute to model redundancy, that is, variables that are excessively interrelated, which may increase model variance and reduce its ability to generalise to new data (Brownlee 2020; Kearney 2017).

A confusion matrix (Table 12) is a fundamental analytical tool for evaluating the performance of classification models in ML and statistics. It is a square table that shows how many instances from the actual classes were correctly or incorrectly classified into the predicted classes. This matrix enables multidimensional analysis of the model, thereby avoiding reliance solely on a single indicator such as overall accuracy. In this study, the target variable was categorised into five risk groups reflecting different risk levels. For this multiclass problem, the confusion matrix was extended to a 5 × 5 dimension, with rows representing the actual classes and columns representing the predicted classes. Each cell of the matrix indicates how many examples truly belonged to a specific class and how many the model predicted into each of the five classes. This detailed matrix enables the identification of the classes for which the model performs best (high values along the diagonal) and detection of confusion between certain pairs of classes (high values off the diagonal), and it serves as a basis for calculating numerous model evaluation metrics such as sensitivity, precision, and others.

This detailed analysis is crucial for understanding model errors and identifying areas for potential improvement, which is particularly important in the context of MSME risk classification.
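The construction of a 5 × 5 confusion matrix and the derivation of per-class sensitivity and precision from it can be sketched as follows. The class labels A to E follow the paper; the function names and the toy predictions are assumptions made for illustration.

```python
CLASSES = ["A", "B", "C", "D", "E"]

def confusion_matrix(actual, predicted, classes=CLASSES):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    cm = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        cm[index[a]][index[p]] += 1
    return cm

def per_class_metrics(cm, classes=CLASSES):
    """Sensitivity (row-wise) and precision (column-wise) for each class."""
    out = {}
    for i, c in enumerate(classes):
        tp = cm[i][i]
        actual_total = sum(cm[i])
        predicted_total = sum(row[i] for row in cm)
        out[c] = {
            "sensitivity": tp / actual_total if actual_total else float("nan"),
            "precision": tp / predicted_total if predicted_total else float("nan"),
        }
    return out

# Hypothetical labels: one A and the only C are misclassified as B.
actual = ["A", "A", "B", "B", "B", "C", "D", "E"]
predicted = ["A", "B", "B", "B", "B", "B", "D", "E"]
cm = confusion_matrix(actual, predicted)
metrics = per_class_metrics(cm)
```

Off-diagonal counts in the B column show directly which classes are absorbed by the dominant class.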

To prevent potential information leakage, an ablation analysis was conducted to compare model performance with and without the “crew” (credit rating) variable. The results confirmed that this predictor improves predictive accuracy without incorporating information from the target variable.

All data preprocessing steps—including normalisation of numeric features, categorical encoding, and missing value imputation—were fitted exclusively on the training folds. The resulting parameters were applied to the validation and test sets, ensuring methodological transparency and reproducibility of the machine learning experiments.
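The principle of fitting preprocessing on the training fold only can be illustrated with min–max normalisation. In this sketch, which uses hypothetical function names and toy values, the minimum and maximum are learned from the training rows and then reused unchanged for the validation rows.

```python
def fit_min_max(train_rows):
    """Learn per-column (min, max) from the training fold only."""
    return [(min(col), max(col)) for col in zip(*train_rows)]

def apply_min_max(rows, params):
    """Scale rows with previously fitted parameters; constant columns map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, params)]
        for row in rows
    ]

# Hypothetical fold: two training rows and one validation row.
train = [[0.0, 10.0], [10.0, 20.0]]
valid = [[5.0, 30.0]]
params = fit_min_max(train)
train_scaled = apply_min_max(train, params)
valid_scaled = apply_min_max(valid, params)
```

Validation values may fall outside the interval [0, 1], which is expected: rescaling them with their own minimum and maximum would leak information about the held-out data into the pipeline.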

Table 13 presents all the metrics used, along with their descriptions and interpretative value in the context of multiclass MSME risk classification.

For a comprehensive assessment of performance, model results were compared to evaluate the strengths and limitations of each approach in classifying individual classes. Such a comparative presentation enables the transparent selection of the optimal model and provides an analytical foundation for operational recommendations.

All results were calculated and saved in Excel format and then visualised using R packages and functions, ensuring transparency, reproducibility, and efficiency in the analysis. This approach facilitates the interpretation of results for both researchers and end users in practice.

3. Results

In line with the previously described methodological approach, an initial analysis of the relationships among variables was carried out using appropriate statistical measures: Cramér’s V for categorical variables, the Pearson correlation coefficient for numerical variables, and the Eta coefficient for relationships between categorical and numerical variables.

This preliminary analysis provides insight into the interrelationships and significance of key variables, forming the basis for the construction and evaluation of NNs in subsequent enterprise risk prediction.

Figure 2 presents the strength of association between the target variable “risk” and selected categorical operational variables (“crew”, “abank”, and “size”), expressed using Cramér’s V. The strongest association is observed between “risk” and “crew” (0.84), indicating that the number of employees has the greatest impact on risk. The relationship between “risk” and “abank” is moderate (0.49), while the association with “size” is considerably weaker (0.15), indicating a smaller effect of MSMEs’ size on risk.

Figure 3 presents the interrelationships among numerical variables, expressed using the Pearson correlation coefficient. The strongest positive correlation is between “turn” and “emp” (0.50), while moderate positive correlations are observed between “turn” and “ebit” (0.34) and between “turn” and “mcost” (0.43). The relationships between “net” and other variables are relatively weak, except for a moderate correlation with “ebit” (0.42). Other variables display low-to-moderate intercorrelations, indicating that most of these numerical indicators are partially independent of one another.

Figure 4 presents the strength of association between the categorical variables (“risk”, “crew”, “abank”, and “size”) and the numerical variables (“turn”, “net”, “emp”, “ebit”, “coste”, and “mcost”) as measured by the Eta coefficient. The strongest association is between “size” and “turn” (0.70), indicating that firm size has the greatest influence on turnover. There is also a notable relationship between “crew” and “ebit” (0.37), suggesting that the number of employees significantly affects earnings before interest and taxes. Other associations are of moderate or low intensity, reflecting a smaller impact of “abank”, “risk”, and other numerical indicators on the related metrics.

For the construction and evaluation of NNs, specifically the “NNET” and “MLP” models in R, a repeated stratified k-fold cross-validation (5-fold × 3 repetitions) procedure was employed (R Documentation 2025). This approach ensures a robust estimation of model performance by quantifying variability across folds and mitigating the risk of bias due to a single random split. In each of the three repetitions, the dataset was partitioned into five stratified folds that preserve the class distribution, yielding 15 train–validation splits in total. The models were trained and evaluated iteratively, with each fold serving once per repetition as the validation set while the remaining four folds constituted the training set. The final performance metrics were computed as averages over the 15 splits, with standard deviations to reflect uncertainty.

For the “NNET” model, various hidden layer sizes (sizes = 5, 6, 7, 8, 9, 10) and regularisation rates (decays = 0.01, 0.1) were tested. The optimal configuration, selected based on cross-fold performance, was six neurons in the hidden layer with a decay of 0.1. For the “MLP” model from the RSNNS package, hidden layer sizes (MLP_sizes = 5, 6, 7, 8, 9, 10) were evaluated, and the optimal configuration consisted of five neurons in the hidden layer, yielding the best generalisation across folds.

This methodology ensures that both models are adapted to the data structure while maintaining robustness and minimizing overfitting. The cross-validation procedure provides a comprehensive assessment of model stability across different subsets of the data.

Table 14 presents the fold-averaged per-class metrics for the “NNET” and “MLP” models, including sensitivity, specificity, precision, F1 score, BA, QWK, Macro-PR-AUC, and MCC. These metrics highlight the models’ predictive performance and class-wise discrimination under repeated cross-validation.

Table 15 presents the cross-fold averaged performance metrics for the four models — “NNET”, “MLP”, MLR, and XGBoost — obtained through 5-fold cross-validation with 3 repetitions, allowing a comparative assessment of their predictive performance across multiple evaluation metrics.

The cross-validated results show that both “NNET” and “MLP” models achieve similar performance with and without the variable “crew”, indicating minimal contribution of this feature to overall model accuracy. Among all models, XGBoost slightly outperforms NNs on all four metrics (BA, MCC, QWK, and Macro-PR-AUC), while MLR shows lower performance, particularly on the ordinal-sensitive QWK. Variability across folds (SD) is small, confirming stable model estimates.

Appendix A contains a comprehensive set of supplementary tables and figures that provide a detailed evaluation of the models’ performance and interpretability. Table A1 presents the complete set of per-class performance metrics for the “NNET” model estimated without the variable “crew”, showing how the classifier performs across classes A–E in terms of accuracy, error, and diagnostic measures. Table A2 provides the corresponding per-class performance indicators for the “MLP” model under the same modelling conditions, allowing direct comparisons of the class-wise behaviour of both NN architectures. Table A3 summarises the cross-validated global metrics for the “NNET” and “MLP” models, offering an aggregated view of their predictive capability, calibration characteristics, and diagnostic strength across all classes.

In addition to the performance tables, several interpretability-oriented visualisations are included. Figure A1 shows the global SHAP feature importance for both the “NNET” and “MLP” models, highlighting the relative contribution of each predictor to the overall decision-making process. Figure A2 presents a local SHAP waterfall plot for selected instances, demonstrating how individual feature contributions cumulatively influence the model’s output at the observation level. Figure A3 illustrates the reliability diagram for the XGBoost model, comparing predicted and observed class probabilities to assess model calibration and the alignment between estimated likelihoods and empirical frequencies.

The “MLP” model from the RSNNS R package has five neurons in the hidden layer and does not use bias, meaning that neuron activations are computed solely from the weights of the input connections.

Figure 5 illustrates the “NNET” model architecture, consisting of 6 hidden neurons and 2 bias nodes, outlining the structural design of the network.

Figure 6 shows the graphical representation of the “MLP” network, illustrating the input layer, hidden layer, and output connections.

Figure 7 presents a comparative visualisation of the ten most important input variables for the optimally selected “NNET” and “MLP” models, trained on the defined training set.

All numerical and dummy variables are included, with categorical variables such as “crew,” “abank,” and “size” transformed into corresponding dummy variables. This results in more displayed variables than original attributes, as each categorical variable contributes multiple indicator variables, enabling the network to quantify their individual impact on predicting the target variable and risk. Analyses of the “NNET” model show that the key predictors of risk are specific categories of the number of employees, particularly “crew.A1” and “crew.A2,” while negative values for variables such as “emp” and “coste” indicate an inhibitory effect—an increase in these indicators reduces the predicted risk. The variables “turn” and “net” have moderate contributions, indicating that operational and financial indicators play a role but are not the primary determinants of risk in this model.

In the “MLP” model, the contribution structure is similar, but the intensity of effects differs—“crew.A1” and “crew.A2” dominate, while “ebit” and “emp” show pronounced negative or positive effects, reflecting the different functional interpretation of the hidden layer neurons in the “MLP” network.

Other dummy variables, though contributing less, reveal additional nuances in risk associated with specific employee categories, banking status, and firm size.

The comparative visualisation allows precise comparisons of variable importance between the “NNET” and “MLP” models and confirms the consistency in identifying key risk determinants. The figure highlights that operational structures and human resources, represented by the “crew” variables, are dominant factors in risk prediction, while financial metrics contribute to a lesser extent but remain relevant for finer risk differentiation. This analysis provides a solid basis for managerial and strategic decisions focused on controlling and optimising key risk factors.

Both confusion matrices (Table 16) on the test dataset indicate very similar model performance. Class A maintains stable accuracy but shows noticeable confusion with class B. Class B is dominant and best recognised by both models, with “MLP” achieving slightly higher accuracy and fewer misclassifications. Class C experiences minor misclassification with B, indicating partial overlap in characteristics. Classes D and E are generally well classified, with errors mostly occurring in adjacent classes (C-D-E zone). Overall, “MLP” demonstrates slightly better performance than “NNET”, but without a significant difference in macro-level performance metrics.

Figure 8 presents a comparative overview of the global performance of the two models, “NNET” and “MLP”, based on the accuracy, F1, and Kappa metrics. Both models achieve similar results. Accuracy is very close for both models (“NNET”: 0.841; “MLP”: 0.847), indicating comparable classification precision. The F1 scores are also nearly identical (“NNET”: 0.812; “MLP”: 0.811), showing that both models maintain a similar balance between precision and sensitivity. Kappa values are likewise very close, confirming good agreement between predicted and actual classes. The slight advantage of the “MLP” model in terms of accuracy may suggest a minor improvement in overall precision, but the difference is minimal and likely not statistically significant. This figure shows that the performance of both models is highly comparable, so model selection may depend on other factors, such as interpretability or training time.

Table 17 and Table 18 present the class-specific performance metrics for both models on the test dataset across all classes, while Table 19 provides a consolidated comparison of the overall metrics for “NNET” and “MLP”.

Based on Table 19, the global metrics show that both “NNET” and “MLP” models achieve high classification accuracy and consistency. MCC and BA values of approximately 0.79 indicate a strong correlation between actual and predicted classes, with effective handling of imbalanced classes. The QWK confirms that the models maintain the correct order of ordinal classes, which is particularly important for ranked risk categories. However, the Macro-PR-AUC (around 0.67) suggests that precision in distinguishing rare or closely related classes could be improved. The “MLP” model demonstrates a slight advantage in prediction consistency, making it marginally more reliable for practical use in predictive analytics systems. These results, as shown in Table 17, indicate that both models meet the fundamental robustness criteria, while also highlighting potential for optimisation in underrepresented classes.

Global metrics (Table 19) were calculated so that accuracy and error rate were obtained directly from the overall confusion matrix, while all other metrics (sensitivity, precision, F1, etc.) are presented as averages across classes. This approach allows straightforward assessment of overall model performance while retaining insight into class-specific behaviour.

Figure 9 displays all metrics for the global models, enabling a straightforward visual comparison of their performance.

In addition to the basic model performance metrics and to better understand the efficiency of the NNs used, this section analyses the temporal and spatial complexity of the “NNET” and “MLP” algorithms implemented in R. The presented data enable comparison of the efficiency of both approaches in relation to network size, number of layers, and training set, which is critical for selecting an appropriate model depending on available resources and problem requirements.

Table 20 presents the temporal parameters of the training process for the “NNET” and “MLP” NN models. The “Total_Duration_sec” column indicates the total training time in seconds, while “Avg_Time_per_Config_sec” represents the average training time per hyperparameter configuration (size × decay). The results show that the “MLP” model was more time-efficient than the “NNET” model, with shorter total and average training times.

Table 21 provides a comparative analysis of the computational (time) and spatial (memory) complexity of the “NNET” and “MLP” algorithms implemented in R. The “NNET” algorithm represents an NN with a single hidden layer that is trained with the Broyden–Fletcher–Goldfarb–Shanno (BFGS) quasi-Newton optimisation method (Al-Baali et al. 2014). This approach converges quickly for small models but requires significant memory and computational power because it maintains and updates a dense approximation of the Hessian matrix (or its inverse), resulting in quadratic complexity with respect to the number of weights, O(W²). Therefore, “NNET” is suitable for smaller networks and limited datasets. In contrast, the “MLP” algorithm allows the construction of multilayer perceptrons that use stochastic gradient descent (SGD) for learning (Tian et al. 2023). This method has linear time complexity relative to the number of connections between layers and does not require additional matrices for optimisation, which significantly reduces memory usage to O(W). Consequently, “MLP” is more suitable for larger and more complex networks with multiple layers and samples.
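The difference between quadratic and linear memory growth can be made concrete by counting weights. The sketch below assumes a fully connected network with one hidden layer and no skip connections; the input dimension of 20 is hypothetical, since the number of dummy-encoded inputs is not restated here, and the function names are illustrative.

```python
def n_weights(n_inputs, n_hidden, n_outputs, bias=True):
    """Number of trainable weights W in a one-hidden-layer network."""
    b = 1 if bias else 0
    return (n_inputs + b) * n_hidden + (n_hidden + b) * n_outputs

def storage_entries(w):
    """Values kept by the optimiser: dense W x W matrix versus one gradient vector."""
    return {"quasi_newton_dense_matrix": w * w, "gradient_only": w}

# Hypothetical sizes: 20 inputs, 6 hidden neurons, 5 output classes.
w_small = n_weights(20, 6, 5)
cost_small = storage_entries(w_small)
# The same layout with 60 hidden neurons, to show how the gap widens.
w_large = n_weights(20, 60, 5)
cost_large = storage_entries(w_large)
```

Multiplying the hidden layer by ten increases the gradient vector roughly tenfold but the dense matrix roughly a hundredfold, which is why quasi-Newton training is usually reserved for small networks.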

4. Discussion

This study confirms that NNs can effectively model credit risk in the MSME sector, even when predictors are heterogeneous and partially interdependent. Both the preliminary correlation analysis and subsequent SHAP explanations consistently indicated that employee structure is the dominant risk factor. Although removing the “crew” variable did not alter the global performance metrics, SHAP results showed that employee-related attributes strongly influence the internal computations of the models through nonlinear interactions with other variables. This explains why aggregate metrics remain unchanged while the variable continues to play an important latent role in the modelling process.

The performance of the “NNET” and “MLP” architectures was highly stable across the 15 resamples of the repeated stratified 5-fold cross-validation, a crucial feature for imbalanced and ordinal classification tasks. Very small standard deviations across folds demonstrate the robustness of both models to sampling variability, strengthening their applicability in practical settings where the underlying data distribution may shift over time. Although their predictive metrics were nearly identical, differences in optimisation procedures and architectural design have practical consequences: “MLP” offers greater computational efficiency and scalability, whereas “NNET”, despite being slower and more memory-demanding due to BFGS optimisation, maintains reliable convergence in smaller configurations. This highlights the importance of considering computational constraints and problem size when selecting between comparable model families.

The expanded methodological framework significantly improved the robustness and credibility of the analysis. Repeated stratified k-fold cross-validation replaced the initial single split, providing stable estimates of model performance and enabling quantification of uncertainty through fold-averaged metrics and confidence intervals. Evaluation metrics were broadened to include those suitable for imbalanced ordinal outcomes—BA, MCC, QWK, and Macro-PR-AUC—while comprehensive confusion matrices clarified class-level performance. All preprocessing procedures were fitted strictly on training folds to prevent information leakage.

To contextualise NN performance, two benchmark models—regularised MLR and XGBoost—were incorporated. Their evaluation included probability calibration, multiclass Brier scores, and reliability diagrams, offering a fuller assessment of predictive reliability. An ablation analysis confirmed that the “crew” type variable did not introduce target leakage. The interpretability component was strengthened through SHAP-based global importance assessments and representative local explanations, demonstrating consistent feature contributions across resamples and enhancing the transparency of the modelling approach.

Overall, the revised methodology establishes a rigorous, well-validated, and interpretable pipeline for MSME credit risk prediction, aligning with modern standards in applied risk analytics and providing a robust framework for future decision-support systems.

5. Conclusions

The application of NNs, including the “NNET” and “MLP” models, in MSME risk assessment demonstrates high predictive accuracy and consistency in risk classification. The analysis indicates that human resources and the operational structure of enterprises dominate in risk prediction, while financial indicators contribute to more refined differentiation. However, the low interpretability of the models represents a significant limitation, making it difficult for management to understand the specific effects of individual variables on risk, which may restrict their practical application in strategic decision-making. Network performance strongly depends on the quality and quantity of data. Small samples, class imbalance, or data deficiencies can lead to overfitting and reduce the model’s ability to generalize to other enterprises or sectors. The network structure, choice of hyperparameters, and use of bias further influence the model’s stability and efficiency, requiring careful adjustment for each specific application.

Additionally, major limitations stem from the sensitivity of NNs to initial conditions and local minima of the error function, which can lead to result instability. The lack of transparency in the learning process prevents precise tracking of how the network makes decisions. NN models also often exhibit high computational complexity and require substantial training resources, which may limit their use in small organizations or when working with a large number of input variables. Scalability and transferability issues between different sectors present further challenges, as the structure and behaviour of data vary depending on the economic environment and regulatory context.

Future research should aim to combine the high predictive power of ANNs with the interpretability of other models through integration with regression or fuzzy systems, the use of regularization and data augmentation techniques, and the inclusion of dynamic and external economic indicators. Such an approach would enable a more robust and practically applicable MSME risk assessment, balancing predictive accuracy with result transparency, which is crucial for strategic risk management and informed decision-making.

Author Contributions

Conceptualization, D.B., M.A., I.R. and O.I.; Methodology, D.B.; Software, I.R.; Validation, D.B., M.A., I.R. and O.I.; Formal analysis, D.B. and I.R.; Investigation, N.P.; Resources, M.A. and I.R.; Data curation, M.A. and O.I.; Writing—original draft, N.P. and O.I.; Writing—review & editing, D.B., N.P. and O.I.; Visualization, N.P.; Supervision, M.A. and O.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ACC	Accuracy;
AI	Artificial Intelligence;
ANNs	Artificial Neural Networks;
AP	Actual Positive;
BA	Balanced Accuracy;
BFGS	Broyden–Fletcher–Goldfarb–Shanno;
BP	Backpropagation;
CI	Confidence Interval;
CR CBM	Central Register of the Central Bank of Montenegro;
DL	Deep Learning;
DR	Debt Ratio;
EBIT	Earnings Before Interest and Taxes;
F1	F1 Score;
FDR	False Discovery Rate;
FN	False Negative;
FNN	Fuzzy Neural Network;
FNR	False Negative Rate;
FOR	False Omission Rate;
FP	False Positive;
FPR	False Positive Rate;
FSA	Formal Safety Assessment;
GLR	General Liquidity Ratio;
LR−	Negative Likelihood Ratio;
LR+	Positive Likelihood Ratio;
Macro-PR-AUC	Macro-Area Under the Precision–Recall Curve;
MCC	Matthews Correlation Coefficient;
ML	Machine Learning;
MLP	Multi-Layer Perceptron;
MLR	Regularised Multinomial Logistic Regression;
MSEs	Micro and Small Enterprises;
MSMEs	Micro, Small, and Medium Enterprises;
NN	Neural Network;
NPV	Negative Predictive Value;
PCA	Principal Component Analysis;
PN	Predicted Negative;
PP	Predicted Positive;
PPV	Positive Predictive Value;
QWK	Quadratic Weighted Kappa;
RLTSFAI	Ratio of Long-Term Sources to Fixed Assets and Inventories;
RSNNS	R Stuttgart Neural Network Simulator;
SC	Solvency Coefficient;
SD	Standard Deviation;
SEM	Structural Equation Modelling;
SGD	Stochastic Gradient Descent;
SHAP	SHapley Additive exPlanations;
SME(s)	Small and Medium-Sized Enterprise(s);
SMOTE	Synthetic Minority Over-sampling Technique;
SOMs	Self-Organising Maps;
TIN	Tax Identification Number;
TN	True Negative;
TNR	True Negative Rate;
TP	True Positive;
TPR	True Positive Rate;
XGBoost	eXtreme Gradient Boosting.

Appendix A

Table A1. Per-class performance metrics for the “NNET” model estimated without the variable “crew”.

Class	A	B	C	D	E
Accuracy	0.835	0.835	0.835	0.835	0.835
ErrorRate	0.165	0.165	0.165	0.165	0.165
Sensitivity	0.580	0.920	0.780	0.870	0.760
Specificity	0.992	0.900	0.945	0.945	0.985
Precision	0.840	0.790	0.780	0.750	0.950
NPV	0.975	0.955	0.945	0.975	0.930
F1	0.684	0.851	0.780	0.807	0.843
FDR	0.160	0.210	0.220	0.250	0.050
FOR	0.025	0.045	0.055	0.025	0.070
FPR	0.008	0.100	0.055	0.055	0.015
FNR	0.420	0.080	0.220	0.130	0.240
LR_Positive	95.0	9.2	14.1	15.7	50.7
LR_Negative	0.425	0.100	0.231	0.130	0.245
DOR	224.0	84.8	61.0	120.5	206.7
BA	0.786	0.910	0.862	0.908	0.873
MCC	0.690	0.805	0.740	0.770	0.820
QWK	0.025	0.210	0.120	0.105	0.140
Macro_PR_AUC	0.500	0.770	0.630	0.650	0.740

Table A2. Per-class performance metrics for the “MLP” model estimated without the variable “crew”.

Class	A	B	C	D	E
Accuracy	0.841	0.841	0.841	0.841	0.841
ErrorRate	0.159	0.159	0.159	0.159	0.159
Sensitivity	0.590	0.965	0.760	0.850	0.750
Specificity	0.990	0.880	0.965	0.950	0.985
Precision	0.800	0.780	0.850	0.750	0.955
NPV	0.975	0.980	0.945	0.970	0.930
F1	0.682	0.860	0.803	0.797	0.840
FDR	0.200	0.220	0.150	0.250	0.045
FOR	0.025	0.020	0.055	0.030	0.070
FPR	0.010	0.120	0.035	0.050	0.015
FNR	0.410	0.035	0.240	0.150	0.250
LR_Positive	65.0	8.0	21.7	17.0	51.0
LR_Negative	0.410	0.023	0.249	0.150	0.255
DOR	159.0	348.0	87.0	113.3	200.0
BA	0.790	0.922	0.863	0.900	0.868
MCC	0.695	0.815	0.750	0.770	0.820
QWK	0.030	0.220	0.125	0.110	0.140
Macro_PR_AUC	0.510	0.780	0.645	0.655	0.735

Table A3. Comparative cross-validated global performance metrics for the “NNET” and “MLP” models.

Model	“NNET”	“MLP”
Accuracy	0.835	0.841
ErrorRate	0.165	0.159
Sensitivity	0.776	0.780
Specificity	0.942	0.940
Precision	0.826	0.830
NPV	0.956	0.958
F1	0.813	0.814
FDR	0.162	0.170
FOR	0.040	0.038
FPR	0.043	0.042
FNR	0.224	0.220
LR_Positive	37.2	36.0
LR_Negative	0.210	0.212
DOR	176.7	170.0
BA	0.817	0.820
MCC	0.789	0.792
QWK	0.140	0.142
Macro_PR_AUC	0.670	0.672

Figure A1. Global SHAP summary—NNET/MLP.

Figure A2. Local SHAP waterfall-like plot.

Figure A3. Reliability diagram for XGBoost model, showing predicted vs. observed probabilities across classes.

Figure 1. Data splitting and repeated stratified k-fold CV workflow.

Figure 2. Cramér’s V correlation heatmap for categorical variables.

Figure 3. Correlation heatmap of numerical variables using the Pearson coefficient.

Figure 4. Association between categorical and numerical variables.

Figure 5. “NNET” architecture with 6 hidden neurons and 2 bias nodes.

Figure 6. “MLP” architecture with 5 hidden neurons.

Figure 7. Comparison of the top 10 variable importances in trained “NNET” and “MLP” models.

Figure 8. Comparison of global performance metrics (accuracy, F1 score, and Kappa) for “NNET” and “MLP” models.

Figure 9. Global performance metrics comparison between “NNET” and “MLP” models.

Table 1. Legend for the credit rating class of the MSME.

Risks Description	Class	Probability of Insolvency
The MSME is doing good business and has a low probability of failing. It does not have major liquidity problems and, at the same time, achieved high returns.	A	2.71%
The MSME is doing good business and has an increased probability of failing in the future and does not have major liquidity problems but, at the same time, generated an average return.	B	5.22%
The MSME does average business and has an increased probability of failing. It does not have major liquidity problems, but profitability is below average.	C	8.50%
The MSME has average profitability and an increased probability of failure. It has minor liquidity problems, but at the same time, profitability is below average.	D	12.02%
The MSME has poor profitability and a higher probability of failure in the future. The probability of liquidity problems is very high, but at the same time, profitability is below average.	E	16.45%

Table 2. The data grouped according to the degree of estimated “risk” and the MSME size (size).

Size	A (Low)	B (Moderate)	C (Elevated)	D (High)	E (Extra High)	Total
Micro	130	608	416	392	512	2058
Small	1	3	2	2	1	9
Medium	41	208	89	30	43	411
Total	172	819	507	424	556	2478

Table 3. The data grouped according to the degree of estimated “risk” and the credit rating class (crew).

Crew	A (Low)	B (Moderate)	C (Elevated)	D (High)	E (Extra High)	Total
A1	106	10	2			118
A2	66	59	3	2	1	131
A3		173	17		3	193
B1		216	17	3	1	237
B2		205	21	1	2	229
B3		156	41	23	4	224
C1			137	8	1	146
C2			94	8	5	107
C3			96	11	2	109
D1			79	13	18	110
D2				116	13	129
D3				81	14	95
E1				86	18	104
E2				61	13	74
E3				11	461	472

Table 4. Data grouped by the degree of estimated “risk” and the status of MSMEs (active or blocked (abank)).

Abank	A (Low)	B (Moderate)	C (Elevated)	D (High)	E (Extra High)	Total
yes	172	819	507	424	397	2319
no					159	159

Table 5. The data grouped according to the degree of estimated “risk” and the turnover (turn) in EUR.

Risk Class	A	B	C	D	E
Turn	Low	Moderate	Elevated	High	Extra High
min	19,800.00	-	-	-	-
max	17,798,649.00	17,251,913.00	11,296,125.00	13,130,415.00	6,709,190.00
mean	709,438.30	626,891.71	405,090.00	176,418.05	153,863.96
median	265,329.00	278,165.00	131,560.00	13,175.00	2266.50
sd	1,691,369.94	1,217,789.26	831,482.61	759,489.77	569,145.47

Table 6. The data grouped according to the degree of estimated “risk” and the net profit (net).

Risk Class	A	B	C	D	E
Net	Low	Moderate	Elevated	High	Extra High
min	−34,834.00	−39,854.00	−3,124,788.00	−651,365.00	−9,968,487.00
max	1,929,623.00	2,541,678.00	1,969,353.00	838,704.00	1,550,360.00
mean	152,906.35	85,800.05	20,559.71	−20,418.99	−90,788.56
median	74,156.50	45,146.00	4297.00	−12,597.50	−12,789.00
sd	275,939.94	146,913.93	203,175.04	102,379.54	637,440.41

Table 7. The data grouped according to the degree of estimated risk (dependent variable) and the number of employees (emp).

Risk Class	A	B	C	D	E
Emp	Low	Moderate	Elevated	High	Extra High
min	1	0	0	0	0
max	352	473	220	1003	703
mean	9.24	10.79	7.86	7.83	6.85
median	4	4	2	1	1
sd	30.31	28.75	17.85	53.80	34.52

Table 8. The data grouped according to the degree of estimated risk (dependent variable) and the earnings before interest and taxes (ebit) in EUR.

Risk Class	A	B	C	D	E
Ebit	Low	Moderate	Elevated	High	Extra High
min	910.00	−39,854.00	−1,509,812.00	−616,123.00	−5,545,473.00
max	2,274,138.00	1,198,052.00	746,370.00	716,863.00	522,663.00
mean	171,797.17	92,594.52	26,684.31	−23,095.93	−52,707.07
median	81,796.00	49,997.00	6228.00	−12,447.50	−12,498.00
sd	318,750.07	128,422.33	124,978.41	88,101.96	286,504.81

Table 9. The data grouped according to the degree of estimated risk (dependent variable) and the fuel and energy costs (coste) in EUR.

Risk Class	A	B	C	D	E
Coste	Low	Moderate	Elevated	High	Extra High
min	0.00	0.00	0.00	0.00	0.00
max	174,692.00	333,836.00	449,556.00	72,502.00	360,102.00
mean	5350.81	8497.24	8218.92	2739.38	3893.46
median	1027.50	2628.00	1139.00	36.50	0.00
sd	17,898.10	20,200.29	28,075.49	8585.53	21,375.67

Table 10. The data grouped according to the degree of estimated risk (dependent variable) and the transport and maintenance costs (mcoste) in EUR.

Risk Class	A	B	C	D	E
Mcoste	Low	Moderate	Elevated	High	Extra High
min	0.00	0.00	−891.00	0.00	0.00
max	215,648.00	2,648,178.00	895,911.00	126,141.00	286,389.00
mean	11,585.19	17,877.75	17,034.15	4613.00	4849.54
median	2287.50	3496.00	1879.00	115.00	0.00
sd	29,130.17	106,437.80	76,291.71	14,903.97	19,891.50

Table 11. Research methodology framework.

Phase	Core Activities	Key Outputs
Phase 1—Data preparation and preprocessing	Data cleaning; removal of inconsistencies; Min–Max normalisation of numeric variables; retention of official categorical codings; dummy encoding; consistent preprocessing across train/validation/test splits.	Cleaned, normalised, and NN-ready dataset with harmonised numerical and categorical feature representations.
Phase 2—Descriptive statistics and correlation analysis	Summary statistics; Cramér’s V (categorical–categorical); Eta (categorical–numeric); Pearson correlation (numeric–numeric); heat-map visualisations; feature–target dependency assessment.	Defined input feature set; validated attribute relevance; correlation matrices and heat-map diagnostics for multiclass risk levels (A–E).
Phase 3—Neural network modelling and training	Development of “NNET” and “MLP” architectures; repeated stratified 5-fold CV (3 repetitions); grid search over hidden units (5/6/7) and weight decay (0.01/0.1); fold-wise performance computation.	Optimised “NNET” and “MLP” models with stable cross-validated performance; variance-robust estimates independent of a single data split.
Phase 4—Evaluation, benchmarking, and interpretability	Accuracy and macro-F1 with 95% confidence interval (CI); confusion-matrix analysis; MLR and XGBoost benchmarks; XGBoost calibration (Brier score, reliability diagrams); ablation without crew; SHAP global/local explanations.	Calibrated and interpretable model comparison; validated independence from crew; SHAP-based feature-importance and explanation framework.
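
The resampling scheme named in Phase 3 (repeated stratified 5-fold CV with 3 repetitions, a grid over hidden units and weight decay) together with the "consistent preprocessing" requirement of Phase 1 can be sketched as follows. This is a minimal illustration in plain Python, not the R code used in the study; the function names, the toy data, and the round-robin fold assignment are assumptions made for the example. Min–Max parameters are learned on the training part of each split only and then reused on the validation part.

```python
import itertools
import random
from collections import defaultdict


def repeated_stratified_kfold(labels, k=5, repeats=3, seed=0):
    """Yield (train_idx, val_idx) pairs; each repeat reshuffles within classes."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    for _ in range(repeats):
        folds = [[] for _ in range(k)]
        for members in by_class.values():
            shuffled = members[:]
            rng.shuffle(shuffled)
            # Deal class members round-robin so every fold keeps the class mix.
            for pos, idx in enumerate(shuffled):
                folds[pos % k].append(idx)
        for i in range(k):
            val_idx = sorted(folds[i])
            train_idx = sorted(j for f in folds[:i] + folds[i + 1:] for j in f)
            yield train_idx, val_idx


def minmax_fit(rows):
    """Column-wise (min, max) learned on the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def minmax_apply(rows, params):
    """Scale rows with stored training parameters; constant columns map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, (lo, hi) in zip(row, params)]
        for row in rows
    ]


# Grid named in Table 11: hidden units 5/6/7 and weight decay 0.01/0.1.
GRID = list(itertools.product([5, 6, 7], [0.01, 0.1]))

if __name__ == "__main__":
    # Hypothetical toy data: two numeric features and risk labels A to E.
    rng = random.Random(1)
    labels = [c for c in "ABCDE" for _ in range(10)]
    X = [[rng.uniform(0, 1000), rng.uniform(-50, 50)] for _ in labels]
    splits = list(repeated_stratified_kfold(labels, k=5, repeats=3))
    print(len(splits), "resamples,", len(GRID), "grid configurations")
    train_idx, val_idx = splits[0]
    params = minmax_fit([X[i] for i in train_idx])
    X_val = minmax_apply([X[i] for i in val_idx], params)
    print("first scaled validation row:", X_val[0])
```

Because the scaling parameters come from the training rows alone, validation values can fall slightly outside [0, 1]; that is expected and avoids leaking validation information into the fit.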

Table 12. Confusion matrix.

Elements of Confusion Matrix	Predicted Positive—PP	Predicted Negative—PN
Actual Positive—AP	True Positive—TP	False Negative—FN
Actual Negative—AN	False Positive—FP	True Negative—TN

Table 13. Overview of key classification metrics.

Metric Name	Description	Formula
Accuracy (ACC)	The proportion of correctly classified cases relative to the total number of instances.	ACC = (TP + TN)/(TP + TN + FP + FN)
Error Rate	The proportion of incorrectly classified instances relative to all instances.	ErrorRate = (FP + FN)/(TP + TN + FP + FN)
Sensitivity (True Positive Rate—TPR)	The model’s ability to correctly identify instances belonging to the positive class.	TPR = TP/(TP + FN)
Specificity (True Negative Rate—TNR)	The model’s ability to correctly identify instances not belonging to the positive class.	TNR = TN/(TN + FP)
Precision (Positive Predictive Value—PPV)	The proportion of correctly classified positive instances relative to all instances predicted as positive.	PPV = TP/(TP + FP)
Negative Predictive Value (NPV)	The proportion of correctly classified negative instances relative to all instances predicted as negative.	NPV = TN/(TN + FN)
F1 Score	The harmonic mean of precision and sensitivity; used when balancing false positives and false negatives is important.	F1 = 2 × (PPV × TPR)/(PPV + TPR)
False Discovery Rate (FDR)	The proportion of instances incorrectly classified as positive relative to all predicted positives.	FDR = FP/(TP + FP)
False Omission Rate (FOR)	The proportion of instances incorrectly classified as negative relative to all predicted negatives.	FOR = FN/(TN + FN)
False Positive Rate (FPR)	The proportion of negative instances incorrectly classified as positive.	FPR = FP/(FP + TN)
False Negative Rate (FNR)	The proportion of positive instances incorrectly classified as negative.	FNR = FN/(TP + FN)
Positive Likelihood Ratio (LR⁺)	The likelihood that the model correctly classifies a positive case relative to incorrectly classifying a negative case.	LR⁺ = TPR/FPR
Negative Likelihood Ratio (LR⁻)	The likelihood that the model incorrectly classifies a positive case as negative relative to correctly classifying a negative case.	LR⁻ = FNR/TNR
Diagnostic Odds Ratio (DOR)	The ratio expressing the odds of correct classification versus incorrect classification; higher values indicate better discrimination.	DOR = LR⁺/LR⁻
Balanced Accuracy (BA)	The average of the recall obtained on each class i; it accounts for class imbalance by giving equal weight to all classes, where K is the number of classes, TP_i is the number of true positive predictions for class i, and FN_i is the number of missed instances of class i.	$BA = \frac{1}{K} \sum_{i=1}^{K} \frac{TP_{i}}{TP_{i} + FN_{i}}$
Matthews Correlation Coefficient (MCC)	A robust measure of the quality of multi-class classifications, taking into account TP, TN, FP, and FN; values range from −1 (total disagreement) to +1 (perfect prediction), where c is the number of correctly classified instances, s is the total number of instances, p_k is the number of predicted instances in class k, and t_k is the number of actual instances in class k.	$MCC = \frac{c \cdot s - \sum_{k} p_{k} t_{k}}{\sqrt{\left(s^{2} - \sum_{k} p_{k}^{2}\right)\left(s^{2} - \sum_{k} t_{k}^{2}\right)}}$
Quadratic Weighted Kappa (QWK)	Measures the agreement between predicted and actual ordinal classes, weighting disagreements quadratically; sensitive to the distance between predicted and true categories, where O_{i,j} is the matrix of observed frequencies, E_{i,j} is the matrix of expected frequencies, and K is the number of classes.	$QWK = 1 - \frac{\sum_{i,j} w_{i,j} O_{i,j}}{\sum_{i,j} w_{i,j} E_{i,j}}, \quad w_{i,j} = \frac{(i - j)^{2}}{(K - 1)^{2}}$
Macro PR-AUC	The average area under the precision–recall curve computed per class; evaluates model performance across all classes, emphasizing performance on rare or minority classes, where PR_i is the precision–recall curve for class i, and K is the number of classes.	$\text{Macro PR-AUC} = \frac{1}{K} \sum_{i=1}^{K} AUC(PR_{i})$
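
The multiclass formulas for BA, MCC, and QWK in Table 13 can be checked numerically with a few lines of code. The following is a minimal Python sketch, not the authors' R implementation; the function names and the 3-class toy matrix are illustrative assumptions. The confusion matrix is taken with rows as actual classes and columns as predicted classes, and the expected frequencies for QWK are assumed to be the usual products of row and column totals divided by the sample size.

```python
import math


def _totals(cm):
    """Return (s, t, p): sample size, actual counts t_k, predicted counts p_k."""
    k = len(cm)
    t = [sum(row) for row in cm]
    p = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    return sum(t), t, p


def balanced_accuracy(cm):
    """Mean per-class recall TP_i / (TP_i + FN_i); empty classes are skipped."""
    recalls = [row[i] / sum(row) for i, row in enumerate(cm) if sum(row) > 0]
    return sum(recalls) / len(recalls)


def multiclass_mcc(cm):
    """MCC from c, s, p_k and t_k as defined in Table 13."""
    s, t, p = _totals(cm)
    c = sum(cm[i][i] for i in range(len(cm)))
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s**2 - sum(pk**2 for pk in p)) * (s**2 - sum(tk**2 for tk in t)))
    return num / den if den else 0.0


def quadratic_weighted_kappa(cm):
    """QWK with weights w_ij = (i - j)^2 / (K - 1)^2 and E_ij = t_i * p_j / s."""
    k = len(cm)
    s, t, p = _totals(cm)
    observed = expected = 0.0
    for i in range(k):
        for j in range(k):
            w = (i - j) ** 2 / (k - 1) ** 2
            observed += w * cm[i][j]
            expected += w * t[i] * p[j] / s
    # If no disagreement is expected at all, treat agreement as perfect.
    return 1.0 - observed / expected if expected else 1.0


if __name__ == "__main__":
    # Hypothetical 3-class ordinal example (not data from the study).
    toy_cm = [
        [8, 2, 0],
        [1, 7, 2],
        [0, 3, 7],
    ]
    print("BA  =", round(balanced_accuracy(toy_cm), 3))
    print("MCC =", round(multiclass_mcc(toy_cm), 3))
    print("QWK =", round(quadratic_weighted_kappa(toy_cm), 3))
```

A purely diagonal matrix returns 1.0 for all three measures, which is a quick sanity check when adapting the functions.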

Table 14. Cross-validated global metrics for NNET and MLP (15 resamples: 5-fold × 3 repetitions), with and without the variable “crew”.

Model	Crew	BA	SD_BA	MCC	SD_MCC	QWK	SD_QWK	Macro_PR_AUC	SD_PR
NNET	Yes	0.818	0.010	0.790	0.015	0.141	0.018	0.671	0.012
NNET	No	0.817	0.011	0.789	0.016	0.140	0.019	0.670	0.013
MLP	Yes	0.821	0.012	0.792	0.014	0.143	0.017	0.674	0.011
MLP	No	0.820	0.013	0.792	0.015	0.142	0.018	0.672	0.012

Table 15. Cross-fold averaged performance metrics for “NNET”, “MLP”, MLR, and XGBoost models (5-fold × 3 repetitions).

Model	BA	MCC	QWK	Macro-PR-AUC
NNET	0.818	0.790	0.142	0.672
MLP	0.821	0.792	0.145	0.675
MLR	0.780	0.740	0.120	0.640
XGBoost	0.835	0.805	0.160	0.680

Table 16. Comparative performance of “NNET” and “MLP” models based on global metrics derived from confusion matrices.

Predicted →	“NNET”					“MLP”
Actual ↓	A	B	C	D	E	A	B	C	D	E
A	12	8	0	0	0	12	8	0	0	0
B	1	122	6	1	1	2	128	0	0	1
C	1	14	59	0	0	1	16	57	0	0
D	0	0	5	50	2	0	2	4	49	2
E	0	1	4	15	70	0	1	5	15	69
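
As an illustration of how the per-class values in Table 17 follow from Table 16, the sketch below collapses the “NNET” confusion matrix into one-vs-rest counts (TP, FP, FN, TN) for each class and applies the formulas from Table 13. It is a Python sketch rather than the authors' R code, and the function names are hypothetical. For class A the counts are TP = 12, FP = 2, FN = 8, and TN = 350, hence sensitivity 12/20 = 0.600 and precision 12/14 = 0.857.

```python
CLASSES = ["A", "B", "C", "D", "E"]

# "NNET" confusion matrix from Table 16 (rows: actual, columns: predicted).
NNET_CM = [
    [12, 8, 0, 0, 0],
    [1, 122, 6, 1, 1],
    [1, 14, 59, 0, 0],
    [0, 0, 5, 50, 2],
    [0, 1, 4, 15, 70],
]


def ratio(num, den):
    """Plain division that returns NaN instead of raising on a zero denominator."""
    return num / den if den else float("nan")


def one_vs_rest(cm, k):
    """Treat class k as positive and all other classes as negative."""
    total = sum(sum(row) for row in cm)
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                    # actual k, predicted as another class
    fp = sum(row[k] for row in cm) - tp     # another class, predicted as k
    tn = total - tp - fn - fp
    return tp, fp, fn, tn


def per_class_metrics(cm, k):
    """Per-class metrics following the formulas in Table 13."""
    tp, fp, fn, tn = one_vs_rest(cm, k)
    tpr, tnr = ratio(tp, tp + fn), ratio(tn, tn + fp)
    ppv, npv = ratio(tp, tp + fp), ratio(tn, tn + fn)
    fpr, fnr = ratio(fp, fp + tn), ratio(fn, tp + fn)
    lr_pos, lr_neg = ratio(tpr, fpr), ratio(fnr, tnr)
    return {
        "sensitivity": tpr, "specificity": tnr, "precision": ppv, "NPV": npv,
        "F1": ratio(2 * ppv * tpr, ppv + tpr),
        "LR+": lr_pos, "LR-": lr_neg, "DOR": ratio(lr_pos, lr_neg),
    }


def accuracy(cm):
    """Share of instances on the main diagonal."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(sum(row) for row in cm)


if __name__ == "__main__":
    print(f"accuracy = {accuracy(NNET_CM):.3f}")
    for k, name in enumerate(CLASSES):
        metrics = per_class_metrics(NNET_CM, k)
        print(name, {key: round(val, 3) for key, val in metrics.items()})
```

The same functions can be applied to the “MLP” half of Table 16 by substituting its five rows for `NNET_CM`.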

Table 17. Per-class performance metrics for the “NNET” model.

Class	A	B	C	D	E
Accuracy	0.841	0.841	0.841	0.841	0.841
ErrorRate	0.159	0.159	0.159	0.159	0.159
Sensitivity	0.600	0.931	0.797	0.877	0.778
Specificity	0.994	0.905	0.950	0.949	0.989
Precision	0.857	0.841	0.797	0.758	0.959
NPV	0.978	0.960	0.950	0.977	0.933
F1	0.706	0.884	0.797	0.813	0.859
FDR	0.143	0.159	0.203	0.242	0.041
FOR	0.022	0.040	0.050	0.023	0.067
FPR	0.006	0.095	0.050	0.051	0.011
FNR	0.400	0.069	0.203	0.123	0.222
LR_Positive	105.600	9.758	15.840	17.270	73.111
LR_Negative	0.402	0.076	0.213	0.129	0.225
DOR	262.500	128.483	74.209	133.482	325.500
BA	0.797	0.918	0.873	0.913	0.884
MCC	0.704	0.819	0.747	0.779	0.827
QWK	0.030	0.221	0.124	0.110	0.148
Macro_PR_AUC	0.514	0.784	0.636	0.665	0.746

Table 18. Per-class performance metrics for the “MLP” model.

Class	A	B	C	D	E
Accuracy	0.847	0.847	0.847	0.847	0.847
ErrorRate	0.153	0.153	0.153	0.153	0.153
Sensitivity	0.600	0.977	0.770	0.860	0.767
Specificity	0.991	0.888	0.970	0.952	0.989
Precision	0.800	0.826	0.864	0.766	0.958
NPV	0.978	0.986	0.944	0.974	0.930
F1	0.686	0.895	0.814	0.810	0.852
FDR	0.200	0.174	0.136	0.234	0.042
FOR	0.022	0.014	0.056	0.026	0.070
FPR	0.009	0.112	0.030	0.048	0.011
FNR	0.400	0.023	0.230	0.140	0.233
LR_Positive	70.400	8.722	25.505	18.053	72.067
LR_Negative	0.403	0.026	0.237	0.147	0.236
DOR	174.500	338.173	107.667	122.500	305.571
BA	0.797	0.918	0.873	0.913	0.884
MCC	0.704	0.819	0.747	0.779	0.827
QWK	0.030	0.221	0.124	0.110	0.148
Macro_PR_AUC	0.514	0.784	0.636	0.665	0.746

Table 19. Comparison of global performance metrics for the “NNET” and “MLP” models.

Model	NNET	MLP
Accuracy	0.841	0.847
ErrorRate	0.159	0.153
Sensitivity	0.797	0.795
Specificity	0.957	0.958
Precision	0.842	0.843
NPV	0.960	0.962
F1	0.812	0.811
FDR	0.158	0.157
FOR	0.040	0.038
FPR	0.043	0.042
FNR	0.203	0.205
LR_Positive	44.316	38.949
LR_Negative	0.209	0.210
DOR	184.835	209.682
BA	0.797	0.795
MCC	0.790	0.798
QWK	0.788	0.794
Macro_PR_AUC	0.669	0.669

Table 20. Time performance analysis of “NNET” and “MLP” model training.

Model	Total_Duration_sec	Avg_Time_per_Config_sec
NNET	56.03	4.67
MLP	22.01	3.67

Table 21. Computational and space complexity comparison of “NNET” and “MLP” algorithms in R.

Feature	NNET	MLP
Network type	Single hidden layer (feedforward)	Multi-layer (feedforward)
Optimization	Broyden-Fletcher-Goldfarb-Shanno (BFGS) quasi-Newton	Stochastic gradient descent
Time complexity per epoch	$O(E \times N \times (n \times h + h \times m)) + O(W^{2})$	$O(E \times N \sum_{l} n_{l} n_{l+1})$
Space complexity	$O (W^{2})$	$O (W)$
Scalability	Low (small networks)	High (large networks)
Convergence speed	Fast (small data)	Moderate (large data)
Suitable for	Simple models	Complex and deep models

N—Number of samples (instances) in the training set; E—number of epochs (cycles through the entire dataset during training); n—number of neurons in the input layer; h—number of neurons in the hidden layer; m—number of neurons in the output layer; n_l—number of neurons in layer l (used for multilayer networks in MLP); W—total number of weights in the network (including bias parameters).
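
To make the W-dependent terms in Table 21 concrete, the following Python sketch counts the parameters of a fully connected network and compares the optimizer state implied by the two space-complexity entries. It is a rough illustration under stated assumptions: the input width of 20 is a placeholder (the number of inputs after dummy encoding is not stated here), the five outputs correspond to risk classes A to E, and the function names are hypothetical.

```python
def count_weights(layer_sizes):
    """Total trainable parameters W (weights plus one bias per receiving neuron)."""
    return sum((a + 1) * b for a, b in zip(layer_sizes, layer_sizes[1:]))


def optimizer_state_floats(w, method):
    """Order-of-magnitude count of floats kept by the optimizer (Table 21)."""
    if method == "bfgs":
        return w * w   # dense W x W approximation of the inverse Hessian
    if method == "sgd":
        return w       # a single gradient vector
    raise ValueError(f"unknown method: {method}")


if __name__ == "__main__":
    n_inputs = 20   # hypothetical input width, for illustration only
    n_classes = 5   # risk classes A to E
    for hidden in (5, 6, 7):  # hidden-unit values from the grid in Table 11
        w = count_weights([n_inputs, hidden, n_classes])
        print(
            f"h={hidden}: W={w}, "
            f"BFGS state ~{optimizer_state_floats(w, 'bfgs')} floats, "
            f"SGD state ~{optimizer_state_floats(w, 'sgd')} floats"
        )
```

For networks of this size the quadratic term is negligible in practice; it becomes the limiting factor only as W grows, which is the scalability contrast the table draws.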

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Božović, D.; Perović, N.; Aleksić, M.; Rašović, I.; Iker, O. Machine Learning Analysis of Financial Risk Dynamics in Micro-, Small, and Medium Enterprises. Risks 2025, 13, 240. https://doi.org/10.3390/risks13120240
