Predicting Stock Market Risk Using Machine Learning Classification Models

Noh, Seol-Hyun

doi:10.3390/risks14040092

Open Access Article


Department of Liberal Arts, Hongik University, Seoul 04066, Republic of Korea

Risks 2026, 14(4), 92; https://doi.org/10.3390/risks14040092

Submission received: 7 March 2026 / Revised: 4 April 2026 / Accepted: 14 April 2026 / Published: 17 April 2026

(This article belongs to the Special Issue AI for Financial Risk Perception)


Abstract

This study aims to predict stock market risk and improve preparedness for potential economic crises by identifying sharp declines in stock returns using classification-based machine learning models. Using ten years of KOSPI 200 index data (2015 to 2024), a daily return series was constructed. A day was labeled a risk event (1) if its return fell below the 5th percentile of the returns observed over the preceding 100 trading days, indicating a sharp decline. Nine classification models—Logistic Regression, k-nearest Neighbor, Decision Tree, Random Forest, Linear Discriminant Analysis, Naive Bayes, Quadratic Discriminant Analysis, AdaBoost, and Gradient Boosting—were trained and validated. Among these, Logistic Regression demonstrated the strongest overall performance across multiple evaluation metrics, including accuracy, non-risk F1 score, risk F1 score, and AUC.

Keywords:

stock market risk prediction; sharp return decline prediction; classification machine learning models; classification performance

1. Introduction

This study aims to predict stock market risks and enhance preparedness for potential economic crises by detecting sharp declines in stock returns using classification-based machine learning models.

Recently, considerable attention has been devoted to forecasting stock market risks with artificial intelligence techniques. Kim and Yu (2024) identified major economic disruptions as anomalies using an isolation-forest-based anomaly-detection framework. Bhandari et al. (2022) employed a Long Short-Term Memory (LSTM) model to predict the next day's closing price of the S&P 500 index. Similarly, Asare et al. (2023) empirically demonstrated that a Bayesian LSTM model outperforms a standard LSTM model in forecasting the next-day closing price of the S&P 500 index. Mehtab et al. (2009) compared eight machine-learning-based regression models and found that an LSTM-based deep-learning approach achieved the highest predictive performance for the NIFTY 50 index. Furthermore, Kim and Won (2018) reported that forecasting the volatility of the KOSPI 200 index with a hybrid model combining LSTM with two GARCH-type models produced superior performance compared with single-model approaches.

Financial risk, which is a central concern in modern financial systems, has a widespread and significant impact on the economy. As financial markets have rapidly evolved and interconnections among financial institutions have intensified, the management and prevention of financial risk have become critical priorities in both academia and industry. In particular, predicting extreme financial risks is essential for safeguarding sustainable social and economic development. Previous studies have investigated severe financial disruptions such as banking crises (Carmona et al. 2019; Climent et al. 2019; Coffinet and Kien 2019; Filippopoulou et al. 2020; Gutiérrez et al. 2010; Lyócsa et al. 2023; Nakatani 2020; Roy 2022; Virtanen et al. 2018) and currency crises (Alaminos et al. 2021; Arifovic and Maschek 2012; Aydın and Tunç 2023; Bodart and Carpantier 2023; Candelon et al. 2014; Gangopadhyay 2020; Lin et al. 2008; Wang and Zong 2023).

The present study differs from prior research by focusing specifically on stock market risk and by operationalizing risk as sharp decline events in stock returns, which are detected in advance using classification-based machine learning models. Daily return data were computed from the KOSPI 200 Index over ten years, from 1 January 2015 to 31 December 2024. Data from 2015 to 2022 were used for model training, whereas return data from 2023 and 2024 were reserved for out-of-sample validation to assess predictive performance.

A binary classification label was constructed as follows: if the return on day t + 1 fell below the 5th percentile of the return distribution calculated over the preceding 100 days, the label for day t + 1 was set to 1, indicating a sharp decline; otherwise, it was set to 0. The 5% cutoff was selected because 5% is widely used in statistical inference as a conventional cutoff for identifying extreme observations, making it an appropriate and interpretable criterion for defining downside-tail risk events.

Ren et al. (2024) proposed an effective model for forecasting extreme financial risk in the American stock market using an optimal AdaBoost model. They constructed a binary risk event indicator by comparing stock market returns with a threshold based on historical returns; the threshold is determined at the end of each day t as the 5th percentile of the historical stock market return distribution over a rolling 500-day window. Their optimal AdaBoost model achieved an AUC of 0.8785, which is lower than the AUC of 0.9913 obtained by the Logistic Regression model in this study, although the two figures are not directly comparable because the markets and look-back windows differ. In this study, the look-back window was set to 100 days to detect relatively frequent sharp decline points, including extreme financial risks.

Nine classification models were employed in this study: Logistic Regression, k-nearest Neighbor, Decision Tree, Random Forest, Linear Discriminant Analysis, Naive Bayes, Quadratic Discriminant Analysis, AdaBoost, and Gradient Boosting. Each model was trained on in-sample data and evaluated on an out-of-sample validation set to assess its ability to predict sharp decline events. The results indicate that the Logistic Regression model achieved the highest accuracy, non-risk F1 score, and risk F1 score on the validation data, and tied for the highest AUC.

The contributions of this study are as follows:

■ We operationalize stock market risk as extreme downside return events.
■ We systematically compare nine classification models using out-of-sample validation and find that an interpretable model (Logistic Regression) outperforms more complex alternatives in predicting tail-risk events.

The remainder of this paper is organized as follows: Section 2 describes the data and the nine classification models used for training and validation. Section 3 presents the experimental results. Section 4 summarizes the findings of the study.

2. Data and Models

2.1. Data

In this study, data from the Korean KOSPI 200 Index over 10 years, from 1 January 2015 to 31 December 2024, were collected to construct a daily return series. Let S_{t-1} and S_{t} denote the index values on days t - 1 and t, respectively. The log return on day t is defined as

\ln\left(\frac{S_{t}}{S_{t-1}}\right)

The return data from 2015 to 2022 are used as the training data, whereas the return data from 2023 and 2024 are reserved for out-of-sample validation.
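As a minimal illustration of the log-return definition, the following Python sketch converts a chronologically ordered list of index levels into daily log returns. The function name `log_returns` and the sample levels are hypothetical; the article does not describe its data-processing code.

```python
import math


def log_returns(levels):
    """Return ln(S_t / S_{t-1}) for t = 1, ..., len(levels) - 1.

    `levels` is assumed to be a chronologically ordered list of positive
    index values (for example, daily KOSPI 200 closing levels).
    """
    if any(s <= 0 for s in levels):
        raise ValueError("index values must be positive")
    # zip pairs each day's level with the previous day's level
    return [math.log(s_t / s_prev) for s_prev, s_t in zip(levels, levels[1:])]


# Hypothetical index levels, for illustration only
print(log_returns([100.0, 110.0, 99.0]))
```

Because log returns are additive, the returns over any stretch of days sum to the log of the ratio between the last and first levels.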

If the return on day t + 1 fell below the 5th percentile of the return distribution over the preceding 100 days, the label for day t + 1 was set to 1, indicating a sharp decline; otherwise, it was set to 0. Because 5% is the conventional significance level in statistical hypothesis testing, it was considered an appropriate criterion for defining downside-risk events. Figure 1a illustrates the return series and the corresponding sharp decline points identified during the validation period. Figure 1b shows the rolling standard deviation for the validation data, with the points marked x indicating the rolling standard deviation on sharp decline days. As Figure 1b shows, the sharp decline points occur among the points where volatility increases, which supports the validity of this definition of sharp decline points.
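The labeling rule can be sketched as follows, assuming the log returns are held in a plain Python list in which index i plays the role of day t + 1 and the look-back window covers the 100 returns immediately before it. The article does not state how the 5th percentile is interpolated or whether a return exactly equal to the threshold counts as a decline, so linear interpolation and a strict comparison are assumptions here, as are the helper names `lower_percentile` and `label_sharp_declines`.

```python
import math


def lower_percentile(values, q):
    """Linear-interpolation percentile of `values` for q in [0, 1]."""
    s = sorted(values)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def label_sharp_declines(returns, window=100, q=0.05):
    """Label day i as 1 if its return is below the q-quantile of the previous
    `window` returns, else 0. The first `window` days get None because no
    full look-back window is available for them."""
    labels = [None] * len(returns)
    for i in range(window, len(returns)):
        threshold = lower_percentile(returns[i - window:i], q)
        labels[i] = 1 if returns[i] < threshold else 0
    return labels


# Toy example with a 4-day window instead of the article's 100 days
toy = [0.01, 0.02, -0.01, 0.00, 0.015, -0.05, 0.03]
print(label_sharp_declines(toy, window=4))  # → [None, None, None, None, 0, 1, 0]
```

The toy call uses a 4-day window only so that the output can be checked by hand; with the article's settings one would call `label_sharp_declines(returns)` with the default `window=100` and `q=0.05`.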

2.2. Model

The classification-based machine learning models employed in this study to predict points of sharp decline in stock returns include Logistic Regression, k-nearest Neighbor, Decision Tree, Random Forest, Linear Discriminant Analysis, Naive Bayes, Quadratic Discriminant Analysis, AdaBoost, and Gradient Boosting.

2.2.1. Logistic Regression Model

The Logistic Regression model is a probabilistic classification method used to model the conditional probability P(Y = 1 | x_{1}, x_{2}, ..., x_{k}), where the binary response variable Y takes values 0 or 1 and the k explanatory variables are x_{1}, x_{2}, ..., x_{k}, as expressed in Equation (1).

\log\left[\frac{P(Y = 1 \mid x_{1}, x_{2}, \dots, x_{k})}{1 - P(Y = 1 \mid x_{1}, x_{2}, \dots, x_{k})}\right] = \alpha + \beta_{1} x_{1} + \beta_{2} x_{2} + \dots + \beta_{k} x_{k}

(1)

The intercept α and the coefficients β_{i} of the explanatory variables x_{i} are estimated from the training data by maximum likelihood. For new data, if the predicted probability implied by Equation (1) exceeds a predetermined threshold, the observation is classified as 1; otherwise, it is classified as 0. This yields a solution to the binary classification problem.
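To make Equation (1) and the maximum-likelihood step concrete, the sketch below fits α and β_1, ..., β_k by plain gradient ascent on the average log-likelihood and then applies a 0.5 probability threshold. It is an illustrative stand-in, not the article's implementation: the article does not report its solver, regularization, or exact input feature, and the one-feature data here are invented for the example.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, n_iter=5000):
    """Estimate (alpha, [beta_1..beta_k]) by gradient ascent on the
    average log-likelihood of the logistic model."""
    n, k = len(X), len(X[0])
    alpha, beta = 0.0, [0.0] * k
    for _ in range(n_iter):
        grad_a, grad_b = 0.0, [0.0] * k
        for xi, yi in zip(X, y):
            p = sigmoid(alpha + sum(b * x for b, x in zip(beta, xi)))
            err = yi - p  # derivative of the log-likelihood w.r.t. the linear predictor
            grad_a += err
            for j in range(k):
                grad_b[j] += err * xi[j]
        alpha += lr * grad_a / n
        beta = [b + lr * g / n for b, g in zip(beta, grad_b)]
    return alpha, beta


def predict_proba(model, x):
    alpha, beta = model
    return sigmoid(alpha + sum(b * xj for b, xj in zip(beta, x)))


def predict(model, x, threshold=0.5):
    """Classify as 1 if the predicted probability exceeds the threshold."""
    return 1 if predict_proba(model, x) > threshold else 0


# Hypothetical one-feature data: strongly negative values tend to be labeled 1
X = [[-3.0], [-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0], [3.0]]
y = [1, 1, 0, 1, 0, 0, 0, 0]
model = fit_logistic(X, y)
print(model, predict(model, [-3.0]), predict(model, [3.0]))
```

Gradient ascent is used only because it fits in a few dependency-free lines; statistical packages usually solve the same maximum-likelihood problem with Newton-type methods, and with perfectly separable data the unpenalized estimates would diverge.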

2.2.2. k-Nearest Neighbor Model

The k-nearest neighbor algorithm is a nonparametric supervised learning method that is widely used in machine learning. Given a new observation, the k-nearest neighbor model identifies the k closest observations in the training dataset according to a specified distance metric d. Based on these neighbors, the model predicts the class of the new observation in classification problems (typically by majority vote) or its value in regression problems (typically by averaging). The hyperparameters of the k-nearest neighbor model are the number of neighbors k and the distance metric d. Common distance measures include the Euclidean, Manhattan, Mahalanobis, correlation, and rank correlation distances.
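A dependency-free sketch of k-nearest neighbor classification with two of the distance metrics listed above (Euclidean and Manhattan) follows. The value of k, the distance functions, the helper names, and the toy data are illustrative assumptions; the article does not report the hyperparameters it used.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def manhattan(a, b):
    return sum(abs(ai - bi) for ai, bi in zip(a, b))


def nearest_labels(X_train, y_train, x_new, k, distance):
    """Labels of the k training points closest to x_new under `distance`."""
    order = sorted(range(len(X_train)), key=lambda i: distance(X_train[i], x_new))
    return [y_train[i] for i in order[:k]]


def knn_predict(X_train, y_train, x_new, k=5, distance=euclidean):
    """Majority vote among the k nearest neighbors."""
    labels = nearest_labels(X_train, y_train, x_new, k, distance)
    return Counter(labels).most_common(1)[0][0]


def knn_risk_score(X_train, y_train, x_new, k=5, distance=euclidean):
    """Share of the k nearest neighbors labeled 1 (a probability-like score)."""
    return sum(nearest_labels(X_train, y_train, x_new, k, distance)) / k


X = [[-3.0], [-2.5], [-2.0], [-0.5], [0.0], [0.5], [1.0], [2.0]]
y = [1, 1, 1, 0, 0, 0, 0, 0]
print(knn_predict(X, y, [-2.2], k=3), knn_predict(X, y, [0.8], k=3, distance=manhattan))
```

The neighbor share returned by `knn_risk_score` is one way to obtain the predicted probabilities needed for a 0.5 threshold or an ROC curve.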

2.2.3. Decision Tree Model

The decision tree model is a supervised learning algorithm applicable to both regression and classification problems. It predicts outcomes by representing decision rules in a tree-like structure. During training, a tree is constructed whose internal nodes test the explanatory variables. According to a splitting criterion, the feature space is recursively partitioned into non-overlapping regions in a hierarchical branch structure, and each terminal node (leaf) at the bottom represents a region that is assigned a value of the dependent variable, such as a class label.
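The splitting step can be illustrated for a single numeric feature, matching the one-dimensional input described in Section 3.2. The article does not name its splitting criterion, so the sketch assumes Gini impurity (the criterion used by CART) and searches midpoints between sorted feature values for the split that minimizes the weighted impurity of the two child nodes; a full tree would apply the same search recursively to each child. Function names and data are hypothetical.

```python
def gini(labels):
    """Gini impurity 1 - sum_c p_c^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(x, y):
    """Return (threshold, weighted_gini) of the best split x <= threshold
    for one numeric feature, trying midpoints between sorted values."""
    pairs = sorted(zip(x, y))
    n = len(pairs)
    best = (None, gini(y))  # no split: impurity of the parent node
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates identical feature values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (thr, score)
    return best


x = [-3.0, -2.0, -1.5, -0.2, 0.1, 0.4, 1.2]
y = [1, 1, 1, 0, 0, 0, 0]
print(best_split(x, y))  # → (-0.85, 0.0)
```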

2.2.4. Random Forest Model

The Random Forest model is an ensemble learning algorithm that combines multiple Decision Trees and can be used for both regression and classification tasks. In classification problems, the predicted class is determined by majority voting across the individual trees, whereas in regression problems, the output is the average of the values predicted by all trees. The Random Forest algorithm was designed to address the overfitting problem associated with single Decision Trees. Multiple bootstrap samples are drawn with replacement from the training data, and a Decision Tree is grown on each sample; at each split, only a randomly selected subset of a predetermined number of explanatory variables is considered. Training many such decorrelated trees and aggregating their predictions reduces the variance of the model.
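The bootstrap-and-vote idea can be sketched with depth-one trees (stumps) on a single feature. With only one explanatory variable, random feature selection has no effect, so the sketch reduces to bagging of stumps; the stump criterion (fewest misclassifications), the number of trees, the random seed, and all names are illustrative assumptions rather than the article's settings.

```python
import random
from collections import Counter


def fit_stump(x, y):
    """Depth-one tree on one feature: choose the threshold with the fewest
    misclassifications, predicting the majority class on each side."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for v, lab in pairs if v <= thr]
        right = [lab for v, lab in pairs if v > thr]
        if not left or not right:
            continue
        l_cls = Counter(left).most_common(1)[0][0]
        r_cls = Counter(right).most_common(1)[0][0]
        errors = (sum(lab != l_cls for lab in left)
                  + sum(lab != r_cls for lab in right))
        if best is None or errors < best[0]:
            best = (errors, thr, l_cls, r_cls)
    if best is None:  # every sampled x is identical: predict the majority class
        cls = Counter(y).most_common(1)[0][0]
        return (float("inf"), cls, cls)
    return best[1:]


def fit_forest(x, y, n_trees=25, seed=0):
    """Grow each stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(x)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return forest


def predict_forest(forest, v):
    """Majority vote of the stumps, each stored as (thr, left_class, right_class)."""
    votes = Counter(l_cls if v <= thr else r_cls for thr, l_cls, r_cls in forest)
    return votes.most_common(1)[0][0]


x = [-3.0, -2.5, -2.0, -1.5, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
forest = fit_forest(x, y)
print(predict_forest(forest, -2.8), predict_forest(forest, 1.8))
```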

2.2.5. Linear Discriminant Analysis Model

Linear Discriminant Analysis is a supervised classification and dimensionality reduction algorithm. It identifies a projection vector W that maximizes the ratio of between-class variance to within-class variance. Classification is performed by comparing a threshold with the score obtained from the inner product of W and the input data (McLachlan 2004).
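For a single input feature, the projection vector W reduces to a scalar, so LDA amounts to comparing the feature with one threshold. The sketch below fixes that threshold under the common assumption of Gaussian classes with a pooled (shared) variance σ², using the linear discriminant scores δ_k(v) = v·μ_k/σ² - μ_k²/(2σ²) + ln π_k, where μ_k and π_k are the class mean and prior. The data and function names are hypothetical.

```python
import math


def fit_lda_1d(x, y):
    """Class means, pooled within-class variance, and class priors."""
    classes = sorted(set(y))
    n = len(x)
    means, priors, ss = {}, {}, 0.0
    for c in classes:
        xs = [xi for xi, yi in zip(x, y) if yi == c]
        means[c] = sum(xs) / len(xs)
        priors[c] = len(xs) / n
        ss += sum((xi - means[c]) ** 2 for xi in xs)
    var = ss / (n - len(classes))  # pooled (unbiased) within-class variance
    return means, var, priors


def lda_scores(model, v):
    """Linear discriminant delta_k(v) for each class k."""
    means, var, priors = model
    return {c: v * m / var - m * m / (2 * var) + math.log(priors[c])
            for c, m in means.items()}


def lda_predict(model, v):
    scores = lda_scores(model, v)
    return max(scores, key=scores.get)


def lda_threshold(model):
    """Feature value at which the two discriminant scores are equal."""
    means, var, priors = model
    (c0, m0), (c1, m1) = sorted(means.items())
    return (m0 + m1) / 2 + var * math.log(priors[c0] / priors[c1]) / (m1 - m0)


x = [-2.0, -1.8, -2.2, 0.1, 0.2, -0.1, 0.0, 0.3]
y = [1, 1, 1, 0, 0, 0, 0, 0]
model = fit_lda_1d(x, y)
print(lda_predict(model, -1.5), lda_predict(model, -0.5), lda_threshold(model))
```

Unequal priors shift the threshold away from the midpoint of the two class means, toward the mean of the rarer class.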

2.2.6. Naive Bayes Model

The Naive Bayes model is a probabilistic classification algorithm based on Bayes' theorem. Under the assumption that all features are conditionally independent given the class, and given a feature vector x = (x_{1}, ..., x_{n}) representing n features, the model determines which of the K possible classes C_{1}, ..., C_{K} the observation belongs to by applying Bayes' theorem and choosing the class with the largest posterior probability (the maximum a posteriori rule):

\underset{k \in \{1, \dots, K\}}{\arg\max}\; P(C_{k}) \prod_{i = 1}^{n} P(x_{i} \mid C_{k})
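The argmax rule above can be sketched as a Gaussian Naive Bayes classifier, assuming each P(x_i | C_k) is a normal density whose mean and variance are estimated per class and per feature; the article does not state which likelihood it used. Logarithms replace the product to avoid numerical underflow. With a single feature, this model has the same form as one-feature QDA (both fit a separate normal density to each class), which may help explain why the Naïve Bayes and Quadratic Discriminant Analysis rows coincide in Tables 2 and 3. Data and names are hypothetical.

```python
import math


def fit_gaussian_nb(X, y):
    """For each class: prior P(C_k) and per-feature (mean, variance)."""
    model = {}
    for c in sorted(set(y)):
        rows = [xi for xi, yi in zip(X, y) if yi == c]
        stats = []
        for j in range(len(X[0])):
            col = [r[j] for r in rows]
            mu = sum(col) / len(col)
            # Maximum-likelihood variance plus a tiny floor to avoid division by zero
            var = sum((v - mu) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mu, var))
        model[c] = (len(rows) / len(X), stats)
    return model


def log_normal_pdf(v, mu, var):
    return -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)


def nb_predict(model, x):
    """argmax_k [log P(C_k) + sum_i log P(x_i | C_k)]; logs avoid underflow."""
    def log_joint(c):
        prior, stats = model[c]
        return math.log(prior) + sum(
            log_normal_pdf(v, mu, var) for v, (mu, var) in zip(x, stats))
    return max(model, key=log_joint)


X = [[-2.0, 1.5], [-1.8, 1.2], [-2.2, 1.8], [0.1, 0.2], [0.2, 0.1], [-0.1, 0.3], [0.0, 0.2]]
y = [1, 1, 1, 0, 0, 0, 0]
model = fit_gaussian_nb(X, y)
print(nb_predict(model, [-1.9, 1.4]), nb_predict(model, [0.05, 0.2]))
```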

2.2.7. Quadratic Discriminant Analysis Model

Quadratic Discriminant Analysis is a supervised machine-learning classification algorithm that assumes that the feature vectors within each class follow multivariate normal distributions, each with its own mean vector and covariance matrix. Given a feature vector x = (x_{1}, ..., x_{n}) representing n features, the class-conditional density P(x | C_{k}) and the prior probability P(C_{k}) are estimated for each class, and the posterior probability P(C_{k} | x) is computed using Bayes' theorem, where C_{k} denotes the k-th class. Unlike Linear Discriminant Analysis, which assumes a covariance matrix shared by all classes, this method can generate quadratic decision boundaries, making it suitable for nonlinear data structures (Tharwat 2016).
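A one-feature sketch of QDA follows: each class gets its own prior, mean, and variance, and the posterior P(C_k | x) is obtained from Bayes' theorem, computed in log space for numerical stability. In the hypothetical data, the risk class is far more dispersed than the non-risk class, so the fitted boundary is quadratic and the classifier assigns the risk class on both tails, which a single linear threshold cannot do. All names and data are illustrative assumptions.

```python
import math


def fit_qda_1d(x, y):
    """Per-class prior P(C_k), mean, and unbiased variance for one feature."""
    model = {}
    for c in sorted(set(y)):
        xs = [xi for xi, yi in zip(x, y) if yi == c]
        mu = sum(xs) / len(xs)
        var = sum((v - mu) ** 2 for v in xs) / (len(xs) - 1)
        model[c] = (len(xs) / len(x), mu, var)
    return model


def qda_posteriors(model, v):
    """P(C_k | v) from Bayes' theorem with a normal density per class."""
    # log P(C_k) + log N(v; mu_k, var_k), normalized with the log-sum-exp trick
    logs = {c: math.log(p) - 0.5 * math.log(2 * math.pi * var)
               - (v - mu) ** 2 / (2 * var)
            for c, (p, mu, var) in model.items()}
    top = max(logs.values())
    w = {c: math.exp(l - top) for c, l in logs.items()}
    total = sum(w.values())
    return {c: wc / total for c, wc in w.items()}


def qda_predict(model, v):
    post = qda_posteriors(model, v)
    return max(post, key=post.get)


# Hypothetical data: risk days (1) are far more dispersed than non-risk days (0)
x = [-2.5, -1.0, -3.5, -0.5, -2.0, 0.1, -0.1, 0.2, 0.0, -0.2, 0.1]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
model = fit_qda_1d(x, y)
print([qda_predict(model, v) for v in (-3.0, 0.0, 2.0)])  # → [1, 0, 1]
```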

2.2.8. AdaBoost Model

AdaBoost is an ensemble machine-learning algorithm that sequentially combines multiple weak binary classifiers to construct a strong predictive model. During training, higher weights are assigned to observations misclassified in previous iterations, whereas lower weights are assigned to correctly classified observations. The processes of model training, error measurement, and weight updating are repeated iteratively, and the final class is predicted by a weighted vote of the binary classifiers, in which more accurate classifiers receive larger weights (Ren et al. 2024).
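The reweighting loop can be sketched as discrete AdaBoost with decision stumps on one feature, using labels in {-1, +1} (+1 standing for a risk state). This is the textbook form of the algorithm, not the optimized variant of Ren et al. (2024); the number of rounds, the stump learner, the helper names, and the data are illustrative assumptions.

```python
import math


def fit_weighted_stump(x, y, w):
    """Stump h(v) = s if v <= thr else -s (labels in {-1, +1}) that minimizes
    the weighted error, i.e. the total weight of misclassified points."""
    values = sorted(set(x))
    # Candidate thresholds: one below all values (a constant rule) plus midpoints
    cands = [values[0] - 1.0] + [(a + b) / 2 for a, b in zip(values, values[1:])]
    best = None
    for thr in cands:
        for s in (1, -1):
            err = sum(wi for xi, yi, wi in zip(x, y, w)
                      if (s if xi <= thr else -s) != yi)
            if best is None or err < best[0]:
                best = (err, thr, s)
    return best


def fit_adaboost(x, y, n_rounds=10):
    n = len(x)
    w = [1.0 / n] * n  # start with uniform observation weights
    ensemble = []
    for _ in range(n_rounds):
        err, thr, s = fit_weighted_stump(x, y, w)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)  # vote weight of this stump
        ensemble.append((alpha, thr, s))
        # Up-weight misclassified points, down-weight correct ones, renormalize
        w = [wi * math.exp(-alpha * yi * (s if xi <= thr else -s))
             for xi, yi, wi in zip(x, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, v):
    """Sign of the alpha-weighted vote of the stumps."""
    f = sum(a * (s if v <= thr else -s) for a, thr, s in ensemble)
    return 1 if f > 0 else -1


x = [-3.0, -2.5, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
y = [1, 1, 1, -1, 1, -1, -1, -1, -1]
ensemble = fit_adaboost(x, y)
print(adaboost_predict(ensemble, -2.8), adaboost_predict(ensemble, 1.2))
```

The noisy point at -0.5 cannot be fitted by any single stump, which is exactly the kind of observation whose weight AdaBoost keeps increasing in later rounds.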

2.2.9. Gradient Boosting Model

Gradient Boosting is an ensemble-learning technique that combines multiple weak learners (typically Decision Trees) to form a strong predictive model. At each iteration, the algorithm fits a new weak learner to the pseudo-residuals, that is, the negative gradient of the loss function with respect to the current predictions; for squared-error loss, these are simply the differences between the actual values and the values predicted at the previous stage. Adding the new learner therefore moves the model a step in the direction of gradient descent on the loss, and the outputs of the sequential learners are summed to produce the final prediction (Hastie et al. 2009).
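A simplified sketch for binary classification with the logistic (log) loss follows. Each regression stump is fitted to the pseudo-residuals y_i - p_i, which are the negative gradient of the log loss with respect to the current log-odds, and is added with a learning rate. As a simplification, each leaf predicts the mean pseudo-residual instead of the Newton-step leaf value of Friedman's formulation; the learning rate, number of rounds, helper names, and data are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_regression_stump(x, r):
    """Least-squares stump on one feature: (threshold, left mean, right mean)."""
    pairs = sorted(zip(x, r))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold separates equal feature values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [ri for _, ri in pairs[:i]]
        right = [ri for _, ri in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if best is None or sse < best[0]:
            best = (sse, thr, lm, rm)
    return best[1:]


def fit_gradient_boosting(x, y, n_rounds=100, lr=0.1):
    """Boost on the log loss; y holds 0/1 labels (1 = risk state)."""
    p0 = sum(y) / len(y)
    f0 = math.log(p0 / (1 - p0))  # start from the log-odds of the base rate
    F = [f0] * len(x)
    stumps = []
    for _ in range(n_rounds):
        # Pseudo-residuals: negative gradient of the log loss w.r.t. F(x_i)
        resid = [yi - sigmoid(fi) for yi, fi in zip(y, F)]
        thr, lm, rm = fit_regression_stump(x, resid)
        stumps.append((thr, lm, rm))
        F = [fi + lr * (lm if xi <= thr else rm) for xi, fi in zip(x, F)]
    return f0, lr, stumps


def gb_predict_proba(model, v):
    f0, lr, stumps = model
    return sigmoid(f0 + lr * sum(lm if v <= thr else rm for thr, lm, rm in stumps))


x = [-3.0, -2.5, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
y = [1, 1, 1, 0, 1, 0, 0, 0, 0]
model = fit_gradient_boosting(x, y)
print(gb_predict_proba(model, -2.8), gb_predict_proba(model, 1.8))
```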

3. Experimental Results

3.1. Performance Metrics

The nine classification models described in Section 2.2, namely Logistic Regression, k-nearest Neighbor, Decision Tree, Random Forest, Linear Discriminant Analysis, Naive Bayes, Quadratic Discriminant Analysis, AdaBoost, and Gradient Boosting, were trained on the training dataset and evaluated on the validation dataset to predict sharp declines. For each observation in the validation data, if the predicted probability exceeded a threshold of 0.5, the observation was classified as a risk state, indicating a sharp decline; otherwise, it was classified as a non-risk state. Because both the actual and predicted states can take either Non-Risk or Risk values, the classification results were summarized in a confusion matrix, as shown in Table 1. In Table 1, the Non-Risk state plays the role of the positive class, so TP counts correctly predicted Non-Risk days and TN counts correctly predicted Risk days.

Based on the confusion matrix, the following performance metrics were calculated: Accuracy, Non-Risk Precision, Non-Risk Recall, Risk Precision, Risk Recall, Non-Risk F1 Score, and Risk F1 Score. Additionally, by comparing the predicted probabilities with the actual labels, the ROC curve and the area under it (AUC) were calculated. The mathematical definitions of these performance metrics are given in Equation (2).

\text{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}

\text{Non-Risk Precision} = \frac{TP}{TP + FP}

\text{Non-Risk Recall} = \frac{TP}{TP + FN}

\text{Risk Precision} = \frac{TN}{FN + TN}

\text{Risk Recall} = \frac{TN}{FP + TN}

\text{Non-Risk F1 Score} = \frac{2 \times \text{Non-Risk Precision} \times \text{Non-Risk Recall}}{\text{Non-Risk Precision} + \text{Non-Risk Recall}}

\text{Risk F1 Score} = \frac{2 \times \text{Risk Precision} \times \text{Risk Recall}}{\text{Risk Precision} + \text{Risk Recall}}

(2)
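As a check on the definitions in Equation (2), the following sketch computes all seven metrics from a confusion matrix laid out as in Table 1 (rows: actual Non-Risk, Risk; columns: predicted Non-Risk, Risk). The function names are hypothetical, and mapping a zero denominator to 0 is an assumed convention.

```python
def safe_div(a, b):
    """a / b, or 0.0 when the denominator is zero (an assumed convention)."""
    return a / b if b else 0.0


def metrics_from_confusion(cm):
    """cm = [[TP, FN], [FP, TN]] as in Table 1: rows are the actual
    (Non-Risk, Risk) states, columns the predicted (Non-Risk, Risk) states."""
    (tp, fn), (fp, tn) = cm
    nr_prec, nr_rec = safe_div(tp, tp + fp), safe_div(tp, tp + fn)
    r_prec, r_rec = safe_div(tn, fn + tn), safe_div(tn, fp + tn)
    return {
        "Accuracy": safe_div(tp + tn, tp + fp + tn + fn),
        "Non-Risk Precision": nr_prec,
        "Non-Risk Recall": nr_rec,
        "Risk Precision": r_prec,
        "Risk Recall": r_rec,
        "Non-Risk F1 Score": safe_div(2 * nr_prec * nr_rec, nr_prec + nr_rec),
        "Risk F1 Score": safe_div(2 * r_prec * r_rec, r_prec + r_rec),
    }


# Logistic Regression on the validation data (matrix reported in Section 3.2)
for name, value in metrics_from_confusion([[456, 2], [12, 19]]).items():
    print(f"{name}: {value:.4f}")
```

Applied to the Logistic Regression validation matrix reported in Section 3.2, the printed values (0.9714, 0.9744, 0.9956, 0.9048, 0.6129, 0.9849, 0.7308) match the Logistic Regression row of Table 3, which confirms the Non-Risk-as-positive layout.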

The ROC curve is obtained by plotting, as the classification threshold varies, the false-positive rate, defined as FP/(FP + TN), on the x-axis against the true-positive rate, defined as TP/(TP + FN), on the y-axis. A model with superior classification performance produces a curve that lies above the diagonal line y = x and bows toward the upper-left corner of the graph. Therefore, a larger AUC value indicates better overall classification performance; an AUC of 0.5 corresponds to random guessing and an AUC of 1 to perfect separation of the two classes.
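The AUC can be computed without drawing the curve: it equals the probability that a randomly chosen positive observation receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below assumes label 1 marks a risk state and the score is the predicted risk probability; treating Non-Risk as positive and using 1 - p as its score gives the same value. The quadratic pairwise loop and the function name are chosen for clarity, not speed.

```python
def roc_auc(labels, scores):
    """Area under the empirical ROC curve via pairwise comparisons
    (Mann-Whitney statistic); ties between scores count as 1/2."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            wins += 1.0 if sp > sn else 0.5 if sp == sn else 0.0
    return wins / (len(pos) * len(neg))


print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # → 0.75
```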

3.2. Performance Analysis Results

Table 2 presents the performance results of the nine classification machine learning models across multiple evaluation metrics, including Accuracy, Non-Risk Precision, Non-Risk Recall, Risk Precision, Risk Recall, Non-Risk F1 Score, Risk F1 Score, and AUC for the training data from 2015 to 2022.

Table 3 presents the performance results of the nine classification machine learning models across multiple evaluation metrics, including Accuracy, Non-Risk Precision, Non-Risk Recall, Risk Precision, Risk Recall, Non-Risk F1 Score, Risk F1 Score, and AUC for the validation data from 2023 to 2024.

The confusion matrices for the Logistic Regression, k-nearest Neighbor, Decision Tree, Random Forest, Linear Discriminant Analysis, Naïve Bayes, Quadratic Discriminant Analysis, AdaBoost, and Gradient Boosting models for the training data, laid out as in Table 1, are \begin{bmatrix} 1718 & 20 \\ 40 & 75 \end{bmatrix}, \begin{bmatrix} 1709 & 29 \\ 38 & 77 \end{bmatrix}, \begin{bmatrix} 1703 & 35 \\ 36 & 79 \end{bmatrix}, \begin{bmatrix} 1703 & 35 \\ 36 & 79 \end{bmatrix}, \begin{bmatrix} 1727 & 11 \\ 55 & 60 \end{bmatrix}, \begin{bmatrix} 1726 & 12 \\ 56 & 59 \end{bmatrix}, \begin{bmatrix} 1726 & 12 \\ 56 & 59 \end{bmatrix}, \begin{bmatrix} 1705 & 33 \\ 24 & 91 \end{bmatrix}, and \begin{bmatrix} 1707 & 31 \\ 33 & 82 \end{bmatrix}, respectively.

The confusion matrices for the Logistic Regression, k-nearest Neighbor, Decision Tree, Random Forest, Linear Discriminant Analysis, Naïve Bayes, Quadratic Discriminant Analysis, AdaBoost, and Gradient Boosting models for the validation data are \begin{bmatrix} 456 & 2 \\ 12 & 19 \end{bmatrix}, \begin{bmatrix} 448 & 10 \\ 11 & 20 \end{bmatrix}, \begin{bmatrix} 446 & 12 \\ 12 & 19 \end{bmatrix}, \begin{bmatrix} 446 & 12 \\ 12 & 19 \end{bmatrix}, \begin{bmatrix} 457 & 1 \\ 19 & 12 \end{bmatrix}, \begin{bmatrix} 457 & 1 \\ 18 & 13 \end{bmatrix}, \begin{bmatrix} 457 & 1 \\ 18 & 13 \end{bmatrix}, \begin{bmatrix} 447 & 11 \\ 11 & 20 \end{bmatrix}, and \begin{bmatrix} 446 & 12 \\ 12 & 19 \end{bmatrix}, respectively.

For risk prediction, higher values of Risk Precision, Risk Recall, Risk F1 Score, and AUC indicate better model performance. To identify the most effective model for risk prediction, we combined the Risk F1 Score and AUC as a selection criterion. On the validation data, the Logistic Regression model achieved the highest combined score, Risk F1 Score + AUC, of 1.7221 among the nine classifiers, followed by AdaBoost at 1.6335, the k-nearest neighbor model at 1.5989, and the Gradient Boosting model at 1.5963. On the validation data, the proportions of the 31 sharp decline points (risk states) correctly detected by the nine models, that is, their Risk Recall values, were 0.6129, 0.6452, 0.6129, 0.6129, 0.3871, 0.4194, 0.4194, 0.6452, and 0.6129, respectively. Overall, the Logistic Regression model demonstrated superior classification performance in predicting sharp declines in returns, outperforming the other eight classification models on the combined criterion, although the k-nearest neighbor and AdaBoost models detected slightly more of the actual sharp decline points. A plausible explanation is that the input feature is one-dimensional, so the data lie on a line; if the two classes are distributed differently along this line, the Logistic Regression model can locate an effective single threshold through the sigmoid function and separate the classes accurately.
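The selection criterion is simple arithmetic on Table 3; the short sketch below recomputes Risk F1 Score + AUC for each model from the published validation values and ranks them, reproducing the ordering stated above.

```python
# Risk F1 Score and AUC on the validation data, copied from Table 3
table3 = {
    "Logistic Regression": (0.7308, 0.9913),
    "k-Nearest Neighbors": (0.6557, 0.9432),
    "Decision Tree": (0.6129, 0.7655),
    "Random Forest": (0.6129, 0.9384),
    "Linear Discriminant Analysis": (0.5455, 0.9913),
    "Naïve Bayes": (0.5778, 0.9913),
    "Quadratic Discriminant Analysis": (0.5778, 0.9913),
    "AdaBoost": (0.6452, 0.9883),
    "Gradient Boosting": (0.6129, 0.9834),
}

# Combined criterion: Risk F1 Score + AUC, rounded to the table's precision
combined = {name: round(f1 + auc, 4) for name, (f1, auc) in table3.items()}
ranking = sorted(combined.items(), key=lambda kv: kv[1], reverse=True)
for name, score in ranking[:4]:
    print(f"{name}: {score:.4f}")
```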

Table 2 shows that, except for the Logistic Regression model, the models generally performed somewhat better on the training data than on the validation data; for example, every other model had a higher Risk F1 Score on the training data.

When the same experiment was conducted using the 2025 KOSPI 200 return data as test data, the confusion matrices for the Logistic Regression, k-nearest Neighbor, Decision Tree, Random Forest, Linear Discriminant Analysis, Naïve Bayes, Quadratic Discriminant Analysis, AdaBoost, and Gradient Boosting models were \begin{bmatrix} 134 & 0 \\ 1 & 7 \end{bmatrix}, \begin{bmatrix} 132 & 2 \\ 2 & 6 \end{bmatrix}, \begin{bmatrix} 131 & 3 \\ 1 & 7 \end{bmatrix}, \begin{bmatrix} 131 & 3 \\ 1 & 7 \end{bmatrix}, \begin{bmatrix} 133 & 1 \\ 1 & 7 \end{bmatrix}, \begin{bmatrix} 132 & 2 \\ 1 & 7 \end{bmatrix}, \begin{bmatrix} 133 & 1 \\ 1 & 7 \end{bmatrix}, \begin{bmatrix} 131 & 3 \\ 1 & 7 \end{bmatrix}, and \begin{bmatrix} 131 & 3 \\ 1 & 7 \end{bmatrix}, respectively. Among the nine classifiers, the Logistic Regression model achieved the highest combined score, Risk F1 Score + AUC, of 1.9314, followed by Linear Discriminant Analysis at 1.8731, Quadratic Discriminant Analysis at 1.8731, and the Naïve Bayes model at 1.8216. On the test data as well, with a one-dimensional input feature, the simple Logistic Regression model outperformed the more complex ensemble models.

Figure 2 shows the ROC curves of the four top-performing models on the validation dataset.

The sample period from 2015 to 2024 includes the COVID-19 pandemic, a period of extreme market volatility and structural disruption. We therefore also measured the classification performance of the nine models over a period in which crisis-specific patterns appear. Return data from 2015 to 2019 were used as the training dataset and return data from the pandemic period of 2020 to 2023 as the test dataset, and the same experiment was performed. As shown in Figure 3, the number of sharp decline points rose to 69 over this four-year test period, compared with 31 in the two-year 2023 to 2024 validation period. Among the nine classifiers, the Logistic Regression model achieved the highest combined score, Risk F1 Score + AUC, of 1.7081, followed by AdaBoost at 1.6909, the Gradient Boosting model at 1.6145, and the Linear Discriminant Analysis model at 1.5774. The proportions of the 69 sharp decline points (risk states) correctly detected by the nine models were 0.6377, 0.6957, 0.6377, 0.6377, 0.4493, 0.4203, 0.4203, 0.7246, and 0.6377, respectively. These detection rates are slightly higher than those obtained with the 2023 to 2024 validation data, which suggests that the methodology designed in this study also performs well in crisis situations.

4. Conclusions

This study aims to predict stock market risks and enhance preparedness for potential economic crises by detecting sharp declines in stock returns in advance using classification machine learning models.

Unlike previous studies, which primarily focused on forecasting next-day closing prices, volatility, or extreme financial risks, this study specifically targets the early prediction of sharp decline points in stock returns.

Return data were constructed from the KOSPI 200 Index over 10 years, from 1 January 2015 to 31 December 2024. Data from 2015 to 2022 were used for training, whereas data from 2023 and 2024 were used for validation.

A sharp decline was defined as follows: if the return on day t + 1 fell below the 5th percentile of the return distribution over the preceding 100 days, the label for day t + 1 was set to 1; otherwise, it was set to 0. The look-back window of 100 days was chosen to detect relatively frequent sharp decline points, including extreme financial risks. Using this labeling scheme, Logistic Regression, k-nearest Neighbor, Decision Tree, Random Forest, Linear Discriminant Analysis, Naïve Bayes, Quadratic Discriminant Analysis, AdaBoost, and Gradient Boosting models were trained and then evaluated on the validation dataset. The results summarized in Table 3 show that the Logistic Regression model achieved the highest performance among the nine classification models, with an accuracy of 0.9714, a Non-Risk F1 Score of 0.9849, a Risk F1 Score of 0.7308, and an AUC of 0.9913 (tied with Linear Discriminant Analysis, Naïve Bayes, and Quadratic Discriminant Analysis). These findings indicate that Logistic Regression is the most effective of the classifiers considered for predicting sharp declines in stock returns. A plausible explanation is that the input feature is one-dimensional, so the data lie on a line; if the two classes are distributed differently along this line, the Logistic Regression model can locate an effective single threshold through the sigmoid function and separate the classes accurately.

We additionally measured the classification performance of the nine models over the COVID-19 pandemic period, in which crisis-specific patterns appear. The detection rates for sharp decline points were slightly higher than those obtained with the 2023 to 2024 validation data, which suggests that the methodology designed in this study also performs well in crisis situations.

The methodology and results suggest that the early detection of such declines can enhance preparedness for potential stock market risks.

Funding

This work was supported by the Hongik University new faculty research support fund.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

References

Alaminos, David, José I. Peláez, M. Belen Salas, and Manuel A. Fernández-Gámez. 2021. Sovereign debt and currency crises prediction models using machine learning techniques. Symmetry 13: 652.
Arifovic, Jasmina, and Michael K. Maschek. 2012. Currency crisis: Evolution of beliefs and policy experiments. Journal of Economic Behavior & Organization 82: 131–50.
Asare, Clement, Derrick Asante, and John F. Essel. 2023. Probabilistic LSTM Modeling for Stock Price Prediction with Monte Carlo dropout Long Short-Term Memory Network. International Journal of Innovative Science and Research Technology 8: 2316–22.
Aydın, Suat, and Cengiz Tunç. 2023. What is the most prominent reserve indicator that forewarns currency crises? Economics Letters 231: 111282.
Bhandari, Hum N., Binod Rimal, Nawa R. Pokhrel, Ramchandra Rimal, Keshab R. Dahal, and Rajendra K. C. Khatri. 2022. Predicting stock market index using LSTM. Machine Learning with Applications 9: 100320.
Bodart, Vincent, and Jean-François Carpantier. 2023. Currency crises in emerging countries: The commodity factor. Journal of Commodity Markets 30: 100287.
Candelon, Bertrand, Elena-Ivona Dumitrescu, and Christophe Hurlin. 2014. Currency crisis early warning systems: Why they should be dynamic. International Journal of Forecasting 30: 1016–29.
Carmona, Pedro, Francisco Climent, and Alexandre Momparler. 2019. Predicting failure in the U.S. banking sector: An extreme gradient boosting approach. International Review of Economics and Finance 61: 304–23.
Climent, Francisco, Alexandre Momparler, and Pedro Carmona. 2019. Anticipating banking distress in the Eurozone: An extreme gradient boosting approach. Journal of Business Research 101: 885–96.
Coffinet, Jérôme, and Jean-Noël Kien. 2019. Detection of rare events: A machine learning toolkit with an application to banking crises. The Journal of Finance and Data Science 5: 183–207.
Filippopoulou, Chryssanthi, Emilios Galariotis, and Spyros Spyrou. 2020. An early warning system for predicting systemic banking crises in the Eurozone: A logit regression approach. Journal of Economic Behavior and Organization 172: 344–63.
Gangopadhyay, Partha. 2020. A new & simple model of currency crises: Bifurcations and the emergence of a bad equilibrium. Physica A: Statistical Mechanics and its Applications 538: 122860.
Gutiérrez, Pedro A., M. J. Segovia-Vargas, Sancho Salcedo-Sanz, C. Hervás-Martínez, Araceli Sanchis, J. Antonio Portilla-Figueras, and Francisco Fernández-Navarro. 2010. Hybridizing logistic regression with product unit and RBF networks for accurate detection and prediction of banking crises. Omega 38: 333–44.
Hastie, Trevor, Robert Tibshirani, and Jerome H. Friedman. 2009. The Elements of Statistical Learning. New York: Springer.
Kim, Ha Y., and Chang H. Won. 2018. Forecasting the Volatility of Stock Price Index: A Hybrid Model Integrating LSTM with Multiple GARCH-type Models. Expert Systems with Applications 103: 25–37.
Kim, Hyun J., and Heonchang Yu. 2024. Development of a Stock Volatility Detection Model Using Artificial Intelligence. In Annual Symposium of KIPS. Seoul: KIPS, vol. 31, pp. 576–79.
Lin, Chin-Shien, Haider A. Khan, Ruei-Yuan Chang, and Ying-Chieh Wang. 2008. A new approach to modeling early warning systems for currency crises: Can a fuzzy expert system predict the currency crises effectively? Journal of International Money and Finance 27: 1098–121.
Lyócsa, Štefan, Martina Halousková, and Erik Haugom. 2023. The US banking crisis in 2023: Intraday attention and price variation of banks at risk. Finance Research Letters 57: 104209.
McLachlan, Geoffrey J. 2004. Discriminant Analysis and Statistical Pattern Recognition. Hoboken: Wiley Interscience. ISBN 978-0-471-69115-0.
Mehtab, Sidra, Jaydip Sen, and Abhishek Dutta. 2009. Stock Price Prediction Using Machine Learning and LSTM-Based Deep Learning Models. arXiv arXiv:2009.10819v1.
Nakatani, Ryota. 2020. Macroprudential policy and the probability of a banking crisis. Journal of Policy Modeling 42: 1169–86.
Ren, Tingting, Shaofang Li, and Siying Zhang. 2024. Stock Market Extreme Risk Prediction Based on Machine Learning: Evidence from the American Market. North American Journal of Economics and Finance 74: 102241.
Roy, Saktinil. 2022. What drives the systemic banking crises in advanced economies? Global Finance Journal 54: 100746.
Tharwat, Alaa. 2016. Linear vs. Quadratic Discriminant Analysis Classifier: A Tutorial. International Journal of Applied Pattern Recognition 3: 145–80.
Virtanen, Timo, Eero Tölö, Matti Virén, and Katja Taipalus. 2018. Can bubble theory foresee banking crises? Journal of Financial Stability 36: 66–81.
Wang, Peiwan, and Lu Zong. 2023. Does machine learning help private sectors to alarm crises? Evidence from China’s currency market. Physica A: Statistical Mechanics and Its Applications 611: 128470.

Figure 1. (a) Returns and points of sharp decline for the validation data; (b) rolling standard deviation at the points of sharp decline for the validation data.

Figure 2. ROC AUC curves of four top-performing models on the validation dataset.

Figure 3. Returns and points of sharp decline for COVID-19 pandemic period.

Table 1. Confusion matrix.

	Predicted Non-Risk	Predicted Risk
Actual Non-Risk	True Positive (TP)	False Negative (FN)
Actual Risk	False Positive (FP)	True Negative (TN)

Table 2. Performance results for nine classification machine learning models for training data.

Model	Accuracy	Non-Risk Precision	Non-Risk Recall	Risk Precision	Risk Recall	Non-Risk F1 Score	Risk F1 Score	AUC
Logistic Regression	0.9676	0.9772	0.9885	0.7895	0.6522	0.9828	0.7143	0.9878
k-Nearest Neighbors	0.9638	0.9782	0.9833	0.7264	0.6696	0.9808	0.6968	0.9921
Decision Tree	0.9617	0.9793	0.9799	0.6930	0.6870	0.9796	0.6900	1.0
Random Forest	0.9617	0.9793	0.9799	0.6930	0.6870	0.9796	0.6900	1.0
Linear Discriminant Analysis	0.9644	0.9691	0.9937	0.8451	0.5217	0.9813	0.6452	0.9878
Naïve Bayes	0.9633	0.9686	0.9931	0.8310	0.5130	0.9807	0.6344	0.9878
Quadratic Discriminant Analysis	0.9633	0.9686	0.9931	0.8310	0.5130	0.9807	0.6344	0.9878
AdaBoost	0.9692	0.9861	0.9810	0.7339	0.7913	0.9836	0.7615	0.9876
Gradient Boosting	0.9655	0.9810	0.9822	0.7257	0.7130	0.9816	0.7193	0.9988

Table 3. Performance results for nine classification machine learning models for validation data.

Model	Accuracy	Non-Risk Precision	Non-Risk Recall	Risk Precision	Risk Recall	Non-Risk F1 Score	Risk F1 Score	AUC
Logistic Regression	0.9714	0.9744	0.9956	0.9048	0.6129	0.9849	0.7308	0.9913
k-Nearest Neighbors	0.9571	0.9760	0.9782	0.6667	0.6452	0.9771	0.6557	0.9432
Decision Tree	0.9509	0.9738	0.9738	0.6129	0.6129	0.9738	0.6129	0.7655
Random Forest	0.9509	0.9738	0.9738	0.6129	0.6129	0.9738	0.6129	0.9384
Linear Discriminant Analysis	0.9591	0.9601	0.9978	0.9231	0.3871	0.9786	0.5455	0.9913
Naïve Bayes	0.9611	0.9621	0.9978	0.9286	0.4194	0.9796	0.5778	0.9913
Quadratic Discriminant Analysis	0.9611	0.9621	0.9978	0.9286	0.4194	0.9796	0.5778	0.9913
AdaBoost	0.9550	0.9760	0.9760	0.6452	0.6452	0.9760	0.6452	0.9883
Gradient Boosting	0.9509	0.9738	0.9738	0.6129	0.6129	0.9738	0.6129	0.9834

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
