Application of Artificial Intelligence in Stock Market Prediction

Bellaly, Aya; Belattar, Sara; Haimoudi, El Khatir

doi:10.3390/engproc2025112034

Open AccessProceeding Paper

Application of Artificial Intelligence in Stock Market Prediction^†

by

Aya Bellaly

^1,*,

Sara Belattar

² and

El Khatir Haimoudi

¹

ISISA Team, Department of Computer Science, Faculty of Sciences, Abdelmalek Essaâdi University, Tetouan 93000, Morocco

²

Artificial Intelligence Research and Application Laboratory (AIRA), École Nationale de Commerce et de Gestion ENCG, Hassan First University, Settat 26000, Morocco

^*

Author to whom correspondence should be addressed.

^†

Presented at the 7th edition of the International Conference on Advanced Technologies for Humanity (ICATH 2025), Kenitra, Morocco, 9–11 July 2025.

Eng. Proc. 2025, 112(1), 34; https://doi.org/10.3390/engproc2025112034

Published: 15 October 2025

Download

Browse Figures

Versions Notes

Abstract

Because of the inherent volatility and non-linearity of financial markets, precise forecasting is a constant struggle. Using historical data, this study examines how well intelligent algorithms in forecast trading market trends. It contrasts a number of machine learning methods, such as Random Forest, K-Nearest Neighbors (KNN), XGBoost, and Decision Tree, with a deep learning methodology based on Long Short-Term Memory (LSTM) networks. It is possible to assess these models’ capacity to identify intricate temporal patterns because they are trained directly on historical price data rather than using specially designed technical indicators. The findings provide insights into successful data-driven forecasting techniques by highlighting the advantages and disadvantages of each approach in various market scenarios. Supporting the creation of predictive tools for well-informed decision-making in trading environments is the goal of this research.

Keywords:

stock prediction; machine learning; LSTM; XGBoost; decision tree

1. Introduction

Because of its intrinsic volatility, non-stationarity, and the large number of interrelated factors that affect asset price movements, financial market forecasting is still a difficult and challenging undertaking. The nonlinear and dynamic characteristics seen in financial time series are frequently not well captured by traditional statistical methods. Intelligent systems that use deep learning and machine learning have demonstrated great potential in recent years for raising predicted accuracy in this field. Without depending on manually created technical indicators, this doctoral study explores the use of a variety of intelligent algorithms to forecast trading market trends using raw historical market data. Random Forest, K-Nearest Neighbors (KNN) [1], XGBoost, Decision Tree, and Long Short-Term Memory (LSTM) in [2,3] networks—a kind of recurrent neural network that is especially well-suited for sequential data—are the five supervised learning approaches that are compared in this study. Time series data that depicts actual market behavior is used to train and test each model, with an emphasis on assessing the models’ capacity to identify trends, learn temporal dependencies, and generalize across various market phases. To evaluate the models’ performance using a variety of measures, including accuracy, precision, recall, and root mean square error (RMSE), a rigorous experimental framework is put into place. The study also investigates how data preprocessing techniques like windowing and normalization affect the efficacy of the model. The results demonstrate situations where some models perform better than others in terms of predictive power, computational efficiency, and robustness, exposing the relative advantages and disadvantages of each strategy. This research aids in the creation of automated decision-support tools for traders, analysts, and institutional investors by developing efficient methods for modeling financial time series using intelligent algorithms. The ultimate objective is to improve market comprehension and facilitate better-informed, data-driven investing choices in ever intricate financial landscapes.

2. Related Literature

2.1. Use of Technical Indicators in Forecasting

Numerous research stress how crucial it is to combine deep learning architectures with conventional technical indicators in order to improve prediction performance. Alongside models like Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Convolutional Neural Network (CNN), indicators like Moving Average Convergence Divergence (MACD), Relative Strength Index (RSI), and Stochastic Oscillator are commonly employed. Based on market behavior, these indicators offer structured insights that assist models in identifying patterns unique to a given domain. According to studies, for example, combining these signals with raw price data greatly increases accuracy because the models use both learnt and designed features [4].

2.2. Hybrid Model Effectiveness

Recent research suggests hybrid approaches that combine technical indicators and neural networks to overcome the drawbacks of each approach alone. An efficient method creates intelligent trading signals by combining Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks with indicators such as Moving Average Convergence Divergence (MACD), Directional Movement Index (DMI), and Know Sure Thing (KST). According to the results of these studies, GRU frequently outperforms LSTM in terms of efficiency and predictive capabilities, resulting in both increased accuracy and shorter training times [5]. These findings highlight the potential of hybrid architectures for financial forecasting, particularly in complicated and noisy market environments.

2.3. ARIMA vs. Deep Learning Approaches

Because of their simplicity and interpretability, conventional statistical methods like Autoregressive Integrated Moving Average (ARIMA) have been used for forecasting time series data for many years. However, when dealing with long-term or nonlinear dependencies, their performance deteriorates. Deep learning models, on the other hand, like LSTM, can simulate intricate, nonlinear patterns over long time periods. This difference is supported by comparative research [6], which shows that LSTM is superior at capturing long-range, nonlinear behavior, whereas ARIMA is better at short-term, linear trends. The argument for hybrid models that combine the interpretability of ARIMA with the predictive power of deep learning is supported by this comparison.

2.4. Impact of Market Volatility on Model Performance

Predictive modeling faces considerable challenges due to fluctuations in the market. By analyzing the effectiveness of indices such as the S&P 500 and Nasdaq 100 over various trading periods (e.g., 24 h, daytime, and overnight sessions), several research examine how models like LSTM react to volatility. Findings indicate that training durations and volatility levels affect LSTM’s performance, with shorter training times producing better outcomes for more volatile assets [5]. This implies that in order to maximize returns, model tweaking needs to take market-specific factors into consideration.

3. Materials and Methods

3.1. Methodology

In the data collection phase, extensive and relevant financial data is meticulously gathered from reliable and authoritative sources to ensure accuracy and representativeness. The data preprocessing stage involves cleaning, normalization, and transformation operations to prepare the raw data for machine learning algorithms. This step improves data quality and ensures consistency. During model training and development, appropriate learning algorithms are selected, configured, and trained to capture underlying patterns in the data. This phase also includes feature selection and hyperparameter tuning to optimize predictive performance Finally, model evaluation involves rigorous testing and validation using predefined performance metrics to assess the models’ accuracy, generalization ability, and robustness. Each phase is designed to contribute to the creation of scalable, reliable, and effective intelligent systems that support informed financial decision-making [7] (see Figure 1).

3.2. Data Description

This study evaluates the effectiveness and generalizability of intelligent prediction models using historical stock data from three distinct datasets. First, from 8 February 2013, stock market data for American Airlines Group Inc. (AAL) and Apple Inc. (AAPL) [8] was gathered from Kaggle [9]. Important trade characteristics like opening price, daily high and low, closing price, and trading volume are included in these statistics. Furthermore, a more comprehensive dataset comprising past stock prices for every company on the S&P 500 index was included [10]. This dataset, which includes information from hundreds of significant publicly traded companies, covers the five years prior to February 2018 and offers a more thorough market view. No technical indicators, such as moving averages or momentum metrics, were created; simply unprocessed historical data was used in every instance. The study preserves the integrity of the data and evaluates the pure predictive power of machine learning algorithms on stock price movements by utilizing only fundamental trading information.

3.3. Models Evaluated

Random Forest, XGBoost, Decision Tree (scikit-learn) in [11,12].
KNN Regressor (scikit-learn) in [13].
LSTM Network (Keras/TensorFlow) in [14].

3.4. Data Preprocessing and Training

Two separate subsets of the dataset were created: 20% of the dataset was set aside for testing, while the remaining 80% was used to train the models. The prediction models were fitted to the training set, which enabled them to discover underlying linkages and patterns in the historical stock data. The testing set was used to impartially assess the models’ generalization ability on fresh, untested data; it was not visible during the training process. Instead of just learning past trends, this division guarantees the models capacity to forecast future market behavior.

3.5. Modeling

Random Forest, K-Nearest Neighbors (KNN), XGBoost, Decision Tree, and Long Short-Term Memory (LSTM) networks were the five prediction models used in this investigation. To provide a uniform and equitable foundation for comparison, the same training set was used to train each model. To improve each model’s predictive performance, hyperparameters were [manually adjusted/optimized via grid search]. Choosing the ideal number of trees, tree depth, number of neighbors, learning rate, and other model-specific parameters was part of the tuning process for the conventional machine learning models (Random Forest, KNN, XGBoost, and Decision Tree). To improve temporal pattern recognition and reduce overfitting, key parameters for the LSTM model were meticulously modified, including the number of neurons, hidden layers, batch size, and learning rate. The study intends to impartially assess each model’s capacity to forecast stock market movements, concentrating on both short-term volatility and longer-term trends, by preserving the same training data and methodically adjusting hyperparameters. This work purposefully applies LSTM to univariate closing price data, even though it is commonly used for multivariate time series. Assessing the model’s ability to directly learn temporal dynamics from raw price sequences is the aim. It is anticipated that the LSTM will detect latent patterns like momentum or reversal behaviors over time using a sliding window of 60 days. Using the same constrained feature set, this also acts as a baseline against conventional models.

3.6. Tools and Libraries

This study’s experiments were all carried out with Python programming language (Python 3.10).utilizing a number of robust libraries designed for machine learning and data research. The Scikit-learn library was used to implement algorithms like Random Forest and Support Vector Regression (SVR) for both traditional and ensemble machine learning models. It was also used to carry out preprocessing tasks like feature scaling, data normalization, and model evaluation through cross-validation methods. For gradient-boosted decision trees, the XGBoost library was used because of its scalability to big datasets, regularization capabilities, and speed. The Keras Application Programming Interface (API) with a TensorFlow backend was utilized for deep learning models, especially those that involve sequential data, such LSTM (Long Short-Term Memory). This framework provided for effective training; the LSTM model’s hyperparameters were manually chosen for this investigation based on previous tests.There was no usage of automated search methods like random or grid search.

4. Results

This section presents the examination of various machine learning and deep learning frameworks, used for stock price prediction. Each model was evaluated using the R² score (accuracy), Mean Squared Error (MSE), and the performance of the last day’s prediction compared to the actual closing price.

Evaluation Metrics

We employ the following measures to evaluate the regression models’ performance. Each of these metrics offers a unique viewpoint on how effectively a model predicts events that cannot be seen and how well it reflects the underlying relationships in the data.

Mean of Actual Values ( $\bar{y}$ )

The mean of actual values represents the central tendency or average of all true target values in the dataset:

\bar{y} = \frac{1}{n} \sum_{i = 1}^{n} y_{i}

where n indicates the complete count of observations and

y_{i}

denotes the value that has been actually measured, for instance, i. When evaluating other metrics, such as the

R^{2}

score, this value is essential because it acts as a baseline reference. It can assist in determining whether the forecasts substantially depart from expected results and represents the typical market behavior.

2.: Mean Squared Error (MSE)

The Mean Squared Error calculates the average of the squared differences between the predicted and actual values.

MSE = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}

where the number of predictions is n, the corresponding actual value is

y_{i}

, and the anticipated value is

{\hat{y}}_{i}

. Because of the squaring process, MSE is sensitive to outliers and penalizes greater errors more than smaller ones. The projected values are generally closer to the actual values when the MSE is lower. When all predictions are assessed using the same scale, it is extremely helpful for quantitatively evaluating the prediction accuracy of models.

3.: Coefficient of Determination ( $R^{2}$ Score)

The

R^{2}

score measures how much of the variation in the dependent variable can be anticipated based on the independent variables.

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

In this case, the denominator is the overall variance in the data, and the numerator is the residual sum of squares, or the unexplained variance. The

R^{2}

value is a number between 0 and 1, where:

R^{2} = 1

signifies that the model does not exceed the accuracy of simply estimating the average of the target values;

R^{2} = 0

implies that the model is equivalent to a basic average prediction; and negative values indicate that the model’s performance is inferior to that of a simple average prediction.

A high

R^{2}

in regression problems indicates that the model captures the link between input features and the target variable and effectively explains the variability in the data.

The following Table 1 summarizes the key prediction results obtained from different models:

The results of the proposed approach are presented in the following graphs. Figure 2, Figure 3, Figure 4 and Figure 5 illustrate the performance of the model under different conditions and demonstrate its effectiveness compared to baseline methods. Key metrics such as accuracy, error rate, and processing time are shown to highlight the improvements achieved.

5. Discussion

Remarks on Model Performance

Random Forest Regressor: Demonstrated very high accuracy with a low mean squared error, indicating a strong ability to fit the data.
XGBoost Regressor: Achieved the best generalization performance among all models, with the lowest error and strong predictive stability.
LSTM Neural Network: Showed good performance, particularly in capturing temporal dependencies, but overall accuracy was lower than tree-based models.
KNN Regressor: Performed with excellent accuracy when tuned correctly, though it may be sensitive to feature scaling.
Decision Tree Regressor: Produced good results, but showed signs of overfitting on the training data compared to ensemble methods.

As shown in Table 1, the tree-based models, particularly Random Forest, XGBoost, and KNN Regressor, achieved extremely high accuracy, with R² scores approaching 0.998. XGBoost demonstrated the best overall performance, with the lowest Mean Squared Error (MSE) of 0.1992 and a last-day prediction very close to the actual closing price. The KNN model also performed comparably, showing that with proper hyperparameter tuning, it can match the performance of more complex ensemble methods. On the other hand, the LSTM neural network, while effective, showed a lower R² score of 0.90991 and a higher MSE of 2.0270. This is understandable, given that LSTM networks generally require more extensive tuning, larger datasets, and longer training times to outperform traditional machine learning models on structured, non-sequential data such as stock closing prices. The Decision Tree Regressor performed well but slightly overfit the training data, as evidenced by its comparatively higher MSE. Despite this, it maintained a strong predictive performance for the test set. Overall, the results highlight the effectiveness of tree-based ensemble methods in financial market prediction when using historical price data without feature engineering. Deep learning methods like LSTM may require further enhancements, such as larger datasets or hybrid models, to outperform traditional models in this specific context.

6. Conclusions

The use of intelligent algorithms to forecast stock market movements based on previous trade data was investigated in this study. Random Forest, XGBoost, K-Nearest Neighbors (KNN), Decision Tree, and Long Short-Term Memory (LSTM) networks are five supervised learning algorithms that are compared in this study to give a thorough assessment of their advantages and disadvantages in financial forecasting applications.

With R² values close to 0.998 and the lowest Mean Squared Errors (MSE), the results show that tree-based ensemble models—in particular, XGBoost and Random Forest—performed better. When trained on unprocessed historical data, these models showed excellent predictive potential and were quite successful even without the inclusion of engineered technical indicators. Under ideal parameter choices, KNN’s performance closely matched that of the ensemble models, demonstrating encouraging results as well.

LSTM networks effectively captured the temporal relationships present in financial time series, while having a larger MSE and a lower R² score than tree-based models. To realize their full potential, their performance indicates that more fine-tuning, bigger datasets, or incorporation into hybrid frameworks are required.

Future research could improve model generalization by expanding the assessment to include more assets and different market circumstances. Furthermore, investigating ensemble approaches that integrate deep learning with conventional models or adding outside elements like sentiment analysis or macroeconomic indicators may result in forecasting systems that are more resilient and flexible.

Ultimately, the results lend credence to the application of clever, data-driven models in creating tools that assist analysts and investors in making more strategic and informed trading decisions in erratic financial markets.

Author Contributions

Conceptualization, A.B.; methodology, A.B.; software, A.B.; validation, A.B., S.B. and E.K.H.; formal analysis, A.B.; investigation, A.B.; resources, A.B., S.B. and E.K.H.; data curation, A.B.; writing—original draft preparation, A.B.; writing—review and editing, A.B.; visualization, S.B. and E.K.H.; supervision, S.B. and E.K.H.; project administration, E.K.H.; funding acquisition, S.B. and E.K.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from publicly accessible sources.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Belattar, S.; Abdoun, O.; Haimoudi, E.K. Performance analysis of the application of convolutional neural networks architectures in the agricultural diagnosis. Indones. J. Electr. Eng. Comput. Sci. 2022, 27, 156–162. [Google Scholar] [CrossRef]
Mostafavi, S.M.; Hooman, A.R. Key technical indicators for stock market prediction. Mach. Learn. Appl. 2025, 20, 100631. [Google Scholar] [CrossRef]
Belattar, S.; Abdoun, O.; Haimoudi, E.K. Comparing machine learning and deep learning classifiers for enhancing agricultural productivity. Case study: Larache Province, Northern Morocco. Int. J. Electr. Comput. Eng. 2023, 13, 1689–1697. [Google Scholar] [CrossRef]
Saud, A.S.; Shakya, S. Technical indicator empowered intelligent strategies to predict stock trading signals. J. Open Innov. Technol. Mark. Complex. 2024, 10, 100398. [Google Scholar] [CrossRef]
Zhang, X.; Pinsky, E. S&P-500 vs. Nasdaq-100 price movement prediction with LSTM for different daily periods. Mach. Learn. Appl. 2025, 19, 100617. [Google Scholar] [CrossRef]
Najem, R.; Bahnasse, A.; Talea, M. Toward an Enhanced Stock Market Forecasting with Machine Learning and Deep Learning Models. Procedia Comput. Sci. 2024, 241, 97–103. [Google Scholar] [CrossRef]
Belattar, S.; Abdoun, O.; Haimoudi, E.K. New learning approach for unsupervised neural networks model with application to agriculture field. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 360–369. [Google Scholar] [CrossRef]
Ayah, F. Stock Market Analysis & Prediction Using LSTM. 2023. Available online: https://www.kaggle.com/code/faressayah/stock-market-analysis-prediction-using-lstm/input (accessed on 19 June 2025).
Fahmi, A. AAL Stock Prediction. 2023. Available online: https://www.kaggle.com/code/aufafahmi/aal-stock-prediction (accessed on 18 June 2025).
Paparaju, T. Apple (AAPL) Historical Stock Data. 2023. Available online: https://www.kaggle.com/datasets/tarunpaparaju/apple-aapl-historical-stock-data (accessed on 19 June 2025).
Liu, H.; Chang, L.; Tian, Y.; Chen, H.; Zhang, P.; Song, B.; Zheng, S.; Liu, T. Research on time-domain motion prediction of floating platforms based on XGBoost model. Ocean Eng. 2025, 332, 121393. [Google Scholar] [CrossRef]
Zhang, H.; Zhong, J. Efficient and Effective Counterfactual Explanations for Random Forests. Expert Syst. Appl. 2025, 293, 128661. [Google Scholar] [CrossRef]
Armghan, A.; Htay, M.M.; Alsharari, M.; Aliqab, K.; Surve, J.; Patel, S.K. Performance enhancing solar energy absorber with structure optimization and absorption prediction with KNN regressor model. Alex. Eng. J. 2023, 82, 531–540. [Google Scholar] [CrossRef]
Liu, C.; Ren, H.; Li, G.; Ren, H.; Liang, X.; Yang, C.; Gui, W. Singular Value Decomposition-based lightweight LSTM for time series forecasting. Future Gener. Comput. Syst. 2026, 174, 107910. [Google Scholar] [CrossRef]

Figure 1. Prediction process.

Figure 2. Stock market prediction using Decision Tree.

Figure 3. Stock market prediction using XGBoost.

Figure 4. Stock market prediction using KNN.

Figure 5. Stock market prediction using Random Forest.

Table 1. Model performance comparison.

Model	R² Score	MSE
Random Forest	0.99796	0.2382
XGBoost	0.99830	0.1992
LSTM	0.90991	2.0270
KNN	0.99830	0.1995
Decision Tree	0.99618	0.4470

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Bellaly, A.; Belattar, S.; Haimoudi, E.K. Application of Artificial Intelligence in Stock Market Prediction. Eng. Proc. 2025, 112, 34. https://doi.org/10.3390/engproc2025112034

AMA Style

Bellaly A, Belattar S, Haimoudi EK. Application of Artificial Intelligence in Stock Market Prediction. Engineering Proceedings. 2025; 112(1):34. https://doi.org/10.3390/engproc2025112034

Chicago/Turabian Style

Bellaly, Aya, Sara Belattar, and El Khatir Haimoudi. 2025. "Application of Artificial Intelligence in Stock Market Prediction" Engineering Proceedings 112, no. 1: 34. https://doi.org/10.3390/engproc2025112034

APA Style

Bellaly, A., Belattar, S., & Haimoudi, E. K. (2025). Application of Artificial Intelligence in Stock Market Prediction. Engineering Proceedings, 112(1), 34. https://doi.org/10.3390/engproc2025112034

Article Menu

Application of Artificial Intelligence in Stock Market Prediction^†

Abstract

1. Introduction

2. Related Literature