Abstract
This paper presents an analysis of stock price forecasting in the financial market, with an emphasis on approaches based on time series models and deep learning techniques. Fundamental concepts of technical analysis are explored, such as exponential and simple moving averages, and various global indices are analyzed to be used as inputs for machine learning models, including Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), Convolutional Neural Network (CNN), and XGBoost. The results show that while each model possesses distinct characteristics, selecting the most efficient approach heavily depends on the specific data and forecasting objectives. The complexity of advanced models such as XGBoost and GRU is reflected in their overall performance, suggesting that they can be particularly effective at capturing patterns and making accurate predictions in more complex time series, such as stock prices.
1. Introduction
The fusion between technology and finance has radically transformed the way markets operate and how investors make decisions. With the emergence of online trading platforms, high-frequency trading algorithms and the increasing use of Artificial Intelligence (AI), the financial landscape is experiencing an unprecedented digital revolution.
This convergence is redefining the boundaries of what is possible in the stock market, offering new opportunities and challenges for investors and analysts. The ability to process large volumes of data in real time and apply advanced analytics algorithms is creating new opportunities in forecasting and risk management. In this context, the research and development of AI-based forecasting models represents a growing area of interest [1].
The stock market is a global environment where millions of investors buy and sell shares in companies, representing a fraction of a company’s share capital. The purpose of these transactions is to profit from fluctuations in asset prices. For many, investing in the stock market is an essential part of their financial strategy, as it offers an opportunity to grow their capital over time in a passive way, often surpassing the rates of return offered by more traditional investments, such as bank deposits. However, stock market trading is also known for its unpredictability and high volatility. Predicting future market movements is a challenging and highly desirable task. Investors are constantly looking for new methods and techniques to anticipate market changes and make more informed decisions about their investment portfolios.
Throughout history, investors and analysts have employed a variety of methods and techniques to anticipate stock market behavior. From fundamental analysis, which evaluates financial performance and potential company growth, to technical analysis, which examines past price patterns to identify future trends, a wide range of approaches have been explored. However, even with all these efforts, the ability to accurately predict market movements remains a challenging and evolving open issue.
Recently, with technological advances and increasing data availability, new opportunities have emerged to apply machine learning (ML) techniques in stock market forecasts. ML, a subfield of AI, focuses on the development of algorithms capable of learning patterns and making data-driven predictions. By analyzing vast sets of historical data, ML algorithms can identify complex correlations and subtle patterns that might otherwise be missed by traditional forecasting methods [2].
In this work, we aim to explore the potential of ML algorithms in stock market prediction. Predictive models are developed to capture the complexity and dynamics of the market, providing valuable insights for investors. By combining advanced ML techniques with an in-depth understanding of financial markets, this study seeks to contribute to the advancement of the field and deliver tangible benefits to those operating in the stock market. To that end, this study distinguishes itself from other works by establishing a basis in the field of time series forecasting in the stock market, not only by choosing between various algorithms, such as LSTM, GRU, CNN, RNN, and XGBoost, but also by testing different combinations of these, for instance, LSTM + CNN, LSTM + GRU, GRU + CNN, RNN + GRU, and RNN + LSTM, and different numbers of layers for each model and combination of algorithms. A more detailed analysis of the selection of the best features, the input window size, and the hyperparameters is also provided. The main contributions of this work are as follows:
- Providing a basic understanding of how the stock market works and how ML is being used to predict it.
- Evaluating which features are best suited to be used as inputs to stock market prediction models.
- Developing and applying various ML models for stock price prediction.
- Evaluating and comparing the performance of different models using a variety of metrics to identify which techniques and combination of techniques provide the best results in stock price prediction.
The article is structured as follows. Section 2 describes the dataset utilized, the evaluation metrics applied, and the data preparation process for the models. Section 3 introduces the forecasting models employed for stock price prediction. Section 4 presents the results of the study, including a comparative analysis of the applied forecast models. Section 5 provides a discussion of the results. Finally, Section 6 addresses the main conclusions and outlines potential directions for future work.
1.1. Literature Review
In recent years, the application of machine learning and deep learning techniques in financial markets has garnered significant interest, particularly for stock market price forecasting. Li et al. (2023) investigated the use of Long Short-Term Memory (LSTM) networks to predict the stock prices of major technology companies, including Apple Inc. (Cupertino, CA, USA); Alphabet Inc. (Mountain View, CA, USA), owner of Google; Microsoft Corporation (Redmond, WA, USA); and Amazon.com, Inc. (Seattle, WA, USA and Arlington, VA, USA) [3]. The researchers utilized historical stock price data from Yahoo Finance, spanning over a decade, to train their LSTM model. The study demonstrated that the LSTM was able to capture complex patterns and trends in stock price movements, leading to reasonably accurate predictions. However, the authors highlighted limitations, such as the need for a larger dataset and the use of additional evaluation metrics, beyond the RMSE employed, to provide a more comprehensive performance analysis.

Sonkavde et al. (2023) provided a systematic review of machine learning and deep learning techniques in financial forecasting, emphasizing ensemble models such as a hybrid of Random Forest, XGBoost, and LSTM. Their findings concluded that these models outperform individual algorithms, offering improved accuracy and reduced errors in stock price predictions. By implementing and testing ensemble methods on specific stock datasets, the study confirms the potential of integrated approaches to address the complexities of financial data [4].

Hoque and Aljamaan (2021) conducted a detailed study on the impact of hyperparameter tuning on the performance of machine learning models in stock price forecasting. Their research focused on the Saudi Stock Exchange. This study's goal was to evaluate and compare the predictive capabilities of eight machine learning models, including Decision Trees (DTs), Support Vector Regression (SVR), K-Nearest Neighbors (KNN), Gaussian Process Regression (GPR), Stochastic Gradient Descent (SGD), Partial Least Squares Regression (PLS), Kernel Ridge Regression (KRR), and the Least Absolute Shrinkage and Selection Operator (LASSO), both with and without hyperparameter tuning. The study's conclusions were significant: hyperparameter tuning substantially improved the forecasting accuracy of most models, with SVR emerging as the best performer after tuning. Additionally, the research emphasized that the default hyperparameter configurations of machine learning models are often suboptimal, and tuning is essential for achieving robust predictions. This insight is particularly valuable for practitioners and researchers aiming to apply machine learning techniques in financial markets [5].

Gülmez (2023) introduced a novel approach combining LSTM with the Artificial Rabbits Optimization (ARO) algorithm to enhance the prediction accuracy of stock prices. This study focused on the Dow Jones Industrial Average (DJIA) index and evaluated the model against various alternatives, including traditional Artificial Neural Networks (ANNs), unoptimized LSTMs, and LSTMs optimized using Genetic Algorithms (GAs). To benchmark the performance, the research employed multiple evaluation metrics, such as the Mean Squared Error (MSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and the coefficient of determination (R²).
Among these metrics, the LSTM-ARO model exhibited the lowest error rates (MSE, MAE, and MAPE) and the highest R², indicating its superior ability to model the financial data [6].

Another contribution by Nabipour et al. (2020) explored the effectiveness of various machine learning models, including Decision Tree, Bagging, Random Forest, Adaptive Boosting (Adaboost), Gradient Boosting, XGBoost, ANNs, Recurrent Neural Networks (RNNs), and LSTMs, in predicting the stock market groups within the Tehran Stock Exchange. Using a decade of historical data and technical indicators as input features, the study highlighted that the LSTM demonstrated superior accuracy compared to the other models. The research emphasized the importance of deep learning techniques in managing the inherent non-linearity of the data, while also recommending the exploration of ensemble approaches for enhanced performance and their application to different stock markets [7].

Naufal and Wibowo (2023) proposed a hybrid deep learning model integrating Convolutional Neural Networks (CNNs), LSTM, and Gated Recurrent Units (GRUs) for stock price forecasting across Tesla, Inc., Alphabet Inc., and Twitter, Inc. while it was still publicly traded. By combining the strengths of these architectures, the hybrid model achieved improved prediction accuracy over standalone LSTM networks, effectively addressing both the short- and long-term dependencies in stock data. The study concluded that hybrid models are particularly advantageous in managing the complexities of dynamic and non-linear stock market trends [8].

Zhang et al. (2023) proposed a hybrid model combining CNN, BiLSTM, and an attention mechanism for stock price prediction, addressing the non-linear, volatile, and high-frequency nature of financial data. The model leverages the ability of CNNs to extract local non-linear features, along with the capacity of BiLSTM to capture bidirectional temporal features. Additionally, the attention mechanism was incorporated to automatically adjust the weights assigned to the information features, enhancing prediction accuracy. The model was tested on 12 stock indices, including the CSI 300 from China and 8 international markets, consistently demonstrating superior performance compared to alternatives such as the standalone LSTM, CNN-LSTM, and CNN-Attention models from previous works. Evaluation metrics such as the RMSE, MAPE, and R² confirmed the model's accuracy in handling diverse market data [9].

Mehtab and Sen (2020) introduced a suite of five deep learning-based regression models for forecasting the NIFTY 50 index, using historical data from December 2008 to July 2020. The proposed models included two CNN-based architectures and three variants of LSTM models, evaluated using a multi-step prediction approach with walk-forward validation. Among these, the encoder–decoder CNN-LSTM model, which utilized two weeks of historical data, achieved the highest prediction accuracy, while the univariate CNN model with one week of data was the fastest in terms of execution. Their study highlighted the ability of hybrid architectures to effectively capture complex temporal patterns in financial time series, offering both accuracy and computational efficiency. The authors also suggested the potential for future research involving generative adversarial networks (GANs) to improve forecasting accuracy [10].
2. Materials and Methods
In this section, the techniques and processes used are described. It begins with a description of the dataset, followed by the presentation of all the features integrated into the dataset. Finally, the methods used to make predictions are detailed through the algorithms and the presentation of the developed models.
2.1. Dataset
The initial dataset is composed of historic data of Apple Inc. collected from Yahoo Finance [11]. This dataset includes over 40 years of stock prices and is organized in 7 columns, containing the Open, High, Low, Close, and Adjusted Close prices, as well as the date and the volume of transactions, as shown in Table 1.
Table 1.
Apple dataset.
2.2. Features
A further 43 features were added to the initial dataset and tested using both the correlation method and SelectKBest, based on their relationship with the target variable, defined as the value of the Adj Close price [12,13]. The first 27 features are directly related to Apple Inc. stock, including price data, transaction volume, and technical indicators such as moving averages and momentum metrics. The remaining features are a combination of interest rates and indices.
All features were selected based on their popularity, including Exponential Moving Averages and Simple Moving Averages, as well as those identified in the study by Hoseinzade and Haratizadeh [14], which evaluates a diverse array of variables for use as features in prediction models. These features were either calculated or gathered using several sources, including Yahoo Finance, the Federal Reserve Economic Data (FRED), an online database managed by the Federal Reserve Bank of St. Louis [15], and the Pandas technical analysis library, TA-Lib, which offers a comprehensive set of technical indicators [12]. A complete list of these features can be found in Table 2.
Table 2.
Features tested.
Starting with the correlation analysis, the results can be observed in Table 3, which indicates the correlation of all features with the target variable. Correlation analysis is a fundamental approach for understanding the relationship between each feature and the target variable. It is noteworthy that the top 20 features exhibit significantly higher correlation values than the others, as these are variables related to the price, moving averages, or indices, which naturally track stock price fluctuations closely.
Table 3.
Correlation.
In addition to the correlation analysis, the SelectKBest method was employed to select the best features. This method is one of the most commonly used feature selection techniques and is based on machine learning filters. It utilizes statistical tests to identify the features that have the strongest relationship with the output variable, with the procedure initially involving the definition of the appropriate statistical test based on the type of data and the problem at hand. For regression problems, the SelectKBest method provides the f_regression option, which was used here to select the best features [16]. Subsequently, the test was applied to each feature to calculate an importance score. The features with the highest scores were selected, and the dataset was transformed to include only these features. Upon applying this method to the 49 features, the scores shown in Table 4 were obtained.
Table 4.
SelectKBest score.
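As an illustration of the two selection approaches just described, the sketch below ranks features by Pearson correlation with the target and by the SelectKBest f_regression score. It assumes the 49 candidate features and the next-day target are columns of a DataFrame named `df` with a "Target" column; both names are assumptions, not the study's actual code.

```python
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_regression

def rank_features(df: pd.DataFrame, target_col: str = "Target", k: int = 20):
    X = df.drop(columns=[target_col])
    y = df[target_col]

    # Correlation of every feature with the target (cf. Table 3)
    corr = X.corrwith(y).abs().sort_values(ascending=False)

    # SelectKBest scores using the f_regression statistical test (cf. Table 4)
    selector = SelectKBest(score_func=f_regression, k=k).fit(X, y)
    scores = pd.Series(selector.scores_, index=X.columns).sort_values(ascending=False)

    return corr.head(k), scores.head(k)
```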
Analyzing Table 3 and Table 4, it can be concluded that the ranking of the top 20 features does not vary between the two selection methods. Furthermore, the top 20 features achieved much higher scores than the others, particularly in the correlation analysis, where the 20th best feature (NYA) scored considerably higher than the 21st (MACD). This indicates a substantial difference in the importance of the features for the prediction model.
After this analysis, it was decided to use only the 20 best features in the forecasting models. These include 4 variables from the initial dataset (Adj Close, Low, High, and Open); 12 technical indicators, comprising 5 Simple Moving Averages (SMAs) and 7 Exponential Moving Averages (EMAs) of different sizes; and, finally, 4 indices: the NASDAQ Composite (IXIC), S&P 500 (GSPC), Dow Jones Industrial Average (DJI), and NYSE Composite (NYA).
2.3. Performance Measures
Before addressing the machine learning models and the data preparation, it is imperative to choose several performance metrics to evaluate such models. These metrics play a fundamental role in the evaluation of these algorithms, providing an objective measure of the quality of the predictions in relation to the actual values. In order to evaluate the performance of the different models, a set of metrics specialized for the regression task was used, since it is essential to select metrics that capture both the magnitude and the direction of the prediction errors. Common metrics such as the Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE) are often used due to their easy interpretation and ability to provide a clear measure of forecast accuracy. It was therefore decided to use a set of five different metrics to evaluate the different models: the Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), and the Coefficient of Determination (R²).
The MAE is a simple measure of the average of the absolute differences between forecasts and actual values. This metric provides a direct indication of the average magnitude of forecast errors, regardless of their direction. In simple terms, the MAE is calculated as the average of the absolute differences between forecasts and actual values (Equation (1)), where a smaller absolute difference indicates better forecast quality [17]:

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \tag{1}$$

where $y_i$ represents the actual values, $\hat{y}_i$ the predicted values, and $n$ is the total number of observations.
The MSE (Equation (2)) is another common metric used to evaluate the performance of regression models, measuring the average of the squared differences between the predicted and actual values. This metric emphasizes larger errors more than smaller ones, since the errors are squared before being averaged, making it sensitive to outlier values [18]:

$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2 \tag{2}$$

where $y_i$ represents the actual values, $\hat{y}_i$ the predicted values, and $n$ is the total number of observations.
In addition to these two metrics, the RMSE, which is a variant of the MSE, was also used. The RMSE is the square root of the MSE, providing a measure of the average magnitude of prediction errors on a scale similar to the actual values. The RMSE is widely used due to its interpretability and ability to provide a clear measure of forecast accuracy (Equation (3)). As the RMSE is expressed in the same unit as the actual values, it is easier to interpret and compare with them [19]:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \tag{3}$$

where $y_i$ represents the actual values, $\hat{y}_i$ the predicted values, and $n$ is the total number of observations.
Another important metric is the MAPE, which is useful for understanding the average percentage error of forecasts in relation to the actual values. The MAPE is the average of the absolute percentage differences between forecasts and actual values (Equation (4)). It is especially useful when the relative accuracy of the forecasts needs to be understood, regardless of the scale of the data. For example, in the context of predicting stock prices, comparing the absolute values of the previous metrics between different stocks, or between different subsets of the same stock price dataset, is of little use. In such cases, a metric that yields a percentage value, like the MAPE, proves to be very useful:

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \tag{4}$$

where $y_i$ represents the actual values, $\hat{y}_i$ the predicted values, and $n$ is the total number of observations.
Finally, R² was also used, an important statistical metric that indicates the proportion of the variability in the data that is explained by the model (Equation (5)). Values closer to 1 indicate a good fit of the model to the data, while negative values and values closer to 0 indicate a poor fit. R² is a useful metric for understanding the explanatory power of the model and, like the MAPE, is especially useful for quick comparisons between different models [20]:

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \tag{5}$$

where $y_i$ represents the actual values, $\hat{y}_i$ the predicted values, $n$ is the total number of observations, and $\bar{y}$ is the mean of the actual values.
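As a practical illustration, the five metrics in Equations (1)–(5) can be computed with scikit-learn and NumPy; the sketch below is a minimal helper, not the exact evaluation code used in the study.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    mae = mean_absolute_error(y_true, y_pred)                  # Equation (1)
    mse = mean_squared_error(y_true, y_pred)                   # Equation (2)
    rmse = np.sqrt(mse)                                        # Equation (3)
    mape = np.mean(np.abs((y_true - y_pred) / y_true)) * 100   # Equation (4), in percent
    r2 = r2_score(y_true, y_pred)                              # Equation (5)
    return {"MAE": mae, "MSE": mse, "RMSE": rmse, "MAPE": mape, "R2": r2}
```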
2.4. Data Processing
Data processing is a fundamental step in building reliable forecasting models, especially when comparing them. It is very important to provide the models with consistent data, so that performance measurements reflect the models themselves rather than the way the data were provided. It also ensures that the data are appropriately formatted and cleaned, which not only enhances the accuracy of the models but also improves their overall stability and performance. For this study, data from multiple sources were collected and processed to create a robust dataset for training the machine learning models, as stated in the previous section.
2.4.1. Data Acquisition
Historical stock data from Apple Inc. and major financial indices, including IXIC, GSPC, DJI and NYA, were obtained using the download function of the yfinance library [13]. The download function allowed access to the complete time series data from the inception of these indices and the company itself. This approach ensured that the data captured all significant market trends and stock price movements over the maximum available period.
Once acquired, the indices’ datasets underwent a thorough cleaning process. Columns irrelevant to the modeling task, such as Open, High, Low and Volume, were removed. The cleaned datasets retained only the essential variables required for the prediction task and the integration with the initial Apple Inc. dataset, such as the adjusted closing price (Adj Close), and the Date column.
The integration process began by aligning the dates of the financial indices with the Apple Inc. stock data. The adjusted closing prices of each index were merged with the stock data based on the date, ensuring that the dataset was fully synchronized.
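A minimal sketch of this acquisition and integration step is given below, assuming the yfinance ticker symbols "AAPL", "^IXIC", "^GSPC", "^DJI", and "^NYA"; the exact column layout can vary between yfinance versions, so this is illustrative rather than the study's actual code.

```python
import yfinance as yf

# Download the full available history for Apple and the four indices.
apple = yf.download("AAPL", period="max", auto_adjust=False)

indices = {"IXIC": "^IXIC", "GSPC": "^GSPC", "DJI": "^DJI", "NYA": "^NYA"}
for name, symbol in indices.items():
    idx = yf.download(symbol, period="max", auto_adjust=False)
    # Keep only each index's adjusted close and align it with Apple's dates.
    apple = apple.join(idx[["Adj Close"]].rename(columns={"Adj Close": name}), how="inner")
```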
2.4.2. Calculation of Technical Indicators
In addition to the raw stock prices, technical indicators were computed to enrich the feature set. These indicators included, as discussed previously, several SMA and EMA, both of which are widely used in financial analysis to capture trends and momentum in stock prices.
Multiple SMA and EMA values were calculated with varying window lengths, based on historical data up to and including the day before the prediction, to provide the model with a range of perspectives on stock price movements. Specifically, SMAs were calculated over periods of 5, 25, 50, 100, and 200 days, while EMAs were calculated for 10, 12, 20, 26, 50, 100, and 200 days. These indicators helped capture both short-term and long-term price trends.
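The sketch below computes these indicators directly with pandas rolling and exponentially weighted means on the adjusted close; the study cites a technical analysis library, so the exact implementation may differ, but the window lengths follow those listed above.

```python
sma_windows = [5, 25, 50, 100, 200]
ema_windows = [10, 12, 20, 26, 50, 100, 200]

for w in sma_windows:
    apple[f"SMA_{w}"] = apple["Adj Close"].rolling(window=w).mean()
for w in ema_windows:
    apple[f"EMA_{w}"] = apple["Adj Close"].ewm(span=w, adjust=False).mean()
```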
2.4.3. Normalization and Data Preparation
After integrating the technical indicators and financial indices, the last step in the construction of the final dataset was to add a target variable, i.e., the variable that the model will be predicting, which, as stated previously, is the value of the Adj Close of the next day. To achieve this, it was only necessary to create a new column in the dataset equal to the shifted value of the Adj Close. With this, a dataset was obtained containing a total of 21 columns, with the first 20 columns being the 20 selected features and the 21st column representing the target variable.
The dataset was then normalized using the MinMaxScaler function from the very popular sklearn.preprocessing library [13], which scaled the data to a predetermined range from 0 to 1. This step is very important for improving the performance and efficiency of machine learning models, helping to ensure that all the features contribute equally to the modeling process, preventing some variables with higher values from dominating others with lower values.
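A sketch of the target construction and normalization steps is shown below, where `apple` is the integrated DataFrame and `selected_features` is a hypothetical list holding the 20 chosen feature names.

```python
from sklearn.preprocessing import MinMaxScaler

data = apple[selected_features].copy()
data["Target"] = data["Adj Close"].shift(-1)   # next-day adjusted close as the target
data = data.dropna()                           # drop rows with incomplete indicators or no next-day value

scaler = MinMaxScaler(feature_range=(0, 1))
scaled = scaler.fit_transform(data.values)     # shape: (n_samples, 21)
```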
At this stage, the data were prepared for input into the machine learning models. The next step was organizing the data into time sequences of a fixed size, where each sequence contains all the features. To do this, we first needed to use a concept called an input window, which involves incorporating sets of sequential observations. To establish a value for the time window, some tests were carried out with two simple models, one with two GRU layers and the other with two LSTM layers, which concluded that the best value would be 100. The tests consisted of training each model and predicting the value of the prices 10 times while keeping the 80/20 split between the train and test subsets. The metrics for each prediction were recorded, and their average values were calculated and are presented in Table 5 and Table 6.
Table 5.
Input window—GRU.
Table 6.
Input window—LSTM.
The last step in this process was to split the data into training and test sets in order to evaluate the models with data not seen during the training process. A common proportion is 80% of the data reserved for training and 20% for testing. It should be noted that the incorporation of a validation subset between the training and test data, with a size of 15%, was also tested. Incorporating this subset resulted in generally worse performance for the models, particularly those utilizing CNN and RNN algorithms, which experienced decreases in the R² metric of 21 and 33 percentage points, respectively. Therefore, it was decided to use only the training and test subsets, since the main reason for incorporating a validation subset was to reduce overfitting on the training data in order to improve performance on the test data, which was not demonstrated.
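The windowing and chronological 80/20 split might look as follows, assuming `scaled` is the normalized array from the previous sketch, with the 20 features followed by the target column; the exact sequence alignment is an assumption.

```python
import numpy as np

WINDOW = 100  # input window size selected from the tests in Tables 5 and 6

def make_sequences(arr: np.ndarray, window: int = WINDOW):
    X, y = [], []
    for i in range(window, len(arr) + 1):
        X.append(arr[i - window:i, :-1])  # the last `window` days of the 20 features
        y.append(arr[i - 1, -1])          # the next-day target aligned with the window's last day
    return np.array(X), np.array(y)

X, y = make_sequences(scaled)
split = int(len(X) * 0.8)                 # no shuffling: the test data is strictly later in time
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]
```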
Once these steps had been carried out, the data were ready to be used in the model training and evaluation process, where the training set was used to adjust the model parameters, while the test set was used to evaluate the model's performance on unseen data.
3. Prediction Models
This section details the forecasting models used to predict Apple Inc. stock prices. A total of 44 different models were implemented. Each model was trained and evaluated based on the metrics presented in Section 2.3, using a set of 10 tests, where the average value is presented in this section.
All models were compiled using the Adam optimizer with a learning rate of 0.001 and the MSE loss function. The output layer uses a linear activation function to predict continuous stock price values. In addition, EarlyStopping and ReduceLROnPlateau were also used, the latter to adjust the learning rate during the model's training process, decreasing it when the model's performance stopped improving and thus helping to improve convergence [21]. Another tool that was used was BayesianOptimization from keras_tuner [21]. The BayesianOptimization function identifies the best combinations of hyperparameters for the models, optimizing for hyperparameters that minimize the MSE on the training set. This method is useful for exploring a wide range of possible configurations efficiently. This function was used to search for the ideal number of memory cells in the deep learning models' layers and also the best rate to use in the dropout layers, searching between 64 and 256 units and between 0.1 and 0.5 for the dropout rate.
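A sketch of this training setup is shown below, assuming TensorFlow/Keras and keras_tuner; the two-layer LSTM builder, the callback patience values, and the number of trials are illustrative assumptions rather than the exact configuration used in the study.

```python
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras import layers

def build_model(hp):
    units = hp.Int("units", min_value=64, max_value=256, step=32)
    dropout = hp.Float("dropout", min_value=0.1, max_value=0.5, step=0.1)
    model = keras.Sequential([
        keras.Input(shape=(100, 20)),               # 100-day input window, 20 features
        layers.LSTM(units, return_sequences=True),
        layers.Dropout(dropout),
        layers.LSTM(units),
        layers.Dropout(dropout),
        layers.Dense(1, activation="linear"),       # continuous price output
    ])
    model.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mse")
    return model

callbacks = [
    keras.callbacks.EarlyStopping(monitor="loss", patience=10, restore_best_weights=True),
    keras.callbacks.ReduceLROnPlateau(monitor="loss", factor=0.5, patience=5),
]
tuner = kt.BayesianOptimization(build_model, objective="loss", max_trials=10)
tuner.search(X_train, y_train, epochs=50, callbacks=callbacks)
```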
3.1. LSTM Model
The first model consists only of LSTM layers combined with dropout layers, followed by a dense output layer. Various configurations of the model were devised, with two, three, four, and five LSTM layers, using initial hyperparameter values of 256 memory cells and a dropout rate of 0.1, which were the starting values used throughout the model tests. The performance results in Table 7 show that the best version of the model was the one with two LSTM layers, since it had the lowest values for the first four metrics and the highest R² value among the models.
Table 7.
LSTM models.
The BayesianOptimization method led to the conclusion that the ideal number of memory cells was 256 for both LSTM layers and 0.1 for both dropout layers. The architecture of the final optimized model is shown in Table 8.
Table 8.
Final LSTM model.
3.2. GRU Model
For the second model, GRU, a similar architecture to the LSTM model was adopted, except that the LSTM layers were replaced by GRU layers. The best model shown in Table 9 has two GRU layers. As in the previous model, hyperparameter search was carried out using BayesianOptimization. The final architecture of the GRU model included two GRU layers combined with dropout layers, where the optimum values of 192 and 256 units were found for the first and second GRU layers, respectively, and 0.1 as dropout rates for both dropout layers as shown in Table 10.
Table 9.
GRU models.
Table 10.
Final GRU model.
3.3. LSTM + GRU Model
For the third model, a combination of the LSTM and GRU architectures was implemented, and the best model was selected according to the performance shown in Table 11. These hybrid architectures allow us to leverage the strengths of both models to better capture the complexity of the data. Hybrid models like this are designed to combine the advantages of different architectures, offering a more comprehensive approach to capturing both short-term patterns and long-term trends.
Table 11.
LSTM + GRU models.
The final, optimized architecture of the model is illustrated in Table 12 and consists of two layers of LSTM and one layer of GRU.
Table 12.
Final LSTM + GRU model.
3.4. CNN Model
Although CNNs are generally used for vision-related tasks, they are still one of the most used algorithms for time series forecasting [22]. This is due not only to their computational efficiency, especially when compared to algorithms like RNNs, but also to their ability to extract hierarchical features. This capability allows them to capture both short- and long-term dependencies, making them very versatile for time series applications [8,23]. To evaluate the performance of CNNs in this context, they were initially used alone, then combined with LSTM, and finally with GRU, in different combinations. In the first stage, CNNs were tested alone to see how they performed in the task of predicting stock prices. Although CNNs showed a good ability to identify patterns in the data, they were unable to track the stock price with a reasonably small forecast error, as shown in Figure 1.
Figure 1.
CNN model.
Next, combinations of CNN with LSTM and GRU were also explored (Table 13), leading to the conclusion that the best-performing models were those combining 3 CNN layers and 1 GRU layer, and 2 CNN layers with 2 LSTM layers.
Table 13.
CNN models.
As with the other models, the BayesianOptimization algorithm was used to find the best hyperparameters, which were 256, 128, and 224 units for the three CNN layers and 256 for the GRU layer in the model composed of 3 CNN + 1 GRU, and 256 and 128 for the two CNN layers and 256 and 224 for the two LSTM layers in the model composed of two CNN and two LSTM layers, as illustrated in Table 14 and Table 15.
Table 14.
Final CNN + GRU model.
Table 15.
Final CNN + LSTM model.
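As an illustration, the 3 CNN + 1 GRU architecture summarized in Table 14 could be expressed in Keras roughly as follows; the kernel size, padding, and dropout placement are assumptions, while the unit counts follow the values reported above.

```python
from tensorflow import keras
from tensorflow.keras import layers

cnn_gru = keras.Sequential([
    keras.Input(shape=(100, 20)),
    layers.Conv1D(256, kernel_size=3, padding="causal", activation="relu"),
    layers.Conv1D(128, kernel_size=3, padding="causal", activation="relu"),
    layers.Conv1D(224, kernel_size=3, padding="causal", activation="relu"),
    layers.GRU(256),
    layers.Dropout(0.1),
    layers.Dense(1, activation="linear"),
])
cnn_gru.compile(optimizer=keras.optimizers.Adam(learning_rate=0.001), loss="mse")
```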
3.5. RNN Model
For the RNN model, as with the previous models, different combinations were tested between the RNN layers alone and with the GRU and LSTM layers as shown in Table 16. Unlike the CNN models, most combinations of these layers produced undesirable results, but even so, the models consisting of two GRU layers followed by two RNN layers, and the one composed of one LSTM layer followed by two RNN layers, obtained the best results. In addition to these two models, the model consisting of only RNNs was also chosen, as it also performed very well without the need to implement other types of layers. The final architectures and optimized values of these models can be seen in Table 17, Table 18 and Table 19.
Table 16.
RNN models.
Table 17.
Final RNN model.
Table 18.
Final GRU + RNN model.
Table 19.
Final LSTM + RNN model.
3.6. XGBoost Model
For the XGBoost model, the input data were prepared differently compared to the other models. Firstly, the data preparation was altered in order to accommodate the specificities of the model in question. Unlike the previous process, where the data were organized into fixed time sequences via an input window, for the XGBoost, the data were divided into training and test sets using a simple 80% and 20% split, respectively. After this division, the independent variables were separated from the target variable, just like the previous process.
The model was then configured using the XGBRegressor function [24] and the parameters shown in Table 20, which were obtained by optimizing the values using the GridSearchCV function from the scikit-learn library [13]. These value ranges were obtained from the document available on the Kaggle platform called “A Guide on XGBoost hyperparameters tuning” [25].
Table 20.
Parameters for the XGBoost model.
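A sketch of this configuration step is shown below, assuming a flat 2-D training feature matrix `X_train_flat` (a hypothetical name) and an illustrative subset of the parameter ranges from the cited guide; the actual grid used in the study may differ.

```python
from xgboost import XGBRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [100, 500, 1000],
    "max_depth": [3, 5, 7],
    "learning_rate": [0.01, 0.05, 0.1],
    "subsample": [0.8, 1.0],
}
search = GridSearchCV(
    XGBRegressor(objective="reg:squarederror"),
    param_grid,
    scoring="neg_mean_squared_error",
    cv=3,
)
search.fit(X_train_flat, y_train)
xgb_model = search.best_estimator_
```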
After configuring the XGBoost model, the prediction process was performed iteratively. Since there was no input window, the model needed to update the data used for predictions with the previous predictions and the previous real value after each prediction. This method allowed the model to adjust its forecasts by incorporating both its own predictions and the actual observed data. This iterative update approach is well suited to time series data, where each new prediction can be informed by both previous predictions and previously observed values.
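The iterative loop might be sketched as follows, where `train_data`, `test_data`, and `build_features()` are hypothetical placeholders for the chronological split and for a helper that recomputes the model inputs (prices, moving averages, indices) from the history observed so far.

```python
import pandas as pd

predictions = []
history = train_data.copy()                               # everything observed before the test period
for t in range(len(test_data)):
    x_next = build_features(history).iloc[[-1]]           # feature row for the day being predicted
    predictions.append(float(xgb_model.predict(x_next)[0]))
    history = pd.concat([history, test_data.iloc[[t]]])   # append the actual observed day
```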
4. Results
This section presents the results regarding the performance of all the selected models in time series forecasting. The Time Series Cross Validation technique was used to evaluate and compare the performance of the models with different divisions of the training and test subsets, as well as for different stock prices. This technique is ideal for time series, as it maintains the temporal sequence of the data, unlike traditional cross validation, where the order of the data does not need to be preserved. In Time Series Cross Validation, the test set always consists of data after the training set, thus ensuring that the model is evaluated based on its ability to predict future data from past information [26]. For this analysis, 10 folds were used, as shown in Figure 2. This means that each model was trained 10 times, each time with a smaller time window, always validating with future data not seen up to that point. The MAE, MSE, RMSE, MAPE, and R² metrics were computed for each fold.
Figure 2.
Time Series Cross Validation.
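A sketch of this evaluation loop using scikit-learn's TimeSeriesSplit (an expanding training window, so the exact fold construction in Figure 2 may differ) is given below; it reuses the arrays, callbacks, and evaluate() helper from the earlier sketches, build_fold_model() is a hypothetical stand-in for rebuilding whichever architecture is under test, and the per-fold rescaling of predictions is omitted for brevity.

```python
from sklearn.model_selection import TimeSeriesSplit

tscv = TimeSeriesSplit(n_splits=10)
fold_metrics = []
for fold, (train_idx, test_idx) in enumerate(tscv.split(X), start=1):
    X_tr, X_te = X[train_idx], X[test_idx]
    y_tr, y_te = y[train_idx], y[test_idx]
    model = build_fold_model()                            # hypothetical: rebuild the model under test
    model.fit(X_tr, y_tr, epochs=50, callbacks=callbacks, verbose=0)
    metrics = evaluate(y_te, model.predict(X_te).ravel()) # MAE, MSE, RMSE, MAPE, R2 for this fold
    fold_metrics.append({"fold": fold, **metrics})
```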
It should be noted that each subset was individually rescaled to between 0 and 1 using the MinMaxScaler function to provide a consistent input to the model, thus varying only the size of the subsets and not the scale of the values themselves. Then, after the model made its predictions, the predictions and the actual values were rescaled to between 0 and 100. This method solves two problems that may arise. The first is the readability of the values, since the calculated metrics would otherwise be quite small; the second is maintaining a constant scale between the various subsets, since if the subsets were rescaled back to their actual values (before passing through the MinMaxScaler function), they would have different scales, rendering the interpretation of the absolute metrics useless. Table 21, Table 22, Table 23, Table 24, Table 25, Table 26, Table 27, Table 28 and Table 29 show the results of the applied models.
Table 21.
LSTM model.
Table 22.
GRU model.
Table 23.
LSTM + GRU model.
Table 24.
CNN + GRU model.
Table 25.
CNN + LSTM model.
Table 26.
GRU + RNN model.
Table 27.
LSTM + RNN model.
Table 28.
RNN model.
Table 29.
XGBoost model.
5. Discussion
The analysis of the results reveals that the model consisting of only two GRU layers and the XGBoost model showed the best overall performance compared to the other models tested, as evidenced by the low average values of the MAE, MSE, RMSE, and MAPE and the high R² values across all folds. Table 30 and Table 31 provide a summary of the average results and standard deviations of the metrics calculated for each model tested across all folds.
Table 30.
Average of calculated metrics.
Table 31.
Standard deviation of calculated metrics.
Among the models tested, the GRU had the best MSE and MAPE, while the XGBoost model had the best MAE, RMSE, and R², making them the two models that stood out the most, with very similar values in all metrics. The next best-performing model was the LSTM model. Although this model showed good results, it was worse in all metrics when compared to the two previously mentioned. Looking closer at this model, we can see that the main reason for this is its poor performance in fold 1. Additionally, the model composed of RNN and GRU layers and the model combining CNN and GRU layers consistently ranked 5th and 4th, respectively. As for the LSTM + GRU, CNN + LSTM, and RNN models, they performed considerably worse than all the models mentioned above. Although they showed worse values on average, these models are still capable of making good predictions, as evidenced by the performance of the CNN + LSTM model in fold 6, surpassed only by the CNN + GRU model. Finally, the model with the worst performance was the one composed of RNN and LSTM layers, which came as a surprise, given that the addition of an LSTM layer would be expected to improve performance compared to a model with only RNN layers. One point to highlight is the inclusion of GRU layers, which was shown to have a very positive impact on model performance, for the GRU alone, the LSTM + GRU model, the CNN + GRU model, and the GRU + RNN model.

The prediction plots of the models in their best folds are presented in Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11. Fold 9 was the best for all models, with the exception of those composed of CNN layers, i.e., the CNN + LSTM and CNN + GRU models, where the best result was obtained in fold 6.
Figure 3.
Forecast curve of the 9th fold of the LSTM model.
Figure 4.
Forecast curve of the 9th fold of the GRU model.
Figure 5.
Forecast curve of the 9th fold of the LSTM + GRU model.
Figure 6.
Forecast curve of the 6th fold of the CNN + GRU model.
Figure 7.
Forecast curve of the 6th fold of the CNN + LSTM model.
Figure 8.
Forecast curve of the 9th fold of the GRU + RNN model.
Figure 9.
Forecast curve of the 9th fold of the LSTM + RNN model.
Figure 10.
Forecast curve of the 9th fold of the RNN model.
Figure 11.
Forecast curve of the 9th fold of the XGBoost model.
When analyzing the graphs, it is evident that the GRU, LSTM, and XGBoost models demonstrate the best performance for fold 9, highlighting their ability to closely and accurately track the various price fluctuations. The GRU + RNN model also performed well, ranking close to the top models. In fold 6, all models showed good performance, attributed to the lower complexity of this fold, which exhibits fewer large-amplitude oscillations compared to the others. This indicates that these models handle scenarios with lower variability very effectively.
On the other hand, we observe that the RNN and LSTM + RNN models struggled to match the actual prices with the same precision as the others, particularly in the final predictions, where the percentage error tended to increase compared to the initial predictions. This suggests that these models may face challenges in capturing patterns with high fluctuations and in keeping up with sudden amplitude increases.
Additionally, it is clear that all the models struggled to keep up with the rapid price changes, particularly exacerbated in the latter part of fold 9, where the magnitude of price fluctuations is greater than in the rest of the fold. Another point to note is the clear delay observed in the predictions of the worst models, such as RNN and LSTM + RNN, which directly contributed to their poorer performance on the calculated metrics.
6. Conclusions
This work explored the application of different stock price prediction techniques by applying and comparing models such as LSTM, GRU, CNN, RNN, and XGBoost, providing insight into the capabilities and limitations of these approaches in the context of time series data.
The analysis showed that, although each model has its own characteristics, the choice of the most efficient approach strongly depends on the specifics of the data and the objective of the forecast, where in this case, the XGBoost and GRU models were better at generalizing data. The complexity of advanced models such as these two was reflected in their overall performance, suggesting that they can be particularly effective in capturing patterns and making accurate predictions in more complex time series such as stock prices. In contrast, models such as RNN and CNN demonstrated variable performance, indicating that model configuration needs to be adjusted, as combining them with GRU layers yielded significantly better performance compared to when tested alone or in combination with LSTM layers.
For future work, we may test new machine learning algorithms or more combinations of different algorithms. Another possibility would be to diversify the dataset used, including a more diverse set of stocks spanning different financial markets and growth percentages. Additionally, the implementation of more advanced features, such as economic indicators or even the inclusion of different stocks within the same dataset, could provide a more robust view of the market as a whole. Another approach would be to apply classification techniques to predict the direction of prices, that is, whether they will go up or down. This would allow for detailed analysis and a more relevant approach to financial decision-making. The use of ensemble models could be a viable option to achieve this goal. Another improvement could be to use Time Series Cross Validation in the input window size and in the choice of the model architectures.
Author Contributions
Conceptualization, D.M.T. and R.S.B.; methodology, D.M.T. and R.S.B.; software, D.M.T.; validation, D.M.T. and R.S.B.; formal analysis, D.M.T.; investigation, D.M.T.; writing—original draft preparation, D.M.T.; writing—review and editing, D.M.T. and R.S.B.; visualization, D.M.T.; supervision, R.S.B. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Data Availability Statement
The original data presented in the study are openly available on Yahoo Finance at https://finance.yahoo.com/ (accessed on 28 September 2024) and on FRED, Federal Reserve Economic Data at https://fred.stlouisfed.org/ (accessed on 28 September 2024).
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Jesse, A. Algorithmic Trading: Leveraging AI and ML in Finance. RapidInnovation. Available online: https://www.rapidinnovation.io/post/algorithmic-trading-leveraging-ai-and-ml-in-finance (accessed on 28 September 2024).
- Shah, D.; Isah, H.; Zulkernine, F. Stock Market Analysis: A Review and Taxonomy of Prediction Techniques. Int. J. Financ. Stud. 2019, 7, 26. [Google Scholar] [CrossRef]
- Li, Z.; Yu, H.; Xu, J.; Liu, J.; Mo, Y. Stock Market Analysis and Prediction Using LSTM: A Case Study on Technology Stocks. Innov. Appl. Eng. Technol. 2023, 2, 1–6. [Google Scholar] [CrossRef]
- Sonkavde, G.; Dharrao, D.S.; Bongale, A.M.; Deokate, S.T.; Doreswamy, D.; Bhat, S.K. Forecasting Stock Market Prices Using Machine Learning and Deep Learning Models: A Systematic Review, Performance Analysis, and Discussion of Implications. Int. J. Financ. Stud. 2023, 11, 94. [Google Scholar] [CrossRef]
- Hoque, K.E.; Aljamaan, H. Impact of Hyperparameter Tuning on Machine Learning Models in Stock Price Forecasting. IEEE Access 2021, 9, 163815–163824. [Google Scholar] [CrossRef]
- Gülmez, B. Stock Price Prediction with Optimized Deep LSTM Network Using Artificial Rabbits Optimization Algorithm. Expert Syst. Appl. 2023, 227, 120346. [Google Scholar] [CrossRef]
- Nabipour, M.; Nayyeri, P.; Jabani, H.; Mosavi, A.; Salwana, E.; Shamshirband, S. Deep Learning for Stock Market Prediction. Entropy 2020, 22, 840. [Google Scholar] [CrossRef] [PubMed]
- Naufal, G.R.; Wibowo, A. Time Series Forecasting Based on Deep Learning CNN-LSTM-GRU Model on Stock Prices. Int. J. Eng. Trends Technol. 2023, 71, 126–133. [Google Scholar] [CrossRef]
- Zhang, J.; Ye, L.; Lai, Y. Stock Price Prediction Using CNN-BiLSTM-Attention Model. Mathematics 2023, 11, 1985. [Google Scholar] [CrossRef]
- Mehtab, S.; Sen, J. Stock Price Prediction Using CNN and LSTM-Based Deep Learning Models. In Proceedings of the 2020 International Conference on Decision Aid Sciences and Application (DASA), Chiangrai, Thailand, 5–6 November 2020; pp. 447–452. [Google Scholar] [CrossRef]
- Yahoo Finance. Available online: https://finance.yahoo.com/ (accessed on 28 September 2024).
- Pandas. Available online: https://pandas.pydata.org/ (accessed on 28 September 2024).
- Scikit-Learn. Available online: https://scikit-learn.org (accessed on 5 October 2024).
- Hoseinzade, E.; Haratizadeh, S. CNNpred: CNN-based stock market prediction using a diverse set of variables. Expert Syst. Appl. 2019, 129, 273–285. [Google Scholar] [CrossRef]
- Federal Reserve Economic Data (FRED). Available online: https://fred.stlouisfed.org/ (accessed on 28 September 2024).
- Kavya, D. Optimizing Performance: SelectKBest for Efficient Feature Selection in Machine Learning. Medium. 2023. Available online: https://medium.com/@Kavya2099/optimizing-performance-selectkbest-for-efficient-feature-selection-in-machine-learning-3b635905ed48 (accessed on 30 September 2024).
- Willmott, C.J.; Matsuura, K. Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim. Res. 2005, 30, 79–82. [Google Scholar] [CrossRef]
- Ken, S. Mean Squared Error. Encyclopedia Britannica, 2024. Available online: https://www.britannica.com/science/mean-squared-error (accessed on 30 September 2024).
- Deepchecks. Root Mean Squared Error (RMSE). Available online: https://www.deepchecks.com/glossary/root-mean-square-error/ (accessed on 30 September 2024).
- Scott, N. Coefficient of Determination: How to Calculate It and Interpret the Result. Investopedia. 2024. Available online: https://www.investopedia.com/terms/c/coefficient-of-determination.asp (accessed on 30 September 2024).
- Keras. Available online: https://keras.io (accessed on 4 October 2024).
- Hu, Z.; Zhao, Y.; Khushi, M. A Survey of Forex and Stock Price Prediction Using Deep Learning. Appl. Syst. Innov. 2021, 4, 9. [Google Scholar] [CrossRef]
- Mancuso, P.; Piccialli, V.; Sudoso, A.M. A machine learning approach for forecasting hierarchical time series. Expert Syst. Appl. 2021, 182, 115102. [Google Scholar] [CrossRef]
- XGBoost Documentation. Available online: https://xgboost.readthedocs.io/en/latest/python/ (accessed on 5 October 2024).
- Prashant, B. A Guide on XGBoost Hyperparameters Tuning. Kaggle. 2020. Available online: https://www.kaggle.com/code/prashant111/a-guide-on-xgboost-hyperparameters-tuning/ (accessed on 5 October 2024).
- GeeksforGeeks. GeeksforGeeks: A Computer Science Portal for Geeks. Available online: https://www.geeksforgeeks.org/ (accessed on 23 October 2024).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).