A New Stock Price Forecasting Method Using Active Deep Learning Approach

Abstract: Stock price prediction is a significant research field due to its importance in terms of benefits for individuals, corporations, and governments. This research explores a new approach to predicting the adjusted closing price of a specific corporation. A new feature set is used to improve accuracy and reduce losses by extending the traditional four-feature set (High, Low, Volume, Open) to a six-feature set (High, Low, Volume, Open, HiLo, OpSe). The study also investigates the effect of data size by using datasets of different sizes (Apple, ExxonMobil, Tesla, Snapchat) to boost open innovation dynamics, and considers the effect of the business sector on the loss results. Six deep learning models, MLP, GRU, LSTM, Bi-LSTM, CNN, and CNN-LSTM, are used to predict the adjusted closing price of the stocks. Evaluated on the models' outcomes, the six-feature set (High, Low, Open, Volume, HiLo, and OpSe) yielded fewer losses than the original four-feature approach. The results show that LSTM-based models improved under the new approach, although all models produced comparable results and no single model continuously outperformed the others. Overall, the added features positively affected the prediction models' performance.


Introduction
Countries focus on improving and enhancing their economies to create a good standard of living by ensuring public spending. The modern economy establishes large corporations that can create enormous opportunities and keep up with rapid changes in the world economy [1,2]. The stock market is a marketplace in which buyers and sellers trade securities; it can be divided into private, open, and mixed-ownership stock exchanges [3]. A private stock exchange involves exchanging shares of private companies, whereas an open stock exchange includes shares of companies listed in the public stock market. Mixed-ownership exchanges trade shares of companies that are only partially exchangeable in the public stock market. Such exchanges exist in the United Kingdom, such as the London Stock Exchange, and the United States, such as the New York Stock Exchange (NYSE) [4][5][6][7][8][9].
Stock price forecasting is one of the most challenging problems that financial institutions, businesses, and individual investors face [10]. Many factors impact the validity of stock price forecasts, including economics, political contexts, and investor psychology. According to the literature, because of this complexity, there is much interest in applying computational intelligence methods such as machine learning, probabilistic reasoning, and evolutionary programming to assess large historical datasets of stock prices [11,12]. As it does not require any statistical hypotheses, the Artificial Neural Network, a non-parametric statistical method, is one of the most popular predictive-modeling tools among these computational intelligence approaches [13][14][15][16][17][18].
The stock market is the backbone of any economy; the primary purposes of any investment in the stock market are profit maximization and risk minimization [4]; therefore, countries need to enhance their stock markets, since they are related to economic growth [19]. Investing in the stock market can lead to a quick return on investment; therefore, stock market prediction is one of the best strategies to achieve a profit. Stock market behavior is not linear, making it harder to predict a corporation's stock prices in a specific market [20]. Consequently, investors and researchers have to find techniques that can lead to accurate results and higher profits [21]. Conventional machine learning models are superior to statistical models such as ARIMA [22]. On the other hand, deep learning models such as Long Short-Term Memory (LSTM) have been shown to outperform machine learning models such as Support Vector Regression (SVR) [23]; similarly, Kara et al. (2011) showed that an Artificial Neural Network (ANN) outperformed a Support Vector Machine (SVM) [24].
Forex price forecasting is similar to stock price forecasting [25,26]. An attention RNN-ARIMA (ARNN-ARIMA) model has been proposed to forecast forex prices. The model was evaluated using three main metrics: Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), and Directional Accuracy (DA). Compared with multiple models, including RNN, GRU, LSTM, and ARNN, it outperformed all of them on all metrics, achieving the lowest RMSE and MAPE with 1.65 × 10−3 and 23.2%, respectively, and the highest DA with 75.7%, slightly outperforming ARNN, which achieved 73.5% DA [27]. An LSTM with an embedded layer (ELSTM) and an LSTM with an autoencoder (ALSTM) are introduced in [28]. Multiple metrics and two datasets were used to evaluate their performance against multiple models. On the first dataset, ALSTM and ELSTM performed well, outperforming models such as the attention multi-layer perceptron (AMLP) and embedded multi-layer perceptron (EMLP) by scoring a lower MSE and higher relative accuracy on the Shanghai A-share composite index; however, ALSTM achieved the worst MSE on the second dataset, and both models achieved the worst results in terms of the comparative accuracy on Sinopec.
Deep learning models have given excellent results in many areas [29,30] and have shown potential for stock market prediction due to their capability to detect the dynamics of stock market movements and produce adequate results [31]. This article focuses on six deep learning models and the differences between them: LSTM [32,33]; the Gated Recurrent Unit (GRU) [34], which is also an RNN-based model; the Multi-Layer Perceptron (MLP) [35]; the Convolutional Neural Network (CNN) [36]; a CNN-LSTM model; and Bidirectional-LSTM (Bi-LSTM) [37]. These models were evaluated on data from four companies: Apple, Tesla, ExxonMobil, and Snapchat. Each dataset covers a different period, so that the effect of data size can be detected, and each company has a different business focus. This article proposes a feature extraction technique that increases the number of features the models can utilize in order to give more accurate predictions with fewer losses. Finally, as noted by Kim and Kim in [38], the loss functions used in the evaluation process are Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE).
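As an illustration, both loss functions can be computed in a few lines of NumPy (the numbers below are toy values, not results from the paper):

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared residuals."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.mean((y_true - y_pred) ** 2)

def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, expressed in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

# Toy adjusted-closing-price values (illustrative only)
actual    = [100.0, 102.0, 101.0, 105.0]
predicted = [ 99.0, 103.0, 100.0, 104.0]
print(mse(actual, predicted))   # 1.0
print(mape(actual, predicted))  # ≈ 0.98 (percent)
```

MAPE is scale-independent, which is why it is often reported alongside MSE when comparing stocks whose prices differ by orders of magnitude.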
The results showed that LSTM-based models improved using the new approach, even though all models showed comparable results and no model continuously outperformed the others. The CNN model showed the best efficiency in terms of execution time. GRU and CNN were the best models at giving good results with fewer examples. The main aims of this paper are as follows:
• Study the effects of the additional features (i.e., High, Low, Volume, Open, HiLo, OpSe).
• Detect the effect of the size of the datasets on the prediction accuracy.
• Detect the differences between the deep learning models (i.e., MLP, GRU, LSTM, Bi-LSTM, CNN, and CNN-LSTM).
The main sections of this paper are organized as follows. Section 2 presents the related work. Section 3 describes the proposed methodology for stock price forecasting. Section 4 presents the experiments, results, and discussion. Finally, the conclusion and directions for future research are given in Section 5.

Related Work
Recently, a great deal of research on forecasting forex and stock market prices has been undertaken [39][40][41][42][43]. Kang et al. (2019) proposed a Generative Adversarial Network (GAN) architecture with Long Short-Term Memory (LSTM) as the generator and a Multi-Layer Perceptron (MLP) as the discriminator. The GAN model was compared with LSTM, an Artificial Neural Network (ANN), and Support Vector Regression (SVR); multiple metrics were utilized to evaluate the models, and the proposed GAN model proved superior to the other models according to all metrics used in that paper [44]. Big data allows for greater efficiency and innovation speed. Venture capital, equity funds, and exchange-traded funds are examples of financial innovation that have aided financial development and economic growth [45][46][47][48][49].
Three models, Support Vector Regression (SVR), Linear Regression (LR), and Long Short-Term Memory (LSTM), are introduced in [50]. The LSTM outperformed the other models by far, achieving a score of 0.0151, whereas LR came second with 13.872 and SVR came last with 34.623. Pratik et al. [51] proposed two models based on graph theory; the first was based on the correlation between historical prices and the other on causation. The results proved that graph-based models are superior to traditional methods, and the causation-based model achieved slightly better results than the correlation-based one. The basic RNN, LSTM, and GRU models are proposed in [52]. The GRU model achieved an accuracy of 0.67 with a log loss of 0.629, followed by LSTM with an accuracy of 0.665 and a log loss of 0.629, and RNN with an accuracy of 0.625 and a log loss of 0.725. Both LSTM and GRU were then tweaked with the addition of a dropout layer; the GRU model showed no enhancement from the dropout layer, whereas the LSTM showed a slight performance enhancement of 2%.
The LSTM model is proposed in [53] to forecast Nifty 50 stock prices; LSTM is an RNN architecture also used in Natural Language Processing (NLP). The results showed that the more parameters and epochs the model gets, the better its performance; it achieved its best RMSE of 0.00859 using the High, Low, Open, Close parameter set and 500 epochs. Four deep learning models, namely MLP, RNN, CNN, and LSTM, are introduced in [54]; these models were trained on TATA MOTORS data. After training, the models were evaluated by predicting stock prices, and they achieved satisfactory results by identifying the patterns of stock movements even in other stock markets, which shows that deep learning models can identify the underlying dynamics; CNN proved to be superior. That article also tried the ARIMA model, but it did not learn the underlying dynamics between multiple time series.
A CNN model that uses a high-order structure is proposed in [55]. It was compared with many different models, including traditional methods such as ARIMA and Wavelet, which proved to perform the worst, followed by the machine learning models and the Hidden Markov Model (HMM), which were also inferior to deep learning models such as LSTM and SMF by 1-3% accuracy. These deep learning models were in turn inferior to the high-order-structure CNN model. These results were obtained after evaluation on multiple datasets, including Apple, Google, IBM, the S&P 500, and others. The RNN, CNN, and LSTM deep learning models are introduced in [56], with ARIMA compared against them. The models were trained and evaluated on the Infosys, TCS, and Cipla datasets to investigate whether they could capture the hidden underlying dynamics of the data. The deep learning models showed superior performance to the ARIMA model, with CNN being the best, outperforming ARIMA by 1352.1%, LSTM by 177.1%, and RNN by 165.2%.
The performance of various deep learning models has been compared for stock price forecasting: deep LSTM, MLP, and ELSTM models in [57], LSTM and GRU in [58], and SVR and NN in [59]. Data from three banks on the NSE of India was gathered to evaluate these models; deep LSTM proved to have higher accuracy and lower MSE than the other models. A Deep Wide Neural Network (DWNN) is proposed in [60], combining RNN and CNN models to address the limitations of basic RNN models; it was trained on stock data from a sector of China's SSE, and the results showed that the combination of RNN and CNN reduced the prediction error by 30% compared with the vanilla RNN. A hybrid model combining the Discrete Wavelet Transform (DWT) and an Artificial Neural Network (ANN) is proposed in [61]; the DWT analyzes the original data to produce approximation and detail coefficients that serve as input for the model, and this method enhanced performance compared with the original ANN model on five datasets.
A novel study in [62] predicts Bitcoin prices, a problem similar to stock price prediction, comparing three models: vanilla RNN, LSTM, and ARIMA. The three showed similar accuracy, 52.78%, 50.25%, and 50.05% for LSTM, RNN, and ARIMA, respectively; however, in terms of RMSE, the two deep learning models far outperformed ARIMA, with 6.87% and 5.45% for LSTM and RNN, respectively, against 53.74% for the ARIMA model. A further deep learning model is proposed using a vanilla CNN, an ANN, and a CNN enhanced by a genetic algorithm (GA-CNN) [63]. The results showed that GA-CNN outperforms both the CNN and ANN models, achieving 73.74% accuracy, which exceeds the vanilla CNN by over 3% and the ANN by 15%. In [64], multiple deep learning models are introduced, including LSTM, CNN, LSTM-CNN, and SVR, with Empirical Mode Decomposition (EMD) and Complete Ensemble EMD (CEEMD) applied to improve the LSTM- and CNN-based models. These models were applied to four different datasets, and the results showed that CEEMD-LSTM-CNN was superior to the other models introduced in that paper.
A novel model in [65] utilizes the Wavelet Transform, stacked auto-encoders, and a bidirectional LSTM. This model, called WAE-BLSTM, has a three-stage workflow: noise elimination, dimensionality reduction, and prediction using the BLSTM. To show its capabilities, the model was compared with four models (W-BLSTM, W-LSTM, BLSTM, and LSTM); the WAE-BLSTM outperformed all of them according to both the MAE and RMSE metrics. A CNN-BiLSTM-AM model is presented in [66] that combines a CNN, BiLSTM, and the attention mechanism: the CNN extracts the features, the BiLSTM makes predictions using these features, and the attention mechanism captures the influence of the extracted features. Compared with BiLSTM-AM, CNN-BiLSTM, CNN-LSTM, BiLSTM, LSTM, CNN, RNN, and MLP, the model proved superior according to the MAE and RMSE metrics.
The Elman neural network, an RNN-based network, is introduced in [67]. Adding direct input-to-output connections (DIOCs) to the Elman-NN produces Elman-DIOC variants, which were evaluated against the Elman-NN and MLP on four global stock indices. The Elman-DIOCs outperformed both the Elman-NN and MLP according to the MAE and RMSE metrics, suggesting that DIOCs are usually beneficial additions to neural network models. A graph-based CNN called the Stock Sequence Array Convolutional Neural Network (SSACNN) is introduced in [68]. It gathers data, including historical prices and leading indicators, as an array and feeds it to the CNN model as a graph; ten stock datasets from two markets were fed into the model in the evaluation process, and SSACNN proved to outperform the CNN, ANN, and SVM models in terms of accuracy.
Different GRU models are presented in [69] to predict Bitcoin prices and are compared with LSTM and the Artificial Neural Network (ANN); these include the basic GRU, a GRU-Dropout model, and a GRU-Dropout-GRU model. The results showed that the basic GRU outperformed the other GRU models, the LSTM, and the ANN by achieving a lower RMSE. An attention-based LSTM that utilizes the Wavelet Transform to remove noise from the data (AWLSTM) is introduced in [70]. This model was compared with the WLSTM, LSTM, and GRU models on three datasets (S&P 500, DJIA, and HSI) using four metrics, and the results proved that AWLSTM is superior to the other models according to all four metrics [70]. Table 1 shows an overview of the most related works.

Methodology
This section describes the datasets and the models used in the proposed method.

Datasets
This research includes four datasets of four companies with different business sectors: Apple, Tesla, Snapchat, and ExxonMobil.
Apple is a software and hardware provider. Its dataset includes stock price indexes such as the opening price, volume, and high and low prices, as well as the adjusted closing price; the first four indexes (Open, Volume, High, Low) are treated as input features, and the adjusted closing price is the prediction target. The first dataset covers the past 21 years of stock price data, from 30 October 2000 to 17 October 2021, with 5283 data instances. The second dataset contains 11 years of stock price data for Tesla, an automobile company, from 29 June 2010 to 27 October 2021, including 2855 instances. Tesla's market capitalization and stock prices have been more volatile than those in the Apple and Snapchat datasets, partly due to tweets by Tesla's chief executive, Elon Musk, which have influenced Tesla's market capitalization and stock prices.
The third dataset contains three years and nine months of stock price data for Snapchat, a social media platform and a relatively new company compared with the other three, from 3 February 2017 to 11 November 2021, including 1186 instances. Its relatively small size creates a challenge for the models, which can lead to under-fitting. The fourth dataset is the ExxonMobil dataset, covering roughly 21 years from 3 January 2000 to 7 December 2021 with 5520 instances; ExxonMobil is an oil company created from the merger of the Exxon and Mobil oil companies, and its dataset was added to diversify the datasets used. The data, collected from Yahoo Finance as (.csv) files, includes four input features and one output feature. The Date/Time dimension was removed because it has no effect on the prediction process.
The data has been normalized using a min-max scaler:

x* = (x − min) / (max − min)

where x* is the new value, x is the old value, min is the minimum value of the sample, and max is the maximum value of the sample, so x is mapped to [0, 1].
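A minimal NumPy sketch of this scaler (illustrative only; in practice a library implementation such as scikit-learn's MinMaxScaler would typically be used):

```python
import numpy as np

def min_max_scale(x):
    """Map each value to [0, 1]: x* = (x - min) / (max - min)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())

prices = np.array([10.0, 15.0, 20.0, 30.0])
print(min_max_scale(prices))  # 10 -> 0.0, 15 -> 0.25, 20 -> 0.5, 30 -> 1.0
```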
Then, the data is split into 70% training data, 15% testing data, and 15% validation data; this split is used to prevent overfitting and to evaluate the models accurately [72].
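Since the data is a time series, the split is assumed here to be chronological (the paper does not state whether shuffling was used); a minimal sketch of a 70/15/15 split:

```python
def chrono_split(data, train=0.70, val=0.15):
    """Chronological 70/15/15 split: no shuffling, so the validation
    and test portions come strictly after the training portion."""
    n = len(data)
    i = int(n * train)       # end of the training slice
    j = i + int(n * val)     # end of the validation slice
    return data[:i], data[i:j], data[j:]

series = list(range(100))
tr, va, te = chrono_split(series)
print(len(tr), len(va), len(te))  # 70 15 15
```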

Used Models
The overall flow diagram of the proposed work is presented in Figure 1. This research introduces six models; the first is the Multi-Layer Perceptron (MLP). The MLP is a neural network with three sections of neurons: an input layer, a hidden layer, and an output layer, and the model can have multiple hidden layers. Each neuron is connected to all neurons in the previous layer; such connections are called fully connected, or dense, layers. Neurons of the same layer are not connected. The learning process changes the weights of each neuron after processing the data, according to the amount of error in the output compared with the expected result. Each neuron has several inputs (xi), each with a weight (wi); the sum of the inputs (xi) multiplied by their weights (wi) is added to the threshold value (b), as shown in the equation below [73]:

A = Σ (wi · xi) + b
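The weighted-sum step can be sketched as follows (toy values; the identity activation is used here only to expose the net value A):

```python
import numpy as np

def neuron_forward(x, w, b, activation=np.tanh):
    """One MLP neuron: A = sum_i(w_i * x_i) + b, output = F(A)."""
    a = np.dot(w, x) + b
    return activation(a)

x = np.array([1.0, 2.0])    # inputs x_i
w = np.array([0.5, 0.25])   # weights w_i
b = -1.0                    # threshold value b
print(neuron_forward(x, w, b, activation=lambda a: a))  # A = 0.0 with identity F
```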
Then, this net value A is passed through the activation function to give the output F(A).
The second model used in this research is Long Short-Term Memory (LSTM). LSTM is an RNN-based model used when long-term dependencies are a significant part of the learning process [73]; remembering dependencies over long spans is a major benefit of the LSTM, since it has a forget gate on top of the two main gates, the input and output gates. The forget gate allows the model to learn when to forget [24]. Figure 2 breaks down the working of the LSTM cell [73].
As Figure 2 reveals, Ct-1 and Ct are the old and present cell states, and ht-1 and ht are the outputs of the previous and current cells. ft is the forget gate, it is the input gate, and Ot is the output sigmoid gate. The line from Ct-1 to Ct carries information across the entire network, gathering information from the gates of the cell and transferring it from Ct-1 to Ct [73]. The ft layer decides which information to remember [37], and its output is multiplied by Ct-1. Then, the product of the sigmoid input gate it and the tanh candidate layer Ĉt is added to the result, and the point-wise multiplication of Ot and tanh(Ct) forms the output ht [73].
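The LSTM cell update described above can be sketched as a single NumPy step (randomly initialized toy weights, not the trained models from the experiments; the layout of each weight matrix over the concatenated [ht-1, xt] is an implementation assumption):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, W, b):
    """One LSTM step: f_t, i_t, o_t are sigmoid gates, C~_t is the tanh
    candidate; C_t = f_t*C_{t-1} + i_t*C~_t and h_t = o_t * tanh(C_t)."""
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])           # forget gate f_t
    i = sigmoid(W["i"] @ z + b["i"])           # input gate i_t
    o = sigmoid(W["o"] @ z + b["o"])           # output gate O_t
    c_tilde = np.tanh(W["c"] @ z + b["c"])     # candidate cell state C~_t
    c_t = f * c_prev + i * c_tilde             # new cell state C_t
    h_t = o * np.tanh(c_t)                     # new hidden state h_t
    return h_t, c_t

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.normal(size=(n_hid, n_hid + n_in)) for k in "fioc"}
b = {k: np.zeros(n_hid) for k in "fioc"}
h, c = lstm_cell(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
print(h.shape, c.shape)  # (4,) (4,)
```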
Bidirectional-LSTM (Bi-LSTM) is a version of LSTM introduced to increase the amount of information available to the network [71,74]. While the LSTM can only learn from past information, the Bi-LSTM can learn from both the past and the future at the same time because it has two hidden layers with opposite directions connected to the same output [74], as shown in Figure 3.
The Convolutional Neural Network (CNN) is a special Feed-Forward Neural Network (FFNN). CNN has shown decent performance in many Artificial Intelligence (AI) applications, such as Natural Language Processing (NLP) and image and video processing, as well as on time series data [27]. CNN uses weight sharing and local perception to reduce the number of parameters, and it can be separated into three layer types: convolutional, pooling, and fully connected [75]. The CNN works as follows: the convolutional layer conducts a convolution operation to extract features; then, the pooling layer reduces the number of extracted features, reducing dimensionality to speed up processing and avoid the curse of dimensionality [76].
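The convolution-then-pooling pipeline can be illustrated on a toy price series (a hand-written "valid" cross-correlation and non-overlapping max pooling, not the library layers used in the experiments):

```python
import numpy as np

def conv1d_valid(x, kernel):
    """'Valid' 1-D convolution (cross-correlation, as in CNN layers):
    slides the kernel over x and takes a dot product at each position."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel) for i in range(len(x) - k + 1)])

def max_pool1d(x, size=2):
    """Non-overlapping max pooling: keeps the strongest feature per window."""
    return np.array([x[i:i + size].max() for i in range(0, len(x) - size + 1, size)])

prices = np.array([1.0, 3.0, 2.0, 5.0, 4.0, 6.0])
feat = conv1d_valid(prices, np.array([1.0, -1.0]))  # kernel size 2, a difference filter
print(feat)                 # [-2.  1. -3.  1. -2.]
print(max_pool1d(feat, 2))  # [1. 1.]
```

The pooling step halves the feature map, which is the dimensionality reduction referred to in the text.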
This research also introduces a CNN-LSTM model that combines the CNN and LSTM models to get the best out of each: the LSTM uses the features extracted by the CNN to predict stock prices, due to the LSTM's ability to identify dependencies [77]. However, since this model is slightly deeper than the other models proposed in this research, it needs a higher volume of data.
The Gated Recurrent Unit (GRU) is an RNN-based model similar to the LSTM, but it merges the forget gate and the input gate into a single gate called the update gate, and it combines the cell state and the hidden state. Both GRU and LSTM solve the vanishing gradient problem of the vanilla RNN, but since the GRU has fewer tensor operations, it trains faster than the LSTM. Figure 4 shows the GRU model representation [78].
The Gated Recurrent Unit (GRU) is an RNN-based model similar to the LSTM, but it merges the forget gate and the input gate into a single gate called the update gate, and it combines the cell state and the hidden state. Both GRU and LSTM solve the vanishing gradient problem of the vanilla RNN, but since the GRU involves fewer tensor operations, it trains faster than the LSTM. Figure 4 shows the GRU model representation [78]. In the update gate, the input xt and the output of the previous unit ht-1 are multiplied by the weights Wz; after the two are added, the sigmoid function is applied to the result. The vanishing gradient problem is mitigated by the update gate zt, which decides how much information should pass. The reset gate rt carries out a similar operation and decides how much past information should be forgotten. For the candidate memory content h̃t, the input is multiplied by the weights W, ht-1 is multiplied by the output of the reset gate rt using the Hadamard Product Operation (HPO) to pass only the relevant information, and the tanh function is applied to the summation [78]. To get ht, the following operations are applied:

zt = σ(Wz·[ht-1, xt])
rt = σ(Wr·[ht-1, xt])
h̃t = tanh(W·[rt ⊙ ht-1, xt])
ht = (1 − zt) ⊙ ht-1 + zt ⊙ h̃t
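The gate equations above can be sketched as a single NumPy step. This is an illustrative implementation rather than the authors' code; the weight matrices W_z, W_r, and W_h each act on the concatenation [ht-1, xt], and biases are omitted for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W_z, W_r, W_h):
    """One GRU step: update gate, reset gate, candidate state, new state."""
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)          # update gate: how much information passes
    r_t = sigmoid(W_r @ concat)          # reset gate: how much of the past to forget
    # candidate memory content, using the Hadamard product r_t * h_prev
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))
    return (1.0 - z_t) * h_prev + z_t * h_tilde  # blend old state and candidate
```

With zero weights and a zero previous state, the update gate is 0.5 and the candidate is 0, so the new hidden state stays at zero, consistent with the equations.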
All six models used Exponential Linear Units (ELU), which outperformed Rectified Linear Units (ReLU) in the first experiments; therefore, ELU is the primary activation function for all of the experiments mentioned earlier. ELU is an activation function that can speed up the training process, alleviate the vanishing gradient problem through its linear characteristics, and return the identity for positive values [52]. ELU is considered an alternative to ReLU because of its ability to reduce bias shift by pushing the mean activation towards zero during training. ELU can learn faster and generalize better than Leaky ReLU (LReLU) and ReLU [52]. ELU also performs normalization across the network layers without additional normalization layers, so a predetermined parameter α scales the ELU. The following equation represents the ELU function [79]:

f(x) = x, if x > 0; f(x) = α(exp(x) − 1), if x ≤ 0
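As a sketch of the function above, ELU can be written in a few lines of NumPy; the default α = 1.0 is an assumption, since the text only says a predetermined parameter scales the ELU.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: identity for positive inputs, alpha * (exp(x) - 1) otherwise."""
    x = np.asarray(x, dtype=float)
    # np.expm1 computes exp(x) - 1 with better precision near zero
    return np.where(x > 0, x, alpha * np.expm1(x))
```

Unlike ReLU, negative inputs give small negative outputs that saturate at −α, which is what pushes the mean activation towards zero.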
The MLP was the first model that tested both approaches using four datasets (Apple, Tesla, Snapchat, and ExxonMobil). The MLP model used in this research contains three layers: an input layer (Sequential), a hidden layer (100 Dense neurons), and an output layer (Dense single neuron). The model utilized the ELU activation function and the Adam optimizer; the model was trained for 100 epochs with a batch size of 2. As mentioned in the introduction section, the data split was 70% training, 15% testing, and 15% validation.
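The described MLP can be sketched in Keras as follows; the layer sizes, activation, optimizer, and batch size follow the text, while the six-feature input shape and the MSE loss are assumptions for illustration.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the described MLP: 100 ELU neurons in one hidden Dense layer
# and a single-neuron Dense output. The 6-feature input is an assumption.
n_features = 6

model = keras.Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(100, activation="elu"),  # hidden layer
    layers.Dense(1),                      # predicted adjusted closing price
])
model.compile(optimizer="adam", loss="mse")

# Training would follow the 70/15/15 split described in the text, e.g.:
# model.fit(X_train, y_train, epochs=100, batch_size=2,
#           validation_data=(X_val, y_val))
```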

• CNN Model
The CNN was the fifth model that tested both approaches using four datasets (Apple, Tesla, Snapchat, and ExxonMobil). The CNN model used in this research contains six layers: an input layer (Sequential); four hidden layers, namely a Conv1D layer with 64 filters and a kernel size of 2, a MaxPooling1D layer with a pool size of 2, a Flatten layer, and a Dense layer of 50 neurons; and an output layer (Dense single neuron). The model utilized the ELU activation function and the Adam optimizer; the model was trained for 100 epochs with a batch size of 4.

• LSTM Model
LSTM was the third model that tested both approaches using four datasets (Apple, Tesla, Snapchat, and ExxonMobil). The LSTM model used in this research contains three layers: an input layer (Sequential), a hidden layer (32 LSTM neurons), and an output layer (Dense single neuron). The model utilized the ELU activation function and the Adam optimizer; the model was trained for 100 epochs with a batch size of 2.

• Bi-LSTM Model
Bi-LSTM was the fourth model that tested both approaches using four datasets (Apple, Tesla, Snapchat, and ExxonMobil). The Bi-LSTM model used in this research contains four layers: an input layer (Sequential), two hidden layers (32 Bi-LSTM neurons and 16 Bi-LSTM neurons), and an output layer (Dense single neuron). The model utilized the ELU activation function and the Adam optimizer; the model was trained for 100 epochs with a batch size of 2.

• GRU Model
The GRU was the second model that tested both approaches using four datasets (Apple, Tesla, Snapchat, and ExxonMobil). The GRU model used in this research contains four layers: an input layer (Sequential), two hidden layers (both GRU layers, the first with 50 neurons and the second with 25 neurons), and an output layer (Dense single neuron). The model utilized the ELU activation function and the Adam optimizer; the model was trained for 70 epochs with a batch size of 2.
• CNN-LSTM Model
CNN-LSTM was the sixth model that tested both approaches using four datasets (Apple, Tesla, Snapchat, and ExxonMobil). The CNN-LSTM model used in this research contains six layers: an input layer (Sequential); four hidden layers, namely a Conv1D layer with 64 filters and a kernel size of 1, a MaxPooling1D layer with a pool size of 2, a Flatten layer, and an LSTM layer of 50 neurons; and an output layer (Dense single neuron). The model utilized the ELU activation function and the Adam optimizer; the model was trained for 100 epochs with a batch size of 4.
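The hybrid described above can be sketched in Keras as follows. The layer sizes follow the text, while the window length (timesteps) and the six-feature input are assumptions. The text lists a Flatten layer before the LSTM; because an LSTM expects a 3-D input, this sketch passes the pooled sequence to the LSTM directly, which is one plausible reading of the architecture.

```python
from tensorflow import keras
from tensorflow.keras import layers

# Sketch of the described CNN-LSTM hybrid: Conv1D features feed an LSTM,
# which feeds a single-neuron output. timesteps=10 is an assumption.
timesteps, n_features = 10, 6

model = keras.Sequential([
    layers.Input(shape=(timesteps, n_features)),
    layers.Conv1D(64, kernel_size=1, activation="elu"),  # feature extraction
    layers.MaxPooling1D(pool_size=2),
    layers.LSTM(50, activation="elu"),                   # temporal dependencies
    layers.Dense(1),                                     # adjusted-close output
])
model.compile(optimizer="adam", loss="mse")
```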

Feature Engineering
The original approach of stock prediction using deep learning uses four features, which are:

• High: represents the highest price of the stock on a particular day.
• Low: represents the lowest price of the stock on a particular day.
• Open: represents the price at the opening of the stock exchange on a particular day.
• Volume: represents the total number of shares or contracts exchanged between buyers and sellers.
These four are the features most commonly used when predicting the adjusted closing price, which amends a stock's closing price to reflect its value after accounting for any corporate actions. This research investigates the effect of modifying the original prediction approach, which uses the feature set mentioned above (High, Low, Volume, Open), by creating two additional features, referred to as HiLo (High-Low) and OpSe (Open-Close).
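A minimal pandas sketch of the two engineered features, assuming daily OHLCV columns named High, Low, Open, Close, and Volume (the column names and sample values are illustrative, not the paper's data):

```python
import pandas as pd

def add_engineered_features(df):
    """Add the HiLo and OpSe features described above to an OHLCV frame."""
    out = df.copy()
    out["HiLo"] = out["High"] - out["Low"]    # daily trading range
    out["OpSe"] = out["Open"] - out["Close"]  # open-to-close change
    return out

prices = pd.DataFrame({
    "High": [150.0, 152.0], "Low": [148.0, 149.0],
    "Open": [149.0, 151.0], "Close": [149.5, 150.0],
    "Volume": [1_000_000, 1_200_000],
})
features = add_engineered_features(prices)
```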

Results
In this section, the proposed methods are experimented with and compared with other methods from the literature. A new set of features is used to enhance the possibility of giving more accurate results with fewer losses by creating a six-feature set (High, Low, Volume, Open, HiLo, OpSe) rather than the traditional four-feature set (High, Low, Volume, Open). The study also investigates the effect of data size by using datasets (Apple, ExxonMobil, Tesla, Snapchat) of different sizes, as well as the effect of the business sector on the loss result. Finally, the study included six deep learning models, MLP, GRU, LSTM, Bi-LSTM, CNN, and CNN-LSTM, to predict the adjusted closing price of the stocks. This study revealed that using six variables (High, Low, Open, Volume, HiLo, and OpSe) improves the models' outcomes, showing fewer losses than the original approach, which utilizes the original feature set.

The software used in this paper is Python, and the original main parameters of the tested methods are applied. This research was performed using Google Colab. Moreover, several libraries were used: pandas, NumPy, Matplotlib, Sklearn, and Keras.

Tables 2 and 3 demonstrate that the models showed minor and major improvements depending on the model and how much it could benefit from the new additional features. The results also show that the LSTM model outperformed the other models' training results on both datasets. Tables 4 and 5 show that when using the four-feature set, the CNN outperformed the other models on the Tesla dataset; however, when using the additional two features, Bi-LSTM outperformed the other models according to the MSE metric due to the massive boost in performance caused by the addition of the two features. Moreover, LSTM outperforms the other models according to the MAPE metric, which shows that the LSTM-based models improved more than the other models when using the additional features.
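The MSE and MAPE metrics used throughout the tables can be sketched in NumPy as follows; note that MSE is scale-dependent, which is why high-priced stocks such as Tesla naturally produce larger MSE values, while MAPE is expressed as a percentage and is comparable across datasets.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: scale-dependent, in squared price units."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent: scale-independent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)
```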
As shown in Tables 6 and 7, when using four features, the CNN model outperformed the other models, but when using the additional two features, LSTM showed a massive boost in performance and outperformed the other models, thus showing that the LSTM model benefitted the most from the two additional features. Tables 8 and 9 show that Bi-LSTM outperforms the other models in both the MSE and MAPE metrics, in both datasets. The tables also show that almost all models benefitted from the two additional features except CNN-LSTM. The tables presented previously show that LSTM-based models perform better than the other models in most datasets, with the CNN performing better in two cases. In general, most models showed an improvement when using the additional features.
The results of Tables 10 and 11 show that the CNN-based models (CNN, CNN-LSTM) outperform the other models according to the MSE metric, and the LSTM-based models (LSTM and Bi-LSTM) outperform the other models according to the MAPE metric. Surprisingly, Bi-LSTM showed overfitting when using four input features; this overfitting is reduced when using the additional two features. The results of Tables 12 and 13 show that Bi-LSTM is severely overfitted and has a high loss compared with its training results, according to the MSE metric. The two additional features show immense importance in that they push the Bi-LSTM model from the worst model according to the MSE metric, and the second-worst according to the MAPE metric, to the absolute best model according to both metrics. The results of Tables 14 and 15 show that the Bi-LSTM model outperforms the other models according to the MSE metric when using four input features, and GRU outperforms the other models in Table 15 according to the MAPE metric; GRU also outperforms the other models according to both metrics when using the additional two features. The results of Table 16 show that GRU outperforms the other models according to both the MSE and MAPE metrics. The CNN showed a high loss, but as shown in Table 17, this loss was mitigated after using the additional features. In general, all models showed a higher loss compared with the other datasets because oil companies' stock prices are by nature very volatile, which makes the prediction process harder for the models. Tables 2-17 show that no model continuously outperformed the other models; the results also highlight the fact that the new approach did improve the prediction accuracy of the models in most cases, especially the LSTM and Bi-LSTM models.
The results also showed that the most significant losses were related to Tesla due to its high adjusted closing price compared with the other datasets; for example, the adjusted closing price for Tesla on 27 October 2021 was 1037.85, while Apple's adjusted closing price on the same day was 148.85. This means that the big difference in loss is due to the high adjusted closing price of Tesla. The results also show that the small-sized dataset of Snapchat did not create a loss problem, and it behaved as normally as the others. The models did achieve a high loss when predicting the price of the ExxonMobil dataset compared with the other datasets due to the volatile nature of its stock prices. To provide another perspective on the achieved results, visualizations have been utilized to clearly show the effect of the proposed approach. Figures 5 and 6 show the validation results of both the standard and proposed approaches. As shown in Figures 7 and 8, Bi-LSTM showed a big decrease in loss when utilizing the MSE metric, and a good decrease in loss when utilizing the MAPE metric, with six features (new approach); the figures also show that in some cases, the new approach did not make a difference, or it caused a slight increase in loss. Figures 9 and 10 show the visualization results of the Snapchat corporation.
As shown in Figures 9 and 10, the new approach caused a decrease in loss in some cases, and a noticeable increase in loss in other cases. This noticeable increase might be due to the small-sized dataset of Snapchat. From Figure 9, we can see that the MLP method, when used on the Snapchat dataset, achieved the best results in terms of MSE when using both the four-feature set and the six-feature set. From Figure 10, we can also see that the MLP method achieved the best results in terms of MAPE when using the six-feature set, and the CNN-LSTM method achieved the best results in terms of MAPE when using the four-feature set. Figures 11 and 12 show the visualization results of the ExxonMobil Corporation. They show that the new approach was usually beneficial in terms of decreasing the loss of the MSE and the MAPE metrics, and that the CNN model benefited the most in comparison to the other models. From Figure 11, we can also see that the CNN method, when used on the ExxonMobil dataset, achieved the best results according to the MAPE metric when using the four-feature set, as did the CNN-LSTM method.
Finally, the proposed new approach showed promising results that could help create better and more accurate stock predictions; however, it is not perfect, and in some cases it could lead to a slight increase in loss. As with any new technique, the proposed approach needs more research. By adding these essential features, we increased the algorithm's effectiveness in each area. Automatically selected features are an essential collection of techniques to use when preparing the dataset. In this work, we learned about feature selection, the advantages of primary feature classification, and how to use these techniques to their full potential.

Discussion
In this section, the relation between deep learning-based stock price forecasting methods and open innovation is presented.
Little research has focused on projecting daily stock market returns, especially when utilizing vital machine learning approaches such as deep neural networks (DNNs) [80,81]. The operations of these created economic models require two-factor and three-factor financial analysis to examine the dynamics of the company's profitability [82,83]. The suggested model in [84] is based on the convergence of deterministic financial analysis methods that are included in the DuPont model, and simulation methods that allow analysis with random components. A good prediction of a stock's future price might provide significant profit. When projecting stock trends in prior years, many methodologies were used [44,47].

Implication
Predicting the future has been a dream for most economies and people due to the benefits that it may bring. Predicting stock price movements will also benefit those interested in researching stock market prediction. Artificial intelligence presents researchers with forecasts that are more accurate than ever, and it will become even more accurate as technology and algorithms advance over time. The results of this study reveal that the new technique of using six variables (High, Low, Open, Volume, HiLo, and OpSe) improves the models' outcomes, showing fewer losses compared with the original approach, which utilizes the original feature set (High, Low, Open, Volume). The paper also showed that LSTM-based models improved much more using the new approach, even though all models showed comparative results wherein no model showed far better results or continuously outperformed the other models; thus, overall, feature engineering proved to benefit the models. This shows that feature engineering should be considered an essential step in designing better learning models. In this work, we learned about feature selection, the advantages of primary feature classification, and how to use these techniques to their full potential.

Limits and Future Research Topics
The research limitations include relying only on basic deep learning models, as the research did not investigate transformer-based approaches or transfer learning; in addition, the research area of time series analysis does not have a large pre-trained model such as BERT in NLP or DALL-E 2 in the computer vision domain. Thus, this area might be covered better in future work. Improved machine learning and deep learning methods might be proposed to tackle the current weaknesses and the low performance in some tested cases. Moreover, other test cases could be considered in order to validate the performance of the proposed method. Future work could build on this work by using a more advanced deep learning approach, or by using a hybrid model that combines stock price indexes with news sentiment analysis to improve the results by including more features that the models could benefit from.