LSTM in Algorithmic Investment Strategies on BTC and S&P500 Index

We use LSTM networks to forecast the value of BTC and the S&P500 index, using data from 2013 to the end of 2020 at the following frequencies: daily, 1 h, and 15 min. We introduce our novel loss function, which improves the usefulness of the forecasting ability of the LSTM model in algorithmic investment strategies. Based on the forecasts from the LSTM model, we generate buy and sell investment signals, employ them in algorithmic investment strategies, and create equity lines for our investments. For this purpose we use various combinations of LSTM models, optimized on the in-sample period and tested on the out-of-sample period, using a rolling window approach. We pay special attention to data preprocessing in the input layer, to avoid overfitting in the estimation and optimization process, and to ensure correct selection of hyperparameters at the beginning of our tests. The next stage is devoted to the conjunction of signals from various frequencies into one ensemble model, and the selection of the best combinations for the out-of-sample period, through optimization of a given criterion in a way similar to portfolio analysis. Finally, we perform a sensitivity analysis of the main parameters and hyperparameters of the model.


Introduction
The main aim of this paper is to explore deep learning possibilities in time series forecasting by applying buy/sell signals generated by the LSTM-type (Long Short-Term Memory) recurrent neural network to algorithmic investment strategies, tested on various frequencies of BTC (Bitcoin) and the S&P500 index. We focus solely on LSTM networks and compare their performance across various datasets, frequencies, selected hyperparameters, and ensemble models created by combining the aforementioned variables.
The main advantages and the novelty of our work can be divided into five important points, listed below. Firstly, the use of the newest Machine Learning (ML) methods (the LSTM model) in algorithmic investment strategies (AIS) applied to the cryptocurrency (BTC) and traditional equity index (S&P500) markets. Secondly, the identification of drawbacks often encountered in papers testing various algorithmic strategies. Thirdly, designing the proper architecture (initial hyperparameters tuning) of the LSTM model and testing the performance of AIS in comparison to the traditional Buy&Hold approach (B&H). Fourthly, the use of various frequencies, from daily to 15 min data, in algorithmic investment strategies. Finally, the construction of an ensemble model, based on the combination of algorithmic investment strategies on various frequencies applied to BTC and the S&P500 index, for separate and combined frequencies.

LSTM Research Literature
Papers describing various approaches to LSTM can be divided into those referring to the theoretical aspects of the LSTM model and those focusing mainly on the empirical properties of LSTM and various other ML models, tested on various sets of data.
LSTM was first introduced in the paper by Hochreiter and Schmidhuber (1997) [13]. By introducing Constant Error Carousel (CEC) units, LSTM can deal with the exploding and vanishing gradient problems. The initial version of the LSTM block included cells, input, and output gates. LSTM's genuine feature was the ability to preserve information through the chain of iterations during training. The next theoretical advancement came from Gers (1999) [14], who introduced the forget gate (also called the "keep gate") into the LSTM architecture, enabling the network to reset its own state. Next, Gers et al. (2000) [15] added peephole connections, which are connections from the cell to the gates. Additionally, the output activation function was omitted. More recent advancements include a simplified variant called the Gated Recurrent Unit (GRU), put forward by Chung et al. (2014) [16].
We can also find numerous studies presenting results of applying the LSTM model, mostly to predicting stock prices. Chen et al. (2015) [24] implemented LSTM on the China stock market. They collected stock data and divided percentage returns of prices into seven groups, starting from (−∞, −1.5]. The main aim of the research was to successfully predict the proper group for the next-day return. In addition to returns data, they also used 10 different features: open, low, high, close prices and volume for a given stock, and the same five features for the Shanghai Securities Composite Index. The model specification used in this research: 30-day sequence length, 'RMSprop' optimizer and learning rate 0.001. The best results, measured by the accuracy of the predicted return group, were given by the model using all ten features, achieving 27% accuracy. The authors of [27] apply the LSTM model along with technical analysis indicators and get an average of 56% accuracy in predicting the direction of stock movements in the near future. Bao et al. (2017) [28] present a novel deep learning framework in which wavelet transforms (WT), stacked autoencoders (SAEs) and long short-term memory (LSTM) are combined for stock price forecasting. Their model outperforms other similar models in both predictive accuracy and profitability performance. Vargas et al. (2018) [29] used LSTM with technical indicators as input variables, but divided them into two sets, both with a sequence length of 5. They also used text analysis of financial news, which divided the study into two further subgroups. The test was performed on Chevron Corporation stocks between 2006 and 2013. The test period covered the last 8 months of the total set, equal to approximately 11% of the data. The hyperparameters used in that work included: 128 LSTM units, 1 LSTM layer (LSTM as an input, no additional hidden layers), and SGD (stochastic gradient descent) as the optimizer.
The overall result exposed a great advantage of the LSTM network, supported by the news analysis and the first set of TIs, over the standard LSTM model with the same set of TIs. Nevertheless, both of them proved to generate higher returns than a buy-and-hold strategy. Zhang et al. (2018b) [11] implemented an LSTM model to predict next-day returns for China stocks. A different approach using the LSTM model was presented in Sang and Di Pierro (2019) [30]. Instead of using prices or returns for predicting stock price movements, the authors decided to use well-known technical analysis trading strategy signals as features. The selected methods were: Simple Moving Average, Relative Strength Index and Moving Average Convergence Divergence. The dataset used in the empirical study contained the five stocks with the highest capitalization in each of nine sectors of the S&P500. The parameters used in the final model were: one hidden layer, learning rate 0.001, 15-day sequence length. LSTM outperformed the oscillators on six of nine sectors. Zhang et al. (2019) [31] presented the AT-LSTM model, which is a combination of LSTM and an attention-based model, and provided results for three index datasets: Russell 2000, DJIA and NASDAQ. Kijewski and Ślepaczuk (2020) [32] compared the performance of classical techniques with the LSTM model for the S&P500 index on daily frequency from the last 20 years and showed that the LSTM model results are not robust to initial hyperparameter assumptions.
One of the most recent approaches testing various machine learning techniques for the time series forecasting problem was the paper of Chlebus et al. (2020) [33], who applied the following methods: SVR, KNN, XGBoost, LightGBM, LSTM, ARIMA, and ARIMAX, with features coming from classes such as technical analysis, fundamental analysis, Google Trends entries, and markets related to Nvidia. The best performance was obtained by SVR based on stationary attributes.
Finally, many other authors have successfully verified that the LSTM network is able to perform better than many other popular time series prediction methods; examples include Gao and Chai (2018) [34], Dautel et al. (2020) [35], Fischer and Krauss (2018) [36], and Shynkevich et al. (2017) [37].

Terminology and Metrics
The main model used in this work is based on a deep recurrent neural network, specifically an LSTM network. This type of network has proved to work very well with financial time series, and there has been extensive research into testing LSTMs for stock return forecasting and directional movements, as presented in the literature review.
To train the network, a custom loss function had to be used as the base network performance metric in the training process. Apart from that, a set of strategy performance metrics was also calculated on the basis of equity line constructed from the investments based on single Buy/Sell signals. Sensitivity analysis was also implemented to show how changes in network hyperparameters and architecture affect the base case results. Additionally, ensemble models built on strategies with various frequencies and assets were tested at the end.

LSTM Model
LSTM networks are a type of recurrent neural network (RNN) that can keep track of long-term dependencies in data, allowing them to partially solve the vanishing gradient problem typical of classic RNNs (Goodfellow et al. (2016) [38]). They are widely used to model sequential data such as text, speech and time series data. LSTM units are composed of memory cells, with each cell having three types of gates (input gate, output gate and forget gate). These gates use tanh and sigmoid functions to regulate the flow of information through the cell, deciding how much and which information should be stored in the long-term state, passed on to the next step, or discarded. In our research, the input vector for the LSTM network (x_t) was a series of past observations from BTC and S&P500 data, and the output vector (h_t) was a single value predicted for the next period.
The architecture of the LSTM network can be described as follows:

f_t = σ(U_f x_t + V_f h_{t−1} + b_f)
i_t = σ(U_i x_t + V_i h_{t−1} + b_i)
o_t = σ(U_o x_t + V_o h_{t−1} + b_o)
C̃_t = tanh(U_C x_t + V_C h_{t−1} + b_C)
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t
h_t = o_t ⊙ tanh(C_t)

where f_t, i_t and o_t are activation vectors for the three specific gates, h_t is the hidden state (or output) vector, C_t is the cell state vector (with C̃_t its candidate update), σ denotes the sigmoid function, ⊙ denotes element-wise multiplication, while b, U and V denote the biases, input weights and recurrent weights of the network cells. Figure 1 shows a single cell of a typical LSTM network.
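The gate equations above can be illustrated with a single plain-numpy cell step; the dictionary weight layout (`U`, `V`, `b` keyed by gate) is our own convention for this sketch, not the paper's implementation:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell_step(x_t, h_prev, C_prev, U, V, b):
    # U, V, b are dicts keyed by gate: "f" (forget), "i" (input),
    # "o" (output), "c" (candidate cell state) -- a hypothetical layout.
    f_t = sigmoid(U["f"] @ x_t + V["f"] @ h_prev + b["f"])      # forget gate
    i_t = sigmoid(U["i"] @ x_t + V["i"] @ h_prev + b["i"])      # input gate
    o_t = sigmoid(U["o"] @ x_t + V["o"] @ h_prev + b["o"])      # output gate
    C_tilde = np.tanh(U["c"] @ x_t + V["c"] @ h_prev + b["c"])  # candidate state
    C_t = f_t * C_prev + i_t * C_tilde                          # new cell state
    h_t = o_t * np.tanh(C_t)                                    # new hidden state
    return h_t, C_t
```

Iterating this step over a sequence of past returns and reading the final h_t corresponds to the "return only the last output" behavior of the last LSTM layer described below.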

Specification of Our LSTM Model
Our model consists of three LSTM layers with 512/256/128 neurons, respectively, and a single-neuron dense layer at the output. Each LSTM layer uses the tanh activation function, which allows it to retain negative values. L2 kernel regularization (0.0005) and dropout (0.02) are also applied to each of these layers. The input shape of the data for the network was set to (sequence size, number of features), where only one feature was used as the input data: the simple returns at the tested frequency. The first two layers return sequences with the same shape as the input sequence (full sequence) and the last LSTM layer returns only the last output. To be able to use GPU acceleration during the training process, the recurrent activation function was set to sigmoid and we did not use any recurrent dropout.
To train the model, we used the Adam optimizer (Kingma and Ba (2017) [40]), a stochastic gradient descent optimizer with momentum (estimating first- and second-order moments). The learning rate of the optimizer was set to 0.0015 (after tuning). Data were split into mini-batches (size set to 80 after tuning) to allow the optimizer to work more efficiently. This architecture allowed us to use the model efficiently across both datasets, as well as to test it on different frequencies and apply sensitivity analysis to various hyperparameter settings.
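As an illustration only (our sketch under the stated hyperparameters, not the authors' code), the architecture described above could be assembled in Keras roughly as follows; the `build_model` name is ours and the `mse` loss is a placeholder for the MADL function introduced later:

```python
import tensorflow as tf
from tensorflow.keras import layers, regularizers

def build_model(sequence_len, n_features=1, lr=0.0015):
    """Sketch of the described architecture: 3 LSTM layers (512/256/128),
    tanh activations, sigmoid recurrent activation, L2 = 0.0005,
    dropout = 0.02, single-neuron dense output."""
    model = tf.keras.Sequential([
        layers.Input(shape=(sequence_len, n_features)),
        layers.LSTM(512, activation="tanh", recurrent_activation="sigmoid",
                    return_sequences=True,
                    kernel_regularizer=regularizers.l2(0.0005), dropout=0.02),
        layers.LSTM(256, activation="tanh", recurrent_activation="sigmoid",
                    return_sequences=True,
                    kernel_regularizer=regularizers.l2(0.0005), dropout=0.02),
        layers.LSTM(128, activation="tanh", recurrent_activation="sigmoid",
                    kernel_regularizer=regularizers.l2(0.0005), dropout=0.02),
        layers.Dense(1),  # single value: predicted next-period return
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                  loss="mse")  # placeholder; swap in a custom loss for MADL
    return model
```

With a batch size of 80 and roughly 40 epochs per rolling-window iteration, `model.fit(X_train, y_train, batch_size=80, epochs=40)` would mirror the training setup described in this paper.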

New Loss Function
In order to avoid one of the most common drawbacks found in papers testing AIS, we introduce our own loss function, which improves the usefulness of the forecasting ability of the LSTM model in algorithmic investment strategies (AIS).
Based on our previous research (e.g., Kijewski and Ślepaczuk (2020) [32] and Vo and Ślepaczuk (2022) [41]), we concluded that popular error metrics like RMSE, MSE, MAE, MAPE and %OP, used in 99.9% of similar research, are not proper error functions for evaluating the efficiency of the forecasting ability of models tested in AIS. The reason is that the above-mentioned error metrics evaluate only the accuracy of forecasts (i.e., the difference between forecasted and observed values), which is often confused with the forecasting ability of investment signals in AIS built on these forecasts. Almost all of these error metrics (RMSE, MSE, MAE, MAPE) penalize the forecast regardless of whether the forecast error (forecast error = R̂_i − R_i) was positive or negative, while the %OP metric does not take into account the magnitude of the forecast error, but only its direction. For this reason, researchers in most other papers select not the most profitable combination of signals for the strategy, but the combination which only optimizes the selected error metric. Therefore, we propose a new loss function called Mean Absolute Directional Loss (MADL), calculated using the following formula:

MADL = (1/N) Σ_{i=1}^{N} (−1) × sign(R_i × R̂_i) × abs(R_i)

where MADL is the Mean Absolute Directional Loss, R_i is the observed return on interval i, R̂_i is the predicted return on interval i, sign(X) is the function which gives the sign of X, abs(X) is the function which gives the absolute value of X and N is the number of forecasts. This way, the value the function returns is equal to the observed return on the investment taken with the predicted direction, which allows the model to tell whether the prediction will yield a profit or a loss and how large that profit or loss will be. MADL was designed specifically for working with AIS. The function in our model is minimized, so that if it returns negative values the strategy will make a profit, and if it returns positive values the strategy will generate a loss.
MADL was the main loss function used in hyperparameters tuning and in the estimation of the LSTM model.
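Under the definition above, MADL can be sketched in a few lines of numpy (our illustration, not the authors' code); the same expression, rewritten with backend tensor ops, can be passed to Keras as a custom loss:

```python
import numpy as np

def madl(returns, predictions):
    """Mean Absolute Directional Loss: for each interval, take the
    observed absolute return with a minus sign if the predicted direction
    was right (profit) and a plus sign if it was wrong (loss)."""
    returns = np.asarray(returns, dtype=float)
    predictions = np.asarray(predictions, dtype=float)
    return np.mean(-np.sign(returns * predictions) * np.abs(returns))
```

For example, a correct directional call on a +2% move contributes −0.02 to the loss, while a wrong call on the same move contributes +0.02, so minimizing MADL directly rewards profitable signals.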
Most of the tuning was done using the KerasTuner framework (O'Malley et al. (2019) [42]) allowing automated parameter selection using Hyperband search algorithm (Li et al. (2018) [43]). This approach allowed us to test how changes to several parameters at once would affect the network performance, instead of testing each hyperparameter separately. Results are presented in Table 1. In addition, we also conducted a careful manual sensitivity analysis on the parameters that had the most impact on the results.

Training Process
For training and prediction we used a walk-forward (rolling window) approach. This allowed us to make sure that the network would not overfit, as it was trained and tested multiple times, across various sets of data. The model was trained on approximately three years of data (equal to the train set length) and then used for predictions over the next 3 months (equal to the test set length). During that period, a single return value was predicted each time, based on the last 14/20 (sequence length) values. After making the predictions, the window was moved ahead by the number of periods equal to the test set length and the model was retrained from scratch.
A single iteration was trained for 40 epochs. Model checkpoint callback function was used to store the best weights (parameters) of the model, based on the lowest loss function value from all trained epochs. These weights were then used for prediction on the test set data.
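The rolling-window scheme described above can be sketched as a simple index generator (the function name and tuple layout are ours; the observation counts below are only illustrative):

```python
def walk_forward_windows(n_obs, train_len, test_len):
    """Yield (train_start, train_end, test_end) index triples for a
    walk-forward scheme: train on [train_start, train_end), predict on
    [train_end, test_end), then roll forward by one test-set length."""
    start = 0
    while start + train_len + test_len <= n_obs:
        yield start, start + train_len, start + train_len + test_len
        start += test_len  # move the window ahead by the test set length
```

Each yielded window corresponds to one full retraining of the model from scratch, followed by prediction on the out-of-sample slice.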

Research Description
During this research the following steps were performed:
• The division into in-sample (training and validation) and out-of-sample (test) samples, set to 1371/90 observations for BTC and 948/65 observations for S&P500.
• The combination of signals across different frequencies (1 d, 1 h and 15 min) and asset classes (equity: S&P500 index; cryptocurrency: BTC), with results provided in Section 6.

Performance Metrics
In order to evaluate the efficiency of the tested strategies, we calculate the following performance metrics based on Kosc et al. (2019) [45] and Bui and Ślepaczuk (2021) [46].

• Annualized Return Compounded (ARC), which shows the annualized rate of return for the given instrument (strategy) over the period (0, . . . , T):

ARC = (P_T / P_0)^(S/T) − 1

where P_T is the price of the given instrument at the end of interval T, P_0 is its current price and the scale parameter S is equal to the number of trading periods during a year for a given frequency.
• Annualized Standard Deviation (ASD), the most common risk measure, showing the annualized deviation of returns from their long-term average:

ASD = sqrt(S) × sqrt((1/(T−1)) Σ_{i=1}^{T} (R_i − R̄)^2)

where R̄ is the average simple daily return of the given instrument and the scale parameter S is equal to the number of trading periods during a year for a given frequency.
• Maximum Drawdown (MD), which informs us about the maximum percentage drawdown during the investment period.
• Maximum Loss Duration (MLD), which informs us about the maximum number of years between the previous local maximum and the forthcoming local maximum:

MLD = max_{i<j} (m_j − m_i) / S

for which Val_{m_j} > Val_{m_i} and j > i, where m_j and m_i are the numbers of days indicating consecutive local maxima of the equity line, Val_{m_j} and Val_{m_i} are the values of the local maxima on days m_j and m_i, respectively. The scale parameter S is equal to the number of trading periods during a year for a given frequency.
• Information Ratio (IR*), which describes the relation of the portfolio annualized rate of return to its annualized standard deviation:

IR* = ARC / ASD

• Modified Information Ratio (IR**), which takes into account the sign of the ARC metric.
• Aggregated Information Ratio (IR***), which we regard as the most important in the evaluation of the results of this study.
• Number of observations (nObs), which is the length of the investment horizon in trading days.
• Number of trades (nTrades).
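A minimal numpy sketch of the core metrics, following the verbal definitions above; we restrict it to the metrics whose formulas are fully determined by those descriptions (ARC, ASD, MD, IR*), and the function name and dictionary layout are ours:

```python
import numpy as np

def strategy_metrics(equity, scale):
    """Compute ARC, ASD, MD and IR* from an equity line.
    `scale` is the number of trading periods per year for the frequency."""
    equity = np.asarray(equity, dtype=float)
    returns = equity[1:] / equity[:-1] - 1.0
    n = len(returns)
    arc = (equity[-1] / equity[0]) ** (scale / n) - 1.0   # annualized return
    asd = np.std(returns, ddof=1) * np.sqrt(scale)        # annualized st. dev.
    running_max = np.maximum.accumulate(equity)
    md = np.max((running_max - equity) / running_max)     # maximum drawdown
    ir = arc / asd if asd > 0 else float("nan")           # Information Ratio
    return {"ARC": arc, "ASD": asd, "MD": md, "IR*": ir}
```

For daily data one would pass `scale=252` (trading days in a year); for 1 h or 15 min data, the corresponding number of periods per year.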

Data Description
As input data for the network we used simple returns, based on one minute data for both Bitcoin (BTC) and S&P500, from 1 April 2013 to 31 December 2020 (source for the BTC data: Kraken, Bitfinex, BTC-e, CEX and Coinbase exchanges. Source for the S&P500 data: Interactive Brokers API). Lower frequency returns were aggregated from the minute returns data. The descriptive statistics for time series on daily, hourly and 15 min frequency are presented in Table 2.
For the size of the training set we used 1371 observations for BTC and 948 observations for S&P500 (after tuning). The validation set size was 33% of the training set. The test set (and rolling window) size was set to 90 observations for BTC and 65 observations for S&P500 (after tuning).
Input sequence size for LSTM network was set to 20 for BTC and 14 for S&P500 and the batch size was set to 80.
The output of the model was a single number predicting the next return value. Based on the sign of the predicted return value we assigned −1, 0, 1 signals. However, the cases where network predicted the 0 return value (resulting in a neutral signal) were negligible.
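The sign-based mapping from forecasts to positions described above can be written directly (a trivial sketch; the function name is ours):

```python
import numpy as np

def signals_from_forecasts(predicted_returns):
    """Map predicted returns to trading signals: 1 = long, -1 = short,
    0 = neutral (only when the forecast is exactly zero)."""
    return np.sign(np.asarray(predicted_returns, dtype=float)).astype(int)
```

As noted above, exact-zero forecasts (and hence neutral signals) are negligible in practice, so the strategy is almost always either long or short.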
The hours of trading for the S&P500 index were between 3.30 p.m. CET and 10.00 p.m. CET, from Monday to Friday, excluding official holidays, while BTC was traded 24 h per day, 7 days a week.

Table 3 shows the results of strategies for BTC and S&P500 on daily frequency, using the classic MSE loss function and our novel MADL loss function. MSE was used as our starting point in comparing the performance of LSTM networks, but after the considerations in Section 3.3.1 and the comparison in Table 3, we saw that results based on the MADL function are much better in terms of maximizing our risk-adjusted return metrics (IR*, IR**, and IR***) for the tested algorithmic investment strategies. Therefore, in the next steps we use our novel MADL function.

Table 4 shows that the best results for BTC on daily frequency with regard to aggregated IR were obtained by the Long/Short strategy (IR*** = 21.23), but Long Only had very similar results (IR*** = 19.56); both were much better than the Buy&Hold strategy on BTC (IR*** = 1.19). The best results for the Long/Short and Long Only strategies were possible mainly because of the joint improvement of the ARC, MD, and MLD indicators.

Results for the Base Case Scenario
Panel B in Table 4 shows that the best results for BTC on hourly frequency, with regard to aggregated IR, were obtained by the Long Only strategy (IR*** = 7.12), while Long/Short and BTC had much worse results (IR*** = 0.49 and IR*** = 1.13, respectively).
Panel C in Table 4 shows that the best results for BTC on 15 min frequency, with regard to aggregated IR, were obtained by the BTC strategy (IR*** = 1.11), while Long/Short and Long Only had much worse results (IR*** = −0.05 and IR*** = 0.05, respectively).
Panels D, E and F in Table 4 summarize the results for strategies on the S&P500 index. Panel D for daily frequency shows that the best results were obtained for S&P500 and Long Only (IR*** = 0.09 in both cases). Panel E for hourly frequency shows that the best results were obtained for S&P500 (IR*** = 0.05). Panel F for 15 min frequency shows that the best results were obtained for Long Only (IR*** = 0.38).

Figure 2 presents the equity lines for all frequencies for BTC and the S&P500 index. Panel A of Figure 2 shows the fluctuations of equity lines for the tested strategies and confirms the results presented in Table 4. The equity lines for the Long/Short and Long Only strategies climb higher in a much smoother way than for BTC. Similar confirmation can be seen in Panels B and C of Figure 2 for BTC and in Panels D, E and F of Figure 2 for the S&P500 index.

Table 5 shows the results of a test of the significance of α and β from the regression R_t = α + β R*_t + ε_t, where R_t denotes the buy-and-hold returns and R*_t the returns from the Long/Short and Long Only strategies; we test whether α = 0 using standard tools. Generally, the results presented in Table 5 confirm those presented in Table 4, while the slight differences come from a different approach to risk metrics, mainly MD and MLD.
Summarizing the results for investment strategies on BTC in the base case scenario, we should underline that on daily frequency the best results were obtained for Long/Short and Long Only, on hourly frequency for Long Only, and on 15 min frequency for BTC. Therefore, we can note that in the case of BTC the results worsen when we change the frequency from daily to hourly and then to 15 min.
A slightly different situation can be observed for investment strategies on the S&P500 index in the base case scenario. We can notice that on daily frequency the best results were obtained for S&P500 and Long Only, on hourly frequency for S&P500, and on 15 min frequency for Long Only. Overall, we can note that in the case of the S&P500 index the best results were for 15 min, then for 1 d and lastly for 1 h. Additionally, we see that LO is much better than LS.
where R_t is the return of the tested strategy in period t and R*_t is the return of the BTC or S&P500 benchmark strategies. Regressions were calculated in the period between 1 January 2017 and 31 December 2020. The hyperparameters of the LSTM model for the base case scenario were set as described in Table 1. Asterisks *, ** and *** denote statistical significance at the 10%, 1% and 0.1% levels, respectively.
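The α-significance test behind Table 5 can be sketched with closed-form OLS in plain numpy (our illustrative implementation, not the authors' code; the function name is ours):

```python
import numpy as np

def alpha_beta(strategy_returns, benchmark_returns):
    """OLS fit of R_t = alpha + beta * R*_t + eps, returning
    (alpha, beta, t-statistic of alpha) for the significance test."""
    y = np.asarray(strategy_returns, dtype=float)
    x = np.asarray(benchmark_returns, dtype=float)
    X = np.column_stack([np.ones_like(x), x])     # intercept + regressor
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)  # [alpha, beta]
    resid = y - X @ coef
    dof = len(y) - 2
    sigma2 = resid @ resid / dof                  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)         # coefficient covariance
    t_alpha = coef[0] / np.sqrt(cov[0, 0])
    return coef[0], coef[1], t_alpha
```

Comparing the t-statistic of α against the usual critical values yields the significance stars reported in the table.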

Sensitivity Analysis
In order to properly assess the obtained results, we have to check their robustness with regard to all crucial hyperparameters selected at the beginning. Therefore, in this section we checked the sensitivity of the results to changes in the following hyperparameters: dropout rate, sequence length, train set length and batch size, changing them one by one, ceteris paribus. Due to the very long computation time, we were not able to perform the analysis for all frequencies (especially for 15 min data); therefore, we decided to present it only for hourly frequency. We check the sensitivity of the final investment strategies to the following changes in the above-mentioned hyperparameters:
• Dropout rate: from 2% to 1% and 4%.
• Sequence length: from 20 to 10 and 40 for BTC, and from 14 to 7 and 28 for S&P500.
• Train set length: from 1371 to 685 and 2742 observations for BTC, and from 948 to 474 and 1896 for S&P500.
• Batch size: from 80 to 40 and 160.
Table 6 presents the aggregated results of the sensitivity analysis for 1 h data for BTC and S&P500 for the Long/Short and Long Only strategies. Note: BTC and S&P500 stand for the benchmark strategies, i.e., Buy&Hold applied to BTC and S&P500 prices, respectively. LS stands for the investment strategy with long and short signals. LO stands for the investment strategy with long only signals. dr001, dr002, dr004 are the abbreviations for dropout rates equal to 1%, 2%, and 4%. The table presents the results in the period between 1 January 2017 and 31 December 2020 for daily frequency.

Sensitivity Analysis for 1 h Data-Dropout Rate
The hyperparameters of LSTM model for the base case scenario were set as it was described in Table 1.
A short summary of the sensitivity of the tested strategies to changes in the dropout rate is listed below. In the case of Long/Short for BTC, the most efficient dropout was 2%, i.e., the one selected during hyperparameters tuning. The results of the model are not robust to slight changes in the dropout rate, which can additionally be seen in Panel A of Figure 3.
The most efficient dropout in case of Long Only for BTC was 2%, i.e., once again the one selected during hyperparameters tuning. The results of the model are rather robust to slight changes in dropout rate (Panel B of Figure 3).
The results of the sensitivity analysis with respect to the dropout rate differ when we take into account the S&P500 index. The most efficient dropout in the case of the Long/Short strategy was 1%, while 2%, selected during hyperparameters tuning, was the least efficient. The results of the model were quite robust to slight changes in the dropout rate. In the case of Long Only for the S&P500 index, the most efficient dropout was 1%, but 2%, selected during hyperparameters tuning, gives almost the same results. The results of the model are robust to slight changes in the dropout rate.
Figure 3. Sensitivity analysis for 1 h data with regard to the dropout rate. Note: BTC and S&P500 stand for the benchmark strategies, i.e., Buy&Hold applied to BTC and S&P500 prices, respectively. dr001, dr002, dr004 are the abbreviations for dropout rates equal to 1%, 2%, and 4%. The plot presents the fluctuations of equity lines in the period between 1 January 2017 and 31 December 2020 for daily frequency. The hyperparameters of the LSTM model for the base case scenario were set as described in Table 1.
Summarizing the sensitivity of the tested strategies to changes in the dropout rate, we can say that the strategies for BTC were not robust, while they were robust for the S&P500 index. Moreover, we noticed that the parameters selected during hyperparameters tuning were still the best after the sensitivity analysis in the case of BTC strategies, which was not the case for the S&P500 index.

Sensitivity Analysis for 1 h Data-Sequence Length
Details of sensitivity analysis to the changes in sequence length are presented in Table 7 and Figure 4.
Panel A of Table 7 and Figure 4 show that in case of Long/Short for BTC the most efficient sequence length was 20, i.e., the one selected during hyperparameters tuning and that the results of the model were not robust to slight changes in sequence length.
In the case of a Long Only strategy for BTC (Panel B of Table 7 and Figure 4), the most efficient sequence length was 20, i.e., the one selected during hyperparameters tuning, and the results of the model were not robust to slight changes in sequence length.
The results for the S&P500 index were slightly different, because the sensitivity analysis showed that the best sequence length for Long/Short (Panel C of Table 7 and Figure 4) and Long Only (Panel D of Table 7 and Figure 4) was 7, while 14, selected during hyperparameters tuning, was the least efficient. Moreover, the results of the model were not robust to slight changes in sequence length.
Figure 4. Sensitivity analysis for 1 h data with regard to sequence length. Note: BTC and S&P500 stand for the benchmark strategies, i.e., Buy&Hold applied to BTC and S&P500 prices, respectively. seq10, seq20, seq40 are the abbreviations for sequence lengths equal to 10, 20, and 40. The plot presents the fluctuations of equity lines in the period between 1 January 2017 and 31 December 2020 for daily frequency. The hyperparameters of the LSTM model for the base case scenario were set as described in Table 1.
Summarizing the results of the sensitivity analysis with respect to sequence length, we can say that neither the strategies for BTC nor those for the S&P500 index were robust. Moreover, we noticed that the parameters selected during hyperparameters tuning were still the best after the sensitivity analysis in the case of BTC strategies, which was not the case for the S&P500 index. Table 8 and Figure 5 show the results of the sensitivity analysis with respect to train set length. Note: BTC and S&P500 stand for the benchmark strategies, i.e., Buy&Hold applied to BTC and S&P500 prices, respectively. LS stands for the investment strategy with long and short signals. LO stands for the investment strategy with long only signals. seq10, seq20, seq40, seq07, seq14, seq28 are the abbreviations for sequence lengths equal to 10, 20, 40, 7, 14, and 28. The table presents the results in the period between 1 January 2017 and 31 December 2020 for daily frequency. The hyperparameters of the LSTM model for the base case scenario were set as described in Table 1.

Sensitivity Analysis for 1 h Data-Train Set Length
Panel A of Table 8 and Figure 5, presenting the results for the Long/Short strategy for BTC, and Panel B of Table 8 and Figure 5, presenting the results for the Long Only strategy for BTC, inform us that the most efficient train set length was 1371, i.e., the one selected during hyperparameters tuning. The results of the model were not robust even to slight changes in train set length.
Figure 5. Sensitivity analysis for 1 h data with regard to the train set length. Note: BTC and S&P500 stand for the benchmark strategies, i.e., Buy&Hold applied to BTC and S&P500 prices, respectively. tr0685, tr1371, tr2742, tr0474, tr0948, tr1896 are the abbreviations for train set lengths equal to 685, 1371, 2742, 474, 948 and 1896. The plot presents the fluctuations of equity lines in the period between 1 January 2017 and 31 December 2020 for daily frequency. The hyperparameters of the LSTM model for the base case scenario were set as described in Table 1.
The results for the S&P500 index are once again slightly different. In the case of Long/Short (Panel C of Table 8 and Figure 5) and Long Only (Panel D of Table 8 and Figure 5), the most efficient train set length was 1896, while 948, selected during hyperparameters tuning, was the least efficient. The results of the model were not robust to slight changes in train set length.
Summarizing the results of the sensitivity analysis with respect to train set length, we can say that neither the strategies for BTC nor those for the S&P500 index were robust. Moreover, we noticed that the parameters selected during hyperparameters tuning were still the best after the sensitivity analysis in the case of BTC strategies, which was not the case for the S&P500 index.

Sensitivity Analysis for 1 h Data-Batch Size
The last part of sensitivity analysis is summarized in Table 9 and Figure 6. The results of sensitivity analysis to the changes in Batch size for BTC for Long/Short (Panel A of Table 9 and Figure 6) and Long Only (Panel B of Table 9 and Figure 6) show that the most efficient Batch Size was 80, i.e., the one selected during hyperparameters tuning. The results of the model were not robust to slight changes in batch size. Note: BTC and S&P500 stand for the benchmark strategies, i.e., Buy&Hold applied for BTC and S&P500 prices, respectively. LS stands for the investment strategy with long and short signals. LO stands for the investment strategy with long only signals. bs040, bs080, bs160 is the abbreviation for batch sizes equal to 40, 80 and 160. The table presents the results in the period between 1 January 2017 and 31 December 2020 for daily frequency.
The hyperparameters of LSTM model for the base case scenario were set as it was described in Table 1.
The results of the sensitivity analysis with respect to batch size for the S&P500 index, for the Long/Short (Panel C of Table 9 and Figure 6) and Long Only (Panel D of Table 9 and Figure 6) strategies, show that the most efficient batch size was 40, while 80, the one selected during hyperparameters tuning, was the least efficient. The results of the model were not robust to slight changes in batch size.
Summarizing the results of the sensitivity analysis with respect to batch size, we can say that neither the BTC nor the S&P500 strategies were robust. Moreover, we noticed that the parameters selected during hyperparameters tuning remained the best after the sensitivity analysis in the case of the BTC strategies, but this was not the case for the S&P500 index.
Overall, the results of the sensitivity analysis suggest that the hyperparameters tuning procedure was adequate for BTC but should be improved for the S&P500.
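The sensitivity comparisons above rank strategy variants by the performance of their equity lines. A minimal sketch of how an equity line and a drawdown statistic can be computed from generated signals is given below; the function names, signal convention, and toy return series are our own illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def equity_line(returns, signals):
    """Compound an equity line from asset returns and trading signals
    (1 = long, -1 = short, 0 = flat), starting from one unit of capital."""
    strat_ret = np.asarray(returns) * np.asarray(signals)
    return np.cumprod(1.0 + strat_ret)

def max_drawdown(equity):
    """Largest peak-to-trough relative decline of the equity line."""
    peaks = np.maximum.accumulate(equity)
    return np.max(1.0 - equity / peaks)

# tiny illustration with made-up returns and hypothetical model signals
rets = np.array([0.02, -0.01, 0.03, -0.04, 0.01])
sig = np.array([1, 1, -1, -1, 1])
eq = equity_line(rets, sig)
```

Comparing variants by a risk-adjusted measure (return relative to drawdown or volatility) rather than raw return is what makes "most efficient" well defined in the tables.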

Combined Model on Different Frequencies and Different Assets
In order to smooth our equity lines and exploit the limited correlations between AIS on various frequencies and various types of assets, we decided to create ensemble models built from three frequencies (1 d, 1 h, and 15 min) and/or two types of assets (BTC and S&P500).
However, due to very similar results, we decided to present the results only for approach #1. Other aspects of the construction of equity lines stay as they were for the base case in the main results section. Table 10 and Figure 7 summarize the results of the ensemble models for BTC and the S&P500 index. Panel A of Table 10, with the ensemble model for combined frequencies on BTC, shows that the most efficient results can be obtained for Long Only strategies and for the BTC strategy. Similar results are presented in Panel B of Table 10 for the S&P500 index. Overall, we can notice that ensembling models across frequencies results in lower volatility and smoother equity lines.
Summarizing the results for the ensemble models built on different frequencies and different assets, presented in Panels C, D, E, and F of Table 10, we can stress the following:
• The combination of weights equal to {S&P500 = 20%, BTC = 80%} was always better than {S&P500 = 10%, BTC = 90%}.
• The rebalancing period length equal to RB6m was always better than RB3m.
Panels A, B, C, D, E, and F of Figure 7 confirm the observations from Table 10. As a conclusion for this section, we can say that the combined results for the ensemble models suggest rare rebalancing of assets and a higher weight of BTC in the optimal portfolio for investment strategies.
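The weighted combinations with periodic rebalancing compared above can be sketched as follows; the function, the rebalancing mechanics, and the toy return series are our own assumptions rather than the authors' implementation:

```python
import numpy as np

def ensemble_equity(returns_matrix, weights, rebalance_every):
    """Combine several strategies' return series into one equity line.
    Between rebalancings each sleeve compounds on its own; every
    `rebalance_every` bars the allocation is reset to the target weights."""
    n_bars, _ = returns_matrix.shape
    weights = np.asarray(weights, dtype=float)
    sleeve = weights.copy()              # current allocation per sleeve
    equity = np.empty(n_bars)
    for t in range(n_bars):
        sleeve *= 1.0 + returns_matrix[t]
        equity[t] = sleeve.sum()
        if (t + 1) % rebalance_every == 0:
            sleeve = equity[t] * weights  # rebalance back to targets
    return equity

# e.g. a {S&P500 = 20%, BTC = 80%} combination rebalanced every 2 bars
rets = np.array([[0.001, 0.02], [0.002, -0.01], [0.0, 0.03], [0.001, 0.01]])
eq = ensemble_equity(rets, [0.2, 0.8], rebalance_every=2)
```

A longer `rebalance_every` lets the better-performing sleeve drift to a larger share between resets, which is consistent with the finding that rare rebalancing performed better here.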

Conclusions
This research aimed to test LSTM networks in forecasting the value of BTC and the S&P500 index on data from 2013 to the end of 2020, with the following frequencies: daily, 1 h, and 15 min. Based on the forecasts from LSTM models, we generated buy and sell investment signals, used them in algorithmic investment strategies, and created equity lines for our investment. For this purpose we used various combinations of LSTM models, optimized on the in-sample period and tested on the out-of-sample period with a rolling window approach. We paid special attention to data preprocessing in the input layer, to avoiding overfitting in the estimation and optimization process, and we ensured the correct selection of hyperparameters at the beginning of our tests. We introduced our own loss function, which better utilizes the forecasting ability of the LSTM model in algorithmic investment strategies. Then we performed the sensitivity analysis of the main parameters and hyperparameters. In the final step, we combined the signals from various frequencies into one ensemble model.
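The custom loss function referred to here, named MADL (Mean Absolute Directional Loss) in the extensions section, rewards catching the direction of large moves rather than minimizing pointwise forecast error. A numpy sketch of one plausible formulation consistent with that description is given below; this is our illustrative restatement, not the exact training implementation:

```python
import numpy as np

def madl(actual_returns, predicted_returns):
    """Sketch of a Mean Absolute Directional Loss: each observation
    contributes the absolute observed return, negatively signed when the
    predicted direction is correct and positively signed when it is wrong,
    so minimizing the loss favors catching large directional moves."""
    r = np.asarray(actual_returns, dtype=float)
    r_hat = np.asarray(predicted_returns, dtype=float)
    return np.mean(-np.sign(r * r_hat) * np.abs(r))

# direction right on the large move, wrong on the small one -> negative loss
loss = madl([0.05, -0.01], [0.02, 0.03])
```

Because the sign function has a zero gradient almost everywhere, a loss of this shape is hard to optimize directly, which matches the drawback of MADL noted in the research extensions.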
In this paragraph we refer to the Research Hypotheses formulated at the beginning of this research.
The first hypothesis (H1): The signals from the LSTM model employed in AIS are more efficient than the Buy&Hold approach regardless of the asset class tested, holds only for some of the BTC strategies (1 d_LS, 1 d_LO, and 1 h_LO) and some of the S&P500 strategies (1 d_LO, 15 min_LS, and 15 min_LO). Therefore, we reject H1.
The second hypothesis (H2): The signals from the LSTM model employed in AIS are more efficient than the Buy&Hold approach regardless of the data frequency tested, holds only for daily data in the case of BTC and 15 min data in the case of the S&P500. Therefore, we reject H2.
The third hypothesis (H3): The signals from the LSTM model employed in AIS are more efficient in the case of BTC than in the case of the S&P500 index, has to be rejected as well, because we obtain different results for different frequencies.
Regarding the fourth hypothesis (H4): The robustness of tested models to various hyperparameters does not depend on the asset class tested, our results show that the strategies were robust neither for BTC nor for the S&P500, although in significantly different ways. Nevertheless, we cannot reject the fourth hypothesis. Finally, we were not able to reject the fifth hypothesis (H5): The ensemble model constructed as a combination of ML models with various frequencies and assets can produce better risk-adjusted returns than single models, because the Long Only ensemble strategies performed best.
Summarizing the most important conclusions from this paper, we can state that the efficiency of LSTM in algorithmic investment strategies strictly depends on the hyperparameters tuning procedure, the construction of the model, and the estimation process. Moreover, a proper loss function is crucial in the model estimation process. Furthermore, the results depend on the asset classes and frequencies used. Finally, we noticed that the results are not robust to the initial assumptions.
Possible research extensions of this paper could cover: a more extensive sensitivity analysis, especially with regard to parameters and hyperparameters which were not tested in this study; the construction of alternative loss functions addressing the problems identified with regard to common error measures (one of the drawbacks of using MADL as a loss function is that it is not easily optimized); the use of various high-frequency data; the repetition of the whole research with transaction costs included in the estimation process; and, finally, a more careful hyperparameters tuning process, especially in the case of the S&P500.

Data Availability Statement:
All relevant data along with the sources are presented in the paper.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: