Equity-Market-Neutral Strategy Portfolio Construction Using LSTM-Based Stock Prediction and Selection: An Application to S&P500 Consumer Staples Stocks

Abstract: In recent years, a great deal of attention has been devoted to the use of neural networks in portfolio management, particularly in the prediction of stock prices. Building a more profitable portfolio with less risk has always been a challenging task. In this study, we propose a model to build a portfolio according to an equity-market-neutral (EMN) investment strategy. In this portfolio, the selection of stocks comprises two steps: a prediction of the individual returns of the stocks using an LSTM neural network, followed by a ranking of these stocks according to their predicted returns. The stocks with the best predicted returns and those with the worst predicted returns constitute, respectively, the long side and the short side of the portfolio to be built. The proposed model has two key benefits. First, data from historical quotes and technical and fundamental indicators are used in the LSTM network to provide good predictions. Second, the EMN strategy allows the long positions to be funded by the short-sell positions, thus hedging the market risk. The results show that the built portfolios performed better than the benchmarks. Nonetheless, performance slowed down during the COVID-19 pandemic.


Introduction
According to the market efficiency hypothesis developed by Fama (1970), it is impossible to make accurate predictions about stock prices in the future, because the current prices of financial assets reflect all the information that is available, and thus there is no such thing as an undervalued or overvalued stock. Nonetheless, many empirical studies have debunked this hypothesis, and have shown that, with some methods and techniques, it is possible to make good predictions about future stock prices. Most of these techniques use historical stock prices and/or financial information from the issuing companies, as part of two well-known types of stock analysis: fundamental analysis and technical analysis (Carhart 1997).
Since the foundational work of Markowitz (1952), which established the mathematical foundations of portfolio construction, many statistical and econometric models have been developed in order to predict the future prices and returns of financial assets, such as the Capital Asset Pricing Model (CAPM) in 1961, the Three Factor Model by Fama and French (1993), the Four Factor Model by Carhart (1997), the Autoregressive Model (AR) by Yule (1926), the Moving Average process (MA) by Wold in 1938 (Neyman 1939), the ARMA model by Box and Jenkins (1970), the ARIMA in 1976, the ARCH by Engle (1982), and the GARCH by Bollerslev (1986). All of these statistical models are based on assumptions relating to data, such as normality and stationarity.
With the technological and algorithmic evolution, rapid advances in artificial intelligence, the development of processors with high computing capacity, and the availability of large datasets, deep learning models have been increasingly applied to stock prediction; a recurring finding in this literature is that the LSTM structure is better able to adapt to financial time series. Jiang (2021) also conducted a literature review on the application of DL in stock prediction by studying more than 120 research papers from 2017, 2018, and 2019. He found that RNN models, including LSTM, are more commonly used than other models. Not only did he reveal that LSTM is popular in financial stock predictions, but he also demonstrated its predicting power. In their literature review of Forex and the prediction of stock prices, Hu et al. (2021) used data from the DBLP database and Microsoft Academic between 2015 and 2021, and found that all 27 papers that used LSTM agreed that LSTM neural networks outperform other models, or are at least capable of obtaining good prediction results.
Many research articles have used LSTM neural networks in applications related to stocks. Most of these studies apply LSTM to the prediction of stock prices with different study characteristics, such as the learning data period, the prediction horizon, the number of variables in the study, the frequency of the historical data used (intraday, daily, weekly, or monthly), the nature of the variables used (OCHLV prices (Open, Close, High, Low, and Volume), technical, fundamental, sentiment analysis, or macroeconomics), different hyperparameters, and different LSTM network settings (Naik and Mohan 2019; Qiu et al. 2020; Ding and Qin 2020; Ghosh et al. 2019). Other research uses LSTM networks in the prediction of index prices, since they are less volatile than stocks and constitute a set of structurally linked stocks (in terms of sector, industry, size, etc.) (Michańków et al. 2022; Tfaily and Fouad 2022). The application of LSTM networks is not limited to the prediction of financial asset prices, but it is also used in the prediction of the direction of price trends. In fact, several studies have used LSTM to predict the rise or the fall of stock prices by transforming the regression problem into a classification problem with other metrics for performance measurement (Patel et al. 2015; Yao et al. 2018).
However, less research has focused on portfolio construction and asset allocation methods that use LSTM neural networks. Indeed, the real challenge for portfolio managers is to figure out the best investment strategy for building a profitable portfolio with less risk. The stocks need to be selected in such a way as to ensure optimal capital allocation. Managers not only strive for accuracy in their stock price predictions in order to build their portfolios, but also need to account for other considerations, such as the number of stocks their portfolios must contain to diversify away idiosyncratic risk, and questions such as how to hedge against systematic market risk, how to allocate capital among the stocks in the portfolio to maximize its profitability, and how to fund the acquisition of long positions. Chaweewanchon and Chaysiri (2022) proposed a hybrid model, R-CNN-BiLSTM (BiLSTM is an improved version of LSTM), to build a mean-variance (MV) optimal portfolio containing stocks that obtained the best predicted returns. CNN networks are used to extract the data's important characteristics and the BiLSTM networks are used to predict prices. The model these authors propose is compared to other reference models that use mean-variance optimization or equal weights to allocate capital on one side, and either LSTM or BiLSTM to select stocks on the other side. The authors used the following metrics to evaluate the portfolio performance: the mean return, the standard deviation, and the Sharpe ratio. Their experiments on the SET50 index of Thailand's stock exchange between 2015 and 2020 demonstrate that BiLSTM outperforms other techniques. They also demonstrated that models that use "robust" inputs (i.e., those derived from transformations of raw prices) outperform those that directly use closing prices. This study also concluded that portfolios built with the support of LSTM or BiLSTM models outperform portfolios where stocks are randomly selected. Sen et al. (2021) built portfolios containing five stocks each from the seven sectors that are part of the National Stock Exchange (NSE) in India. To achieve this, they used the OCHLV historical prices of the chosen stocks for the previous five years (from 2016 to 2020) to train the LSTM neural networks, and implemented a test period from 1 January to 1 June 2021. Two portfolios were built for each sector: a minimum risk portfolio and an optimal risk portfolio, according to Markowitz minimum variance optimization. LSTM networks were used to predict stock prices that were then used to calculate portfolio returns. The results demonstrated that LSTM performed well when the actual returns were compared to the predicted returns. Zhang and Tan (2018) proposed a new model for stock selection, referred to as "Deep Stock Ranker", to build a stock portfolio. Their model uses LSTM networks to predict future returns rankings based on the OCHLV historical daily raw prices of all the stocks listed in the Chinese market A-Share between 2006 and 2017. The authors built two portfolios: a portfolio with an equally weighted selection strategy of the top 100 stocks according to the obtained ranking, and another portfolio consisting of all the stocks in a score-weighted fashion, i.e., with the weight of each stock being proportional to a score given according to the stock's position in the obtained ranking. The performance of these two portfolios was compared to the performance of other portfolios generated using other models and techniques.
It was measured during a three-year test period (from 2015 to 2017) using the following metrics: the information coefficient (IC), active return (AR), and information ratio (IR). The authors found that the portfolio using the equally weighted selection with raw price data outperformed the other portfolios. Touzani and Douzi (2021) proposed a trading strategy for some stocks in the Moroccan stock exchange using LSTM and GRU in the short term and the long term. To overcome the liquidity problem, the small number of listed stocks (76 stocks), and the low volume negotiated in the Moroccan market, the authors trained the model on data from the S&P500 index and the CAC40 index in the French stock exchange. Validation and other processes were performed on data from the Moroccan stock exchange. The trading strategy involved buying or selling a stock depending on how a function of the predicted price and the actual price compares to a certain calibrated threshold. Finally, two stocks were chosen to construct a portfolio and assess its performance during a test period from March 2019 to March 2021. Their results showed that the portfolio they built generated an annual return of 27.13%, and thus outperformed all the utilized benchmarks, except for the "Software and IT services" index, which achieved a high return during the COVID-19 period. Liu et al. (2017) presented a trading strategy based on a hybrid model combining CNN and LSTM. CNN was used to select stocks and LSTM was used to manage the timing of opening or closing a position as part of a long-short strategy. To achieve this, the authors used OCHLV prices and returns data related to stocks in the Chinese Exchange. The training period ran from 2007 to 2013, and the test period from January 2014 to March 2017. They found that their strategy was more profitable than the benchmark and a simple momentum-based strategy (which stipulates that the stocks that performed best in the last three to twelve months will continue to perform well for the next few months, and that the reverse is also true). Hou et al. (2020) proposed a hybrid LSTM-DNN model by integrating 18 monthly returns in LSTM and 19 fundamental variables in DNN to build a portfolio with a long-short strategy. The authors tested the model on 1398 stocks listed on NYSE, AMEX, and NASDAQ from 1977 to 2018. The portfolio was rebalanced in each period by buying the stocks with the highest predicted returns in the top decile and selling those in the lowest decile of the predicted returns ranking. To assess the model's performance, two metrics were used: the average monthly return and the Sharpe ratio. The results demonstrated that this model outperformed other OLS and DNN reference models.
Cipiloglu Yildiz and Yildiz (2022) used LSTM to predict the prices of stocks in the Turkish BIST30 using monthly OCHLV data from May 2000 to June 2019. They calculated the predicted returns to infer price trends. Portfolios were built using stocks with predicted returns above a certain threshold. Among the five methods used for weighting, equal weighting and minimum variance were used. The metrics used to evaluate the portfolios' performance were the Sharpe ratio, maximum drawdown, and conditional VaR. The results show that portfolios using LSTM outperformed the other portfolios and the benchmarks. Yi et al. (2022) proposed a model named "IntelliPortfolio", which is geared toward building a portfolio within the framework of Enhanced Index Tracking (EIT). The portfolio is constructed in two steps: the first step involves stock selection using principal component analysis (PCA) and the k-means clustering algorithm, and the second step comprises weight calculation using LSTM neural networks. Testing was performed on daily prices and some fundamental indicators of five stock exchange indexes from 2009 to 2018. The model was tested with four performance indicators (the tracking error, excess return, information ratio, and Sharpe Ratio) over the last 60 days of the sample. The model was compared to five existing models in the literature and the results show that it outperformed them.
This literature review is summarized in Table 1. The literature offers many studies focused on the prediction of prices, the direction of price changes, or stock returns, but the integration of predictions into portfolio construction is a research subject that has not yet been adequately explored. Furthermore, research that uses predictions as part of equity-market-neutral (EMN) alternative investment strategies is quite rare. Hence, it would be beneficial to have a comprehensive framework combining prediction, stock selection, and capital allocation to build a portfolio with an EMN strategy and offer a detailed performance analysis.
Based on the previous literature review, it is evident that LSTM neural networks outperform other methodologies in comparative model studies. Furthermore, in portfolio management research, portfolios constructed using LSTM networks outperform their benchmarks. Therefore, to answer the previously stated research question, the following research hypotheses are formulated:

Hypothesis 1. A portfolio constructed based on the EMN investment strategy, which utilizes LSTM neural networks to forecast returns, outperforms both benchmarks: the sector index and the market index. Performance is measured in terms of risk-adjusted returns, expressed by metrics such as the Sharpe ratio, Sortino ratio, Treynor ratio, Omega ratio, and Calmar ratio.
Hypothesis 2. Within a portfolio built according to an EMN investment strategy that utilizes LSTM neural networks, enriching the input data enhances the portfolio's performance in terms of risk-adjusted returns. The enrichment of the input data is achieved by integrating fundamental indicator ratios of the stocks relative to those of their sector (stock to sector fundamental indicators) and by adding scoring variables (Scores and Piotroski) that assess the financial health of a stock by assigning it a score based on its fundamental indicators.

Data and Methodology
In this empirical study, we propose a stock prediction and selection model to construct an EMN portfolio. The model predicts the weekly returns of the stocks in the "S&P 500 Consumer Staples" (CS) sector using LSTM neural networks to construct a robust portfolio that is rebalanced weekly in three distinct configurations. The literature offers many models that predict the stock price or the direction of its trend; in our model, however, we predict the stock return, not the stock price, because price series are not stationary, while return series are.
For these purposes, ten years of historical data for all stocks in the CS sector were used to train and test the model. This dataset is composed of a set of OCHLV price variables, and a set of calculated technical and fundamental indicators. Each week, stocks were classified into five quintiles based on their predicted returns; the two extreme quintiles, quintile 1 and quintile 5, constituted candidate stocks for building the long and short portfolios, that is, the two sides of the robust EMN strategy portfolio. In such an alternative strategy, the accuracy of the predicted return value for the stock itself is not relevant. Rather, it is the accuracy of the stock's ranking in the set of stocks composing the study universe at a given date (cross section) that matters most.
After generating the set of candidate stocks for each of the long and short sides of the portfolio through 15 different prediction repetitions, the final selection of the stocks of the robust portfolio was undertaken according to the majority voting technique, which was applied to the candidate stocks through all the repetitions performed.
To measure the model's performance, two types of evaluations were undertaken: a first evaluation of the model's statistical performance according to the errors between predicted future returns and realized returns, namely the MSE and MAE, followed by a second evaluation of the financial performance of the robust portfolios.
A comparison of the model evaluations was made according to the following four levels (Figure 1):

• The category of explanatory variables introduced into the model, i.e., a basket of "basic variables" versus "all variables".
• The size of the look-back period considered in LSTM to predict future weekly returns. Two sizes are compared: a window of the past three observations versus a window of the past four observations.
• The number of stocks selected for each of the two sides of the robust portfolio, i.e., six stocks per side versus seven stocks per side.
• The financial performance during the test period, namely, the pre-COVID-19 period versus the whole period including the COVID-19 crisis period.

Data Acquisition
Several research papers use historical market data of OCHLV stocks because they are more accessible than other stock data (Zhang and Tan 2018;Liu et al. 2017;Cipiloglu Yildiz and Yildiz 2022). Other research papers use technical indicators, such as simple moving averages, exponential moving averages, the relative strength index (RSI), daily, weekly, or monthly returns, and historical volatilities (Lanbouri and Achchab 2019). Other studies use fundamental indicators calculated based on financial statements, such as ratios of profitability, operational efficiency, solvency, growth, and debt (Yi et al. 2022).
This study combines different categories of data. Specifically, we used the stocks in the Consumer Staples (CS) sector of the US S&P 500 index according to the second level of the GICS classification (Global Industry Classification Standard). This sector index was chosen because it is part of a defensive sector, where firms produce or market basic goods and services that are always in demand. Theoretically, these stocks are more stable compared to stocks in other sectors, and they are equally impacted by financial or economic crises.
The basic data used in this study comprise stock and sector index market data and are taken directly from the Bloomberg data provider; meanwhile, the variables used are the result of calculations and transformations performed on these basic data.
To avoid survivorship bias, we listed all the stocks of the chosen sector index from January 2010 until December 2020. The number of active stocks at each time step during the study period varies between 30 and 45 stocks depending on the missing data (Table A1 in Appendix B). We then extracted the historical market data and the fundamental data generated during the whole study period. In addition, to avoid look-ahead bias and the use of data not yet available, and to ensure that the data reflected the true dates when the information was available, we associated the fundamental data with their release dates and not the dates recorded in the financial statement reports.
Given that financial statements are produced quarterly, and in order to be able to associate stock market data with fundamental data on a daily basis, we replicated the fundamental data for each stock over the entire period between two successive release dates. Furthermore, we combined the obtained data with the sector index data by date to obtain a mixture of three types of data: stock market data, stock fundamental data, and sector index data. From the daily data obtained, we calculated technical indicators, fundamental indicators, price multiples, stock-to-sector indicators, and sector indicators (Figure 2). Finally, we extracted weekly observations from this large daily database to form our final dataset. This approach had several upsides: it reduced redundancy and computation time, and the weekly frequency of our data matched the portfolio rebalancing frequency.
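For illustration, a minimal pandas sketch of this alignment and weekly-extraction step is given below; the file names, column choices, and the use of Friday as the weekly observation day are assumptions for the example, not the authors' code.

```python
# Sketch of the daily alignment and weekly extraction step described above.
import pandas as pd

market = pd.read_csv("stock_market_daily.csv", parse_dates=["date"], index_col="date")
fundamentals = pd.read_csv("stock_fundamentals.csv", parse_dates=["release_date"],
                           index_col="release_date").sort_index()

# Replicate each fundamental record forward until the next release date,
# so every trading day carries the latest information actually available.
fundamentals_daily = fundamentals.reindex(market.index, method="ffill")

# Combine market data, fundamental data, and sector index data by date.
sector = pd.read_csv("sector_index_daily.csv", parse_dates=["date"], index_col="date")
daily = market.join(fundamentals_daily).join(sector, rsuffix="_sector")

# Keep one observation per week (Friday close assumed here) to match the
# weekly portfolio rebalancing frequency.
weekly = daily.resample("W-FRI").last()
```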

Calculating the Indicators
We applied "features engineering" to enrich our database with other derived variables that helped to improve the prediction process. With the raw data downloaded directly from the external data provider, we created new variables through performing some transformations. Furthermore, based on our experience and knowledge of the financial domain, we derived analytical representations by calculating a wide range of financial indicators (or approximations thereof) that are commonly used in the finance domain.
A major component of this study is to form a portfolio based on the EMN strategy by selecting the right stocks from the "S&P500 Consumer Staples" sector. It involved comparative selection between the stocks belonging to this sector in order to choose the best ones according to the selection criteria of the adopted strategy. Therefore, to capture the disparity of stocks and to be able to compare them within their sector, we calculated indicators relative to the sector (stock to sector indicators). To do this, we calculated the same indicators for both the stocks and the sector before calculating the ratio.
The process of preparing the explanatory variables taken from the model (Figure 3) begins with the direct download of the 57 "raw variables": 34 stock variables and 23 sector variables. These variables form the basis for the calculation of all other variables for both the stocks (Table A2) and the sector (Table A3). After downloading the raw variables, a set of "intermediate indicators" was calculated both for the stocks (Table A4) and for the sector (Table A5), which were used in the calculation of our 173 final variables (Table A6). In order to compare the different models using different baskets of variables, and to study the impact of adding "stock to sector fundamental indicators", "Piotroski indicators", and "Scores" on the model's performance, we generated two baskets of variables: the first basket contained all 173 of the final variables calculated, and the second basket contained 128 so-called "basic" variables, i.e., all the final variables except for those in the three categories of "stock to sector fundamental indicators", "Piotroski indicators", and "Scores" (Figure 3). It should be noted that the number of variables varies from one stock to another, depending on the missing data.
The final variables taken from the model were classified into five categories:

• Technical indicators: this category includes stock market data without transformation (OCHLV and market capitalization), returns, volatilities, ratios of returns to volatilities, simple moving averages (SMA), exponential moving averages (EMA), prices relative to simple and exponential moving averages, momentum, the 14-day relative strength index (RSI), the 5-day RSI moving average, 14-day stochastic oscillators (slow and fast), the 14-day Williams indicator (%R), and On Balance Volume (OBV).
• Fundamental indicators: most of the fundamental indicators used in our model are ratios between the fundamental indicators of the stocks and the same indicators calculated on the sector data ("stock to sector fundamental indicators"). Other indicators are calculated differently, such as the "Piotroski indicators" and the "Scores". The Piotroski indicators are binary indicators assigned to a stock at a given time according to whether certain fundamental indicators satisfy certain criteria (1 if the criterion is satisfied, 0 otherwise), so that the total Piotroski score is the sum of all the calculated indicators (Piotroski 2000). Inspired by Piotroski's indicators, we established "Scores" that can take the values 0, 1, or 2, which are attributed to a stock at a given time according to the value of certain financial ratios. These ratios are compared to thresholds set beforehand: if the value of the ratio is below the small threshold, the score is set to 0; if it lies between the small and the large thresholds, the score equals 1; otherwise, the score takes the value 2. The total score of a group of financial ratios is the sum of the constituent scores (see the sketch after this list).
• Hybrid indicators: these are indicators calculated based on both technical and fundamental indicators. This category consists mainly of price multiples (price-earnings ratio, price-to-book ratio, price-to-sales ratio, etc.).
• Stock to sector indicators: most of the final variables used in the model fall into this category. A stock-to-sector indicator is the ratio of a stock indicator to the corresponding sector indicator, for instance, close to sector, open to sector, price-to-sales to sector, price-to-book to sector, stock returns relative to sector, volatilities relative to sector, simple and exponential moving averages relative to sector, momentum relative to sector, and price-to-moving-average ratios relative to sector.
• Sector-specific indicators: in this category, we considered only the five-day sector return, since other sector indicators are already included in the calculation of the other variables.
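As an illustration of the 0/1/2 scoring rule described in the list above, the following sketch applies assumed thresholds to a hypothetical ratio; the threshold values and the example ratio are illustrative only, not the study's calibrated values.

```python
# Minimal sketch of the 0/1/2 "Score" rule described above.
def ratio_score(value: float, low: float, high: float) -> int:
    """Return 0 below the small threshold, 1 between the thresholds, 2 above."""
    if value < low:
        return 0
    if value <= high:
        return 1
    return 2

# Example: scoring a hypothetical ratio against assumed thresholds of 5% and 15%.
# The total score of a group of ratios is the sum of the constituent scores.
total_score = sum(ratio_score(v, 0.05, 0.15) for v in [0.03, 0.10, 0.22])  # 0 + 1 + 2 = 3
```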

Cleaning and Standardization of Data
After calculating all the indicators and building our final variables, we performed separate pre-processing of the weekly data of all variables of each stock (Figure 4). We removed the variables that had more than a third of their values missing, and we imputed the other missing values with the values that precede them in chronological order. When there was no donor (preceding value) for the imputation, the whole record was deleted. The deletion of data rows usually occurred at the beginning of the stock data histories, which have no preceding values to impute. This situation mainly occurred with stocks that appeared during the study period and thus have no prior history, as some indicators require a data history to calculate the early values. This explains the existence of empty observations at the beginning of the histories of certain stocks for some indicators, such as simple and exponential moving averages.

Once the data for each stock were cleaned, they were standardized for all variables of the dataset of each stock separately. Standardization is a data normalization technique that allows for the direct comparison of scores by taking out the units of measurement. All the standardized explanatory variables are on the same measurement scale, which improves the performance and training stability of the model and ensures the rapid convergence of its parameters during the optimization operation.
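The following pandas sketch summarizes the cleaning and standardization steps described above, assuming each stock's weekly data are held in a DataFrame indexed by date; it is an illustration rather than the exact pipeline.

```python
# Sketch of the per-stock cleaning and standardization steps (assumed pandas workflow).
import pandas as pd

def clean_and_standardize(df: pd.DataFrame) -> pd.DataFrame:
    # Drop variables with more than a third of their values missing.
    df = df.loc[:, df.isna().mean() <= 1 / 3]
    # Impute remaining gaps with the preceding value in chronological order.
    df = df.sort_index().ffill()
    # Drop records that still have no donor value (typically the earliest rows).
    df = df.dropna()
    # Standardize each variable (z-score) so all features share the same scale.
    return (df - df.mean()) / df.std()
```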

LSTM Neural Networks
Long short-term memory (LSTM) neural networks are an improved version of recurrent neural networks (RNN). They are widely used in time series machine learning, and more specifically in the prediction of financial stock prices. They were initially proposed by Hochreiter and Schmidhuber (1997), and later improved by Gers et al. (2000). LSTM neural networks were introduced to solve the vanishing gradient problem, from which RNNs suffer with long-term data sequences, by integrating a memory cell and other functions together in structures known as "gates". LSTM networks can store a sequence of data via their memory cell, which stores the flow of information carried from one cell to another through the time sequence. Within each unit of the network, the "gates" control the information that is added to this memory. As shown in Figures 5 and 6, an LSTM unit is composed of a memory cell and three main gates: a forget gate, an input gate, and an output gate. These gates act as valves that control the information to be added to or ignored by the memory cell at each step of the sequence. Indeed, the forget gate receives the current value of the inputs $x_t$ combined with the output of the previous state of the unit $h_{t-1}$ and puts them into a sigmoid activation function, producing a value between 0 and 1 according to Formula (1), where $W_f$ and $U_f$ are the weights and $b_f$ is the bias of the forget gate. The forget gate decides which information to keep and which to forget from the previous state. The extreme value 0 means "ignore everything" and the extreme value 1 means "keep everything".

Next, the input gate uses the combined input between $x_t$ and $h_{t-1}$ to determine two components: what information to update in the memory cell and what new candidate information to add to it. The first component is computed via a sigmoid function according to Formula (2), and the second component is calculated via a hyperbolic tangent function according to Formula (3), where $W_i$, $U_i$, $W_c$, and $U_c$ are the weights and $b_i$ and $b_c$ are the biases of the input gate and the memory cell, and the hyperbolic tangent is defined as $\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$.

Once the outputs of both the forget gate and the input gate are calculated, the state of the memory cell is updated: the output value of the forget gate $f_t$ is multiplied by the previous state of the memory cell $C_{t-1}$, which determines what information can be forgotten, and the new information to be updated or added is obtained via the multiplication of the result of the input gate $i_t$ and the candidate value $\tilde{c}_t$, according to Formula (4).

The output gate decides the amount of information to be fed to the output: first, it calculates this amount using the new inputs $x_t$ combined with the output of the previous state of the unit $h_{t-1}$, according to Formula (5). Second, it regulates this resulting amount using the current state of the memory cell according to Formula (6), where $W_o$, $U_o$, and $b_o$ are, respectively, the weights and the bias of the output gate. According to these formulas, the output of a state depends on the previous output of the unit and the current state of the memory cell, which in turn depends on the previous output of the unit and the previous state of the memory cell. This sequence makes LSTM networks powerful by providing them with the ability to hold information in a long-term memory (Qiu et al. 2020).
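For reference, the standard LSTM gate equations consistent with the descriptions of Formulas (1)–(6) above can be written as follows, where $\odot$ denotes element-wise multiplication; the subscripts on the input-gate and candidate weights are reconstructed from the usual notation.

$$\begin{aligned}
f_t &= \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) && \text{(1)}\\
i_t &= \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) && \text{(2)}\\
\tilde{c}_t &= \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) && \text{(3)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{c}_t && \text{(4)}\\
o_t &= \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) && \text{(5)}\\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{(6)}
\end{aligned}$$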

Training Set and Testing Set
To compare the predicted performance of the stocks in this study, we set the same training period and the same test period for all stocks. For this reason, we divided the study period into two segments: the first segment constituted the sample for training (60% of the period) and the second segment constituted the sample for testing (40% of the period). Thus, for each stock, we formed a training sample over the period from 8 January 2010 to 12 August 2016, totaling 330 weekly observations, and a test sample over the period from 19 August 2016 to 18 December 2020, totaling 222 weekly observations. However, some stocks did not cover the entirety of the training and testing periods. Furthermore, to avoid under-training the model, which can lead to bad predictions due to insufficient training data, we set up a filter that excludes stocks with a training sample size of less than 100 observations (about two years).

LSTM Structure and Setup
In the present work, we chose LSTM neural networks to predict the future return of each stock composing the studied sector index (S&P500 Consumer Staples index). We used the Python language for the scripts, and the Keras library with TensorFlow as the backend for the LSTM networks. Additionally, SQL Server was used for data organization and indicator calculations.
Before passing the processed data to the LSTM network, it was transformed into a supervised problem by associating each entry of the data with a target value equal to the future return (a later week's return). Furthermore, the weekly return series serves both as an explanatory variable due to its historical values already observed at a given date "t", and as a target variable when we consider the later values to be predicted at the same date "t". Thus, we transformed the data to a three-dimensional format, adapted to the format expected by the multivariate LSTM input layer (in terms of the data sample size, the look-back period size, and the number of explanatory variables). We experimented with two types of LSTM network configurations: the first one had a look-back period w = 3 and, for the second, w = 4. For example, a size of w = 3 means that data from the first three weeks are used to predict the performance of the fourth week. This process was repeated by shifting the window one week in advance for all records, both for the training and testing datasets. During training and predicting, the data are not randomized, as the order of the time sequence is important in the case of time series.
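A minimal sketch of this windowing step is shown below, assuming NumPy arrays of standardized features and aligned weekly returns for one stock; the function name and layout are illustrative.

```python
# Sketch of the supervised, 3-D reshaping of one stock's data for the LSTM input layer.
# `features` is a (T, m) array of standardized explanatory variables, `returns` the
# aligned weekly return series, and w is the look-back window (3 or 4 in the study).
import numpy as np

def make_sequences(features: np.ndarray, returns: np.ndarray, w: int):
    X, y = [], []
    for t in range(len(features) - w):
        X.append(features[t:t + w])   # the past w weekly observations
        y.append(returns[t + w])      # the following week's return (the target)
    return np.array(X), np.array(y)   # shapes: (samples, w, m) and (samples,)
```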
As shown in Figure 7, the input layer passes data to the hidden LSTM layer consisting of 128 units with the Rectified Linear Unit (ReLU) activation function. The outputs of the LSTM layer are passed to a fully connected "Dense" layer that generates the output. When optimizing the model parameters, the LSTM uses the mean square error (MSE) as the cost function and the Adam optimizer for stochastic gradient descent (SGD). The stochastic nature of SGD changes the results obtained from the optimization depending on the series of random numbers generated during the optimization process. To obtain more robust results, we adopted an "iteration" technique for each stock that allowed for re-optimization by changing the "seed" of the random numbers used by the optimizer during the SGD. For each stock, we produced 15 iterations of predictions, which were stored for use in the selection of stocks by majority voting. Finally, for each stock, the model was trained for 300 epochs with a batch size of 16, as illustrated in Table 2. Since the choice of the look-back period (w) and the input data size (m) in LSTM networks is important for predictions, we studied the impact of two values of the first variable {w = 3, w = 4} and two values of the second {m = 173, m = 128} on the model's performance. The other hyperparameters were set manually by measuring the error and choosing the best ones by applying the cross-validation detailed below.

To compare several model versions, we used cross-validation (CV): 70% of the training dataset is used to train the model and the remaining 30% is used for model validation, while respecting the chronological order (Dangeti 2017; Kohavi 1995). For each version of the model, we calculated the statistical error of the prediction via two metrics: the mean square error (MSE) and the mean absolute error (MAE). The MSE is the average of the squares of the differences between the predicted and actual values (Formula (7)), while the MAE is the average of the absolute differences between the predicted and actual values (Formula (8)), where $\hat{y}_t$, $y_t$, and T are the predicted value, the actual value, and the size of the prediction horizon, respectively. These metrics were calculated on the training sample to estimate the training error and on the validation sample to estimate the validation error (Val-Error). These measurements allowed us to choose the optimal configuration of the model hyperparameters in terms of batch size, the number of epochs, the number of units in the LSTM layer, etc.
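A minimal Keras/TensorFlow sketch consistent with the network configuration described above (128 ReLU LSTM units, a Dense output, MSE loss, Adam optimizer, 300 epochs, batch size 16) is given below; the function name and the seed handling are assumptions for illustration.

```python
# Sketch of the LSTM architecture and training setup described in the text.
import tensorflow as tf

def build_model(w: int, m: int, seed: int) -> tf.keras.Model:
    tf.random.set_seed(seed)  # one seed per iteration of the 15 repetitions
    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(128, activation="relu", input_shape=(w, m)),
        tf.keras.layers.Dense(1),  # predicted next-week return
    ])
    model.compile(optimizer="adam", loss="mse", metrics=["mae"])
    return model

# Training call matching the stated hyperparameters; the time order is preserved.
# model.fit(X_train, y_train, epochs=300, batch_size=16, shuffle=False)
```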
Given that we made 15 different predictions per stock, corresponding to the 15 iterations made by changing the "seed" during the optimization of the model parameters, we calculated the average metrics per stock over all the iterations, namely, the average MSE of stock "i" according to Formula (9) and the average MAE of stock "i" according to Formula (10), where the number of repetitions R = 15. Finally, the estimation of the model error was obtained by a simple average of the errors of all the stocks according to Formulas (11) and (12), where n is the number of stocks. The final statistical performance of the different versions of the model was measured by the MSE and MAE errors calculated on the out-of-sample testing dataset. The diagram in Figure 8 shows the process of calculating the estimation of the training error (Train-MSE and Train-MAE), the validation error (Val-MSE and Val-MAE), and the test error (Test-MSE and Test-MAE).
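In the usual notation, and consistent with the descriptions of Formulas (7)–(12) above, these quantities can be written as follows, where $\hat{y}_t$ and $y_t$ are the predicted and actual returns, $R = 15$ is the number of repetitions, and $n$ is the number of stocks (a reconstruction, since the original formulas are not reproduced here):

$$\mathrm{MSE} = \frac{1}{T}\sum_{t=1}^{T}\left(\hat{y}_t - y_t\right)^{2}, \qquad \mathrm{MAE} = \frac{1}{T}\sum_{t=1}^{T}\left|\hat{y}_t - y_t\right|$$

$$\overline{\mathrm{MSE}}_i = \frac{1}{R}\sum_{r=1}^{R}\mathrm{MSE}_{i,r}, \qquad \overline{\mathrm{MAE}}_i = \frac{1}{R}\sum_{r=1}^{R}\mathrm{MAE}_{i,r}$$

$$\mathrm{MSE}_{\mathrm{model}} = \frac{1}{n}\sum_{i=1}^{n}\overline{\mathrm{MSE}}_i, \qquad \mathrm{MAE}_{\mathrm{model}} = \frac{1}{n}\sum_{i=1}^{n}\overline{\mathrm{MAE}}_i$$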
For the evaluation of models that make predictions in portfolio management, the statistical metrics commonly used in data mining, such as MSE or MAE, are not suitable measures for evaluating performance; instead, portfolio performance measures take precedence (Hou et al. 2020).


Portfolio Construction
In this study, we construct an equally weighted portfolio according to an equity-market-neutral strategy. This type of strategy is used in alternative investment portfolios, where the long position of the portfolio is covered by the short position composed of short-sold stocks. This strategy has two main advantages: financing, as the short position finances the long position, and hedging against market risk, i.e., if the market is falling, the strategy will lose on the long positions, but the loss will be compensated by the gains made from the short positions (Jacobs and Levy 2005). Figure 9 shows the three steps of the portfolio construction process: first, the prediction of stock returns is performed in 15 iterations for each stock; the second step involves the selection of stocks; and the third step comprises the construction of the "robust" portfolio. The constructed portfolio consists of two sides, long and short, with the same number of stocks. Two portfolios are to be compared: the first one has 12 stocks, with 6 stocks in each side, and the second portfolio is composed of 14 stocks, with 7 stocks in each side. Moreover, the data used in the model have a weekly frequency, the same as the portfolio rebalancing frequency. Each week, the portfolio is reconstructed according to the results of the prediction, which leads to a new selection of stocks on both sides. In addition, the same market value (MV) is invested in each stock on the two sides of the portfolio, as illustrated in Formulas (13) and (14), where $VM^{L}_{t}$ and $VM^{S}_{t}$ are the market values of the "long" and "short" portfolios, respectively; $VM^{L}_{i,t}$ and $VM^{S}_{j,t}$ are the market values of stock "i" in the "long" portfolio and stock "j" in the "short" portfolio, respectively; and $n_l$ and $n_s$ are the number of stocks in the "long" and "short" portfolios, respectively. In effect, the market value of the long portfolio equals that of the short portfolio, resulting in a net market value of zero dollars.
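One way to write the equal-allocation conditions described by Formulas (13) and (14) in this notation is the following; this is a reconstruction consistent with the text, not the original typesetting:

$$VM^{L}_{i,t} = \frac{VM^{L}_{t}}{n_l}, \quad i = 1,\dots,n_l; \qquad VM^{S}_{j,t} = \frac{VM^{S}_{t}}{n_s}, \quad j = 1,\dots,n_s; \qquad VM^{L}_{t} = VM^{S}_{t}$$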
Figure 9. Robust portfolio construction process.
For each date of the test period and for each iteration, the model predicts the next return (of the later week) for all stocks present at that date. The predicted returns obtained are ordered in ascending order and are ranked in five quintiles (five classes of stock). Only stocks belonging to the first and last quintiles are to be considered when forming the two classes of stocks, namely class "1" and class "5". Class 1 corresponds to the first quintile, containing the stocks with the lowest predicted returns, and class 5 corresponds to the fifth quintile, containing the stocks with the highest predicted returns. The stocks in class 1, which are expected to perform poorly, are considered as candidate stocks in the "short" portfolio, while those in class 5, which are expected to perform well, form the candidate stocks in the "long" portfolio.
To build a robust portfolio, we ran the model 15 times (15 iterations) by changing the seed of the random numbers used by the LSTM neural network optimization algorithm to obtain 15 different classes of candidate stocks for the long and short portfolios. To construct the robust "short" portfolio, we applied the majority voting principle on the fifteen different "1" classes by counting the number of times the stock was classified as a candidate of the "short" portfolio. Thus, the stocks with the highest number of appearances formed the "short" portfolio (up to the number of stocks previously fixed in each of the two sides of the robust portfolio). We experimented with two types of robust portfolios: one with six stocks in each side, and a second with seven stocks in each side. The same method is applied to the different candidate stocks classified as "5" to construct the robust "long" portfolio. Note that the choice of the number of stocks in each side of the robust portfolio is justified by the average number of stocks that form the quintiles over the test-set period, which varies between six and eight stocks.
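A compact pandas sketch of this quintile-ranking and majority-voting step is given below; the data layout (a DataFrame of predicted returns indexed by stock, with one column per iteration) is an assumption for illustration.

```python
# Sketch of the weekly selection step: rank predicted returns into quintiles for each
# iteration, then keep the stocks most often voted into the extreme quintiles.
import pandas as pd

def select_robust_sides(predictions: pd.DataFrame, n_side: int):
    long_votes = pd.Series(0, index=predictions.index)
    short_votes = pd.Series(0, index=predictions.index)
    for col in predictions.columns:           # one column per iteration (15 in the study)
        quintile = pd.qcut(predictions[col], 5, labels=[1, 2, 3, 4, 5])
        long_votes[quintile == 5] += 1        # candidates for the long side (class "5")
        short_votes[quintile == 1] += 1       # candidates for the short side (class "1")
    long_side = long_votes.nlargest(n_side).index.tolist()
    short_side = short_votes.nlargest(n_side).index.tolist()
    return long_side, short_side              # n_side = 6 or 7 in the study
```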

Evaluation of Portfolio Performance
In addition to the statistical evaluation of the model's performance through the MAE and MSE error measures, we carried out an evaluation of the actual performance of the robust portfolios obtained by the different versions of the model on the test set. Several performance measures were used to evaluate the financial performance of the portfolios on the one hand, and, on the other, to compare them to benchmarks such as the S&P500 stock market index and the CS sector index. Indeed, eight portfolios were compared in terms of performance over two different periods (pre-COVID-19 and the period including the COVID-19 pandemic). Table 3 shows the eight portfolios that were constructed based on the four versions of the model by changing the following three hyperparameters:

• The type of basket of explanatory variables taken in the model, including a first basket of "basic variables" with 128 variables, and a second basket of "all variables" with 173 variables.
• The size of the look-back period used by the LSTM networks, which takes the following two values: w = 3 and w = 4.
• The number of stocks taken for each side of the robust portfolio, with the following two values: n = 6 and n = 7.

For this comparison, several performance measures were calculated, namely: return, volatility, downside volatility, Alpha, Beta, correlation, Sharpe ratio, Sortino ratio, Treynor ratio, Omega ratio, information ratio, capture ratio, and maximum drawdown. The benchmark and risk-free rate used to calculate these measures are, respectively, the S&P 500 Index and the three-month U.S. Treasury Bill (T-bill) rate.

Return
Since return is the main measure of portfolio performance, we calculated the entire series of actual weekly returns over the entire test period (222 weeks from 19 August 2016 to 18 December 2020) of the different portfolios to be compared. To do this, we first calculated the return series of the two portfolios, long and short, separately, and then deduced the return series of the net portfolio as the difference between the two return series at each date of the test period. The return of each of the long ($r^{L}_{t}$) and short ($r^{S}_{t}$) portfolios at date "t" was calculated as the arithmetic average of the returns of the stocks in the portfolio, according to Formulas (15) and (16) (Demonstration A1 in Appendix A). The net return ($r_t$) of the portfolio at date "t" is the difference between the long portfolio return and the short portfolio return, as in Formula (17), where $n_l$ and $n_s$ are the number of stocks in the long and short portfolios, respectively; $r^{L}_{i,t}$ is the return of stock "i" in the long portfolio at date "t" and $r^{S}_{j,t}$ is the return of stock "j" in the short portfolio at date "t".
From the series of net portfolio returns over the entire test period, we calculated all the performance indicators of the robust equity-market-neutral portfolio. The annualized average return is the most important indicator to consider. It does not give much detail on the behavior of the return series over time, but it nevertheless summarizes the realized performance of the portfolio over the whole test period. It is measured by the geometric mean of all weekly returns in the net portfolio series according to Formula (18), where $r_t$ is the weekly return of the net portfolio on date t and T is the number of weeks in the test period.
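Consistent with the descriptions of Formulas (15)–(18), these returns can be written as follows; the annualization exponent 52/T assumes 52 weekly observations per year and is a reconstruction, not necessarily the paper's exact formula:

$$r^{L}_{t} = \frac{1}{n_l}\sum_{i=1}^{n_l} r^{L}_{i,t}, \qquad r^{S}_{t} = \frac{1}{n_s}\sum_{j=1}^{n_s} r^{S}_{j,t}, \qquad r_{t} = r^{L}_{t} - r^{S}_{t}, \qquad r = \left[\prod_{t=1}^{T}\left(1 + r_{t}\right)\right]^{52/T} - 1$$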

NAV
The graphical representation of the evolution of the net asset value (NAV) of the net robust portfolio and the other portfolios (long and short) offers a highly relevant overview of the portfolio's performance. It allows for a visual comparison of the performance of portfolios with each other and with the benchmarks. The NAV is calculated by accumulating returns from an initial NAV at the beginning of the period (USD 100, for example). The NAV at the end of each period is reinvested for the next period according to Formula (19), over the whole test period, where $NAV_0$ is the initial amount invested, $NAV_t$ is the portfolio NAV at date "t", and $r_j$ is the weekly portfolio return in week "j".
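A reconstruction of Formula (19) consistent with this description of compounding the weekly returns is:

$$NAV_{t} = NAV_{0}\prod_{j=1}^{t}\left(1 + r_{j}\right)$$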

Volatility
Annualized volatility is the first indicator of risk to be seen in a portfolio, complementing the information provided by the first measure of performance: the annualized average return. It measures the degree to which the returns of the series deviate, on average, from the average return. It is calculated as the standard deviation of the series of weekly returns over the test period according to Formula (20), where $r_t$ is the weekly return of the portfolio in week t; $\bar{r}^{(h)}$ is the arithmetic average of the weekly return series; and T is the number of weeks in the test period.
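A reconstruction consistent with this description is given below; annualizing the weekly standard deviation by a factor of $\sqrt{52}$ is an assumption:

$$vol = \sqrt{52} \times \sqrt{\frac{1}{T}\sum_{t=1}^{T}\left(r_{t} - \bar{r}^{(h)}\right)^{2}}$$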

Sharpe Ratio
Developed by the economist Sharpe (1994), this ratio measures the risk-adjusted return of a portfolio. It is one of the best indicators for comparing the risk-adjusted performance of different portfolios and for assessing a portfolio against benchmark portfolios. The higher the ratio, the better the portfolio; a value greater than 1 means that the portfolio's risk-taking is amply rewarded by an excess return over the risk-free return. This ratio is calculated by dividing the portfolio's excess return by its volatility, according to Formula (21), where $r$ and $r_f$ are the annualized average returns of the portfolio and the risk-free rate, respectively, and $vol$ is the annualized volatility of the portfolio.
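A minimal sketch of Formulas (20) and (21), assuming weekly observations and a sqrt(52) annualization factor for volatility (the annualization convention is our assumption; the paper's formulas are referenced but not reproduced here):

```python
import numpy as np

def annualized_volatility(weekly_returns, periods_per_year=52):
    # Standard deviation of the weekly return series, scaled to a yearly figure (Formula (20)).
    r = np.asarray(weekly_returns, dtype=float)
    return r.std(ddof=1) * np.sqrt(periods_per_year)

def sharpe_ratio(annual_return, annual_rf, annual_vol):
    # Excess return over the risk-free rate per unit of volatility (Formula (21)).
    return (annual_return - annual_rf) / annual_vol
```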

Downside Volatility
Unlike volatility, which considers both increases and decreases in returns relative to the mean, downside volatility is an additional risk measure that only takes into account bearish returns below the average. It is calculated according to Formula (22), where $r_t$ is the weekly portfolio return in week $t$, $\bar{r}$ is the arithmetic average of the weekly return series, and $T$ is the number of weeks in the test period.

Sortino Ratio
Developed by Frank A. Sortino, the Sortino ratio is an improved version of the Sharpe ratio (Bodson et al. 2010). Both ratios provide a risk-adjusted performance measure; however, the Sortino ratio substitutes the volatility used in the Sharpe ratio with the downside volatility, which only considers bearish returns. The rationale, according to the author of the ratio, is that bullish returns are generally beneficial and should not be included in a risk measure. It is calculated according to Formula (23), where $r$ and $r_f$ are the annualized average returns of the portfolio and the risk-free rate, respectively.
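The following sketch illustrates Formulas (22) and (23); only returns below the series average enter the dispersion measure, and the use of T − 1 in the denominator is our choice rather than a detail taken from the paper:

```python
import numpy as np

def downside_volatility(weekly_returns, periods_per_year=52):
    # Dispersion computed only from returns below the series average (Formula (22)).
    r = np.asarray(weekly_returns, dtype=float)
    below_mean = np.minimum(r - r.mean(), 0.0)
    return np.sqrt((below_mean ** 2).sum() / (len(r) - 1)) * np.sqrt(periods_per_year)

def sortino_ratio(annual_return, annual_rf, annual_downside_vol):
    # Sharpe-like ratio with downside volatility in the denominator (Formula (23)).
    return (annual_return - annual_rf) / annual_downside_vol
```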

Beta
Beta measures the systematic risk of the portfolio, that is, the sensitivity of the portfolio to the market. It is always compared to the reference value 1: the portfolio fluctuates more than the market if its Beta is higher than 1, and fluctuates less if its Beta is lower than 1. Furthermore, a Beta-neutral portfolio is a portfolio with a Beta close to or equal to 0. Beta is calculated via Formula (24) as the ratio between the covariance of the portfolio's return and the market's return on the one hand, and the variance of the market's return on the other hand, where $r_t$ and $r_{m,t}$ are the portfolio and market return series, respectively, and $cov(\cdot)$ and $var(\cdot)$ denote covariance and variance, respectively.

Alpha
Jensen's Alpha, or simply the Alpha, is a performance metric that is widely used in active portfolio management strategies, where managers use their skills to outperform the market. Proposed by Jensen (1968), it is derived from the Capital Asset Pricing Model (CAPM) and measures how well the portfolio outperforms the market given its systematic risk. The Alpha is calculated as the difference between the excess return achieved by the portfolio and the expected excess return measured by the Beta-weighted market excess return. It is calculated by Formula (25), where $\bar{r}$, $\bar{r}_f$, and $\bar{r}_m$ are the average of the portfolio weekly returns, the average of the weekly risk-free rates, and the average of the weekly market returns, respectively; $\beta$ is the Beta of the portfolio and $T$ is the number of weeks in the test period.
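Both measures can be estimated directly from the weekly return series, as in this sketch of Formulas (24) and (25); the final scaling of the weekly alpha to an annual figure is our simplification:

```python
import numpy as np

def beta(portfolio_returns, market_returns):
    # Covariance of portfolio and market returns over the market variance (Formula (24)).
    r, m = np.asarray(portfolio_returns), np.asarray(market_returns)
    return np.cov(r, m, ddof=1)[0, 1] / np.var(m, ddof=1)

def jensen_alpha(portfolio_returns, market_returns, rf_returns, periods_per_year=52):
    # Portfolio excess return not explained by the Beta-weighted market excess return (Formula (25)).
    r, m, f = map(np.asarray, (portfolio_returns, market_returns, rf_returns))
    weekly_alpha = (r - f).mean() - beta(r, m) * (m - f).mean()
    return weekly_alpha * periods_per_year  # simple scaling to an annual figure
```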

Correlation
Correlation measures the intensity of the linear relationship between two variables. If two variables move relatively and linearly in the same way, the correlation between them will be close to 1, and if they move relatively in the same way but in opposite directions, their correlation will be close to −1. A value of zero or close to zero indicates that the two variables have no linear relationship or a very weak one. The correlation is calculated by Formula (26) as the ratio of the covariance of the portfolio return and the market return to the product of the standard deviations of the two return series, where $r_t$ and $r_{m,t}$ are the portfolio and market return series, respectively, and $\sigma(\cdot)$ refers to the standard deviation.

Treynor Ratio
Created by the economist Treynor (1962), the Treynor ratio is similar to the Sharpe ratio, with both measuring the risk-adjusted return of a portfolio. However, the Sharpe ratio uses the total risk of the portfolio represented by its volatility, whereas the Treynor ratio uses the systematic risk of the portfolio represented by its Beta. The higher the ratio, the better the portfolio performs. It is calculated by dividing the excess return of the portfolio by its Beta according to Formula (27), where $r$ and $r_f$ are the annualized average returns of the portfolio and the risk-free rate, respectively.

Information Ratio
The information ratio measures how much the portfolio outperforms the benchmark relative to the variability of that outperformance. As shown in Formula (28), it is calculated as the ratio of the difference between the annualized average returns of the portfolio and the benchmark to the tracking error, where the tracking error is the square root of the variance of the series of differences between the portfolio returns and the benchmark returns. In the formula, $r$ and $r_b$ are the portfolio and benchmark annualized average returns, respectively, and $TE$ is the tracking error.
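A sketch of Formula (28), assuming weekly series and simple scaling of both the active return and the tracking error to annual figures (the scaling convention is our assumption):

```python
import numpy as np

def information_ratio(portfolio_returns, benchmark_returns, periods_per_year=52):
    # Average active return divided by the tracking error (Formula (28)).
    r, b = np.asarray(portfolio_returns), np.asarray(benchmark_returns)
    active = r - b
    tracking_error = active.std(ddof=1) * np.sqrt(periods_per_year)
    return active.mean() * periods_per_year / tracking_error
```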

Capture Ratios
There are two capture ratios: the upside capture ratio and the downside capture ratio. The upside capture ratio measures the extent to which the portfolio outperforms (or underperforms) the benchmark during periods when the benchmark has positive returns (bull market), whereas the downside capture ratio measures the extent to which the portfolio outperforms (or underperforms) the benchmark during periods when the benchmark has negative returns (bear market).
A value greater than one for the upside capture ratio means that the portfolio has performed better than the benchmark through periods when the benchmark is rising, whereas a value below one indicates that the portfolio has underperformed while the benchmark has risen. The analysis is reversed for the downside capture ratio, where a positive value less than one indicates that the portfolio has lost less than the index during periods when the index is falling. The difference between the value of this ratio and 1 measures the degree of resilience of the portfolio during periods when the market is falling. A negative downside capture value means that the portfolio has made a positive return when the benchmark has negative returns during its downside periods.
According to Formula (29), the upside capture ratio is calculated as the ratio of the annualized average return of the portfolio during periods when the benchmark has positive returns to the annualized average return of the benchmark during the same periods. Similarly, the downside capture ratio is calculated over periods when the benchmark has negative returns according to Formula (30), where $r_t$ and $r_{b,t}$ are, respectively, the return of the portfolio and the return of the benchmark at date $t$, and $T_p$ and $T_n$ are, respectively, the number of weeks when the benchmark returns are positive and the number of weeks when they are negative.
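As an approximation of Formulas (29) and (30), the following sketch takes the simple ratio of average returns over up-weeks and down-weeks of the benchmark, rather than annualizing each average separately as the formulas describe:

```python
import numpy as np

def capture_ratios(portfolio_returns, benchmark_returns):
    # Ratio of average portfolio return to average benchmark return, computed
    # separately over weeks with positive and negative benchmark returns.
    r, b = np.asarray(portfolio_returns), np.asarray(benchmark_returns)
    up, down = b > 0, b < 0
    upside_capture = r[up].mean() / b[up].mean()
    downside_capture = r[down].mean() / b[down].mean()
    return upside_capture, downside_capture
```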

Omega Ratio
This ratio was developed by Keating and Shadwick (2002) and measures the risk-adjusted performance of the portfolio compared to a threshold or a benchmark. It identifies the chances of gain compared to loss. Omega captures all the moments of the portfolio return distribution and makes no assumptions about the distribution of returns. According to Formula (31), it is calculated as the ratio of total gains to total losses relative to an objective return (expected return) or a so-called "minimum accepted return" (MAR), which can be a risk-free rate or a benchmark return. In the formula, $r_t$ is the portfolio return at date $t$; MAR is the minimum accepted return, which may be a fixed threshold, a risk-free rate, or a benchmark return; and $T$ is the number of weeks in the test period.
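A minimal sketch of Formula (31), with gains and losses measured relative to the MAR over the weekly return series:

```python
import numpy as np

def omega_ratio(returns, mar=0.0):
    # Total gains above the minimum accepted return divided by total losses below it (Formula (31)).
    r = np.asarray(returns, dtype=float)
    gains = np.clip(r - mar, 0.0, None).sum()
    losses = np.clip(mar - r, 0.0, None).sum()
    return gains / losses
```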

Maximum Drawdown
The maximum drawdown measures the largest loss in a portfolio, from the highest peak to the subsequent deepest trough. It therefore measures the maximum loss over the history of a portfolio; a value of 100% for this metric means that the portfolio has lost all of its value. It is a risk measure used to compare performance between portfolios and is also used as the risk measure in the Calmar ratio. It is calculated using Formula (32) as the accumulated return during the entire period of the steepest decline, where H is the highest portfolio value reached before the largest portfolio fall and L is the lowest value observed before a new high H.
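A sketch of Formula (32) applied to a NAV series: the running peak plays the role of H, and the deepest shortfall below it gives the maximum drawdown:

```python
import numpy as np

def max_drawdown(nav):
    # Largest peak-to-trough decline of the NAV series (Formula (32)).
    nav = np.asarray(nav, dtype=float)
    running_peak = np.maximum.accumulate(nav)
    drawdowns = nav / running_peak - 1.0
    return drawdowns.min()  # e.g., -0.21 for a 21% maximum loss
```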

Calmar Ratio
Created by the fund manager Terry Young in 1991, the Calmar ratio is a measure of risk-adjusted return similar to the Sharpe ratio, but it uses the maximum drawdown as its measure of risk instead of volatility. It is calculated according to Formula (33) as the excess return divided by the maximum drawdown, where $r$ and $r_f$ are the annualized average returns of the portfolio and the risk-free rate, respectively.

Statistical Performance of the Model
During this study, several hyperparameters of the model were tested experimentally to select those that best fit the data. For this purpose, we used the cross-validation technique, calculating the two metrics MSE and MAE according to Formulas (11) and (12) detailed above in the section titled "Statistical evaluation of the model". According to Figure 1, the performances of the four versions of the model were compared by changing the type of the basket of variables input to the model and the size of the look-back period used by the LSTM networks. With the same number of epochs (epoch = 300), the model versions have the same statistical error values for the training sample. Furthermore, the error generated by the cross validation (which is an estimate of the model error extracted from the training data) allowed us to calibrate the model hyperparameters. This error is minimal for model M3, which uses "all variables" and a look-back period of size w = 3. However, model M1, which uses the "basic variables" and a look-back period of size w = 3, provides the best statistical performance (minimum error) measured on the unseen data of the test set.
Although they provide insights into the accuracy of the prediction, the errors used do not allow for an effective comparison of the different versions of the model because they only provide an estimate of the average of the errors over all the stocks and all the iterations used in the model. Moreover, they do not consider the selection process of the stocks in the portfolio, which is limited to only some of the stocks. For this reason, a financial performance evaluation of the portfolio is necessary.

Financial Performance of the Model
In addition to the statistical performance, we evaluated the financial performance of the eight robust portfolios from the different versions of the model, as shown in Table 3. For this purpose, we calculated the 15 performance and risk measures (detailed above in the "Evaluation of portfolio performance" section) for the different portfolios to be compared (P1, P2, . . . , P8) on the one hand, and for the benchmarks on the other hand. Recall that we used the "US three-month Treasury Bill" rate for the risk-free rate, and two benchmarks, the "S&P500 Consumer Staples" sector index from which we selected the stocks, and the "S&P 500" index, representing the market, which was also used in the calculation of the performance metrics.
To study the behavior of the different versions of the model in normal times and in highly volatile periods of crisis, we measured their performance over two periods: the pre-COVID-19 period, and then the entire test period including the COVID-19 crisis. The frontier date separating the pre-COVID-19 period from the COVID-19 period was set as the date when the pandemic effectively began to influence the behavior of financial markets. In this study, we took the start date of the largest drop in the S&P500 index, 19 February 2020, as the frontier date.
According to the results of the financial performance of the portfolios, represented by the NAV evolution graph (base USD 100) in Figures 10–13 and summarized in Tables 5a,b and 6a,b, we reached the following conclusions:
1. All portfolios outperformed their sector index in terms of risk-adjusted returns (Sharpe ratio, Sortino ratio, Treynor ratio, Omega ratio, Calmar ratio, and information ratio).
2. Portfolios P5, P6, P7, and P8 from models M3 and M4, which use all of the explanatory variables, outperformed the sector index and the S&P500 index representing the market; moreover, they largely outperformed portfolios P1, P2, P3, and P4 from models M1 and M2, which use the "basic variables".
3. Portfolios from models whose LSTM networks use a look-back period of size w = 4 to predict the stocks' future returns outperformed those using w = 3 over the whole period including COVID-19. However, portfolios from models using a look-back period of size w = 3 outperformed the others during the pre-COVID-19 period.
4. The P7 portfolio, which consists of six stocks on each of its long and short sides and is generated by model M4 (w = 4, all variables), provided the best performance both in the pre-COVID-19 period and in the period including COVID-19. It achieved an annualized average return of 25% over the entire test period compared to 27% over the pre-COVID-19 period, a decrease of 2%; meanwhile, the annualized average returns of the S&P 500 index and the sector index decreased from 15% to 10% and from 5% to 3%, respectively. However, the annualized volatility of the P7 portfolio increased from 19% pre-COVID-19 to 22% over the entire period (including COVID-19): an increase of 3%. Meanwhile, the annualized volatilities of the S&P500 index and the CS sector index increased from 12% to 18% and from 12% to 16%, increases of 6% and 4%, respectively.
5. Over the entire period, the P7 portfolio achieved 1.04, 1.92, and 0.93 for the Sharpe, Sortino, and Treynor ratios, respectively, indicating that it achieved an acceptable risk-adjusted excess return. The Sortino ratio is higher than the Sharpe ratio because it only takes into account the downside volatility, which is lower than the volatility. In addition, its Sharpe, Sortino, and Treynor ratios are significantly higher than those of the benchmarks: the S&P500 market index had values of 0.48, 0.62, and 0.09, and the CS sector index had values of 0.12, 0.16, and 0.03 for the three ratios.
6. The P7 portfolio has an Alpha of 23%, i.e., most of its returns are not made through systematic market risk taking, but are rather due to its own strategy. Its Beta and correlation relative to the market are 0.25 and 0.20 over the whole test period, and 0.17 and 0.11 over the pre-COVID-19 period, respectively. This means that the portfolio has a very low correlation to the market, which is the goal of the EMN strategy.
7. The P7 portfolio has an information ratio of 0.56, which means that it outperformed the benchmark, given its risk. Furthermore, its positive excess returns outweighed its negative excess returns over the entire period, which is reflected in its Omega ratio of 1.51. This is higher than the benchmark S&P500 index and the CS sector index, which had values of 1.26 and 1.09, respectively.
8. The Calmar ratio of the P7 portfolio reaches 1.11, compared to 0.27 for the S&P 500 index and 0.09 for the CS sector index. This ratio measures the risk-adjusted return using the maximum drawdown in the denominator, which reached a value of −21% for this portfolio on 11 January 2019. This maximum loss is smaller than the maximum drawdowns of the benchmarks, which both occurred on 20 March 2020, with values of −32% for the S&P 500 index and −22% for the CS sector index.
9. As for the upside capture and downside capture ratios, the P7 portfolio scored 0.42 and 0.09, respectively, indicating that it underperformed while the benchmark S&P 500 index was rising; however, the portfolio was very resilient during periods when the benchmark S&P500 index declined. In addition, the CS sector index had an upside capture value of 0.53, meaning that it also underperformed while the benchmark S&P500 index performed well, but its downside capture of 0.79 shows little resilience to market downturns compared to the P7 portfolio.
10. The risk-adjusted performance of all portfolios in the pre-COVID-19 period was better than that in the period including the COVID-19 pandemic. When the COVID-19 pandemic period was introduced to the test data, the returns experienced a decline ranging from 6% to 16% for the M1 and M2 model portfolios (using the basic variables), and from 2% to 12% for the M3 and M4 model portfolios (using all variables). The pandemic caused the volatility of all portfolios to rise, with increases ranging from 3% to 6%. Similarly, the returns of the S&P500 benchmark and the CS sector index decreased by 5% and 2% and their volatilities increased by 6% and 4%, respectively. This means that the EMN strategy portfolios were more strongly impacted by the COVID-19 pandemic than the benchmarks were. Moreover, EMN was the strategy with the lowest performance according to a study conducted by Ganchev (2022) on the performance of hedge fund strategies before and after the COVID-19 crisis.
11. Transitioning from models M1 and M2 to models M3 and M4 by introducing the three baskets of variables ("Piotroski", "Scores", and "stock to sector fundamental indicators") greatly improved the performance of the EMN strategy portfolios. Indeed, for all of the portfolios, we saw an increase in the annualized average return of between 6% and 15%, with almost the same volatility.
12. Portfolios from models M1 and M3, using a look-back period of size w = 3, performed well during the pre-COVID-19 period, while those from models M2 and M4, using w = 4, outperformed over the entire period, including the COVID-19 crisis period.
In summary, the portfolios obtained, on average, a pre-COVID-19 annualized average return of 30% compared to 5% and 15% for the CS sector index and the S&P500 index over the same period, respectively. Meanwhile, over the whole period, including the pandemic period, the portfolios of our model achieved an annualized average return of 22% compared to 3% and 10% for the same benchmarks, respectively.
As for the annualized volatility of the portfolios, it remains higher than that of the two benchmarks, reaching 18% on average versus 12% for the two benchmarks during the pre-COVID-19 period, and 22% versus 16% and 18% for the CS sector index and the S&P500 index, respectively, during the period including COVID-19. However, despite the higher volatility of the portfolios compared to the benchmarks, their risk-adjusted return remains well above that of the benchmarks, at 1.6 on average versus 0.3 and 1.05 in the pre-COVID-19 period, and 0.94 versus 0.48 and 0.12 in the COVID-19-inclusive period. Moreover, the Sortino ratio, which considers downside volatility, is significantly higher than that of the benchmark indexes, with an average value of 3.16 for all portfolios compared to 0.41 and 1.48 for the two indexes in the pre-COVID-19 period. The same ratio reached an average of 1.60 for all portfolios versus 0.16 and 0.62 for the benchmarks over the COVID-19-inclusive period.
Based on the results of this empirical study, we can conclude that portfolios constructed according to the EMN strategy and utilizing LSTM neural networks for return prediction outperformed the benchmarks (sector index and market index). This is due to the advantage of LSTM neural networks in predicting stock returns by effectively identifying sequential patterns in the data. LSTM is one of the most advanced techniques for capturing complex dependencies and relationships in financial time series data. This confirms the initially proposed hypothesis (1).

Furthermore, the results also show that the use of feature engineering and integration of new variable categories such as "Scores", "Piotroski", and "stock to sector fundamental indicators" enhance the portfolio's performance. This is due to the fact that feature engineering enables the extraction of analytical representations from data, making them more relevant to the studied problem and easier to capture by the model. The categories of indicators, "Scores" and "Piotroski", assess the financial quality of stocks in the medium and long term, whereas the "stock to sector fundamental indicators" category captures interactions by comparing the financial state of stocks to their sector based on financial statements information. This finding supports the previously stated hypothesis (2).

Conclusions
This study fills the existing gap in the literature regarding the construction of a profitable portfolio built according to an equity-market-neutral investment strategy using LSTM neural networks, which are widely used in portfolio management due to their strength in time series prediction. To achieve this purpose, this study proposed a new two-step portfolio construction approach according to the alternative equity-market-neutral investment strategy. The first step of our approach involved predicting stock returns using LSTM neural networks in 15 different iterations based on historical price data, technical indicators, fundamental indicators, and sector indicators. The second step consisted of selecting the stocks for the long and short sides of the portfolio by ranking the stocks according to their predicted returns. The long portfolio was made up of the stocks that we expected to perform the best, while the short portfolio was made up of the stocks that we expected to perform the worst. Thus, we constructed several portfolios by changing some of the hyperparameters of the model.
In the next stage of the research, we compared the performance of the constructed portfolios against each other and against two benchmarks, over periods excluding and including the COVID-19 pandemic, using 15 performance and risk metrics that are commonly used in portfolio management. Our model was tested on the S&P500 Consumer Staples sector stocks with weekly portfolio rebalancing. The portfolios' performance levels varied as the set of input variables changed; nonetheless, all of them outperformed the benchmarks.
The results show that integrating LSTM neural networks to predict returns and construct a portfolio based on the market-neutral strategy outperformed benchmarks. Moreover, incorporating all types of variables such as historical quotes, technical and fundamental indicators, stock-to-sector indicators, and indicators that assess the quality of stocks into the input data greatly improved the model's performance. These results should give investors and managers more confidence in using alternative strategies that use LSTM neural networks in the process of developing investment strategies, stock selection and portfolio construction.
These findings support this research's hypotheses: (1) constructing a portfolio based on the EMN investment strategy, which utilizes LSTM neural networks to forecast returns, outperforms both benchmarks, the sector index and the market index; and (2) enriching the input data with features obtained through feature engineering techniques enhances the portfolio's performance.
Future work will focus on improving the predictive abilities of the model during crisis periods, such as the COVID-19 pandemic, in order to reduce the volatility of the portfolio returns. In fact, during the training period of the present model, from 2010 to 2016, there was no sharp market drop like that experienced during the COVID-19 crisis, whereas the test data used to measure the performance of the model, from 2016 to 2020, included the COVID-19 crisis period.
One avenue to be explored in further research is the use of a rolling training period, i.e., using past data to predict the following week's return; then, once the actual return is observed, it is incorporated into the model's training data to predict the next week's return, and so on. With this method, the LSTM networks adjust as they go along by using more and more recent data.
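Schematically, this rolling (walk-forward) procedure could be organized as in the following sketch, where fit_lstm and predict_next_week are hypothetical placeholders for the training and prediction steps of the model used in this study:

```python
def walk_forward(weekly_data, initial_train_weeks, fit_lstm, predict_next_week):
    """Rolling re-training: predict week t+1 from all data observed up to week t,
    then fold the realized week t+1 back into the training window."""
    predictions = []
    for t in range(initial_train_weeks, len(weekly_data) - 1):
        model = fit_lstm(weekly_data[:t + 1])          # train on everything seen so far
        predictions.append(predict_next_week(model))   # forecast the following week
    return predictions
```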
Finally, we intend to extend the scope of this approach to other sectors of activity, as well as to other alternative investment strategies.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author. The data are not publicly available due to limitations in the use of the Bloomberg license.

Appendix A
Table A1. List of stocks with their Bloomberg symbols and their start dates, end dates, and number of weeks in the data after imputation.
Scoring rule based on net debt and EBITDA:
• If _netDebt > 0 and _ebitda1Y ≤ 0: score = 0
• If _netDbt2Ebd < 2 and _netDebt ≤ 0 and _ebitda1Y ≤ 0: score = 2
• If _netDbt2Ebd < 2 and _netDebt < 0 and _ebitda1Y > 0: score = 2
• If _netDbt2Ebd < 2 and _netDebt > 0 and _ebitda1Y > 0: score = 2
• If _netDbt2Ebd ≥ 2 and _netDbt2Ebd < 3.5: score = 1
• If _netDbt2Ebd ≥ 3.5: score = 0
scCurrentRatio: Current ratio score
yoySlGr_sec