Article

Integration of LSTM Networks in Random Forest Algorithms for Stock Market Trading Predictions

Centro de Investigación Operativa, Universidad Miguel Hernández, 03202 Elche, Spain
*
Author to whom correspondence should be addressed.
Forecasting 2025, 7(3), 49; https://doi.org/10.3390/forecast7030049
Submission received: 11 July 2025 / Revised: 4 September 2025 / Accepted: 10 September 2025 / Published: 12 September 2025
(This article belongs to the Section Forecasting in Economics and Management)

Abstract

The aim of this paper is to analyze and select stock trading systems that combine different models with data of different natures, such as financial and microeconomic information. Specifically, building on previous work by the authors and applying advanced machine learning and deep learning techniques, our objective is to formulate trading algorithms for the stock market with empirically tested statistical advantages, thus improving results published in the literature. Our approach integrates long short-term memory (LSTM) networks with algorithms based on decision trees, such as random forest and gradient boosting. While the former analyze price patterns of financial assets, the latter are fed with economic data of companies. Numerical simulations of algorithmic trading with data from international companies and 10-weekday predictions confirm that an approach based on both fundamental and technical variables can outperform the usual approaches, which do not combine those two types of variables. Among the decision-tree algorithms, random forest turned out to be the best performer. We also discuss how the prediction performance of such a hybrid approach can be boosted by selecting the technical variables.

1. Introduction

1.1. Objectives

This paper is a follow-up to previous work by the authors on algorithmic trading in the cryptocurrency market using blockchain metrics and indicators [1]. Here, we extend that work to the stock market. In doing so, we draw on our experience with the cryptocurrency market, although the approach is necessarily different. More precisely, the main objective of this article is to implement an algorithmic intraweek trading system in the stock market by applying advanced artificial intelligence to fundamental and technical variables. An intraweek trading system consists of opening and closing trades over a period of several days or weeks [2].
In this context, the ability to accurately predict asset behavior is crucial for investors. To meet this challenge, financial analysts and data scientists have developed a variety of predictive models employing different methods and datasets. In this article, we are going to analyze the following three types of models.
(a)
Fundamental models rely on key company financial data, e.g., earnings per share (EPS), operating cash flow, and market capitalization (fundamental variables). Using advanced decision tree algorithms such as random forest and gradient boosting, these models analyze a wide range of fundamental variables to predict the future behavior of assets. With a sufficiently large number of assets, these models offer a comprehensive view of the market and can identify significant patterns and trends influencing stock prices.
(b)
Technical models focus on stock price patterns and trading volumes (technical variables). Using deep learning techniques such as long short-term memory (LSTM) networks, these models incorporate a wide range of technical indicators such as the relative strength index (RSI), moving average convergence divergence (MACD), and Bollinger bands. By processing these technical signals, LSTM networks can capture complex patterns in stock prices and provide accurate predictions about their evolution.
(c)
Models that combine fundamental and technical variables will be referred to as hybrid models. In our approach, individual LSTM nets are built for each asset, using both price data and technical indicators. The output of these nets is then fed as an additional variable into an algorithm such as random forest. This approach allows for the capture of both long-term trends based on fundamentals and short-term patterns based on technical analysis, thus providing a more comprehensive view of the market and increasing prediction accuracy.
Whether based on fundamental variables, technical variables, or a combination of both, the above models offer investors powerful tools to make informed decisions and maximize their returns in a highly competitive and volatile environment. In the present work, we study the integration of random forest and LSTM neural networks to obtain models with a greater predictive capacity.
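As a minimal sketch of this integration (all names and data below are illustrative, not the authors' actual pipeline), the output of a per-asset technical model can simply be appended as an extra column to the matrix of fundamental variables before fitting a random forest:

```python
# Sketch of the hybrid idea: the probability emitted by a per-asset
# technical (LSTM) model is appended to the 32 fundamental features
# before fitting a random forest classifier. The LSTM output is
# simulated here by a random column; everything is illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n_samples, n_fundamental = 200, 32

X_fund = rng.normal(size=(n_samples, n_fundamental))  # 32 fundamental variables
lstm_prob = rng.uniform(size=(n_samples, 1))          # stand-in for the LSTM output
y = (rng.uniform(size=n_samples) > 0.5).astype(int)   # binary target (up/down)

X_hybrid = np.hstack([X_fund, lstm_prob])             # 33 columns: fundamentals + LSTM signal

model = RandomForestClassifier(n_estimators=300, random_state=0)
model.fit(X_hybrid, y)
proba_up = model.predict_proba(X_hybrid)[:, 1]        # probability that the target is 1
```

The point of the sketch is the data flow: the sequential model's prediction enters the tree-based model as just one more static feature, so no fundamental variable has to be discretized or otherwise transformed.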

1.2. Related Work

Predictive models in the stock market have evolved considerably, thanks to advances in data science and access to large datasets. In 2013, when the use of deep learning algorithms and massive data processing and storage had not yet exploded, algorithmic trading accounted for 52% of market order volume and 64% of nonmarketable limit order volume [3]; i.e., algorithmic trading systems were monitoring market liquidity more actively than human traders. The first algorithmic trading systems implemented classic trading strategies, including MACD [4,5], Bollinger bands [6], and RSI. More recently, approaches to algorithmic trading have been proposed in which predictions are based on time series and indicators such as stochastic oscillators, moving averages, momentum, or Williams %R. Regarding fundamental variables, models based on EPS, company size, and other metrics have been investigated in [7,8]. In these models, the objective is to replicate manual trading strategies computationally. Nowadays, the increase in computational power and the new algorithms that accompany it allow thousands of variables to be analyzed in real time.
In [9], a hybrid model was proposed in conjunction with a multi-layer perceptron. The data were stock prices and financial ratios of technological companies listed on Nasdaq. In this case, the technical variables were discretized to combine them with the fundamental variables.
Combining predictive models is a field of extensive research in financial investment, as it improves the possibilities and profitability of operations. Thus, in [10], the dataset was analyzed using different cascaded models, achieving better predictions. The proposed predictive model is a BiLSTM-GCN, a combination of bidirectional LSTM networks and graph convolutional networks (GCNs). In this case, no technical variables were used, only the price time series.
Similarly, in the work [11], only a set of technical variables obtained exclusively from the prices was used. The number of variables was reduced with principal component analysis (PCA), which facilitates data manipulation and reduces the computational resources required, thus improving the performance of the system. The system also improved the prediction accuracy metrics.
Unlike the previous two references, our predictive model uses both technical and fundamental data. This is why we need a different approach, which we call a hybrid model.
A different type of hybrid model is tested in [12]. In this case, algorithms are combined instead of fundamental and technical data. To be more precise, the authors combine several algorithms based on time series; if the first stage (consisting of intrinsic mode functions (IMFs) with different scales and characteristics) surpasses a certain state, then an autoregressive moving average (ARMA) is applied.
Deep learning-based stock market prediction using price time series and LSTM neural networks is studied in [13]. The algorithmic trading system proposed by the authors achieves a statistical advantage; the accuracy metric reported is 54.5%.
Finally, some models integrate predictive models with decision-making rules, so they are two separate models, rather than hybrid models. For example, [14] proposed a model for the fundamental discrete variables and another for the search of patterns in time series.
A critical analysis of the existing literature reveals a persistent challenge: the effective integration of fundamentally distinct data modalities. While several works attempt hybrid approaches, they often rely on simplifications that compromise data integrity. Some models combine technical and fundamental variables but resort to discretization [9] or dimensionality reduction [11], inevitably losing granular information. Others focus on combining algorithms that process homogeneous data types, such as sequential time series [10,12,13], neglecting the integration of dynamic sequences with static multidimensional data. Crucially, no prior work addresses the native fusion of algorithms specifically designed for these intrinsically different data structures (sequential vs. static) without significant transformation or information loss. This represents a significant gap, as technical indicators (temporal dynamics) and fundamental factors (static context) capture complementary, yet structurally incompatible, aspects of market behavior.
To bridge this gap, our work introduces a novel hybrid architecture that achieves a breakthrough integration. Unlike previous approaches, our model preserves the native structure of each data type by seamlessly combining two specialized algorithms: one explicitly designed for processing sequential time-series data (technical indicators) and another optimized for handling static multidimensional variables (fundamental factors). This design allows, for the first time, a true synergistic analysis in which temporal patterns and contextual fundamentals interact without reductive preprocessing, enabling a more comprehensive and accurate representation of market mechanisms. Our approach constitutes a significant advancement by directly addressing the core challenge of fusing heterogeneous data modalities at the algorithmic level.

1.3. Contents

This paper is organized as follows. In Section 2, we describe the variables used for the configuration of the models. They are of three types: fundamental variables (related to microeconomic data of companies), technical variables (related to indicators and oscillators), and the target variable (related to the predictions we want to make).
In Section 3, we explain the methodology and quality metrics used in this work to measure the performance of the fundamental, technical, and hybrid models briefly introduced in Section 1.1 and detailed in Section 4, after a brief explanation of the preprocessing of the fundamental and technical variables in Section 4.1.
Explicitly, we have chosen (i) decision trees (random forest and gradient boosting) and artificial neural networks for the implementation of fundamental models (Section 4.2) and (ii) LSTM networks for the implementation of technical models (Section 4.3). As for the hybrid models, in Section 4.4, we present the main novelty of this paper: a hybrid model whose inputs include outputs of technical models through the integration of predictions obtained from LSTM networks into the fundamental models. Last but not least, in Section 4.5, we discuss how to improve the performance of our hybrid model by selecting technical models with good prediction accuracy.
To complete the picture, in Section 5, we simulate an investment strategy based on value (diversifying the entire capital across the 30 assets with the highest area under the curve) over three weeks of the validation set. The results are compared with major indices (S&P 500, NASDAQ, and Eurostoxx 50).
This paper ends with the main conclusions and a brief outlook in Section 6.

2. Variables

This section describes the fundamental and technical variables used in this article to implement predictive models. The prediction that these models are going to make is whether the price of a given asset will rise or fall in 10 business days (two weeks). To this end, in this section, we also introduce a binary variable called the target; its value at the current time provides a projection of the price direction with the desired prediction horizon.

2.1. Fundamental Variables

In this paper, we use 32 fundamental variables. They represent a curated subset of hundreds of available metrics, rigorously selected based on their complementary information value, empirical predictive power, and coverage of essential financial dimensions (profitability, valuation, leverage, and market dynamics). This selection captures more than 90% of the explanatory variance in equity returns according to fundamental factor studies [15], maximizing informational content while maintaining algorithmic tractability.
The data consist of twice-monthly microeconomic data from 482 companies, including tech giants like Apple, Microsoft, and Nvidia; see [16] for a list of all of these companies. The data were collected by a multinational financial data company that prefers to remain anonymous and were generously made available to us for the present work; similar data can be found on many platforms, such as Alpha Vantage, Bloomberg, or IEX Cloud, although they are all paid [17]. Microeconomic data were collected on the 1st and 15th of each month in 2023, which amounts to a total of 24 datasets per company with its fundamental economic data. Price data, on the other hand, were collected from 1971 onwards, providing a long-term historical series for the calculation of indicators and technical models.
The 32 fundamental variables used in this study are designated below by a short name, followed by a succinct description. They are organized by category.
  • Profitability Ratios
    (a)
    Div Yld Y0 Year Ended, Div Yld Y1 Current Year, Div Yld Y2, Div Yld Y3: Dividend Yield for the year ended (Y0) and the following three years (Y1, Y2, Y3). It represents the ratio of dividends paid per share to the share price.
    (b)
    Margin Ebitda % Y0 Year Ended, Margin Ebitda % Y1 Current Year, Margin Ebitda % Y2, Margin Ebitda % Y3: EBITDA Margin for the year ended (Y0) and the following three years (Y1, Y2, Y3). It represents the ratio of EBITDA to total revenue, indicating profitability before interest, taxes, depreciation, and amortization [18],
    \text{EBITDA Margin} = \frac{\text{EBITDA}}{\text{Total Revenue}} \times 100\%.
    (c)
    ROE Y1 Current Year: Return on Equity for the current year, representing the ratio of net income to shareholders’ equity,
    \text{ROE} = \frac{\text{Net Income}}{\text{Shareholders' Equity}} \times 100\%.
    (d)
    Div Payout Y0 Year Ended, Div Payout Y1 Current Year, Div Payout Y2, Div Payout Y3: Dividend payout ratio for the year ended (Y0) and the following three years (Y1, Y2, Y3). It measures the percentage of earnings paid out to shareholders as dividends.
    (e)
    EPS +1E 3Meses, EPS +1E Actual, var % EPS +1E 3Meses: next-quarter estimated EPS, actual EPS, and percentage change in estimated EPS,
    \text{EPS} = \frac{\text{Net Income} - \text{Preferred Dividends}}{\text{Number of Common Shares Outstanding}}.
    (f)
    EBIT +1E 3Meses, EBIT +1E Actual, var % EBIT +1E 3Meses: next-quarter estimated EBIT, actual EBIT, and percentage change in estimated EBIT.
    (g)
    Sales +1E 3Months, Sales +1E Current, var % Sales +1E 3Months: next-quarter estimated sales, actual sales, and percentage change in estimated sales.
  • Valuation Ratios
    (a)
    PER Y0 Year Ended, PER Y1 Current Year, PER Y2, PER Y3: The price-to-earnings ratio (PER) for the year ended (Y0) and the following three years (Y1, Y2, Y3). It measures the ratio of a company’s share price to its earnings per share [19],
    \text{PER} = \frac{\text{Price per Share}}{\text{Earnings per Share}}.
    (b)
    EV EBITDA Y0 Year Ended, EV EBITDA Y1 Current Year, EV EBITDA Y2, EV EBITDA Y3: Enterprise value to earnings before interest, taxes, depreciation, and amortization (EV/EBITDA) multiple for the year that has ended (Y0) and the following three years (Y1, Y2, Y3). It indicates the valuation of a company relative to its operational cash flow [20],
    \text{EBITDA} = \text{Total Revenue} - \text{Variable Costs} - \text{Operating Expenses}.
    (c)
    Price to Book Value Y0 Year Ended, Price to Book Value Y1 Current Year, Price to Book Value Y2, Price to Book Value Y3: The price-to-book value ratio for the year that has ended (Y0) and the following three years (Y1, Y2, Y3). It compares a company’s market value to its book value [21],
    P/B = \frac{\text{Current Stock Price}}{\text{Book Value per Share}}.
    (d)
    Price CF Y0 Year Ended, Price CF Y1 Current Year, Price CF Y2, Price CF Y3: The price-to-cash flow ratio for the year that has ended (Y0) and the following three years (Y1, Y2, Y3). It compares the market value of a company to its operating cash flow [22],
    P/CF = \frac{\text{Current Stock Price}}{\text{Cash Flow per Share}}.
    (e)
    FCF/EV (%) Y0 Year Ended, FCF/EV (%) Y1 Current Year, FCF/EV (%) Y2, FCF/EV (%) Y3: The free cash flow-to-enterprise value ratio for the year that has ended (Y0) and the following three years (Y1, Y2, Y3). It measures the percentage of free cash flow to the enterprise value [23],
    \text{FCF/EV} = \frac{\text{Free Cash Flow}}{\text{Enterprise Value}}.
    (f)
    FCF YLD (%) Y0 Year Ended, FCF YLD (%) Y1 Current Year, FCF YLD (%) Y2, FCF YLD (%) Y3: The free cash flow yield for the year ended (Y0) and the following three years (Y1, Y2, Y3). It represents the ratio of free cash flow per share to the share price [24],
    \text{FCF Yield} = \frac{\text{Free Cash Flow}}{\text{Market Value of the Company}} \times 100\%.
    (g)
    PEG FY1, PEG FY2: The price/earnings-to-growth ratio for the next year (FY1) and the following year (FY2). It relates the P/E ratio to the anticipated future earnings growth rate [25],
    \text{PEG Ratio} = \frac{\text{Price/Earnings Ratio}}{\text{Annual EPS Growth Rate}}.
    (h)
    Objective Price 12 months, Potential Objective Price %: the 12-month price target and percentage potential for the price target. The “Potential Objective Price”, also known as “Target Price” or “Target Price Potential”, is an estimation of the future value of a financial asset, typically a stock. This calculation is based on various factors, such as earnings forecasts, industry trends, market conditions, and other relevant information. Analysts and financial institutions often use different methodologies to derive target prices, including fundamental analysis, technical analysis, and valuation models [26],
    \text{Potential Objective Price} = \text{Current Price} \times \left(1 + \frac{\text{Growth Potential}}{100}\right).
    (i)
    Target Price 3Months, var % PO 3Months: 3-month price target and percentage change in the price target.
    (j)
    Long term growth %: Long-term growth percentage. It is a financial metric used in financial analysis and company valuation to estimate the expected growth rate of a company’s revenues, earnings, or other financial indicators in the future. This percentage represents the projected annual growth rate over an extended period, typically spanning several years.
  • Leverage Ratios
    (a)
    The net debt to EBITDA ratio for the year that has ended (Y0) and the following three years (Y1, Y2, Y3). It measures a company’s ability to pay off its debts using its earnings [27],
    \text{Net Debt to EBITDA Ratio} = \frac{\text{Net Debt}}{\text{EBITDA}}.
  • Market and Trading Data
    (a)
    Market Value EUR millions: the total market value of a company’s outstanding shares, expressed in millions of euros.
    (b)
    Float Pct Total Outstdg: the percentage of total outstanding shares that are available for trading in the open market.
    (c)
    Free-float EUR millions: the market capitalization of a company adjusted for the proportion of shares available for public trading, expressed in millions of euros.
    (d)
    Recommendation and numerical recommendation for the stock: The recommendation and numerical recommendation for the stock. This indicator is obtained based on the analysis carried out by many traders and analysts manually.
    (e)
    12 months %, YTD %: percentage change over the last 12 months and year-to-date, respectively.
    (f)
    Last 52 Weeks Low Price and Last 52 Wks High Price: the lowest and highest prices over the last 52 weeks.
    (g)
    % From lows 1 year, % from highs 1 year: the percentage change from 1-year lows and highs, respectively.
    (h)
    3y Price Volatility: three-year price volatility.
    (i)
    Issue Common Shares Outstdg, Average Daily Volume: the number of common shares outstanding and average daily trading volume, respectively.
    (j)
    Volume/shares %: volume per share percentage.
    (k)
    3y BETA Rel to Loc Idx: the three-year beta relative to the local index.
    (l)
    % Capital contracted daily: the percentage of capital traded daily.
    (m)
    Diff % Mean 200, diff % Mean 50, diff % Mean 25, Mean 50/200: the percentage differences from the 200-, 50-, and 25-day moving averages, and the ratio of the 50-day to the 200-day moving average (daily timeframe).
    (n)
    ECA Num EPS, ECA Num EBIT, EC Reco Total, EC Reco Up, EC Reco Down, EC Reco Unchng, % mod recom Positivas/Total, EC Reco Pos, EC Reco Neg, EC Reco Total.1, Positivas/Total (%): various parameters related to earnings per share (EPS), earnings before interest and taxes (EBIT), and recommendations.
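As a quick numerical illustration of how three of the ratios defined above fit together, the snippet below applies the EPS, PER, and PEG formulas to made-up figures for a hypothetical company (all numbers are illustrative only):

```python
# Worked example of the EPS, PER, and PEG formulas from the text,
# with hypothetical figures (not data from any real company).
net_income = 95_000_000.0          # annual net income
preferred_dividends = 5_000_000.0
shares_outstanding = 45_000_000.0
price_per_share = 40.0
eps_growth_rate = 12.5             # expected annual EPS growth, in %

# EPS = (Net Income - Preferred Dividends) / Common Shares Outstanding
eps = (net_income - preferred_dividends) / shares_outstanding

# PER = Price per Share / Earnings per Share
per = price_per_share / eps

# PEG = Price/Earnings Ratio / Annual EPS Growth Rate
peg = per / eps_growth_rate

print(round(eps, 2), round(per, 2), round(peg, 2))  # 2.0 20.0 1.6
```

A PEG above 1 (here 1.6) would suggest the hypothetical stock is priced richly relative to its expected earnings growth.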

2.2. Technical Variables

The technical variables used in this paper consist of the following 11 indicators and oscillators, grouped into three categories.
  • Trend Indicators
    (a)
    Simple moving average (SMA) [28]. The formula is given by
    \mathrm{SMA}_n = \frac{1}{n} \sum_{i=1}^{n} \mathrm{Close}_i
    where n is the size of the time window (number of periods in the window), and Close_i is the closing price on the i-th day of the window. SMA_n was calculated for n = 20, 55.
    (b)
    Exponential moving average (EMA) [29]. The formula is given recurrently by
    \mathrm{EMA}_n = \mathrm{EMA}_{n-1} + \frac{2}{n+1} \times (\mathrm{Close}_n - \mathrm{EMA}_{n-1})
    where n is the EMA span (lookback period), and Close_n is the closing price on the last day of the period. EMA_n was calculated for n = 20, 55, 200.
    (c)
    The Ichimoku cloud [30] is defined by the formulas
    \mathrm{ITS}_n = \frac{\mathrm{High} + \mathrm{Low}}{2} \quad (\text{over the last } n \text{ periods})
    \mathrm{IKS}_m = \frac{\mathrm{High} + \mathrm{Low}}{2} \quad (\text{over the last } m \text{ periods})
    \mathrm{ISA}_p = \frac{\mathrm{ITS} + \mathrm{IKS}}{2} \quad (\text{over the last } p \text{ periods, plotted } p/2 \text{ periods ahead})
    \mathrm{ISB}_p = \frac{\mathrm{High} + \mathrm{Low}}{2} \quad (\text{over the last } p \text{ periods, plotted } p/2 \text{ periods ahead})
    \mathrm{CS}_m = \text{Closing Price} \quad (\text{plotted } m \text{ periods back})
    The “Tenkan Sen” line ITS_n is called the conversion line, the “Kijun Sen” line IKS_m is called the base line, the “Senkou Span A” line ISA_p is called leading span A, the “Senkou Span B” line ISB_p is called leading span B, and the “Chikou Span” line CS_m is called the lagging span. High and Low refer to the highest and lowest prices in the corresponding time window. We used the standard parametric values: n = 9, m = 26, and p = 52.
    (d)
    The average directional index (ADX) [31] is calculated using the loop
    \mathrm{ADX}_t = \frac{\mathrm{ADX}_{t-1} \times (n-1) + \mathrm{DX}_t}{n}
    where n is the time window, t = 1, …, n,
    \mathrm{DX}_t = \frac{|\mathrm{DI}_t^{+} - \mathrm{DI}_t^{-}|}{\mathrm{DI}_t^{+} + \mathrm{DI}_t^{-}}
    and
    \mathrm{DI}_t^{+} = \mathrm{High}_t - \mathrm{High}_{t-1}
    \mathrm{DI}_t^{-} = \mathrm{Low}_{t-1} - \mathrm{Low}_t
    where High_t (resp., Low_t) is the highest (resp., lowest) price at period t. As usual, we set n = 14.
  • Momentum Oscillators
    (a)
    The relative strength index (RSI) [32] is defined as
    \mathrm{RSI}_n = 100 - \frac{100}{1 + \mathrm{RS}}
    where
    \mathrm{RS} = \frac{\text{Average Gain}_n}{\text{Average Loss}_n}
    Here, Average Gain_n is the average gain over the last n days, and Average Loss_n is the average loss over the last n days. The RSI was calculated for n = 6, 12, 14, 24.
    (b)
    The moving average convergence divergence (MACD) [33] is defined by the formulas
    \mathrm{MACD}_{n,m} = \mathrm{EMA}_n(\mathrm{Close}) - \mathrm{EMA}_m(\mathrm{Close})
    \mathrm{MACDh}_{n,m,p} = \mathrm{MACD}_{n,m} - \mathrm{EMA}_p(\mathrm{MACD}_{n,m})
    \mathrm{MACDs}_{n,m,p} = \mathrm{EMA}_p(\mathrm{MACD}_{n,m})
    where n is the size of the time window for the shorter EMA, m is the size of the time window for the longer EMA (m > n), p is the number of periods of the EMA signal line, EMA_n(Close) is the exponential moving average of the closing prices in the window of size n, MACDh_{n,m,p} is the MACD histogram, and MACDs_{n,m,p} is the MACD signal line. We used the standard parametric values: n = 12, m = 26, and p = 9.
    (c)
    The Williams %R [34] is given by
    \text{Williams \%R}_n = \frac{\mathrm{High}_n - \mathrm{Close}}{\mathrm{High}_n - \mathrm{Low}_n} \times 100\%
    where High n is the highest price over the last n periods, Low n is the lowest price over the last n periods, and Close is the closing price (at the last period). We used the standard value, n = 14 .
    (d)
    The stochastic oscillator (KDJ) [32] is an indicator that measures the current price of an asset in relation to its range over a time interval. It is defined as
    \%K = \frac{\mathrm{Close} - \text{Min Low}_n}{\text{Max High}_n - \text{Min Low}_n} \times 100\%
    \%D = \mathrm{SMA}_m(\%K)
    J = 3 \times \%K - 2 \times \%D
    where Close is the current closing price, Min Low_n is the asset’s lowest price over the last n periods, and Max High_n is the highest price over the same n periods. We used the standard parametric values: n = 14 for the %K calculation window and m = 3 for the %D smoothing window.
    (e)
    Squeeze momentum (SQZ) [35] is a volatility indicator defined as
    \mathrm{SQZ}_{n,m,p,q} = \frac{\mathrm{SMA}(\mathrm{Close}_n) - \mathrm{SMA}(\mathrm{Close}_m)}{\mathrm{SMA}(\mathrm{Close}_p)} \times q
    where n is the size of the time window for the shorter mean (usually ranging from 5 to 20), m is the size of the time window for the longer mean (usually between 20 and 50), p is the size of the time window for the comparison mean, q is a multiplier (typically 2 or 3), and SMA(Close_n) is the simple average of the closing prices in the window of size n. We used the following parametric values: n = 20, m = 50, p = 200, and q = 2.
  • Volatility Indicators
    (a)
    Bollinger bands [36] are envelopes plotted at a standard deviation level above and below a simple moving average of the price. They are defined by the formulas
    \text{Lower band:} \quad \mathrm{BBL}_{n,k} = \mathrm{SMA}_n(\mathrm{Close}) - k \times \mathrm{StdDev}_n(\mathrm{Close})
    \text{Middle band:} \quad \mathrm{BBM}_{n,k} = \mathrm{SMA}_n(\mathrm{Close})
    \text{Upper band:} \quad \mathrm{BBU}_{n,k} = \mathrm{SMA}_n(\mathrm{Close}) + k \times \mathrm{StdDev}_n(\mathrm{Close})
    \text{Band width:} \quad \mathrm{BBB}_{n,k} = \frac{\mathrm{BBU}_{n,k} - \mathrm{BBL}_{n,k}}{\mathrm{BBM}_{n,k}}
    \text{Band percent:} \quad \mathrm{BBP}_{n,k} = \frac{\mathrm{Close} - \mathrm{BBL}_{n,k}}{\mathrm{BBU}_{n,k} - \mathrm{BBL}_{n,k}}
    where n is the size of the time window, k is the number of standard deviations of the data in the window (StdDev), Close is the closing price, StdDev n ( Close ) is the standard deviation of the closing prices over the window, and SMA n ( Close ) is the simple moving average of the closing prices over the window. The parameter n determines the sensitivity of the bands to changes in prices. The parameter k determines the distance between the bands and the moving average. We used the standard parametric values: n = 20 and k = 2 .
    (b)
    The average true range (ATR) [37] is a price volatility indicator defined as
    \mathrm{ATR}_n = \frac{1}{n} \sum_{i=1}^{n} \max(\mathrm{High}_i - \mathrm{Low}_i,\; |\mathrm{High}_i - \mathrm{Close}_{i-1}|,\; |\mathrm{Low}_i - \mathrm{Close}_{i-1}|)
    where n is the size of the time window, High_i (resp., Low_i) is the highest (resp., lowest) price in period i, and Close_{i-1} is the closing price at period i − 1. We used the standard value, n = 14.
The selection of these 11 technical indicators was based on their widespread adoption and complementary functions:
  • Trend indicators (SMA, EMA, Ichimoku, ADX) identify market direction;
  • Momentum oscillators (RSI, MACD, Williams %R, KDJ, SQZ) measure price velocity and reversal points;
  • Volatility indicators quantify price fluctuation intensity: Bollinger bands measure volatility through the band width (BBB) and the band percentage (BBP), while the ATR measures the average trading range, regardless of direction.
The above 11 technical indicators constitute a validated core set selected from thousands of available tools based on their complementary signal properties, computational efficiency, and empirical effectiveness. As demonstrated in recent literature [38], this combination provides optimal coverage of the trend, momentum, and volatility dimensions while capturing more than 85% of the technical signal information value in equity markets.
All indicators use standard parameter settings, as established in their original literature and industry practice, ensuring reproducibility and comparability with established financial research [38].
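Several of the indicators above can be computed directly with pandas rolling windows. The sketch below (with a synthetic price series, not real market data) uses the same parameter values as the text for the SMA, EMA, RSI, and Bollinger bands:

```python
# Sketch of a few indicators from Section 2.2 with pandas.
# The closing-price series is a synthetic random walk for illustration.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
close = pd.Series(100 + rng.normal(0, 1, 300).cumsum(), name="Close")

sma20 = close.rolling(20).mean()                 # SMA_n, n = 20
ema20 = close.ewm(span=20, adjust=False).mean()  # EMA_n, n = 20 (recursive form)

# RSI_n = 100 - 100 / (1 + Average Gain_n / Average Loss_n), n = 14
delta = close.diff()
avg_gain = delta.clip(lower=0).rolling(14).mean()
avg_loss = (-delta.clip(upper=0)).rolling(14).mean()
rsi14 = 100 - 100 / (1 + avg_gain / avg_loss)

# Bollinger bands with n = 20, k = 2
std20 = close.rolling(20).std()
bbu = sma20 + 2 * std20                          # upper band
bbl = sma20 - 2 * std20                          # lower band
```

Note that rolling windows leave the first n − 1 values undefined (NaN), which is why the effective sample for model training starts only once all indicator windows are filled.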

2.3. Target

Our target variable is the direction of stock prices in 10 business days. We will proceed formally and define a binary variable, Target ( n ) .
Let P ( 1 ) , P ( 2 ) , , P ( T ) be a time series of closing stock prices to be used in a prediction algorithm. To define the target variable at time n (the time at which the prediction is made) with the prediction horizon h (the number of prediction periods, i.e., business days to make the prediction), such that n + h T , we first need the so-called relative direction Direct ( n + h ) , which is defined as
\mathrm{Direct}(n+h) = \frac{\mathrm{Max}P - P(n)}{2} + \frac{\mathrm{Min}P - P(n)}{2} + \frac{P(n+h) - P(n)}{2}
where P(n) is the closing price at time n, MaxP is the maximum closing price in the periods n + 1, …, n + h, and MinP is the minimum closing price in the same periods.
In all tests performed in the present study, the default value of the prediction horizon is h = 10. In view of Equation (1), Direct ( n + 10 ) assigns equal importance to the fluctuations between the initial closing price P ( n ) and the maximum, the minimum, and the closing price after 10 days. This way, in the case of automated trading systems, it is possible to take profits before the 10-day close if the price reaches a certain threshold or a stop loss is activated.
The target at time n for 10-day predictions, Target ( n ) , is then defined as
\mathrm{Target}(n) = \begin{cases} 1 & \text{if } \mathrm{Direct}(n+10)/P(n) > 0, \\ 0 & \text{if } \mathrm{Direct}(n+10)/P(n) \leq 0. \end{cases}
The discrete 0-1 variability of Target ( n ) could have been smoothed with percentage increases, but for our purposes, it suffices to know whether the price of the asset is going to rise ( Target ( n ) = 1 ) or fall ( Target ( n ) = 0 ).
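The target definition can be transcribed almost literally into code. The sketch below (on a synthetic price series) computes the relative direction and the binary target for h = 10, following the definitions above; since P(n) > 0, dividing the relative direction by P(n) does not change its sign:

```python
# Direct transcription of the relative direction and binary target
# for h = 10, on a synthetic closing-price series (illustrative only).
import numpy as np

rng = np.random.default_rng(2)
prices = 100 + rng.normal(0, 1, 60).cumsum()
h = 10

def direct(p, n, h):
    """Relative direction at time n with horizon h, per the text."""
    window = p[n + 1 : n + h + 1]          # the h future periods
    max_p, min_p = window.max(), window.min()
    return ((max_p - p[n]) / 2
            + (min_p - p[n]) / 2
            + (p[n + h] - p[n]) / 2)

def target(p, n, h=10):
    """Binary target: 1 if the price direction is up, else 0."""
    return 1 if direct(p, n, h) / p[n] > 0 else 0

labels = [target(prices, n) for n in range(len(prices) - h)]
```

The last h observations of a series cannot be labeled (their future window is incomplete), so they are excluded from the labeled set.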

3. Methodology

Our study of algorithmic trading in the stock market started with the processing of historical microeconomic and stock price data of the 482 companies listed in [16] to obtain the fundamental and technical variables presented in Section 2.1 and Section 2.2, respectively, for each company. With these datasets, we compared the 10-day stock price direction, measured by the target variable (2) and predicted by the following models, which will be discussed in detail in Section 4.
(M1)
Fundamental models, which are machine learning algorithms based on decision trees and fed only with fundamental variables.
(M2)
Technical models (one per asset), which are LSTM networks trained only with technical variables.
(M3)
Hybrid models, which are models built on both fundamental and technical models.
Thanks to the use of a binary target in all three cases, we could evaluate the performance of the above models with the same metrics, namely:
(i)
Area under the ROC curve (AUC).
(ii)
Accuracy (ACC).
(iii)
Recall (sensitivity).
(iv)
Specificity.
(v)
Precision.
(vi)
F1-score.
(vii)
Type I error (false positive rate).
(viii)
Type II error (false negative rate).
Furthermore, to detect overfitting, we resorted to the following metrics.
\text{Diff AUC} = \text{Train AUC} - \text{Test AUC}
\text{Diff ACC} = \text{Train ACC} - \text{Test ACC}
where Train AUC/ACC (resp., Test AUC/ACC) refers to results obtained with the training (resp., testing) data.
To measure the prediction performance of models M1–M3 with the train and test datasets, we use, in Section 4, the train and test AUC as the main metrics. The reason for this choice is that, since the target is a discrete variable, the AUC provides a view of the prediction quality of the models at a statistical level.
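The metrics (i)–(viii) can all be obtained from a confusion matrix, plus the score-based AUC. The sketch below uses made-up labels and scores purely for illustration:

```python
# Sketch of the evaluation metrics listed above, computed with
# scikit-learn on illustrative predictions (y_true, y_score are made up).
import numpy as np
from sklearn.metrics import roc_auc_score, confusion_matrix

y_true  = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_score = np.array([0.9, 0.2, 0.7, 0.6, 0.4, 0.3, 0.8, 0.6, 0.55, 0.1])
y_pred  = (y_score >= 0.5).astype(int)          # 0.5 decision threshold

auc = roc_auc_score(y_true, y_score)            # (i) area under the ROC curve
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)   # (ii)
recall      = tp / (tp + fn)                    # (iii) sensitivity
specificity = tn / (tn + fp)                    # (iv)
precision   = tp / (tp + fp)                    # (v)
f1          = 2 * precision * recall / (precision + recall)  # (vi)
type_i      = fp / (fp + tn)                    # (vii) false positive rate
type_ii     = fn / (fn + tp)                    # (viii) false negative rate
```

Note that the AUC is computed from the continuous scores, so it is independent of the 0.5 threshold, whereas metrics (ii)–(viii) all depend on it.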

4. Models and Performances

In this section, we describe the three models considered in this paper: the fundamental, technical, and hybrid models. To be more specific, the hybrid model combines the best fundamental model (according to our performance results) and certain outputs of the technical models. Lastly, and most importantly, we will also describe how to improve the performance of the hybrid model by selecting the technical models.
Due to the limited availability of fundamental data and the architecture of our hybrid model, which depends on the output of previously trained LSTM models, the effective training window of the hybrid and fundamental models was restricted to 2023. The variables derived from technical models are obtained by training them from the initial listing date of each asset. For some assets, this corresponds to as early as 1971. The predictions generated for the year 2023 are subsequently incorporated into the fundamental or hybrid model. As allocating a separate validation set within this limited range would have further reduced the training data, we opted to use all available data for training and testing. This trade-off was necessary to preserve the robustness of the hybrid model under data constraints. For the LSTM models, a chronological train–test split was implemented to prevent look-ahead bias. Approximately 60% of each time series was used for training, 30% for testing, and 10% for validation. In cases where this condition could not be met, the corresponding records were excluded to ensure the integrity of the training process and avoid data leakage.

4.1. Data Pre-Processing

This preprocessing approach was chosen as a compromise between data integrity and model coverage. The goal was to maximize the inclusion of informative records while maintaining consistency across the dataset. By eliminating unrecoverable data and applying simple imputation strategies where appropriate, we ensured that the dataset remained suitable for model training without introducing excessive noise. These choices reflect the practical constraints of working with real-world financial data, where missing information is frequent and must be addressed systematically to enable robust predictive modeling.

4.1.1. Fundamental Variables

The preprocessing of the fundamental variables involved the following steps:
  • Due to the lack of sufficient data, the following columns were entirely removed:
    capitalization_millions
    float_pct_total_outstdg
    free_float_eur_millions
  • For records with missing values:
    In the column representing the numerical analyst recommendation, missing values were replaced with a value of 1, as it represents a neutral or average recommendation.
    In all other cases, missing values were replaced with 0. This imputation was applied to variables such as EBITDA, the P/E ratio, and the price-to-book ratio.
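The imputation rules above can be sketched with pandas. The dropped column names are those listed in the text, while "analyst_recommendation" is a hypothetical stand-in for the recommendation column, whose exact name is not reproduced here; the sample values are invented:

```python
import pandas as pd

# Illustrative records with missing values.
df = pd.DataFrame({
    "capitalization_millions": [None, None, None],
    "float_pct_total_outstdg": [None, 0.4, None],
    "free_float_eur_millions": [None, None, 12.0],
    "analyst_recommendation": [2.0, None, 3.0],  # hypothetical column name
    "ebitda": [None, 5.1, 7.2],
})

# Columns removed entirely for lack of sufficient data.
df = df.drop(columns=["capitalization_millions",
                      "float_pct_total_outstdg",
                      "free_float_eur_millions"])

# Missing analyst recommendations default to 1 (neutral);
# all other missing values default to 0.
df["analyst_recommendation"] = df["analyst_recommendation"].fillna(1)
df = df.fillna(0)
```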

4.1.2. Technical Variables

For technical variables related to price data, entries with missing values or non-existent records on individual days were removed from the dataset. Additionally, during the calculation of technical indicators and oscillators, these days were excluded from the computation. In cases where moving averages resulted in null values (typically at the beginning of the series), those records were also removed to ensure consistency in the input features.
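The removal of the leading null records produced by moving averages can be sketched as follows; the prices and window are illustrative:

```python
def moving_average(closes, window):
    """Simple moving average; the first window - 1 entries are undefined."""
    return [None if i < window - 1
            else sum(closes[i - window + 1:i + 1]) / window
            for i in range(len(closes))]

closes = [10.0, 11.0, 12.0, 13.0, 14.0]
ma3 = moving_average(closes, 3)

# Records whose moving average is null (the start of the series)
# are dropped, as described above.
rows = [(c, m) for c, m in zip(closes, ma3) if m is not None]
```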

4.2. Fundamental Models

For the implementation of predictive models using fundamental data with a 10-day prediction horizon, we employed three advanced machine learning algorithms: random forest [39], gradient boosting [40], and a neural network model [41]. The models were implemented in Python 3.8 using Scikit-learn for tree-based models and Keras with TensorFlow backend for the neural network; hyperparameters were optimized using random search [42]. The best configurations obtained are described below.
  • Random Forest (RF): Implemented with Scikit-learn’s RandomForestClassifier, the best model was obtained with the following parameters: bootstrap=True, max_depth=5, max_samples=0.4, max_features="sqrt", min_samples_leaf=13, min_samples_split=20, n_estimators=170, and class_weight="balanced_subsample". This configuration achieved a test AUC of 0.563 and a test ACC of 0.543.
  • Gradient Boosting (GB): The model was implemented using Scikit-learn’s GradientBoostingClassifier, which is a standard implementation of gradient boosting with decision trees; we did not use optimized variants such as LightGBM or XGBoost. The optimal configuration was as follows: learning_rate=0.001, subsample=0.4, max_depth=3, max_features="sqrt", min_samples_leaf=8, min_samples_split=12, and n_estimators=130. This model achieved a test AUC of 0.559 and a test ACC of 0.560.
  • Neural Network (NN): Built using Keras and TensorFlow, the model consists of an input layer with one neuron per input feature and ReLU activation, followed by two hidden layers. The first hidden layer contains n² neurons (with n being the number of input features), and the second contains 256 neurons; both use ReLU activations and are regularized with dropout (dropout rate = 0.3). The output layer has a single neuron with a sigmoid activation function for binary classification. The model was trained using the Adam optimizer with a learning rate of 0.01 and binary cross-entropy loss. The best configuration resulted in a test AUC of 0.544 and a test ACC of 0.534.
In view of the above test AUC results, the a priori best model is random forest with the configuration given in Point 1. Therefore, this model will be the only fundamental model considered henceforth, also referred to as the fundamental best model.
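For reference, the winning configuration can be instantiated verbatim with Scikit-learn; the random_state is our addition, since the paper does not report a seed:

```python
from sklearn.ensemble import RandomForestClassifier

# The best fundamental configuration, as reported above.
rf = RandomForestClassifier(
    bootstrap=True,
    max_depth=5,
    max_samples=0.4,
    max_features="sqrt",
    min_samples_leaf=13,
    min_samples_split=20,
    n_estimators=170,
    class_weight="balanced_subsample",
    random_state=0,  # not reported in the paper; fixed for reproducibility
)
```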
Despite challenges in negative class identification, the model maintains reasonably consistent performance in key operational metrics, with only moderate degradation from training to testing. Most notably, the model maintains a stable F1-score (0.543 on the test set, 0.541 on the validation set), indicating a balanced precision–recall trade-off, while achieving an accuracy of 54.3% on the test set and 56.0% on the validation set. This combination of 62.0% recall (effective opportunity capture) and 48.2% precision (moderate signal quality) provides a practical foundation for upside-focused strategies, in which missing potential gains is costlier than incurring false positives.
While the observed specificity degradation (67.3% → 48.3% test, 58.8% validation) and elevated Type I error (51.7% test) suggest overfitting in negative class identification, these metrics must be interpreted within the context of bullish market conditions:
  • Class imbalance naturally penalizes specificity. The scarcity of negative cases (losses) limits the model’s ability to generalize on their patterns.
  • Higher false positives are operationally tolerable if the core objective remains maximizing upside participation (as evidenced by recall stability: 62.0% on the test set, 53.0% on the validation set)
From a generalization perspective, the drop in accuracy from training (63.3%) to testing (54.3%) and validation (56.0%) confirms a moderate overfitting tendency, primarily driven by weaker negative class detection. However, the closeness between test and validation metrics across recall, precision, and F1-score suggests that the model’s behavior is stable when exposed to unseen data, supporting its operational reliability for real-time deployment in similar market regimes.
To complete the performance assessment, Table 1 presents comprehensive performance metrics for the optimal fundamental model across the training and testing datasets for 10-day-ahead forecasts. The corresponding ROC curves are shown in Figure 1.
To further assess the model’s performance under a more robust validation framework, rolling-window time series cross-validation was implemented, using 70% of the data for training and 30% for testing in each split. The area under the ROC curve (AUC) was computed for each of the five folds, yielding values of 0.5769, 0.5352, 0.5238, 0.5265, and 0.5962, with a mean of 0.5517 and a standard deviation of 0.0293. These results indicate a moderate yet consistent predictive performance across different temporal segments, supporting the stability of the model’s behavior in cross-validation.
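One plausible reading of this split scheme (the paper does not give the exact fold boundaries) is to cut the data into five contiguous temporal segments and, within each, train on the first 70% of rows and test on the remaining 30%:

```python
def rolling_splits(n_rows, n_folds=5, train_frac=0.7):
    """Rolling-window splits: contiguous temporal segments, each with the
    first train_frac of rows for training and the rest for testing.
    This is a sketch of one possible scheme, not the paper's exact code."""
    fold = n_rows // n_folds
    splits = []
    for k in range(n_folds):
        lo, hi = k * fold, (k + 1) * fold
        cut = lo + int(train_frac * (hi - lo))
        splits.append((range(lo, cut), range(cut, hi)))
    return splits

splits = rolling_splits(1000)
```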
The corresponding ROC curves for each fold are shown in Figure 2.

4.3. Technical Models

In the case of technical variables (Section 2.2), we use LSTM neural networks [43] to forecast closing stock prices over a 10-day horizon. Since the time series of different assets have different patterns, the configuration of the network depends on the asset. Due to the complexity of LSTM networks, their adjustment and optimization were performed with the help of a search algorithm on a range of characteristics.
In our paper on blockchain indicators [1], we show that better results are achieved by converting the indicators into percentages with respect to the moving averages. Therefore, here, we repeat the same procedure regarding stock prices with some indicators and oscillators. In doing so, we maintain the closing price data.
Furthermore, for each asset, we run a greedy search optimization [44] algorithm that, by using different configurations, obtained the best metrics based on the AUC. The feasible configurations were the following:
  • Epochs: 20, 40, 60, 100.
  • Layers: 1, 4, 8.
  • Window: 10, 20, 30.
The best configuration, on average, was the one given by the triple
Epochs = 40, Layers = 8, Window = 30
These are the corresponding settings of our technical models. The other characteristics (notably the weights) depend on the given asset, so there is actually one technical model per asset.
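The search space above contains 4 × 3 × 3 = 36 configurations per asset. The exhaustive part of the search can be sketched as follows; evaluate_config is a stand-in for the real objective, which trains an LSTM with the given configuration and returns its test AUC (the scoring formula below is invented purely so the loop runs end to end):

```python
from itertools import product

def evaluate_config(epochs, layers, window):
    """Stand-in score peaking at (40, 8, 30); a real run would train a
    Keras LSTM and return its test AUC."""
    return 0.5 + 0.02 * layers + 0.002 * window - 0.0005 * abs(epochs - 40)

grid = list(product([20, 40, 60, 100],   # epochs
                    [1, 4, 8],           # layers
                    [10, 20, 30]))       # window sizes
best = max(grid, key=lambda cfg: evaluate_config(*cfg))
```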
LSTM models were implemented in Python 3.8 using the Keras library with a TensorFlow backend. To measure the performance of technical models, we made predictions on the train and test data. The aggregated results are given in Table 2.
The mean Test AUC of 0.527 across 482 assets, while modest in absolute terms, achieves statistical significance and represents a consistent edge over random classification. The number of assets with Test AUC ≥ 0.50 was 346 (71%), the number of assets with Test AUC ≥ 0.60 was 59 (12%), and the number of assets with Test AUC ≥ 0.65 was 18 (3%). These metrics indicate that, at a general level, the models present a statistical advantage, although patterns with a high statistical advantage can only be found in a small group of assets.
The models were trained using the following hardware configuration: an HP Pavilion Gaming Laptop 15-ec0xxx (HP Inc., Palo Alto, CA, USA) equipped with an AMD Ryzen 5 3550H processor with Radeon Vega Mobile Graphics (Advanced Micro Devices, Inc., Santa Clara, CA, USA) running at 2100 MHz, featuring four physical cores and eight logical processors. The system ran Microsoft Windows 11 (Microsoft Corporation, Redmond, WA, USA) with 8 GB of RAM and 23 GB of virtual memory. The approximate training duration for each technical model ranged from 20 to 35 min, resulting in an overall experimentation process that lasted several weeks.

4.4. A Hybrid Model

The hybrid model is a mixture of fundamental and technical models whose objective is to enhance the predictive capabilities of both types of models. To this end, we leverage predictions from already-trained technical models as inputs to train a final model, the hybrid one. Specifically, we feed three outputs from each technical model, in addition to the fundamental variables, into the fundamental best model.
Figure 3 illustrates the architecture of the hybrid model that we propose. Of course, the technical models are the 482 LSTM networks of Section 4.3, trained with percentages and the closing stock prices of the 482 companies mentioned in Section 2.1 and listed in [16]. The outputs of these technical models, fed into the random forest model, along with the fundamental data of the same 482 companies, are the following.
(i)
Test AUC
(ii)
The pondered (or weighted) probability, Pond Prob. This probability is calculated according to the formula
Pond Prob = (Prob − Prob_min) / (Prob_max − Prob_min)
where Prob is the probability that the price of the given asset will increase in 10 days, and Prob_min (resp., Prob_max) is the minimum (resp., maximum) probability obtained in the test predictions.
(iii)
Diff AUC (Equation (3)).
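The weighted probability in (ii) is a min-max normalization over each asset's test-set probabilities. A minimal sketch with illustrative values:

```python
def pond_prob(prob, test_probs):
    """Min-max normalization of the LSTM's up-move probability over the
    probabilities observed in the test predictions."""
    p_min, p_max = min(test_probs), max(test_probs)
    return (prob - p_min) / (p_max - p_min)

# Illustrative: an asset whose test probabilities span [0.30, 0.70].
p = pond_prob(0.55, [0.30, 0.45, 0.55, 0.70])
```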
The ROC curves of the hybrid model for training and testing data are shown in Figure 4.
Table 3 compares the performances of the best fundamental model (Table 1), the technical models (Table 2), and our hybrid model, as measured by the test AUC and train AUC (average values in the case of the technical models). We see that the test AUC of the hybrid model is 0.566, an increase of 0.003 with respect to the best fundamental model, and an increase of 0.071 with respect to the average Test AUC of the technical models. We conclude that the new data fed into the random forest in the hybrid model improves its predictive capability, albeit only slightly. As shown in Figure 5, the variable “LSTM prediction” ranks 15th in the variable importance ranking of the hybrid model, with an importance of 0.025076—a good value. In the next section, we explain how to boost the Test AUC metric of the hybrid model. Statistical significance tests were conducted to evaluate the predictive performance of the proposed models.
The fundamental model achieved an AUC of 0.5628 with a 95% confidence interval of [0.547, 0.578], indicating that its performance is statistically significantly better than random guessing. Similarly, the hybrid model obtained an AUC of 0.566 with a 95% confidence interval of [0.550, 0.581], also demonstrating statistically significant predictive capability. In contrast, the technical model yielded an AUC of 0.493 with a wide 95% confidence interval of [−0.296, 1.282], and therefore, no conclusion regarding its statistical significance can be drawn. These results confirm the effectiveness of the fundamental and hybrid models in outperforming random predictions, while the technical model does not show evidence of predictive power beyond chance.

4.5. An Improved Hybrid Model

Here, we study the improvement in the hybrid model by selecting the technical models. For this purpose, we progressively filtered rows from the dataset based on LSTM-predicted values exceeding a threshold. This threshold corresponded to the AUC of the specific LSTM model and ranged from 0.3 to 0.6. All variables were retained; only rows not meeting the threshold were removed. For each filtered subset, we trained random forest models and measured their train and test AUC scores. Figure 6 shows the evolution of these metrics versus the LSTM AUC threshold.
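The filtering loop can be sketched as follows. The lstm_auc values are invented, and the refit step is elided; in the paper, a random forest is trained on each filtered subset and its train and test AUC are recorded:

```python
# Each row of the hybrid dataset carries the test AUC of the LSTM
# that generated its technical features (values here are illustrative).
rows = [{"lstm_auc": a} for a in (0.35, 0.48, 0.55, 0.62, 0.71)]

results = []
for threshold in (0.3, 0.4, 0.5, 0.6):
    # Keep only rows whose originating LSTM beats the threshold.
    subset = [r for r in rows if r["lstm_auc"] > threshold]
    # ... train a random forest on `subset` and record its AUC ...
    results.append((threshold, len(subset)))
```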
We observe in Figure 6 that, as the threshold increases, the number of rows in the dataset (dashed line) decreases, while both the train AUC (red solid line) and the test AUC (blue solid line) increase. In fact, in the last iteration, where only 318 records with an AUC greater than 0.6 remain, the test AUC of the hybrid model is 0.73. This is a very high value for a trading system.
According to Figure 7, in the last filtered model, with an AUC greater than 0.6 in the LSTM prediction variable, this variable already occupies first place in the importance ranking of the model variables with an importance of 0.066, while it ranks 15th for the unfiltered hybrid model with an importance of 0.025 (Figure 5).

5. Simulation Results

In order to evaluate the effectiveness of the proposed system, a simulation was conducted on the validation set. The strategy was based on a value investing approach, selecting weekly the 30 assets with the highest AUC metric derived from the predictive model. Each asset was assigned the same initial capital, without leverage, and positions were held during the corresponding weekly period.
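The weekly selection rule can be sketched as an equal-weight, unleveraged allocation over the 30 highest-AUC assets; the universe and AUC scores below are synthetic:

```python
def weekly_portfolio(asset_auc, k=30):
    """Equal-weight, unleveraged portfolio of the k assets with the
    highest model AUC for the week."""
    top = sorted(asset_auc, key=asset_auc.get, reverse=True)[:k]
    weight = 1.0 / len(top)
    return {asset: weight for asset in top}

# Hypothetical universe of 40 assets with synthetic AUC scores.
aucs = {f"asset_{i}": 0.50 + 0.005 * i for i in range(40)}
portfolio = weekly_portfolio(aucs)
```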
The returns obtained were compared with the performance of three major benchmark stock indices: the S&P 500, the Nasdaq Composite, and EuroStoxx 50. Over the three weeks analyzed, the proposed strategy demonstrated superior performance relative to these indices in terms of cumulative percentage return.
The weekly returns of the proposed algorithmic trading system were evaluated against three major market indices—the S&P 500, the Nasdaq Composite, and EuroStoxx 50—over a four-week period (week 0 to week 3). As shown in the figure, the first week (week 1) experienced a market downturn across all indices. Notably, the algorithmic system demonstrated greater resilience, exhibiting a smaller negative return compared to the indices, indicating its ability to mitigate losses during adverse market conditions. In the subsequent weeks (week 2 and week 3), the system outperformed all indices by achieving significantly higher positive returns, suggesting effective selection and allocation strategies. These results support the hypothesis that the investment approach, based on selecting the top 30 assets with the highest AUC and equally weighting positions, enhances risk-adjusted performance relative to broad market indices.
Figure 8 shows the weekly return comparison, where it can be observed that the system consistently outperforms the benchmark indices during the evaluated dates.

6. Conclusions and Outlook

The question studied in this paper is whether fundamental, technical, or hybrid models provide a statistical advantage when trading stock market assets and which of them performs best. To answer this question, we converted it into a classification problem. The classes are defined by the binary variable Target ( n ) at each prediction time n: Target ( n ) = 1 means that the price of the asset will increase in 10 business days, whereas Target ( n ) = 0 means the opposite.
The type of model refers to the data used. Thus, for the fundamental models, we used microeconomic data of 482 companies in the year 2023. These data provided good predictive capability. For the technical models, we used indicators and oscillators based on stock price and volume data from the same companies. We found that, in some cases, technical models also provided good predictive capabilities. As their name suggests, hybrid models are a combination of fundamental and technical models.
For fundamental models, we considered three candidates: random forest, gradient boosting, and neural network. The selection criterion consisted of predicting, with a 10-day objective, which companies were the best to invest in. The fundamental model that performed best (the “fundamental best model”) was random forest, with the hyperparameters given in Section 4.2 and a test AUC of 0.563.
The technical models were implemented with LSTM networks, one per asset. The task to measure their performance consisted of predicting, also with a 10-day objective, whether the price of the corresponding asset was going to rise or fall. We found that the models built using price percentages with respect to indicators provided more accurate predictions. In these technical models, a statistical significance test was performed on all aggregated models, and the result was negative, with an AUC of 0.493 and a wide 95% confidence interval of [−0.296, 1.282]. In models where the AUC is below 0.5 or the AUC difference is excessively high, there is a risk of overfitting. This may indicate that the integration of variables obtained from the LSTM networks does not lead to a substantial improvement of the first hybrid model (Section 4.4) over the fundamental model.
As for hybrid models, the model implemented in this paper is made up of the fundamental best model (random forest), additionally fed with information output by the technical models (three inputs per technical model, i.e., per asset). As a result, the test AUC of random forest increased from 0.563 (with only fundamental data) to 0.566 (with additional technical data). This finding suggests that the technical features derived exclusively from the LSTMs are insufficient to enhance the model.
Test data were obtained using 20% of the dataset, and validation AUC data were obtained using 10% of the dataset. The results were comparable between the fundamental and hybrid models, with the fundamental model achieving AUC values of 0.690 (train), 0.563 (test), and 0.577 (validation), and the hybrid model achieving 0.686, 0.566, and 0.566, respectively. These findings suggest that both approaches exhibit similar generalization capabilities, with no substantial improvement observed when integrating hybrid features over the purely fundamental approach. Nevertheless, it is worth noting that the variable lstm_prediction ranked 15th in the feature importance analysis, indicating that it was a relatively important variable in the hybrid model.
With this being the case, in Section 4.5, we demonstrated that the quality of the hybrid model (as measured by the AUC) can be further improved by reducing the number of records, in fact, by operating with LSTM networks that have a Test AUC greater than 0.6. In this situation, while the universe of assets decreases, the statistical advantage of the model increases, reaching an AUC of 0.73. In addition, the variable obtained through such LSTM networks was one of the 10 most important, according to Figure 7. This point is crucial because it indicates that, if the best technical models (LSTM networks) are part of the hybrid model, the statistical advantage increases considerably, and this is one of the most important objectives in algorithmic trading.
LSTM models are highly complex and computationally intensive models. Ideally, a hyperparameter optimization algorithm tailored to each asset would have been more appropriate, rather than applying a single generalized configuration. However, given the large number of assets (482), we were compelled to generalize and seek a configuration that was optimal overall. In this case, a pattern search algorithm is needed for each asset, and the optimization of the corresponding algorithm requires a different number of layers, epochs, window sizes, and hyperparameter settings. Therefore, it is unfeasible in practice to generalize a solution for all assets. In our case, we focused on a limited number of possible solutions, certainly very small compared to the huge number of algorithm configurations.
Another important point is the quality of the data and its cost. In general, the provision of fundamental data sets is expensive and not within reach of small traders. This can mean a significant disadvantage for the latter compared to large corporations. And yet, we showed in the previous sections that it is relatively easy to obtain a statistical advantage with fundamental models when trading in the stock market.
We have also shown that technical data (basically, stock price and trading volume) can furnish predictive algorithms with a sufficient statistical advantage. Since that type of data is available to any trader, profitability via technical models is within anyone's reach. The problem with technical models based on LSTM networks is that they predict a single asset, and therefore the risk is very high. Indeed, an unforeseen event that causes the price to move in the opposite direction to that expected can lead to large capital losses. For this reason, it is advisable to use hybrid models that allow for the diversification of positions.
It is likely that the optimal architecture for each individual asset was not found, as different assets exhibit different patterns and may require customized configurations. Implementing an algorithm capable of identifying the best architecture per asset could have resulted in significantly improved performance—potentially achieving AUC scores above 0.6 for most assets. However, such an optimization process would require substantial computational infrastructure, which was beyond the scope and resources of this study. While filtering the asset universe to those with LSTM test AUC > 0.6 may improve predictive accuracy, it also significantly narrows the investable pool, potentially compromising diversification and increasing concentration risk. Furthermore, regulated investment frameworks—such as the UCITS Directive (2009/65/EC)—mandate that no more than 5% of a fund’s assets may be invested in transferable securities or money market instruments issued by the same body [45]. This regulatory constraint illustrates why a sufficiently broad universe of eligible assets is necessary to comply with concentration limits and ensure robust risk management. Therefore, it becomes essential to continue improving this contribution by developing techniques that increase the AUC across all LSTM models, in order to make such a system viable and implementable in real-world financial environments.
One potential direction for future research involves replacing the LSTM networks with graph neural networks (GNNs), which may offer a more powerful representation of relationships among financial variables. GNNs have shown promising results in domains where the structure of data can be modeled as a graph, and this approach could help improve the model’s predictive performance (e.g., AUC). However, implementing GNNs comes with significantly higher computational costs and infrastructure requirements, as they require explicit graph construction, greater memory consumption, and more complex training pipelines. Despite these challenges, their ability to capture latent dependencies between assets or macroeconomic indicators could be valuable in real-world financial applications.
While the results presented demonstrate promising improvements with the hybrid modeling approach, some limitations should be noted. First, the computational cost of training separate LSTM networks for each asset can be substantial, especially as the number of assets grows, which may limit scalability and real-time applicability in larger universes. Second, our study focuses primarily on a specific market (e.g., developed markets), and the generalizability of these findings to other contexts, such as emerging markets with different market dynamics, liquidity profiles, and data availability, remains to be validated. In particular, emerging markets often present additional challenges due to the scarcity of sufficient historical data, which makes it more difficult to build statistically robust models. Furthermore, creating LSTM models to cover all assets across emerging markets would require extremely high storage and processing capabilities. Moreover, LSTMs themselves struggle to be effectively trained on very long sequences, which further complicates their application in these markets. Despite these difficulties, emerging markets hold significant potential due to their higher volatility and possible opportunities for greater predictive gains. Importantly, the proposed hybrid system could be adapted and applied to other similar markets, such as Asian or South American markets, where similar conditions and challenges exist. Future work could explore strategies to reduce computational demands—such as transfer learning or model sharing—and extend the analysis to diverse markets to assess robustness and adaptability.
The results of the market application simulator for validation data simulating an Algorithmic Trading System indicate that the algorithmic system operates in line with the main market indices, but it slightly outperforms them during the three-week test period. However, to draw robust conclusions about its long-term effectiveness, a comprehensive evaluation over multiple years and diverse market conditions would be necessary.
As a final message, we state that the combination of fundamental and technical models into hybrid models with improved predictive capabilities is a topic worth investigating in algorithmic trading. Actually, in this paper, we have proven the feasibility of that approach. In addition to new ways of defining hybrid models with ever-better performances, future work will also include the study of more cognitive models taking into account macroeconomic data, responses to news or events, psychological supports and resistances, Fibonacci levels, or, for that matter, any variable that could have a correlation with the price direction. Other related topics are the optimization of the LSTM nets, as well as the impact of taking different targets, timeframes, and combinations of both.

Author Contributions

Conceptualization, J.C.K. and J.M.A.; data curation, J.C.K.; formal analysis, J.C.K.; investigation, J.C.K.; methodology, J.C.K. and J.M.A.; project administration, J.M.A.; software, J.C.K.; supervision, J.M.A.; validation, J.C.K.; visualization, J.C.K.; writing—original draft, J.C.K.; writing—review and editing, J.C.K. and J.M.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data and computer codes used in this paper can be found in [16].

Acknowledgments

The authors are very grateful to the reviewers for their constructive criticism, which has helped us improve this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. King, J.C.; Dale, R.; Amigó, J.M. Blockchain metrics and indicators in cryptocurrency trading. Chaos Solitons Fractals 2024, 178, 114305. [Google Scholar] [CrossRef]
  2. Chen, J.; Haboub, A.; Khan, A.; Mahmud, S. Investor clientele and intraday patterns in the cross section of stock returns. Rev. Quant. Financ. Account. 2025, 64, 757–797. [Google Scholar] [CrossRef]
  3. Hendershott, T.; Riordan, R. Algorithmic trading and the market for liquidity. J. Financ. Quant. Anal. 2013, 48, 1001–1024. [Google Scholar] [CrossRef]
  4. Lei, Y.; Peng, Q.; Shen, Y. Deep learning for algorithmic trading: Enhancing macd strategy. In Proceedings of the 6th International Conference on Computing and Artificial Intelligence, Hohhot, China, 17–20 July 2020; pp. 51–57. [Google Scholar]
  5. Mahajan, Y. Optimization of macd and rsi indicators: An empirical study of indian equity market for profitable investment decisions. Asian J. Res. Bank. Financ. 2015, 5, 13–25. [Google Scholar] [CrossRef]
  6. Prasetijo, A.B.; Saputro, T.A.; Windasari, I.P.; Windarto, Y.E. Buy/sell signal detection in stock trading with bollinger bands and parabolic sar: With web application for proofing trading strategy. In Proceedings of the 2017 4th International Conference on Information Technology, Computer, and Electrical Engineering (ICITACEE), Semarang, Indonesia, 18–19 October 2017; pp. 41–44. [Google Scholar]
  7. Patell, J.M. Corporate forecasts of earnings per share and stock price behavior: Empirical test. J. Account. Res. 1976, 1, 246–276. [Google Scholar] [CrossRef]
  8. Chan, E.P. Quantitative Trading: How to Build Your Own Algorithmic Trading Business; John Wiley & Sons: Hoboken, NJ, USA, 2021. [Google Scholar]
  9. Namdari, A.; Li, Z.S. Integrating fundamental and technical analysis of stock market through multi-layer perceptron. In Proceedings of the 2018 IEEE Technology and Engineering Management Conference (TEMSCON), Evanston, IL, USA, 28 June–1 July 2018; pp. 1–6. [Google Scholar]
  10. Lazcano, A.; Herrera, P.J.; Monge, M. A combined model based on recurrent neural networks and graph convolutional networks for financial time series forecasting. Mathematics 2023, 11, 224. [Google Scholar] [CrossRef]
  11. Nobre, J.; Neves, R.F. Combining principal component analysis, discrete wavelet transform and XGBoost to trade in the financial markets. Expert Syst. Appl. 2019, 125, 181–194. [Google Scholar] [CrossRef]
  12. Lv, P.; Wu, Q.; Xu, J.; Shu, Y. Stock index prediction based on time series decomposition and hybrid model. Entropy 2022, 24, 146. [Google Scholar] [CrossRef]
  13. Pang, X.; Zhou, Y.; Wang, P.; Lin, W.; Chang, V. An innovative neural network approach for stock market prediction. J. Supercomput. 2020, 76, 2098–2118. [Google Scholar] [CrossRef]
  14. Yoon, Y.; Guimaraes, T.; Swales, G. Integrating artificial neural networks with rule-based expert systems. Decis. Support Syst. 1994, 11, 497–507. [Google Scholar] [CrossRef]
  15. Damodaran, A. Equity Risk Premiums: Determinants, Estimation and Implications, 2020th ed.; NYU Stern School of Business: New York, NY, USA, 2020. [Google Scholar]
  16. Data, Computer Codes, and List of 482 Companies. Available online: https://github.com/JuanCarlosKing/StockmarketAlgoritmicTrading (accessed on 21 February 2025).
  17. Ng, S.L.; Rabhi, F. A data analytics architecture for the exploratory analysis of high-frequency market data. In Proceedings of the 11th International Workshop on Enterprise Applications, Markets and Services in the Finance Industry 2022, Lecture Notes in Business Information Processing (LNBIP), Twente, The Netherlands, 23–24 August 2022; Springer: Cham, Switzerland, 2023; Volume 467, pp. 1–16. [Google Scholar]
  18. Okeke, C.M.G. Evaluating company performance: The role of EBITDA as a key financial metric. Int. J. Comput. Appl. Technol. Res. 2020, 9, 336–349. [Google Scholar]
  19. Ghaeli, M.R. Price-to-earnings ratio: A state-of-art review. Accounting 2016, 3, 131–136. [Google Scholar] [CrossRef]
  20. Zelmanovich, B.; Hansen, C.M. The basics of ebitda. Am. Bankruptcy Inst. J. 2017, 36, 36–37. [Google Scholar]
  21. Budianto, E.W.H.; Dewi, N. The influence of book value per share (BVS) on Islamic and conventional financial institutions: A bibliometric study using VOSviewer and literature review. Islam. Econ. Bus. Rev. 2023, 2, 139–147. [Google Scholar]
  22. Akhtar, T.; Rashid, K. The relationship between portfolio returns and market multiples: A case study of Pakistan. Oecon. Knowl. 2015, 7, 2–28. [Google Scholar]
  23. Toumeh, A.A.; Yahya, S.; Amran, A. Surplus free cash flow, stock market segmentations and earnings management: The moderating role of independent audit committee. Glob. Bus. Rev. 2023, 24, 1353–1382. [Google Scholar] [CrossRef]
  24. Purohit, S.; Agarwal, B.; Kanoujiya, J.; Rastogi, S. Impact of shareholder yield on financial distress: Using competition and firm size as moderators. J. Econ. Adm. Sci. 2025. [Google Scholar] [CrossRef]
  25. Babii, A.; Ball, R.T.; Ghysels, E.; Striaukas, J. Panel data nowcasting: The case of price–earnings ratios. J. Appl. Econom. 2024, 39, 292–307. [Google Scholar] [CrossRef]
  26. Damodaran, A. Investment Valuation: Tools and Techniques for Determining the Value of Any Asset; Wiley: Hoboken, NJ, USA, 2012; ISBN 978-1118011522. [Google Scholar]
  27. Bouwens, J.; De Kok, T.; Verriest, A. The prevalence and validity of EBITDA as a performance measure. Comptabilité-Contrôle-Audit 2019, 25, 55–105. [Google Scholar] [CrossRef]
  28. Huang, X.D.; Qiu, X.X.; Wang, H.J.; Jin, X.F.; Xiao, F. A prospective randomized double-blind study comparing the dose-response curves of epidural ropivacaine for labor analgesia initiation between parturients with and without obesity. Front. Pharmacol. 2024, 15, 1348700. [Google Scholar] [CrossRef]
  29. Formis, G.; Scanzio, S.; Cena, G.; Valenzano, A. Linear Combination of Exponential Moving Averages for Wireless Channel Prediction. In Proceedings of the 2023 IEEE 21st International Conference on Industrial Informatics (INDIN), Lemgo, Germany, 18–20 July 2023; pp. 1–6. [Google Scholar] [CrossRef]
  30. Lutey, M.; Rayome, D. Ichimoku cloud forecasting returns in the US. Glob. Bus. Financ. Rev. 2022, 27, 17–26. [Google Scholar] [CrossRef]
  31. Salkar, T.; Shinde, A.; Tamhankar, N.; Bhagat, N. Algorithmic trading using technical indicators. In Proceedings of the 2021 International Conference on Communication information and Computing Technology (ICCICT), Mumbai, India, 25–27 June 2021; pp. 1–6. [Google Scholar]
  32. Wu, M.; Diao, X. Technical analysis of three stock oscillators testing macd, rsi and kdj rules in sh & sz stock markets. In Proceedings of the 2015 4th International Conference on Computer Science and Network Technology (ICCSNT), Harbin, China, 19–20 December 2015; Volume 1, pp. 320–323. [Google Scholar]
  33. Kang, B.K. Optimal and Non-Optimal MACD Parameter Values and Their Ranges for Stock-Index Futures: A Comparative Study of Nikkei, Dow Jones, and Nasdaq. J. Risk Financ. Manag. 2023, 16, 508. [Google Scholar] [CrossRef]
  34. Steele, R.; Esmahi, L. Technical indicators as predictors of position outcome for technical trading. In Proceedings of the International Conference on e-Learning, e-Business, Enterprise Information Systems, and e-Government (EEE), Las Vegas, NV, USA, 27–30 July 2015; p. 3. [Google Scholar]
  35. Alostad, H.; Davulcu, H. Directional prediction of stock prices using breaking news on Twitter. In Proceedings of the 2015 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology (WI-IAT), Singapore, 6–9 December 2015; pp. 523–530. [Google Scholar]
  36. Su, G. Analysis of the Bollinger Band Mean Regression Trading Strategy. In Proceedings of the 3rd International Conference on Economic Development and Business Culture (ICEDBC 2023), Dali, China, 30 June–2 July 2023; Atlantis Press: Dordrecht, The Netherlands, 2023. [Google Scholar]
  37. Panapongpakorn, T.; Banjerdpongchai, D. Short-term load forecast for energy management systems using time series analysis and neural network method with average true range. In Proceedings of the 2019 First International Symposium on Instrumentation, Control, Artificial Intelligence, and Robotics (ICA-SYMP), Bangkok, Thailand, 16–18 January 2019; pp. 86–89. [Google Scholar]
  38. Aronson, D.; Masters, T. Evidence of technical analysis profitability in the foreign exchange market. J. Financ. Data Sci. 2019, 5, 279–297. [Google Scholar]
  39. Breitung, C. Automated stock picking using random forests. J. Empir. Financ. 2023, 72, 532–556. [Google Scholar] [CrossRef]
  40. Natekin, A.; Knoll, A. Gradient boosting machines, a tutorial. Front. Neurorobot. 2013, 7, 21. [Google Scholar] [CrossRef]
  41. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; The MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  42. Wang, X.; Hong, L.J.; Jiang, Z.; Shen, H. Gaussian process-based random search for continuous optimization via simulation. Oper. Res. 2025, 73, 385–407. [Google Scholar] [CrossRef]
  43. Sherstinsky, A. Fundamentals of recurrent neural network (rnn) and long short-term memory (lstm) network. Phys. D Nonlinear Phenom. 2020, 404, 132306. [Google Scholar] [CrossRef]
  44. García, A. Greedy algorithms: A review and open problems. J. Inequal. Appl. 2025, 2025, 11. [Google Scholar] [CrossRef]
  45. Kasyanov, R.A.; Kachalyan, V.A. Evolution of the Legal Framework for UCITS Funds in the European Union (1985–2023). Kutafin Law Rev. 2024, 11, 325–369. [Google Scholar] [CrossRef]
Figure 1. Train and test ROC curves of the fundamental best model (random forest).
Figure 2. Train and test ROC curves in cross-validation (random forest).
Figure 3. Scheme of the hybrid model.
Figure 4. Train and test ROC curves for the hybrid model.
Figure 5. Top 20 feature importance ranking for the hybrid model. The variable “lstm_prediction” is included at position 15 with an importance of 0.025.
Figure 6. Evolution of the train and test AUC of the hybrid model vs. the LSTM AUC threshold.
Figure 7. Top 10 feature importance ranking for the hybrid model filtered by LSTM AUC higher than 0.6. The variable “lstm_prediction” occupies the first position with an importance of 0.066.
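The ranking in Figure 7 reflects the hybrid model's key mechanism: the LSTM's out-of-sample prediction is appended as one more input column of the random forest, so its importance can be read directly off the fitted model. The sketch below illustrates this with the scikit-learn API; the feature names, synthetic data, and hyperparameters are invented for illustration and are not the paper's setup.

```python
# Hypothetical sketch of the Figure 7 mechanism: the LSTM score becomes an
# ordinary feature of the random forest, whose importance ranking then
# includes "lstm_prediction". All inputs here are synthetic.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
n = 500
features = {
    "pe_ratio": rng.normal(15.0, 5.0, n),         # fundamental variable (illustrative)
    "ebitda_margin": rng.normal(0.2, 0.05, n),    # fundamental variable (illustrative)
    "lstm_prediction": rng.uniform(0.0, 1.0, n),  # LSTM score fed to the forest
}
X = np.column_stack(list(features.values()))
# Synthetic label driven mainly by the LSTM score, so that it ranks first,
# mimicking the filtered hybrid model of Figure 7
y = (features["lstm_prediction"] + rng.normal(0.0, 0.3, n) > 0.5).astype(int)

rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
ranking = sorted(zip(features, rf.feature_importances_),
                 key=lambda kv: kv[1], reverse=True)
```

Because impurity-based importances in scikit-learn sum to one, a value such as 0.066 for "lstm_prediction" can be compared directly across refits of the forest.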
Figure 8. Comparison of algorithmic trading system with indices.
Table 1. Training, testing, and validation metrics of the fundamental best model (random forest).
Metric                 Train   Test    Validation
AUC                    0.690   0.563   0.575
ACC (Accuracy)         0.633   0.543   0.560
Recall (Sensitivity)   0.591   0.620   0.530
Specificity            0.673   0.483   0.588
Precision              0.634   0.482   0.552
F1-Score               0.612   0.543   0.541
Type I Error           0.327   0.517   0.412
Type II Error          0.409   0.380   0.470
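The metrics in Table 1 are all standard functions of the test-set scores and the resulting confusion matrix. As a reference for how they interrelate (e.g., Type I error = 1 − specificity, Type II error = 1 − recall), here is a minimal self-contained sketch; the labels and scores are made up for demonstration and are not the paper's data.

```python
# Illustrative computation of the Table 1 metrics from binary labels and
# model scores. All inputs below are invented for demonstration.

def auc_score(y_true, y_score):
    """ROC AUC via the Mann-Whitney pairwise formulation."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 0, 1, 1, 0, 0, 1, 0]                    # true up/down labels
y_score = [0.8, 0.3, 0.6, 0.7, 0.4, 0.55, 0.2, 0.1]  # model scores
y_pred = [int(s >= 0.5) for s in y_score]            # class decision at 0.5

tp = sum(t and p for t, p in zip(y_true, y_pred))
tn = sum(not t and not p for t, p in zip(y_true, y_pred))
fp = sum(not t and p for t, p in zip(y_true, y_pred))
fn = sum(t and not p for t, p in zip(y_true, y_pred))

precision = tp / (tp + fp)
recall = tp / (tp + fn)                  # sensitivity
metrics = {
    "AUC": auc_score(y_true, y_score),
    "ACC": (tp + tn) / len(y_true),
    "Recall (Sensitivity)": recall,
    "Specificity": tn / (tn + fp),
    "Precision": precision,
    "F1-Score": 2 * precision * recall / (precision + recall),
    "Type I Error": fp / (fp + tn),      # 1 - specificity
    "Type II Error": fn / (fn + tp),     # 1 - recall
}
```

Note that the AUC is computed from the continuous scores, while the remaining metrics depend on the chosen decision threshold (0.5 here).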
Table 2. Average aggregated results for the technical models.
Metric      Av. Ag. Value
Train AUC   0.631
Test AUC    0.527
Train ACC   0.578
Test ACC    0.524
Diff AUC    0.104
Diff ACC    0.053
Table 3. Comparative performance of the fundamental, technical, and hybrid models (average values in the case of technical models), including Test AUC values, 95% confidence intervals, and statistical significance results (Test AUC Sign.).
Model         Train AUC   Test AUC   Test AUC Sign.   95% CI
Fundamental   0.690       0.563      0.563            [0.548, 0.578]
Technical     0.631       0.527      0.493            [−0.296, 1.282]
Hybrid        0.686       0.566      0.566            [0.550, 0.581]
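Table 3 reports 95% confidence intervals for the test AUC; this excerpt does not state how they were obtained. A common choice for AUC intervals is a percentile bootstrap over the test set, sketched below with synthetic data (the resampling count, seed, and inputs are assumptions for illustration, not the paper's settings).

```python
# Hypothetical percentile-bootstrap sketch for a 95% AUC confidence interval.
# All data and parameters below are illustrative.
import random

def auc_score(y_true, y_score):
    """ROC AUC via the Mann-Whitney pairwise formulation."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(y_true, y_score, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC: resample (label, score) pairs."""
    rng = random.Random(seed)
    n = len(y_true)
    aucs = []
    while len(aucs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        if 0 < sum(yt) < n:                  # need both classes for an AUC
            aucs.append(auc_score(yt, [y_score[i] for i in idx]))
    aucs.sort()
    return aucs[int(alpha / 2 * n_boot)], aucs[int((1 - alpha / 2) * n_boot) - 1]

# Illustrative use on synthetic, mildly informative scores
rng = random.Random(1)
y = [rng.randint(0, 1) for _ in range(200)]
s = [0.3 * t + 0.7 * rng.random() for t in y]
low, high = bootstrap_auc_ci(y, s)
```

Under such a construction, an interval that excludes 0.5 (as for the fundamental and hybrid models, but not the technical one) indicates a test AUC significantly better than random direction guessing.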
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

King, J.C.; Amigó, J.M. Integration of LSTM Networks in Random Forest Algorithms for Stock Market Trading Predictions. Forecasting 2025, 7, 49. https://doi.org/10.3390/forecast7030049

