Article

Exploring Market Efficiency with GRU-D Neural Networks: Evidence from Global Stock Markets

by Abdelhamid Ben Jbara 1, Marjène Rabah Gana 2,3 and Mejda Dakhlaoui 4,*

1 Economics and Management Department, Polytechnic School of Tunisia, University of Carthage, P.O. Box 743, La Marsa 2078, Tunisia
2 Department of Decision Sciences, HEC Montréal, Montréal, QC H3T 2A7, Canada
3 Department of General Teaching, École de Technologie Supérieure de Montréal, 1100 R. Notre Dame O, Montréal, QC H3C 1K3, Canada
4 Financial Sciences Department, Applied College, Imam Abdulrahman Bin Faisal University, P.O. Box 1982, Dammam 31441, Saudi Arabia
* Author to whom correspondence should be addressed.
Int. J. Financial Stud. 2026, 14(2), 46; https://doi.org/10.3390/ijfs14020046
Submission received: 30 December 2025 / Revised: 25 January 2026 / Accepted: 11 February 2026 / Published: 14 February 2026

Abstract

This study revisits the Efficient Markets Hypothesis by employing a GRU-D neural network to predict stock return distributions across global equity markets, accounting for missing and irregular data. It examines whether stock returns exhibit statistically significant departures from purely random behavior. By combining price, technical and fundamental inputs, it tests both weak and semi-strong market efficiency. We implement the GRU-D model on a global dataset of stock returns, where daily returns are classified into quartiles. Model performance is assessed using Micro-Average Area Under the Curve (AUC) and Relative Classifier Information (RCI). Robustness checks include sub-sample tests across countries and sectors, an examination of the COVID-19 sub-period, and a price-memory persistence analysis. The results reveal that the GRU-D model achieves a ranking accuracy of approximately 75% when classifying returns, with statistical significance at the 99.99% confidence level, and exhibits modest but robust deviations from strict market efficiency. These deviations persist for up to 200 trading days. Notably, the findings indicate that the GRU-D model is more robust during the COVID-19 period. These findings are consistent with the Adaptive Markets Hypothesis and underscore the relevance of machine-learning frameworks, particularly those designed for imperfect data environments, for identifying time-varying departures from strict market efficiency in global equity markets.

1. Introduction

The Efficient Market Hypothesis (EMH) is one of the most influential and widely debated concepts in neoclassical finance. It asserts that asset prices fully and continuously reflect all available information (Fama, 1970). Market efficiency theory examines whether markets can form prices that accurately capture the fundamental value of financial assets. Rational investors react to new information, updating expectations and adjusting asset prices. When observed returns deviate from CAPM predictions, anomalies arise, challenging the EMH.
Empirical studies have shown that the CAPM alone cannot explain certain asset-pricing anomalies. To address these limitations, multi-factor models were introduced, incorporating additional factors that capture specific risk exposures and market anomalies. The Fama–French three-factor model (Fama & French, 1993) adds size and value factors to market risk, while the five-factor model (Fama & French, 2015) further includes profitability and investment factors.
Behavioral finance complements this perspective by questioning investor rationality and emphasizing the influence of cognitive biases, such as overconfidence, herding, loss aversion and anchoring. Studies by Daniel et al. (1998), Barberis et al. (2001) and Baker and Wurgler (2006) integrate behavioral proxies into traditional asset pricing models, demonstrating that investors’ behavior and psychology significantly shape asset prices. Despite these advances, no single comprehensive model exists due to the complex interaction of behavioral and market factors. Recent studies, such as Z. Li and Wu (2023), have extended the three-factor model by incorporating investor sentiment from online stock bar text data, highlighting the growing role of behavioral indicators in asset pricing models.
Traditional econometric methods, such as linear regression and time-series models, remain valuable for their interpretability and analytical simplicity. However, their reliance on linearity and low-dimensional data limits their effectiveness when facing the complexity, nonlinearity, and scale of modern economic and financial systems. As data becomes larger and more intricate, researchers increasingly turn to hybrid approaches that integrate traditional models with machine learning techniques to enhance predictive accuracy, uncover hidden patterns, and address complex problems (Sonkavde et al., 2023; Doe, 2024).
In this context, recurrent neural network architectures, and specifically the Gated Recurrent Unit with Decay (GRU-D) model, offer significant methodological advantages. The standard GRU provides computational efficiency, faster training, and the ability to capture long-term dependencies in sequential data, making it well suited to tasks such as time-series forecasting and stock price prediction. The GRU-D extension additionally handles missing or irregularly spaced data by incorporating a decay mechanism that adjusts the influence of past observations according to the time elapsed, improving prediction accuracy in complex time-series settings (Che et al., 2018; Sonkavde et al., 2023).
The objective of this study is to employ the GRU-D model to test the weak and semi-strong forms of market efficiency in financial time series, where irregular and incomplete observations are common. In line with prior studies on market efficiency that distinguish between information sets (Fama, 1970), the model is specified to incorporate variables associated with weak-form efficiency (past prices and technical indicators) and semi-strong-form efficiency (financial ratios and fundamental indicators). The strong form is excluded, as it requires access to insider information that is unavailable. The study examines whether stock returns exhibit a predictive structure that significantly exceeds the random baseline. Demonstrating that the GRU-D consistently delivers statistically significant improvements over chance constitutes direct evidence against strict market efficiency (Pagliaro, 2025). To ensure external validity, predictive performance is further evaluated across countries, sectors and time periods, thereby capturing heterogeneity in economic conditions and institutional settings.
This study has several contributions. From a methodological perspective, and to the best of our knowledge, it is the first to apply the GRU-D framework to test market efficiency. Its capacity to handle missing data and capture complex temporal patterns enables a more nuanced understanding of market behavior, overcoming the limitations of traditional tests that rely on fully specified asset pricing models (Fama, 1991). For practitioners, this study presents a framework that detects persistent and significant inefficiencies with minimal preprocessing.
Furthermore, by identifying markets, sectors, or periods with greater deviations from strict market efficiency, investors can better calibrate their risk exposure and seek arbitrage opportunities. Our research also serves as a benchmark for future comparisons with both classical models and alternative neural architectures. For regulators, our results offer a diagnostic tool to monitor how efficiently information is reflected in asset prices. Persistent market inefficiencies highlight areas where market transparency and disclosure practices could be strengthened.
This study makes three key contributions that clarify its novelty beyond simply being the first to use GRU-D.
First, on the methodological side, this is to our knowledge the first market-efficiency study to apply GRU-D, and we do so with clear intent. Our framework integrates high-frequency market data with fundamental information that is updated infrequently and unevenly across firms and markets. This results in structural missing data and irregular information arrival. GRU-D is specifically designed for such conditions, as it incorporates missingness and time-gap data through a decay mechanism. This allows for a principled way to distinguish between stale and newly released public information—an aspect that is central to interpreting market efficiency but not addressed by standard GRU or LSTM models without relying on ad-hoc preprocessing.
Second, the study offers an empirical and testing contribution by evaluating return predictability in a truly global context. We use a consistent framework that brings together price-based signals with publicly available fundamentals. Instead of focusing on single-point predictions, we classify future returns into quartiles and evaluate performance using Micro-AUC and Relative Classifier Information. These metrics show whether the model meaningfully reduces uncertainty beyond what is implied by class frequencies, which offers a natural alignment with how the Efficient Market Hypothesis can be tested, as discussed in Fama’s (1991) work.
Third, from an interpretive and robustness perspective, we analyze whether the detected predictability is short-lived or persistent through a price-memory analysis. We also assess the stability of results across different countries, sectors, and during the COVID period. In practical terms, the framework highlights where and when deviations from strict market efficiency are more likely, which supports efforts in risk monitoring and portfolio calibration. From a regulatory standpoint, evidence of persistent inefficiency can offer a valuable signal about how effectively public information is reflected in asset prices and where transparency standards might need further attention.
The remainder of this paper is structured as follows. Section 2 reviews relevant literature on market efficiency and machine learning applications. Section 3 describes the research design, Section 4 presents and discusses the empirical results and Section 5 concludes.

2. Literature Review: Market Efficiency and Machine Learning Models

Today, most market transactions are executed by algorithms, fundamentally altering market microstructure. This shift has accelerated the adoption of machine learning (ML) in finance, where systems learn patterns from historical data and optimize decisions without manual programming. ML is a subset of artificial intelligence (AI) that encompasses techniques enabling computers to emulate human reasoning and solve complex tasks with minimal human input (Russell & Norvig, 2010). While early AI focused on explicit rules and formal logic, ML allows systems to learn from data, overcoming limitations of traditional approaches.
The rapid expansion of financial data has outpaced human analytical capacity, prompting increased reliance on AI and deep learning methods. These approaches can process diverse unstructured data, including text, images and videos, allowing for the identification of novel determinants of asset prices. Recent evidence suggests that such models often surpass traditional econometric approaches in predictive accuracy (Sirignano & Cont, 2019).
Clustering techniques, such as k-means and hierarchical clustering, have been used to group assets based on historical price patterns, revealing insights beyond conventional market classifications. For instance, Q. Li et al. (2023) combined k-means clustering with Long Short-Term Memory (LSTM) models to group U.S. stocks, achieving strong forecasting performance.
A systematic review by Kumbure et al. (2022) of 138 stock market prediction studies (2000–2019) found that Artificial Neural Networks (ANNs), Support Vector Machines (SVMs) and fuzzy logic were most applied.
Several studies illustrate the practical applications of ML in financial forecasting. Dimitriadou et al. (2018) predicted WTI oil prices using SVM with a nonlinear Radial Basis Function kernel. Diamond and Perkins (2022) challenged the semi-strong EMH across asset classes using ML techniques, including Logistic Regression, Random Forest, Gradient Boosting, SVM and ANN, demonstrating that intermarket information improves predictive accuracy. Machine learning has also proven effective in credit risk assessment by exploiting alternative data to reduce information asymmetry and improve prediction accuracy (Mhlanga, 2021).
In recent years, LSTM and GRU architectures have gained prominence for their robustness and superior predictive performance in financial time-series. Patel et al. (2020) combined LSTM and GRU to forecast cryptocurrency prices, while Lee and Yoo (2020) compared RNN, LSTM and GRU models for predicting S&P 500 stock returns, concluding that LSTM outperformed the others. Similar findings were reported by Yao and Yan (2024) and Dželihodžić et al. (2024), who highlighted LSTM and GRU’s superiority over CNN and conventional RNN models. This conclusion is supported by a recent systematic review, which shows that GRU and LSTM models excel in financial time-series prediction. GRU is faster and less prone to overfitting, while LSTM captures long-term dependencies (Sonkavde et al., 2023).
Region-specific applications include X. Li et al. (2020) for Hong Kong, Budiharto (2021) for Indonesia, Yadav et al. (2020) for India, Samarawickrama and Fernando (2017) for Sri Lanka and Nti et al. (2020) for Ghana. Hossain and Kaur (2024) demonstrated the complementary strengths of XGBoost and LSTM for U.S. ETFs. Kadam et al. (2024) further explored the nuances of LSTMs and GRUs, guiding the choice of financial forecasting models. Finally, Shahi et al. (2020) showed that incorporating financial news sentiment into LSTM and GRU models significantly improves stock price prediction. The LSTM-News and GRU-News models achieved superior accuracy compared to models relying solely on market data.
Recent research on return predictability increasingly evaluates models in market- or index-level settings, such as broad-market returns—an approach that is especially relevant for cross-country comparisons. Within this literature, a small number of benchmark model families are commonly used to provide context and interpretability. Traditional time-series models remain widely used, including autoregressive models like AR and ARIMA, multivariate extensions such as VAR, and GARCH-type models that account for time-varying volatility. These approaches continue to serve as standard baselines in market and index forecasting.
At the same time, more recent studies have introduced comparisons against baseline neural network models, including basic RNNs, LSTMs, and GRUs, which are designed to capture non-linear patterns and temporal dependencies. This benchmarking framework helps establish a clear evaluation context for our approach and supports the use of a recurrent model architecture. It highlights the value of using models like GRU-D in settings where the predictor set combines continuously updated market data with fundamentals that arrive less frequently and unevenly.
Recent forecasting research also points to adjacent advances that complement time-series models in global equity settings. Some studies explicitly model dynamic inter-asset relationships—often using dynamic or multi-relational graph architectures—to capture evolving cross-sectional dependencies beyond individual asset histories (Qian et al., 2024). Others leverage advanced text representations from financial news and corporate disclosures, using finance-specific transformer language models to extract signals from unstructured public information (Wu et al., 2023). A further line of work investigates zero-shot or generalization approaches, where learned structures are transferred across assets or targets with minimal additional supervision—a particularly relevant direction for cross-market applications (Noh & Kim, 2025). These developments provide valuable context and are complementary to our focus on modeling predictability under missing and irregular information with mixed-frequency data.
Over the past one to two years, research has continued to deepen the link between machine learning and quantitative trading, with growing interest in developing end-to-end trading systems that can remain effective in noisy, non-stationary market environments. Notably, new hybrid trading frameworks have emerged that combine ensemble or fusion learning with transfer learning to enhance signal robustness and adaptability across different market regimes (Yan et al., 2026). In parallel, recent systematic reviews have highlighted how deep learning models are becoming more integrated into algorithmic trading pipelines, while also drawing attention to real-world challenges like overfitting, non-stationarity, and the need for evaluation under practical constraints (Bhuiyan et al., 2025). This body of recent work adds valuable context to the current landscape of ML-driven trading and complements our focus on testing market efficiency using mixed-frequency financial data.

3. Research Design

3.1. Data and Sample

Given the unique characteristics of each asset type and the availability of data, we focus on the stock market. One of the most reputable financial sources is Yahoo! Finance; we used the yfinance library, which enables downloading and manipulating its data in Python 3.9. To exploit the application programming interface, we prepared a script that browses a list of 106,328 stock tickers provided by “investexcel” and downloads whatever data yfinance makes available, yielding coverage for 9686 tickers.
Our collected data sample includes 83 predictor variables. The data excluded from our selection includes unstructured data (primarily textual) and data for which we have no publication date. The categories of variables collected are summarized below:
-
Asset price time-series data, OHLCV (Open, High, Low, Close and Volume), at daily frequency.
-
Financial data with a total of 50 data points1 as well as their publication dates, which generally fall in the third month of each year (March). These data comprise 28 balance sheet variables and 22 income statement variables.
-
Fundamental data with a total of 27 data points (such as PER, ROE, ROA), including their publication date.
The final database includes 9686 assets distributed across 32 countries and 47 economic sectors2, spanning 10 years and 40 days from 27 July 2011 to 6 September 2021. Each asset has an average observation window of 5800 days. The United States tops the list, accounting for approximately 50% of the database.

3.2. Data Preprocessing

To predict future asset returns, a data preprocessing phase is required. This phase involves transforming independent variables, extracting predictor features and defining the dependent variable. Variable transformation includes computing logarithmic returns and calculating financial ratios from accounting data. New predictor variables are generated from seasonal patterns and chartist indicators. The dependent variable is defined by discretizing returns into categorical classes.
We align firm fundamentals—such as financial statement variables and ratios—based on each firm’s actual public release date, rather than applying a uniform fiscal-year calendar. Once released, a fundamental value is treated as stepwise information and carried forward on a daily basis until the next disclosure. This approach accounts for differences in fiscal-year timing without introducing look-ahead bias. If a fundamental variable is missing at the time of disclosure, it remains missing and is handled by GRU-D using its missingness indicators and time-gap/decay mechanisms, rather than through manual or ad-hoc imputation.
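The release-date alignment described above can be sketched in pandas; this is a minimal illustration with an invented variable (`roe`) and hypothetical dates, not the paper's actual pipeline. The key steps are reindexing the sparsely released series onto the daily price calendar and carrying each value forward until the next disclosure, leaving pre-release days missing:

```python
import pandas as pd

# Hypothetical fundamental series keyed by its actual public release dates.
releases = pd.DataFrame(
    {"roe": [0.12, 0.15]},
    index=pd.to_datetime(["2020-03-31", "2021-03-31"]),
)

# Daily (business-day) price calendar the fundamental must be aligned to.
daily_index = pd.date_range("2020-01-01", "2021-06-30", freq="B")

# Reindex onto the union of calendars, then forward-fill: each value is
# carried forward from its release date until the next disclosure. Before
# the first release the value stays NaN, which in the actual pipeline is
# handled by GRU-D's missingness indicators rather than imputed manually.
aligned = releases.reindex(daily_index.union(releases.index)).ffill()
aligned = aligned.reindex(daily_index)
```

Because the fill only ever propagates values forward from their release dates, no future information leaks into earlier rows.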

3.3. Variables Construction

3.3.1. Dependent Variables

We calculated the daily logarithmic returns of the adjusted closing price and then classified them into discrete return quartiles (Z. Hu et al., 2021). The use of four categories reflects a trade-off between granularity and simplicity of implementation in the predictive framework.
Quartiles are used because they offer a practical and economically interpretable way to discretize returns for efficiency testing at scale. They produce balanced class sizes through data-driven thresholds, enabling stable training and evaluation across heterogeneous firms and markets. Quartiles also maintain financial relevance by distinguishing downside risk and upside opportunity from the middle of the return distribution. This structure supports our aim of assessing whether the information set improves discrimination across return regions beyond unconditional class frequencies, as measured by Micro-AUC and RCI. This return-states framing aligns with prior index-level work that emphasizes informative parts of the return distribution using multinomial classification (Nevasalmi, 2020). We considered finer discretization (e.g., deciles), but found it reduced both interpretability and stability. Continuous regression and binary extreme-move classification were also explored, but these either shift focus toward point prediction or compress the distributional structure our framework is designed to test.
The four classes are defined as follows:
Class 1: Includes returns falling below the first quartile. These represent the lowest returns in our dataset, indicating a decrease in asset prices.
Class 2: Includes returns that are equal to or greater than the first quartile but less than the median of the logarithmic returns. These returns are in the lower-middle range.
Class 3: Includes returns that are above the median but less than the third quartile, placing them in the upper-middle range of returns.
Class 4: Includes returns that are equal to or exceed the third quartile, denoting the highest returns.
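The quartile discretization above can be sketched as follows; the toy price path and seed are invented for illustration, and in the actual pipeline the thresholds are estimated on the training split only, to avoid look-ahead bias:

```python
import numpy as np

# Toy price path and daily logarithmic returns (illustrative data only).
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0, 0.01, 1000)))
log_ret = np.diff(np.log(prices))

# Data-driven quartile thresholds of the return distribution.
q1, q2, q3 = np.quantile(log_ret, [0.25, 0.50, 0.75])

# np.digitize maps each return to bins 0..3; shifting by 1 yields the
# four classes defined above (1 = lowest returns, 4 = highest).
classes = np.digitize(log_ret, [q1, q2, q3]) + 1
```

By construction the four classes come out nearly balanced, which is the property the evaluation design relies on.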
Since the return categories are defined using quartiles, the four classes are roughly balanced by design. Minor deviations may occur due to ties or gaps in data, but substantial class imbalance is not expected to influence the results. Moreover, our evaluation focuses on Micro-AUC and especially RCI, which is measured relative to class-frequency priors. This approach helps ensure that observed performance gains reflect meaningful predictive information rather than being driven by differences in class proportions.

3.3.2. Independent Variables

The following variables proxy for weak-form market efficiency3:
-
Raw data on asset prices and logarithmic returns: The raw data includes asset prices (open, close, high, low and trading volume) and adjusted closing prices. These prices are transformed into logarithmic returns.
-
Seasonal data: Two variables are included in this category. The first is the day of the week, taking values 1–5 (Monday–Friday). The second denotes the calendar week of the year, taking values from 1 to 52. These variables aim to capture well-documented calendar anomalies such as the weekend effect (Penman, 1987) and the January effect (Ariel, 1987).
-
Chartist indicators: They were selected using the approaches proposed by Prachyachuwong and Vateekul (2021) and Širůček and Šíma (2016). The set of indicators includes the Relative Strength Index, Moving Average Convergence Divergence and Chande Momentum Oscillator, among others4.
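As an illustration of one of the chartist indicators listed above, the Relative Strength Index can be computed as in the following minimal sketch; the simple-moving-average variant and the 14-day period are common defaults and are assumptions here, since the paper does not spell out its exact parameterization:

```python
import numpy as np

def rsi(prices, period=14):
    """Simple-moving-average RSI over the last `period` price changes
    (illustrative parameterization, not necessarily the paper's)."""
    deltas = np.diff(np.asarray(prices, dtype=float))
    out = np.full(len(prices), np.nan)  # undefined before `period` changes
    for t in range(period, len(deltas) + 1):
        window = deltas[t - period:t]
        avg_gain = window[window > 0].sum() / period
        avg_loss = -window[window < 0].sum() / period
        if avg_loss == 0:
            out[t] = 100.0  # only gains in the window
        else:
            out[t] = 100.0 - 100.0 / (1.0 + avg_gain / avg_loss)
    return out
```

Values near 100 indicate a run of gains, values near 0 a run of losses, which is the overbought/oversold reading chartists attach to the indicator.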
Regarding the semi-strong form of efficiency, the selected variables related to fundamental data encompass financial ratios, financial statement information (Alexakis et al., 2010) and governance or competition-related measures (Bodie et al., 2009) (see note 3 above). The variables are organized into three categories and scaled to allow for comparability across firms: Balance sheet items are scaled by total assets and income statement items are normalized by total revenue. Additional fundamental data included sector classification, governance indicators and measures of market competition.

3.4. Experimental Design

This section outlines the empirical design underlying the framework used to test market efficiency. It first details the construction of the database and the generation of training and test samples within a rolling-window framework that preserves temporal ordering and closely mimics real-time forecasting conditions. Training samples are used to estimate model parameters by minimizing a cross-entropy loss function, which measures the discrepancy between the model’s predictions and the actual target values; the testing sample provides out-of-sample predictions. The learning phase further refines this optimization by adjusting model weights to minimize the loss. The GRU-D model is implemented in PyTorch5 1.10, covering parameter calibration, model estimation and predictive validation.

3.4.1. Training and Testing Design

A large dataset enables the model to detect reliable relationships among variables and support robust estimation. The training set consists of 6000 observations, representing 12% of the dataset. This subset was constructed to reflect the heterogeneity of the sample, which spans multiple sectors and countries. We confirmed that the 6000-observation training subset provides broad coverage across the 32 countries and 47 sectors in the dataset, without being materially concentrated in a small number of groups.
The geographical distribution is led by U.S. firms, with significant representation from Germany, France and Canada.
The relatively small training set reflects hardware constraints, particularly memory limitations during model estimation.
After selecting assets with sufficient data coverage, we constructed the model inputs using a rolling-window approach. This procedure, commonly adopted in financial forecasting, trains the model on fixed-length sequences of past observations to predict the subsequent value. The window then advances by one period, generating a new input–output pair. Repeating this process yields a large set of training instances, allowing the model to learn temporal patterns while closely mimicking real-time forecasting conditions. To account for varying degrees of temporal dependence, we considered alternative window lengths, as described in the learning phase below.
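The rolling-window construction described above amounts to the following sketch, shown here with invented array dimensions; each pair consists of a fixed-length sequence of past observations and the class of the period immediately after it, with chronological order preserved:

```python
import numpy as np

def rolling_windows(X, y, window):
    """Build (input sequence, next-period target) pairs from a feature
    matrix X of shape (T, F) and aligned targets y of length T, advancing
    the window by one period at a time."""
    inputs, targets = [], []
    for t in range(window, len(X)):
        inputs.append(X[t - window:t])  # past `window` observations only
        targets.append(y[t])            # target of the subsequent period
    return np.stack(inputs), np.array(targets)

# Illustrative dimensions: 50 periods, 4 features, a 10-day window.
T, F, W = 50, 4, 10
X = np.arange(T * F, dtype=float).reshape(T, F)
y = np.arange(T) % 4 + 1
seqs, labels = rolling_windows(X, y, W)
```

Because each input ends strictly before its target period, the construction itself rules out look-ahead bias.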
To ensure that our empirical setup reflects a realistic test of market efficiency and remains entirely free of information leakage, we frame the predictability task as a classification problem over return distributions rather than as a point forecast. Returns are grouped into quartile classes using data-driven thresholds, which maintains balanced class sizes and supports interpretation in terms of downside risk versus upside potential. If markets are strictly efficient, conditioning on the available information should not give the model a genuine advantage beyond what is already implied by the base frequencies of these classes. We therefore use Micro-AUC for overall performance and emphasize RCI, which directly measures how much additional information the model provides.
To avoid any look-ahead bias, we evaluate all models using a strictly chronological rolling-window approach, with no data shuffling. At each point in time, the model only uses information that would have been available at that moment, and predictions are made for the next period. We apply the same calendar cutoffs across markets to define training, validation, and test splits. All preprocessing steps—such as scaling, computing class thresholds, or estimating model-specific statistics—are done using only the training data and then carried over unchanged to the validation and test sets. Importantly, fundamental variables are included based on their actual release dates, so they enter the model only after becoming publicly available, eliminating any risk of forward-looking bias.

3.4.2. The GRU-D Model Implementation

The GRU-D model was implemented using Han-JD’s6 open-source Python code, based on the PyTorch library, which provides a flexible environment for sequential data modeling. In its original binary setting, the output layer consisted of a single unit, trained with a binary cross-entropy loss that compared the predicted probability to the actual outcome. In this study, the output layer (the final step of the model, where predictions are generated) was redefined with four units corresponding to our four return classes. The multi-class cross-entropy loss function evaluates how well the predicted probability distribution matches the true class, and the Softmax function converts raw model outputs into probabilities, ensuring that all class probabilities sum to one (Razali et al., 2025; Terven et al., 2025).
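The four-unit softmax output and multi-class cross-entropy described above can be written out explicitly; the following NumPy sketch mirrors what PyTorch's `CrossEntropyLoss` computes internally and uses invented logits purely for illustration:

```python
import numpy as np

def softmax(logits):
    """Convert raw model outputs (logits) to probabilities summing to one."""
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def cross_entropy(logits, labels):
    """Mean multi-class cross-entropy; labels are integer classes 0..3."""
    probs = softmax(logits)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

# Illustrative batch: one confident, correct prediction and one near-uniform.
logits = np.array([[4.0, 0.1, 0.2, 0.1],
                   [0.3, 0.2, 0.1, 0.2]])
labels = np.array([0, 3])
loss = cross_entropy(logits, labels)
```

The loss approaches zero only when the model assigns probability near one to the true class, which is exactly the discrepancy the training phase minimizes.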

3.4.3. Learning Phase

A distinctive feature of GRU-D is its ability to incorporate missing or irregular observations directly into the learning process. To achieve this, the model uses vectors indicating which values are missing and how much time has elapsed since the last observation. A global mean vector, computed from the average of each variable in the training data, is used to impute missing values. Inputs are thus represented as observed values, missingness indicators and time gaps, enabling the GRU-D to learn from both financial variables and the structure of missing data via temporal decay7.
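The (values, mask, time-gap) representation described above can be sketched for a single variable as follows; the recursion for the time gap follows Che et al. (2018), and the NaN values are kept unimputed here for clarity (the actual model substitutes the training-set mean):

```python
import numpy as np

def grud_inputs(x):
    """Build GRU-D's (values, mask, delta) triplet from a series with NaNs.
    mask[t] is 1 if x[t] is observed; delta[t] accumulates the time elapsed
    since the variable was last observed, which drives the decay mechanism."""
    x = np.asarray(x, dtype=float)
    mask = (~np.isnan(x)).astype(float)
    delta = np.zeros_like(x)
    for t in range(1, len(x)):
        # One step since t-1, plus the accumulated gap if t-1 was missing.
        delta[t] = 1.0 + (0.0 if mask[t - 1] else delta[t - 1])
    return x, mask, delta

x, m, d = grud_inputs([1.0, np.nan, np.nan, 4.0, 5.0])
```

The decay mechanism then downweights the influence of the last observed value as `delta` grows, which is how the model distinguishes stale from fresh information.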
Building on the dataset construction and rolling-window design described previously, the GRU-D model’s learning phase was implemented in two stages. In the first stage, we conducted a hyperparameter search to identify the most suitable specification for our prediction task. The configurations varied along several key dimensions: the observation window length, the network complexity (hidden-layer size), the learning rate (speed of parameter updates) and the training duration (number of epochs). This trade-off between horizon and complexity is consistent with prior applications of recurrent architectures in financial forecasting (Fischer & Krauss, 2018; Y. Hu et al., 2021). This exploration phase allowed us to evaluate alternative setups and determine the configuration that yields the most stable and efficient learning, as measured by the loss function. In the second stage, training was extended using the optimal hyperparameters to further enhance the model’s predictive performance. Five model specifications were evaluated, combining alternative observation window lengths (200, 400 and 600 days) with hidden-layer sizes of either 150 or 300 units.
To manage computational cost while ensuring stability, training was organized in two phases. The first phase involved 20 epochs for model selection, while the second extended the final configuration to 40 epochs. Optimization was performed using the Adam algorithm8 with an initial learning rate of 0.01, halved every 10 epochs to ensure convergence. Each configuration was trained five times to account for initialization randomness, with an average training time of 4 hours and a maximum of 8 hours for the most complex models. This regimen ensured robust, reproducible performance across runs, in line with recommendations from prior studies (Sirignano & Cont, 2019).
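The step-decay schedule described above (initial rate 0.01, halved every 10 epochs) reduces to a one-line rule, equivalent to PyTorch's `StepLR` with `step_size=10` and `gamma=0.5`:

```python
def learning_rate(epoch, base_lr=0.01, step=10, factor=0.5):
    """Step-decay schedule: the learning rate is multiplied by `factor`
    once every `step` epochs."""
    return base_lr * factor ** (epoch // step)
```

Over the 40-epoch second phase this yields rates of 0.01, 0.005, 0.0025 and 0.00125, shrinking the update steps as training converges.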
Figure 1 illustrates this two-stage training process. Among the tested configurations, the green curve corresponds to the specification with a 400-day observation window and 300 hidden units. This configuration achieved the lowest training loss, indicating that it minimized the gap between predicted and actual values during training.
To mitigate concerns about overfitting, we validated the model on a substantially larger, more diverse testing set comprising 45,894 observations (88.4% of the dataset). This design allows the model’s generalization ability to be rigorously evaluated across different sectors and countries. As shown in Figure 2, training loss (red curve) declined steadily from 1.22 to 1.15, while validation (green) and testing losses (blue) stabilized around 1.18 and 1.17 after epoch 10. The close alignment of these loss curves indicates that the model learned genuine patterns that generalize to unseen data rather than memorizing the training set.

4. Empirical Results and Discussion

4.1. Predictive Power of the GRU-D Model

The AUC represents the area under the ROC (Receiver Operating Characteristic) curve. The ROC curve is a graphical representation of a classifier’s diagnostic performance, plotting the true positive rate (the proportion of correctly identified positive cases) against the false positive rate (the proportion of negative cases incorrectly classified as positive) across varying classification thresholds. By summarizing the ROC curve into a single value, the AUC captures the model’s overall discriminatory power across all possible thresholds. AUC values range from 0 to 1, where a value greater than 0.5 corresponds to a discriminative ability superior to mere chance (Che et al., 2018).
We computed the Micro-Average AUC, which aggregates performance across all observations and thus provides a more representative measure of the model’s overall discriminative ability under class imbalance (Fawcett, 2006; Saito & Rehmsmeier, 2015). AUC values for each of the four return categories illustrate the model’s ability to accurately classify different levels of returns. The results are reported in Table 1. The Micro-Average AUC is 0.75, with a tight 99.99% confidence interval of [0.7454, 0.7471], indicating that the model correctly ranks positive versus negative outcomes 75% of the time. This value is considered acceptable (Luo et al., 2025), suggesting that the GRU-D captures meaningful financial signals.
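The Micro-Average AUC can be illustrated with a minimal sketch: class labels are binarized one-vs-rest, all (probability, indicator) pairs are pooled across classes, and a single AUC is computed from the pooled ranking. This is a pedagogical reconstruction, not the evaluation code used in the study.

```python
def auc_from_scores(scores, labels):
    """Binary AUC via the rank-sum (Mann-Whitney) statistic:
    the probability that a randomly chosen positive case is
    scored above a randomly chosen negative one (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def micro_average_auc(prob_matrix, true_classes, n_classes=4):
    """Micro-average AUC for a multi-class problem: binarize the
    labels one-vs-rest, pool every (probability, indicator) pair
    across all classes, and compute one AUC on the pooled data."""
    scores, labels = [], []
    for probs, y in zip(prob_matrix, true_classes):
        for k in range(n_classes):
            scores.append(probs[k])
            labels.append(1 if y == k else 0)
    return auc_from_scores(scores, labels)
```

A micro-average of 0.75 thus means that, in the pooled ranking, a true class receives a higher predicted probability than an incorrect one about 75% of the time.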
The predictive performance is not evenly distributed across classes. It is concentrated on the extremes of the return distribution, with the model more effective at identifying the lowest returns (Class 1, AUC = 0.684). This asymmetry suggests that deviations from the EMH are most pronounced in the left tail of the return distribution. This is consistent with empirical evidence from Choi (2021), who documents that market efficiency declines during episodes of extreme events. The results also align with the Efficient Tail Hypothesis (ETH) proposed by Jiang et al. (2025), which posits that inefficiency is particularly pronounced during extreme market events, with empirical evidence indicating that negative extremes exert greater predictive power than positive ones. Accordingly, the GRU-D model demonstrates a more substantial capacity to anticipate severe losses. From a risk management perspective, predictability in extreme downside events is of considerable value, as it enables investors to design more effective hedging strategies and adjust portfolio allocations to mitigate large drawdowns. Our results are particularly relevant for portfolio managers seeking to reduce the impact of extreme losses. By identifying predictive signals in severe downside risk, they can implement dynamic tail-risk protection strategies that outperform traditional diversification or option-based hedging approaches (Spilak & Härdle, 2022).
The four return classes naturally correspond to different financial conditions. The lowest quartile captures downside or tail-risk scenarios, while the highest quartile reflects strong upside potential. Thus, when the model assigns a higher probability to the lowest quartile, it signals greater downside risk ahead; a shift toward the highest quartile suggests more favorable market conditions.
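The quartile-based labeling of daily returns can be sketched as follows, assuming breakpoints are taken from the empirical return distribution; the exact breakpoint construction in our pipeline may differ.

```python
from statistics import quantiles

def quartile_classes(returns):
    """Assign each daily return to a class 1-4 based on the
    quartile breakpoints of the empirical distribution
    (class 1 = lowest quartile / downside tail,
    class 4 = highest quartile / strong upside)."""
    q1, q2, q3 = quantiles(returns, n=4)  # three cut points

    def label(r):
        if r <= q1:
            return 1
        if r <= q2:
            return 2
        if r <= q3:
            return 3
        return 4

    return [label(r) for r in returns]
```

With this labeling, a model output concentrating probability mass on class 1 flags elevated downside risk, mirroring the interpretation above.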
This setup helps connect our Micro-AUC and RCI results to the core idea of market efficiency. In an efficient market, knowing the available information should not significantly improve the model’s ability to distinguish these return states beyond what would be expected by chance. Examining performance across each class provides a more intuitive, economically meaningful picture of where the model finds predictability. This perspective also motivates the robustness and time-variation analyses that follow, helping us understand when and where these patterns hold.
To further assess the effectiveness of our GRU-D model, we conducted a benchmarking exercise against other deep learning architectures commonly used in financial forecasting. For comparability, our four-class dependent variable was re-coded into a binary format (0 = negative returns; 1 = positive returns), following approaches such as those of Z. Hu et al. (2021). Although the training datasets differ, the benchmarking exercise remains valuable. Our GRU-D model achieves an accuracy of 68.48%, which is competitive with other deep learning architectures. Its performance is marginally below that of the multi-task RNN with Markov Random Fields proposed by C. Li et al. (2019), which reports an accuracy of 68.95%. Yet, it surpasses the deep learning models developed by Ding et al. (2015) and Dos Santos Pinheiro and Dras (2017), achieving 65.08% and 63.34% accuracy, respectively. It is worth recalling that, beyond this, the GRU-D architecture addresses a key limitation of these models by effectively managing missing and irregular financial data.
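The binary re-coding used for this benchmarking exercise can be sketched as follows; the treatment of exactly zero returns (grouped here with the negative class) is an illustrative assumption, as the text does not specify it.

```python
def recode_binary(daily_returns):
    """Re-code returns into the binary benchmark target:
    0 = negative return, 1 = positive return.
    Zero returns are grouped with the negative class here
    (an assumption for illustration)."""
    return [1 if r > 0 else 0 for r in daily_returns]

def accuracy(predicted, actual):
    """Share of observations where the predicted label matches."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```

The reported 68.48% accuracy corresponds to this share of correctly classified up/down days on the test set.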
It is important to note that this benchmark is presented for context and does not, on its own, claim methodological novelty. Our choice of GRU-D is driven by the nature of the data: the predictor set combines continuously tracked market variables with fundamental data that is updated infrequently and inconsistently across firms and markets. This leads to structural missing data and irregular observation timing. GRU-D is designed to handle exactly this type of challenge, as it directly incorporates information about missingness and time gaps through its masking and decay mechanisms. This offers a principled alternative to the ad hoc imputation methods typically required by standard GRU or LSTM models.
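The masking and decay mechanism can be illustrated numerically. In GRU-D (Che et al., 2018), a decay factor γ = exp(−max(0, wδ + b)) shrinks a missing variable from its last observed value toward its empirical mean as the gap δ since the last observation grows; in the model, w and b are learned, whereas the values below are illustrative.

```python
import math

def decayed_input(x, mask, delta, x_last, x_mean, w=0.5, b=0.0):
    """GRU-D input decay (Che et al., 2018): if x is observed
    (mask == 1), use it directly; otherwise blend the last
    observed value and the empirical mean, weighting the last
    value by a decay factor that shrinks as the time gap `delta`
    since the last observation grows. Here w and b are fixed
    illustrative constants; in GRU-D they are learned."""
    gamma = math.exp(-max(0.0, w * delta + b))
    return mask * x + (1 - mask) * (gamma * x_last + (1 - gamma) * x_mean)
```

A quarterly fundamental that has not been refreshed for many days is thus gradually pulled toward its mean, rather than being carried forward unchanged as in naive last-value imputation.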

4.2. Implications for Weak and Semi-Strong Market Efficiency

To evaluate the effectiveness of our GRU-D model under the weak and semi-strong forms of market efficiency, we complemented the AUC metric with Relative Classifier Information (RCI). While AUC indicates that the model has learned a regular, repeatable pattern in market behavior, RCI captures the reduction in uncertainty relative to a naïve classifier based on class frequencies. In the context of the EMH, an efficient market would not allow a model to reduce uncertainty significantly. Therefore, combining AUC and RCI provides a fuller picture of both predictive accuracy and informational value.
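RCI is commonly computed as the mutual information between predicted and actual classes, normalized by the entropy of the actual class distribution, so that 0 indicates no reduction in uncertainty over the naïve frequency-based classifier and 1 indicates perfect classification. The sketch below assumes that formulation, which may differ in detail from our implementation.

```python
import math
from collections import Counter

def rci(predicted, actual):
    """Relative Classifier Information, computed here as the
    mutual information between predicted and actual labels,
    divided by the entropy of the actual label distribution
    (0 = no uncertainty reduction, 1 = perfect classification)."""
    n = len(actual)
    p_actual = Counter(actual)
    p_pred = Counter(predicted)
    p_joint = Counter(zip(predicted, actual))
    mutual_info = sum(
        c / n * math.log2((c / n) / ((p_pred[pr] / n) * (p_actual[ac] / n)))
        for (pr, ac), c in p_joint.items()
    )
    entropy = -sum(c / n * math.log2(c / n) for c in p_actual.values())
    return mutual_info / entropy
```

Under this reading, an RCI of 0.04 means the classifier removes about 4% of the uncertainty inherent in the class frequencies alone.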
As shown in Figure 3, both the RCI and the AUC attain their highest values when the model integrates predictors associated with both the weak and semi-strong forms of efficiency. The AUC is 60% and the RCI is 0.04, both statistically significant at the 99.99% confidence level. The RCI result indicates that the model captures roughly 4% more information than random classification. In financial markets, such small but systematic improvements in predictive accuracy are often regarded as economically significant when consistently exploited (Xu, 2004).
The performance decreases slightly when only weak-form predictors are used, yet remains significant, contradicting the weak-form efficiency hypothesis. The most pronounced drop occurs under the semi-strong form, where both AUC and RCI values are lowest. This gap suggests that technical signals may be more immediately exploitable than fundamental data, possibly because weak-form information is more exhaustive and frequently updated, enabling the model to anticipate near-term returns better.
Together, these insights support the view that markets exhibit inefficiencies that can be captured and exploited using a GRU-D model. Its predictive structure arises from complementary informational channels when price-based and public information are considered simultaneously.

4.3. Additional Analysis

To assess the robustness of the results, we conducted several sensitivity analyses. First, weak and semi-strong form efficiency are examined separately at the country and sector levels to assess the stability of the results across the sub-samples and to verify that the main findings are not attributable to a small subset of countries or sectors. Second, the analysis is replicated over the COVID-19 period to assess the model’s performance under adverse market conditions. Finally, we examined the model’s sensitivity to the length of the historical observation window, which effectively tests the influence of price memory on the predictive accuracy9.

4.3.1. Market Efficiency Across Countries and Sectors

The results, summarized in Table 2, indicate that under the weak form of efficiency, mean RCIs are statistically significant at the 99.99% confidence level, yet remain economically small and stable at 1.9–2.6% across sub-samples. The semi-strong form shows mean values of 1.3–2.3%. The systematic decrease in RCIs as we move from weak- to semi-strong-form tests is consistent with reduced predictability as broader public information is incorporated.
The results also show that the interquartile distribution of RCI values is strictly positive across all sub-samples. Under both weak and semi-strong forms of efficiency, the first quartile, median, and third quartile increase gradually across countries and sectors, indicating that no single extreme observation drives predictive content. Overall, the results suggest that predictive content arises from dispersed and heterogeneous deviations from efficiency rather than being concentrated within a particular sub-sample.

4.3.2. Market Efficiency Under COVID-19

Our study explores the dynamic nature of market efficiency during the first nine months of 2021. The period chosen is significant due to the availability of consistent data. It is also a period marked by the global impact of the COVID-19 pandemic, presenting an opportunity to examine how markets adapt to extreme circumstances.
Figure 4 reveals a consistent pattern of market behavior through the RCI for both weak and semi-strong forms of market efficiency. The early months of 2021 show significant fluctuations. This initial volatility can be attributed to heightened uncertainty stemming from the pandemic, including investor reactions to changing economic indicators, government interventions and evolving public health information. The decrease in the RCI around April highlights how the market integrated new information and adapted to changing conditions.
After April 2021, a trend towards stabilization emerged. This shift aligns with Lo’s (2004) concept of adaptive markets, which posits that financial markets are not static. Rather, they dynamically adjust in response to new information. In the context of the COVID-19 pandemic, such adjustments could reflect shifts in investor behavior as market participants became accustomed to the ongoing volatility, as well as adaptations to changing economic conditions and regulatory measures aimed at restoring confidence and mitigating uncertainty.
Crisis periods often involve sudden regime shifts and heightened volatility, which can cause previously informative signals to lose relevance more quickly and increase the importance of handling irregularly updated inputs. GRU-D is well suited to these conditions because it explicitly accounts for time gaps and missingness, applying temporal decay to reduce reliance on stale observations as the time since their last update increases. Its gating mechanism further enables non-linear adaptation by selectively updating the hidden state, allowing the model to filter out short-lived shocks while preserving persistent signals. These architectural features are core to GRU-D’s design for handling missing and irregular time-series data and offer a model-based explanation for the meaningful information captured by RCI during the COVID-19 stress period (Che et al., 2018).
The higher average RCI values associated with the weak-form efficiency relative to the semi-strong form indicate that predictability is stronger when based on historical information. The semi-strong form shows greater instability, reflecting the challenges markets face in assimilating new public information, especially in periods of crisis when behavioral biases affecting investors intensify.
Overall, our findings indicate that the GRU-D model is more robust under extreme market conditions.

4.3.3. Price Memory Limit Dynamics

To further test the robustness of our model and the persistence of market inefficiencies, we examined the dynamics of price memory, which refers to how far back in time historical prices contribute to predictive accuracy.
Figure 5 illustrates the relationship between different observation windows (200, 400, 600 days) and the two key performance metrics: RCI and AUC.
As noted by Sirignano and Cont (2019), past prices can influence future prices, suggesting a path-dependent pricing process in which longer historical windows may improve predictions. However, the results in Figure 5 do not support this for windows beyond 200 days. Both AUC and RCI metrics attain their maximum at the 200-day window, suggesting that this is the optimal observation period for capturing market inefficiencies. Extending the window to 400 or 600 days does not enhance performance and, in the case of AUC, even reduces it. These findings indicate that, while price memory exists, its relevant horizon is limited, and including older data may introduce more noise than signal, thereby reducing predictive accuracy (Sirignano & Cont, 2019; Lo, 2004; Neely et al., 2014).

5. Conclusions

This study applied the GRU-D model to financial asset return prediction across multiple dimensions (countries, sectors and time horizons), addressing the challenges of missing and irregular observations in multivariate financial time series. Using data on 9686 listed assets from 32 countries and 47 economic sectors over more than a decade, we provided one of the first large-scale examinations of market efficiency using a deep learning architecture. The analysis is an initial step toward integrating GRU-D models into the empirical finance literature to evaluate weak- and semi-strong-form efficiency across heterogeneous contexts.
The predictive performance of the GRU-D, measured by Micro-AUC, is about 75% and is statistically significant at the 99.99% level. This confirms that the model can discriminate return classes beyond chance. Notably, its strongest class-level performance occurs in the lowest-return quartile, with an AUC of 68.4%, indicating that deviations from strict market efficiency are most pronounced during periods of severe downside risk. The GRU-D can be leveraged to design dynamic tail-risk protection strategies, enabling investors to better anticipate and mitigate extreme losses. It is worth noting that our GRU-D model’s accuracy of 68.48% positions it competitively within the range of existing financial forecasting models, closely following some multi-task RNN architectures (C. Li et al., 2019) and surpassing others that incorporate news or sentiment analysis (Ding et al., 2015; Dos Santos Pinheiro & Dras, 2017). This confirms its predictive ability in classification terms, while the low RCI values indicate that the associated informational gains remain modest. Price memory analysis further shows that a 200-day observation window is sufficient for capturing return predictability; extending this period to 400 or 600 days does not significantly enhance predictive performance and may even reduce it.
Overall, this study underscored the value of applying GRU-D models in finance: while they reliably classify future returns into discrete states, the associated informational gains remain limited, potentially due to the following limitations. First, the quartile-based return classification may oversimplify the distribution of returns and obscure finer variations in predictability, potentially leading to an incomplete understanding of market dynamics. Second, computational constraints limited the sample size we could train on, potentially affecting the model’s learning and generalization capacity. Furthermore, the absence of external data, such as market sentiment, macroeconomic indicators, or geopolitical events, may have restricted the model’s ability to capture complex interactions and external factors influencing market behavior. Integrating such data could provide a more comprehensive framework for enhancing predictive accuracy and addressing the inherent limitations of the current approach.
Future research could enhance predictive accuracy and practical applicability in real-world scenarios by employing more granular return categories, leveraging high-frequency data, or benchmarking GRU-D against both traditional econometric approaches (e.g., ARIMA, SVM) and newer hybrid architectures (e.g., GRU-D combined with transformers). Investigating the dynamics of inefficiency across different volatility regimes or in response to other external shocks is also a promising avenue for further exploration. Finally, subsequent studies might translate the GRU-D’s probabilistic forecasts into explicit buy and sell signals, thereby enabling the construction of trading strategies whose profitability and risk-adjusted performance can be analyzed.

Author Contributions

Conceptualization, A.B.J. and M.R.G.; methodology, A.B.J. and M.R.G.; software, A.B.J.; validation, M.R.G. and M.D.; formal analysis, A.B.J., M.R.G. and M.D.; investigation, M.R.G. and M.D.; resources, A.B.J. and M.R.G.; data curation, A.B.J.; writing—original draft preparation, M.R.G. and M.D.; writing—review and editing, M.R.G. and M.D.; visualization, M.R.G. and M.D.; supervision, M.R.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data used in this study is unavailable due to privacy restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. List of Countries and Sectors

N°. | Country/Region | Sectors
1 | France | Credit Services
2 | Spain | Semiconductor Equipment & Materials
3 | Italy | Application Software
4 | Denmark | Medical Appliances & Equipment
5 | Brazil | Business Software & Services
6 | United Kingdom | Independent Oil & Gas
7 | Sweden | Money Center Banks
8 | Ireland | Biotechnology
9 | Taiwan | Wireless Communications
10 | Turkey | Entertainment-Diversified
11 | Russia | Internet Information Providers
12 | Germany | Auto Parts
13 | India | Oil & Gas Pipelines
14 | Norway | Textile-Apparel Footwear & Accessories
15 | Austria | Information Technology Services
16 | Singapore | Gold
17 | Belgium | Steel & Iron
18 | Canada | Restaurants
19 | New Zealand | Specialty Chemicals
20 | Hong Kong | Resorts & Casinos
21 | Argentina | Real Estate Development
22 | Indonesia | Diversified Machinery
23 | Thailand | Food-Major Diversified
24 | Australia | Aerospace/Defense-Major Diversified
25 | USA | Asset Management
26 | Greece | Auto Manufacturers-Major
27 | Finland | Property & Casualty Insurance
28 | Switzerland | Diversified Electronics
29 | Netherlands | Personal Products
30 | Mexico | Packaging & Containers
31 | Portugal | General Contractors
32 | | Electric Utilities
33 | | Diversified Utilities
34 | | Communication Equipment
35 | | Technical & System Software
36 | | Drug Manufacturers-Major
37 | | Industrial Metals & Minerals
38 | | Major Integrated Oil & Gas
39 | | Chemicals-Major Diversified
40 | | Business Services
41 | | Property Management
42 | | Oil & Gas Equipment & Services
43 | | Specialty Retail, Other
44 | | Farm Products
45 | | Conglomerates
46 | | General Building Materials
47 | | Life Insurance

Notes

1
Unit of time-series data, whether collected directly or derived from financial statements, OHLC data, or any other source of financial information. Each data point represents a specific financial metric—such as total revenue, net profit, assets, liabilities, or stock price movements—tracked over a series of time intervals.
2
Full details on countries and sectors are provided in Appendix A.
3
The detailed list of these variables is available upon request.
4
The detailed list is available upon request.
5
PyTorch is a widely used open-source deep learning framework initially developed by Meta.
6
https://github.com/Han-JD/GRU-D (accessed on 1 January 2020).
7
This means that when a financial variable is missing for several periods, its last known value is gradually downweighted based on the time elapsed since its last observation. Rather than manually fixing the decay rates, the model learns them directly from the data, allowing it to adapt to the specific patterns and timing of missing data (Che et al., 2018).
8
A commonly used optimization algorithm in machine learning that adjusts model weights based on gradients, improving training speed and accuracy.
9
Price memory refers to the tendency for past prices to influence current or future prices, often due to the way information is processed and retained by market participants or systems (Chow et al., 1995).

References

1. Alexakis, C., Patra, T., & Poshakwale, S. (2010). Predictability of stock returns using financial statement information: Evidence on semi-strong efficiency of emerging Greek stock market. Applied Financial Economics, 20(16), 1321–1326.
2. Ariel, R. A. (1987). A monthly effect in stock returns. Journal of Financial Economics, 18(1), 161–174.
3. Baker, M., & Wurgler, J. (2006). Investor sentiment and the cross-section of stock returns. The Journal of Finance, 61(4), 1645–1680.
4. Barberis, N., Huang, M., & Santos, T. (2001). Prospect theory and asset prices. The Quarterly Journal of Economics, 116(1), 1–53. Available online: https://www.jstor.org/stable/2696442 (accessed on 15 March 2025).
5. Bhuiyan, M. D. S. M., Rafi, M. D. A. L., Rodrigues, G. N., Mir, M. N. H., Ishraq, A., Mridha, M. F., & Shin, J. (2025). Deep learning for algorithmic trading: A systematic review of predictive models and optimization strategies. Array, 26, 100390.
6. Bodie, Z., Kane, A., & Marcus, A. J. (2009). Investments. McGraw-Hill/Irwin.
7. Budiharto, W. (2021). Data science approach to stock prices forecasting in Indonesia during COVID-19 using long short-term memory (LSTM). Journal of Big Data, 8(1), 47.
8. Che, Z., Purushotham, S., Cho, K., Sontag, D., & Liu, Y. (2018). Recurrent neural networks for multivariate time series with missing values. Scientific Reports, 8, 6085.
9. Choi, S.-Y. (2021). Analysis of stock market efficiency during crisis periods in the US stock market: Differences between the global financial crisis and COVID-19 pandemic. Physica A: Statistical Mechanics and Its Applications, 574, 125988.
10. Chow, K. V., Denning, K. C., Ferris, S. P., & Noronha, G. (1995). Long-term and short-term price memory in the stock market. Economics Letters, 49(3), 287–293.
11. Daniel, K., Hirshleifer, D., & Subrahmanyam, A. (1998). Investor psychology and security market under- and overreactions. The Journal of Finance, 53(6), 1839–1885.
12. Diamond, N., & Perkins, G. (2022). Using intermarket data to evaluate the efficient market hypotheses with machine learning. arXiv.
13. Dimitriadou, A., Gogas, P., Papadimitriou, T., & Plakandaras, V. (2018). Oil market efficiency under a machine learning perspective. Forecasting, 1(1), 157–168.
14. Ding, X., Zhang, Y., Liu, T., & Duan, J. (2015, July 25–31). Deep learning for event-driven stock prediction [Conference paper]. Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015) (pp. 2327–2333), Buenos Aires, Argentina. Available online: https://www.ijcai.org/Proceedings/15/Papers/329.pdf (accessed on 15 March 2025).
15. Doe, J. (2024). Comparative analysis of machine learning and traditional models in economic forecasting. Journal of Computer Technology and Software, 3(3), 1–6.
16. Dos Santos Pinheiro, L., & Dras, M. (2017, December 6–8). Stock market prediction with deep learning: A character-based neural language model for event-based trading. Australasian Language Technology Association Workshop 2017 (pp. 6–15), Brisbane, Australia.
17. Dželihodžić, A., Žunić, A., & Žunić Dželihodžić, E. (2024). Predictive modeling of stock prices using machine learning: A comparative analysis of LSTM, GRU, CNN, and RNN models. In N. Ademović, Z. Akšamija, & A. Karabegović (Eds.), Advanced technologies, systems, and applications IX (IAT 2024) (Vol. 1143, pp. 447–467). Lecture Notes in Networks and Systems. Springer.
18. Fama, E. F. (1970). Efficient capital markets: A review of theory and empirical work. The Journal of Finance, 25(2), 383–417.
19. Fama, E. F. (1991). Efficient capital markets II. The Journal of Finance, 46(5), 1575–1617.
20. Fama, E. F., & French, K. R. (1993). Common risk factors in the returns on stocks and bonds. Journal of Financial Economics, 33(1), 3–56.
21. Fama, E. F., & French, K. R. (2015). A five-factor asset pricing model. Journal of Financial Economics, 116(1), 1–22.
22. Fawcett, T. (2006). An introduction to ROC analysis. Pattern Recognition Letters, 27(8), 861–874.
23. Fischer, T., & Krauss, C. (2018). Deep learning with long short-term memory networks for financial market predictions. European Journal of Operational Research, 270(2), 654–669.
24. Hossain, S., & Kaur, G. (2024, May 3–4). Stock market prediction: XGBoost and LSTM comparative analysis [Conference paper]. 2024 3rd International Conference on Artificial Intelligence for Internet of Things (AIIoT) (pp. 1–6), Vellore, India.
25. Hu, Y., Liu, Y., & Pan, J. (2021). A survey on machine learning in stock price prediction. Finance Research Letters, 38, 101476.
26. Hu, Z., Zhao, Y., & Khushi, M. (2021). A survey of forex and stock price prediction using deep learning. Applied System Innovation, 4(1), 9.
27. Jiang, J., Richards, J., Huser, R., & Bolin, D. (2025). The efficient tail hypothesis: An extreme value perspective on market efficiency. Journal of Business & Economic Statistics, 1–14.
28. Kadam, J., Kasbe, J., Nalawade, N., Readdy, A., & Sonkusare, T. (2024). Stock market prediction using machine learning. International Journal of Advanced Research in Computer and Communication Engineering, 13(4), 349–352.
29. Kumbure, M. M., Lohrmann, C., Luukka, P., & Porras, J. (2022). Machine learning techniques and data for stock market forecasting: A literature review. Expert Systems with Applications, 197, 116659.
30. Lee, S. I., & Yoo, S. J. (2020). Threshold-based portfolio: The role of the threshold and its applications. The Journal of Supercomputing, 76, 8040–8057.
31. Li, C., Song, D., & Tao, D. (2019). Multi-task recurrent neural networks and higher-order Markov random fields for stock price movement prediction. In Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery & data mining (KDD ‘19), Anchorage, AK, USA, 4–8 August (pp. 1141–1151). Association for Computing Machinery.
32. Li, Q., Wen, Z., Wu, Z., Hu, S., Wang, N., Li, Y., Liu, X., & He, B. (2023). A survey on federated learning systems: Vision, hype and reality for data privacy and protection. IEEE Transactions on Knowledge and Data Engineering, 35(4), 3347–3366.
33. Li, X., Wu, P., & Wang, W. (2020). Incorporating stock prices and news sentiments for stock market prediction: A case of Hong Kong. Information Processing & Management, 57(5), 102212.
34. Li, Z., & Wu, Y. (2023). Stock pricing with textual investor sentiment: Evidence from Chinese stock markets. Review of Economics and Finance, 21, 1801–1815.
35. Lo, A. W. (2004). The adaptive markets hypothesis. The Journal of Portfolio Management, 30(5), 15–29.
36. Luo, Z., Wang, S. P., Ho, E. H., Yao, L., & Gershon, R. C. (2025). Predicting and evaluating cognitive status in aging populations using decision tree models. American Journal of Alzheimer’s Disease & Other Dementias, 40, 15333175251339730.
37. Mhlanga, D. (2021). Financial inclusion in emerging economies: The application of machine learning and artificial intelligence in credit risk assessment. International Journal of Financial Studies, 9(3), 39.
38. Neely, C. J., Rapach, D. E., Tu, J., & Zhou, G. (2014). Forecasting the equity risk premium: The role of technical indicators. Management Science, 60(7), 1772–1791.
39. Nevasalmi, L. (2020). Forecasting multinomial stock returns using machine learning methods. Journal of Financial Data Science, 6, 86–106.
40. Noh, Y., & Kim, S. (2025). Zero-shot learning for S&P 500 forecasting via constituent-level dynamics: Latent structure modeling without index supervision. Mathematics, 13, 2762.
41. Nti, I. K., Adekoya, A. F., & Weyori, B. A. (2020). Predicting stock market price movement using sentiment analysis: Evidence from Ghana. Applied Computer Systems, 25(1), 33–42.
42. Pagliaro, A. (2025). Artificial intelligence vs. efficient markets: A critical reassessment of predictive models in the big data era. Electronics, 14(9), 1721.
43. Patel, M. M., Tanwar, S., Gupta, R., & Kumar, N. (2020). A deep learning-based cryptocurrency price prediction scheme for financial institutions. Journal of Information Security and Applications, 55, 102583.
44. Penman, S. H. (1987). The distribution of earnings news over time and seasonalities in aggregate stock returns. Journal of Financial Economics, 18(2), 199–228.
45. Prachyachuwong, K., & Vateekul, P. (2021). Stock trend prediction using deep learning approach on technical indicator and industrial specific information. Information, 12(6), 250.
46. Qian, H., Zhou, H., Zhao, Q., Chen, H., Yao, H., Wang, J., Liu, Z., Yu, F., Zhang, Z., & Zhou, J. (2024). MDGNN: Multi-relational dynamic graph neural network for comprehensive and dynamic stock investment prediction. Proceedings of the AAAI Conference on Artificial Intelligence, 38, 14642–14650.
47. Razali, M. N., Arbaiy, N., Lin, P.-C., & Ismail, S. (2025). Optimizing multiclass classification using convolutional neural networks with class weights and early stopping for imbalanced datasets. Electronics, 14(4), 705.
48. Russell, S., & Norvig, P. (2010). Artificial intelligence: A modern approach (3rd ed.). Pearson.
49. Saito, T., & Rehmsmeier, M. (2015). The precision–recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE, 10(3), e0118432.
50. Samarawickrama, A. J. P., & Fernando, T. G. I. (2017, December 15–16). A recurrent neural network approach in predicting daily stock prices: An application to the Sri Lankan stock market [Conference paper]. 2017 IEEE International Conference on Industrial and Information Systems (ICIIS) (pp. 1–6), Peradeniya, Sri Lanka.
51. Shahi, T. B., Shrestha, A., Neupane, A., & Guo, W. (2020). Stock price forecasting with deep learning: A comparative study. Mathematics, 8(9), 1441.
52. Sirignano, J., & Cont, R. (2019). Universal features of price formation in financial markets: Perspectives from deep learning. Quantitative Finance, 19(9), 1449–1459.
53. Sonkavde, G., Dharrao, D. S., Bongale, A. M., Deokate, S. T., Doreswamy, D., & Bhat, S. K. (2023). Forecasting stock market prices using machine learning and deep learning models: A systematic review, performance analysis and discussion of implications. International Journal of Financial Studies, 11(3), 94.
54. Spilak, B., & Härdle, W. K. (2022). Tail-risk protection: Machine learning meets modern econometrics. In C. F. Lee, & A. C. Lee (Eds.), Encyclopedia of Finance. Springer.
55. Širůček, M., & Šíma, K. (2016). Optimized indicators of technical analysis on the New York Stock Exchange. Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis, 64(6), 2123–2131.
56. Terven, J., Cordova-Esparza, D. M., Romero-González, J. A., Ramírez-Pedraza, A., & Chávez-Urbiola, E. A. (2025). A comprehensive survey of loss functions and metrics in deep learning. Artificial Intelligence Review, 58, 195.
57. Wu, S., Irsoy, O., Lu, S., Dabravolski, V., Dredze, M., Gehrmann, S., Kambadur, P., Rosenberg, D., & Mann, G. (2023). BloombergGPT: A large language model for finance. arXiv, arXiv:2303.17564.
  58. Xu, Y. (2004). Small levels of predictability and large economic gains. Journal of Empirical Finance, 11(2), 247–275. [Google Scholar] [CrossRef]
  59. Yadav, A., Jha, C. K., & Sharan, A. (2020). Optimizing LSTM for time series prediction in Indian stock market. Procedia Computer Science, 167, 2091–2100. [Google Scholar] [CrossRef]
  60. Yan, K., Yue, Z., Wu, C. C., He, Q., Zhou, J., Hao, Z., & Li, Y. (2026). Flexible target prediction for quantitative trading in the American stock market: A hybrid framework integrating ensemble models, fusion models and transfer learning. Entropy, 28(1), 84. [Google Scholar] [CrossRef] [PubMed]
  61. Yao, D., & Yan, K. (2024). Time series forecasting of stock market indices based on DLWR-LSTM model. Finance Research Letters, 68, 105821. [Google Scholar] [CrossRef]
Figure 1. Optimal model hyperparameters.
Figure 2. Model performance.
Figure 3. Results for weak and semi-strong market efficiency.
Figure 4. Time-varying market efficiency.
Figure 5. Price memory limit over time.
Table 1. GRU-D model performance.

Metric               Average   Lower Limit at 99.99%   Upper Limit at 99.99%
Micro-Average AUC    74.62%    74.54%                  74.71%
Class 1 AUC          68.43%    68.15%                  68.70%
Class 2 AUC          54.80%    54.62%                  54.98%
Class 3 AUC          58.43%    58.04%                  58.83%
Class 4 AUC          55.01%    54.82%                  55.19%
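For readers unfamiliar with the headline metric in Table 1, the micro-average AUC for a multiclass problem pools the one-vs-rest indicator labels of all classes into a single binary problem and computes one ROC AUC over the pooled set. The sketch below illustrates the computation with plain NumPy via the Mann-Whitney rank statistic; the function names (`binary_auc`, `micro_average_auc`) are illustrative and not taken from the paper, which does not disclose its implementation.

```python
import numpy as np

def _ranks(x):
    """1-based ranks; tied values share the average of their ranks."""
    order = np.argsort(x, kind="mergesort")
    sx = x[order]
    ranks = np.empty(len(x))
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sx[j + 1] == sx[i]:
            j += 1
        ranks[order[i:j + 1]] = 0.5 * (i + j) + 1.0  # average rank of the tie block
        i = j + 1
    return ranks

def binary_auc(y, s):
    """ROC AUC via the Mann-Whitney U statistic: P(score_pos > score_neg)."""
    y = np.asarray(y, dtype=float)
    r = _ranks(np.asarray(s, dtype=float))
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (r[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def micro_average_auc(y_true, scores):
    """Micro-average AUC: flatten one-vs-rest indicators across all classes
    and score them as a single binary problem."""
    scores = np.asarray(scores, dtype=float)
    n, k = scores.shape
    onehot = np.zeros((n, k))
    onehot[np.arange(n), y_true] = 1.0
    return binary_auc(onehot.ravel(), scores.ravel())
```

Because the quartile classes are balanced by construction, micro-averaging here weights each class roughly equally; with imbalanced classes it would instead be dominated by the frequent classes.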
Table 2. RCI across sub-samples.

Form of efficiency   Sub-sample              Mean   Standard Deviation   Min     25%     50%     75%     Max
Weak                 Countries sub-samples   2.2%   1.9%                 0.01%   0.08%   0.26%   0.26%   0.86%
Weak                 Sectors sub-samples     2.6%   0.26%                0.01%   0.09%   0.17%   0.34%   1.39%
Semi-strong          Countries sub-samples   1.6%   0.15%                0.01%   0.09%   0.13%   0.18%   0.69%
Semi-strong          Sectors sub-samples     2.3%   0.23%                0.01%   0.07%   0.16%   0.28%   1.09%
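Relative Classifier Information (RCI), the statistic summarized in Table 2, measures how much a classifier's confusion matrix reduces uncertainty about the true class. A common entropy-based formulation, shown in the sketch below, is the mutual information between true and predicted labels normalized by the entropy of the true-label distribution; this is an assumption for illustration and may differ in detail from the exact definition the authors use.

```python
import numpy as np

def rci(conf):
    """Relative Classifier Information from a confusion matrix.

    One common formulation (assumed here, not taken from the paper):
    RCI = I(true; predicted) / H(true), i.e. the fraction of true-label
    entropy explained by the predictions. 0 = uninformative, 1 = perfect.
    """
    p = np.asarray(conf, dtype=float)
    p = p / p.sum()                       # joint distribution P(true, predicted)
    pt = p.sum(axis=1)                    # marginal P(true)
    pp = p.sum(axis=0)                    # marginal P(predicted)
    with np.errstate(divide="ignore", invalid="ignore"):
        # 0 * log(0) terms become NaN and are skipped by nansum
        mi = np.nansum(p * np.log2(p / np.outer(pt, pp)))
        h = -np.nansum(pt * np.log2(pt))  # entropy of the true labels
    return mi / h
```

Under this formulation, the small positive RCI values in Table 2 (a few percent at most) are consistent with the paper's thesis: the classifier extracts a modest but non-zero amount of information about future return quartiles, i.e. a statistically detectable but economically limited departure from strict efficiency.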
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
