This paper explores the benefits of developing fractional-order extensions of standard optimization techniques to more effectively capture long-term dependencies during training. The proposed fractional optimizers are integrated into LSTM networks and evaluated using stock market datasets within the context of financial time-series forecasting. To assess the performance of conventional versus fractional optimizers, we examine traditional SGD-based algorithms alongside their fractional counterparts. Through a series of experiments, we highlight the primary advantages of fractional optimization in enhancing the efficiency and stability of time-series predictions. This benchmarking study focuses in particular on stock price forecasting—where long-term dependencies play a critical role—and investigates the extent to which fractional-order LSTM models outperform their classical equivalents.
The implementation was performed using TensorFlow (via Keras) in Google Colab, which provides access to Tesla T4 GPUs. The key configuration details are as follows:
The environment also included custom implementations of fractional optimizers (e.g., Frac-Adagrad) based on fractional derivative approximations integrated into the optimizer update rules.
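To make the integration concrete, the sketch below shows one way such a fractional update rule could be assembled in plain Python: a truncated Grünwald–Letnikov-weighted gradient memory fed into an Adagrad-style accumulator. The class name FracAdagrad, the truncation length memory, and the exact placement of the fractional gradient in the update are illustrative assumptions rather than the implementation actually used in the experiments.

```python
import numpy as np

def gl_weights(alpha, memory):
    """Grunwald-Letnikov weights w_k = (-1)^k * C(alpha, k), via the standard recursion."""
    w = np.empty(memory + 1)
    w[0] = 1.0
    for k in range(1, memory + 1):
        w[k] = w[k - 1] * (1.0 - (alpha + 1.0) / k)
    return w

class FracAdagrad:
    """Illustrative Adagrad-style update driven by a fractional (memory-weighted) gradient."""

    def __init__(self, lr=0.01, alpha=0.3, memory=20, eps=1e-8):
        self.lr, self.alpha, self.memory, self.eps = lr, alpha, memory, eps
        self.w = gl_weights(alpha, memory)  # precomputed GL weights
        self.grad_hist = []                 # short-memory buffer of recent gradients
        self.accum = None                   # accumulator of squared (fractional) gradients

    def step(self, params, grad):
        # Keep only the most recent `memory + 1` gradients (short-memory principle).
        self.grad_hist.insert(0, grad)
        self.grad_hist = self.grad_hist[: self.memory + 1]
        # Fractional gradient: GL-weighted combination of the current and past gradients.
        frac_grad = sum(wk * g for wk, g in zip(self.w, self.grad_hist))
        if self.accum is None:
            self.accum = np.zeros_like(params, dtype=float)
        self.accum += frac_grad ** 2
        return params - self.lr * frac_grad / (np.sqrt(self.accum) + self.eps)
```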
4.2. Algorithms
The fundamental model is a conventional LSTM neural network, designed to handle time-dependent sequential data. The architecture includes a 50-unit LSTM layer followed by a dense output layer for stock price forecasting. The model uses a 60-day sequence of closing prices to predict the next day's closing price [44].
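A minimal Keras sketch of this baseline is shown below; the 50-unit LSTM, single dense output, and 60-day input window follow the description above, while the univariate input shape, the MSE loss, and the default Adam compile choice are assumptions for illustration (the optimizer is swapped per experiment).

```python
import tensorflow as tf

WINDOW = 60  # 60-day sequence of closing prices, as described above

def build_baseline_lstm():
    """Baseline model: one 50-unit LSTM layer followed by a dense layer that
    predicts the next day's closing price."""
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(WINDOW, 1)),  # univariate closing-price window (assumed)
        tf.keras.layers.LSTM(50),
        tf.keras.layers.Dense(1),                  # next-day closing price
    ])
    model.compile(optimizer="adam", loss="mse")    # the optimizer is varied across experiments
    return model
```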
The fractional LSTM model extends the conventional LSTM by incorporating fractional memory effects. In this model, the LSTM output is combined with a weighted mean of the preceding input sequence, controlled by a fractional parameter α. This modification aims to more accurately capture long-term dependencies, potentially improving forecasting performance. The value of α is varied from 0.1 to 0.9 to assess the effect of different memory levels on model accuracy. Fractional LSTMs have also been explored in other time-series prediction domains [45].
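The exact combination rule is not reproduced here; as a rough illustration, the sketch below assumes a convex blend of the LSTM prediction with a power-law-weighted mean of the input window, controlled by the fractional parameter α (here `alpha`). The class name and the specific weighting scheme are assumptions.

```python
import tensorflow as tf

class FracMemoryLSTM(tf.keras.Model):
    """Illustrative fractional LSTM: blends the LSTM prediction with a
    power-law-weighted mean of the input window, controlled by alpha (assumed form)."""

    def __init__(self, units=50, alpha=0.5, window=60):
        super().__init__()
        self.lstm = tf.keras.layers.LSTM(units)
        self.out = tf.keras.layers.Dense(1)
        self.alpha = alpha
        # Power-law decay over the window: older time steps receive smaller weights.
        k = tf.range(window, 0, -1, dtype=tf.float32)          # window, ..., 1 (newest last)
        w = tf.pow(k, -alpha)
        self.mem_weights = tf.reshape(w / tf.reduce_sum(w), (1, window, 1))

    def call(self, x):  # x has shape (batch, window, 1)
        frac_mean = tf.reduce_sum(x * self.mem_weights, axis=1)    # weighted mean of the inputs
        pred = self.out(self.lstm(x))                              # standard LSTM prediction
        return (1.0 - self.alpha) * pred + self.alpha * frac_mean  # convex blend by alpha
```

Under this assumed form, α close to 0 recovers an essentially standard LSTM prediction, while larger α gives more weight to the explicit memory of the input window.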
Table 2 lists the key hyperparameters used by the optimizers in this experiment. These hyperparameters are critical for model convergence and performance. The learning rate is the most important parameter, while the various optimizers include additional tuning factors such as momentum or decay.
Adam is an adaptive optimizer that combines the advantages of two extensions of gradient descent, AdaGrad and RMSProp, by utilizing moment estimates to adjust learning rates for individual parameters. Adam is commonly used for time-series tasks because it is both efficient and robust [27].
RMSprop divides the learning rate by the square root of a moving average of recent squared gradients. This technique is particularly useful for non-stationary problems like stock price forecasting, where the data distribution may change over time [46].
SGD (Stochastic Gradient Descent) is a relatively simple optimizer that updates parameters using the gradient computed on a single mini-batch at each step. Although SGD tends to be less effective for complex tasks such as time-series prediction, it remains popular due to its simplicity and effectiveness when properly tuned.
Adagrad adapts the learning rate according to the frequency of parameter updates. It is well suited for sparse data scenarios where some features require more frequent updates than others.
Adadelta improves on Adagrad by mitigating its rapid learning rate decay using a running average of squared gradients to dynamically adjust the learning rate.
Nadam combines Adam with Nesterov’s accelerated gradient (NAG), providing the benefits of momentum alongside adaptive learning rates. This combination often results in faster convergence in practical applications.
FTRL (Follow-The-Regularized-Leader) is an optimizer designed for large-scale machine learning, commonly applied to sparse datasets such as those encountered in recommender systems or logistic regression tasks.
Adamax is a variant of Adam that uses the infinity norm, offering greater robustness in certain situations.
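All of these optimizers are available in tf.keras; the snippet below sketches how they might be instantiated. The hyperparameter values shown here are placeholders, not the ones used in the experiments (those are listed in Table 2).

```python
from tensorflow.keras import optimizers

# Representative instantiations; the learning rates and extra factors below are
# placeholders -- the values actually used are those reported in Table 2.
optimizer_zoo = {
    "Adam":     optimizers.Adam(learning_rate=1e-3, beta_1=0.9, beta_2=0.999),
    "RMSprop":  optimizers.RMSprop(learning_rate=1e-3, rho=0.9),
    "SGD":      optimizers.SGD(learning_rate=1e-2, momentum=0.9),
    "Adagrad":  optimizers.Adagrad(learning_rate=1e-2),
    "Adadelta": optimizers.Adadelta(learning_rate=1.0, rho=0.95),
    "Nadam":    optimizers.Nadam(learning_rate=1e-3),
    "FTRL":     optimizers.Ftrl(learning_rate=1e-3),
    "Adamax":   optimizers.Adamax(learning_rate=2e-3),
}
```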
In addition, we conducted a multi-dimensional sensitivity analysis varying key LSTM architectural parameters, namely hidden units ([20, 50, 100]) and dropout rates ([0.0, 0.2, 0.5]), across different memory window sizes ([30, 90]). The number of layers was fixed at 2 to limit complexity, but model performance was systematically evaluated under all other design variations; see Table 3. The results demonstrate that architectural choices significantly affect the Sharpe ratio and directional accuracy, confirming that both model design and optimizer behavior influence predictive performance.
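A sketch of the grid traversal behind Table 3 is given below; the function names and the `evaluate` callback (expected to train a model for one configuration and return the reported metrics) are assumptions for illustration.

```python
from itertools import product

HIDDEN_UNITS = [20, 50, 100]
DROPOUT_RATES = [0.0, 0.2, 0.5]
WINDOW_SIZES = [30, 90]
NUM_LAYERS = 2  # fixed, as described above

def run_sensitivity_grid(evaluate):
    """Run every architectural combination; `evaluate` is a placeholder callback that is
    expected to train a model and return metrics such as Sharpe ratio and accuracy."""
    results = []
    for units, dropout, window in product(HIDDEN_UNITS, DROPOUT_RATES, WINDOW_SIZES):
        metrics = evaluate(units=units, dropout=dropout, window=window, layers=NUM_LAYERS)
        results.append({"units": units, "dropout": dropout, "window": window, **metrics})
    return results
```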
The experimental results highlight that the best configuration in terms of Sharpe ratio combines a window size of 90 with 20 hidden units (the corresponding α, dropout, and Sharpe values are reported in Table 3). The highest cumulative return, however, is achieved with a window size of 90 and 50 hidden units, together with a solid Sharpe ratio and accuracy. Although the maximum accuracy is also observed for a window size of 90 with 20 hidden units, that setting yields poor financial performance, with only a modest cumulative return and Sharpe ratio.
Overall, the results suggest that a window size of 90 consistently leads to better performance across all metrics than a window size of 30. Additionally, a higher value of α outperforms lower ones in most top-performing configurations, and well-chosen dropout values appear to enhance generalization and improve the Sharpe ratio and cumulative returns, particularly when combined with the larger window size. Lastly, the number of hidden units influences performance, with 50 or 100 units generally providing stronger returns, although the best Sharpe ratio is observed with only 20 units. These observations indicate that the configuration with a window size of 90 and 50 hidden units (with the α and dropout values reported in Table 3) represents the most balanced setup for both profitability and risk-adjusted return.
This experimental framework provides a systematic approach to evaluating LSTM and fractional LSTM models for stock price forecasting. By exploring different optimizers and hyperparameter configurations, we aim to identify the optimal setup for accurate and robust predictions. The experimental results are expected to deepen our understanding of the effects of fractional memory and optimizer choices on forecasting accuracy.
4.3. Results and Analysis
This section offers a comprehensive analysis of the results of the standard optimization algorithms—SGD, Adam, RMSprop and Adagrad—and their fractional extensions, namely Frac-Adam, Frac-RMSprop and Frac-Adagrad, emphasizing their contrasting convergence behavior.
4.3.1. Analysis of Standard vs. Fractional Adam Optimization Results
Figure 4 shows the forecasts provided by the classic and fractional Adam methods.
The comparisons of conventional Adam and Fractional Adam (Frac-Adam) optimization at various fractional orders for forecasting stock prices are presented and yield interesting insights regarding the effectiveness of fractional derivatives in the optimization procedure. A comprehensive discussion of the positive and negative aspects of the results is presented in this section.
The overall tendency noted in the results indicates that the behavior of the fractional Adam optimizer differs remarkably according to the choice of α and the stock market ticker, highlighting that the fractional derivatives used may not be universally suitable for all stock price forecasting tasks. Across the analyzed values, the standard Adam optimizer typically outperformed or closely matched fractional Adam at smaller values of α, with some variation between values. For GOOGL (Google; see Figure 5), the conventional Adam optimizer scored a reasonable MSE of 62.64, and fractional Adam at a small fractional order improved this figure to 32.40, whereas another setting yielded a more marginal improvement with an MSE of 52.93. However, as α increases, the MSE deteriorates, although it improves marginally again for one particular value of α.
For AMZN (Amazon; see Figure 6), classical Adam reached an MSE of 74.65, but fractional Adam at a small fractional order increased the MSE to 93.66, and additional increments in α resulted in significantly greater MSE values, peaking at 349.39 at a higher order.
META (see Figure 7) demonstrated considerable performance deterioration with fractional Adam: the conventional Adam optimizer yielded an MSE of 148.65, while fractional optimization generated significant MSE increases, especially for larger values of α (e.g., an MSE of 11,691.78 at a high fractional order).
Finally, for NVDA (NVIDIA; see Figure 8), the traditional Adam optimizer achieved the lowest MSE of 3.15, denoting excellent performance; the fractional Adam with smaller values of α led to only marginal increases in MSE, whereas larger values of α disrupted the optimization process and produced a more pronounced increase in MSE (e.g., up to 55.91).
Smaller values of α (0.1 and 0.3) typically result in minor enhancements or slight deteriorations, while higher values more frequently worsen performance, notably for stocks like META, AMZN, and NVDA, suggesting that fractional derivatives can disrupt the optimization workflow. The efficiency of fractional optimization differs depending on the stock: it performs remarkably well for AAPL and GOOGL at specific α values but significantly deteriorates performance for certain other stocks, such as META and UNH. In most cases, conventional Adam performs better than or similarly to fractional Adam, most notably for stocks like META, NVDA, and JPM, where larger values of α result in noticeable increases in MSE.
The results in Table 4 demonstrate the significant impact of the fractional order α on the forecasting and trading performance of the Frac-Adam-LSTM model for AAPL stock. Compared to the classical Adam-LSTM baseline, which achieves a Sharpe ratio of 0.78 and a cumulative return of 89.04, fractional models with intermediate α values generally yield enhanced risk-adjusted returns and directional accuracy. In particular, one intermediate order results in the highest Sharpe ratio (1.305), the greatest cumulative return (148.66), and the best directional accuracy (51.73%), indicating improved profitability and predictive performance. However, very high fractional orders lead to a sharp decline in performance, underscoring the sensitivity of the model to excessive long-memory effects in the volatile AAPL market. Overall, the inclusion of fractional derivatives provides a clear advantage, suggesting that well-calibrated fractional orders can more effectively capture complex market dynamics and support more informed trading decisions.
Figure 9 presents the sensitivity analysis of the Frac-Adam-LSTM model on AAPL. The results show that the model's performance is affected by both the learning rate and the fractional order α. For one setting of α, the Sharpe ratio peaks at a specific learning rate, accompanied by a relatively high cumulative return of 66.59, while accuracy remains stable. In contrast, for two other α values, cumulative returns are more consistent across the higher learning rates, with one of them achieving the highest cumulative return of 71.12. Notably, a higher fractional order yields robust cumulative returns across all learning rates, indicating the fractional optimizer's resilience; however, its peak Sharpe ratio of 0.2137 remains slightly lower than those obtained at mid-range α values. Interestingly, lower fractional orders display instability at small learning rates, evidenced by negative Sharpe ratios and returns, suggesting that combinations of a low α and a low learning rate may lead to underperformance. Overall, higher fractional orders demonstrate greater tolerance and stability across a wider range of learning rates, reinforcing the idea that the fractional component serves as an implicit regularizer and enhances optimizer robustness.
Figure 10 gives the sensitivity analysis of Frac-Adam-LSTM with respect to the memory window size on GOOGL. The analysis shows that the best Sharpe ratio and cumulative return were achieved with a memory window of 60 days, yielding a Sharpe ratio of 0.1134 and a cumulative return of 32.29. Overall, models with a memory window of 60 consistently outperform the others across all α values, indicating that a medium-term memory window provides a better trade-off between risk and return. Higher values of α (0.7 and 0.9) slightly improve accuracy, with the highest directional accuracy of 0.5103 observed for one of these settings. This suggests that both the fractional order and the memory window size significantly influence the performance of the Frac-Adam-LSTM model.
4.3.2. Analysis of Standard vs. Fractional RMSprop Optimization Results
Figure 11 gives the MSE associated with RMSprop and Frac-RMSprop used to train the LSTM on AAPL, MSFT, GOOGL, AMZN, META, NVDA, JPM, V, and UNH stocks.
Benchmarking the standard RMSprop optimizer and its fractional-order variants on various stocks (AAPL, MSFT, GOOGL, AMZN, META, NVDA, JPM, V, and UNH) reveals considerable differences in performance depending on the chosen fractional order α. In numerous cases, particularly for AAPL, GOOGL, JPM (see Figure 12), and UNH, fractional RMSprop with smaller α values outperforms the standard optimizer in terms of loss minimization. For example, AAPL's loss decreases from 52.54 (standard) to 44.48 at a small fractional order, while GOOGL shows a similar trend with a decline from 62.64 to 32.40. These improvements suggest that lower fractional orders may provide beneficial memory effects and smoother convergence on specific datasets, particularly those with low volatility or complex data patterns.
Nonetheless, when α exceeds 0.5, the loss tends to worsen significantly for many stocks, reflecting volatile or even divergent training dynamics. This trend is particularly evident for META and UNH, whose losses spike far above their standard-RMSprop values at the higher fractional orders. Such behavior suggests that high-order memory effects may excessively weight past gradients, causing delayed responses to new data or even leading to optimization divergence. Stocks with more complex or volatile time series, like META, seem especially vulnerable to this phenomenon, underscoring the importance of carefully tuning α to the characteristics of the data.
Conversely, for stocks such as MSFT and AMZN (see Figure 13), the classical RMSprop often outperforms the fractional versions at lower α, with the best fractional results occurring at intermediate values such as 0.3 or 0.5.
Table 5 presents a performance comparison between the standard LSTM and FracLSTM models for AMZN stock price forecasting, highlighting the nuanced benefits of fractional memory integration. While the standard LSTM yields an RMSE of 7.8237, the FracLSTM at the limiting order corresponding to classical LSTM behavior achieves the lowest RMSE of 5.4988, indicating enhanced short-term prediction accuracy. As the fractional order α increases, the RMSE generally deteriorates, peaking at one of the larger orders, which may suggest diminishing returns of long-memory effects in the highly stochastic context of AMZN price movements. Despite fluctuations in RMSE, the Sharpe ratio remains constant at 0.1949 across all models, indicating comparable risk-adjusted performance. Notably, the highest directional accuracy (51.30%) is observed at a small fractional order, suggesting that a modest degree of fractional memory can improve the model's ability to capture directional trends, an asset in trading strategies. These findings support the notion that careful tuning of the fractional order in FracLSTM can lead to improved predictive performance, especially for volatile assets such as AMZN.
Figure 13 gives the sensitivity analysis of Frac-RMSprop-LSTM on AAPL with respect to the learning rate. The analysis reveals that moderate learning rates combined with higher α values tend to yield the best trade-offs between Sharpe ratio and cumulative return. Notably, the best such configuration achieves a high Sharpe ratio of 0.206 and the highest cumulative return of 73.19. In contrast, extremely low or high learning rates result in suboptimal performance regardless of α, suggesting a sweet spot in learning dynamics governed by fractional control. Accuracy remains relatively stable across configurations, centered around 51%, indicating modest directional predictability.
Figure 14 gives the sensitivity analysis of Frac-RMSprop-LSTM with respect to the memory window size on GOOGL. The analysis reveals that performance varies with both the memory window size and the fractional scaling parameter α. The highest Sharpe ratio of 0.1308 is achieved with a memory window of 90, while the best cumulative return of 31.43 corresponds to a window size of 60. In general, intermediate memory windows (60 or 90) yield better Sharpe ratios and returns across all α values. Higher α values (e.g., 0.9) slightly improve accuracy but do not consistently enhance financial performance. This indicates a trade-off between memory depth and learning-rate scaling, with moderate values offering the best predictive and financial outcomes.
4.3.3. Analysis of Standard vs. Fractional Adagrad Optimization Results
Figure 15 illustrates the mean squared error (MSE) associated with the standard Adagrad and Frac-Adagrad optimizers applied to LSTM models for forecasting the stock prices of AAPL, MSFT, GOOGL, AMZN, META, NVDA, JPM, V, and UNH. Overall, the Frac-Adagrad-LSTM with a small fractional order α outperforms the standard Adagrad-LSTM in terms of prediction accuracy, exhibiting notably lower errors for stocks such as AAPL, GOOGL, JPM, and UNH. However, as the fractional order α increases, performance deteriorates markedly. In particular, high values of α result in significantly elevated MSEs for several stocks, including META and UNH, indicating instability and poor generalization. These results highlight the data-dependent nature of fractional-order optimization: while a small fractional component can enhance learning by introducing mild memory effects, larger values may hinder convergence and negatively impact predictive accuracy.
The benchmarking investigation involving the traditional Adagrad optimizer and its fractional counterpart (Frac-Adagrad) at various values of the fractional order α demonstrates a complex, asset-dependent response in terms of loss performance. In particular, for certain stocks such as AAPL (see Figure 16), GOOGL (see Figure 17), JPM, and UNH (see Figure 18), the fractional versions with smaller α values consistently outperform conventional Adagrad, indicating that the memory-retaining features of Frac-Adagrad are advantageous in such cases, possibly due to smoother loss surfaces or slower dynamic behaviors.
In contrast, intermediate values of α tend to yield substantially higher loss values, suggesting unstable or even divergent optimization behavior in many cases, especially for UNH and META, where losses spike to abnormally high levels.
Conversely, assets like MSFT, META, and AMZN display a reversal of this tendency, with classical Adagrad outperforming or being comparable to Frac-Adagrad at smaller α. As α grows, the loss values rise faster, especially for META and UNH, indicating a significant deterioration in optimization efficiency. This observation suggests a potential downside of strong memory effects on volatile or highly non-linear loss surfaces, which can magnify gradients and result in overshooting or mediocre convergence. NVDA displays outlier behavior, with extremely low losses across all methods, although the moderate rises at larger α still suggest that fractional dynamics require careful tuning.
In short, Frac-Adagrad provides encouraging enhancements over Adagrad for a number of stocks when the fractional order α is small, benefiting from its capacity to preserve long-term gradient memory without introducing excessive volatility. Nevertheless, it also induces instability at larger α values, particularly for complicated or volatile datasets. This underlines the relevance of adaptive or data-driven selection of the fractional order α, which could vary by asset or training phase, to secure convergence and consistent performance.
We also benchmarked the performance of the following optimization methods: Frac-SGD, Frac-Adadelta, Frac-Nadam, Frac-FTRL, and Frac-Adamax. A comparably in-depth analysis would reveal a similar impact of the choice of α. However, to avoid crowding the paper with redundant analyses, we only supply the corresponding figures, which provide a quick visual impression of these effects (Table 6).
The results clearly demonstrate the superiority of the Frac-Adagrad-LSTM models over the standard Adagrad-LSTM baseline. Notably, one fractional configuration achieves the highest Sharpe ratio (0.6286) and final cumulative return (63.65), indicating that this fractional model offers the best risk-adjusted performance and profitability. Meanwhile, the highest directional accuracy (0.5289) is observed at a larger fractional order, suggesting enhanced prediction of price-movement direction with higher fractional influence. Overall, fractional memory improves the model's effectiveness across all key financial metrics, making Frac-Adagrad a compelling alternative to traditional optimizers.
Figure 19 presents the sensitivity analysis of the Frac-Adagrad-LSTM model applied to AAPL stock. The results indicate that higher learning rates consistently lead to improved Sharpe ratios and cumulative returns across all values of the fractional order α. The best performance, a Sharpe ratio of 0.221 and a cumulative return of approximately 52.49, is obtained with one of the higher learning rates. In contrast, very small learning rates produce poor performance regardless of the chosen α, suggesting insufficient update magnitudes for effective learning. Notably, directional accuracy remains relatively stable across all configurations (approximately 48–52%), indicating that predictive directionality is less sensitive to variations in these hyperparameters than the risk-adjusted and return-based metrics.
Figure 20 gives the sensitivity analysis of Frac-Adagrad-LSTM with respect to the memory window size on GOOGL. The analysis reveals that the optimal memory window size generally lies around 60 or 90 days. For all tested α values, a memory window of 60 consistently delivers strong performance across the Sharpe ratio, directional accuracy, and cumulative return; for instance, the 60-day window attains the best Sharpe ratio and cumulative return for a given α, outperforming the other memory settings. Performance tends to decrease with either very short (30) or very long (120) memory windows, highlighting the importance of selecting an appropriate temporal context for modeling market dynamics.
4.4. Comparison to the ARIMA and GARCH Models
In this subsection, we compare the performance of Frac-Adam, Frac-Adagrad, and Frac-RMSprop against the ARIMA and GARCH models; see Table 7.
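For reference, such classical baselines can be fitted with standard Python libraries; the sketch below uses statsmodels for ARIMA and the arch package for GARCH, with placeholder orders (ARIMA(1,1,1) and GARCH(1,1)), since the exact specifications used in the comparison are not restated here.

```python
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from arch import arch_model

def fit_baselines(prices):
    """Fit simple ARIMA and GARCH baselines on a 1-D closing-price series.

    The ARIMA(1, 1, 1) order and GARCH(1, 1) specification are placeholder choices."""
    prices = pd.Series(prices)
    arima_fit = ARIMA(prices, order=(1, 1, 1)).fit()
    returns = 100 * prices.pct_change().dropna()  # percentage returns for the GARCH model
    garch_fit = arch_model(returns, vol="GARCH", p=1, q=1).fit(disp="off")
    return arima_fit, garch_fit
```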
The performance comparison reveals that, among all models and fractional orders, the Frac-Adagrad optimizer at its best-performing fractional order achieves the highest Sharpe ratio (0.7404) and cumulative return (74.95), indicating superior risk-adjusted profitability. Frac-Adagrad consistently outperforms both Frac-Adam and Frac-RMSprop across most values of α, exhibiting better directional accuracy and more stable returns. In contrast, Frac-RMSprop and Frac-Adam generally yield negative Sharpe ratios and cumulative returns, with only marginal improvements at certain fractional orders, and their overall performance remains inferior. Classical statistical models such as ARIMA and GARCH perform poorly in this context: ARIMA provides low directional accuracy and negligible returns, while GARCH exhibits unrealistic Sharpe ratios and fails to predict directional movements altogether. These findings highlight the effectiveness of fractional adaptive optimizers, particularly Frac-Adagrad, in capturing the complex dynamics of financial time series and enhancing predictive and trading performance.
The RMSE histograms for the Frac-SGD, Frac-Adadelta, Frac-Nadam, Frac-FTRL, and Frac-Adamax methods, evaluated across the stock prices of the various companies, are presented in Appendix A.
Figure A1 gives the MSE associated with standard SGD and Frac-SGD used to train the LSTM on AAPL, MSFT, GOOGL, AMZN, META, NVDA, JPM, V, and UNH stocks. Fractional SGD-LSTM with a small α generally improves prediction accuracy compared to standard SGD-LSTM for some stocks (e.g., GOOGL, JPM, UNH), showing lower error values. However, as α increases beyond 0.3, performance often deteriorates drastically, leading to high prediction errors (e.g., for META and UNH at the larger orders), indicating instability and poor convergence. Thus, fractional SGD can be beneficial at small α, but higher values are detrimental to model performance.
Figure A2 gives the MSE associated with standard Adadelta and Frac-Adadelta used to train the LSTM on AAPL, MSFT, GOOGL, AMZN, META, NVDA, JPM, V, and UNH stocks. The comparison between the standard Adadelta-LSTM and its fractional variants across the different stock price predictions reveals mixed performance. For certain stocks such as AAPL, GOOGL, JPM, and UNH, the fractional Adadelta-LSTM with lower fractional orders significantly outperforms the standard approach in terms of reduced MSE. For instance, the MSE drops from 52.54 to 44.48 for AAPL and from 62.64 to 32.40 for GOOGL. However, higher values of α tend to degrade performance, often substantially, as observed for META and UNH, where the MSE reaches over 11,000 and 7800, respectively. These results suggest that fractional optimization with carefully chosen α values can enhance prediction accuracy, but inappropriate settings may lead to instability or overfitting.
Figure A3 gives the MSE associated with standard Nadam and Frac-Nadam used to train the LSTM on AAPL, MSFT, GOOGL, AMZN, META, NVDA, JPM, V, and UNH stocks. The performance comparison between the standard Nadam-LSTM and its fractional counterparts across the various stock price prediction tasks demonstrates that the fractional approach can offer improved accuracy, but only for specific fractional orders. For example, at a small α, the Frac-Nadam-LSTM significantly reduces the MSE for stocks like AAPL (from 52.54 to 44.48), GOOGL (from 62.64 to 32.40), and JPM (from 48.86 to 25.89), indicating enhanced predictive capability. However, as α increases, performance generally degrades, with some extreme cases, such as META and UNH, reaching very high MSEs (11,691.78 and 7890.24, respectively) at the larger orders. This highlights the sensitivity of fractional Nadam to the choice of α: smaller values may lead to improved accuracy, while larger ones risk severe instability or overfitting.
Figure A4 gives the MSE associated with standard FTRL and Frac-FTRL used to train the LSTM on AAPL, MSFT, GOOGL, AMZN, META, NVDA, JPM, V, and UNH stocks. The evaluation of standard FTRL-LSTM against its fractional variants for stock price prediction reveals that fractional FTRL can enhance or degrade performance depending on the fractional order α. For instance, at a small α, improvements are observed in several cases, notably for AAPL, GOOGL, JPM, and UNH, where the mean squared error (MSE) is substantially reduced compared to the standard version. However, as α increases beyond 0.3, performance tends to deteriorate significantly, with large error spikes, particularly for META and UNH at the higher orders, reaching values as high as 11,691.78 and 2798.63, respectively. This suggests that while fractional FTRL introduces additional flexibility, its effectiveness is highly sensitive to the choice of α, with lower values generally more stable and beneficial.
Figure A5 gives the MSE associated with standard Adamax and Frac-Adamax used to train the LSTM on AAPL, MSFT, GOOGL, AMZN, META, NVDA, JPM, V, and UNH stocks. The comparative results between the standard Adamax-LSTM and its fractional variant across the various stock price prediction tasks indicate that fractional Adamax can improve prediction performance when the fractional order α is carefully selected. For instance, at a small α, a notable reduction in prediction error is observed for AAPL, GOOGL, JPM, and especially UNH, where the error drops from 1281.37 to 304.45. However, higher values of α generally lead to performance degradation, as evidenced by extreme error increases for META (up to 11,691.78) and UNH (up to 7890.24). These findings suggest that fractional Adamax introduces significant sensitivity to α, where low-order memory effects (small α) can be beneficial, while higher orders may introduce instability and reduced accuracy.
In summary, the RMSE analysis of fractional optimizers in LSTM-based stock price prediction reveals that variants with small memory parameters (small α) consistently outperform their standard counterparts. Improvements are notable for stocks like AAPL, GOOGL, JPM, and UNH, where RMSEs drop significantly. However, increasing α generally leads to performance degradation, with large errors observed for META and UNH. This trend holds consistently across Frac-SGD, Frac-Adadelta, Frac-Nadam, Frac-FTRL, and Frac-Adamax. The results highlight the potential of fractional optimization when low α values are chosen carefully. Nonetheless, inappropriate fractional orders may introduce instability, overfitting, or convergence failures.
4.5. Synthesis of Optimization Analysis Results
The examination of different optimization techniques, namely Adam, RMSprop, and standard Adagrad, compared to their fractional analogues (Frac-Adam, Frac-RMSprop, and Frac-Adagrad), has provided valuable insights into the effect of fractional derivatives on the training of LSTM models for stock price forecasting.
The comparison between standard Adam and fractional Adam (Frac-Adam) yielded mixed results. For certain stocks such as GOOGL, fractional derivatives with low α values (such as 0.1) typically resulted in improvements in root-mean-square error (RMSE). Nevertheless, as α increases, Frac-Adam's performance declines, especially for stocks like META and AMZN. In most cases, the classic Adam optimizer outperformed or performed similarly to Frac-Adam, particularly at higher α values and when the optimization process was unstable.
A comparison of RMSprop with Frac-RMSprop also revealed performance disparities across stocks. Fractional RMSprop outperformed the conventional variant on certain stocks (such as AAPL and GOOGL) at lower α values, indicating that a small fractional order may promote smoother convergence and enhanced memory effects. However, as α rises above 0.5, performance deteriorates significantly for stocks like META and UNH, with notably larger MSE values. This highlights the risk of overemphasizing historical gradients for volatile stocks, leading to instability.
For Adagrad and Frac-Adagrad, smaller fractional orders α had a positive effect for stocks like AAPL and GOOGL, where a smoother gradient descent benefited the optimization process. However, for stocks like MSFT, META, and AMZN, larger α values resulted in worse performance, indicating that excessive memory effects induced by high α can interfere with convergence. Therefore, fractional Adagrad should be used carefully and adapted according to the characteristics of the asset and dataset.
The choice of optimizer, and particularly the adoption of fractional orders in optimizers such as Adam, RMSprop, and Adagrad, can have a considerable influence on the performance of LSTM models for stock price forecasting. While fractional derivatives can offer advantages in certain cases, particularly for stocks whose price movements are smoother or less volatile, excessively high fractional orders tend to destabilize the optimization process. In general, conventional optimizers such as Adam and RMSprop are likely to produce results that are better than or comparable to their fractional counterparts, especially when large memory effects are present. Therefore, tuning the fractional order is crucial for achieving optimal results, and it is recommended to employ fractional versions cautiously, tailoring the optimization approach to the specific characteristics of the financial asset being modeled.
4.6. Limitations and Deployment Considerations
Deploying fractional-order optimizers in live trading environments requires careful consideration beyond algorithmic design. Real-world deployment introduces additional challenges such as low-latency inference, infrastructure scalability, financial data integrity, and regulatory compliance. However, these constraints are predominantly engineering and system-integration issues rather than fundamental limitations of the fractional optimization paradigm itself. To this end, we analyze these hurdles and propose practical strategies to mitigate them below:
Latency: While fractional derivatives typically require historical memory, our implementation adopts a short-memory truncation approach, which approximates the Grünwald–Letnikov operator over a limited window. Empirical profiling indicates that, for the window sizes considered, the computational overhead is limited to 5–12% compared to standard optimizers. This is acceptable for medium-frequency strategies with minute-to-hour granularity.
GPU Acceleration: We propose a GPU-based implementation of fractional derivative convolutions via parallel tensor operations. Specifically, the discretized fractional derivative of a signal $f$ at time $t$ is computed using the short-memory Grünwald–Letnikov approximation
$$ D^{\alpha} f(t) \approx \frac{1}{h^{\alpha}} \sum_{k=0}^{L} w_k^{(\alpha)} f(t - kh), \qquad w_k^{(\alpha)} = (-1)^k \binom{\alpha}{k}, $$
where the weights $w_k^{(\alpha)}$ are precomputed and the truncated convolution over the last $L$ steps is implemented as a single `torch.nn.functional.conv1d()` call on GPU. This significantly reduces the inference latency and makes the method compatible with online deployment.
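A minimal PyTorch sketch of this convolution-based evaluation is shown below; the helper names, the default truncation length L, and the batched (batch, 1, T) signal layout are assumptions, while the weights follow the recursive form of the binomial coefficients in the approximation above.

```python
import torch
import torch.nn.functional as F

def gl_weights(alpha: float, L: int) -> torch.Tensor:
    """Precompute Grunwald-Letnikov weights w_k = (-1)^k * C(alpha, k) for k = 0..L."""
    w = torch.empty(L + 1)
    w[0] = 1.0
    for k in range(1, L + 1):
        w[k] = w[k - 1] * (1.0 - (alpha + 1.0) / k)
    return w

def frac_derivative_conv(signal: torch.Tensor, alpha: float, L: int = 90, h: float = 1.0):
    """Short-memory GL fractional derivative of batched 1-D signals via a single conv1d.

    `signal` has shape (batch, 1, T); the output has the same shape."""
    w = (gl_weights(alpha, L) / (h ** alpha)).to(signal.device, signal.dtype)
    # conv1d performs cross-correlation, so flip the kernel so that w_0 hits the newest sample.
    kernel = torch.flip(w, dims=[0]).view(1, 1, L + 1)
    padded = F.pad(signal, (L, 0))  # left-pad so the output length matches the input length
    return F.conv1d(padded, kernel)
```

Moving the signal and the precomputed kernel to the GPU then turns the whole memory sum into one batched tensor operation, which is where the latency savings described above come from.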
Table 8 reports performance metrics across various stocks under different hyperparameter settings, in particular CPU and GPU execution times. In this context, Speedup measures how much faster one processor is than the other; it compares CPU and GPU times and is defined as
$$ \text{Speedup} = \frac{T_{\text{CPU}}}{T_{\text{GPU}}}. $$
Across all tested stocks, CPU execution times are remarkably low, consistently below 0.0011 s. The performance difference between CPU and GPU is minimal, indicating that for lightweight financial computations (e.g., the Sharpe ratio and accuracy over short memory windows), GPU acceleration does not provide significant speedups. In several cases (e.g., rows 2, 10, and 13), the CPU is actually faster than the GPU, suggesting that the overhead associated with GPU data transfer may outweigh any benefits from parallelization. Notably, speedups rarely exceed 1.1×, with the highest GPU speedup observed for META (alpha 0.5, memory window 90) at 1.36×, which still reflects only a modest advantage. In conclusion, while GPU acceleration remains valuable for large-scale or deep-learning-based models, for fast statistical metrics like those presented here, CPU execution is highly efficient and often preferable.
Hybrid Optimizer Pipelines: For ultra-high-frequency trading (UHFT), we suggest a hybrid optimizer framework in which fractional optimizers are used during slower retraining phases, while faster, conventional optimizers (e.g., RMSprop) are deployed during live signal execution. This preserves the stability and long-term learning benefits of memory-aware updates without sacrificing real-time responsiveness.
Data Integrity and Compliance: Fractional optimizers are orthogonal to the issues of data authenticity and regulation. Nonetheless, our pipeline is fully compatible with institutional-grade backtesting engines and real-time monitoring dashboards. Adopting industry-standard logging and audit protocols ensures compliance and traceability.
Together, these adaptations demonstrate that fractional optimizers are not only theoretically sound but can also be engineered for live deployment in medium- to high-frequency financial systems. We identify further directions such as volatility-aware scheduling and streaming memory management as future enhancements to expand real-time viability.
Note: While our experiments focused on financial time series, the proposed methodology is designed to be domain-agnostic. This paper provides a general roadmap for transforming standard gradient-based optimizers (e.g., Adam, RMSprop, Adagrad) into their fractional-order counterparts using Caputo-based memory-aware formulations. This transformation and its practical implementation—based on the Grünwald–Letnikov approximation and the short-memory principle—are independent of the specific statistical properties of the time series.
Therefore, fractional optimizers could be readily adapted to other domains such as energy consumption forecasting, physiological signal modeling in healthcare, or climate and weather prediction, especially when memory effects and long-range dependencies are relevant. The potential of these methods to enhance convergence stability and trend sensitivity may also be beneficial in these contexts.