1. Introduction
In recent years, machine learning (ML) has driven innovation across various fields with its exceptional predictive accuracy. ML has become an indispensable tool in industries such as healthcare, logistics, climate science, and marketing, thanks to its ability to analyze vast amounts of data and uncover complex patterns. Accurate predictions in these areas often guide critical decisions with significant impacts on outcomes.
ML-based models have demonstrated superior performance in financial market predictions by processing large datasets, identifying complex patterns, and incorporating new information more effectively than traditional methods. For example, advanced neural network models have been applied to predict the Borsa Istanbul Banks Index, achieving better results than traditional approaches [
1]. Similarly, the effectiveness of fuzzy clustering combined with artificial neural networks for financial market predictions has been highlighted [
2], while the advantages of deep reinforcement learning in option pricing have also been demonstrated [
3]. These advancements highlight ML’s growing role in financial analytics, delivering more accurate and timely insights into market dynamics.
Traditional financial models, such as linear regression and time-series analysis, often fail to capture the nonlinear and complex nature of financial data. In contrast, LSTM networks effectively handle long-term dependencies and extract meaningful patterns from time-series data. However, standard LSTM models treat all input features equally, limiting their ability to prioritize the most important information. Attention mechanisms address this limitation by dynamically focusing on the most relevant parts of the input sequence, thereby enhancing prediction accuracy. Studies have demonstrated the superior performance of Attention-LSTM models in financial market predictions, establishing them as a vital tool in this domain [
4,
5,
6]. Recent Attention-augmented architectures have further validated this trend. Li et al. [
7] developed a CNN–LSTM model enhanced with attention for the credit risk prediction of listed companies, achieving superior accuracy over traditional neural and statistical approaches. Sang and Li [
8] proposed an Attention Mechanism Variant LSTM (AMV-LSTM) for stock price forecasting, demonstrating improved generalization and robustness compared to standard LSTM models. Finally, Luo et al. [
9] introduced a CNN–BiLSTM–Attention (CLATT) framework for short-term stock correlation forecasting, in which the Attention layer dynamically re-weights BiLSTM outputs to further boost predictive performance.
The strength of ML lies in its ability to process high-dimensional data and uncover nonlinear relationships, allowing it to deliver more accurate and reliable predictions. Unlike traditional statistical models, ML adapts to dynamic environments, making it particularly effective in domains characterized by frequent changes. For instance, LSTM models have been shown to excel in capturing long-term dependencies in time-series data [
10,
11]. Furthermore, studies have demonstrated that when enhanced with Attention mechanisms, LSTM models improve predictive accuracy by focusing on the most critical features of input sequences [
5,
12]. A multi-feature approach using LSTM networks has also been shown to significantly outperform single-feature models in financial time-series forecasting, highlighting the importance of incorporating diverse market indicators [
13]. Additionally, a hybrid model combining empirical mode decomposition (EMD) and Akima spline interpolation with LSTM has been proposed, effectively handling financial data’s nonlinearity and improving prediction accuracy [
14].
In financial markets, ML’s predictive power is particularly impactful. Accurate forecasts of asset prices and market volatility are essential for investors, financial institutions, and policymakers. Asset price predictions directly influence investment strategies, portfolio management, and market efficiency, helping investors maximize returns and minimize risks. Market volatility, which represents the degree of fluctuation in asset prices over time, serves as a key indicator of market risk. Accurate volatility predictions provide insights into market uncertainty, enabling better risk management and ensuring market stability.
The objective of this paper is to propose the VMD-Cascaded LSTM with Attention model, a novel prediction framework designed to improve the accuracy of forecasting financial time-series data, particularly the market VIX. This model integrates LSTM networks with Attention mechanisms to effectively capture long-term dependencies and focus on key moments in the data.
The VIX, often referred to as the “fear index,” is a benchmark indicator of market volatility and investor sentiment. It is calculated based on the implied volatility of S&P 500 index options, directly linking it to expectations of future fluctuations in the U.S. stock market. Reflecting the sentiment toward the S&P 500—the most widely followed equity index—the VIX serves as a sensitive and timely measure of anticipated market risk. Accurate forecasting of the VIX is essential not only for volatility management but also for informing asset allocation, derivative pricing, and portfolio risk control. Its intrinsic connection to S&P 500 options makes the VIX a forward-looking barometer of equity market volatility.
To complement the deep learning approach, Variational Mode Decomposition (VMD) is introduced as a preprocessing step to decompose financial time-series data into multiple IMFs. VMD separates short-term fluctuations from long-term trends, enabling the model to better capture the complexities of financial volatility. By reducing noise and providing cleaner, more structured data, VMD enhances the model’s ability to identify meaningful patterns and improve prediction accuracy. Similar methodologies have been successfully applied in water quality forecasting [
15] and wind speed prediction [
16], where VMD improved pattern recognition and reduced noise. In financial markets, decomposition methods have been shown to significantly enhance predictive performance, further validating the use of VMD for financial time-series data [
17].
While Transformer-based architectures have recently shown strong performance in time-series forecasting due to their ability to model long-range dependencies through self-attention, their computational complexity scales quadratically with sequence length, making them less suitable for long financial time series. Additionally, the data-hungry nature of such models increases the risk of overfitting in domains where historical data are often limited, such as financial volatility prediction.
Similarly, CNN-LSTM hybrid models can capture both local and sequential patterns but tend to suffer from structural complexity and a heavy reliance on extensive hyperparameter tuning, which hinders fair and reproducible evaluation under constrained computational environments.
To address these issues, we adopt a VMD-Cascaded LSTM architecture with integrated Multi-Head Attention. VMD decomposes the original signal into IMFs, which are individually modeled by LSTMs in a cascaded fashion to capture hierarchical temporal patterns. Multi-Head Attention allows the model to selectively focus on important time steps, improving interpretability and performance. This approach balances forecasting accuracy and computational efficiency, making it well-suited for nonstationary and data-constrained financial time-series environments.
Given the complex and nonlinear nature of financial time series—particularly volatility indices like the VIX—traditional statistical models and even standard LSTM-based approaches often struggle to capture the full range of dynamic and multi-frequency patterns. To address these challenges, this study investigates a hybrid architecture that combines VMD, Cascaded LSTM layers, and a multi-head attention mechanism.
Based on this motivation, this study is guided by the following research questions:
RQ1: Can the integration of VMD, Cascaded LSTM, and Attention mechanisms significantly enhance VIX forecasting accuracy compared to conventional deep learning models?
RQ2: Does the proposed hybrid model maintain robust predictive performance across different market regimes, such as periods of high versus low volatility?
To empirically address these questions, we formulate the following hypotheses:
H1: The VMD-Cascaded LSTM with Attention model yields statistically significantly lower forecasting errors (MSE and MAE) than standard LSTM-based benchmark models.
H2: The proposed model demonstrates stable performance across varying levels of market volatility.
Our research findings contribute to the existing literature in three ways. First, this study empirically evaluates the effectiveness of the VMD-Cascaded LSTM with Attention model in predicting VIX volatility and demonstrates its potential to support volatility management and investment strategy formulation in financial markets. Second, the empirical results demonstrate that our model outperforms benchmark models, achieving substantial improvements in MSE, RMSE, and MAE and highlighting the effectiveness of integrating the Attention mechanism. Third, by embedding the Attention mechanism within a sequential, cascaded learning process rather than merely combining machine learning methods, our model achieves greater overall stability, offering a more robust framework for volatility forecasting. Together, these contributions provide valuable guidance for practitioners engaged in volatility management and investment strategy formulation.
This study presents a VMD-Cascaded LSTM with Attention model to improve market volatility forecasts, thereby enhancing risk management and investment strategies. Beyond finance, the architecture’s adaptability extends to domains such as healthcare for early disease detection and climate science for flood risk forecasting. Moreover, in environments equipped with high-performance GPUs, the model can deliver even faster training and inference times. This enables investors and risk managers to reassess volatility forecasts for specific indices or financial instruments in near real time, supporting more agile and responsive decision-making. Such applicability underscores the model’s potential integration into automated trading systems or real-time risk monitoring platforms.
The remainder of this study is structured as follows.
Section 2 provides a brief review of previous studies on financial forecasting using LSTM models and volatility prediction through machine learning techniques.
Section 3 describes the dataset and outlines the proposed methodology.
Section 4 presents the experimental results, highlighting the performance of the proposed model. Finally,
Section 5 offers concluding remarks and discusses potential future research directions.
3. Data Description and Methods
3.1. Data Description
The data used in this study include the VIX and the closing prices of the S&P 500, which were utilized to analyze market volatility and movements. The data cover the period from January 2020 to December 2024 and were collected from the Yahoo Finance platform. This study specifically employed data starting from 2020 to minimize discrepancies with pre-COVID-19 data and to effectively reflect structural changes in the market.
Figure 1 visualizes the movements of the S&P 500 and VIX over the selected period. The blue solid line represents the S&P 500 index, while the orange dashed line shows the VIX index. The graph highlights that S&P 500 and VIX generally exhibit an inverse relationship. For example, during the early pandemic phase in 2020, VIX experienced sharp spikes while the S&P 500 declined. As the market stabilized, VIX decreased and the S&P 500 resumed an upward trend. This inverse pattern is also observed during correction periods in 2022 and 2023.
To better understand the characteristics of the data, statistical analyses were conducted. The results are summarized in
Table 1.
The statistical analysis results indicate that VIX data exhibit high skewness (2.5211) and kurtosis (11.1301), reflecting frequent extreme market events. In contrast, S&P 500 data follow a near-normal distribution with slight positive skewness (0.2480) and a kurtosis close to zero (−0.0865). The Jarque–Bera test confirms that both datasets significantly deviate from a normal distribution. Moreover, the Augmented Dickey–Fuller (ADF) test results show that VIX data are stationary, whereas S&P 500 data are nonstationary. These findings highlight the necessity of proper preprocessing techniques, such as VMD, to address the nonstationary characteristics of the S&P 500 data and account for extreme events in the VIX data.
3.2. The Baseline Methodologies
In this section, we provide a brief overview of the methodologies that form the foundation of the proposed model. Specifically, we introduce and explain VMD, LSTM, Cascaded LSTM, and the Attention mechanism.
3.2.1. VMD
VMD is a decomposition technique frequently utilized in the analysis of nonstationary signals. It decomposes complex signals with various frequency components into multiple narrow-band modes. This method was proposed to overcome the drawbacks of the traditionally widely used EMD. In EMD, issues such as mode mixing, boundary effects, and sensitivity to noise have been identified during the sequential decomposition of the signal into IMFs. VMD mitigates these problems by adopting a variational optimization approach to simultaneously estimate the modes. Here, the concept of variational optimization means setting an objective function to efficiently identify multiple narrow-band components inherent in the signal and minimizing it through an iterative update process.
The core idea pursued by VMD is to express the original signal as a sum of multiple modes, each assumed to be a narrow-band component with a central frequency, while simultaneously updating all modes. Mathematically, the given signal $f(t)$ can be represented as the sum of $K$ modes $u_k(t)$ with their corresponding central frequencies $\omega_k$:

$$f(t) = \sum_{k=1}^{K} u_k(t)$$

VMD is formulated as an optimization problem that minimizes the following objective function to ensure each mode $u_k$ maintains narrow-band characteristics:

$$\min_{\{u_k\},\{\omega_k\}} \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j \omega_k t} \right\|_2^2 \quad \text{subject to} \quad \sum_{k=1}^{K} u_k(t) = f(t)$$
The algorithm initializes the central frequencies $\{\omega_k\}$ and the initial modes $\{u_k\}$. It then iteratively performs mode updates, recalculates the central frequencies, and updates the Lagrange multipliers and penalty parameters. The process continues until the rate of change falls below a threshold or a maximum number of iterations is reached, resulting in the convergence to the final modes. To ensure that the sum of the decomposed modes matches the original signal, Lagrange multipliers are introduced to impose this constraint, and penalty terms are used to maintain accuracy.
Unlike traditional EMD, which extracts one mode at a time, VMD simultaneously optimizes a predetermined number of modes that interact with each other. This simultaneous optimization reduces interference between modes and enhances robustness against noise. In the frequency domain, VMD updates each mode’s frequency component and recalculates the central frequency to maintain narrow-band characteristics. When applying VMD, several key parameters must be appropriately set. The most important parameter is the number of modes to decompose, which should be chosen based on the frequency characteristics of the signal to achieve proper decomposition. Additionally, the penalty parameter controls the narrow-bandness of the modes and is typically set empirically or determined through cross-validation to find the optimal value.
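As a minimal, hedged illustration of how such a decomposition can be carried out in practice, the following Python sketch applies the open-source vmdpy package to a synthetic signal; the signal and parameter values here are placeholders for exposition, not the configuration used in this study.

```python
import numpy as np
from vmdpy import VMD  # assumes the open-source vmdpy package is installed

# Synthetic stand-in for a financial series: two oscillations plus noise
t = np.linspace(0, 1, 500)
signal = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t) \
         + 0.1 * np.random.randn(len(t))

alpha = 2000   # penalty parameter controlling mode bandwidth
tau = 0.0      # noise tolerance of the dual ascent
K = 3          # number of modes to extract
DC = 0         # do not force a DC (zero-frequency) mode
init = 1       # initialize center frequencies uniformly
tol = 1e-7     # convergence tolerance

# u: decomposed modes (K x len(t)), u_hat: their spectra, omega: center-frequency history
u, u_hat, omega = VMD(signal, alpha, tau, K, DC, init, tol)
print(u.shape)  # one narrow-band mode per row
```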
3.2.2. LSTM
LSTM is a representative model designed to address the ‘Gradient Vanishing’ problem that arises in RNNs. Traditional RNNs adopt a simple structure where, at each time step, input information and the previous hidden state are combined to produce a new hidden state. However, as the sequence lengthens, gradients can vanish or explode during the backpropagation process, leading to the inability to properly learn important past information. In contrast, LSTM is designed with a separate pathway called the cell state, which allows the preservation of long-term information. To achieve this, it introduces several gates that serve as key mechanisms. These gates, primarily the input gate, forget gate, and output gate, finely regulate the incoming information, the information to be retained, and the information to be output, thereby ensuring that significant information is maintained even in long sequences.
In an LSTM cell, at time step $t$, let the input vector be $x_t$, the previous hidden state be $h_{t-1}$, and the cell state maintained up to the previous time step be $C_{t-1}$. First, the forget gate is calculated as follows. The forget gate determines which parts of the cell state from the previous time step $C_{t-1}$ should be erased. Next, the input gate decides how much new information from the current input should be added to the cell state, during which the potential candidate information $\tilde{C}_t$ is created through the hyperbolic tangent (tanh) function. Subsequently, the updated cell state $C_t$ is determined by combining the values calculated by the forget gate and the input gate. Finally, the output gate selects which information from the updated cell state should be exposed to the hidden state, thereby producing the final hidden state $h_t$. In summary, at each time step, the gates and the cell and hidden states follow the flow illustrated below (where $\sigma$ denotes the sigmoid function and tanh represents the hyperbolic tangent function):

$$f_t = \sigma(W_f [h_{t-1}, x_t] + b_f), \qquad i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$$
$$\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C), \qquad C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$$
$$o_t = \sigma(W_o [h_{t-1}, x_t] + b_o), \qquad h_t = o_t \odot \tanh(C_t)$$
Figure 2 illustrates the role of each gate in the above equations. Here, the forget gate $f_t$ determines the proportion of the previous cell state $C_{t-1}$ that should be erased, removing unnecessary or outdated information. The input gate $i_t$ decides how much new information to incorporate into the cell state, and $\tilde{C}_t$ represents the candidate information to be added. The cell state $C_t$ is obtained by combining the retained part from the previous state ($f_t \odot C_{t-1}$) and the new incoming part ($i_t \odot \tilde{C}_t$). The output gate $o_t$ determines the extent to which the updated cell state is reflected in the hidden state $h_t$. By dividing the roles across different gates and finely controlling the information flow, LSTM can effectively preserve and utilize essential past information even in very long sequences.
When modeling an actual LSTM, various hyperparameters such as the dimensionality of the hidden state, the initialization methods for each gate’s parameters, and the learning rate must be set. It is necessary to find appropriate values tailored to the characteristics of the data. Additionally, since the gate structures themselves use many parameters, overfitting can occur if there are insufficient data or if the model size becomes excessively large. Therefore, regularization techniques are often employed in conjunction.
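To make the gate equations above concrete, the following NumPy sketch executes a single LSTM time step with randomly initialized weights. It is a didactic illustration of the update rules only, not the implementation used in this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step; each W[.] maps the concatenation [h_prev, x_t] to a gate pre-activation."""
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W["f"] @ z + b["f"])        # forget gate: what to erase from c_prev
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate: how much new information to admit
    c_tilde = np.tanh(W["c"] @ z + b["c"])    # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde        # updated cell state
    o_t = sigmoid(W["o"] @ z + b["o"])        # output gate: what to expose to the hidden state
    h_t = o_t * np.tanh(c_t)                  # new hidden state
    return h_t, c_t

n_in, n_hid = 4, 8                            # toy dimensions
rng = np.random.default_rng(0)
W = {g: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for g in "fico"}
b = {g: np.zeros(n_hid) for g in "fico"}
h, c = np.zeros(n_hid), np.zeros(n_hid)
h, c = lstm_step(rng.normal(size=n_in), h, c, W, b)
print(h.shape, c.shape)
```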
3.2.3. Cascade LSTM
Cascade LSTM is an extended version of the basic LSTM structure, consisting of multiple stacked layers of LSTM cells. It combines the expressive power of deep neural networks with the inherent sequence processing capabilities of LSTM. While traditional single-layer LSTMs utilize cell states and gates to maintain important past information in long sequences, they are limited by a simple path where inputs and outputs are connected only once. In contrast, Cascade LSTM connects multiple layers of LSTM in series, allowing higher layers to further process and interpret the time-series patterns extracted by lower layers. This enables the model to learn more complex and abstract sequential structures.
Each layer in a Cascade LSTM performs the same computational processes as a basic LSTM cell. At time step $t$, let the input to layer $l$ be $x_t^l$, and the previous hidden state and cell state be $h_{t-1}^l$ and $C_{t-1}^l$, respectively. First, the forget gate is calculated as follows:

$$f_t^l = \sigma(W_f^l [h_{t-1}^l, x_t^l] + b_f^l)$$

The forget gate determines which parts of the previously accumulated cell state should be erased to prevent excessive accumulation of information. Next, the input gate decides how much new input to accept, and the potential cell state candidate $\tilde{C}_t^l$ is generated through the hyperbolic tangent function:

$$i_t^l = \sigma(W_i^l [h_{t-1}^l, x_t^l] + b_i^l), \qquad \tilde{C}_t^l = \tanh(W_C^l [h_{t-1}^l, x_t^l] + b_C^l)$$

Combining these, the new cell state $C_t^l$ is determined by summing the retained part of the previous cell state ($f_t^l \odot C_{t-1}^l$) and the newly incoming part ($i_t^l \odot \tilde{C}_t^l$):

$$C_t^l = f_t^l \odot C_{t-1}^l + i_t^l \odot \tilde{C}_t^l$$

Finally, the output gate selects which information from the updated cell state should be exposed to the hidden state:

$$o_t^l = \sigma(W_o^l [h_{t-1}^l, x_t^l] + b_o^l), \qquad h_t^l = o_t^l \odot \tanh(C_t^l)$$

The resulting hidden state $h_t^l$ is immediately passed to the next time step within the same layer and, if the current layer is the $l$-th layer, it also serves as the input to the layer above. Therefore, in Cascade LSTM, as the layers deepen, they produce richer time-series representations and contextual information, which yields excellent performance on data with high complexity.
Figure 3 visualizes the overall structure of the Cascade LSTM. The input $x_t$ at time step $t$ passes through multiple layers of LSTM cells in sequence to produce the final output $h_t^L$. The greatest advantage of the Cascade LSTM is that it provides a much more flexible and powerful representational capacity compared to a simple LSTM. This offers a significant benefit in tasks where understanding long-term context is crucial. By stacking layers in depth so that each layer further processes features extracted from the previous layer, the Cascade LSTM can learn richer and more complex patterns than a simple LSTM. However, as the number of layers increases, the number of model parameters grows considerably. Moreover, the final performance can vary greatly depending on hyperparameters such as the model architecture, learning rate, and batch size. Therefore, careful tuning is required to match the given problem and data characteristics.
In summary, Cascade LSTM effectively handles complex sequential problems by maximizing the benefits of mitigating gradient vanishing and managing long-term dependencies inherent in single LSTMs, combined with the advantages of hierarchical abstraction provided by multiple layers. It is a powerful model capable of effectively dealing with intricate sequence issues.
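As a rough structural sketch in TensorFlow/Keras (layer widths are illustrative, not the values tuned later in this paper), a cascaded or stacked LSTM is obtained by letting each layer return its full hidden-state sequence to the layer above:

```python
import tensorflow as tf

timesteps, n_features = 30, 1  # illustrative look-back window and feature count

model = tf.keras.Sequential([
    tf.keras.Input(shape=(timesteps, n_features)),
    # Lower layer emits its hidden state at every time step so the upper
    # layer can further abstract the extracted temporal features.
    tf.keras.layers.LSTM(64, return_sequences=True),
    tf.keras.layers.LSTM(32),       # upper layer condenses the sequence into a final representation
    tf.keras.layers.Dense(1),       # one-step-ahead forecast
])
model.compile(optimizer="adam", loss="mse")
model.summary()
```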
3.2.4. Attention
The Attention mechanism is a technique designed to allow models, particularly in various deep learning fields such as Natural Language Processing (NLP), to ‘focus’ on specific parts of the input sequence that are especially important for the current output. Originally developed to address the issue in RNN-based machine translation models (Sequence-to-Sequence structures) where treating all words in a long sentence with equal importance could lead to missing crucial information, the Attention mechanism calculates the interrelationships among all positions in a sequence through relatively simple computations. By assigning higher weights to important positions, it enables the model to accurately grasp the context. This process primarily involves three components: ‘Query’, ‘Key’, and ‘Value’ vectors. The Query represents what information needs to be found, the Key represents what information is held, and the Value represents the actual information to be referenced.
Specifically, when calculating Attention at time step $t$, the Query $Q$, Key $K$, and Value $V$ vectors are derived for each element in the input sequence. The similarity between the Query and each Key is measured by taking their dot product, followed by normalization into a probability distribution using the softmax function. This distribution indicates where the model should focus, with higher probabilities indicating greater importance. Finally, this probability distribution is used to compute a weighted sum of the Value vectors, resulting in a ‘Context’ vector that the model can reference. The following equation outlines how Attention values are computed at each time step:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$$

Here, $\sqrt{d_k}$ is the square root of the dimension of the Query and Key vectors, used to prevent scaling issues that can arise when the dot product becomes large. If multiple heads are used to compute Attention from different perspectives and their results are then combined, this approach is referred to as ‘Multi-Head Attention’.
By incorporating the Attention mechanism, models can focus on specific parts of the sequence, allowing them to effectively maintain contextual clues even in long sentences or large-scale time-series data. This significantly mitigates the issues of gradient vanishing and the need to compress all information into a single hidden state that were prevalent in previous RNN-based structures.
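The scaled dot-product computation described above can be written in a few lines of NumPy. This is a single-head, toy-dimension illustration of the formula, not the Multi-Head Attention layer used later in the proposed model:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                            # query-key similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)             # softmax over the keys
    return weights @ V, weights                                # context vectors, attention map

rng = np.random.default_rng(0)
seq_len, d_k = 6, 4
Q, K, V = (rng.normal(size=(seq_len, d_k)) for _ in range(3))
context, attn = scaled_dot_product_attention(Q, K, V)
print(attn.sum(axis=-1))  # each row of attention weights sums to 1
```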
3.3. Proposed Model
Despite substantial advances engendered by the deployment of LSTM architectures augmented with attention mechanisms in financial forecasting, extant studies continue to exhibit several critical shortcomings. Predominantly, existing approaches emphasize short-term trend extrapolation [
32], while under-representing the modeling of long-range sequential dependencies that are pivotal for capturing macro-scale market dynamics [
11,
12]. Although Attention layers mitigate this to an extent by adaptively weighting temporal features, they remain insufficient for discerning multi-scale temporal structures and attenuating high-frequency, noise-induced artifacts inherent in raw financial time series. Moreover, a sizable portion of attention-based frameworks operate on raw, untransformed sequences laden with nonstationary components and stochastic volatilities leading to pronounced overfitting and performance instability under volatile market conditions [
33,
34]. Finally, most prior investigations are constrained to univariate or sparsely multivariate models, thereby failing to capture synergistic interactions between critical market indicators such as the S&P 500 and the VIX index [
35,
36]. To address these limitations, we advocate for the integration of signal decomposition techniques such as VMD that deconstruct financial series into denoised, frequency-specific modes prior to the application of attention, thereby furnishing models with stable, information-rich feature representations [
37,
38].
In this study, we propose the VMD-Cascaded LSTM with Attention model to improve the stability and accuracy of VIX volatility predictions. The overall structure of the model is shown in
Figure 4.
The proposed model consists of two main stages. In the first stage, the S&P 500 close data are decomposed using VMD, generating multi-scale features represented as VMD-extracted S&P 500 features. These features are then fed into the first LSTM layer, which captures the initial temporal patterns of the market. This design leverages the S&P 500 data as a representation of market trends and ensures that key temporal dependencies are learned effectively.
In the second stage, the VIX close data are also processed through VMD, producing VMD-extracted VIX features. These data are then passed through a Multi-Head Attention mechanism, which dynamically emphasizes critical features within the VMD-extracted VIX features. The enhanced features are subsequently used as inputs to the second LSTM layer, which refines the patterns learned in the first stage and integrates them with the newly emphasized VIX features. This hierarchical structure stabilizes the learning process and enables more precise predictions of VIX volatility. The detailed architecture of the proposed model is illustrated in
Figure 4, which captures both stages of the VMD-Cascaded LSTM with Attention structure. As shown in the figure, the first stage begins with the decomposition of the S&P 500 data into their VMD modes (sig1, sig2, sig3), which are then summed to form the VMD S&P 500 data. These data are input to the first LSTM layer, where primary temporal patterns are captured. Simultaneously, the VIX data are decomposed into their VMD modes and summed to produce the VMD VIX data, which serve as the input to the Multi-Head Attention mechanism. The Attention mechanism dynamically highlights the most critical features within the VMD VIX data, which are passed to the second LSTM layer as features for further learning.
The dimension transformations in the proposed model play a crucial role in ensuring seamless data flow between different layers. After the first LSTM layer processes the VMD S&P 500 data, its output is flattened into a two-dimensional structure to prepare it for subsequent integration. Simultaneously, the VIX data, initially represented as a three-dimensional tensor, are reshaped into a two-dimensional tensor.
These two datasets are horizontally concatenated (hstack), resulting in a unified two-dimensional tensor. This integration merges the temporal patterns learned from the S&P 500 data and the inherent characteristics of the VIX data into a single feature representation. The concatenated input enables the second LSTM layer to effectively model interactions between market trends and volatility patterns, enhancing the overall prediction capability of the model.
By combining the temporal patterns learned from the S&P 500 data and the emphasized features from the VIX data, the second LSTM layer generates a refined output that predicts VIX volatility with greater stability and precision. This cascaded approach ensures that the model effectively captures both market-wide trends and volatility-specific patterns.
The proposed model integrates the strengths of VMD for multi-scale feature decomposition, the hierarchical learning of Cascaded LSTM, and the feature enhancement capabilities of Multi-Head Attention. Together, these components form a robust framework designed to provide reliable and accurate predictions of VIX volatility.
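The following Keras functional-API sketch gives one possible schematic reading of Figure 4. The layer sizes, head count, and reshape choices are our own illustrative assumptions; the tuned configuration actually used is reported later in Tables 2 and 3.

```python
import tensorflow as tf
from tensorflow.keras import layers

timesteps = 30  # illustrative look-back window

# Stage 1: VMD-reconstructed S&P 500 sequence -> first LSTM layer (market-trend features)
sp500_in = layers.Input(shape=(timesteps, 1), name="vmd_sp500")
sp500_feat = layers.LSTM(32, name="lstm_stage1")(sp500_in)

# Stage 2: VMD-reconstructed VIX sequence -> Multi-Head Attention (emphasized VIX features)
vix_in = layers.Input(shape=(timesteps, 1), name="vmd_vix")
vix_att = layers.MultiHeadAttention(num_heads=4, key_dim=8, name="mha")(vix_in, vix_in)
vix_feat = layers.Flatten()(vix_att)

# Concatenate (hstack) the two feature sets, restore a time axis, and refine the
# merged representation with the second LSTM layer for the final VIX forecast.
merged = layers.Concatenate()([sp500_feat, vix_feat])
merged = layers.Reshape((1, -1))(merged)
out = layers.LSTM(64, name="lstm_stage2")(merged)
out = layers.Dense(1, name="vix_forecast")(out)

model = tf.keras.Model(inputs=[sp500_in, vix_in], outputs=out)
model.compile(optimizer="adam", loss="mse")
```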
4. Empirical Results
4.1. Experimental Design
This study uses VIX close data and S&P 500 close data obtained from Yahoo Finance, covering the period from 1 January 2020 to 31 December 2024. The dataset contains daily closing prices, which were divided into a training set (80%, 2020–2023) and a test set (20%, 2024). All data were normalized to ensure consistent feature scaling.
The raw closing prices of VIX and S&P 500 were first normalized using Z-score standardization (mean 0, standard deviation 1). Missing values were handled by simple row deletion, as they were sparse and did not significantly affect the time-series continuity. The dataset was split chronologically to prevent future information leakage. The last year (2024) was used exclusively as the out-of-sample test set to ensure realistic evaluation.
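A minimal sketch of this preprocessing is given below, assuming the closes have been exported to a CSV file with hypothetical column names; fitting the standardization on the training period only is one common way to avoid look-ahead bias.

```python
import pandas as pd

# Hypothetical file and column names; daily closes indexed by date
df = pd.read_csv("vix_sp500_2020_2024.csv", index_col="Date", parse_dates=True)
df = df.dropna()                               # sparse missing rows are simply removed

# Chronological 80/20 split: the final year (2024) serves as the out-of-sample test set
split_idx = int(len(df) * 0.8)
train, test = df.iloc[:split_idx], df.iloc[split_idx:]

# Z-score standardization (mean 0, standard deviation 1)
mu, sigma = train.mean(), train.std()
train_z = (train - mu) / sigma
test_z = (test - mu) / sigma
```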
To evaluate the model’s robustness across different time periods, a rolling-window backtest was additionally conducted. This test design exposes the model to various market conditions, including periods of high volatility. In this approach, the training and testing windows were shifted across the time series using a fixed-size scheme (500-day training, 50-day testing); at each step, the model was retrained and evaluated on new out-of-sample data. This method enables a more comprehensive understanding of temporal generalization performance, and the relatively consistent MSE trends across windows suggest strong generalization capabilities.
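The fixed-size rolling scheme can be sketched as follows; fit_and_evaluate is a hypothetical helper standing in for retraining the proposed model on a training window and returning its test-window MSE.

```python
import numpy as np

def rolling_backtest(series, fit_and_evaluate, train_len=500, test_len=50):
    """Slide fixed-size train/test windows across the series and collect out-of-sample MSEs."""
    mses, start = [], 0
    while start + train_len + test_len <= len(series):
        train = series[start : start + train_len]
        test = series[start + train_len : start + train_len + test_len]
        mses.append(fit_and_evaluate(train, test))   # retrain and score on unseen data
        start += test_len                            # shift both windows forward
    return np.array(mses)

# Toy scorer: predict every test point with the last training value (naive persistence)
naive = lambda tr, te: float(np.mean((te - tr[-1]) ** 2))
print(rolling_backtest(np.random.randn(1300), naive))
```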
Figure 5 presents the MSE values for each step of the rolling-window evaluation. The model maintains stable predictive accuracy across most periods, though slight performance degradation is observed during highly volatile intervals (e.g., mid-2021 and mid-2023), consistent with expectations in financial time-series forecasting.
The proposed model consists of two primary stages. First, the S&P 500 and VIX data are decomposed using VMD, generating multiple IMFs that capture different frequency components of the original signals. These IMFs provide multi-scale features that enhance the model’s ability to learn temporal dependencies effectively.
To generate these IMFs, appropriate configuration of the VMD parameters is crucial. In this study, the number of modes $K$ and the penalty parameter $\alpha$ for VMD were set to 3 and 2000, respectively. The value of $K$ was determined empirically through preliminary experiments, as it offered a favorable trade-off between decomposition granularity and predictive performance in our dataset. Meanwhile, $\alpha$ was selected based on prior literature; notably, Zuo et al. [39] employed the same value in a two-stage VMD framework for streamflow forecasting, where it proved effective in achieving stable mode separation without introducing excessive smoothing. Maintaining these fixed parameter values across all experiments ensured consistency when comparing different model configurations. A more detailed sensitivity analysis on $K$ and $\alpha$ will be considered in future work to further examine the robustness of these choices.
Figure 6 illustrates the decomposed IMFs of the VIX and S&P 500 data. The left column represents the IMFs of VIX, while the right column shows those of the S&P 500. The first IMF captures high-frequency fluctuations, while the subsequent IMFs reveal lower-frequency trends and long-term patterns. By leveraging these multi-resolution representations, the model ensures that both short-term volatility and long-term market trends are incorporated into the forecasting process.
In the first processing stage, the aggregated VMD components of the S&P 500 (VMD S&P 500) are passed through an LSTM layer, which captures sequential dependencies and extracts essential trend information. Meanwhile, the aggregated VMD components of the VIX (VMD VIX) are processed separately using a Multi-Head Attention mechanism, enhancing the model’s ability to focus on critical patterns in market volatility. In the second stage, the outputs from these two processes—the LSTM-extracted features from VMD S&P 500 and the Attention-enhanced features from VMD VIX—are combined and fed into another LSTM layer. This final LSTM layer integrates information from both market trends and volatility fluctuations, further refining the VIX prediction.
Furthermore, we applied Random Grid Search to explore the hyperparameter space of the Cascaded LSTM model. This process was conducted separately for the first and second layers of the model, as well as for the Multi-Head Attention mechanism embedded in the second layer. For the first layer, which processes S&P 500 data, we tuned key parameters such as the number of LSTM units, dense units, batch size, and epochs. Likewise, for the second layer—which integrates VIX data along with the output from the first layer—additional parameters, including the number of attention heads and input dimensions, were optimized.
The number of Attention heads was determined via hyperparameter tuning using a Random Search strategy, exploring values between 2 and 8 heads. While an explicit ablation study isolating the number of heads was not separately conducted, the tuning process inherently evaluated different configurations and selected the optimal setting based on validation loss.
The search space for each hyperparameter was predefined based on prior knowledge and computational constraints. The random grid search method allowed efficient exploration of the parameter space without the exhaustive computational cost of a full grid search. The tuned parameters ensured that both the hierarchical and Attention-based features of the model were leveraged effectively.
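A minimal sketch of such a random search is shown below; the search-space values and the build_and_score helper are hypothetical placeholders, while the ranges actually explored and the selected values are those reported in Table 2.

```python
import random

# Hypothetical search space for illustration only (see Table 2 for the actual ranges)
search_space = {
    "lstm_units": [32, 48, 64, 80, 96],
    "dense_units": [16, 32, 48, 64],
    "num_heads": [2, 3, 4, 5, 6, 7, 8],
    "batch_size": [32, 64, 128],
    "epochs": [5, 10, 20],
}

def random_search(build_and_score, n_trials=20, seed=42):
    """Sample configurations at random and keep the one with the lowest validation loss."""
    rng = random.Random(seed)
    best_cfg, best_loss = None, float("inf")
    for _ in range(n_trials):
        cfg = {name: rng.choice(values) for name, values in search_space.items()}
        loss = build_and_score(cfg)   # train on the 80% split, score on the 20% validation split
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

# Toy scorer so the sketch runs end to end; replace with actual model training and validation
best, loss = random_search(lambda cfg: abs(cfg["lstm_units"] - 80) + 0.1 * cfg["epochs"])
print(best, loss)
```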
Table 2 outlines the search space and the optimal configurations for both LSTM layers and the Attention mechanism.
To ensure optimal performance of the proposed model, a random search strategy was employed to explore the hyperparameter space, using an 80/20 train-validation split within the training data (2020–2023).
Table 2 summarizes the search space and the optimal values selected for both the first and second layers of the model. The first layer, which processes VMD-transformed S&P 500 data, was configured with 32 LSTM units and 48 dense units, using a batch size of 64 and 5 training epochs. The second layer, which integrates both VMD VIX features and outputs from the first layer, was tuned with 80 LSTM units, 6 attention heads, and a smaller batch size of 32 to facilitate convergence. Beyond architectural hyperparameters,
Table 3 details the full configuration used during training and evaluation. The model employed the Adam optimizer with a learning rate of 0.001, and dropout (rate = 0.2) was applied in the first LSTM layer to mitigate overfitting. L2 regularization was also applied in the dense layers to enhance generalization. Early stopping with a patience of 5 was used to halt training when validation loss stopped improving. These settings reflect a balance between model complexity and stability, ensuring that the model remains both expressive and robust under real-world market volatility.
To support reproducibility, these configuration details, including optimizer settings, learning rates, dropout usage, and early stopping criteria, are reported in full in Table 3.
To evaluate the effectiveness of the proposed model, we compare it against seven benchmark models. Transformer-based architectures have gained considerable attention in recent time-series forecasting research due to their strong ability to model long-term dependencies. The self-attention mechanism enables dynamic learning of relationships across all time steps in a sequence. However, its computational complexity grows quadratically with the sequence length ($\mathcal{O}(n^2)$), making it computationally expensive for long time series. Moreover, in domains like financial time series where data are often limited, such complex models are prone to overfitting, and their theoretical advantages may not translate into practical performance gains.
Likewise, hybrid CNN-LSTM models can capture both local patterns and temporal dynamics. Nevertheless, their structural complexity and the extensive hyperparameter tuning they require make fair and reproducible comparisons difficult, especially under constrained computational resources.
Given these limitations, we adopt a VMD-Cascaded LSTM architecture with integrated Multi-Head Attention, instead of employing full Transformer or CNN-LSTM models. In our framework, the original time series is first decomposed using VMD, which separates the signal into multiple IMFs that represent different frequency components. These IMFs are then modeled using a Cascaded LSTM structure to hierarchically learn temporal dependencies. Multi-Head Attention is incorporated to allow the model to focus dynamically on informative time steps during prediction. This architecture captures the core benefits of attention mechanisms while maintaining suitability for nonstationary, data-constrained time-series environments. As such, the proposed model achieves a balanced trade-off between computational efficiency and forecasting performance.
These models are chosen to systematically analyze the impact of three key components:
The effect of VMD on predictive performance;
The influence of hyperparameter tuning across layers;
The contribution of the Attention mechanism in improving volatility forecasting.
The baseline models, LSTM1 and LSTM2, test the performance of standard LSTM architectures without VMD. The VMD LSTM models evaluate how feature decomposition affects prediction accuracy, differentiating between configurations that use only VMD S&P 500 data and those incorporating VMD VIX components. Finally, the VMD-Cascaded LSTM models assess the performance gains from cascading multiple LSTM layers and applying the Attention mechanism.
The details of these models are as follows:
LSTM1: Predicts VIX using S&P 500 data with first-layer hyperparameter tuning. This model serves as a baseline LSTM model without VMD, helping to assess the effectiveness of feature extraction from raw S&P 500 data.
LSTM2: Predicts VIX using S&P 500 data with second-layer hyperparameter tuning. Similar to LSTM1 but with different hyperparameter tuning in the second layer, this model helps analyze the influence of hyperparameter optimization across layers.
VMD LSTM 1 (1,1): Predicts VMD-decomposed VIX using VMD-decomposed S&P 500 data with first-layer hyperparameter tuning. This model introduces VMD to both input sources (S&P 500 and VIX) and allows evaluation of whether decomposing financial time-series data enhances prediction accuracy.
VMD LSTM 1 (2,1): Predicts VMD-decomposed VIX using VMD-decomposed S&P 500 data and VMD VIX features with first-layer hyperparameter tuning. This model additionally incorporates decomposed VIX features, enabling an assessment of how including VIX-specific components improves volatility prediction.
VMD LSTM 2 (1,1): Predicts VMD-decomposed VIX using VMD-decomposed S&P 500 data with second-layer hyperparameter tuning. By adjusting hyperparameters in the second layer, this model tests the sensitivity of deep learning parameter configurations in VMD-enhanced architectures.
VMD LSTM 2 (2,1): Predicts VMD-decomposed VIX using VMD-decomposed S&P 500 data and VMD VIX features with second-layer hyperparameter tuning. This model represents a more refined approach by incorporating decomposed VIX features while fine-tuning deeper network layers, assessing the synergy between feature engineering and model optimization.
VMD-Cascaded LSTM: Implements the proposed architecture without the Attention mechanism. This model serves as a key benchmark to measure the contribution of the Attention mechanism in capturing critical market dependencies and improving VIX prediction.
All models were evaluated using three metrics: MSE, RMSE, and MAE. These metrics assess the prediction accuracy by quantifying the difference between predicted and actual values.
Mean Squared Error (MSE): Measures the average squared difference between predicted and actual values. It penalizes larger errors more heavily, making it sensitive to outliers.
Root Mean Squared Error (RMSE): The square root of MSE, providing an error measure in the same unit as the target variable. It retains the sensitivity to large errors while improving interpretability.
Mean Absolute Error (MAE): Measures the average absolute difference between predicted and actual values. Unlike MSE, it treats all errors linearly, making it more robust to outliers.
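For reference, the three metrics can be computed directly from predicted and actual values as in the short sketch below (toy numbers for illustration only):

```python
import numpy as np

def mse(y_true, y_pred):
    return float(np.mean((y_true - y_pred) ** 2))        # penalizes large errors quadratically

def rmse(y_true, y_pred):
    return float(np.sqrt(mse(y_true, y_pred)))           # same units as the target variable

def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))       # treats all errors linearly

y_true = np.array([0.10, 0.12, 0.15, 0.11])
y_pred = np.array([0.11, 0.10, 0.16, 0.12])
print(mse(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```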
4.2. Key Results
In this section, we conduct VIX prediction using the proposed model and benchmark models, presenting the results accordingly. First,
Figure 7 visualizes the prediction results of the proposed model, further demonstrating its ability to capture market volatility patterns effectively. The figure highlights that the VMD-Cascaded LSTM with Attention model closely follows actual VIX movements, indicating its robustness in capturing complex financial time-series dynamics. The combination of VMD, cascaded LSTM, and Attention allows the model to extract relevant features while mitigating noise, leading to improved predictive accuracy.
Furthermore,
Figure 8 illustrates the results of hyperparameter tuning for each benchmark model, providing a visual comparison of their predictive performance. The results show that different architectures respond differently to hyperparameter optimization. While some models exhibit noticeable improvements after tuning, others show limited gains. For example, VMD LSTM 1 (2,1) and VMD LSTM 2 (2,1), which utilize additional input features, demonstrate enhanced performance compared to basic LSTM models. However, even after tuning, their prediction errors remain higher than those of the proposed model, reaffirming the effectiveness of combining VMD with Attention in reducing prediction errors.
Finally,
Table 4 compares the proposed model with seven benchmark models across three evaluation metrics. The proposed model achieves the best performance, demonstrating the effectiveness of combining VMD with Cascaded LSTM and Attention mechanisms. Specifically, the VMD-Cascaded LSTM with Attention model achieves the lowest MSE (0.0018), RMSE (0.0427), and MAE (0.0316) among all models, significantly outperforming the VMD-Cascaded LSTM model, which recorded MSE (0.0532), RMSE (0.2306), and MAE (0.1305). These results emphasize that the integration of Attention further refines the predictive capability of VMD-based LSTM models by allowing the model to focus on key temporal dependencies, which is particularly beneficial in financial time-series forecasting. Moreover, compared to the best-performing benchmark model (VMD-Cascaded LSTM), the proposed model reduces MSE by approximately 96.6%, RMSE by 81.5%, and MAE by 75.8%, demonstrating the substantial improvement achieved by incorporating Attention. This reduction in prediction error highlights the significance of combining VMD with an advanced sequence-learning mechanism.
To evaluate the robustness of this substantial improvement, we conducted three complementary analyses. First, we constructed a bootstrap-based confidence interval for MSE, confirming that the low error value was not a result of statistical outliers. For example, the VMD-Cascaded LSTM with Attention model achieved a 95% confidence interval of [0.0015, 0.0030] for MSE (mean = 0.0023), significantly outperforming the next-best model (VMD-Cascaded LSTM), which recorded [0.0327, 0.1002] (mean = 0.0590). In contrast, basic LSTM baselines such as LSTM1 and LSTM2 yielded much higher intervals—[0.1399, 0.2689] and [0.4812, 0.7402], respectively—demonstrating the superior predictive stability of the proposed model.
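One way to obtain such an interval is a percentile bootstrap over the per-observation squared errors, as sketched below; the exact resampling variant used for the reported intervals is not detailed here, so this sketch reflects our own assumption.

```python
import numpy as np

def bootstrap_mse_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the mean squared error."""
    rng = np.random.default_rng(seed)
    sq_err = (np.asarray(y_true) - np.asarray(y_pred)) ** 2
    boot_means = np.array([
        rng.choice(sq_err, size=len(sq_err), replace=True).mean()  # resample errors with replacement
        for _ in range(n_boot)
    ])
    lo, hi = np.percentile(boot_means, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(sq_err.mean()), (float(lo), float(hi))

# Toy example
rng = np.random.default_rng(1)
y_true = rng.normal(size=200)
y_pred = y_true + rng.normal(scale=0.05, size=200)
print(bootstrap_mse_ci(y_true, y_pred))
```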
Second, a rolling-window out-of-sample test was conducted to verify temporal generalization across different market conditions. The model maintained consistent performance across periods, with only slight degradation during highly volatile intervals.
Finally, prediction residuals were visualized both over time and as a distribution. The absence of systematic drift or extreme skew in the residuals indicates that the model does not overfit noise, further validating the robustness of the results.
These robustness checks collectively confirm that the significant performance improvements are not due to random chance or data leakage, but instead arise from the architectural advantages of combining VMD with Multi-Head Attention and a Cascaded LSTM framework.
The proposed model outperforms all benchmark models due to two key factors. First, VMD effectively decomposes the original signals into distinct IMFs, allowing the model to extract meaningful patterns while reducing noise. Previous studies have demonstrated that VMD improves predictive accuracy across various domains, including water quality forecasting [
15] and wind speed prediction [
16], by enhancing pattern recognition and filtering out noise. In the financial domain, decomposition-based approaches, such as VMD, have been shown to significantly enhance forecasting performance by isolating meaningful components from volatile market data [
17]. These findings further validate the effectiveness of VMD in financial time-series modeling, where capturing complex volatility patterns is essential.
Furthermore, VMD is particularly beneficial in financial markets, where price movements often exhibit nonstationary and highly volatile characteristics. By decomposing raw VIX data into multiple IMFs, the model can distinguish fundamental trends from short-term fluctuations, allowing for more stable and interpretable forecasting results. The effectiveness of this approach aligns with prior studies, demonstrating that decomposition-based forecasting methods significantly improve accuracy in volatile financial environments.
Multi-input structures such as VMD LSTM 1 (2,1) and VMD LSTM 2 (2,1) leverage both S&P 500 and VIX features, further enhancing predictive performance. However, despite these improvements, the absence of Attention in these models limits their ability to selectively focus on critical time steps, leading to suboptimal performance compared to the proposed approach.
Second, the Attention mechanism enables the model to prioritize critical parts of the data, capturing key patterns and temporal dependencies more effectively. Studies have shown that incorporating Attention into LSTM improves stock price prediction by allowing the model to focus on the most informative time steps [
12]. Attention networks have been found to enhance long-term trend forecasting by selectively emphasizing key sequential dependencies [
11]. Additionally, Attention mechanisms have been demonstrated to help capture complex financial patterns, mitigating short-term noise while preserving essential information [
21]. Similarly, an Attention-based LSTM model has been shown to dynamically adjust its focus, leading to more accurate and stable predictions [
5]. This selective attention leads to a significant reduction in prediction errors compared to models without it. By dynamically adjusting its focus on relevant time steps, the model mitigates the impact of short-term noise and better captures long-term volatility trends.
By leveraging both VMD and Attention, the proposed model achieves a more accurate and robust performance in forecasting financial time series. These results provide strong empirical evidence that combining VMD with an Attention-based LSTM framework significantly enhances VIX prediction accuracy, especially in volatile market conditions.
4.2.1. Answer to RQ1 and H1
Table 4 shows that the proposed VMD-Cascaded LSTM with Attention outperforms all benchmark models across all evaluation metrics. These include various configurations of LSTM with and without VMD, confirming the superior predictive power of the proposed hybrid architecture. Although classical models such as GARCH have historically been applied to volatility forecasting, this study focuses on deep learning-based approaches to ensure a consistent comparison framework. These results support H1 and affirm that the integration of VMD and Attention mechanisms significantly improves forecasting accuracy over standard LSTM-based models.
4.2.2. Answer to RQ2 and H2
To assess robustness across varying market regimes, a rolling-window out-of-sample evaluation was conducted. The results, visualized in
Figure 5, demonstrate that the proposed model maintains stable predictive performance across different temporal segments, including both high- and low-volatility periods. In addition, residual analysis (
Figure 9 and
Figure 10) confirms the absence of systematic drift or overfitting to noise. These findings support H2, indicating that the proposed VMD-Cascaded LSTM with Attention model generalizes well across changing market conditions.
5. Concluding Remarks
In this study, we proposed an innovative approach to financial time-series forecasting by introducing the VMD-Cascaded LSTM with Attention model. The primary objective was to enhance the predictive accuracy of the VIX, a critical indicator of market uncertainty, using advanced signal processing and deep learning techniques.
The proposed model incorporates VMD as a preprocessing step to decompose financial time-series data into multiple IMFs, which effectively separate short-term fluctuations from long-term trends. By leveraging these decomposed features, the model captures essential patterns in the data. Furthermore, the Cascaded LSTM with Attention architecture enables the model to focus on key temporal dependencies, significantly improving predictive accuracy.
The experimental results, using VIX and S&P 500 data from January 2020 to December 2024, demonstrated the model’s superiority over seven benchmark models. The proposed model achieved highly competitive error metrics, establishing it as a state-of-the-art approach for financial time-series forecasting. These results highlight the potential of combining signal decomposition techniques and attention mechanisms for addressing the complexities of market volatility.
This research contributes to the field of financial time-series forecasting in three key aspects. First, it introduces a novel combination of VMD and Attention-based LSTM to effectively address the complexities of market volatility modeling. Second, the model outperforms all seven benchmark models, achieving a mean squared error (MSE) of 0.0018 together with the lowest RMSE and MAE. Third, the inclusion of the Attention mechanism substantially enhances performance, as evidenced by the 96.6% reduction in MSE, 81.5% reduction in RMSE, and 75.8% reduction in MAE relative to the VMD-Cascaded LSTM (without Attention). These findings underscore the importance of integrating Attention to capture critical temporal dependencies and improve prediction accuracy.
The proposed VMD-Cascaded LSTM with Attention model enhances financial time-series forecasting by integrating advanced signal decomposition and deep learning techniques. VMD decomposes financial data into IMFs, reducing noise and isolating meaningful patterns. This enables the model to extract essential features while minimizing the impact of irrelevant fluctuations. Cascaded LSTM refines temporal dependencies, improving long-term forecasting accuracy by capturing hierarchical representations of sequential data. Additionally, the incorporation of Multi-Head Attention highlights critical time steps, allowing the model to focus on key market movements more effectively than traditional LSTM models.
Compared to benchmark models, our approach demonstrates superior performance in multiple aspects. VMD enhances noise handling, providing cleaner input signals for predictive modeling. The cascaded structure of LSTM layers ensures stronger pattern recognition by sequentially refining extracted features. Furthermore, the Attention mechanism improves interpretability by dynamically emphasizing influential time points, leading to better predictive accuracy and more explainable results.
Overall, this study highlights the advantages of combining VMD, LSTM, and Attention mechanisms for financial volatility forecasting. The proposed approach improves accuracy, enhances model robustness, and increases interpretability, making it a valuable tool for financial modeling and risk management applications.
This study proposes a VMD-Cascaded LSTM with Attention model aimed at improving the accuracy of market volatility forecasts. By doing so, it is anticipated to aid financial institutions, asset management firms, and individual investors in developing more sophisticated and stable risk management and investment strategies. In particular, proactively preparing for periods when a sharp increase in VIX is anticipated can help minimize potential losses stemming from market shocks, while maximizing portfolio returns. Moreover, the integration of the Attention mechanism enables the model to more effectively capture “critical time points” and “key features” within the market—signals that conventional LSTM models or simple signal decomposition approaches might miss—thereby enhancing predictive precision. As a result, this framework holds promise for broad applicability and heightened practical value across various real-world scenarios, such as trading and risk monitoring systems.
Furthermore, the model proposed in this study can be applied to various fields beyond finance. For instance, in the healthcare sector, machine learning (ML) models facilitate more effective and timely interventions by predicting disease outbreaks, patient outcomes, and treatment responses [
40,
41]. Additionally, the proposed model can be utilized in predicting farmland flooding by integrating ML models with weather forecasts [
42].
Despite its strong predictive capabilities, the proposed model has several limitations. The high computational cost associated with VMD and hyperparameter tuning makes it less suitable for real-time applications or large-scale datasets. Moreover, the model’s performance heavily depends on the quality of input data and may remain sensitive to noise and abrupt market fluctuations. Although VMD contributes to noise reduction, additional measures are required to further enhance robustness.
The inclusion of both VMD and the Attention mechanism also increases architectural complexity, which reduces interpretability and makes it challenging to analyze feature importance or fully understand the model’s internal decision-making process. To address these limitations, future work may explore alternative signal decomposition methods with lower computational overhead and consider simplifying the architecture to improve interpretability and deployment potential.
In terms of efficiency, the basic training and testing of the proposed model were completed within approximately 19 s on an NVIDIA RTX 2060 Super GPU, highlighting its practicality despite the use of Cascaded LSTM and Attention modules. When extended evaluation procedures—such as bootstrap-based confidence interval estimation, rolling-window backtesting, and residual diagnostics—were included, the total runtime increased to approximately 127 s. Nonetheless, this level of computational demand remains manageable and suggests that the model is feasible for deployment in time-sensitive applications, including real-time risk monitoring or automated trading systems. These results collectively demonstrate that the proposed architecture strikes a favorable balance between predictive performance and computational efficiency.
Although the proposed model demonstrated superior forecasting performance compared to all benchmark models, several practical and methodological considerations should be addressed.
First, while rolling-window evaluations were employed to assess temporal generalization, the analysis did not explicitly segment performance under distinct market regimes (e.g., high vs. low volatility). Future research could incorporate scenario-specific testing to identify performance boundaries more precisely.
Second, the hybrid architecture—combining VMD, Cascaded LSTM layers, and Multi-Head Attention—yields considerable gains in accuracy but at the cost of increased model complexity. This complexity may challenge interpretability and require careful tuning, particularly for practitioners with limited computational resources. In such cases, simplified architectures or modular variants of the model may offer more practical alternatives with acceptable performance trade-offs.
For practitioners, the model shows promise for near real-time applications, such as automated trading or dynamic risk monitoring, particularly when GPU acceleration is available. However, robust deployment would require ongoing retraining, anomaly detection, and model validation to adapt to changing market conditions and mitigate the risk of model drift.
These insights underscore that while the model is computationally feasible and highly accurate, its application in operational settings demands strategic consideration of resource availability, interpretability, and lifecycle maintenance.
In conclusion, this study represents a significant step forward in financial time-series forecasting by demonstrating the effectiveness of integrating VMD with an Attention-based LSTM model. The findings not only advance the understanding of market volatility but also lay the groundwork for future innovations in financial modeling. The proposed approach, with its demonstrated accuracy and robustness, holds the potential for broader applications across domains requiring precise time-series forecasting.
While this study focuses on the VIX as a representative volatility index, future research may consider extending the proposed methodology to other indices such as NASDAQ or sector-specific volatility indicators. However, such extensions are not pursued in this work, as the current architecture—designed with dual-inputs from S&P 500 and VIX—is specialized for VIX prediction. Applying this model to external indices would likely require modifications to input structure or model design and is thus reserved as a dedicated topic for follow-up studies.