Article

A Hybrid LMD–ARIMA–Machine Learning Framework for Enhanced Forecasting of Financial Time Series: Evidence from the NASDAQ Composite Index

by Jawaria Nasir 1, Hasnain Iftikhar 2,*, Muhammad Aamir 1, Hasnain Iftikhar 3, Paulo Canas Rodrigues 4 and Mohd Ziaur Rehman 5

1 Department of Statistics, Abdul Wali Khan University Mardan, Mardan 23200, Pakistan
2 Department of Statistics, Quaid-i-Azam University, Islamabad 45320, Pakistan
3 Faculty of Science, Engineering and Built Environment, Deakin University, Burwood, VIC 3125, Australia
4 Department of Statistics, Federal University of Bahia, Salvador 40170-110, Brazil
5 Department of Finance, College of Business Administration, King Saud University, P.O. Box 71115, Riyadh 11587, Saudi Arabia
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(15), 2389; https://doi.org/10.3390/math13152389
Submission received: 6 July 2025 / Revised: 20 July 2025 / Accepted: 21 July 2025 / Published: 25 July 2025

Abstract

This study proposes a novel hybrid forecasting approach designed explicitly for long-horizon financial time series. It incorporates Local Mean Decomposition (LMD), Signal Decomposition (SD), and sophisticated machine learning methods. Applied to the NASDAQ Composite Index, the framework begins by decomposing the original time series into stochastic and deterministic components using the LMD approach. This method effectively separates linear and nonlinear signal structures. The stochastic components are modeled using ARIMA to represent linear temporal dynamics, while the deterministic components are projected using cutting-edge machine learning methods, including XGBoost, Random Forest (RF), Artificial Neural Networks (ANNs), and Support Vector Machines (SVMs). This study employs various statistical metrics to evaluate predictive ability across both short-term noise and long-term trends, including Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), Mean Absolute Percentage Error (MAPE), and the Directional Statistic (DS). Furthermore, the Diebold–Mariano test is used to determine the statistical significance of any forecast improvements. Empirical results demonstrate that the hybrid LMD–ARIMA–SD–XGBoost model consistently outperforms alternative configurations in terms of prediction accuracy and directional consistency. These findings demonstrate the advantages of integrating decomposition-based signal filtering with ensemble machine learning to improve the robustness and generalizability of long-term forecasting. This study presents a scalable and adaptive approach for modeling complex, nonlinear, and high-dimensional time series, thereby contributing to the enhancement of intelligent forecasting systems in the economic and financial sectors. To the best of the authors' knowledge, this is the first study to combine XGBoost and LMD in a hybrid decomposition framework for forecasting long-horizon stock indexes.

1. Introduction

The NASDAQ Composite Index has attracted considerable attention in financial literature due to its high volatility and the presence of technology and growth-oriented firms within its composition. Over the past decade, the index has undergone significant fluctuations, influenced by rapid advancements in the tech sector and major macroeconomic events, such as the COVID-19 pandemic [1,2,3,4,5]. A growing body of research has examined the positive relationship between market sentiment and index volatility, particularly during periods of economic uncertainty [6,7,8]. In this context, decomposition techniques like Local Mean Decomposition (LMD) have been employed to analyze nonlinear and non-stationary financial time series. LMD effectively breaks down price signals into intrinsic components, facilitating adaptive signal analysis [9,10,11,12,13].
Machine learning, particularly deep learning approaches such as Long Short-Term Memory (LSTM) networks and Convolutional Neural Networks (CNNs), has significantly improved the accuracy of stock price forecasts [14,15,16,17,18]. These architectures are well suited for representing the complex temporal and spatial relationships prevalent in financial time series [19,20,21,22,23]. Investor mood has a significant impact on stock market volatility. Hudson and Green [24] developed a sentiment index that combines multiple indicators to enhance market understanding. Baker and Wurgler [25] showed that sentiment affects the cross-section of stock returns, particularly for hard-to-value equities. An additional study highlights the significance of psychological variables in predicting market volatility [26]. Decomposition methods, on the other hand, help identify structural trends in financial time series data. The study in [27] employed decomposition methods to identify significant patterns of variation, revealing both short-term and long-term trends. Empirical Mode Decomposition (EMD) is more successful than classic wavelet-based approaches for evaluating non-stationary and nonlinear data [28]. Notably, combining EMD with local linear quantile regression or support vector regression has been successful in capturing local changes in financial datasets. The advancement of stock forecasting techniques has been driven by machine learning, with LSTM and CNN models gaining considerable attention [29,30,31,32,33]. A comprehensive review of the work in ref. [34] outlined various hybrid methods that integrate ARIMA, LSTM, and CNN models, all of which have shown promise in enhancing prediction accuracy. Comparative analyses indicate that while LSTM models excel at capturing temporal dependencies, CNNs are particularly adept at recognizing spatial patterns [35,36].
Recent research increasingly focuses on hybrid CNN–LSTM models, which aim to combine the strengths of both architectures [37]. These models are often enhanced with attention mechanisms and ensemble techniques, such as XGBoost, to improve their predictive performance further [38]. Additionally, the integration of domain expertise with algorithmic models, known as augmented financial intelligence, has emerged as a novel paradigm in stock market forecasting [39]. In summary, the dynamic nature of economic forecasting is increasingly being understood through a multidimensional lens, incorporating classical econometrics, investor sentiment analysis, signal decomposition methods, and deep learning architectures. Volatility modeling has improved as a result of recent attempts to apply attention-enhanced hybrid models that combine CNN, LSTM, and transformer mechanisms. Our method complements these efforts by pairing decomposition-driven forecasting with XGBoost to enhance long-term trend capture [40].
The rest of the manuscript is arranged in the following way: Section 2 introduces the suggested forecasting approach and its operational framework. Section 3 illustrates an empirical application of the approach utilizing daily data from the NASDAQ Composite Index. Lastly, Section 4 wraps up the study by highlighting key insights and suggesting possible avenues for future investigation.

2. Methodology

This section outlines the methodology employed in this study. A schematic diagram of the complete framework is presented in Figure 1. The proposed approach integrates Signal Decomposition, traditional statistical modeling, and machine learning algorithms. The performance of each model is assessed using a set of standard evaluation metrics.

2.1. Local Mean Decomposition (LMD)

Local Mean Decomposition (LMD) is an adaptive time–frequency analysis technique that decomposes a non-stationary signal $x(l)$ into a finite set of product functions (PFs), each representing an amplitude- and frequency-modulated oscillatory mode.
Let $x(l)$ be a real-valued time series with $l = 1, 2, \ldots, o$. The decomposition process involves the following steps:
  • Identify all local extrema $\{o_i\}$. Compute the local mean $m_i$ and local envelope estimate $\alpha_i$, as follows:
    $$m_i = \frac{o_i + o_{i+1}}{2}, \qquad \alpha_i = \frac{|o_i - o_{i+1}|}{2}$$
  • Smooth $\{m_i\}$ and $\{\alpha_i\}$ using moving average filters to obtain $m_{11}(l)$ and $\alpha_{11}(l)$.
  • Obtain the zero-mean signal, as follows:
    $$h(l) = x(l) - m_{11}(l)$$
  • Normalize using the envelope, as follows:
    $$s_{11}(l) = \frac{h(l)}{\alpha_{11}(l)}$$
  • Repeat the demodulation iteratively to obtain the following:
    $$\alpha_1(l) = \prod_{q=1}^{o} \alpha_{1q}(l), \qquad s_{1o}(l) = \text{final FM signal}$$
  • Construct the first product function, as follows:
    $$PF_1(l) = \alpha_1(l) \cdot s_{1o}(l)$$
The residual signal $u_1(l) = x(l) - PF_1(l)$ is then subjected to the same procedure iteratively until a monotonic residue $u_k(l)$ is obtained. The final reconstruction is
$$x(l) = \sum_{p=1}^{k} PF_p(l) + u_k(l)$$
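For readers who prefer code, the following is a minimal Python sketch of the sifting loop described above, under simplifying assumptions: local means and envelopes are held piecewise-constant between extrema (rather than interpolated), smoothing uses a plain moving average, and boundary effects are ignored. All function names, tolerances, and stopping rules are illustrative, not the paper's implementation.

```python
import numpy as np

def local_extrema(x):
    """Indices of interior local extrema, plus the two endpoints."""
    d = np.diff(x)
    idx = np.where(d[:-1] * d[1:] < 0)[0] + 1
    return np.concatenate(([0], idx, [len(x) - 1]))

def moving_average(y, w):
    return np.convolve(y, np.ones(w) / w, mode="same")

def lmd_product_function(x, max_iter=30, tol=1e-3, smooth=5):
    """Extract one product function PF(l) = alpha_1(l) * s(l) from x."""
    s = x.astype(float).copy()
    envelope = np.ones_like(s)
    for _ in range(max_iter):
        ext = local_extrema(s)
        m = np.zeros_like(s)   # local mean  m_i = (o_i + o_{i+1}) / 2
        a = np.zeros_like(s)   # envelope    alpha_i = |o_i - o_{i+1}| / 2
        for i in range(len(ext) - 1):
            lo, hi = ext[i], ext[i + 1]
            m[lo:hi + 1] = (s[lo] + s[hi]) / 2.0
            a[lo:hi + 1] = abs(s[lo] - s[hi]) / 2.0
        m = moving_average(m, smooth)                      # m_11(l)
        a = np.maximum(moving_average(a, smooth), 1e-12)   # alpha_11(l)
        h = s - m                                          # h(l) = x(l) - m_11(l)
        s = h / a                                          # s_11(l) = h(l) / alpha_11(l)
        envelope *= a                                      # alpha_1(l) = prod_q alpha_1q(l)
        if np.max(np.abs(a - 1.0)) < tol:                  # pure FM signal reached
            break
    return envelope * s                                    # PF(l) = alpha_1(l) * s(l)

def lmd(x, max_pfs=20):
    """Decompose x into product functions and a near-monotonic residue."""
    pfs, resid = [], x.astype(float).copy()
    for _ in range(max_pfs):
        pf = lmd_product_function(resid)
        pfs.append(pf)
        resid = resid - pf
        if len(local_extrema(resid)) <= 3:   # residue has at most one interior extremum
            break
    return pfs, resid
```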

2.2. Autoregressive Integrated Moving Average

Box and Jenkins initially introduced the ARIMA model in the 1970s [41]. It is represented as ARIMA($p, d, q$) and stands for Autoregressive Integrated Moving Average. To work effectively, the model requires that the dependent variable be stationary, which the integration (I) part of the model addresses via differencing.
The ARIMA model includes independent variables that are lagged values of the dependent variable (the autoregressive or AR component) or lagged error terms (the moving average or MA component). Essentially, the ARIMA model is a regression model with a moving average term.
The ARMA representation of the ARIMA model is given by
$$x_i = \alpha_1 x_{i-1} + \alpha_2 x_{i-2} + \cdots + \alpha_p x_{i-p} + \epsilon_i + \beta_1 \epsilon_{i-1} + \beta_2 \epsilon_{i-2} + \cdots + \beta_q \epsilon_{i-q}$$
Here, $p$ and $q$ represent the orders of the autoregressive (AR) and moving average (MA) components, respectively; $i$ denotes the time index; $\alpha$ and $\beta$ are the coefficients of the AR and MA terms; and $\epsilon$ represents the error term. Before applying the ARIMA($p, d, q$) model, the time series must be tested for stationarity. If the data are non-stationary, differencing is required to achieve stationarity.
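As a brief illustration, the following sketch (assuming the statsmodels package) tests a toy series for a unit root with the ADF test, differences accordingly, and fits an ARIMA(1, d, 1) model; the order here is illustrative, not the specification selected in Section 3.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(size=500))        # toy non-stationary series

adf_stat, p_value, *_ = adfuller(y)
d = 1 if p_value > 0.05 else 0             # difference once if a unit root is found

res = ARIMA(y, order=(1, d, 1)).fit()
print(res.summary().tables[1])             # AR/MA coefficient estimates
forecast = res.forecast(steps=10)          # 10-step-ahead point forecasts
```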

2.3. Random Forest

Random Forest (RF) is a robust and widely used ensemble machine learning algorithm based on decision trees, proposed by Breiman [42]. It improves predictive accuracy and generalization by combining multiple decision trees through ensemble methods such as bagging. RF works by constructing a multitude of decision trees, $\{T_k\}_{k=1}^{K}$, during training. For regression tasks, the final prediction $\hat{y}$ is the average of the predictions from all trees, as follows:
$$\hat{y} = \frac{1}{K} \sum_{k=1}^{K} T_k(x)$$
For classification tasks, the final prediction is determined by majority voting, as follows:
$$\hat{y} = \operatorname{mode}\{T_k(x)\}_{k=1}^{K}$$
The Random Forest introduces the following two types of randomness:
  • Bootstrap Aggregation (Bagging): Each decision tree is trained on a bootstrapped subset of the training data.
  • Feature Randomness: At each split in a tree, a random subset of features is considered instead of the full set.
This decorrelation among trees results in low model variance and improved generalization. Random Forest has demonstrated effectiveness in time series prediction, particularly when nonlinear and complex relationships exist. However, it does not inherently model temporal dependencies, so feature engineering or hybrid modeling is often necessary.
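The following sketch illustrates the feature-engineering point above: a univariate series is recast as lagged windows so that a Random Forest can regress the next value on the previous five. The toy data and window length are illustrative; the n_estimators = 100 and max_depth = 7 settings follow Table 4.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def make_lagged(y, n_lags=5):
    """Turn a 1-D series into (lagged window, next value) pairs."""
    X = np.column_stack([y[i:len(y) - n_lags + i] for i in range(n_lags)])
    return X, y[n_lags:]

rng = np.random.default_rng(1)
y = np.sin(np.linspace(0, 20, 400)) + 0.1 * rng.normal(size=400)
X, target = make_lagged(y)

split = int(0.75 * len(X))                  # chronological 75:25 split
rf = RandomForestRegressor(n_estimators=100, max_depth=7, random_state=0)
rf.fit(X[:split], target[:split])
pred = rf.predict(X[split:])                # one-step-ahead test predictions
```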

2.4. Artificial Neural Network

Artificial Neural Networks (ANNs) are inspired by biological neural networks. They consist of layers of interconnected processing units, known as neurons. Each neuron performs a weighted summation, followed by a nonlinear activation function. For a neuron $j$, the output is given by
$$a_j = \phi \left( \sum_{i=1}^{n} w_{ij} x_i + b_j \right)$$
Here, $x_i$ denotes the input features, $w_{ij}$ represents the weights associated with the connections between neurons, $b_j$ is the bias term, and $\phi(\cdot)$ denotes a nonlinear activation function such as ReLU, sigmoid, or tanh. Training is performed using the backpropagation algorithm, which minimizes a loss function $L$ (e.g., mean squared error or cross-entropy) using optimization algorithms such as Stochastic Gradient Descent (SGD), as follows:
$$w_{ij} \leftarrow w_{ij} - \eta \frac{\partial L}{\partial w_{ij}}$$
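A compact sketch using scikit-learn's MLPRegressor with the architecture reported in Table 4 (two hidden layers of 64 and 32 ReLU neurons, Adam with learning rate 0.001); the lagged-feature setup and input scaling are illustrative.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 20, 400)) + 0.1 * rng.normal(size=400)
X = np.column_stack([y[i:len(y) - 5 + i] for i in range(5)])  # 5 lagged inputs
target = y[5:]
split = int(0.75 * len(X))

scaler = StandardScaler().fit(X[:split])    # scale inputs; helps gradient training
ann = MLPRegressor(hidden_layer_sizes=(64, 32), activation="relu",
                   solver="adam", learning_rate_init=0.001,
                   max_iter=1000, random_state=0)
ann.fit(scaler.transform(X[:split]), target[:split])
pred = ann.predict(scaler.transform(X[split:]))
```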

2.5. Support Vector Machine

Support Vector Machines (SVMs) are supervised learning algorithms used for classification and regression tasks. The objective of SVM is to determine the optimal hyperplane that maximizes the margin between different classes in the feature space. For data that can be separated linearly, the decision boundary is characterized as
$$f(x) = w^{T} x + b$$
The optimization problem is
$$\min_{w, b} \; \frac{1}{2} \|w\|^2 \quad \text{subject to} \quad y_i (w^{T} x_i + b) \geq 1, \; \forall i$$
For nonlinearly separable data, the kernel trick maps data to a higher-dimensional space using a kernel function $K(x_i, x_j)$. Common kernels include the following:
  • Linear: $K(x_i, x_j) = x_i^{T} x_j$;
  • Polynomial: $K(x_i, x_j) = (x_i^{T} x_j + c)^d$;
  • RBF: $K(x_i, x_j) = \exp(-\gamma \|x_i - x_j\|^2)$.
The soft margin SVM introduces slack variables $\xi_i$ and a regularization parameter $C$ to allow misclassifications, as follows:
$$\min_{w, b, \xi} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{subject to} \quad y_i (w^{T} x_i + b) \geq 1 - \xi_i, \; \xi_i \geq 0$$
SVMs are effective in high-dimensional spaces and find application in text classification, bioinformatics, medical diagnosis, and financial prediction.
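A short sketch of support vector regression with the RBF kernel and the C = 1, gamma = "scale" settings listed in Table 4; the toy lagged-feature setup mirrors the earlier sketches and is illustrative.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 20, 400)) + 0.1 * rng.normal(size=400)
X = np.column_stack([y[i:len(y) - 5 + i] for i in range(5)])
target = y[5:]
split = int(0.75 * len(X))

scaler = StandardScaler().fit(X[:split])     # SVR is sensitive to feature scale
svr = SVR(kernel="rbf", C=1.0, gamma="scale")  # settings from Table 4
svr.fit(scaler.transform(X[:split]), target[:split])
pred = svr.predict(scaler.transform(X[split:]))
```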

2.6. XGBoost

XGBoost (Extreme Gradient Boosting) is an efficient and scalable implementation of gradient-boosted decision trees. It builds an ensemble of decision trees sequentially, with each new tree correcting the residuals of the previous ensemble.
The model prediction is given by
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \quad f_k \in \mathcal{F}$$
Here, $K$ denotes the total number of trees in the ensemble, $f_k$ represents the $k$-th decision tree, and $\mathcal{F}$ denotes the space of all possible regression trees. The objective function is
$$\text{Obj}(\theta) = \sum_{i=1}^{n} L(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k)$$
Here, $L$ denotes the loss function (e.g., mean squared error), and $\Omega(f_k) = \gamma T + \frac{1}{2} \lambda \|w\|^2$ is the regularization term that penalizes model complexity, where $T$ is the number of leaves in the tree $f_k$, $w$ represents the leaf weights, and $\gamma$ and $\lambda$ are regularization parameters. XGBoost is widely used in data science competitions and applications requiring high prediction accuracy. It supports parallel computation and regularization and is robust to overfitting.
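A minimal sketch using the xgboost package with the Table 4 hyperparameters; reg_lambda corresponds to the $\lambda$ penalty in $\Omega(f_k)$ above, and the toy data are illustrative.

```python
import numpy as np
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 20, 400)) + 0.1 * rng.normal(size=400)
X = np.column_stack([y[i:len(y) - 5 + i] for i in range(5)])
target = y[5:]
split = int(0.75 * len(X))

# Table 4 settings; reg_lambda is the lambda regularization term above.
xgb = XGBRegressor(n_estimators=300, max_depth=5, learning_rate=0.1,
                   subsample=0.8, reg_lambda=1.0, random_state=0)
xgb.fit(X[:split], target[:split])
pred = xgb.predict(X[split:])
```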

2.7. Evaluation Metrics

In this study, the performance of the forecasting models is evaluated using three standard error metrics: Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and Mean Absolute Percentage Error (MAPE). RMSE and MAE assess the magnitude of forecast errors, whereas MAPE expresses the error in relative (percentage) terms. These evaluation metrics are widely adopted in the time series forecasting literature [43,44,45].
The formulas for the three metrics are defined as
$$\text{MAE} = \frac{1}{o} \sum_{l=1}^{o} |Y_l - F_l|, \quad \text{RMSE} = \sqrt{\frac{1}{o} \sum_{l=1}^{o} (Y_l - F_l)^2}, \quad \text{MAPE} = \frac{100}{o} \sum_{l=1}^{o} \left| \frac{Y_l - F_l}{Y_l} \right|$$
where $Y_l$ and $F_l$ are the actual and forecasted values, and $o$ is the number of observations.
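These metrics translate directly into NumPy, as in the following sketch (the MAPE helper assumes the actual values contain no zeros):

```python
import numpy as np

def mae(y, f):
    """Mean Absolute Error."""
    return np.mean(np.abs(y - f))

def rmse(y, f):
    """Root Mean Squared Error."""
    return np.sqrt(np.mean((y - f) ** 2))

def mape(y, f):
    """Mean Absolute Percentage Error; assumes y has no zeros."""
    return 100.0 * np.mean(np.abs((y - f) / y))
```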

2.8. Diebold–Mariano Test

To statistically test whether two forecasts $F_l^{(1)}$ and $F_l^{(2)}$ have different accuracies, the Diebold–Mariano (DM) test statistic is computed as
$$DM = \frac{\bar{d}}{\sqrt{\left( \hat{\gamma}_0 + 2 \sum_{i=1}^{h-1} \hat{\gamma}_i \right) / o}}$$
with
$$\bar{d} = \frac{1}{o} \sum_{l=1}^{o} d_l, \quad d_l = \left( e_l^{(1)} \right)^2 - \left( e_l^{(2)} \right)^2$$
where $e_l^{(j)} = Y_l - F_l^{(j)}$, and $\hat{\gamma}_i$ is the sample autocovariance of $d_l$ at lag $i$. The null hypothesis $H_0$ asserts equal predictive accuracy.
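A minimal implementation of this statistic under squared-error loss might look as follows; the autocovariance estimator and the normal-approximation p-value are standard simplifications, not necessarily the exact routine used in the paper.

```python
import numpy as np
from scipy.stats import norm

def diebold_mariano(e1, e2, h=1):
    """DM statistic and two-sided p-value for equal predictive accuracy,
    given h-step-ahead forecast error arrays e1 and e2."""
    d = e1 ** 2 - e2 ** 2                          # loss differential d_l
    o, d_bar = len(d), np.mean(e1 ** 2 - e2 ** 2)
    # sample autocovariances of d at lags 0 .. h-1
    gamma = [np.sum((d[i:] - d_bar) * (d[:o - i] - d_bar)) / o for i in range(h)]
    var_d_bar = (gamma[0] + 2.0 * sum(gamma[1:])) / o
    dm = d_bar / np.sqrt(var_d_bar)
    return dm, 2.0 * norm.sf(abs(dm))              # normal-approximation p-value
```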

2.9. Directional Statistic

Directional accuracy is assessed using the Directional Statistic (DS), defined as
$$\text{DS} = \frac{100}{o-1} \sum_{l=2}^{o} D_l, \quad D_l = \begin{cases} 1, & \text{if } (Y_l - Y_{l-1})(F_l - F_{l-1}) \geq 0 \\ 0, & \text{otherwise} \end{cases}$$
A higher DS implies a better capability to predict the correct movement direction of the target variable.
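In code, DS reduces to comparing the signs of successive differences, as in this short sketch:

```python
import numpy as np

def directional_statistic(y, f):
    """Percentage of steps where forecast and actual move in the same direction."""
    agree = (np.diff(y) * np.diff(f)) >= 0   # the indicator D_l
    return 100.0 * np.mean(agree)
```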

3. Empirical Analysis

3.1. Statistical Analysis of the Data

This study utilizes the daily closing prices of the NASDAQ Composite Index over a 5-year period from 14 May 2020 to 13 May 2025. The data were obtained from Yahoo Finance and are depicted in Figure 2. During this period, the NASDAQ Composite Index underwent substantial fluctuations influenced by macroeconomic conditions, geopolitical events, and the rapidly evolving technology sector.
A notable upward trend was observed from mid-2020 to late 2021, primarily driven by a surge in digitalization induced by the pandemic and a boom in technology stocks. This bullish phase was followed by a sharp correction in 2022, driven by escalating inflation, the Federal Reserve’s interest rate hikes, and heightened global uncertainty. Recovery signals began to emerge in 2023 and gained momentum through 2024 and early 2025, bolstered by improved investor sentiment and accommodative monetary policies. Despite a modest decline in volatility in 2023 and 2024, the overall risk remained higher compared with pre-pandemic levels, reflecting ongoing global uncertainty and innovation-driven dynamics in tech industries. The most significant drawdown occurred between late 2021 and mid-2022, during which the index lost more than 25% of its peak value. A sustained recovery followed, particularly supported by investor enthusiasm in the artificial intelligence and semiconductor sectors.
The summary statistics in Table 1 offer a comprehensive picture of the NASDAQ Composite Price Series for the analyzed timeframe. The index exhibits substantial variability, with values ranging from 8944 to 20,174. This wide range points to significant swings in market performance, consistent with the fundamentally dynamic and turbulent nature of financial time series data. The central tendency measures show a mean of 14,103 and a median of 13,772, indicating a roughly symmetric distribution with a slight skew to the right. The interquartile range further supports this conclusion, with the first quartile (Q1) at 11,903 and the third quartile (Q3) at 15,798, meaning that the middle 50% of the data points fall within this range. The fact that the mean exceeds the median indicates the presence of high-end outliers, which may have inflated the average. Formal unit root tests were employed to examine the time series properties of the NASDAQ Composite Index. The Augmented Dickey–Fuller (ADF) and Phillips–Perron (PP) tests returned p-values of 0.6878 and 0.7324, respectively. Both values are far higher than the usual significance levels (e.g., 0.01, 0.05, or 0.10), so the null hypothesis of a unit root cannot be rejected. This provides strong statistical evidence that the NASDAQ Composite Series is non-stationary in its current form, consistent with the typical behavior of many financial market indices, which often display trends and persistence over time.
Moreover, the Jarque–Bera (JB) test for normality produces a p-value of $5.251 \times 10^{-14}$, a notably small number that leads to rejection of the null hypothesis that the series follows a normal distribution. The results indicate that the NASDAQ Composite Index distribution deviates significantly from normality, most likely due to skewness and/or leptokurtosis (long tails). Recognizing these non-normal traits is crucial, as they may affect the assumptions and efficacy of various statistical and econometric models. Overall, our findings indicate that the NASDAQ Composite Price Series is non-stationary and non-normally distributed, which has significant consequences for modeling and forecasting. To achieve stationarity, transformations such as differencing or converting to log returns may be necessary. It may also be advantageous to utilize robust or non-parametric forecasting approaches to handle the observed deviations from normality.

3.2. Data Decomposition and Reconstruction

This research builds on the signal decomposition methods discussed in Section 1 by analyzing and reconstructing the NASDAQ Price Series using LMD. The objective is to improve forecasting accuracy by dividing the data into deterministic and stochastic components. These components are then modeled with different approaches: ARIMA for the stochastic components and machine learning methods such as XGBoost for the deterministic patterns. The LMD technique is set up with a maximum of 20 product functions (PFs) and 30 iterations. The resulting decomposition produces 6 significant PFs along with a residual component, as illustrated in Figure 3. These components exhibit varying degrees of trend and randomness.
Based on Average Mutual Information (AMI) analysis and correlation inspection, the first two PFs are identified as stochastic components due to their irregular and high-frequency patterns. The remaining PFs (PF3–PF7) exhibit more regular and trend-like behavior and are therefore classified as deterministic. These selected components are aggregated into three composite series: stochastic, deterministic, and residual.
The stochastic and deterministic components are modeled independently: ARIMA for the former and XGBoost for the latter. The reconstructed signals are then integrated to form the Added Product Function (APF), which is visualized in Figure 4 and Figure 5. This decomposition-enhanced framework facilitates more accurate and interpretable forecasts by leveraging the distinct characteristics of each component.
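To make the recombination step concrete, the following sketch models the stochastic sum (PF1 + PF2) with ARIMA and the deterministic sum (remaining PFs plus residue) with a recursive XGBoost forecaster, then adds the two forecasts. The toy stand-in components, ARIMA order, lag length, and horizon are illustrative assumptions; in practice pfs and resid would come from an LMD routine such as the Section 2.1 sketch.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA
from xgboost import XGBRegressor

def xgb_recursive_forecast(series, steps, n_lags=5):
    """Recursive multi-step XGBoost forecast on lagged windows."""
    X = np.column_stack([series[i:len(series) - n_lags + i] for i in range(n_lags)])
    model = XGBRegressor(n_estimators=300, max_depth=5, learning_rate=0.1)
    model.fit(X, series[n_lags:])
    window, out = list(series[-n_lags:]), []
    for _ in range(steps):
        nxt = float(model.predict(np.array(window[-n_lags:], dtype=float)[None, :])[0])
        out.append(nxt)
        window.append(nxt)
    return np.array(out)

# Toy stand-ins for the LMD outputs (replace with real PFs and residue):
t = np.linspace(0, 20, 400)
pfs = [0.3 * np.sin(15 * t), 0.2 * np.sin(8 * t), np.sin(t)]
resid = 0.01 * t

stochastic = pfs[0] + pfs[1]              # PF1 + PF2: irregular, high-frequency
deterministic = sum(pfs[2:]) + resid      # remaining PFs plus residue: trend-like

arima_fc = ARIMA(stochastic, order=(1, 1, 1)).fit().forecast(steps=10)
combined_fc = arima_fc + xgb_recursive_forecast(deterministic, steps=10)  # APF forecast
```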

3.3. Stochastic and Deterministic Modeling Using ARIMA

Specifically, PF3, PF4, PF5, PF6, and PF7 are identified as deterministic components, while PF1 and PF2 are categorized as stochastic. These components are stored both individually and collectively to construct the stochastic–deterministic (SD) architecture. The stationarity of time series data, a prerequisite for applying ARMA/ARIMA models, is achieved by differencing the series progressively [46,47].
To assess stationarity, the Augmented Dickey–Fuller (ADF) test is performed [48]. Once a stationary series is confirmed, the autoregressive (AR) and moving average (MA) terms are determined from the autocorrelation function (ACF) and partial autocorrelation function (PACF) plots. The models are trained and validated using a 75:25 train–test split, with the Adam optimization algorithm used for the neural network components. Model adequacy is evaluated using the Ljung–Box (LB) test. To assess the effect of component classification, we performed an ablation experiment in which PF3–PF7 (previously classified as deterministic) were swapped with PF1–PF2 (previously treated as stochastic), and the model was retrained accordingly. This reversal led to a significant decline in the hybrid model’s performance, with directional accuracy (DS) dropping to 81.4% and RMSE increasing by almost 21%. This notable drop demonstrates that the AMI-based division of components into deterministic and stochastic groups was not only reasonable but also essential for attaining the best forecasting results. The findings support the decomposition approach used in our framework by demonstrating that misclassifying signal components has a noticeable effect on prediction accuracy. The ARIMA model was implemented in R version 4.3.3 and fitted to the stochastic components PF1, PF2, and SPF (summed product function). Residual plots and ACF lags for the fitted models are shown in Figure 6 and Figure 7.
Table 2 displays the findings from the Augmented Dickey–Fuller (ADF) and Phillips–Perron (PP) tests applied to three forecasting inputs: PF1, PF2, and SPF. These tests evaluate the stationarity of time series data, where the null hypothesis posits that the series has a unit root (i.e., is non-stationary). As indicated in the table, the ADF test statistics for PF1, PF2, and SPF are −4.325, −4.248, and −1.756, respectively, with p-values of 0.535, 0.575, and 0.635. Likewise, the PP test statistics for PF1 and PF2 are 6.526 and 6.854, both associated with exceptionally high p-values of 0.999, while the PP statistic for SPF is −6.524 with a p-value of 0.700. Although the asterisk (*) in the table implies rejection of the null hypothesis at the 5% significance level, the reported p-values contradict this notation: rejection requires p-values below 0.05, and all the reported p-values far exceed this limit. Taken at face value, the p-values therefore indicate that none of the series (PF1, PF2, or SPF) is stationary in its current form, meaning that the null hypothesis of non-stationarity cannot be rejected and that these series require differencing or transformation before being included in forecasting models. Accordingly, PF1 underwent a single difference (d = 1) to obtain stationarity, whereas PF2 and SPF required second-order differencing (d = 2). Unit root testing guided this transformation, and ACF/PACF diagnostics verified that every component satisfied the stationarity assumption needed for ARIMA modeling.
Table 3 presents a summary of the performance of the ARIMA models applied to the three components—PF1, PF2, and SPF—based on various standard accuracy and diagnostic metrics. Each component was fitted with a distinct ARIMA specification: (1,1,1) for PF1, (1,2,2) for PF2, and (2,2,1) for SPF. The evaluation of the models includes MAE, MAPE, RMSE, the Akaike Information Criterion (AIC), and the Ljung–Box (L-B) test to assess autocorrelation in the residuals. Among the three components, PF1 exhibits the smallest MAE (0.0015) and MAPE (0.000015), indicating very minor forecast errors in both absolute and relative terms. Nonetheless, its RMSE (0.356) is slightly greater than that of PF2 (0.343) and SPF (0.2556), indicating greater fluctuation in its forecast error magnitudes. Interestingly, SPF, despite having the highest MAE and MAPE, achieves the lowest RMSE, suggesting that while it may generate larger average errors, its overall error variance is lower, which could imply better performance in capturing the overall trend or shape of the data.
The values of AIC, which inform model selection by balancing fit quality with model complexity, strongly favor the SPF model (−752.132), indicating that it is the most efficient and simplest among the three. A lower AIC represents better model quality, and the substantially lower AIC for SPF suggests a superior fit with reduced information loss. According to the Ljung–Box test statistics, used to identify autocorrelation in the residuals, p-values of 0.003 were recorded for PF1 and SPF, and 0.02 for PF2. These low p-values (all under 0.05) indicate significant autocorrelation in the residuals across all models, leading to the rejection of the null hypothesis of no autocorrelation. This points to a situation where, despite the models appearing to fit the data well based on the error metrics and AIC, the residuals retain a structured pattern that the models do not fully capture, suggesting possible model mis-specification or omitted dynamics. In conclusion, while each of the ARIMA models produces relatively low error metrics and strong AIC values—especially for SPF—the presence of autocorrelation in the residuals highlights that these models could benefit from refinement or the integration of additional components (such as nonlinear models) to capture the underlying patterns in the data more effectively.

3.4. Machine Learning Analysis

The residuals obtained from the LMD–SD–ARIMA and LMD–ARIMA models are further modeled using machine learning algorithms, including Random Forest (RF), Artificial Neural Networks (ANNs), Support Vector Machines (SVMs), and eXtreme Gradient Boosting (XGBoost). These hybrid models aim to enhance the prediction of complex, nonlinear patterns present in the residual components. Note that the input dimensionality and distribution change when the LMD–ANN and LMD–SD–ANN models receive decomposed signals, which contain less noise and exhibit distinct frequency characteristics. To optimize performance for each variant, the ANN architecture (e.g., number of neurons) and the number of training epochs were adjusted accordingly.
In addition, to ensure the stability and reliability of our hybrid forecasting framework, we conducted a series of model validation tests. The machine learning components (XGBoost, Random Forest, ANN, and SVM) were evaluated using a 5-fold cross-validation approach. The results demonstrated minimal variation in performance metrics, with RMSE and MAE fluctuating within ±3%, indicating consistent model behavior across different data splits.
Although the LMD–ARIMA–SD–XGBoost model demonstrated a moderate increase in computational complexity—approximately 1.8 times higher than that of a standard ARIMA model—it remained highly scalable, thanks to XGBoost’s inherent parallelization capabilities, particularly when leveraged with GPU acceleration. Additionally, a grid search-based sensitivity analysis of key hyperparameters (e.g., max_depth, n_estimators) confirmed the model’s robustness, with prediction accuracy remaining largely stable within a ±10% range of parameter variation. For further details, refer to Table 4.
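The validation protocol described above can be sketched as follows. Note one substitution: the text reports plain 5-fold cross-validation, whereas this sketch uses time-ordered splits (TimeSeriesSplit) to avoid look-ahead in a forecasting setting; the grid values and toy data are illustrative.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from xgboost import XGBRegressor

rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 20, 400)) + 0.1 * rng.normal(size=400)
X = np.column_stack([y[i:len(y) - 5 + i] for i in range(5)])   # lagged inputs
target = y[5:]

# Small grid over the two hyperparameters named in the sensitivity analysis.
param_grid = {"max_depth": [3, 5, 7], "n_estimators": [200, 300, 400]}
search = GridSearchCV(XGBRegressor(learning_rate=0.1, subsample=0.8),
                      param_grid,
                      cv=TimeSeriesSplit(n_splits=5),           # 5 time-ordered folds
                      scoring="neg_root_mean_squared_error")
search.fit(X, target)
print(search.best_params_, -search.best_score_)                 # best grid point, CV RMSE
```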

3.5. Outcomes and Discussion

A novel forecasting architecture that combines multiple methods, referred to as LMD–ARIMA–SD–XGBoost, has been developed and assessed alongside various alternative approaches. The effectiveness of each model is measured using standard accuracy metrics: MAE, MAPE, RMSE, and DS. The comparative performance of all forecasting models is shown in Table 5.
The findings shown in Table 5 and visually supported in Figure 8 provide a thorough evaluation of forecasting performance for the NASDAQ Composite Index across a wide range of models, which include traditional time series techniques, standalone machine learning methods, and various hybrid systems. The performance evaluation is based on four essential metrics: MAE, MAPE, RMSE, and DS. Lower error values and a higher DS indicate better forecasting accuracy and directional reliability. Among all the models analyzed, the LMD–ARIMA–SD–XGBoost hybrid model stands out as the most effective. It achieves the lowest MAE (0.0254), MAPE (0.000254), and RMSE (0.4562), as well as the highest directional accuracy with a DS of 92.51%. These findings highlight the hybrid model’s exceptional ability to capture complex dynamics, particularly in modeling nonlinearities, volatility patterns, and directional movements within financial time series. The model’s strength lies in the effective integration of Local Mean Decomposition (LMD) and Signal Decomposition (SD), which separate different frequency and trend components of the series, combined with ARIMA’s linear trend modeling and XGBoost’s expertise in recognizing nonlinear patterns. This multistage decomposition and ensemble learning approach allows the model to effectively learn both short- and long-term dependencies.
In contrast, traditional statistical methods, such as ARIMA, while competitive in terms of MAE (0.2154), struggle with RMSE (0.973) and DS (76.54%), highlighting their limitations in capturing volatile and directional behaviors. Likewise, standalone machine learning methods such as SVM and ANN perform inadequately. The SVM model, for example, shows the highest MAE (1.1587) and MAPE (0.011587), along with a low DS of 53.52%, indicating both poor predictive performance and weak directional consistency. These results reinforce the notion that single-method approaches are inadequate for modeling the complex nature of financial markets. Hybrid configurations that incorporate LMD and SD with machine learning algorithms exhibit noticeable performance improvements over their standalone versions. For example, LMD–ARIMA–SD–ANN significantly reduces error metrics with an RMSE of 0.653 and a DS of 89.54%, clearly outperforming both standalone ANN and its simpler hybrid variants. Similarly, LMD–ARIMA–SD–RF outperforms standard Random Forest models, achieving a substantial reduction in MAE and RMSE and improving DS to 80.53%. The LMD–ARIMA–SD–SVM variant also shows significant progress over its base model, decreasing RMSE to 0.564 and increasing DS to 80.54%. However, its MAE and MAPE remain relatively higher than those of other hybrid counterparts, suggesting residual limitations in SVM’s adaptability to decomposed signals.
Figure 8 presents bar plots of key accuracy metrics—MAE, MAPE, and RMSE (top panel) and directional accuracy (DS%) (bottom panel)—for all the forecasting models evaluated in this study. The x-axis represents the various models, ranging from baseline statistical and machine learning approaches to the proposed hybrid configurations. In the top panel, lower bar heights correspond to better predictive performance in terms of error reduction. Simpler models such as ARIMA, XGBoost, SVM, and ANN exhibit relatively higher bars across all three metrics, indicating inferior forecasting accuracy. On the other hand, hybrid models that incorporate decomposition techniques, particularly the LMD–ARIMA–SD–XGBoost framework, consistently achieve the lowest values of MAE, MAPE, and RMSE. This reflects the model’s ability to capture both linear and nonlinear patterns through Signal Decomposition and the integration of machine learning. Meanwhile, the bottom panel of Figure 8 shows the directional accuracy (DS%), which measures the model’s ability to predict the correct direction of change. Once again, the LMD–ARIMA–SD–XGBoost model outperforms the others by achieving the highest DS%, reinforcing its robustness and practical utility for directional decision making in time series forecasting. Thus, the visual evidence from the bar plots complements the statistical results and underscores the superior performance of the proposed hybrid model across both error minimization and directional forecasting metrics.
In conclusion, both the quantitative and visual evidence indicate that the LMD–ARIMA–SD–XGBoost model is the most robust and reliable forecasting framework for the NASDAQ Composite Index. The hybrid approach’s ability to decompose and reassemble signal structures, paired with powerful statistical and machine learning techniques, ensures superior accuracy and directional predictability. This model sets a new benchmark for financial time series forecasting and offers strong potential for applications in other complex, high-frequency domains such as electricity demand and economic indicators.

3.6. Performance Summary and Model Comparison

Both results (Table 5 and Figure 8) highlight that the proposed LMD–ARIMA–SD–XGBoost model achieved the best performance among all tested models. It consistently yielded the lowest values for all three error metrics (MAE, MAPE, and RMSE), demonstrating superior predictive power. Specifically, the model attained an MAE of 0.0254, an MAPE of 0.000254, and an RMSE of 0.4562, corresponding to an accuracy (measured as the inverse of RMSE, i.e., $1/\text{RMSE}$) of approximately 2.19.
In contrast, the SVM and LMD-SVM models produced significantly higher MAE and MAPE values, suggesting they are less suited for accurately modeling the complex and nonlinear structure of the NASDAQ Composite Index in this context.
Models that incorporate Local Mean Decomposition (LMD) and Signal Decomposition (SD) consistently outperform their baseline counterparts. For instance, the LMD–ARIMA–SD–ANN and LMD–ARIMA–SD–RF models demonstrate substantial improvements in prediction accuracy compared with their non-decomposed versions, validating the effectiveness of hybridizing classical statistical methods with signal processing and advanced machine learning.
Table 6 reports and Figure 9 visualizes model accuracy using the inverse of RMSE as the metric, allowing for easier comparison across methods. Higher values denote better forecasting performance.
These outcomes summarize the key insights derived from the forecasting performance results, highlighting the relative strengths of different modeling strategies. The LMD–ARIMA–SD–XGBoost model stands out as the best-performing forecasting approach, achieving an exceptionally high directional accuracy of approximately 92.51% and minimal forecast error metrics, including the lowest Mean Absolute Error (MAE) of 0.0254 and Mean Absolute Percentage Error (MAPE) of 0.000254. This indicates the model’s outstanding precision in both magnitude prediction and directional change detection.
A prominent trend observed is that models enhanced through Local Mean Decomposition (LMD) and Signal Decomposition (SD) consistently outperform their traditional or non-decomposed variants. These preprocessing techniques enable the extraction of meaningful patterns by isolating deterministic and stochastic components, allowing the forecasting models to capture both short-term fluctuations and underlying trends more effectively. Hybrid variations, such as LMD–ARIMA–SD–ANN and LMD–ARIMA–SD–RF, show considerable gains in predicting accuracy when compared with solo ANN and Random Forest models. These improvements demonstrate that integrating decomposition approaches with statistical and machine learning algorithms enables models to handle the complexity and irregularities inherent in financial time series data better.
When employed alone, traditional approaches such as ARIMA, SVM, and ANN frequently fall short of the efficacy of hybrid frameworks. Their lower accuracy ratings and greater error metrics suggest a restricted capacity to manage financial market data that are nonlinear, non-stationary, and frequently noisy. These approaches often presume linearity or lack the architectural flexibility necessary to adapt to complicated data structures, resulting in inferior performance in dynamic financial forecasting tasks. As a result, the findings support the use of decomposition-augmented hybrid models to provide accurate and robust predictions in economic time series research. In addition, to determine the significance of each feature, SHAP (SHapley Additive exPlanations) values were computed. The findings indicate that the most significant contribution to prediction accuracy came from the mid-frequency deterministic components (PF3–PF5), with PF1 volatility coming in second. This improves our model’s interpretability and reinforces the outcomes.

4. Conclusions and Future Work

This study presents a hybrid forecasting approach that combines Local Mean Decomposition (LMD), signal categorization, and advanced machine learning algorithms to enhance the accuracy of NASDAQ Composite Index forecasts. The LMD–ARIMA–SD–XGBoost model outperformed all others, achieving the lowest RMSE, MAE, and MAPE, as well as the highest directional accuracy. These findings underscore the importance of partitioning financial time series into deterministic and stochastic components, enabling the successful modeling of both linear and nonlinear dynamics. Incorporating XGBoost into this framework has been shown to be beneficial in detecting complex patterns and handling unexpected market movements, which are common in high-frequency financial datasets. The findings highlight the advantages of integrating statistical models with machine learning to create more accurate forecasting systems.
Future research should expand this framework to incorporate multivariate time series and cross-market datasets to evaluate its generalizability across broader financial contexts. For real-time deployment, it is essential to conduct rigorous statistical significance testing and assess the computational complexity of the proposed models. Moreover, exploring alternative signal decomposition techniques, advanced hyperparameter optimization strategies, and ensemble learning methods may further enhance predictive performance and model robustness. Improving model interpretability remains a key objective, as it is critical for fostering trust in machine learning applications within the financial domain. Notably, this study distinguishes itself from existing hybrid and deep learning-based approaches by being the first to integrate LMD and XGBoost within a hybrid decomposition framework, explicitly targeting the long-horizon forecasting of nonlinear financial time series.

Author Contributions

Conceptualization, methodology, and software, J.N., M.A. and H.I. (Hasnain Iftikhar 1); validation, J.N., H.I. (Hasnain Iftikhar 1), M.A., H.I. (Hasnain Iftikhar 2), M.Z.R. and P.C.R.; formal analysis, H.I. (Hasnain Iftikhar 1) and M.A.; investigation, H.I. (Hasnain Iftikhar 1), J.N., M.A. and H.I. (Hasnain Iftikhar 2); resources, H.I. (Hasnain Iftikhar 1), M.Z.R. and P.C.R.; data curation, H.I. (Hasnain Iftikhar 1), H.I. (Hasnain Iftikhar 2), M.Z.R. and J.N.; writing—original draft preparation and writing—review and editing, J.N., H.I. (Hasnain Iftikhar 1), M.A., H.I. (Hasnain Iftikhar 2), P.C.R. and M.Z.R.; visualization, M.A., P.C.R. and H.I. (Hasnain Iftikhar 2); supervision, M.A., P.C.R. and H.I. (Hasnain Iftikhar 1); project administration, H.I. (Hasnain Iftikhar 1), M.A., H.I. (Hasnain Iftikhar 2) and P.C.R.; funding acquisition, M.Z.R. and P.C.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Ongoing Research Funding Program (ORF-2025-1038), King Saud University, Riyadh, Saudi Arabia.

Data Availability Statement

The data presented in this study are openly available on Yahoo Finance at https://finance.yahoo.com (accessed on 25 November 2024).

Acknowledgments

The authors acknowledge and appreciate the Ongoing Research Funding Program (ORF-2025-1038), King Saud University, Riyadh, Saudi Arabia.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, Y.; Zhou, D. Volatility forecasting of NASDAQ composite index using GARCH models. J. Financ. Mark. 2020, 25, 321–336. [Google Scholar]
  2. Dong, X.; Yu, M. Time-varying effects of macro shocks on cross-border capital flows in China’s bond market. Int. Rev. Econ. Financ. 2024, 96, 103720. [Google Scholar] [CrossRef]
  3. Chatrath, A.; Christie-David, R.; Ramchander, S. Time-varying risk premia in the futures markets: Evidence from the S&P 500 and NASDAQ 100 indexes. J. Financ. Res. 1995, 18, 381–395. [Google Scholar]
  4. Li, H.; Xia, C.; Wang, T.; Wang, Z.; Cui, P.; Li, X. GRASS: Learning Spatial-Temporal Properties From Chainlike Cascade Data for Microscopic Diffusion Prediction. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 16313–16327. [Google Scholar] [CrossRef] [PubMed]
  5. Dong, X.; Yu, M. Green bond issuance and green innovation: Evidence from China’s energy industry. Int. Rev. Financ. Anal. 2024, 94, 103281. [Google Scholar] [CrossRef]
  6. Ifleh, A.; El Kabbouri, M. Stock price indices prediction combining deep learning algorithms and selected technical indicators based on correlation. Arab Gulf J. Sci. Res. 2024, 42, 1237–1256. [Google Scholar] [CrossRef]
  7. Lin, C.Y.; Marques, J.A.L. Stock market prediction using artificial intelligence: A systematic review of systematic reviews. Soc. Sci. Humanit. Open 2024, 9, 100864. [Google Scholar] [CrossRef]
  8. Alam, K.; Bhuiyan, M.H.; Ul Haque, I.; Monir, M.F.; Ahmed, T. Enhancing stock market prediction: A robust LSTM-DNN model analysis on 26 real-life datasets. IEEE Access 2024, 12, 122757–122768. [Google Scholar] [CrossRef]
  9. Smith, J.S. The local mean decomposition and its application to EEG signal processing. Biomed. Signal Process. Control 2005, 1, 1–8. [Google Scholar]
  10. Huang, N.E.; Shen, Z.; Long, S.R.; Wu, M.C.; Shih, H.H.; Zheng, Q.; Yen, N.C.; Tung, C.C.; Liu, H.H. The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis. Proc. R. Soc. Lond. A 1998, 454, 903–995. [Google Scholar] [CrossRef]
  11. Khan, F.; Iftikhar, H.; Khan, I.; Rodrigues, P.C.; Alharbi, A.A.; Allohibi, J. A Hybrid Vector Autoregressive Model for Accurate Macroeconomic Forecasting: An Application to the US Economy. Mathematics 2025, 13, 1706. [Google Scholar] [CrossRef]
  12. Zhang, Q.; Liu, G.; Yang, Y. Application of local mean decomposition and permutation entropy in fault diagnosis. Mech. Syst. Signal Process. 2018, 101, 404–415. [Google Scholar]
  13. Fischer, T.; Krauss, C. Deep learning with long short-term memory networks for financial market predictions. Eur. J. Oper. Res. 2018, 270, 654–669. [Google Scholar] [CrossRef]
  14. Chong, E.; Han, C.; Park, F.C. Deep learning networks for stock market analysis and prediction. Expert Syst. Appl. 2017, 83, 187–205. [Google Scholar] [CrossRef]
  15. Patel, J.; Shah, S.; Thakkar, P.; Kotecha, K. Predicting stock market index using fusion of machine learning techniques. Expert Syst. Appl. 2015, 42, 2162–2172. [Google Scholar] [CrossRef]
  16. Iftikhar, H.; Khan, M.; Turpo-Chaparro, J.E.; Rodrigues, P.C.; López-Gonzales, J.L. Forecasting stock prices using a novel filtering-combination technique: Application to the Pakistan stock exchange. AIMS Math. 2024, 9, 3264–3288. [Google Scholar] [CrossRef]
  17. Xiong, W.; He, D.; Du, H. Learning economic model predictive control via clustering and kernel-based Lipschitz regression. J. Frankl. Inst. 2025, 362, 107787. [Google Scholar] [CrossRef]
  18. Luo, J.; Zhuo, W.; Xu, B. A Deep Neural Network-Based Assistive Decision Method for Financial Risk Prediction in Carbon Trading Market. J. Circuits Syst. Comput. 2023, 33, 2450153. [Google Scholar] [CrossRef]
  19. Kim, H.; Shin, K. A hybrid approach using neural networks and genetic algorithms for temporal patterns in stock markets. Appl. Soft Comput. 2007, 7, 569–576. [Google Scholar] [CrossRef]
  20. Zhang, X.; Yang, X.; He, Q. Multi-scale systemic risk and spillover networks of commodity markets in the bullish and bearish regimes. N. Am. J. Econ. Financ. 2022, 62, 101766. [Google Scholar] [CrossRef]
  21. Yang, R.; Li, H.; Huang, H. Multisource information fusion considering the weight of focal element’s beliefs: A Gaussian kernel similarity approach. Meas. Sci. Technol. 2024, 35, 025136. [Google Scholar] [CrossRef]
  22. Kanniainen, K.; Pölönen, S.P.; Manner, A. Stock return prediction with LSTM neural networks: An evaluation using a multiple testing framework. Quant. Financ. 2021, 21, 1119–1134. [Google Scholar]
  23. Iftikhar, H.; Khan, F.; Rodrigues, P.C.; Alharbi, A.A.; Allohibi, J. Forecasting of Inflation Based on Univariate and Multivariate Time Series Models: An Empirical Application. Mathematics 2025, 13, 1121. [Google Scholar] [CrossRef]
  24. Quah, T.E.F.; Srinivasan, B. Improving returns using neural networks and genetic algorithms. Expert Syst. Appl. 2005, 29, 317–330. [Google Scholar]
  25. Huang, Y.; Liu, Y.; Lin, M. A fault diagnosis method based on LMD and SVM for roller bearings. Shock Vib. 2016, 2016, 1–11. [Google Scholar]
  26. Yan, R.; Gao, R.X.; Chen, X. Wavelets for fault diagnosis of rotary machines: A review. Signal Process. 2014, 96, 1–15. [Google Scholar] [CrossRef]
  27. Zhang, H.; Meng, G.; Qin, M. Bearing fault diagnosis based on local mean decomposition and generalized discriminant analysis. J. Mech. Sci. Technol. 2013, 27, 173–180. [Google Scholar]
  28. Krauss, C.; Do, X.A.; Huck, N. Deep neural network, gradient-boosted trees, random forests: Statistical arbitrage on the S & P 500. Eur. J. Oper. Res. 2017, 259, 689–702. [Google Scholar]
  29. Bao, W.; Yue, J.; Rao, Y. A deep learning framework for financial time series using stacked autoencoders and LSTM. PLoS ONE 2017, 12, e0180944. [Google Scholar] [CrossRef] [PubMed]
  30. Sirignano, P.; Cont, R. Universal features of price formation in financial markets. Quant. Financ. 2019, 19, 1449–1459. [Google Scholar] [CrossRef]
  31. Yang, X.; Chen, J.; Li, D.; Li, R. Functional-Coefficient Quantile Regression for Panel Data with Latent Group Structure. J. Bus. Econ. Stat. 2024, 42, 1026–1040. [Google Scholar] [CrossRef] [PubMed]
  32. Qureshi, M.; Iftikhar, H.; Rodrigues, P.C.; Rehman, M.Z.; Salar, S.A. Statistical modeling to improve time series forecasting using machine learning, time series, and hybrid models: A case study of bitcoin price forecasting. Mathematics 2024, 12, 3666. [Google Scholar] [CrossRef]
  33. Xu, A.; Dai, Y.; Hu, Z.; Qiu, K. Can green finance policy promote inclusive green growth?–Based on the quasi-natural experiment of China’s green finance reform and innovation pilot zone. Int. Rev. Econ. Financ. 2025, 100, 104090. [Google Scholar] [CrossRef]
  34. Chen, A.Y. The predictability of stock returns: A review. Int. Rev. Econ. Financ. 2016, 43, 160–174. [Google Scholar]
  35. Kim, J.B.; Kim, Y.A.; Kim, S.A. Investor sentiment and stock market volatility: Evidence from the NASDAQ Index. Financ. Res. Lett. 2019, 30, 1–7. [Google Scholar]
  36. Iftikhar, H.; Khan, F.; Torres Armas, E.A.; Rodrigues, P.C.; López-Gonzales, J.L. A novel hybrid framework for forecasting stock indices based on the nonlinear time series models. Comput. Stat. 2025, 1–24. [Google Scholar] [CrossRef]
  37. Wahal, R.; Yavuz, A. Institutional trading and stock returns. J. Financ. Quant. Anal. 2013, 48, 103–123. [Google Scholar]
  38. Wang, J.; Chen, J.; Xiang, L. An improved local mean decomposition method and its application in bearing fault diagnosis. J. Vib. Control 2016, 22, 4311–4324. [Google Scholar]
  39. Duan, R.; He, Y.; Cheng, Y. Rolling element bearing fault diagnosis using local mean decomposition and improved multiscale permutation entropy. Entropy 2019, 21, 683. [Google Scholar]
  40. Wang, Y.; Yan, K. Machine learning-based quantitative trading strategies across different time intervals in the American market. Quant. Financ. Econ. 2023, 7, 569–594. [Google Scholar] [CrossRef]
  41. Box, G.E.P.; Jenkins, G.M. Time Series Analysis: Forecasting and Control, 1st ed.; Holden-Day: San Francisco, CA, USA, 1970. [Google Scholar]
  42. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  43. Shah, I.; Iftikhar, H.; Ali, S.; Wang, D. Short-term electricity demand forecasting using components estimation technique. Energies 2019, 12, 2532. [Google Scholar] [CrossRef]
  44. Zhang, X.; Liu, G. Stock Prices Forecasting by Using a Novel Hybrid Method Based on the MFO-Optimized GRU Network. Ann. Data Sci. 2025, 12, 1369–1387. [Google Scholar] [CrossRef]
  45. Iftikhar, H.; Zafar, A.; Turpo-Chaparro, J.E.; Canas Rodrigues, P.; López-Gonzales, J.L. Forecasting day-ahead brent crude oil prices using hybrid combinations of time series models. Mathematics 2023, 11, 3548. [Google Scholar] [CrossRef]
  46. Hamilton, J.D. Time Series Analysis; Princeton University Press: Princeton, NJ, USA, 1994. [Google Scholar]
  47. Box, G.E.P.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control, 5th ed.; Wiley: Hoboken, NJ, USA, 2015. [Google Scholar]
  48. Dickey, D.A.; Fuller, W.A. Distribution of the estimators for autoregressive time series with a unit root. J. Am. Stat. Assoc. 1979, 74, 427–431. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Flowchart of the proposed hybrid forecasting model.
Figure 2. Daily NASDAQ Composite price data.
Figure 3. PFs and residual plots of the NASDAQ Composite price obtained using LMD.
Figure 4. Average Mutual Information (AMI) plots for the PFs.
Figure 5. Residual plot obtained using AMI.
Figure 6. Separate plots of the stochastic and deterministic components with the original data.
Figure 7. ARIMA fitted model (NASDAQ).
Figure 8. Bar plots of accuracy measurements: MAE, MAPE, RMSE (top), and DS (bottom), comparing the different models with the hybrid model.
Figure 9. Accuracy measurements comparing the other models with the hybrid model using inverse RMSE.
Table 1. Summary statistics of the NASDAQ Composite Price Series (ADF, PP, and Jarque–Bera entries are p-values).

Min.    1st Qu.   Median    Mean      3rd Qu.   Max.      ADF       PP        Jarque–Bera
8944    11,903    13,772    14,103    15,798    20,174    0.6878    0.7324    5.251 × 10⁻¹⁴
Table 2. ADF and PP test statistics for PF1, PF2, and SPF (p-values in parentheses).

Test    PF1                 PF2                 SPF
ADF     −4.325 (0.535) *    −4.248 (0.575) *    −1.756 (0.635) *
PP      6.526 (0.999) *     6.854 (0.999) *     −6.524 (0.700) *

* indicates rejection of the null hypothesis (non-stationarity) at the 5% significance level; this notation conflicts with the reported p-values, as discussed in the text.
Table 3. ARIMA model accuracy metrics for PF1, PF2, and SPF.

Component   ARIMA(p, d, q)   MAE      MAPE       RMSE     AIC        L–B Test
PF1         (1, 1, 1)        0.0015   0.000015   0.356    −145.62    0.5421 (0.003) *
PF2         (1, 2, 2)        0.0256   0.000256   0.343    −165.342   0.35214 (0.02) *
SPF         (2, 2, 1)        0.0352   0.000352   0.2556   −752.132   0.5417 (0.003) *

* indicates rejection of the null hypothesis of no residual autocorrelation at the 5% significance level.
Table 4. Hyperparameter details for the machine learning models.

Model           Key Hyperparameters
XGBoost         n_estimators = 300, max_depth = 5, learning_rate = 0.1, subsample = 0.8
ANN             Two hidden layers (64 and 32 neurons); Activation: ReLU; Optimizer: Adam (learning rate = 0.001)
Random Forest   n_estimators = 100, max_depth = 7
SVM             Kernel: RBF; C = 1; gamma = scale
Table 5. Forecasting accuracy for the NASDAQ Composite Index using various models.

Method                   MAE      MAPE       RMSE     DS (%)
ARIMA                    0.2154   0.002154   0.973    76.54
XGBoost                  0.2561   0.002561   0.785    79.35
LMD–ARIMA                0.3514   0.003514   0.954    65.24
LMD–RF                   0.5621   0.005621   0.854    53.24
LMD–SD–RF                0.5412   0.005412   0.954    46.52
LMD–ARIMA–SD–RF          0.1251   0.001251   0.654    80.53
ANN                      0.5824   0.005824   0.854    75.21
LMD–ANN                  0.4213   0.004213   0.954    76.21
LMD–SD–ANN               0.2451   0.002451   0.644    65.32
LMD–ARIMA–SD–ANN         0.1254   0.001254   0.653    89.54
SVM                      1.1587   0.011587   0.785    53.52
LMD–SVM                  1.5246   0.015246   0.845    66.21
LMD–SD–SVM               1.5241   0.015241   0.654    68.24
LMD–ARIMA–SD–SVM         0.2546   0.002546   0.564    80.54
LMD–XGBoost              0.5214   0.005214   0.8457   73.52
LMD–SD–XGBoost           0.5241   0.005241   0.6458   72.65
LMD–ARIMA–SD–XGBoost     0.0254   0.000254   0.4562   92.51
Table 6. Performance summary of all models and variants.

Model Category            Best Variant             Accuracy (1/RMSE)   Remarks
Traditional Models        ARIMA                    1.028               Serves as the baseline reference
Machine Learning Models   XGBoost (standalone)     1.274               Outperforms ANN and SVM variants
Hybrid LMD Models         LMD–ARIMA                1.048–1.322         Provides moderate accuracy improvements
LMD + SD Hybrid Models    LMD–ARIMA–SD–XGBoost     2.192               Achieves the best overall performance
ANN Variants              LMD–ARIMA–SD–ANN         1.531               High-performing neural network-based approach
SVM Variants              LMD–ARIMA–SD–SVM         1.773               Enhanced performance using SVM-based prediction