3.1. Experimental Environment and Data
The experiment was conducted on a Windows 11 64-bit operating system, utilizing an AMD Ryzen 7 8845H processor with Radeon 780M Graphics at 3.80 GHz and 16 GB of RAM. Python 3.11 was used as the programming language, and Matplotlib 3.6.3 was employed for data visualization.
This study uses the closing prices of metal futures as the primary dataset, with data sourced from https://cn.investing.com (accessed on 18 March 2025) and shfe.com.cn (accessed on 15 March 2025). To assess the forecasting capabilities of the BH-TCN model across varying market environments and degrees of price volatility, gold futures data from both the Shanghai Futures Exchange (SHFE) and the New York Commodity Exchange (COMEX) were chosen as the focus of this study. The analysis considers the most liquid contracts traded on each exchange. The empirical dataset consists of daily closing prices across all continuous trading days from 2014 to 2024, as shown in Figure 4. In this plot, the x-axis corresponds to the trading day index, while the y-axis shows the associated closing price.
For sample division, 80% of the data are allocated to training, with the remaining 20% reserved for testing. All subsequent model analysis and evaluation are based on this division. The original price series is first normalized and scaled to the [0, 1] range. Normalization facilitates faster updates of neural network parameters, thereby accelerating model convergence and enhancing both training efficiency and predictive performance. The normalization procedure is described by the following equation:

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$

where $x$ represents the original data, $x_{\min}$ and $x_{\max}$ denote the minimum and maximum of the series, and $x'$ denotes the normalized value.
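As a concrete illustration, the scaling step can be sketched as follows. The input file name is hypothetical, and fitting the scaler on the training split only is a common convention assumed here rather than stated in the text.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical input: a plain-text file with one closing price per line.
prices = np.loadtxt("shfe_au_close.csv").reshape(-1, 1)
split = int(len(prices) * 0.8)                 # 80% train / 20% test

scaler = MinMaxScaler(feature_range=(0, 1))    # x' = (x - min) / (max - min)
train_scaled = scaler.fit_transform(prices[:split])   # fit on the training data only
test_scaled = scaler.transform(prices[split:])

# Forecasts produced in [0, 1] space are mapped back to price units with
# scaler.inverse_transform(...) before error metrics are computed.
```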
This study performs a descriptive statistical examination of the closing prices for SHFE gold (SHFE Au) and COMEX gold (COMEX Au) futures (see Table 1). The SHFE gold futures exhibit relatively greater price volatility, as indicated by a higher standard deviation. Both datasets display positive skewness, suggesting that extreme price observations above the mean occur more frequently. In terms of kurtosis, SHFE gold futures show a light-tailed distribution, whereas COMEX gold futures exhibit a heavy-tailed distribution, implying a higher likelihood of extreme fluctuations in the latter. The Jarque–Bera (J-B) test yields statistically significant results at the 1% level for both time series, rejecting the null hypothesis of normality and indicating that the price distributions are non-Gaussian. The Ljung–Box Q(10) test reveals significant autocorrelation at the 10th lag for both markets, suggesting the presence of long-range dependence. In summary, the closing prices of SHFE and COMEX gold futures exhibit non-normality, right skewness, light or heavy tails, and significant autocorrelation, providing a statistical foundation for subsequent time series modeling.
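The statistics reported in Table 1 can be reproduced along the lines of the sketch below; the input file name is hypothetical, and note that pandas reports excess kurtosis, which may differ from the convention used in the table.

```python
import numpy as np
import pandas as pd
from scipy import stats
from statsmodels.stats.diagnostic import acorr_ljungbox

prices = pd.Series(np.loadtxt("shfe_au_close.csv"))     # hypothetical file

summary = {
    "mean": prices.mean(),
    "std": prices.std(),
    "skewness": prices.skew(),
    "excess_kurtosis": prices.kurt(),
}
jb_stat, jb_pvalue = stats.jarque_bera(prices)           # normality test
ljung_box = acorr_ljungbox(prices, lags=[10])            # Ljung-Box Q(10) test

print(summary)
print(f"Jarque-Bera: stat={jb_stat:.2f}, p-value={jb_pvalue:.4f}")
print(ljung_box)                                         # lb_stat, lb_pvalue columns
```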
To evaluate whether the closing price series of SHFE and COMEX gold futures are stationary, the Augmented Dickey–Fuller (ADF) unit root test is applied. The p-values obtained for the SHFE and COMEX series are 0.9987 and 0.9949, respectively, both far exceeding the conventional significance levels of 1%, 5%, and 10% (refer to Table 2). In addition, the corresponding ADF test statistics, 2.0484 and 1.0648, are greater than the critical values at all standard significance levels. These results imply that the null hypothesis of a unit root cannot be rejected, indicating that the price series of both SHFE and COMEX gold futures are non-stationary.
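A minimal sketch of the ADF test with statsmodels is shown below; the file names are hypothetical placeholders for the two closing-price series.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

# Hypothetical inputs: 1-D arrays of daily closing prices for each market.
series_map = {
    "SHFE Au": np.loadtxt("shfe_au_close.csv"),
    "COMEX Au": np.loadtxt("comex_au_close.csv"),
}
for name, series in series_map.items():
    stat, pvalue, usedlag, nobs, crit_values, icbest = adfuller(series)
    print(f"{name}: ADF statistic = {stat:.4f}, p-value = {pvalue:.4f}")
    print(f"  critical values: {crit_values}")
    # p-values near 1 (as reported above) mean the unit-root null cannot be
    # rejected, so the raw price series are treated as non-stationary.
```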
3.2. Decomposition Results
The original time series is decomposed using the BH algorithm, with its parameters optimized via a two-stage grid search to achieve a more reasonable number of decompositions and improved decomposition performance. The number of control points is chosen to strike an optimal balance between overfitting and underfitting within the model. In the first (coarse) stage, the parameter is searched over [0.1, 0.5] with a step size of 0.1, and the best performance is obtained at the boundary value of 0.1. A second (refined) search is therefore conducted over [0.01, 0.1] with a step size of 0.01, which yields the final optimal value of 0.01. Model selection is performed by minimizing the MSE on the dataset. The optimal decomposition results for the metal futures prices are shown in Figure 5. Effective decomposition of the original dataset facilitates the extraction of meaningful components, which in turn reduces the complexity of the data. The resulting sub-series exhibit clear linearity, trend patterns, and low complexity, providing suitable inputs for subsequent TCN-based forecasting. Meanwhile, the final residual sequence approximates white noise, indicating that the decomposition process has effectively removed noise components and enhanced the overall accuracy and stability of the modeling.
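Because the BH decomposition itself is the authors' method, the following is only a generic coarse-to-fine grid-search skeleton: `fit_fn` stands in for the BH decomposition and must return a reconstruction of the series, and the MSE against the original series serves as the selection criterion.

```python
import numpy as np

def two_stage_grid_search(series, fit_fn):
    """Coarse-to-fine search for a single scalar decomposition parameter.

    fit_fn(series, param) is a placeholder for the BH decomposition; selection
    minimizes the MSE between the series and its reconstruction.
    """
    def best_over(grid):
        scores = {p: float(np.mean((series - fit_fn(series, p)) ** 2)) for p in grid}
        return min(scores, key=scores.get)

    coarse_best = best_over(np.round(np.arange(0.1, 0.51, 0.1), 2))
    # The coarse optimum falls on the lower boundary (0.1), so refine below it.
    refined_best = best_over(np.round(np.arange(0.01, 0.11, 0.01), 2))
    return coarse_best, refined_best
```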
3.3. Hyperparameter Selection
In many predictive models, both the input window length and the number of training epochs play crucial roles in determining the model’s performance [42]. In this context, the term “window” denotes the length of the time steps in each input data sequence, whereas “training epoch” refers to the total number of times the model processes the entire training dataset. Although properly adjusting the window size and the number of epochs can enhance prediction accuracy, an excessive number of parameters does not necessarily yield better results; instead, it may cause the model to transition rapidly from underfitting to overfitting. To select hyperparameters reasonably, this study employs a grid search method for model tuning [43]. This approach begins by specifying a range of possible values for each hyperparameter, which are then combined to create multiple candidate configurations. The model is trained under each configuration, and its effectiveness is assessed on the validation dataset. Ultimately, the optimal parameter combination is chosen as the final configuration for the model. This procedure significantly improves both the stability and accuracy of the model’s predictions. Table 3 presents the hyperparameter selections for the various models.
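A minimal sketch of such a grid search over window length and training epochs is shown below; `build_and_eval` is a placeholder for model training and validation, and the candidate grids are illustrative rather than the values reported in Table 3.

```python
import itertools
import numpy as np

def grid_search_window_epochs(train, valid, build_and_eval,
                              windows=(10, 20, 30), epochs=(50, 100, 200)):
    """Exhaustive search over input-window length and number of training epochs.

    build_and_eval(train, valid, window, n_epochs) is a placeholder that trains
    a model with the given settings and returns its validation error.
    """
    best_config, best_err = None, np.inf
    for window, n_epochs in itertools.product(windows, epochs):
        err = build_and_eval(train, valid, window, n_epochs)
        if err < best_err:
            best_config, best_err = (window, n_epochs), err
    return best_config, best_err
```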
To guarantee a fair and consistent comparison of performance, the hyperparameters of both the proposed BH-TCN model and the benchmark methods are appropriately configured and optimized. Bézier curve fitting and wavelet denoising are introduced as preprocessing steps to construct the Bézier-TCN and wavelet denoising-TCN (WD-TCN) models, respectively, aiming to denoise the original time series and enhance feature representation. Specifically, Bézier fitting follows the piecewise strategy implemented by Zhao [32], applying a uniform 10-segment, fifth-order Bézier curve fit to smooth out noise components in the raw data. For wavelet denoising, the optimal wavelet basis function and decomposition level are selected based on the principles of maximum signal-to-noise ratio (SNR) and minimum MSE, effectively capturing the primary features of the signal.
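The wavelet-denoising benchmark can be sketched with PyWavelets as follows. The candidate bases and levels, as well as the soft universal-threshold rule, are assumptions; the text only states that the basis and level are chosen to maximize SNR and minimize MSE.

```python
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet, level):
    """Soft-threshold denoising with the universal threshold (assumed rule)."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745          # noise-level estimate
    thr = sigma * np.sqrt(2 * np.log(len(signal)))
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[: len(signal)]

def select_wavelet_and_level(signal, wavelets=("db4", "sym8", "coif5"), levels=(2, 3, 4)):
    """Choose the (basis, level) pair with the highest SNR and report its MSE."""
    best, best_snr = None, -np.inf
    for wavelet in wavelets:
        for level in levels:
            rec = wavelet_denoise(signal, wavelet, level)
            mse = float(np.mean((signal - rec) ** 2))
            snr = 10 * np.log10(np.sum(signal ** 2) / (np.sum((signal - rec) ** 2) + 1e-12))
            if snr > best_snr:
                best, best_snr = (wavelet, level, mse), snr
    return best, best_snr
```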
For conventional time series models and machine learning models, the parameter selection methods are as follows.
In the ARIMA model, the autoregressive order p, differencing order d, and moving average order q are selected by minimizing the Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) within an appropriate search range, thereby balancing model fit and complexity.
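An illustrative order search with statsmodels is sketched below; the (p, d, q) ranges are hypothetical, and BIC can be substituted for, or combined with, AIC in the same loop.

```python
import itertools
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def select_arima_order(train, p_range=range(0, 4), d_range=range(0, 3), q_range=range(0, 4)):
    """Return the (p, d, q) order with the lowest AIC over the given grid."""
    best_order, best_aic = None, np.inf
    for p, d, q in itertools.product(p_range, d_range, q_range):
        try:
            result = ARIMA(train, order=(p, d, q)).fit()
        except Exception:
            continue                      # skip combinations that fail to converge
        if result.aic < best_aic:
            best_order, best_aic = (p, d, q), result.aic
    return best_order, best_aic
```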
For the SVR model, the penalty parameter C is determined through grid search over the candidate values [0.1, 1, 10], while the width of the insensitive zone ε is optimized within the interval [0.01, 0.1]. The radial basis function (RBF) is employed as the kernel function.
Similarly, the XGBoost model’s hyperparameters are tuned using a grid search strategy. The number of weak learners is searched within the range [500, 1000], the maximum tree depth is selected from [6, 10, 15], the learning rate is selected from [0.001, 0.005, 0.01], and both the subsample ratio and column sampling ratio are optimized within the range [0.8, 1.0].
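Both searches can be expressed compactly with scikit-learn's GridSearchCV, as in the sketch below. The use of TimeSeriesSplit for validation and the mapping of the searched quantities onto standard parameter names (C, epsilon, max_depth, learning_rate, and so on) are assumptions following common practice rather than details given in the text.

```python
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.svm import SVR
from xgboost import XGBRegressor

cv = TimeSeriesSplit(n_splits=5)          # time-ordered validation folds (assumption)

svr_search = GridSearchCV(
    SVR(kernel="rbf"),
    param_grid={"C": [0.1, 1, 10], "epsilon": [0.01, 0.1]},
    cv=cv,
    scoring="neg_mean_squared_error",
)

xgb_search = GridSearchCV(
    XGBRegressor(objective="reg:squarederror"),
    param_grid={
        "n_estimators": [500, 1000],
        "max_depth": [6, 10, 15],
        "learning_rate": [0.001, 0.005, 0.01],
        "subsample": [0.8, 1.0],
        "colsample_bytree": [0.8, 1.0],
    },
    cv=cv,
    scoring="neg_mean_squared_error",
)

# Usage: svr_search.fit(X_train, y_train); xgb_search.fit(X_train, y_train)
# The tuned estimators are then available as *_search.best_estimator_.
```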
3.4. Predictive Performance
Initially, the RMSE, MAE, and MAPE metrics are computed for each model to assess the extent of prediction errors (refer to Table 4). Figure 6 provides a visual comparison of these error metrics across all models.
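For reference, the three error metrics can be computed as in the following sketch; MAPE is expressed in percent, which is assumed to match the convention used in Table 4.

```python
import numpy as np

def rmse(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mae(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs(y_true - y_pred)))

def mape(y_true, y_pred):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)   # in percent
```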
As demonstrated by the experimental results in Table 4, the BH-TCN model consistently outperforms the other methods, delivering superior predictive accuracy on both the SHFE Au and COMEX Au datasets. On the SHFE Au dataset, the RMSE, MAE, and MAPE values are 5.4694, 3.9083, and 0.7644, respectively, markedly outperforming the other benchmark models. On the COMEX Au dataset, the BH-TCN model similarly exhibits superior performance, with RMSE, MAE, and MAPE values of 21.4865, 16.6375, and 0.7708, respectively, demonstrating the model’s strong generalization capability. In summary, the BH-TCN model outperforms ARIMA, SVR, XGBoost, LSTM, and TCN on the RMSE, MAE, and MAPE metrics. Moreover, compared with the other improved TCN-based models (Bézier-TCN and WD-TCN), BH-TCN also exhibits a significant advantage.
Figure 6 presents histograms of the main models’ RMSE, MAE, and MAPE values, providing a visual comparison. The histogram bars of the BH-TCN model are the lowest across all metrics, indicating that it achieves the best results with minimal prediction errors. This confirms the effectiveness and superior predictive capability of the BH-TCN model for metal futures prices. The coefficient of determination (R²) of the BH-TCN model is 0.990741 on SHFE Au and 0.994525 on COMEX Au, both outperforming the other comparative models and demonstrating stronger fitting ability (see Table 5). Additionally, the model obtained values of 0.542056 and 0.532710 on the second metric reported in Table 5, ranking at a relatively high level among all methods and indicating greater stability in prediction results. In contrast, models such as Bézier-TCN, WD-TCN, standard LSTM, and TCN exhibited slightly inferior performance on both metrics, suggesting potential instability when handling complex prediction scenarios. Conventional time series and machine learning models performed significantly worse than the deep learning and hybrid models on both datasets, with limited fitting ability and lower stability. In comparison, the BH-TCN model more effectively captures the complex dynamic changes in the time series, maintaining stable and low prediction errors and thereby exhibiting superior cross-market generalization capability.
To further confirm the superior forecasting capability of the BH-TCN model relative to alternative approaches, this study employs the modified Diebold–Mariano (MDM) test with MSE, MAE, and MAPE loss functions. The corresponding outcomes are presented in Table 6. According to the MDM test results, all competing models reject the null hypothesis at the 1% significance level when compared against the BH-TCN model, demonstrating a statistically significant difference in predictive accuracy between the BH-TCN model and its counterparts. In other words, the BH-TCN model consistently delivers more precise and dependable predictions than the other evaluated models. This result further corroborates, from a statistical validation perspective, the outstanding performance and robustness of the BH-TCN model in commodity price forecasting tasks.
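A textbook implementation of the MDM statistic for one-step-ahead forecasts (the small-sample correction of the Diebold–Mariano test due to Harvey, Leybourne, and Newbold) is sketched below; this is not the authors' exact code, and the MAPE-based variant only requires replacing the loss differential accordingly.

```python
import numpy as np
from scipy import stats

def mdm_test(e1, e2, h=1, loss="mse"):
    """Modified Diebold-Mariano test for equal predictive accuracy.

    e1, e2 are forecast-error series from two competing models; returns the
    MDM statistic and its two-sided p-value from the t(T-1) distribution.
    """
    e1, e2 = np.asarray(e1, float), np.asarray(e2, float)
    if loss == "mse":
        d = e1 ** 2 - e2 ** 2
    elif loss == "mae":
        d = np.abs(e1) - np.abs(e2)
    else:
        raise ValueError("unsupported loss")
    T = len(d)
    d_bar = d.mean()
    # Long-run variance of the loss differential: autocovariances up to lag h-1.
    gamma = [np.sum((d[k:] - d_bar) * (d[: T - k] - d_bar)) / T for k in range(h)]
    var_d_bar = (gamma[0] + 2 * sum(gamma[1:])) / T
    dm = d_bar / np.sqrt(var_d_bar)
    mdm = dm * np.sqrt((T + 1 - 2 * h + h * (h - 1) / T) / T)   # small-sample correction
    p_value = 2 * stats.t.sf(np.abs(mdm), df=T - 1)
    return mdm, p_value
```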
The results indicate that BH-TCN delivers the best overall performance across all datasets and evaluation metrics. Except for one value, which is slightly lower than that of the WD-TCN model, all other metrics achieve optimal results. The BH-TCN model not only outperforms the other comparative methods in fitting accuracy but also maintains a leading position in prediction stability, reflecting its robustness. Furthermore, the model exhibits excellent generalization ability under various market conditions, effectively adapting to diverse data structures and fluctuation patterns.
The model’s prediction outcomes are depicted in Figure 7. Specifically, Figure 7a illustrates the forecasted trend of gold futures prices on the SHFE test set drawn from the 2014–2024 sample, while Figure 7b illustrates the predicted trend on the COMEX test set over the same period. In both figures, the x-axis corresponds to the index of data points in the test set, while the y-axis denotes the closing price. The mazarine (dark blue) line represents the predicted values from the BH-TCN model, while the crimson line shows the actual values. It is evident that the BH-TCN model’s predictions outperform those of the other models, whose forecasts exhibit larger deviations from the actual values. Notably, during periods of sharp price fluctuations, the discrepancy between predicted and actual values is more pronounced for the other models, while the BH-TCN model consistently approximates the actual values with greater precision.
The superior performance of the BH-TCN model in capturing sudden price changes can be attributed to its architectural design and multi-scale decomposition strategy. Specifically, the Bézier–Hurst (BH) decomposition isolates trend components at different scales, effectively filtering noise while preserving abrupt local variations. When these multi-scale sub-sequences are fed into the TCN, the dilated causal convolution layers expand the receptive field without increasing computational cost, allowing the model to leverage long-range dependencies and detect sudden shifts simultaneously. Additionally, the hybrid structure of the BH-TCN enhances its adaptability to nonlinear and volatile market dynamics. Consequently, the model not only achieves lower overall prediction errors but also demonstrates increased robustness and sensitivity during periods of sharp price fluctuations, outperforming standard TCN, LSTM, WD-TCN, and Bézier–TCN approaches.
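The receptive-field argument can be made concrete with a short calculation; the kernel size and dilation schedule below are illustrative, not the settings reported in Table 3.

```python
def tcn_receptive_field(kernel_size, dilations, convs_per_block=2):
    """Receptive field (in time steps) of stacked dilated causal convolutions,
    with convs_per_block convolutions per residual block as in a typical TCN."""
    return 1 + convs_per_block * (kernel_size - 1) * sum(dilations)

# Example: kernel size 3 with dilations (1, 2, 4, 8) already looks back over
# 1 + 2 * (3 - 1) * 15 = 61 time steps without adding parameters per layer.
print(tcn_receptive_field(3, (1, 2, 4, 8)))
```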
3.5. Robustness Check
To further assess the robustness of the proposed model across different training–test split ratios and varying sizes of prediction datasets, this study performs additional empirical analyses. Initially, the first 90% of the original sequence is allocated as the training set, with the remaining 10% reserved for testing. The MDM test outcomes for this setup are provided in Table 7, Panel A. Subsequently, the dataset is partitioned such that the first 70% constitutes the training set, while the final 30% is used for testing. The MDM test results corresponding to this configuration are displayed in Table 7, Panel B.
As shown in Table 7, Panel A, under the 90%–10% training–test split, the null hypothesis is rejected at the 1% significance level in 39 instances and at the 5% level in 3 instances. These results demonstrate that the BH-TCN model significantly outperforms the majority of the comparative models.
Similarly, in Table 7, Panel B (70%–30% training–test split), the null hypothesis is rejected at the 1% significance level in 38 cases, providing additional evidence that the BH-TCN model retains robust predictive performance despite a reduced training set size.
In conclusion, validation under different training–test split ratios and varying prediction sample sizes shows that the BH-TCN model delivers notable stability, high adaptability, and superior predictive accuracy, exhibiting clear advantages over the other models.