In this setting, each individual correlation series (e.g., carbon–oil or carbon–stock) is treated as a separate variable, while the full system of correlations is viewed as a dynamic network. The forecasting task is to predict future correlations based on historical observations within a rolling window.
2.2.1. The Multivariable Graph Neural Network with Self-Attention Mechanism (MSGNN)
- (1) Embedding input data and connecting residuals
Following Cai et al. [29], the MSGNN model first transforms the dynamic correlation matrix by integrating the temporal information embedded in each pairwise correlation. Let $\rho_{ij,t}$ denote the conditional correlation between assets $i$ and $j$ at time $t$, as estimated by the DCC-GARCH model. Let $N$ denote the number of distinct time-varying pairwise correlations, and let $T$ represent the sample length. At each time $t$, all pairwise correlations are collected into an $N$-dimensional vector. Stacking these vectors over time yields a multivariate dynamic correlation matrix $\mathbf{X} \in \mathbb{R}^{N \times T}$, which serves as the input to the MSGNN model. Given a rolling input window of length $L$, the model first constructs a temporal embedding by mapping the raw correlation series into a higher-dimensional representation as follows:

$$\mathbf{X}_{emb} = \alpha \, \mathrm{Conv}\big(\mathbf{X}_{t-L+1:t}\big) + \sum \mathbf{TE} + \mathbf{PE},$$
where $\mathbf{X}_{t-L+1:t}$ represents the normalized observed values within a retrospective window of $L$ days spanning from time $t-L+1$ to $t$. The notation $\mathrm{Conv}(\cdot)$ indicates the $d_{model}$ convolutional filters with a balancing factor $\alpha$. This convolution projects $\mathbf{X}_{t-L+1:t}$ into a matrix with dimension $d_{model}$, which is a key hyperparameter controlling model complexity. The summation term $\sum \mathbf{TE}$ denotes global learnable temporal embeddings, while $\mathbf{PE}$ represents positional encodings. The resulting matrix $\mathbf{X}_{emb}$ of size $d_{model} \times L$ integrates both temporal and value-based information.
MSGNN operates in a residual manner. For the $l$-th layer, the input $\mathbf{X}^{l}$ (a $d_{model} \times L$ matrix) is defined as:

$$\mathbf{X}^{l} = \mathrm{ScaleGraphBlock}\big(\mathbf{X}^{l-1}\big) + \mathbf{X}^{l-1}, \qquad \mathbf{X}^{0} = \mathbf{X}_{emb},$$

where $\mathrm{ScaleGraphBlock}(\cdot)$ represents the operations within each identical Scale Graph Block.
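The embedding and residual stacking above can be sketched in a few lines of NumPy. All dimensions, the matrix `W_conv` standing in for the convolutional filters, and the toy `scale_graph_block` are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

N, L, d_model = 6, 32, 16          # series, window length, embedding size (illustrative)
X = rng.standard_normal((N, L))    # normalized correlation window X_{t-L+1:t}

# Temporal embedding: project the N series into d_model channels, then add
# learnable temporal and positional encodings (random placeholders here).
W_conv = rng.standard_normal((d_model, N)) * 0.1   # stand-in for the conv filters
alpha = 0.5                                        # balancing factor
TE = rng.standard_normal((d_model, L)) * 0.01      # global temporal embedding
PE = rng.standard_normal((d_model, L)) * 0.01      # positional encoding
X_emb = alpha * (W_conv @ X) + TE + PE             # (d_model, L)

def scale_graph_block(X):
    """Toy placeholder for the Scale Graph Block operations."""
    return np.tanh(X) * 0.1

# Residual stacking: X^l = ScaleGraphBlock(X^{l-1}) + X^{l-1}, with X^0 = X_emb.
X_l = X_emb
for _ in range(3):                 # three stacked blocks
    X_l = scale_graph_block(X_l) + X_l

print(X_l.shape)
```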
- (2) Identifying prominent scales
As the next step, the MSGNN model identifies the prominent time scale. The intuition behind this is that correlation dynamics may differ across time horizons. For instance, correlations between carbon and energy markets may evolve slowly over longer horizons, while reacting more sharply to short-term shocks during periods of market stress. To capture such heterogeneity, the MSGNN model explicitly extracts frequency-based patterns using a Fast Fourier Transform (FFT). The procedure is defined as:
$$V = \mathrm{Avg}\big(\mathrm{Amp}\big(\mathrm{FFT}(\mathbf{X}_{emb})\big)\big), \qquad f_1, \dots, f_k = \mathrm{Topk}(V), \qquad s_i = \lceil L / f_i \rceil,$$

where $\mathbf{X}_{emb}$ represents the embedding results as explained in Equation (3). The term $\mathrm{FFT}(\cdot)$ denotes the Fast Fourier Transform operation, and $\mathrm{Amp}(\cdot)$ extracts amplitude values. The term $V$ represents the overall amplitude values of each frequency, averaged across the $d_{model}$ dimensions using the $\mathrm{Avg}(\cdot)$ operator. $\mathrm{Topk}(\cdot)$ selects the $k$ largest elements of $V$, thereby identifying the $k$ dominant frequency components $f_1, \dots, f_k$ in the correlation dynamics. The parameter $k$ is a hyperparameter indicating the number of time scales, and $s_i$ denotes the specific time scale. Based on the selected time scales $s_1, \dots, s_k$, the MSGNN model reshapes the input matrix $\mathbf{X}_{emb}$ to obtain the corresponding representations $\mathbf{X}_i$ for different time scales through the following process:

$$\mathbf{X}_i = \mathrm{Reshape}_{f_i, s_i}\big(\mathrm{Padding}(\mathbf{X}_{emb})\big),$$

where $\mathbf{X}_i$ is of dimension $d_{model} \times f_i \times s_i$. $\mathrm{Reshape}(\cdot)$ denotes a tensor reshaping operator that reorganizes the padded input into a $d_{model} \times f_i \times s_i$ structure without altering the underlying values. $\mathrm{Padding}(\cdot)$ applies zero-padding to ensure reshaping compatibility.
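A minimal NumPy sketch of the scale-identification and reshaping step, using a synthetic embedded window with a known dominant cycle (the window length, $d_{model}$, and input are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
d_model, L, k = 8, 64, 2

# Toy embedded input with a dominant cycle of 5 oscillations per window
t = np.arange(L)
X_emb = np.sin(2 * np.pi * 5 * t / L) + 0.1 * rng.standard_normal((d_model, L))

# Amplitudes per frequency, averaged over the d_model dimensions
amp = np.abs(np.fft.rfft(X_emb, axis=-1))   # Amp(FFT(X_emb))
V = amp.mean(axis=0)                        # Avg over channels
V[0] = 0.0                                  # ignore the constant (zero-frequency) term

# Top-k dominant frequencies and the corresponding time scales s_i = ceil(L / f_i)
freqs = np.argsort(V)[-k:][::-1]
scales = np.ceil(L / freqs).astype(int)

# Reshape the zero-padded input into a (d_model, f_i, s_i) structure
f, s = int(freqs[0]), int(scales[0])
pad = f * s - L                             # zero-padding length
X_pad = np.pad(X_emb, ((0, 0), (0, pad)))
X_scale = X_pad.reshape(d_model, f, s)

print(freqs, scales, X_scale.shape)
```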
- (3) Revealing inter-series correlations among dynamic conditional correlations
After identifying the prominent time scales, the MSGNN model captures the multi-scale inter-series correlations among the dynamic correlations across carbon, energy, and stock markets at each respective time scale and aggregates them across scales. This process begins with a linear projection for the $i$-th scale, transforming $\mathbf{X}_i$ into $\hat{\mathbf{X}}_i$ to capture correlations among the $N$ dynamic conditional correlation series:

$$\hat{\mathbf{X}}_i = \mathbf{W} \mathbf{X}_i,$$

where $\mathbf{W}$ is a learnable weight matrix with $N \times d_{model}$ dimensions.

Next, an adaptive adjacency matrix $\mathbf{A}_i$ is constructed to represent inter-series correlations at scale $i$. This matrix is computed using two learnable matrices $\mathbf{E}_1^i$ and $\mathbf{E}_2^i$:

$$\mathbf{A}_i = \mathrm{SoftMax}\big(\mathrm{ReLU}\big(\mathbf{E}_1^i (\mathbf{E}_2^i)^{\top}\big)\big),$$

where $\mathrm{ReLU}(\cdot)$ is an activation function, defined as $\mathrm{ReLU}(x) = \max(0, x)$. $\mathrm{SoftMax}(\cdot)$ normalizes the weights between different nodes for the dynamic correlations, ensuring well-balanced inter-series relationships.
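The adaptive adjacency construction can be illustrated as follows; the node-embedding size and the random matrices standing in for the learnable $\mathbf{E}_1^i$ and $\mathbf{E}_2^i$ are assumptions for the sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
N, d_e = 5, 4                       # number of correlation series, node-embedding size

E1 = rng.standard_normal((N, d_e))  # learnable node embeddings (random stand-ins)
E2 = rng.standard_normal((N, d_e))

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

# A = SoftMax(ReLU(E1 E2^T)): row-normalized adaptive adjacency matrix
A = softmax(np.maximum(E1 @ E2.T, 0.0), axis=1)

print(A.shape)
```

Row-wise SoftMax makes each node's outgoing weights sum to one, which is what keeps the inter-series relationships well balanced.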
To further capture inter-series dependencies among the different dynamic conditional correlation series, the model employs the Mixhop graph convolution method:

$$\mathbf{H}_i = \Big\Vert_{j \in P} \, \sigma\big(\mathbf{A}_i^{\,j} \hat{\mathbf{X}}_i\big),$$

where $\mathbf{H}_i$ refers to the output at the $i$-th time scale, $\sigma(\cdot)$ is an activation function, and $P$ denotes a set of integer adjacency powers in the graph convolution, which is also a critical hyperparameter. The notation $\Vert$ represents column-wise concatenation, linking the intermediate results of each iteration. $\mathbf{A}_i^{\,j}$ denotes the learned adjacency matrix $\mathbf{A}_i$ raised to the power $j$. Finally, the output $\mathbf{H}_i$ is projected through a multi-layer perceptron ($\mathrm{MLP}$) to produce $\hat{\mathbf{H}}_i$, preparing the results for the subsequent step.
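A toy NumPy version of the Mixhop propagation, with $\tanh$ as the activation, an illustrative power set $P = \{0, 1, 2\}$, and the time-scale axes flattened to a single dimension for simplicity:

```python
import numpy as np

rng = np.random.default_rng(3)
N, s = 5, 12                         # series, time-scale length (illustrative)
A = np.full((N, N), 1.0 / N)         # toy row-stochastic adjacency matrix
X_hat = rng.standard_normal((N, s))  # projected scale representation

def mixhop(A, X, powers=(0, 1, 2)):
    """Column-wise concatenation of sigma(A^j X) over the power set P."""
    outs = []
    for j in powers:
        outs.append(np.tanh(np.linalg.matrix_power(A, j) @ X))
    return np.concatenate(outs, axis=-1)   # the column-level connection operator

H = mixhop(A, X_hat)
print(H.shape)
```

Each power $j$ mixes information from $j$-hop neighborhoods, so the concatenated output exposes multiple propagation depths to the subsequent MLP at once.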
- (4) Revealing intra-series correlations among dynamic conditional correlations
Next, the MSGNN model computes the multi-scale intra-series correlations using the multi-head attention mechanism (MAM):

$$\mathbf{O}_i = \mathrm{MAM}\big(\hat{\mathbf{H}}_i\big),$$

where $\hat{\mathbf{H}}_i$ is the output obtained from the previous stage, and $\mathrm{MAM}(\cdot)$ leverages the intra-series dependencies through its learnable hidden matrices. The resulting tensor $\mathbf{O}_i$ represents the MAM output for the $i$-th time scale.
To integrate the MAM outputs across the $k$ selected time scales and prepare for the next ScaleGraph Block, each $\mathbf{O}_i$ is reshaped into $\hat{\mathbf{O}}_i \in \mathbb{R}^{d_{model} \times L}$. The model applies a SoftMax function to compute the aggregation weights $a_1, \dots, a_k$ based on the amplitude values $V_{f_1}, \dots, V_{f_k}$ of each scale:

$$a_1, \dots, a_k = \mathrm{SoftMax}\big(V_{f_1}, \dots, V_{f_k}\big).$$

These weights are used to aggregate the $k$ scale-specific outputs into a single weighted result:

$$\mathbf{X}_{out} = \sum_{i=1}^{k} a_i \, \hat{\mathbf{O}}_i,$$

where $\mathbf{X}_{out}$ represents the combined multi-scale output. The aggregated $\mathbf{X}_{out}$ is subsequently used as input for the next ScaleGraph Block.
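The amplitude-weighted aggregation can be sketched as follows; the dimensions, the random stand-ins for the reshaped MAM outputs, and the amplitude values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(4)
k, d_model, L = 3, 8, 32
O = rng.standard_normal((k, d_model, L))   # reshaped MAM outputs, one per scale
V_k = np.array([2.0, 1.0, 0.5])            # amplitudes of the k selected frequencies

# SoftMax aggregation weights computed from the amplitudes
a = np.exp(V_k) / np.exp(V_k).sum()

# Weighted sum of the k scale-specific outputs
X_out = np.tensordot(a, O, axes=1)         # (d_model, L)
print(X_out.shape)
```

Scales with larger spectral amplitude thus contribute more to the combined output that feeds the next ScaleGraph Block.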
- (5) Generating the final forecast
Finally, the MSGNN model forecasts the dynamic correlations among carbon, energy, and stock markets for the future $n$ days, $\hat{\mathbf{Y}}_{t+1:t+n}$, as follows:

$$\hat{\mathbf{Y}}_{t+1:t+n} = \mathbf{W}_s \, \mathbf{X}_{out} \, \mathbf{W}_t + \mathbf{b},$$

where $\mathbf{W}_s$ is an $N \times d_{model}$ matrix comprising learnable parameters along the variables dimension, and $\mathbf{W}_t$ is an $L \times n$ matrix comprising learnable parameters along the time dimension. Here, $L$ refers to the length of the input retrospective window sequence, and $n$ represents the forecast horizon. The term $\mathbf{b}$ is also a learnable parameter.
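The final step amounts to a bilinear projection from the $d_{model} \times L$ hidden state to an $N \times n$ forecast; a shape-checking sketch with random stand-ins for the learnable parameters (all dimensions illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
N, d_model, L, n = 5, 8, 32, 4      # series, embedding size, window, horizon

X_out = rng.standard_normal((d_model, L))
W_s = rng.standard_normal((N, d_model)) * 0.1   # variables-dimension projection
W_t = rng.standard_normal((L, n)) * 0.1         # time-dimension projection
b = rng.standard_normal((N, n)) * 0.01          # learnable bias

# Y_hat = W_s X_out W_t + b: n-step-ahead forecasts for all N correlation series
Y_hat = W_s @ X_out @ W_t + b
print(Y_hat.shape)
```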
2.2.2. The BOHB Algorithm
The BOHB algorithm integrates the Bayesian Optimization and HyperBand algorithms to identify the optimal hyperparameter configuration for the MSGNN method [21]. Its primary advantage lies in its ability to substantially reduce computational time when searching over a large hyperparameter space, which is common in deep learning applications. By efficiently navigating this space, the BOHB algorithm improves the selection of hyperparameters for the MSGNN model and thereby enhances the accuracy of predicting dynamic correlations among carbon, energy, and stock markets.
Specifically, instead of selecting hyperparameters randomly, the BOHB algorithm first applies Bayesian Optimization (BO) to propose hyperparameter settings that maximize the acquisition function:

$$x_{g+1} = \arg\max_{x} \, a\big(x; \, p(f \mid D_g)\big),$$

where $p(f \mid D_g)$ denotes the probabilistic model of the objective function $f$ according to the prior selection space for hyperparameters $D_g$ for the $g$-th selection procedure. Here, $D_g = \{(x_1, y_1), \dots, (x_g, y_g)\}$ with $y_j = f(x_j)$. The algorithm selects the next candidate $x_{g+1}$ that maximizes the acquisition function $a(\cdot)$ and appends the pair $(x_{g+1}, y_{g+1})$ to the set $D_g$.
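The propose-evaluate-append loop can be illustrated on a one-dimensional toy problem. BOHB's actual surrogate is a Tree-structured Parzen Estimator; the distance-weighted surrogate and UCB-style acquisition below are crude stand-ins used only to show the loop structure:

```python
import numpy as np

rng = np.random.default_rng(6)

def f(x):
    """Toy objective standing in for MSGNN validation loss (to be minimized)."""
    return (x - 0.3) ** 2

# Observed configurations D_g = {(x_j, y_j)}
D = [(x, f(x)) for x in rng.uniform(0, 1, 4)]

def acquisition(x, D, kappa=1.0):
    """Crude surrogate: distance-weighted mean of observed losses, plus an
    uncertainty bonus that grows far from observed points (UCB-style)."""
    xs = np.array([p[0] for p in D])
    ys = np.array([p[1] for p in D])
    w = 1.0 / (np.abs(xs - x) + 1e-6)
    mu = np.sum(w * ys) / np.sum(w)     # predicted loss at x
    sigma = np.min(np.abs(xs - x))      # uncertainty proxy: distance to data
    return -(mu - kappa * sigma)        # maximize: favor low loss, high uncertainty

# Propose the next candidate x_{g+1} by maximizing the acquisition on a grid,
# then evaluate it and append the new pair to D_g
grid = np.linspace(0, 1, 201)
x_next = grid[np.argmax([acquisition(x, D) for x in grid])]
D.append((x_next, f(x_next)))
print(float(x_next))
```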
Afterward, the BOHB algorithm employs the HyperBand ($\mathrm{HB}$) algorithm to evaluate a set of $n$ candidate hyperparameter configurations from $D_g$ under a limited computational budget. HB iteratively allocates resources by discarding the worst-performing half of the configurations and doubling the budget for the remaining ones. This process continues until the most promising hyperparameter configuration is identified. Formally:

$$x^{*} = \mathrm{HB}\big(\{x_1, \dots, x_n\}\big),$$

where $x^{*}$ denotes the optimal hyperparameter combination selected by BOHB for the MSGNN model.
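The successive-halving resource allocation performed by HB can be sketched on a toy objective; the loss function, its noise model, and the initial budget are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

def loss(config, budget):
    """Toy stand-in for MSGNN validation loss: noisier at small budgets."""
    return (config - 0.3) ** 2 + rng.normal(0, 0.1 / budget)

# Successive halving: keep the best half, double the budget each round
configs = list(rng.uniform(0, 1, 8))
budget = 1
while len(configs) > 1:
    scores = [loss(c, budget) for c in configs]
    order = np.argsort(scores)
    configs = [configs[i] for i in order[: len(configs) // 2]]
    budget *= 2

best = configs[0]
print(float(best))
```

Starting from 8 candidates at budget 1, three halving rounds leave a single survivor evaluated at budget 8, which is how HB spends most of its compute on the most promising configurations.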
Overall, the proposed BOHB-MSGNN framework leverages the strengths of both Bayesian Optimization and HyperBand to efficiently search for and identify globally optimal hyperparameters for the MSGNN model. The structure of the BOHB-MSGNN model is shown in
Figure 1.