Article

Oil Market Volatility Forecasting Under Uncertainty Theory: A Joint Modeling Framework via Uncertain Vector Autoregression

1
Haide College, Ocean University of China, Qingdao 266100, China
2
School of Mathematical Sciences, Ocean University of China, Qingdao 266100, China
*
Author to whom correspondence should be addressed.
Mathematics 2026, 14(10), 1601; https://doi.org/10.3390/math14101601
Submission received: 3 April 2026 / Revised: 3 May 2026 / Accepted: 6 May 2026 / Published: 8 May 2026
(This article belongs to the Special Issue Mathematical Problems in Financial Fluctuations and Forecasting)

Abstract

Oil price volatility forecasting remains a central challenge in financial risk management and macroeconomic policy, particularly when market uncertainty stems from expert judgment, geopolitical assessments, or imprecisely quantified fundamentals rather than statistical frequencies. We propose a bivariate uncertain vector autoregressive (UVAR) model to jointly forecast crude oil realized volatility (RV) and the Overall Equity Market Volatility (EMV) tracker within the framework of uncertainty theory, using 204 monthly observations from January 2008 to December 2024. Three cross-validation schemes consistently identify UVAR(1) as optimal, and least-squares estimation reveals an asymmetric bidirectional relationship between the two variables. Residual analysis and uncertain hypothesis testing confirm the adequacy of the fitted model at both $\alpha = 0.05$ and $\alpha = 0.10$, the conventional significance levels reported in the empirical literature. Relative to a univariate uncertain autoregressive (UAR) benchmark, UVAR(1) yields lower residual variance and, on average, narrower 95% confidence intervals for both variables, and it remedies the hypothesis-test failure of UAR(1) for realized volatility. Its fixed-origin average testing error (ATE) is marginally higher on the EMV tracker, but this is more than offset by substantial gains on realized volatility, the primary economic variable of interest. Against a probabilistic VAR(1) benchmark, UVAR(1) attains marginally lower out-of-sample sum of squared mean errors while uniquely supporting principled uncertain-statistical inference under non-frequentist data-generating mechanisms. These results provide principled inputs for value-at-risk assessment and portfolio hedging in oil-dependent economies.

1. Introduction

Oil price volatility forecasting is crucial for financial risk management and macroeconomic policy formulation, as price fluctuations significantly affect inflation expectations, investment decisions, and economic stability across nations [1,2]. Recent volatility episodes have intensified the need for robust predictive models capable of capturing oil market dynamics. While conventional probability-based forecasting methods have proven effective with abundant historical data, they face fundamental limitations when oil market uncertainty primarily reflects expert judgments, geopolitical assessments, or imprecisely quantified fundamentals rather than statistical frequencies. Uncertainty theory, established by Liu (2007) [3] and extended through the product axiom [4], provides an alternative mathematical framework for modeling indeterminacy when data are scarce or derived from expert elicitation. Unlike probability theory, which operates through multiplication axioms, uncertainty theory employs a minimum-operation system based on normality, duality, and subadditivity axioms. This makes it particularly suitable for scenarios where expert belief degrees rather than statistical frequencies characterize phenomena. Liu (2010) initiated uncertain statistics to handle imprecise observations [5], which has since evolved into two primary branches: uncertain regression analysis and uncertain time series analysis.
In the domain of uncertain regression analysis, Yao and Liu (2018) developed the foundational framework for determining quantitative relationships between variables [6]. Subsequent methodological advances include the extension to confidence intervals by Lio and Liu (2018) [7] and diversified parameter estimation techniques encompassing least absolute deviations [8], Tukey’s biweight estimation [9], and maximum likelihood estimation [10]. Liu and Jia (2020) applied cross-validation procedures to improve model selection reliability [11].
Yang and Liu (2019) introduced the uncertain autoregressive (UAR) model for time-dependent data [12], in which observations are characterized as uncertain variables rather than random variables, representing a fundamental departure from classical time series analysis that relies on white noise residuals and strict distributional assumptions. The framework has since been extended along multiple dimensions: Zhao et al. (2020) derived closed-form analytical solutions under the least-squares criterion [13]; Yang et al. (2020) and Liu (2021) proposed robust alternatives via least absolute deviations and Huber estimation, respectively [14,15]; Liu and Yang (2022) established a systematic cross-validation procedure for order selection [16]; Wang et al. (2024) refined the least-squares inference procedure with a dedicated numerical algorithm [17]; Liu and Qin (2025) developed a moment-estimation method validated on Disney stock prices [18]; Shi and Sheng (2025) introduced an uncertain quantile autoregressive specification that characterizes the conditional distribution beyond the mean [19]; and Wang et al. (2025) proposed a ridge estimator for the uncertain autoregressive moving average model [20]. The practical utility of this expanding framework has been validated across a range of empirical contexts, including carbon dioxide emissions modeling [21] and urban water demand prediction [22], with demonstrated advantages in settings where conventional probabilistic assumptions are untenable.
Dutta et al. (2021) demonstrate that news-based Equity Market Volatility (EMV) trackers exert significant predictive power over crude oil market volatility, with the relationship exhibiting pronounced asymmetry across different volatility regimes [23]. Critically, the EMV tracker is itself constructed from news-based textual analysis, whose inherent imprecision and non-frequency-based nature render conventional probabilistic autoregressive models fundamentally ill-suited—motivating the adoption of uncertainty theory as the modeling framework. Given the well-documented interdependence between oil market uncertainty indices and realized volatility, modeling these variables independently through separate univariate autoregressive (UAR) specifications inevitably omits cross-variable dynamics and spillover effects. Such misspecification risks producing forecasts that fail to fully exploit the informational content embedded in their joint evolution. To accommodate these multivariate dependencies, Tang (2020) extended the UAR framework to the uncertain vector autoregressive (UVAR) model [24], which enables the simultaneous modeling of systems in which multiple variables interact dynamically through uncertain disturbances, thereby providing a natural vehicle for capturing cross-variable feedback mechanisms and volatility spillovers within the uncertainty-theoretic paradigm. Building on this foundation, Shi and Sheng (2024) further proposed an uncertain vector autoregressive smoothly moving average specification, providing greater flexibility for joint modeling of multivariate uncertain time series [25]. Despite its theoretical appeal, the UVAR framework has yet to be systematically applied to oil market forecasting—a gap that is particularly consequential given the well-documented interdependence between uncertainty indices and realized volatility.
This study addresses this gap and makes four main contributions. First, it provides the first systematic application of the bivariate UVAR model to jointly forecast crude oil realized volatility and the overall EMV tracker, extending the uncertain time series literature beyond the largely univariate empirical applications that have dominated the field to date. Second, it implements and compares three cross-validation schemes—fixed-origin, rolling-origin, and rolling-window—for UVAR order selection, providing empirical guidance for model specification in this class of models that has hitherto been absent from the literature. Third, it documents an empirically significant asymmetric bidirectional relationship between the two variables and demonstrates that ignoring cross-variable dynamics—as in univariate UAR specifications—leads to hypothesis-test failure for realized volatility, a misspecification that is fully remedied by the joint UVAR specification; this conclusion is shown to hold at both α = 0.05 and α = 0.10 through a dedicated sensitivity analysis. The asymmetric trade-off is also documented at the level of the average testing error: UVAR(1) records a marginally higher fixed-origin ATE than UAR(1) on the EMV tracker but a substantially lower one on realized volatility, indicating that the joint specification trades a small loss on the auxiliary variable for a sizable gain on the variable of primary economic interest. Fourth, an out-of-sample rolling one-step-ahead evaluation benchmarks UVAR(1) against a conventional probabilistic VAR(1) model and shows that the two attain comparable predictive accuracy in terms of out-of-sample sum of squared mean errors, while UVAR uniquely supports uncertain-statistical inference—formal hypothesis testing and uncertain confidence intervals—under the imprecise-observation paradigm in which the Gaussian-innovation assumptions underlying probabilistic VAR are inappropriate. 
Together, these contributions demonstrate that explicitly modeling cross-variable dynamics offers actionable inputs for value-at-risk assessment, portfolio hedging, and macroeconomic stress testing in oil-dependent economies exposed to sharp volatility regimes of the kind observed during the 2008–2009 financial crisis and the 2020 pandemic shock.
The remainder of this paper is organized as follows. Section 2 presents the theoretical foundations, including the definition of uncertain variables, the specification of the bivariate UVAR model, and the parameter estimation methodology. Section 3 describes the data on the EMV tracker and crude oil realized volatility, presents empirical estimation results, and evaluates forecasting performance through comparisons against both univariate UAR and probabilistic VAR benchmarks, complemented by a sensitivity analysis of the hypothesis test. Section 4 concludes with a summary of findings and directions for future research.

2. Preliminaries

2.1. Uncertain Vector Autoregressive Model

The UAR model was proposed by Yang and Liu (2019) [12], as shown in Equation (1). This model characterizes how an uncertain time series linearly regresses its future value on its own past observations:
$$X_t = a_0 + \sum_{i=1}^{k} a_i X_{t-i} + \varepsilon_t, \tag{1}$$
where $X_t$ denotes the imprecisely observed values (uncertain variables) at time points $t = 1, 2, \ldots, n$; $a_0, a_1, \ldots, a_k$ are the unknown autoregressive coefficients; $k$ specifies the model order; and $\varepsilon_t$ represents the disturbance term.
In complex systems, however, variables are interrelated: when the value of one variable at period $t$ is affected by past values of other variables, a multivariate extension becomes necessary. To this end, Tang (2020) proposed the uncertain vector autoregressive model, UVAR($k$), by extending the UAR framework to the multivariate setting [24]. In the present study, we adopt the bivariate UVAR($k$) model to jointly characterize the dynamic interactions between the overall EMV tracker and the realized volatility of crude oil; its formulation is given by Equation (2):
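$$
\begin{pmatrix} X_{1,t} \\ X_{2,t} \end{pmatrix}
= \begin{pmatrix} a_{10} \\ a_{20} \end{pmatrix}
+ \begin{pmatrix} a_{11,1} & a_{12,1} \\ a_{21,1} & a_{22,1} \end{pmatrix}
\begin{pmatrix} X_{1,t-1} \\ X_{2,t-1} \end{pmatrix}
+ \begin{pmatrix} a_{11,2} & a_{12,2} \\ a_{21,2} & a_{22,2} \end{pmatrix}
\begin{pmatrix} X_{1,t-2} \\ X_{2,t-2} \end{pmatrix}
+ \cdots
+ \begin{pmatrix} a_{11,k} & a_{12,k} \\ a_{21,k} & a_{22,k} \end{pmatrix}
\begin{pmatrix} X_{1,t-k} \\ X_{2,t-k} \end{pmatrix}
+ \begin{pmatrix} \varepsilon_{1,t} \\ \varepsilon_{2,t} \end{pmatrix},
\tag{2}
$$
where $X_{1,t}$ and $X_{2,t}$ represent imprecisely observed values of variables $X_1$ and $X_2$ at period $t$, respectively; $\begin{pmatrix} a_{10} \\ a_{20} \end{pmatrix}$, $\begin{pmatrix} a_{11,1} & a_{12,1} \\ a_{21,1} & a_{22,1} \end{pmatrix}$, $\ldots$, $\begin{pmatrix} a_{11,k} & a_{12,k} \\ a_{21,k} & a_{22,k} \end{pmatrix}$ are coefficient matrices to be estimated; $k$ specifies the model order; and $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$ are the disturbance terms at period $t$, independent of $X_{1,t}$ and $X_{2,t}$, respectively.
Obviously, Equation (2) can be expressed equivalently as follows:
$$X_{i,t} = a_{i0} + \sum_{q=1}^{k} \sum_{p=1}^{2} a_{ip,q} X_{p,t-q} + \varepsilon_{i,t}, \quad i = 1, 2. \tag{3}$$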
Following Yang and Liu (2019) [12], the parameters of model (3) are estimated by least squares, that is, as the solution of the following optimization problem:
$$
\min_{\substack{a_{i0},\, a_{ip,q} \\ i,p = 1,2;\; q = 1,2,\ldots,k}}
\sum_{i=1}^{2} \sum_{t=k+1}^{n} E\left[ \left( X_{i,t} - a_{i0} - \sum_{q=1}^{k} \sum_{p=1}^{2} a_{ip,q} X_{p,t-q} \right)^{2} \right].
\tag{4}
$$
Let $\Phi_{1,t}$ and $\Phi_{2,t}$ denote the regular uncertainty distributions of $X_{1,t}$ and $X_{2,t}$, respectively. As proved by Tang (2020) [24], the solution of optimization problem (4) is obtained by solving the following minimization problem:
$$
\min_{\substack{a_{i0},\, a_{ip,q} \\ i,p = 1,2;\; q = 1,2,\ldots,k}}
\sum_{i=1}^{2} \sum_{t=k+1}^{n} \int_{0}^{1} \left( \Phi_{i,t}^{-1}(\alpha) - a_{i0} - \sum_{q=1}^{k} \sum_{p=1}^{2} a_{ip,q}\, \gamma_{p,t-q}^{-1}(\alpha, a_{ip,q}) \right)^{2} \mathrm{d}\alpha,
\tag{5}
$$
where
$$
\gamma_{p,t-q}^{-1}(\alpha, a_{ip,q}) =
\begin{cases}
\Phi_{p,t-q}^{-1}(1-\alpha), & \text{if } a_{ip,q} \ge 0, \\
\Phi_{p,t-q}^{-1}(\alpha), & \text{if } a_{ip,q} < 0,
\end{cases}
$$
for $i, p = 1, 2$ and $q = 1, 2, \ldots, k$.
Let $a_{i0}$ and $a_{ip,q}$ ($i, p = 1, 2$; $q = 1, 2, \ldots, k$) now denote the optimal solution of this problem. The fitted UVAR model can then be expressed as
$$\hat{X}_{i,t} = a_{i0} + \sum_{q=1}^{k} \sum_{p=1}^{2} a_{ip,q} X_{p,t-q}, \quad i = 1, 2, \tag{6}$$
where $\hat{X}_{i,t}$ is the forecast value of $X_i$ at period $t$.
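As a minimal sketch of the least-squares step, suppose every observation is a crisp number, that is, a degenerate uncertain variable whose inverse distribution does not depend on $\alpha$. Under this simplifying assumption (made here for illustration only; it is not the linear-uncertain-variable representation adopted in the empirical section), the integral in the minimization problem collapses and, for $k = 1$, the problem reduces to ordinary least squares on the lagged values. The function name `fit_uvar1_crisp`, the coefficients, and the synthetic series are hypothetical.

```python
import numpy as np

def fit_uvar1_crisp(x):
    """Least-squares fit of a bivariate first-order model on crisp data.

    x: array of shape (n, 2); row t holds (X_1, X_2) at period t.
    Returns (a0, A): a0[i] is the intercept of equation i and A[i, p]
    multiplies the lag of variable p in equation i.
    """
    x = np.asarray(x, dtype=float)
    # Regressors for periods 2..n: a constant and both lagged variables.
    Z = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    # Both equations share the same regressors, so one call fits them jointly.
    B, *_ = np.linalg.lstsq(Z, x[1:], rcond=None)
    return B[0], B[1:].T

# Hypothetical noise-free series generated from assumed coefficients.
a0_true = np.array([0.1, -0.2])
A_true = np.array([[0.5, 0.2], [0.1, 0.6]])
x = np.empty((40, 2))
x[0] = [3.0, 1.0]
for t in range(1, len(x)):
    x[t] = a0_true + A_true @ x[t - 1]

a0_hat, A_hat = fit_uvar1_crisp(x)
print(np.round(a0_hat, 4))
print(np.round(A_hat, 4))
```

Because the synthetic series contains no disturbance, the fit should return the assumed coefficients up to floating-point error. With genuinely imprecise observations, the inverse distributions $\Phi_{i,t}^{-1}$ enter the objective and the sign-dependent term $\gamma^{-1}$ makes the problem piecewise, so a plain regression is no longer sufficient.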

2.2. Residual Analysis

We now proceed to analyze the expected value and variance of the disturbance term. As defined by Yang and Liu (2019) [12], the difference between the actual observed value and the value predicted by the UVAR model is referred to as the t-th residual:
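$$\hat{\varepsilon}_{i,t} = X_{i,t} - \hat{X}_{i,t}, \quad i = 1, 2. \tag{7}$$
Assume that the disturbance terms $\varepsilon_{i,k+1}, \varepsilon_{i,k+2}, \ldots, \varepsilon_{i,n}$ are independent and identically distributed (i.i.d.) uncertain variables. Then, the expected value and variance of the disturbance terms can be estimated as follows:
$$
\hat{e}_i = \frac{1}{n-k} \sum_{t=k+1}^{n} E\left[\hat{\varepsilon}_{i,t}\right]
= \frac{1}{n-k} \sum_{t=k+1}^{n} \int_{0}^{1} \left( \Phi_{i,t}^{-1}(\alpha) - a_{i0} - \sum_{q=1}^{k} \sum_{p=1}^{2} a_{ip,q}\, \gamma_{p,t-q}^{-1}(\alpha, a_{ip,q}) \right) \mathrm{d}\alpha,
$$
and
$$
\hat{\sigma}_i^{2} = \frac{1}{n-k} \sum_{t=k+1}^{n} E\left[\left(\hat{\varepsilon}_{i,t} - \hat{e}_i\right)^{2}\right]
= \frac{1}{n-k} \sum_{t=k+1}^{n} \int_{0}^{1} \left( \Phi_{i,t}^{-1}(\alpha) - a_{i0} - \sum_{q=1}^{k} \sum_{p=1}^{2} a_{ip,q}\, \gamma_{p,t-q}^{-1}(\alpha, a_{ip,q}) - \hat{e}_i \right)^{2} \mathrm{d}\alpha,
$$
for $i = 1, 2$.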
The proof is adapted from Tang (2020) [24], who utilized the foundational theorems from Liu (2010) [5] to establish the transformation.
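The two estimators can be evaluated numerically. The sketch below assumes $k = 1$ and linear uncertain observations $\mathcal{L}(\text{lo}, \text{hi})$, whose inverse distribution is $\text{lo} + (\text{hi} - \text{lo})\alpha$, and approximates the integral over $\alpha$ with a midpoint rule. The function names, the grid size, the coefficients, and the synthetic data are hypothetical choices for illustration.

```python
import numpy as np

def linear_inv(lo, hi, alpha):
    """Inverse distribution of linear uncertain variables L(lo, hi) on an alpha grid."""
    return lo[:, None] + (hi - lo)[:, None] * alpha[None, :]

def residual_moments(lo, hi, a0, A, n_grid=1000):
    """Estimate the expected value and variance of the disturbances for k = 1.

    lo, hi: arrays of shape (n, 2) with the bounds of each observation.
    a0, A:  fitted intercepts (2,) and lag-1 coefficient matrix (2, 2).
    """
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    a0, A = np.asarray(a0, float), np.asarray(A, float)
    alpha = (np.arange(n_grid) + 0.5) / n_grid      # midpoint nodes on (0, 1)
    e_hat, var_hat = np.empty(2), np.empty(2)
    for i in range(2):
        # Inverse distribution of the residual of equation i, one row per period.
        r = linear_inv(lo[1:, i], hi[1:, i], alpha) - a0[i]
        for p in range(2):
            # A non-negative coefficient makes the residual decreasing in the lag,
            # so the lag enters at level 1 - alpha; otherwise at level alpha.
            level = 1.0 - alpha if A[i, p] >= 0 else alpha
            r -= A[i, p] * linear_inv(lo[:-1, p], hi[:-1, p], level)
        e_hat[i] = r.mean()                          # average over t and alpha
        var_hat[i] = ((r - e_hat[i]) ** 2).mean()
    return e_hat, var_hat

rng = np.random.default_rng(0)
centres = rng.normal(size=(50, 2))                  # hypothetical observations
a0 = np.array([0.0, 0.0])
A = np.array([[0.5, 0.2], [0.1, 0.6]])
print(residual_moments(centres, centres, a0, A))              # crisp case
print(residual_moments(centres - 0.1, centres + 0.1, a0, A))  # +/- 0.1 imprecision
```

In the crisp case the estimates coincide with the sample mean and the (biased) sample variance of the ordinary residuals; widening the observations leaves the mean unchanged in this symmetric setting and inflates the variance.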

2.3. Hypothesis Test

We examine whether the fitted model accurately captures the essential features of the data. When the fitted model successfully captures the underlying characteristics of the data, a significant portion of the information is expected to be captured by $\hat{X}_{i,t}$. As a result, the estimates of the mean ($\hat{e}_i$) and variance ($\hat{\sigma}_i^2$) of the residuals should be deemed appropriate. Therefore, the hypothesis test for the fitted model is equivalent to assessing whether the distributional assumptions of the residuals are valid. Accordingly, we consider the hypothesis that the residuals $\hat{\varepsilon}_{i,t}$ follow a normal uncertainty distribution. Denote by $e_i$ and $\sigma_i$ the location and scale parameters of this distribution, respectively. Then the two-sided hypotheses are
$$
H_0:\ \begin{pmatrix} e_1 \\ e_2 \end{pmatrix} = \begin{pmatrix} \hat{e}_1 \\ \hat{e}_2 \end{pmatrix}
\text{ and }
\begin{pmatrix} \sigma_1^2 \\ \sigma_2^2 \end{pmatrix} = \begin{pmatrix} \hat{\sigma}_1^2 \\ \hat{\sigma}_2^2 \end{pmatrix}
\qquad
H_1:\ \begin{pmatrix} e_1 \\ e_2 \end{pmatrix} \neq \begin{pmatrix} \hat{e}_1 \\ \hat{e}_2 \end{pmatrix}
\text{ or }
\begin{pmatrix} \sigma_1^2 \\ \sigma_2^2 \end{pmatrix} \neq \begin{pmatrix} \hat{\sigma}_1^2 \\ \hat{\sigma}_2^2 \end{pmatrix},
$$
where $\hat{e}_i$ and $\hat{\sigma}_i^2$ are the estimated expected value and variance of the disturbance terms.
Following Liu and Yang (2022) [16], for $\hat{\varepsilon}_{i,t}$, $k+1 \le t \le n$, we construct the rejection region $W_i$ at significance level $\alpha$:
$$
W_i = \left\{ (\hat{\varepsilon}_{i,k+1}, \ldots, \hat{\varepsilon}_{i,n}) : \text{at least } \alpha (n-k) \text{ indices } t \text{ satisfy } \hat{\varepsilon}_{i,t} < \Phi_i^{-1}\left(\frac{\alpha}{2}\right) \text{ or } \hat{\varepsilon}_{i,t} > \Phi_i^{-1}\left(1 - \frac{\alpha}{2}\right) \right\},
$$
where $\Phi_i^{-1}(\alpha) = \hat{e}_i + \dfrac{\hat{\sigma}_i \sqrt{3}}{\pi} \ln \dfrac{\alpha}{1-\alpha}$.
If $(\hat{\varepsilon}_{i,k+1}, \ldots, \hat{\varepsilon}_{i,n}) \notin W_i$ for both $i = 1, 2$, then $H_0$ is accepted, indicating that the assumption of the residuals following a normal uncertainty distribution is reasonable. If the residual sequence for at least one $i$ falls within its rejection region $W_i$, then $H_0$ is rejected.
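The decision rule can be sketched for a single residual sequence as follows. The code assumes the residuals are available as plain numbers and that $\hat{e}_i$ and $\hat{\sigma}_i$ have already been estimated; the function names and the synthetic residuals are hypothetical.

```python
import math

def normal_inverse(alpha, e, sigma):
    """Inverse of the normal uncertainty distribution with parameters e and sigma."""
    return e + sigma * math.sqrt(3) / math.pi * math.log(alpha / (1 - alpha))

def uncertain_test(residuals, e, sigma, alpha=0.05):
    """Return (outlier count, critical threshold, reject flag) for one sequence."""
    lower = normal_inverse(alpha / 2, e, sigma)
    upper = normal_inverse(1 - alpha / 2, e, sigma)
    # A residual is an outlier when it falls outside [lower, upper].
    outliers = sum(1 for r in residuals if r < lower or r > upper)
    threshold = alpha * len(residuals)
    # H0 is rejected when at least alpha * (n - k) residuals are outliers.
    return outliers, threshold, outliers >= threshold

# Hypothetical residuals: 142 values, seven of which lie far outside the bounds.
residuals = [0.0] * 135 + [10.0] * 7
print(uncertain_test(residuals, e=0.0, sigma=1.0, alpha=0.05))
```

For the bivariate model, $H_0$ is accepted only when this check does not reject for either residual sequence.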

2.4. Forecast Value and Confidence Interval

Once the functional form of the UVAR(k) model is established, it can be exploited to obtain both point forecasts and their associated interval estimates for future observations.
For $i = 1, 2$, based on the preceding discussion, the forecast uncertain value $\hat{X}_{i,n+1}$ of $X_{i,n+1}$ is
$$\hat{X}_{i,n+1} = a_{i0} + \sum_{q=1}^{k} \sum_{p=1}^{2} a_{ip,q} X_{p,n+1-q} + \varepsilon_{i,n+1},$$
where the disturbance term $\varepsilon_{i,n+1}$ has the estimated expected value $\hat{e}_i$ and variance $\hat{\sigma}_i^2$.
The point estimate of $X_{i,n+1}$ is defined as the expected value of $\hat{X}_{i,n+1}$, i.e.,
$$X_{i,n+1} = E\left[\hat{X}_{i,n+1}\right] = a_{i0} + \sum_{q=1}^{k} \sum_{p=1}^{2} a_{ip,q} E\left[X_{p,n+1-q}\right] + \hat{e}_i.$$
If the disturbance term $\varepsilon_{i,n+1}$ is assumed to follow a normal uncertainty distribution, then the inverse uncertainty distribution of $\hat{X}_{i,n+1}$ is
$$\hat{\Phi}_{i,n+1}^{-1}(\alpha) = a_{i0} + \sum_{q=1}^{k} \sum_{p=1}^{2} a_{ip,q}\, \gamma_{p,n+1-q}^{-1}(\alpha, a_{ip,q}) + \Phi_i^{-1}(\alpha),$$
where
$$
\gamma_{p,n+1-q}^{-1}(\alpha, a_{ip,q}) =
\begin{cases}
\Phi_{p,n+1-q}^{-1}(1-\alpha), & \text{if } a_{ip,q} \ge 0, \\
\Phi_{p,n+1-q}^{-1}(\alpha), & \text{if } a_{ip,q} < 0,
\end{cases}
$$
for $i, p = 1, 2$ and $q = 1, 2, \ldots, k$, and $\Phi_i^{-1}(\alpha) = \hat{e}_i + \dfrac{\hat{\sigma}_i \sqrt{3}}{\pi} \ln \dfrac{\alpha}{1-\alpha}$.
From $\hat{\Phi}_{i,n+1}^{-1}(\alpha)$, we can obtain the uncertainty distribution $\hat{\Phi}_{i,n+1}(x)$ of $\hat{X}_{i,n+1}$.
Following Yang and Liu (2019) [12], we adopt $\alpha$ (e.g., $\alpha = 95\%$) as the confidence level and determine the minimum value $b_i$ satisfying
$$\hat{\Phi}_{i,n+1}(X_{i,n+1} + b_i) - \hat{\Phi}_{i,n+1}(X_{i,n+1} - b_i) \ge \alpha.$$
The $\alpha$ confidence interval of $X_{i,n+1}$, centered at the point estimate above, is $\left[ X_{i,n+1} - b_i,\; X_{i,n+1} + b_i \right]$.
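When the lagged observations entering the forecast are treated as crisp numbers (an assumption made here for illustration), the forecast follows a normal uncertainty distribution centered at the point estimate, and the minimal half-width has the closed form $b_i = \frac{\hat{\sigma}_i \sqrt{3}}{\pi} \ln \frac{1+\alpha}{1-\alpha}$. The sketch below evaluates this expression for the residual variances reported in Section 3.5 and rescales by the standard deviations of Section 3.2; the function name is hypothetical.

```python
import math

def half_width(sigma, level=0.95):
    """Half-width b of the confidence interval for a normal uncertain disturbance."""
    return sigma * math.sqrt(3) / math.pi * math.log((1 + level) / (1 - level))

# (series, residual variance, standard deviation used for standardization)
reported = [("EMV", 0.7167, 8.2885), ("RV", 0.1625, 1.7939)]
for name, variance, scale in reported:
    b = half_width(math.sqrt(variance))
    # b is on the standardized scale; b * scale is on the original scale.
    print(name, round(b, 4), round(b * scale, 2))
```

On the original scale, the resulting half-widths agree with half the length of the 95% intervals reported in Section 3.6, namely $(33.49 - 5.15)/2 = 14.17$ for the EMV tracker and $(3.60 - 0.68)/2 = 1.46$ for realized volatility, which suggests that the reported intervals are consistent with this simplified case.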

3. Results

Figure 1 summarizes the empirical procedure followed in this section.

3.1. Data Description

We apply the uncertain vector autoregressive (UVAR) model to analyze the relationship between the news-based Overall Equity Market Volatility (EMV) tracker and the realized volatility (RV) of crude oil. The EMV tracker data are collected from http://www.policyuncertainty.com/, and crude oil realized volatility is computed from daily WTI prices sourced from https://cn.investing.com/. The dataset consists of monthly observations from January 2008 to December 2024, a period that includes the 2008 financial crisis, the European debt crisis, and the COVID-19 pandemic.
Variable 1 ($X_1$): News-based overall EMV tracker. This is the Equity Market Volatility tracker constructed by computing the monthly share of newspaper articles in which a fixed list of equity-market-volatility keywords co-occurs with economic-condition keywords, scaled to match the level of the VIX over an anchor period. As a text-derived sentiment measure, the EMV tracker is inherently imprecisely observed rather than the realization of a frequentist sampling process.
Variable 2 ($X_2$): Crude oil realized volatility (RV). This is the monthly realized volatility computed from daily WTI crude oil log-returns. While each daily price is recorded as a single number, the monthly RV value is itself a proxy for the latent monthly volatility process, rather than a directly measurable quantity, and is therefore subject to microstructure errors that motivate its treatment as an imprecisely observed value.
The dataset spans $n = 204$ monthly observations from January 2008 to December 2024. Table 1 reports summary statistics for both variables over the full sample period.
With p-values for both the Jarque–Bera and Ljung–Box tests below 0.01, both series significantly deviate from normality and exhibit serial correlation.
Figure 2 displays the time series plot of both variables. From the figure, we can observe that both variables exhibit similar patterns of fluctuation, with notable spikes during periods of economic turmoil, particularly in 2008–2009 (financial crisis), 2011–2012 (European debt crisis), and 2020 (COVID-19 pandemic). This suggests a potential interdependence between the EMV tracker and realized volatility.

3.2. Data Preprocessing

Given the different scales of the two variables, we apply standardization to ensure comparability. For each variable $i = 1, 2$, the standardized observations are computed as follows:
$$\tilde{X}_{i,t} = \frac{X_{i,t} - \mu_i}{s_i},$$
where $\mu_i$ and $s_i$ are the sample mean and standard deviation of variable $i$, respectively. The standardization parameters are:
  • Variable 1: $\mu_1 = 20.7000$, $s_1 = 8.2885$.
  • Variable 2: $\mu_2 = 2.2443$, $s_2 = 1.7939$.
After standardization, both variables have zero mean and unit variance, which facilitates the parameter estimation and interpretation of the UVAR model.
Given that neither variable is a precise frequentist realization—the EMV tracker is a textual-sentiment proxy and realized volatility is a microstructure-noise-corrupted proxy for an unobservable latent process, as discussed in Section 3.1—we represent each observation as a linear uncertain variable with regular uncertainty distribution, following Tang (2020) [24], so that parameters can be estimated within the regular uncertainty distribution framework.

3.3. Order Selection via Cross-Validation

To determine the optimal order $k$ of the UVAR model, we employ three cross-validation methods: fixed-origin, rolling-origin, and rolling-window cross-validation. The training set size is set to $T = 143$, leaving 61 observations for testing. For each candidate order $k = 1, 2, 3, 4$, we compute the average testing error (ATE) using the three methods. The results are presented in Table 2.
From Table 2, we observe that $k = 1$ yields the smallest ATE values across all three cross-validation methods. Therefore, we select the UVAR(1) model for our analysis. This result suggests that the current values of both variables are primarily influenced by their immediate past values, which is consistent with the short-term memory characteristics often observed in financial time series.
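The three schemes are not spelled out in this section. The sketch below assumes their usual meaning in the forecasting literature (a single split for fixed origin, an expanding training set for rolling origin, and a fixed-length sliding training set for rolling window), which may differ in detail from the procedure of Liu and Yang (2022) [16]. The function names are hypothetical, indices are zero-based, and the ATE itself is not computed because its exact definition is not restated here.

```python
def fixed_origin(n, T):
    """One split: train on the first T periods, test on all remaining ones."""
    yield range(0, T), range(T, n)

def rolling_origin(n, T):
    """Expanding window: the training set grows by one period per split."""
    for t in range(T, n):
        yield range(0, t), range(t, t + 1)

def rolling_window(n, T):
    """Sliding window: the training set always holds the latest T periods."""
    for t in range(T, n):
        yield range(t - T, t), range(t, t + 1)

n, T = 204, 143   # sample size and training size used in this section
for scheme in (fixed_origin, rolling_origin, rolling_window):
    splits = list(scheme(n, T))
    first_train, first_test = splits[0]
    last_train, last_test = splits[-1]
    print(scheme.__name__, len(splits), len(first_train), len(last_train))
```

For each candidate order, a model would be fitted on every training range and evaluated on the matching test range, and the order with the smallest average error would be retained.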

3.4. Parameter Estimation

Based on the least-squares principle described in Section 2, we estimate the parameters of the UVAR(1) model using the training dataset ($T = 143$). The estimated UVAR(1) model is
$$
\begin{aligned}
\tilde{X}_{1,t} &= 0.5080\, \tilde{X}_{1,t-1} + 0.2040\, \tilde{X}_{2,t-1} - 0.0404, \\
\tilde{X}_{2,t} &= 0.0873\, \tilde{X}_{1,t-1} + 0.6712\, \tilde{X}_{2,t-1} - 0.0160.
\end{aligned}
$$
The parameter estimates reveal several important findings:
  • Autoregressive effects: Both variables exhibit positive autoregressive coefficients ($a_{11,1} = 0.5080$ and $a_{22,1} = 0.6712$), indicating persistence in their respective dynamics. The realized volatility shows stronger persistence ($a_{22,1} = 0.6712$) compared to the EMV tracker ($a_{11,1} = 0.5080$).
  • Cross-variable effects: The positive coefficient $a_{12,1} = 0.2040$ suggests that past realized volatility has a substantial positive impact on current EMV tracker values, while the coefficient $a_{21,1} = 0.0873$ indicates that past EMV tracker values exert a modest positive influence on current realized volatility. This bidirectional relationship confirms the existence of cross-variable dependencies in the system, which justifies the use of the UVAR model framework. By incorporating both lagged variables, the UVAR model exploits these interdependencies to achieve more accurate and stable volatility forecasts.
  • Intercept terms: The small negative intercepts ($a_{10} = -0.0404$ and $a_{20} = -0.0160$) are close to zero, which is expected given the standardization of variables.

3.5. Residual Analysis and Hypothesis Testing

According to Section 2, we conduct residual analysis to assess the goodness-of-fit of the estimated UVAR(1) model. The residuals are computed as in Equation (7). The estimated expected values and variances of the disturbance terms are presented in Table 3.
The expected values of both disturbance terms are essentially zero, which is consistent with the model assumptions. The variance of the first disturbance term ($\hat{\sigma}_1^2 = 0.7167$) is larger than that of the second ($\hat{\sigma}_2^2 = 0.1625$), suggesting that the EMV tracker is less predictable than realized volatility after accounting for autoregressive and cross-variable dynamics.
To formally test the appropriateness of the estimated parameters, we apply the uncertain hypothesis test described in Section 2 and test the null hypothesis $H_0$ at a significance level of $\alpha = 0.05$. With $T - k = 142$ training residuals, the critical threshold is $(T - k) \times \alpha = 142 \times 0.05 = 7.1$. The hypothesis test results are summarized in Table 4.
The residuals over time, along with the identified outliers and acceptance bounds, are illustrated in Figure 3 and Figure 4 for $\varepsilon_{1,t}$ and $\varepsilon_{2,t}$, respectively.
Variable 1 has four outliers and variable 2 has seven; both counts fall below the critical threshold of 7.1 at the 5% significance level. Therefore, we accept the null hypothesis $H_0$ for both variables, indicating that the estimated UVAR(1) model provides a good fit to the data.
To assess the robustness of the hypothesis-test outcome to the choice of significance level, we recompute the test for $\alpha \in \{0.01, 0.05, 0.10\}$. The corresponding critical thresholds and outlier counts are reported in Table 5.
At the conventional levels $\alpha = 0.05$ and $\alpha = 0.10$, the UVAR(1) model is comfortably accepted for both residual sequences, indicating that the goodness-of-fit conclusion reported above is not an artifact of the specific significance level chosen. The rejection at $\alpha = 0.01$ reflects the unusual tightness of the corresponding critical threshold: only 1.42 outliers are tolerated over 142 training observations, so even two extreme residuals, which are inevitable for a financial time series spanning multiple turbulent episodes such as the 2008 financial crisis and the 2011 European debt crisis, suffice to trigger rejection. This regime is uninformative for model discrimination, as essentially any low-order specification fitted to a series exhibiting heavy-tailed shocks of this kind would fail at this level. Overall, the sensitivity analysis confirms the adequacy of the UVAR(1) specification across the conventional significance levels typically reported in the empirical literature.

3.6. Forecasting and Confidence Intervals

Based on the estimated UVAR(1) model, we forecast the values of both variables for the next period, i.e., observation $n + 1 = 205$, which corresponds to January 2025.
According to Section 2, the point forecast for the standardized variables is given by the general forecast formula, with $\tilde{X}_n$ representing the standardized observations at time $n$. The forecast results are $\tilde{X}_{1,n+1} = -0.1665$ and $\tilde{X}_{2,n+1} = -0.0582$.
To construct the 95% confidence intervals, we assume that the disturbance terms follow normal uncertainty distributions $\mathcal{N}(\hat{e}_i, \hat{\sigma}_i^2)$. The 95% confidence intervals for the standardized forecasts are $\tilde{X}_{1,n+1} \in [-1.8764, 1.5434]$ and $\tilde{X}_{2,n+1} \in [-0.8724, 0.7559]$.
To obtain forecasts in the original scale, we apply the inverse transformation:
$$\hat{X}_{i,n+1} = \tilde{X}_{i,n+1} \cdot s_i + \mu_i.$$
This yields the following point forecasts and 95% confidence intervals:
  • Overall EMV tracker: $\hat{X}_{1,n+1} \approx 19.32$, with 95% CI $[5.15, 33.49]$.
  • Realized volatility: $\hat{X}_{2,n+1} \approx 2.14$, with 95% CI $[0.68, 3.60]$.
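The inverse transformation can be checked with a few lines of arithmetic. The sketch below uses the standardization parameters of Section 3.2 and the standardized forecasts and interval endpoints quoted above; the standardized point forecasts and lower endpoints are negative, since the original-scale forecasts lie below the sample means. The helper name `destandardize` is hypothetical.

```python
def destandardize(z, mean, sd):
    """Map a standardized value back to the original scale."""
    return z * sd + mean

params = {"EMV": (20.7000, 8.2885), "RV": (2.2443, 1.7939)}   # (mean, sd)
standardized = {                                               # (lower, point, upper)
    "EMV": (-1.8764, -0.1665, 1.5434),
    "RV": (-0.8724, -0.0582, 0.7559),
}
original = {}
for name, values in standardized.items():
    mean, sd = params[name]
    original[name] = [round(destandardize(z, mean, sd), 2) for z in values]
    print(name, original[name])
```

The rescaled values match the reported point forecasts and interval endpoints to two decimal places.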

3.7. Comparison with Univariate UAR and Probabilistic VAR Benchmarks

3.7.1. Comparison with the Univariate UAR(1) Model

To demonstrate the advantage of the UVAR model, we also fit separate uncertain autoregressive (UAR) models to each variable and compare the results. The UAR(1) models are
$$
\begin{aligned}
\tilde{X}_{1,t} &= -0.0511 + 0.5713\, \tilde{X}_{1,t-1} + \xi_{1,t}, \\
\tilde{X}_{2,t} &= -0.0174 + 0.7494\, \tilde{X}_{2,t-1} + \xi_{2,t}.
\end{aligned}
$$
The comparison is reported in Table 6.
The UVAR(1) model achieves lower estimated residual variance for both variables. Figure 5 visualizes the 95% confidence bands of the two models on the test set; while the two bands appear visually similar across most of the test period, computing the average band width over the 61 test points reveals that UVAR(1) bands are narrower than UAR(1) bands for both variables (28.346 vs. 28.575 for the EMV tracker; 2.921 vs. 2.976 for realized volatility).
Beyond residual variance and CI width, the most consequential difference between the two specifications concerns the uncertain hypothesis test of Section 3.5. Repeating the test for the two univariate UAR(1) models at $\alpha = 0.05$ yields five outliers for the EMV tracker (accepted) and eight outliers for realized volatility (rejected, against a critical threshold of 7.1). The same pattern persists at $\alpha = 0.10$, where the UAR(1) model for realized volatility records 17 outliers against a threshold of 14.2 and is again rejected. In contrast, the joint UVAR(1) model is comfortably accepted for both series at both significance levels (Table 5). The univariate UAR(1) specification is therefore inferentially inadequate for realized volatility: it discards the cross-variable information that the joint UVAR(1) model exploits, and the resulting misspecification is detected by the formal residual diagnostic.
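The accept/reject outcomes quoted in the text follow directly from comparing each outlier count with the threshold $\alpha (T - k)$. The sketch below reproduces that comparison for the counts stated in Sections 3.5 and 3.7.1 only; the remaining cells of Table 5 are not reproduced, and the function name is hypothetical.

```python
def decision(outliers, n_residuals, alpha):
    """Reject H0 when the outlier count reaches the threshold alpha * n_residuals."""
    threshold = alpha * n_residuals
    return round(threshold, 2), "rejected" if outliers >= threshold else "accepted"

n_residuals = 142  # T - k = 143 - 1
# Outlier counts quoted in the text, keyed by (model, series, alpha).
reported = {
    ("UVAR(1)", "EMV", 0.05): 4,
    ("UVAR(1)", "RV", 0.05): 7,
    ("UAR(1)", "EMV", 0.05): 5,
    ("UAR(1)", "RV", 0.05): 8,
    ("UAR(1)", "RV", 0.10): 17,
}
for (model, series, alpha), count in reported.items():
    print(model, series, alpha, count, decision(count, n_residuals, alpha))
```

The margin is narrow for realized volatility at $\alpha = 0.05$: seven outliers pass and eight fail against the same threshold of 7.1.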
A natural question is whether the in-sample fixed-origin ATE for the UVAR(1) model dominates the corresponding UAR(1) values. Comparing the fixed-origin ATE of UAR(1) (0.6965 for the EMV tracker, 2.3096 for realized volatility) with that of UVAR(1) (1.438; see Table 2), one observes that UVAR(1) performs slightly worse than UAR(1) on the EMV tracker but substantially better on realized volatility. This asymmetry is a direct consequence of the joint specification: the UVAR(1) ATE is a single number that aggregates the testing error of both equations, so the model trades marginal predictive accuracy on the EMV tracker for substantial predictive gains on realized volatility through the cross-variable channel. Combined with the failure of UAR(1) to pass the uncertain hypothesis test for realized volatility documented above, this confirms that the joint UVAR(1) specification is preferred on both inferential and forecasting grounds for the variable that is the primary economic object of interest in oil volatility forecasting.

3.7.2. Comparison with the Probabilistic VAR(1) Model

We further benchmark UVAR(1) against the conventional probabilistic VAR(1) model estimated by maximum likelihood under Gaussian innovations. At each step of the rolling exercise on the test set $t = 144, \ldots, 204$, both models are re-estimated on an expanding window $\{1, \ldots, t-1\}$ and produce a one-step-ahead point forecast for period $t$. Forecast accuracy is evaluated on the original de-standardized scale via the sum of squared mean errors:
$$\mathrm{SSM}_i = \sum_{t \in \mathcal{T}} \left( \hat{X}_{i,t} - X_{i,t} \right)^{2},$$
where $\mathcal{T} = \{144, \ldots, 204\}$. Results are reported in Table 7.
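The rolling exercise can be sketched as follows for a first-order model fitted by ordinary least squares, which coincides with the conditional Gaussian maximum-likelihood estimate of the VAR(1) coefficients and, under the assumption of crisp observations, with the least-squares UVAR(1) point fit. The function names and the synthetic series are hypothetical, and indices are zero-based, so paper periods 144 to 204 correspond to indices 143 to 203.

```python
import numpy as np

def fit_var1(x):
    """OLS fit; rows of the result are the intercept and the two lag coefficients."""
    Z = np.column_stack([np.ones(len(x) - 1), x[:-1]])
    B, *_ = np.linalg.lstsq(Z, x[1:], rcond=None)
    return B

def rolling_ssm(x, first_test):
    """Sum of squared one-step-ahead errors with an expanding estimation window."""
    x = np.asarray(x, dtype=float)
    ssm = np.zeros(x.shape[1])
    for t in range(first_test, len(x)):
        B = fit_var1(x[:t])                            # uses periods before t only
        forecast = np.concatenate(([1.0], x[t - 1])) @ B
        ssm += (forecast - x[t]) ** 2
    return ssm

# Hypothetical bivariate series of the same length as the sample (204 periods).
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.2], [0.1, 0.6]])
x = np.zeros((204, 2))
for t in range(1, len(x)):
    x[t] = A @ x[t - 1] + rng.normal(scale=0.5, size=2)

print(rolling_ssm(x, first_test=143))
```

Re-estimating at every step keeps each forecast strictly out of sample, at the cost of 61 separate fits per model.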
UVAR(1) marginally outperforms probabilistic VAR(1) on both series in terms of SSM. The probabilistic VAR(1) model relies on Gaussian innovations and finite-variance asymptotics to justify its hypothesis testing and interval estimation—assumptions that are clearly violated for both the news-based EMV tracker (a textual-sentiment proxy whose imprecision is non-frequentist by construction) and crude oil realized volatility (a microstructure-noise-corrupted proxy for an unobserved latent process; see the Jarque–Bera test results in Table 1). The UVAR(1) framework, in contrast, supports formal hypothesis testing and uncertain confidence intervals without invoking such distributional assumptions, making it more appropriate for the data at hand.

4. Conclusions

This study applies the bivariate uncertain vector autoregressive (UVAR) model to jointly forecast crude oil realized volatility and the Overall Equity Market Volatility (EMV) tracker, exploiting their well-documented bidirectional interdependence within the framework of uncertainty theory. The estimated UVAR(1) model reveals an asymmetric cross-variable relationship: lagged realized volatility exerts a substantially stronger positive influence on the contemporaneous EMV tracker (coefficient 0.2040) than the reverse channel (coefficient 0.0873), indicating that actual volatility shocks propagate into uncertainty sentiment with greater intensity than uncertainty expectations feed back into realized market fluctuations.
By capturing these cross-variable dynamics, the UVAR(1) model outperforms univariate UAR(1) benchmarks on multiple criteria: it yields lower residual variances and, on average, narrower 95% uncertainty confidence intervals for both variables. Most critically, the UAR(1) model applied to realized volatility fails the uncertain hypothesis test at the 5% significance level, a misspecification that is fully remedied by the joint UVAR specification. Although UVAR(1) records a marginally higher in-sample fixed-origin ATE than UAR(1) on the EMV tracker, this is more than offset by substantial gains on realized volatility, which is the primary economic object of interest in oil market risk management; the joint specification therefore trades a small loss on the auxiliary variable for a sizable gain on the variable that matters most to practitioners. A sensitivity analysis confirms that the goodness-of-fit conclusion for UVAR(1) holds at both $\alpha = 0.05$ and $\alpha = 0.10$. The rejection at the tighter level $\alpha = 0.01$ is uninformative for model discrimination: the corresponding critical threshold of 1.42 outliers is so stringent that even two extreme residuals suffice to trigger rejection. Such residuals are inevitable for a financial series containing turbulent episodes such as the 2008 financial crisis and the 2011 European debt crisis, so any low-order specification fitted to such a series would share this outcome.
The UVAR(1) framework also compares favorably with the conventional probabilistic VAR(1) benchmark in an out-of-sample rolling one-step-ahead exercise, attaining marginally lower sum of squared mean errors on both series while uniquely supporting principled uncertain-statistical inference. These findings demonstrate the suitability of the uncertainty-theoretic framework for energy-market forecasting, particularly for financial time series that markedly violate classical distributional assumptions. The calibrated point forecasts and uncertainty confidence intervals produced by the UVAR framework are directly actionable for practitioners, providing principled inputs for value-at-risk assessment, portfolio hedging against oil price swings, and macroeconomic stress testing in oil-dependent economies.

Author Contributions

Conceptualization, C.G.; Methodology, C.G.; Software, C.G.; Validation, C.G.; Formal analysis, C.G.; Investigation, C.G.; Resources, C.G.; Data curation, C.G.; Writing—original draft, C.G.; Writing—review & editing, C.G.; Visualization, C.G.; Supervision, P.C.; Project administration, C.G.; Funding acquisition, C.G. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors would like to thank the anonymous reviewers for their valuable comments and suggestions that improved this manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

| Abbreviation | Meaning |
|---|---|
| UAR | Uncertain autoregressive (model) |
| UVAR | Uncertain vector autoregressive (model) |
| VAR | Vector autoregressive (model) |
| EMV | Equity market volatility |
| RV | Realized volatility |
| ATE | Average testing error |
| CI | Confidence interval |
| SSM | Sum of squared mean errors |
| i.i.d. | Independent and identically distributed |

References

  1. Hamilton, J.D. Understanding Crude Oil Prices. Energy J. 2009, 30, 179–206.
  2. Baumeister, C.; Kilian, L. Forty Years of Oil Price Fluctuations: Why the Price of Oil May Still Surprise Us. J. Econ. Perspect. 2016, 30, 139–160.
  3. Liu, B. Uncertainty Theory, 2nd ed.; Springer: Berlin/Heidelberg, Germany, 2007.
  4. Liu, B. Some research problems in uncertainty theory. J. Uncertain Syst. 2009, 3, 3–10.
  5. Liu, B. Uncertainty Theory: A Branch of Mathematics for Modeling Human Uncertainty; Springer: Berlin/Heidelberg, Germany, 2010.
  6. Yao, K.; Liu, B. Uncertain regression analysis: An approach for imprecise observations. Soft Comput. 2018, 22, 5579–5582.
  7. Lio, W.; Liu, B. Residual and confidence interval for uncertain regression model with imprecise observations. J. Intell. Fuzzy Syst. 2018, 35, 2573–2583.
  8. Liu, Z.; Yang, Y. Least absolute deviations estimation for uncertain regression with imprecise observations. Fuzzy Optim. Decis. Mak. 2020, 19, 33–52.
  9. Chen, D. Tukey’s biweight estimation for uncertain regression model with imprecise observations. Soft Comput. 2020, 24, 16803–16809.
  10. Lio, W.; Liu, B. Uncertain maximum likelihood estimation with application to uncertain regression analysis. Soft Comput. 2020, 24, 9351–9360.
  11. Liu, Z.; Jia, L. Cross-validation for the uncertain Chapman-Richards growth model with imprecise observations. Int. J. Uncertain. Fuzziness Knowl.-Based Syst. 2020, 28, 769–783.
  12. Yang, X.; Liu, B. Uncertain time series analysis with imprecise observations. Fuzzy Optim. Decis. Mak. 2019, 18, 263–278.
  13. Zhao, X.; Peng, J.; Liu, J.; Zhou, X. Analytic solution of uncertain autoregressive model based on principle of least squares. Soft Comput. 2020, 24, 2721–2726.
  14. Yang, X.; Park, G.; Hu, Y. Least absolute deviations estimation for uncertain autoregressive model. Soft Comput. 2020, 24, 18211–18217.
  15. Liu, Z. Huber estimation for uncertain autoregressive model. J. Uncertain Syst. 2021, 14, 2150010.
  16. Liu, Z.; Yang, X. Cross validation for uncertain autoregressive model. Commun. Stat. Simul. Comput. 2022, 51, 4715–4726.
  17. Wang, H.; Liu, Y.; Shi, H. Statistical inference of uncertain autoregressive model via the principle of least squares. Axioms 2024, 13, 789.
  18. Liu, Y.; Qin, Z. Moment estimation of uncertain autoregressive model and its application in financial market. Commun. Stat. Simul. Comput. 2025, 54, 4324–4343.
  19. Shi, Y.; Sheng, Y. Uncertain quantile autoregressive model. Commun. Stat. Simul. Comput. 2025, 54, 1869–1889.
  20. Wang, X.; Cao, J.; Li, W. Ridge estimation for uncertain autoregressive moving average model with imprecise observations. Int. J. Gen. Syst. 2025, 54, 584–601.
  21. Chen, D.; Yang, X. Maximum likelihood estimation for uncertain autoregressive model with application to carbon dioxide emissions. J. Intell. Fuzzy Syst. 2021, 40, 1391–1399.
  22. Li, W.; Wang, X. Analysis and prediction of urban household water demand with uncertain time series. Soft Comput. 2024, 28, 6199–6206.
  23. Dutta, A.; Bouri, E.; Saeed, T. News-based equity market uncertainty and crude oil volatility. Energy 2021, 222, 119930.
  24. Tang, H. Uncertain vector autoregressive model with imprecise observations. Soft Comput. 2020, 24, 17001–17007.
  25. Shi, Y.; Sheng, Y. Uncertain vector autoregressive smoothly moving average model. Commun. Stat. Simul. Comput. 2024, 53, 6038–6049.
Figure 1. Workflow of the framework for joint forecasting of crude oil realized volatility and the EMV tracker via the bivariate UVAR model. The four stages structure the empirical analysis presented in this section.
Figure 2. Time series plot of the overall EMV tracker and realized volatility, January 2008–December 2024.
Figure 3. Residual mean of $\varepsilon_{1,t}$ (EMV tracker) over the training period. Red dashed lines indicate the acceptance bounds; red squares denote outliers exceeding the bounds.
Figure 4. Residual mean of $\varepsilon_{2,t}$ (realized volatility) over the training period. Red dashed lines indicate the acceptance bounds; red squares denote outliers exceeding the bounds.
Figure 5. Out-of-sample 95% uncertainty confidence intervals of UVAR(1) (blue band, solid centerline) and UAR(1) (red band, dashed centerline) over the test period $t = 144, \ldots, 204$ (January 2020–December 2024). Average band widths are reported in each panel.
Table 1. Summary statistics for the overall EMV tracker and realized volatility.

| Variable | Mean | SD | Skewness | Kurtosis | JB/LB p-Value |
|---|---|---|---|---|---|
| EMV Tracker | 20.70 | 8.29 | 2.89 | 14.83 | <0.01 |
| Realized Volatility | 2.24 | 1.79 | 6.74 | 67.60 | <0.01 |

JB: Jarque–Bera normality test; LB: Ljung–Box serial correlation test.
Table 2. Average testing error (ATE) for candidate UVAR orders under three cross-validation methods.

| Order $k$ | Fixed-Origin ATE | Rolling-Origin ATE | Rolling-Window ATE |
|---|---|---|---|
| 1 | 1.438 | 0.276 | 0.279 |
| 2 | 1.469 | 0.295 | 0.305 |
| 3 | 1.487 | 0.311 | 0.314 |
| 4 | 1.500 | 0.316 | 0.315 |

The minimum ATE across all three methods is achieved at $k = 1$.
Table 3. Estimated expected value and variance of the disturbance terms.

| Disturbance Term | Expected Value $\hat{e}_i$ | Variance $\hat{\sigma}_i^2$ |
|---|---|---|
| $\varepsilon_{1,t}$ (EMV Tracker) | ≈0 | 0.7167 |
| $\varepsilon_{2,t}$ (Realized Volatility) | ≈0 | 0.1625 |
Table 4. Uncertain hypothesis test results for the UVAR(1) residuals ($\alpha = 0.05$).

| Disturbance Term | Number of Outliers | Critical Threshold $(T-k)\alpha$ |
|---|---|---|
| $\varepsilon_{1,t}$ (EMV Tracker) | 4 | 7.1 |
| $\varepsilon_{2,t}$ (Realized Volatility) | 7 | 7.1 |

Both variables are within the acceptable range, so $H_0$ is accepted at the 5% significance level.
Table 5. Sensitivity of the UVAR(1) hypothesis test to the significance level $\alpha$.

| $\alpha$ | Critical Threshold $(T-k)\alpha$ | Outliers, $\varepsilon_{1,t}$ | Outliers, $\varepsilon_{2,t}$ | Decision for $H_0$, $\varepsilon_{1,t}$ | Decision for $H_0$, $\varepsilon_{2,t}$ |
|---|---|---|---|---|---|
| 0.01 | 1.42 | 3 | 2 | Reject | Reject |
| 0.05 | 7.10 | 4 | 7 | Accept | Accept |
| 0.10 | 14.20 | 6 | 12 | Accept | Accept |

$\varepsilon_{1,t}$ corresponds to the EMV tracker and $\varepsilon_{2,t}$ corresponds to realized volatility.
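The decisions in Table 5 follow from comparing each outlier count with the critical threshold, and the arithmetic can be checked with a short sketch. Two points are assumptions rather than statements from the paper: the effective sample size of 142 is inferred from the reported thresholds (1.42, 7.10, and 14.20 equal 142 times each significance level, consistent with 143 training observations and order 1), and the strict inequality in the rejection rule is chosen for illustration (a non-strict rule gives the same decisions for these counts). The function name `decide` is hypothetical.

```python
# Outlier counts (EMV tracker, realized volatility) copied from Table 5.
OUTLIERS = {0.01: (3, 2), 0.05: (4, 7), 0.10: (6, 12)}

# Assumption: T - k = 142, inferred from the thresholds reported in Table 5.
EFFECTIVE_N = 142

def decide(n_outliers, alpha, n_eff=EFFECTIVE_N):
    """Return the test decision for one residual series.

    H0 is rejected when the outlier count exceeds n_eff * alpha
    (strictness of the inequality is an assumption of this sketch).
    """
    threshold = n_eff * alpha
    return "Reject" if n_outliers > threshold else "Accept"

for alpha, (n_emv, n_rv) in OUTLIERS.items():
    threshold = round(EFFECTIVE_N * alpha, 2)
    print(alpha, threshold, decide(n_emv, alpha), decide(n_rv, alpha))
```

The sketch makes the point in the text concrete: at the 1% level the threshold is below two, so a series with only two extreme residuals is already rejected.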
Table 6. In-sample comparison of UVAR(1) and UAR(1) models.

| Model | Variable | Residual Variance | Avg. 95% CI Width |
|---|---|---|---|
| UVAR(1) | EMV Tracker | 0.7167 | 28.346 |
| UVAR(1) | Realized Volatility | 0.1625 | 2.921 |
| UAR(1) | EMV Tracker | 0.7283 | 28.575 |
| UAR(1) | Realized Volatility | 0.1686 | 2.976 |
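To gauge the size of the in-sample gains in Table 6, the relative reductions achieved by UVAR(1) over UAR(1) can be computed directly from the tabulated values. The sketch below is only a reading aid: the dictionary layout and the helper `relative_reduction` are hypothetical, and the numbers are copied from the table rather than re-estimated.

```python
# (residual variance, average 95% CI width), copied from Table 6.
TABLE6 = {
    ("UVAR(1)", "EMV Tracker"): (0.7167, 28.346),
    ("UVAR(1)", "Realized Volatility"): (0.1625, 2.921),
    ("UAR(1)", "EMV Tracker"): (0.7283, 28.575),
    ("UAR(1)", "Realized Volatility"): (0.1686, 2.976),
}

def relative_reduction(baseline, value):
    """Percentage by which `value` falls below `baseline`."""
    return 100.0 * (baseline - value) / baseline

for variable in ("EMV Tracker", "Realized Volatility"):
    var_uvar, ci_uvar = TABLE6[("UVAR(1)", variable)]
    var_uar, ci_uar = TABLE6[("UAR(1)", variable)]
    print(
        variable,
        f"variance reduction: {relative_reduction(var_uar, var_uvar):.2f}%",
        f"CI width reduction: {relative_reduction(ci_uar, ci_uvar):.2f}%",
    )
```

All four reductions are positive and fall in the range of roughly 1 to 4 percent, with the larger relative gains on realized volatility.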
Table 7. Out-of-sample SSM comparison of UVAR(1) and probabilistic VAR(1).

| Model | Variable | SSM |
|---|---|---|
| UVAR(1) | EMV Tracker | 2848.22 |
| UVAR(1) | Realized Volatility | 614.22 |
| VAR(1) | EMV Tracker | 2850.65 |
| VAR(1) | Realized Volatility | 616.25 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Gao, C.; Chen, P. Oil Market Volatility Forecasting Under Uncertainty Theory: A Joint Modeling Framework via Uncertain Vector Autoregression. Mathematics 2026, 14, 1601. https://doi.org/10.3390/math14101601