1. Introduction
Oil price volatility forecasting is crucial for financial risk management and macroeconomic policy formulation, as price fluctuations significantly affect inflation expectations, investment decisions, and economic stability across nations [1,2]. Recent volatility episodes have intensified the need for robust predictive models capable of capturing oil market dynamics. While conventional probability-based forecasting methods have proven effective with abundant historical data, they face fundamental limitations when oil market uncertainty primarily reflects expert judgments, geopolitical assessments, or imprecisely quantified fundamentals rather than statistical frequencies. Uncertainty theory, established by Liu (2007) [3] and extended through the product axiom [4], provides an alternative mathematical framework for modeling indeterminacy when data are scarce or derived from expert elicitation. Unlike probability theory, which operates through multiplication axioms, uncertainty theory employs a minimum-operation system based on normality, duality, and subadditivity axioms. This makes it particularly suitable for scenarios where expert belief degrees rather than statistical frequencies characterize phenomena. Liu (2010) initiated uncertain statistics to handle imprecise observations [5], which has since evolved into two primary branches: uncertain regression analysis and uncertain time series analysis.
In the domain of uncertain regression analysis, Yao and Liu (2018) developed the foundational framework for determining quantitative relationships between variables [6]. Subsequent methodological advances include the extension to confidence intervals by Lio and Liu (2018) [7] and diversified parameter estimation techniques encompassing least absolute deviations [8], Tukey’s biweight estimation [9], and maximum likelihood estimation [10]. Liu and Jia (2020) applied cross-validation procedures to improve model selection reliability [11].
Yang and Liu (2019) introduced the uncertain autoregressive (UAR) model for time-dependent data [12], in which observations are characterized as uncertain variables rather than random variables, representing a fundamental departure from classical time series analysis that relies on white noise residuals and strict distributional assumptions. The framework has since been extended along multiple dimensions: Zhao et al. (2020) derived closed-form analytical solutions under the least-squares criterion [13]; Yang et al. (2020) and Liu (2021) proposed robust alternatives via least absolute deviations and Huber estimation, respectively [14,15]; Liu and Yang (2022) established a systematic cross-validation procedure for order selection [16]; Wang et al. (2024) refined the least-squares inference procedure with a dedicated numerical algorithm [17]; Liu and Qin (2025) developed a moment-estimation method validated on Disney stock prices [18]; Shi and Sheng (2025) introduced an uncertain quantile autoregressive specification that characterizes the conditional distribution beyond the mean [19]; and Wang et al. (2025) proposed a ridge estimator for the uncertain autoregressive moving average model [20]. The practical utility of this expanding framework has been validated across a range of empirical contexts, including carbon dioxide emissions modeling [21] and urban water demand prediction [22], with demonstrated advantages in settings where conventional probabilistic assumptions are untenable.
Dutta et al. (2021) demonstrate that news-based Equity Market Volatility (EMV) trackers exert significant predictive power over crude oil market volatility, with the relationship exhibiting pronounced asymmetry across different volatility regimes [23]. Critically, the EMV tracker is itself constructed from news-based textual analysis, whose inherent imprecision and non-frequency-based nature render conventional probabilistic autoregressive models ill-suited and motivate the adoption of uncertainty theory as the modeling framework. Given the well-documented interdependence between oil market uncertainty indices and realized volatility, modeling these variables independently through separate univariate uncertain autoregressive (UAR) specifications inevitably omits cross-variable dynamics and spillover effects. Such misspecification risks producing forecasts that fail to fully exploit the informational content embedded in their joint evolution. To accommodate these multivariate dependencies, Tang (2020) extended the UAR framework to the uncertain vector autoregressive (UVAR) model [24], which enables the simultaneous modeling of systems in which multiple variables interact dynamically through uncertain disturbances, thereby providing a natural vehicle for capturing cross-variable feedback mechanisms and volatility spillovers within the uncertainty-theoretic paradigm. Building on this foundation, Shi and Sheng (2024) further proposed an uncertain vector autoregressive smoothly moving average specification, providing greater flexibility for joint modeling of multivariate uncertain time series [25]. Despite its theoretical appeal, the UVAR framework has yet to be systematically applied to oil market forecasting, a gap that is consequential precisely because of this interdependence.
This study addresses this gap and makes four main contributions. First, it provides the first systematic application of the bivariate UVAR model to jointly forecast crude oil realized volatility and the overall EMV tracker, extending the uncertain time series literature beyond the largely univariate empirical applications that have dominated the field to date. Second, it implements and compares three cross-validation schemes (fixed-origin, rolling-origin, and rolling-window) for UVAR order selection, providing empirical guidance for model specification that has hitherto been absent from the literature on this class of models. Third, it documents an empirically significant asymmetric bidirectional relationship between the two variables and demonstrates that ignoring cross-variable dynamics, as in univariate UAR specifications, leads to hypothesis-test failure for realized volatility, a misspecification that is fully remedied by the joint UVAR specification; this conclusion is shown to hold at both $\alpha = 0.05$ and $\alpha = 0.10$ through a dedicated sensitivity analysis. The asymmetric trade-off is also documented at the level of the average testing error (ATE): UVAR(1) records a marginally higher fixed-origin ATE than UAR(1) on the EMV tracker but a substantially lower one on realized volatility, indicating that the joint specification trades a small loss on the auxiliary variable for a sizable gain on the variable of primary economic interest. Fourth, an out-of-sample rolling one-step-ahead evaluation benchmarks UVAR(1) against a conventional probabilistic VAR(1) model and shows that the two attain comparable predictive accuracy in terms of out-of-sample sum of squared mean errors, while UVAR uniquely supports uncertain-statistical inference (formal hypothesis testing and uncertain confidence intervals) under the imprecise-observation paradigm in which the Gaussian-innovation assumptions underlying probabilistic VAR are inappropriate.
Together, these contributions demonstrate that explicitly modeling cross-variable dynamics offers actionable inputs for value-at-risk assessment, portfolio hedging, and macroeconomic stress testing in oil-dependent economies exposed to sharp volatility regimes of the kind observed during the 2008–2009 financial crisis and the 2020 pandemic shock.
The remainder of this paper is organized as follows.
Section 2 presents the theoretical foundations, including the definition of uncertain variables, the specification of the bivariate UVAR model, and the parameter estimation methodology.
Section 3 describes the data on the EMV tracker and crude oil realized volatility, presents empirical estimation results, and evaluates forecasting performance through comparisons against both univariate UAR and probabilistic VAR benchmarks, complemented by a sensitivity analysis of the hypothesis test.
Section 4 concludes with a summary of findings and directions for future research.
3. Results
Figure 1 summarizes the empirical procedure followed in this section.
3.1. Data Description
We apply the uncertain vector autoregressive (UVAR) model to analyze the relationship between the news-based Overall Equity Market Volatility (EMV) tracker and the realized volatility (RV) of crude oil. The EMV tracker data are collected from http://www.policyuncertainty.com/, and crude oil realized volatility is computed from daily WTI prices sourced from https://cn.investing.com/. The dataset consists of monthly observations from January 2008 to December 2024, covering a period of significant economic events including the 2008 financial crisis, the European debt crisis, and the COVID-19 pandemic.
Variable 1 (): News-based overall EMV tracker. This is the Equity Market Volatility tracker constructed by computing the monthly share of newspaper articles in which a fixed list of equity-market-volatility keywords co-occurs with economic-condition keywords, scaled to match the level of the VIX over an anchor period. As a text-derived sentiment measure, the EMV tracker is inherently imprecisely observed rather than the realization of a frequentist sampling process.
Variable 2 (): Crude oil realized volatility (RV). This is the monthly realized volatility computed from daily WTI crude oil log-returns. While each daily price is recorded as a single number, the monthly RV value is itself a proxy for the latent monthly volatility process, rather than a directly measurable quantity, and is therefore subject to microstructure errors that motivate its treatment as an imprecisely observed value.
The dataset spans
monthly observations from January 2008 to December 2024.
Table 1 reports summary statistics for both variables over the full sample period.
With p-values for both the Jarque–Bera and Ljung–Box tests below 0.01, both series significantly deviate from normality and exhibit serial correlation.
Figure 2 displays the time series plot of both variables. From the figure, we can observe that both variables exhibit similar patterns of fluctuation, with notable spikes during periods of economic turmoil, particularly in 2008–2009 (financial crisis), 2011–2012 (European debt crisis), and 2020 (COVID-19 pandemic). This suggests a potential interdependence between the EMV tracker and realized volatility.
3.2. Data Preprocessing
Given the different scales of the two variables, we apply standardization to ensure comparability. For each variable $i \in \{1, 2\}$, the standardized observations are computed as follows:
$$\tilde{x}_{i,t} = \frac{x_{i,t} - \mu_i}{\sigma_i},$$
where $\mu_i$ and $\sigma_i$ are the sample mean and standard deviation of variable $i$, respectively. The standardization parameters are:
Variable 1: , .
Variable 2: , .
After standardization, both variables have zero mean and unit variance, which facilitates the parameter estimation and interpretation of the UVAR model.
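As a minimal illustration of the standardization step and its inverse, the following Python sketch z-scores one series and then recovers the original values. The function names and the sample numbers are hypothetical, and the population form of the standard deviation (`ddof=0`) is an assumption, since the text does not state which form is used.

```python
import numpy as np

def standardize(x):
    """Return z-scores together with the mean and std needed to undo them."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()
    sigma = x.std()  # assumption: population std (ddof=0)
    return (x - mu) / sigma, mu, sigma

def destandardize(z, mu, sigma):
    """Map standardized values back to the original scale."""
    return mu + sigma * np.asarray(z, dtype=float)

# Hypothetical monthly values, not the paper's data
emv = np.array([18.2, 25.7, 41.3, 22.9, 19.4])
z, mu, sigma = standardize(emv)
recovered = destandardize(z, mu, sigma)
```

The same `mu` and `sigma` must be kept so that forecasts made on the standardized scale can later be mapped back to the original units.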
Given that neither variable is a precise frequentist realization—the EMV tracker is a textual-sentiment proxy and realized volatility is a microstructure-noise-corrupted proxy for an unobservable latent process, as discussed in
Section 3.1—we represent each observation as a linear uncertain variable with regular uncertainty distribution, following Tang (2020) [
24], so that parameters can be estimated within the regular uncertainty distribution framework.
3.3. Order Selection via Cross-Validation
To determine the optimal order $k$ of the UVAR model, we employ three cross-validation methods: fixed-origin, rolling-origin, and rolling-window cross-validation. The training set size is set to 142, leaving 61 observations for testing. For each candidate order $k$, we compute the average testing error (ATE) using the three methods. The results are presented in Table 2.
From Table 2, we observe that $k = 1$ yields the smallest ATE values across all three cross-validation methods. Therefore, we select the UVAR(1) model for our analysis. This result suggests that the current values of both variables are primarily influenced by their immediate past values, which is consistent with the short-term memory characteristics often observed in financial time series.
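The three cross-validation schemes can be sketched as follows. This is an illustration under stated assumptions rather than the procedure of Section 2: observations are treated as crisp numbers, the model is fitted by ordinary least squares, the testing error is taken to be the squared one-step-ahead error summed over both variables, and the data are synthetic. The helper names (`fit_var`, `predict_next`, `average_testing_error`) are hypothetical. The synthetic series has length 203 only so that 142 training points leave 61 test points.

```python
import numpy as np

def fit_var(y, k):
    """Least-squares fit of y_t = c + A_1 y_{t-1} + ... + A_k y_{t-k}."""
    X = np.array([np.concatenate([[1.0], *(y[t - j] for j in range(1, k + 1))])
                  for t in range(k, len(y))])
    B, *_ = np.linalg.lstsq(X, y[k:], rcond=None)
    return B  # rows: intercept, then lag-1 block, lag-2 block, ...

def predict_next(history, B, k):
    """One-step-ahead forecast from the last k rows of `history`."""
    x = np.concatenate([[1.0], *(history[-j] for j in range(1, k + 1))])
    return x @ B

def average_testing_error(y, k, n_train, scheme):
    """Mean squared one-step error over the test points (assumed ATE form)."""
    B = fit_var(y[:n_train], k)                # fixed-origin: fit once
    errors = []
    for t in range(n_train, len(y)):
        if scheme == "rolling-origin":         # expanding training set
            B = fit_var(y[:t], k)
        elif scheme == "rolling-window":       # sliding window of length n_train
            B = fit_var(y[t - n_train:t], k)
        errors.append(np.sum((y[t] - predict_next(y[:t], B, k)) ** 2))
    return float(np.mean(errors))

# Synthetic bivariate series with hypothetical coefficients (not the paper's data)
rng = np.random.default_rng(0)
A = np.array([[0.5, 0.2], [0.1, 0.6]])
y = np.zeros((203, 2))
for t in range(1, len(y)):
    y[t] = A @ y[t - 1] + rng.normal(scale=0.5, size=2)

schemes = ("fixed-origin", "rolling-origin", "rolling-window")
orders = (1, 2, 3)
ate = {(s, k): average_testing_error(y, k, 142, s) for s in schemes for k in orders}
best = {s: min(orders, key=lambda k: ate[(s, k)]) for s in schemes}
```

The order with the smallest ATE under each scheme is stored in `best`; agreement across the three schemes is what supports a single choice of order.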
3.4. Parameter Estimation
Based on the least-squares principle described in Section 2, we estimate the parameters of the UVAR(1) model using the training dataset ($n = 142$). The estimated UVAR(1) model is
The parameter estimates reveal several important findings:
Autoregressive effects: Both variables exhibit positive autoregressive coefficients ( and ), indicating persistence in their respective dynamics. The realized volatility shows stronger persistence () compared to the EMV tracker ().
Cross-variable effects: The positive coefficient of 0.2040 on lagged realized volatility in the EMV equation suggests that past realized volatility has a substantial positive impact on current EMV tracker values, while the coefficient of 0.0873 on the lagged EMV tracker in the realized-volatility equation indicates that past EMV tracker values exert a modest positive influence on current realized volatility. This bidirectional relationship confirms the existence of cross-variable dependencies in the system, which justifies the use of the UVAR model framework. By incorporating both lagged variables, the UVAR model exploits these interdependencies to achieve more accurate and stable volatility forecasts.
Intercept terms: The small negative intercepts ( and ) are close to zero, which is expected given the standardization of variables.
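The estimation step can be illustrated with a short least-squares sketch. It assumes that, once the observations are treated as crisp numbers, the least-squares criterion reduces to ordinary least squares on the lagged values; this is a simplification and not necessarily the estimator of Section 2. The function name `fit_uvar1` is hypothetical, the data are simulated, and only the two cross coefficients (0.2040 and 0.0873) are taken from the text; the diagonal entries of `A_true` are placeholders.

```python
import numpy as np

def fit_uvar1(y):
    """Least-squares estimates (c, A) for y_t = c + A y_{t-1} + eps_t."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])  # columns: 1, y1_{t-1}, y2_{t-1}
    B, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    c = B[0]
    A = B[1:].T                                         # A[i, j]: lag of variable j in equation i
    resid = y[1:] - (c + y[:-1] @ A.T)
    return c, A, resid

# Simulated standardized series; diagonal coefficients are hypothetical
rng = np.random.default_rng(1)
A_true = np.array([[0.50, 0.2040],
                   [0.0873, 0.70]])
y = np.zeros((142, 2))
for t in range(1, len(y)):
    y[t] = A_true @ y[t - 1] + rng.normal(scale=0.5, size=2)

c_hat, A_hat, resid = fit_uvar1(y)
```

Because the regression includes an intercept, the residuals of each equation average to zero by construction, which matches the near-zero expected values expected of the disturbance terms.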
3.5. Residual Analysis and Hypothesis Testing
According to Section 2, we conduct residual analysis to assess the goodness-of-fit of the estimated UVAR(1) model. The residuals are computed as in Equation (7). The estimated expected values and variances of the disturbance terms are presented in Table 3.
The expected values of both disturbance terms are essentially zero, which is consistent with the model assumptions. The variance of the first disturbance term () is larger than that of the second (), suggesting that the EMV tracker exhibits greater unpredictability compared to realized volatility after accounting for autoregressive and cross-variable dynamics.
To formally test the appropriateness of the estimated parameters, we apply the uncertain hypothesis test described in Section 2. At a significance level of $\alpha = 0.05$
, we test the null hypothesis
. For the training set of size $n = 142$, the critical threshold is $\alpha n = 0.05 \times 142 = 7.1$. The hypothesis test results are summarized in Table 4.
The residuals over time, along with the identified outliers and acceptance bounds, are illustrated in Figure 3 and Figure 4 for variable 1 and variable 2, respectively.
Although variable 1 has four outliers and variable 2 has seven outliers, both are within the acceptable range under the 5% significance level. Therefore, we accept the null hypothesis for both variables, indicating that the estimated UVAR(1) model provides a good fit to the data.
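The outlier-counting logic of the test can be sketched as follows. The sketch assumes the usual form of the uncertain hypothesis test for a normal uncertainty distribution: a residual is an outlier when it falls outside the interval between the inverse uncertainty distribution evaluated at $\alpha/2$ and at $1 - \alpha/2$, and the model is accepted when the outlier count is below $\alpha n$. That form is consistent with the threshold of 7.1 quoted for 142 observations, but the function names and the toy residuals are hypothetical.

```python
import math
import numpy as np

def normal_uncertain_inverse(beta, e, sigma):
    """Inverse of the normal uncertainty distribution N(e, sigma)."""
    return e + (sigma * math.sqrt(3) / math.pi) * math.log(beta / (1 - beta))

def uncertain_hypothesis_test(residuals, e, sigma, alpha=0.05):
    """Return (outlier count, critical threshold alpha*n, accepted?)."""
    residuals = np.asarray(residuals, dtype=float)
    lower = normal_uncertain_inverse(alpha / 2, e, sigma)
    upper = normal_uncertain_inverse(1 - alpha / 2, e, sigma)
    outliers = int(np.sum((residuals < lower) | (residuals > upper)))
    threshold = alpha * len(residuals)
    return outliers, threshold, outliers < threshold

# Toy residual sequences of length 142 with 7 and 8 extreme values
r7 = np.zeros(142)
r7[:7] = 5.0
r8 = np.zeros(142)
r8[:8] = 5.0
result_7 = uncertain_hypothesis_test(r7, e=0.0, sigma=1.0, alpha=0.05)
result_8 = uncertain_hypothesis_test(r8, e=0.0, sigma=1.0, alpha=0.05)
```

With 142 observations the threshold at the 5% level is 7.1, so seven outliers lead to acceptance and eight to rejection, which shows how close to the boundary a count of seven sits.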
To assess the robustness of the hypothesis-test outcome to the choice of significance level, we recompute the test for $\alpha \in \{0.01, 0.05, 0.10\}$. The corresponding critical thresholds and outlier counts are reported in Table 5.
At the conventional levels $\alpha = 0.05$ and $\alpha = 0.10$, the UVAR(1) model is accepted for both residual sequences (narrowly so for realized volatility at $\alpha = 0.05$, with seven outliers against a threshold of 7.1), indicating that the goodness-of-fit conclusion reported above is not an artifact of the specific significance level chosen. The rejection at $\alpha = 0.01$ reflects the unusual tightness of the corresponding critical threshold: only $0.01 \times 142 = 1.42$ outliers are tolerated over 142 training observations, so even two extreme residuals, which are inevitable for a financial time series spanning multiple turbulent episodes such as the 2008 financial crisis and the 2011 European debt crisis, suffice to trigger rejection. This regime is uninformative for model discrimination, as essentially any low-order specification fitted to a series exhibiting heavy-tailed shocks of this kind would fail at this level. Overall, the sensitivity analysis confirms the adequacy of the UVAR(1) specification across the conventional significance levels typically reported in the empirical literature.
3.6. Forecasting and Confidence Intervals
Based on the estimated UVAR(1) model, we forecast the values of both variables for the next period (the $(n+1)$th observation in the standardized scale, corresponding to January 2025 in the original scale).
According to
Section 2, the point forecast for the standardized variables is given by the general forecast formula, with
representing the standardized observations at time
n. The forecast results are
and
.
To construct the 95% confidence intervals, we assume that the disturbance terms follow normal uncertainty distributions . The 95% confidence intervals for the standardized forecasts are and .
To obtain forecasts in the original scale, we apply the inverse transformation:
This yields the following point forecasts and 95% confidence intervals:
Overall EMV tracker: , with 95% CI .
Realized volatility: , with 95% CI .
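The forecasting steps above can be sketched in Python. The sketch assumes a normal uncertainty distribution $N(e, \sigma)$ for each disturbance term, for which the confidence interval at level $\alpha$ has half-width $\frac{\sigma\sqrt{3}}{\pi}\ln\frac{1+\alpha}{1-\alpha}$, and it assumes that the point forecast adds the estimated expected disturbance $e$ to the fitted conditional mean. Every number below is a hypothetical placeholder and not one of the paper's estimates; the function names are likewise hypothetical.

```python
import math
import numpy as np

def forecast_with_interval(y_last, c, A, e, sigma, level=0.95):
    """Point forecast and uncertain confidence interval for each variable."""
    point = c + A @ y_last + e
    half = sigma * math.sqrt(3) / math.pi * math.log((1 + level) / (1 - level))
    return point, point - half, point + half

def to_original_scale(z, mean, std):
    """Undo the standardization for a forecast or an interval bound."""
    return mean + std * z

# Hypothetical placeholders (standardized scale)
c = np.array([0.0, 0.0])
A = np.array([[0.50, 0.20], [0.09, 0.70]])
e = np.array([0.0, 0.0])
sigma = np.array([0.8, 0.6])
y_last = np.array([0.3, -0.1])
# Hypothetical standardization parameters
mean = np.array([20.0, 2.0])
std = np.array([7.0, 1.0])

point, lower, upper = forecast_with_interval(y_last, c, A, e, sigma)
point_o, lower_o, upper_o = (to_original_scale(v, mean, std) for v in (point, lower, upper))
```

Because the inverse transformation is an increasing linear map, applying it separately to the point forecast and to both interval bounds preserves the ordering of the three values.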
3.7. Comparison with Univariate UAR and Probabilistic VAR Benchmarks
3.7.1. Comparison with the Univariate UAR(1) Model
To demonstrate the advantage of the UVAR model, we also fit separate uncertain autoregressive (UAR) models to each variable and compare the results. The UAR(1) models are
The comparison is reported in
Table 6.
The UVAR(1) model achieves lower estimated residual variance for both variables.
Figure 5 visualizes the 95% confidence bands of the two models on the test set; while the two bands appear visually similar across most of the test period, computing the average band width over the 61 test points reveals that UVAR(1) bands are narrower than UAR(1) bands for both variables (28.346 vs. 28.575 for the EMV tracker; 2.921 vs. 2.976 for realized volatility).
Beyond residual variance and CI width, the most consequential difference between the two specifications concerns the uncertain hypothesis test of Section 3.5. Repeating the test for the two univariate UAR(1) models at $\alpha = 0.05$ yields five outliers for the EMV tracker (accepted) and eight outliers for realized volatility (rejected, against a critical threshold of 7.1). The same pattern persists at $\alpha = 0.10$, where the UAR(1) model for realized volatility records 17 outliers against a threshold of 14.2 and is again rejected. In contrast, the joint UVAR(1) model is accepted for both series at both significance levels (Table 5), although only narrowly for realized volatility at $\alpha = 0.05$, where it records seven outliers against the same threshold of 7.1. The univariate UAR(1) specification is therefore inferentially inadequate for realized volatility: it discards the cross-variable information that the joint UVAR(1) model exploits, and the resulting misspecification is detected by the formal residual diagnostic.
A natural question is whether the in-sample fixed-origin ATE for the UVAR(1) model dominates the corresponding UAR(1) values. Comparing the fixed-origin ATE of UAR(1) (
for the EMV tracker,
for realized volatility) with that of UVAR(1) (
; see
Table 2), one observes that UVAR(1) performs slightly worse than UAR(1) on the EMV tracker but substantially better on realized volatility. This asymmetry is a direct consequence of the joint specification: the UVAR(1) ATE is a single number that aggregates the testing error of both equations, so the model trades marginal predictive accuracy on the EMV tracker for substantial predictive gains on realized volatility through the cross-variable channel. Combined with the failure of UAR(1) to pass the uncertain hypothesis test for realized volatility documented above, this confirms that the joint UVAR(1) specification is preferred on both inferential and forecasting grounds for the variable that is the primary economic object of interest in oil volatility forecasting.
3.7.2. Comparison with the Probabilistic VAR(1) Model
We further benchmark UVAR(1) against the conventional probabilistic VAR(1) model estimated by maximum likelihood under Gaussian innovations. At each step of the rolling exercise on the test set
, both models are re-estimated on an expanding window
and produce a one-step-ahead point forecast for period
t. Forecast accuracy is evaluated on the original de-standardized scale via the sum of squared mean errors:
where
. Results are reported in
Table 7.
UVAR(1) marginally outperforms probabilistic VAR(1) on both series in terms of SSM. The probabilistic VAR(1) model relies on Gaussian innovations and finite-variance asymptotics to justify its hypothesis testing and interval estimation—assumptions that are clearly violated for both the news-based EMV tracker (a textual-sentiment proxy whose imprecision is non-frequentist by construction) and crude oil realized volatility (a microstructure-noise-corrupted proxy for an unobserved latent process; see the Jarque–Bera test results in
Table 1). The UVAR(1) framework, in contrast, supports formal hypothesis testing and uncertain confidence intervals without invoking such distributional assumptions, making it more appropriate for the data at hand.
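The rolling one-step-ahead exercise can be sketched as an expanding-window loop. This is an illustration only: the per-series sum of squared one-step-ahead errors is used as a stand-in for the accuracy measure, the model is fitted by ordinary least squares on crisp values, and the series is synthetic and treated as already on its original scale. The names `fit_var1` and `rolling_one_step_sse` are hypothetical.

```python
import numpy as np

def fit_var1(y):
    """Least-squares fit of y_t = c + A y_{t-1}; returns the stacked coefficients."""
    X = np.column_stack([np.ones(len(y) - 1), y[:-1]])
    B, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
    return B

def rolling_one_step_sse(y, n_train):
    """Re-fit on y[:t] for every test index t and accumulate squared errors."""
    sse = np.zeros(y.shape[1])
    for t in range(n_train, len(y)):
        B = fit_var1(y[:t])                            # expanding window
        pred = np.concatenate([[1.0], y[t - 1]]) @ B   # one-step-ahead forecast
        sse += (y[t] - pred) ** 2
    return sse

# Synthetic bivariate series with hypothetical coefficients
rng = np.random.default_rng(2)
A = np.array([[0.5, 0.2], [0.1, 0.6]])
y = np.zeros((203, 2))
for t in range(1, len(y)):
    y[t] = A @ y[t - 1] + rng.normal(scale=0.5, size=2)

sse = rolling_one_step_sse(y, n_train=142)
```

Any competing model can be benchmarked by swapping in its own fit and forecast step inside the loop, so that both models see exactly the same information at each forecast origin.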
4. Conclusions
This study applies the bivariate uncertain vector autoregressive (UVAR) model to jointly forecast crude oil realized volatility and the Overall Equity Market Volatility (EMV) tracker, exploiting their well-documented bidirectional interdependence within the framework of uncertainty theory. The estimated UVAR(1) model reveals an asymmetric cross-variable relationship: lagged realized volatility exerts a substantially stronger positive influence on the contemporaneous EMV tracker (coefficient 0.2040) than the reverse channel (coefficient 0.0873), indicating that actual volatility shocks propagate into uncertainty sentiment with greater intensity than uncertainty expectations feed back into realized market fluctuations.
By capturing these cross-variable dynamics, the UVAR(1) model outperforms univariate UAR(1) benchmarks on multiple criteria: it yields lower residual variances and, on average, narrower 95% uncertainty confidence intervals for both variables. Most critically, the UAR(1) model applied to realized volatility fails the uncertain hypothesis test at the 5% significance level, a misspecification that is fully remedied by the joint UVAR specification. Although UVAR(1) records a marginally higher in-sample fixed-origin ATE than UAR(1) on the EMV tracker, this is more than offset by substantial gains on realized volatility, which is the primary economic object of interest in oil market risk management; the joint specification therefore trades a small loss on the auxiliary variable for a sizable gain on the variable that matters most to practitioners. A sensitivity analysis confirms that the goodness-of-fit conclusion for UVAR(1) holds at both $\alpha = 0.05$ and $\alpha = 0.10$. The rejection at the tighter level $\alpha = 0.01$ is uninformative for model discrimination: the corresponding critical threshold of 1.42 outliers is so stringent that even two extreme residuals (inevitable for a financial series containing turbulent episodes such as the 2008 financial crisis and the 2011 European debt crisis) suffice to trigger rejection, an outcome that any low-order specification fitted to such a series would share.
The UVAR(1) framework also compares favorably with the conventional probabilistic VAR(1) benchmark in an out-of-sample rolling one-step-ahead exercise, attaining marginally lower sum of squared mean errors on both series while uniquely supporting principled uncertain-statistical inference. These findings demonstrate the suitability of the uncertainty-theoretic framework for energy-market forecasting, particularly for financial time series that markedly violate classical distributional assumptions. The calibrated point forecasts and uncertainty confidence intervals produced by the UVAR framework are directly actionable for practitioners, providing principled inputs for value-at-risk assessment, portfolio hedging against oil price swings, and macroeconomic stress testing in oil-dependent economies.