1. Introduction
Realized variances are ex-post measures of return variation that are typically constructed by summing the squared values of high-frequency log returns (see, e.g.,
Andersen and Bollerslev 1998;
Andersen et al. 2003;
Barndorff-Nielsen et al. 2008). The use of realized variances has revolutionized methods of modeling and forecasting volatility over the past two decades. Indeed, realized variances are now employed in a wide variety of intriguing applications. Some recent examples include forecasting volatility for stocks included in the S&P 500 index via machine learning methods (
Zhu et al. 2023), forecasting volatility for international real estate investment trusts (
Bonato et al. 2022), modeling time-varying conditional skewness in equity markets (
Kirby 2024), pricing options on the Chicago Board Options Exchange volatility index (
Tong and Huang 2021), developing dynamic tail-risk models to aid in measuring and managing financial risk (
Chen et al. 2023), and studying volatility spillovers across cryptocurrency markets (
Ben Ameur et al. 2024).
Many of the econometric studies that employ realized variances are conducted using either daily log returns or daily simple returns (see, e.g.,
Gorgi et al. 2019;
Hansen et al. 2012,
2024;
Noureldin 2022;
Noureldin et al. 2012). Researchers seldom feel the need to differentiate between simple returns and log returns in such studies because doing so is unnecessary from an empirical perspective. If the holding period for a stock or stock index is a single day, then the difference between the variance of a simple return and the variance of the corresponding log return will typically be negligible. However, the differences between the statistical properties of simple returns and those of log returns become more pronounced as the holding period increases. Thus, they are unlikely to be negligible for research that addresses asset pricing, portfolio optimization, and related topics, which is usually conducted using simple returns for weekly, monthly, or quarterly holding periods (see, e.g.,
Avramov et al. 2006;
Kirby and Ostdiek 2012;
Yogo 2006).
For instance, the covariance matrix of simple returns plays a central role in
Markowitz (
1952) portfolio selection. Although it would be straightforward to use realized variances computed from log returns to construct an estimator of the covariance matrix of simple returns, the estimator would typically be biased given that simple returns are nonlinearly related to log returns. Consider the case in which log returns are normally distributed. Because gross simple returns have a lognormal distribution under these circumstances, the variance of simple returns is typically higher than the variance of log returns. This suggests that it would be useful to develop a procedure for constructing realized variances that are unbiased estimators of the variances of simple returns.
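The size of this gap can be illustrated with a short calculation. The sketch below assumes independent and identically distributed normal daily log returns, so that the K-day log return is N(Kμ, Kσ²) and the variance of the K-day simple return follows from the lognormal moment formula. The daily mean and variance are illustrative assumptions and are not taken from the article.

```python
import math


def log_return_variance(sigma2_daily, k):
    """Variance of a K-day log return under i.i.d. daily log returns."""
    return k * sigma2_daily


def simple_return_variance(mu_daily, sigma2_daily, k):
    """Variance of a K-day simple return when the K-day log return is normal.

    If r ~ N(m, v), then 1 + R = exp(r) is lognormal and
    Var(R) = exp(2m + v) * (exp(v) - 1).
    """
    m = k * mu_daily
    v = k * sigma2_daily
    return math.exp(2.0 * m + v) * math.expm1(v)


# Illustrative (assumed) daily parameters: 0.03% mean and 1% volatility.
MU_DAILY = 0.0003
SIGMA2_DAILY = 0.0001

for k in (1, 5, 21, 63, 126, 252):
    v_log = log_return_variance(SIGMA2_DAILY, k)
    v_simple = simple_return_variance(MU_DAILY, SIGMA2_DAILY, k)
    print(f"K={k:3d}  log={v_log:.6f}  simple={v_simple:.6f}  "
          f"ratio={v_simple / v_log:.4f}")
```

Under these assumed values the ratio of the two variances is close to one for a single day and grows steadily with the holding period, which is the pattern described in the text.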
More broadly, it is important to note that the high-frequency data needed to construct daily realized variances may not be available for the full sample period of interest. The first year of the Trade and Quote data provided by the New York Stock Exchange is 1993. In contrast, the coverage of the daily stock file of the Center for Research in Security Prices begins in 1926. The widespread availability of daily historical data makes such data well suited for estimating the unconditional and conditional variances of lower-frequency stock returns. I investigate this approach using a new technique for constructing realized measures. Unlike the conventional construction technique pioneered by
Andersen and Bollerslev (
1998), the new technique delivers realized measures that are unbiased estimators of the unconditional and conditional variances of simple returns in a discrete time setting under relatively mild assumptions that are frequently invoked in the volatility modeling literature.
I begin by conducting a Monte Carlo study of the relative estimation errors that result from using the new and conventional realized measures as estimators of the unconditional and conditional variances of simple returns and log returns for a range of different holding periods. The results of the study demonstrate that my technique for constructing realized measures of the variances of simple returns works as intended. I find no evidence of bias for any holding period and the proposed realized measures deliver improvements in estimation efficiency that are comparable to those produced by conventional realized measures of the variances of log returns.
To develop further insights, I use S&P 500 index data to investigate the performance of pseudo out-of-sample variance forecasts that are constructed using the new realized measures. The empirical analysis employs the generalized autoregressive conditional heteroskedasticity (GARCH) model of
Bollerslev (
1986) as a benchmark and is conducted in a manner that isolates the incremental gains from using the new realized measures for modeling purposes. First, I fit a GARCH(1,1) model to simple returns for both weekly and monthly holding periods. Next, I replace the squared demeaned simple returns in the recurrence relation for the conditional variance under the GARCH(1,1) model with the corresponding realized measures. Finally, I fit the resultant specification, which is a multiplicative error model (MEM) of the type introduced by
Engle (
2002), to the same sample of simple returns for weekly and monthly holding periods.
Because the only difference between the recurrence relations for the conditional variances under the GARCH(1,1) and MEM(1,1) specifications is that the former employs squared demeaned simple returns and the latter employs realized measures, the performance advantage of the MEM(1,1) specification (if any) is due to the incremental gains from using realized measures. I use the
Giacomini and White (
2006) test of equal predictive ability to assess whether the differences in performance are statistically significant. As anticipated, I find that the MEM forecasts produce smaller mean errors, smaller mean absolute errors, and smaller root mean square errors than the GARCH forecasts for every forecast horizon under consideration at both the weekly frequency and the monthly frequency. However, the results for the monthly observations are stronger from the standpoint of statistical significance. I find that the smallest
t-statistic produced by the test of equal predictive ability is 2.61 in this case. Because I reject the hypothesis that the GARCH forecasts of monthly variances are just as accurate as the MEM forecasts of monthly variances at the 1% significance level, irrespective of the forecast horizon, I conclude that the proposed realized measures of the variances of simple returns deliver meaningful performance gains.
Although these results are illustrative, the proposed realized measures could obviously be exploited in other ways. For example, researchers have developed a variety of specifications that use conventional realized measures to model volatility dynamics, such as the heterogeneous autoregressive volatility (HAR) model of
Corsi (
2009), the high-frequency-based (HEAVY) model of
Shephard and Sheppard (
2010), and the realized GARCH model of
Hansen et al. (
2012). By replacing the realized measures constructed from high-frequency log returns with realized measures constructed from daily simple returns, any of these specifications could be used to model and forecast the conditional variances of simple returns for weekly, monthly, or quarterly holding periods.
It is also clear that the proposed realized measures can be employed to model and forecast the conditional variances of lower-frequency returns for any asset or commodity for which daily price data are readily available (e.g., the euro-dollar exchange rate, crude oil, or Bitcoin). Of course, using this approach for seasonal commodities would require a volatility model that is capable of capturing seasonality, such as a periodic MEM analog of the periodic GARCH model of
Bollerslev and Ghysels (
1996). But implementing this extension should be relatively straightforward from an econometric perspective.
The rest of the article is organized as follows.
Section 2 shows how to construct realized measures that are unbiased estimators of the unconditional and conditional variances of simple returns.
Section 3 discusses the results of the Monte Carlo study.
Section 4 describes the GARCH(1,1) and MEM(1,1) specifications used to forecast conditional variances for the S&P 500 index and presents the results of the pseudo out-of-sample performance comparisons. Finally,
Section 5 offers some concluding remarks.
3. Monte Carlo Analysis
I use Monte Carlo integration to document the properties of the unbiased variance estimators discussed in
Section 2. The DGP for the study is a well-known variant of the GARCH(1,1) model of
Bollerslev (
1986). In particular, I generate the single-period log returns from the model
where
,
,
,
, and
is an
random variable. This specification is well suited to Monte Carlo work because it allows
,
,
, and
to be computed analytically.
Daily S&P 500 index data for the years 1946 through 2023 (19,835 observations) are used to calibrate the DGP. The data are from two sources: the daily stock file of the Center for Research in Security Prices for 3 July 1962 to 29 December 2023 and a dataset compiled by
Schwert (
1990) for 2 January 1946 to 2 July 1962.
First, I use the method of maximum likelihood to fit the model to daily log index returns subject to
and
.
Second, I set the values of
,
, and
in Equations (
14) and (15) equal to their maximum likelihood estimates, generate
with
and
, and construct
by setting
and computing
for all
i.
Third, I use the simulated data to calculate
and
for each
. Because there are roughly 252 trading days per year for the S&P 500 index, I consider
,
,
,
, and
to approximate weekly, monthly, quarterly, semiannual, and annual holding periods.
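The simulation steps can be sketched as follows. This is a minimal illustration under stated assumptions and not the calibrated design: it uses a plain GARCH(1,1) recursion with a constant mean and standard normal innovations, hypothetical parameter values, and a much shorter sample than the one used for Table 1. The function names are introduced here for illustration only.

```python
import math
import random


def simulate_garch_log_returns(n, mu, omega, alpha, beta, seed=0):
    """Simulate n daily log returns from a Gaussian GARCH(1,1) process."""
    rng = random.Random(seed)
    h = omega / (1.0 - alpha - beta)  # start at the unconditional variance
    returns = []
    for _ in range(n):
        shock = math.sqrt(h) * rng.gauss(0.0, 1.0)
        returns.append(mu + shock)
        h = omega + alpha * shock * shock + beta * h
    return returns


def aggregate(daily_log_returns, k):
    """Build non-overlapping K-day log returns and simple returns."""
    log_k, simple_k = [], []
    for start in range(0, len(daily_log_returns) - k + 1, k):
        r = sum(daily_log_returns[start:start + k])
        log_k.append(r)
        simple_k.append(math.expm1(r))  # simple return = exp(log return) - 1
    return log_k, simple_k


def sample_variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


# Hypothetical parameters (assumptions, not the article's estimates).
daily = simulate_garch_log_returns(252 * 200, mu=0.0003, omega=2e-6,
                                   alpha=0.08, beta=0.90, seed=42)
for k in (5, 21, 63, 126, 252):
    log_k, simple_k = aggregate(daily, k)
    print(k, sample_variance(log_k), sample_variance(simple_k))
```

Comparing the two sample variances across the values of K gives a rough sense of how the gap between log returns and simple returns widens with the holding period.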
Table 1 summarizes the results for 10 million simulated observations (i.e.,
T = 1,000,000). Panel A examines the properties of the relative estimation errors for unconditional variances. The initial six columns report the mean, mean absolute, and root mean square values of
and
for the six values of
K under consideration (denoted by ME, MAE, and RMSE). For
, the results for log returns are nearly identical to those for returns. But differences emerge as
K increases.
As anticipated, the mean errors are quite small (zero to three decimal places) because and are unbiased estimators of and . The largest RMSEs correspond to for log returns and for returns. An increase in the RMSE is always indicative of an increase in kurtosis, which can be expressed as 1 plus the mean square error. The smallest MAEs and RMSEs correspond to .
Now consider the results in the final six columns of panel A, which contain the mean, mean absolute, and root mean square values of
and
for the six values of
K under consideration.
The realized measures show no indications of bias and are clearly much more efficient estimators of
and
for
than
and
. Notice, for example, that replacing
with
reduces the RMSE from
to
with
. This is a reduction of
. Furthermore, the improvements in efficiency become more pronounced as
K increases. The RMSE drops from
to
for the
case, which is a reduction of 86.2%.
Panel B examines the properties of the relative estimation errors for the conditional variances using the same layout as panel A. Once again, the mean errors are zero to three decimal places in all cases and there are large gains in efficiency from employing the realized measures. The reduction in the RMSEs relative to those reported in panel A is an indicator of the benefits of exploiting conditioning information. The RMSEs drop by 0.203 (12.6%) in all cases for . As K increases, the drop always becomes smaller in raw numerical terms. But the percentage drop in the RMSE does not display a monotonic relation with K. For example, the RMSE for drops by for .
Overall, the Monte Carlo evidence indicates that the proposed technique for constructing realized measures that are unbiased estimators of the variances of holding-period returns works as intended. It achieves improvements in efficiency that are comparable to those achieved by the conventional technique for constructing realized measures of the variances of multiperiod log returns. I now turn to an empirical application that focuses on forecasting the conditional variances of weekly and monthly S&P 500 index returns.
4. Out-of-Sample Results for the S&P 500 Index
To lay the groundwork for the discussion, assume that the objective is to forecast the variance of a financial variable
using a realization of the sequence
for some
. Because the GARCH(1,1) model of
Bollerslev (
1986) is known to perform well in a variety of settings, it is often used to construct such forecasts. If the DGP is a GARCH(1,1) specification, then
can be expressed as
where
,
, and
. Thus,
is a conditionally unbiased
s-step-ahead forecast of
.
Now consider an alternative
s-step-ahead forecast of
that is constructed from a realization of the sequence
, where
is a conditionally unbiased realized measure of the variance of
with dynamics that are described by an MEM specification of the type introduced by
Engle (
2002). If the DGP is an MEM(1,1) specification, then
can be expressed as
where
is strictly non-negative and satisfies
. Because
is a conditionally unbiased
s-step-ahead forecast of
, it clearly has the potential to outperform
as a forecast of
.
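For a first-order recursion of this kind, the multi-step forecast has a familiar closed form: the s-step-ahead conditional expectation reverts geometrically from the one-step-ahead value toward the long-run mean at a rate given by the persistence parameter. The sketch below illustrates that textbook result. The argument names are assumptions introduced here, and the code is not a transcription of Equations (18) or (21).

```python
def multi_step_forecast(h_next, long_run, persistence, s):
    """s-step-ahead forecast implied by a first-order variance recursion.

    h_next      : one-step-ahead conditional variance
    long_run    : unconditional mean of the variance process
    persistence : sum of the two slope coefficients (must be below one)
    s           : forecast horizon (s >= 1)
    """
    if s < 1:
        raise ValueError("s must be a positive integer")
    return long_run + persistence ** (s - 1) * (h_next - long_run)


# Example with assumed values: the forecast decays toward the long-run level.
for s in (1, 3, 6, 12):
    print(s, multi_step_forecast(h_next=4.0, long_run=1.0,
                                 persistence=0.95, s=s))
```

The same function applies to both specifications. Only the variable that drives the one-step-ahead value differs: a squared demeaned return in the GARCH case and a realized measure in the MEM case.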
I focus on the case in which
is a weekly or monthly return on the S&P 500 index and the realized measure of its variance is constructed from daily returns. Presumably, variance forecasts based on realized measures should generally be more accurate than those based on weekly or monthly returns. I therefore use the pseudo out-of-sample forecasts produced by the GARCH(1,1) specification to benchmark the performance of the pseudo out-of-sample forecasts produced by the MEM(1,1) specification. As in
Giacomini and White (
2006), I conduct the analysis using limited-memory estimators of the parameters.
Because the GARCH(1,1) model implies that
, it is essentially an MEM(1,1) specification for
. Furthermore, the recurrence relation for
in Equation (18) can be transformed into the recurrence relation for
in Equation (21) by replacing
with
and relabeling the parameters. It is apparent, therefore, that the research design ensures that the performance advantage of the MEM(1,1) specification (if any) is due to the incremental gains from using realized measures as long as the approach used to fit the GARCH and MEM specifications puts them on an equal footing. This poses no issues because fitting the GARCH specification under the assumption that
produces the same results as treating
as an MEM specification and fitting it by assuming that
is a serially independent exponential random variable with a unit mean (see
Engle 2002, for further elaboration).
To illustrate, suppose
is the number of observations in a rolling window of weekly or monthly returns. For each choice of
s and value of
, I construct an estimate of
for
using the estimate of
obtained by minimizing
subject to
,
, and
, where
and
. Similarly, for each choice of
s and value of
N, I construct an estimate of
for
using the estimate of
obtained by minimizing
subject to
and
, where
. The resultant estimated values of
,
, and
are denoted by
,
, and
.
Several features of this procedure are worthy of further comment. First, apart from an additive constant,
and
are the log quasi-likelihood functions that result from treating
as
and
as an exponential random variable with a rate parameter of one. Thus, the estimators of
and
are consistent under the usual regularity conditions for quasi-maximum likelihood estimation. Second, I use the sample mean of
, sample variance of
, and sample mean of
that are computed from the initial
W observations of the rolling window as estimators of
,
, and
. This targeting approach simplifies optimization. Third, the procedure produces horizon-tuned forecasts because the estimates of
,
,
, and
are specific to the value of
s under consideration.
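To make the estimation step concrete, the following sketch fits an MEM(1,1) recursion by minimizing an exponential quasi-likelihood with the intercept pinned down by targeting the sample mean of the realized measure. It is a simplified illustration under several assumptions: it covers the one-step case only rather than the horizon-specific objectives, it uses a coarse grid search instead of a numerical optimizer, and all function names and parameter values are hypothetical.

```python
import math
import random


def simulate_mem(n, omega, alpha, beta, seed=0):
    """Simulate a non-negative series from an MEM(1,1) with unit-mean
    exponential errors (used here only to produce example data)."""
    rng = random.Random(seed)
    h = omega / (1.0 - alpha - beta)
    x = []
    for _ in range(n):
        value = h * rng.expovariate(1.0)
        x.append(value)
        h = omega + alpha * value + beta * h
    return x


def mem_filter(x, alpha, beta, target):
    """Conditional means from the MEM(1,1) recursion with mean targeting."""
    omega = target * (1.0 - alpha - beta)
    h = [target]  # initialize at the targeted long-run mean
    for t in range(1, len(x)):
        h.append(omega + alpha * x[t - 1] + beta * h[t - 1])
    return h


def exp_qml_loss(x, alpha, beta, target):
    """Negative exponential log quasi-likelihood (up to a constant)."""
    h = mem_filter(x, alpha, beta, target)
    return sum(math.log(ht) + xt / ht for xt, ht in zip(x, h))


def fit_by_grid(x, step=0.05):
    """Grid search over alpha >= 0, beta >= 0, alpha + beta < 1."""
    target = sum(x) / len(x)
    n = int(round(1.0 / step))
    best = None
    for i in range(n):
        for j in range(n - i):
            a, b = i * step, j * step
            loss = exp_qml_loss(x, a, b, target)
            if best is None or loss < best[0]:
                best = (loss, a, b)
    return best


data = simulate_mem(1000, omega=0.1, alpha=0.2, beta=0.7, seed=7)
loss, alpha_hat, beta_hat = fit_by_grid(data)
print(alpha_hat, beta_hat)
```

Targeting removes the intercept from the search, which is why the grid only needs to cover the two slope coefficients. In practice a gradient-based optimizer would replace the grid.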
To formally compare the accuracy of
and
as
s-step-ahead forecasts of
, I use the unconditional version of the
Giacomini and White (
2006) test of equal predictive ability. The test is based on the criterion
which is the difference between the absolute error losses produced by
and
.
The null hypothesis for the test is
. Hence, inference is conducted using the
t-statistic for
If
is positive and statistically significant, then the test indicates that the
s-step-ahead MEM forecasts outperform the
s-step-ahead GARCH(1,1) forecasts under MAE loss.
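The mechanics of the unconditional test can be sketched as a t-test on the mean difference in absolute-error losses, with a heteroskedasticity- and autocorrelation-consistent standard error. The Bartlett-kernel (Newey-West) variance and the lag choice below are assumptions made for illustration, as are the function names and the toy inputs.

```python
import math


def newey_west_variance(d, lags):
    """Long-run variance of d using Bartlett-kernel weights."""
    n = len(d)
    mean = sum(d) / n
    dev = [v - mean for v in d]
    lrv = sum(v * v for v in dev) / n
    for j in range(1, lags + 1):
        gamma = sum(dev[t] * dev[t - j] for t in range(j, n)) / n
        lrv += 2.0 * (1.0 - j / (lags + 1.0)) * gamma
    return lrv


def equal_predictive_ability(target, f_benchmark, f_alternative, lags):
    """Mean loss differential and its t-statistic under absolute-error loss.

    A positive value indicates that the alternative forecast (e.g., MEM)
    has the smaller mean absolute error.
    """
    d = [abs(y - fb) - abs(y - fa)
         for y, fb, fa in zip(target, f_benchmark, f_alternative)]
    n = len(d)
    d_bar = sum(d) / n
    t_stat = d_bar / math.sqrt(newey_west_variance(d, lags) / n)
    return d_bar, t_stat


# Toy inputs (assumed values) in which the alternative is more accurate.
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
bench = [2.0, 4.0, 4.0, 6.0, 6.0, 8.0]
alt = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5]
print(equal_predictive_ability(y, bench, alt, lags=0))
```

With multi-step forecasts the loss differentials are serially correlated by construction, so a positive lag length is the natural choice when s exceeds one.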
The weekly and monthly index returns along with their realized variances are computed from daily index data for the years 1946 through 2023. As is typical in the finance literature, I use the actual number of trading days in a given week or given month rather than a fixed value of
K for the computations. Because the daily index returns display some evidence of negative first-order serial correlation, I account for the impact of this feature by computing the realized measures as
rather than as shown in
Section 2.
Here
D denotes the number of trading days for the week or month in question. I specify
for the weekly data and
for the monthly data (50% of the number of available observations in each case). To aid in interpreting the findings, I also conduct tests of equal predictive ability using weekly and monthly observations of log returns and their realized variances.
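For intuition about the serial-correlation adjustment, the sketch below shows the conventional log-return version of such a correction: the sum of squared daily log returns plus twice the sum of products of adjacent daily log returns. It is offered as an assumed analog only. The realized measure for simple returns is defined in Section 2 and is not reproduced here, and the function name is hypothetical.

```python
def adjusted_realized_variance(daily_log_returns):
    """Realized variance for one week or month with a first-order
    autocovariance adjustment (conventional log-return version)."""
    squares = sum(r * r for r in daily_log_returns)
    cross = sum(a * b for a, b in
                zip(daily_log_returns[1:], daily_log_returns[:-1]))
    return squares + 2.0 * cross


# Example with D = 3 assumed daily log returns.
print(adjusted_realized_variance([0.01, 0.02, -0.01]))
```

One known drawback of this type of adjustment is that the result is not guaranteed to be positive when adjacent returns have strongly offsetting signs.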
4.1. Properties of the Rolling-Window Parameter Estimates
Table 2 examines the properties of the sequence of parameter estimates produced by the rolling-window optimizations for each specification. Panels A and B present the results for weekly log returns and weekly returns. Not surprisingly, the average estimates of
and
for
point to strong persistence in the conditional variances for both log returns and returns. The results also indicate that the estimates of
and
are quite stable over time. In panel A, for example, the estimate of
for
ranges from 0.943 to 0.977 and the estimate of
for
ranges from 0.959 to 0.975.
The results for and in panel A display some interesting patterns. First, the average estimate of is somewhat smaller than the average estimate of for , , and . This finding suggests the conditional variance process of weekly log returns displays a weaker response to shocks under the GARCH specification than under the MEM specification. Second, the average estimate of declines monotonically with s, whereas the average estimate of does not. But there is a sharp drop in the average estimate of for . Although the underlying mechanism that leads to this finding is not immediately apparent, the findings for weekly returns mirror those for weekly log returns in all respects.
Panels C and D present the results for monthly log returns and monthly returns. As anticipated, the average estimates of and are somewhat lower than the corresponding values in panels A and B, which is consistent with returns following a stationary stochastic process. But the results still point to a substantial degree of persistence in the conditional variances. There is also more variation in the estimates of and over time for the monthly observations, which is an expected consequence of the sharp reduction in the number of observations in the rolling window used for estimation purposes.
Perhaps the most intriguing aspect of the results in panels C and D is that the average estimate of is considerably smaller than the average estimate of for , , and . This pattern suggests that the GARCH specification produces a smoother sequence of conditional variance forecasts than the MEM specification, which could indicate that the latter specification has an advantage in tracking the conditional variances. Notice that the average estimate of for is relatively low by comparison. Because for monthly observations is roughly equivalent to for weekly observations, the relation between the average estimate of and the forecast horizon is similar at both frequencies.
4.2. Conditional Volatility Forecasts
To develop further insights, I plot the conditional volatility forecasts for weekly returns and monthly returns.
Figure 1 shows side-by-side plots of the GARCH and MEM forecasts for weekly returns. The upper panels are for
and the lower panels are for
. Although the side-by-side comparisons highlight the broad similarities in the forecasts for both forecast horizons, it is easy to spot a few differences. For instance, the spike in the one-step-ahead forecast of conditional volatility that follows the 1987 stock market crash is larger for the MEM specification than for the GARCH specification. But it is clear from the plots that the GARCH and MEM forecasts are highly correlated as a general rule.
Of course, this finding does not necessarily imply that the differences in the predictive ability of the GARCH and MEM forecasts are negligible. If the MEM forecasts are more efficient than the GARCH forecasts, then they should have a performance advantage in formal statistical tests provided that the sample size is sufficiently large. Furthermore, the results of the Monte Carlo analysis indicate that the gains from employing realized measures increase with the investment horizon used for the analysis.
Consider the side-by-side plots of the GARCH and MEM forecasts for monthly returns, which are shown in
Figure 2. The visual differences in the plots are certainly more pronounced in this case. Not only are the one-step-ahead GARCH forecasts relatively smooth, they are also confined to a much narrower range than the one-step-ahead MEM forecasts. These features are broadly consistent with a scenario in which the realized measures are more efficient estimators of the conditional variances than the squared demeaned returns.
4.3. Hypothesis Tests
The tests of equal predictive ability provide formal evidence in this regard. The results of the tests are presented in
Table 3. The initial eight columns of the table report the mean, mean absolute, root mean square, and mean square values of
and
for the four choices of
s: 1, 3, 6, and 12. The final three columns report
, its
t-statistic, and the associated
p-value.
The results in panel A are for weekly log returns. Notably, the MEM forecasts produce smaller MEs, MAEs, and RMSEs than the GARCH forecasts at every forecast horizon. The largest difference in the RMSE corresponds to : 3.604 versus 2.440. But the test of equal predictive ability produces a p-value of 0.129 in this case. Broadly speaking, however, the test favors the MEM forecasts. Note that it produces a t-statistic of 1.75 () for and 2.30 () for .
The results in panel B are for weekly returns. Once again, the MEM forecasts produce smaller MEs, MAEs, and RMSEs than the GARCH forecasts at every forecast horizon. The other findings are also similar to those for weekly log returns. The test of equal predictive ability favors the MEM forecasts, yielding a t-statistic of 1.93 () for and 2.31 () for .
The results in panels C and D are for monthly log returns and monthly returns. The overall pattern of the MAEs and RMSEs mirrors that in panels A and B. However, the evidence regarding the superiority of the MEM forecasts is considerably stronger at the monthly frequency. The smallest t-statistics in panels C and D are 2.36 and 2.61, which have p-values of 0.018 and 0.009. Hence, the null hypothesis of equal predictive ability is rejected at the 1% level for every forecast horizon for monthly returns. This finding highlights the extent to which the new realized measures for lower-frequency returns deliver meaningful performance gains.
4.4. Broader Implications of the Analysis
The implications of the results for the MEM(1,1) specification extend beyond the specification itself because they suggest that replacing squared demeaned returns with the proposed realized measures can be adopted as a general strategy for improving the performance of volatility models for lower-frequency returns. Indeed, if the objective is to model the conditional variances of lower-frequency returns, then the proposed realized measures could be used in place of conventional realized measures in any existing specification that uses conventional realized measures. Some prominent examples of such specifications include the HAR model of
Corsi (
2009), the HEAVY model of
Shephard and Sheppard (
2010), and the realized GARCH model of
Hansen et al. (
2012).
As for potential applications of the methodology, it can be implemented for any asset or commodity for which daily price data are readily available. This includes international equity indexes, exchange rates, commodities, and cryptocurrencies. It should therefore prove useful in many types of research, especially if the focus is on volatility modeling, asset pricing, or risk management over medium- to long-term horizons. Depending on the application, it might be necessary to address additional features of the DGP. One that immediately comes to mind is deterministic patterns in the volatility of seasonal commodity returns. In this case, the methodology could be implemented using a volatility model that is capable of capturing seasonality, such as a periodic MEM analog of the periodic GARCH model of
Bollerslev and Ghysels (
1996).
4.5. Caveats
The discussion in
Section 2.3 addresses two of the key assumptions that underpin the methodology and outlines techniques for relaxing these assumptions should they fail to hold in the setting of interest. But it is worthwhile to mention a few other caveats about implementing the methodology using a given specification of the DGP. Although the basic MEM(1,1) specification is useful for illustrating the incremental improvements in forecasting performance that result from using the proposed realized measures in place of squared demeaned returns, it should clearly be subject to specification tests before being adopted in a particular application. A potential concern is that, like the benchmark GARCH(1,1) specification, the basic MEM(1,1) specification is incapable of capturing leverage effects. This concern could easily be addressed by replacing Equation (21) with
where
is the indicator function. This would yield an asymmetric MEM(1,1) analog of the threshold GARCH(1,1) specification of
Glosten et al. (
1993).
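One way such an asymmetric recursion could look is sketched below. By analogy with the threshold GARCH specification, the coefficient on the realized measure is allowed to be larger when the most recent return is negative. The exact form of the replacement equation is not reproduced here, so the parameter names, the choice of conditioning on the sign of the raw return, and the numerical values are all assumptions.

```python
def asymmetric_mem_step(h_t, x_t, r_t, omega, alpha, gamma, beta):
    """One step of a threshold-style MEM(1,1) recursion.

    h_t   : current conditional variance
    x_t   : current realized measure
    r_t   : current return (its sign triggers the asymmetric term)
    gamma : extra weight on the realized measure after a negative return
    """
    indicator = 1.0 if r_t < 0.0 else 0.0
    return omega + (alpha + gamma * indicator) * x_t + beta * h_t


# With assumed parameters, a negative return raises the next-period variance
# by more than a positive return with the same realized measure.
print(asymmetric_mem_step(1.0, 2.0, -0.1, 0.1, 0.1, 0.05, 0.8))
print(asymmetric_mem_step(1.0, 2.0, 0.1, 0.1, 0.1, 0.05, 0.8))
```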
It is also important to consider the potential impact of phenomena, such as structural breaks, that would invalidate the assumption that simple returns are weakly stationary. Note, however, that this caveat is applicable to any volatility model that assumes weak stationarity. If an MEM(1,1) model is misspecified due to the presence of a structural break or breaks, then the same is true of a GARCH(1,1) model.
5. Conclusions
The availability of high-frequency data on stock prices has transformed the volatility modeling literature over the past two decades. But there are still good arguments for using daily returns to estimate the volatility of longer-horizon returns, especially for sample periods that begin prior to 1993. Because the statistical properties of log returns differ from those of returns and the differences increase with the investment horizon, I show how to construct realized measures that are unbiased estimators of the unconditional and conditional variances of returns in a discrete-time setting, provided that the DGP satisfies relatively mild assumptions that are often invoked in the volatility-modeling literature. The empirical evidence indicates that using the proposed realized measures to compute out-of-sample forecasts of the variances of weekly and monthly returns on the S&P 500 index leads to significant improvements in forecast accuracy. Hence, the measures should be useful in research that addresses asset pricing, portfolio optimization, and related topics, which is typically conducted using returns for weekly, monthly, or quarterly holding periods.
For example, suppose an investor wants to estimate the conditional value at risk for a long position in an equity index over a one-month holding period. This could be accomplished in two simple steps. First, use the realized measures constructed from daily index returns to fit an MEM specification to the monthly index returns. Second, use the resultant sequence of estimated conditional volatilities to standardize the sequence of monthly index returns, find the quantile of the standardized index returns that corresponds to the desired confidence level for the value-at-risk criterion, and compute the conditional value at risk using the estimated conditional volatility for the month that follows the final month in the sample period.
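The second step can be sketched in a few lines. The code takes the fitted conditional volatilities as given, standardizes the returns, reads off an empirical quantile, and rescales by the forecast volatility for the next month. Demeaning the returns before standardizing, the particular empirical quantile rule, and the convention of reporting the result as a positive loss are assumptions, as are the function names and inputs.

```python
import math


def empirical_quantile(values, prob):
    """Smallest order statistic whose empirical CDF is at least prob."""
    ordered = sorted(values)
    index = max(0, math.ceil(prob * len(ordered)) - 1)
    return ordered[index]


def conditional_value_at_risk(returns, vols, vol_next, mean_return, level):
    """Filtered-quantile estimate of the next-period value at risk.

    returns     : in-sample monthly simple returns
    vols        : fitted conditional volatilities for the same months
    vol_next    : forecast volatility for the month after the sample ends
    mean_return : estimate of the mean monthly return
    level       : confidence level, e.g. 0.95
    """
    standardized = [(r - mean_return) / v for r, v in zip(returns, vols)]
    q = empirical_quantile(standardized, 1.0 - level)
    return -(mean_return + vol_next * q)


# Toy inputs (assumed values) with constant in-sample volatility.
r = [-0.10, -0.05, 0.0, 0.02, 0.04, 0.06, 0.01, 0.03, -0.02, 0.05]
v = [0.05] * len(r)
print(conditional_value_at_risk(r, v, vol_next=0.04, mean_return=0.0,
                                level=0.75))
```

Because the quantile is taken from the standardized returns rather than from a parametric distribution, the estimate inherits whatever skewness and tail thickness remain after the volatility filter is applied.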
The methodology could also be exploited in macro-finance applications. Suppose, for instance, that a researcher is interested in assessing the influence of macroeconomic variables, such as industrial production, on the volatility of monthly stock market returns. One approach for doing so would be to fit an autoregressive model to the logarithm of the gross monthly growth rate of industrial production, use the resultant parameter estimates to compute the estimated multiplicative shocks to the growth rate, and augment the conditional variance recursion of an MEM specification for monthly stock market returns with lagged values of the estimated growth-rate shocks. This approach would allow a long sample period to be used for the analysis because a century’s worth of monthly data on U.S. industrial production and daily data on U.S. market returns are currently available.