Quantifying Risk in Traditional Energy and Sustainable Investments

Abstract: These days we are witnessing a deep change in the type of energy that supplies our economies. A clear trend is that sustainable and green energies are decisively replacing traditional fossil fuel-based sources of energy. For various reasons, this fundamental change implies an increasing risk in portfolios heavily invested in traditional energy industries. What is less known is that these industries have returns that show a very low correlation with sustainable fossil fuel-free stock portfolios, making them an appealing tool for portfolio managers to design properly diversified investments. In this study we examine this and related phenomena, proposing statistical methods to implement the expected shortfall (ES), the challenging risk measure recently adopted by the financial regulator. We obtain evidence that a newly proposed backtesting procedure for the ES based on multinomial tests is an adequate and simple method to validate these risk measures when applied to a highly volatile stock index. Backtesting results of the ES show that the flexible heavy-tailed α-stable distribution performs well for modelling the loss distribution. These results are further improved when the variances of fossil fuel price returns are included as external regressors in the GARCH model variance equation. In this case, the ES computed from the four considered loss distributions performs properly.


Introduction
The debate on the role of fossil fuels in climate change affects all facets of society. The financial industry is also being affected, both by a progressive increase in awareness of the potential impact of climate change on investments and by the risk of fossil fuels becoming "stranded," that is, unburned or in the ground, as regulation increases. The traditional energy industry is currently exposed to downside risks from write-offs or revaluations of these unsustainable assets. However, companies in the traditional energy industry have been used for diversification purposes and have demonstrated their potential to provide high realized returns along with high volatility as commodity prices rise or fall. While there is a move towards divestment from fossil fuels, replacing investment in the traditional energy sector with other sustainable investments, individual and institutional investors seek to balance risk and expected return (For instance, the Rockefeller Family Fund publicly announced its decision to divest from fossil fuels. In addition, a report by Moody's [1] notes that 175 oil and mining companies were on below-investment-grade watch in early 2016, mainly because of the shift from carbon-intensive fossil fuel to renewable energy investment, that is, transition risk, which affects oil prices.). Beyond the growing awareness of climate change and regulatory risk and from a strictly financial point of view, portfolios based on traditional energy industries have underperformed sustainable global portfolios in the last decade. Nevertheless, the situation is reversed in times of uncertainty and crisis, since assets of the traditional energy industry have a very small or even negative correlation with the rest. From a diversification perspective, this makes this type of asset particularly appealing to portfolio managers.
Sustainability 2019, 11, 720
A better diversification of the portfolios of large institutional investors could increase investments in sustainable companies, based on globalized financial flows, thus indirectly providing more resources for project funding aimed at environmental protection or sustainable development. In this context, it is relevant to examine which measurements of the risk inherent in investing in the traditional energy industry are more adequate, in combination with a thorough analysis to discriminate suitable methods for validating risk measurement models. The existing literature proposes several risk measurement tools to provide financial institutions, risk managers and market participants with appropriate technical approaches to measure the risk of the financial markets. It is therefore important for these market players to adequately quantify the potential economic loss of their investments. In the case of the traditional energy industry, the literature focuses on risk quantification based on Value-at-Risk (VaR) but there is scarce work regarding the new trend-setting topic of the Expected Shortfall (ES) backtest (Backtesting is the process of comparing daily actual and hypothetical profits and losses with model-generated measures to assess the conservatism of risk measurement systems). In this paper, we examine the use and validation of the ES or Conditional Value-at-Risk (CVaR), the risk measure recently recommended by banking regulators, in two broadly diversified investments, one in the traditional energy industry and one excluding fossil fuel companies, during the last decade. In addition, we implement a new ES backtesting procedure based on multinomial tests.
The main objective of our research in this paper is to examine the ability of ES risk measures to correctly quantify the risk in investments in the oil, gas and coal industries. ES is defined as the expected loss conditional on the loss being greater than the VaR level. In January 2016, financial regulators proposed the use of the ES instead of VaR to prudently capture tail risk and capital adequacy. Regulatory capital calculations are no longer based on the idea that a bank would survive in normal market conditions with a certain level of confidence (VaR) but on trying to ensure survival in extreme market conditions by capturing tail risks (ES) (The Basel Committee on Banking Supervision has been establishing a set of international global regulatory standards for banking regulation. The different Basel capital accords aim to strengthen the stability of the international banking system. In January 2016, the "Fundamental Review of the Trading Book" proposed to replace the well-established VaR with another risk measure, the expected shortfall (ES), for the calculation of capital requirements for market risk (see in particular "Minimum capital requirements for market risk," or the current version of January 2019)). This change is challenging for portfolio and risk managers because it is not clear which validation method the regulator and the industry should use to test the proposed risk measure, that is, it is not clear how to evaluate the goodness of the ES risk measure. Currently, there is a lively debate in academia and the financial industry about how to validate internal models for regulatory capital under ES calculation. In this paper, we apply the new method proposed by [2] to validate ES. As this risk measure can be approximated as a weighted sum of VaRs at different levels, this method consists of utilizing a multinomial test instead of several independent binomial tests.
Our paper makes four contributions to the literature on risk measurement in the context of the traditional energy markets. The high volatility of the stock returns of companies in the energy industry provides a suitable and demanding dataset to examine the performance of the proposed ES backtesting technique. First, we employ several GARCH models to adequately model the risk of a broadly diversified portfolio of traditional energy industry stocks over a long period of time that includes periods of calm, turmoil and severe financial and economic crisis. Second, the behaviour of this portfolio is examined in relation to the behaviour of a sustainable equity portfolio to provide guidance to investors at a time when a divestment movement is observed in the fossil fuel industry. Third, we apply a new ES backtesting procedure based on multinomial tests for different VaR levels instead of performing a binomial test for each VaR level as in the previous literature on financial markets. To the best of our knowledge, this is the first attempt to apply multinomial tests to traditional energy and sustainable stock indexes. Fourth, we analyse the inclusion of exogenous variables to improve the performance of the volatility forecasting model, as corroborated by the backtesting analysis.
We proxy the traditional energy industry through the S&P 500 Oil, Gas and Consumable Fuels Index, called the Traditional Index (TI), and the divestment movement in fossil fuels through the FTSE Developed ex Fossil Fuel Index, called the Sustainable Index (SI). A simple descriptive analysis reveals that the global portfolio excluding fossil fuel industry assets performs financially better than the portfolio of assets related to the traditional oil and gas industry over the last decade. However, the TI outperforms during the 2007-2008 global financial crisis and the subsequent period of uncertainty, which is a good feature for portfolio and risk managers seeking to diversify the overall risk of their portfolios.
We consider four statistical models, namely normal, Student's t, α-stable and generalized Pareto, to model the variability of negative log-returns of two broadly diversified stock indexes. The one-day-ahead VaR and ES are calculated by applying a rolling window of 250 observations. Thus, the length of the backtesting period for both indexes is 2709 days with an expected number of exceptions of 27.09 for a 99%-VaR. We compute the well-known binomial tests for VaR at 99% and the two new multinomial tests, that is, the Pearson and Nass statistics, proposed by [2], for 97.5%-ES backtesting.
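The arithmetic behind the 99%-VaR binomial backtest can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the two-sided exact p-value is one common convention rather than necessarily the one used in the paper, and the exception count of 35 is made up for the example.

```python
import math

def expected_exceptions(n_days: int, var_level: float) -> float:
    """Expected number of VaR exceptions if the model is correctly specified."""
    return n_days * (1.0 - var_level)

def binomial_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability, computed in log space to avoid overflow for large n."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1.0 - p))
    return math.exp(log_pmf)

def binomial_backtest_pvalue(n_exceptions: int, n_days: int, var_level: float) -> float:
    """Two-sided exact p-value: total probability of all outcomes that are
    no more likely than the observed number of exceptions."""
    p = 1.0 - var_level
    observed = binomial_pmf(n_exceptions, n_days, p)
    total = 0.0
    for k in range(n_days + 1):
        pk = binomial_pmf(k, n_days, p)
        if pk <= observed * (1.0 + 1e-7):  # small tolerance for rounding
            total += pk
    return min(1.0, total)

# Backtesting period from the text; the exception count is hypothetical.
n_days, level = 2709, 0.99
print("expected exceptions:", expected_exceptions(n_days, level))
print("p-value for 35 exceptions:", binomial_backtest_pvalue(35, n_days, level))
```

A small p-value would indicate that the observed number of exceptions is hard to reconcile with the nominal 1% exception rate.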
Our empirical results provide useful guidelines for regulation purposes and for practitioners, suggesting that the flexible heavy-tailed α-stable distribution performs quite satisfactorily. On the other hand, concerning the design of ES backtesting methods, we find evidence in favour of including the variances of unsustainable asset returns as external regressors in the GARCH model, as this helps to improve backtesting results. In this case, the ES computed from the four considered loss distributions performs properly.
The rest of the paper is organized as follows: we present a survey of the relevant literature in Section 2 of the paper. Section 3 presents the models and the backtesting methodology, Sections 4 and 5 analyse the data and the results on ES backtesting and Section 6 concludes the paper.

Literature Review
There is an abundance of academic literature on the modelling of the risk of highly volatile prices of both energy commodities and energy stocks and derivatives. Energy commodity markets are naturally vulnerable to significant price changes. It is therefore important to model these price fluctuations and implement an effective tool for managing energy price risk. VaR has become a popular risk measure in the financial industry among many other alternative risk measures (e.g., [3,4]). The internal model approach under the Basel II framework proposes VaR as a risk measure to gauge the amount of assets needed to cover possible losses, that is, the minimum regulatory capital requirements. A variety of works have been published on risk quantification applied to different financial assets (e.g., stocks, bonds, commodities and derivatives) and several backtesting methods have been proposed to validate VaR models (see, for instance, [5] for different VaR forecasting tests) (The idea is to calculate the number of times the actual losses have exceeded the estimated VaRs. It is expected that the number of exceptions is approximately 1% of cases when a 99% VaR is calculated. If the percentage of exceptions is higher (lower) than 1%, then the VaR model underestimates (overestimates) risk).
VaR answers the question of how much we can lose with a given probability over a given time horizon. The popularity of this instrument is essentially due to its conceptual simplicity. VaR reduces the risk associated with any portfolio to a single number, the loss associated with a given probability. In addition, VaR helps portfolio managers to determine the most appropriate risk management policy for each situation. Thus, VaR is the primary tool used to forecast extreme declines in returns and is often used for designing optimal risk management strategies.
Nevertheless, financial regulatory entities have recently expressed concerns about the inability of VaR to capture tail risk. It is not a "coherent" measure of risk because it does not satisfy the property of "subadditivity" [31]. In addition, VaR does not provide a measure of the magnitude of losses suffered above the threshold. In January 2016, the Basel Committee on Banking Supervision changed from requiring banks to calculate market risk capital on the basis of VaR to using ES based on the behaviour of market variables during a 250-day period of stressed market conditions (The "Minimum capital requirements for market risk" (both in its most recent version published on 14 January 2019 and in the previous version of 2016) changes the measure used for determining market risk capital. Instead of VaR with a 99% confidence level, expected shortfall (ES) with a 97.5% confidence level is proposed). ES (also known as C-VaR or expected tail loss) is the expected loss, conditional on the loss being worse than the VaR loss. As with VaR, ES attempts to provide a single number that summarizes the total risk in a portfolio. The papers of the Special Issue "Advances in Modelling Value at Risk and Expected Shortfall" of the Journal of Risk and Financial Management present the recent state of the art in these market risk measures and their implications for the stability of the financial system. [32] proposes a novel method to estimate the VaR and ES implied by financial options, whereas [33] provide a closed-form expression for the ES of portfolios when risk factors are elliptically distributed. On the other hand, [34] develop a new VaR model based on overnight information from financial markets. The ES is also used as a risk measure for portfolio diversification strategies [35] and for hedging purposes [36].
In any case, the use of ES poses a challenge to portfolio and risk managers because it is not clear which validation method the regulator and the industry should employ to test the proposed risk measure, that is, it is not clear how to evaluate the goodness of the ES risk measure. Designing a backtesting method for ES is not as straightforward as in the case of VaR, since ES does not satisfy the elicitability property (e.g., [37,38]): there is no scoring function that this risk measure alone minimizes in expectation. In fact, the Basel Committee proposes to use ES to calculate capital requirements but proposes to carry out the backtest using a VaR measure (The backtesting requirements continue to be based on the 1-day static VaR measure considering a rolling window of 250 days).
Literature on the use of ES in the energy industry is limited. The ES measure is employed in the financial industry especially to quantify the economic or regulatory capital for banking and insurance companies. This paper focuses on risk quantification of traditional energy and sustainable investment indices; however, our methodology can be applied to estimate climate change risk, since many investors are facing huge losses due to the effects of climate change according to [39]. Some studies use ES constraints in optimization programs to choose investment projects (e.g., [40][41][42][43][44]). Other papers include the ES as the risk objective function in the estimation of hedging strategies to reduce price volatility risk in energy markets (e.g., [45][46][47]). These authors suggest that ES should be an appropriate metric accounting for some properties of energy assets. Finally, [13] apply both VaR and ES to model the price risk of four energy commodities. To backtest ES, they use a circular bootstrap method based on the one-sided test proposed by [48]. They conclude that the forecasted ES measure captures actual shortfalls in a satisfactory manner.

Model and Methodology
This study is an attempt to shed light on the correct measurement of risk in the traditional energy industry, which has a particularly interesting behaviour in the financial markets for portfolio managers. It maintains a very low or even negative correlation with sustainable portfolios, which makes it an excellent tool for diversifying risks. However, investing in the traditional energy industry requires precise validation of the risk measurement models used to make predictions. In this section, we focus on ES and study different strategies to model the asset returns on which we will calculate ES and test the validity of the different predictions obtained. Backtesting ES remains an open question and we implement a novel method that produces quite satisfactory results compared with existing procedures.

Modelling Asset Returns
VaR and ES approaches model the left tail of the return distribution or, equivalently, the right tail of the loss distribution. The losses or negative log-returns over the next day are defined here as $L_{t+1} = -100\log(P_{t+1}/P_t)$, where $P_t$ represents the corresponding index price. As is common in the literature (see, e.g., [49]), we suppose that, conditional on the location-scale parameters $\mu_{t+1}$ and $\sigma_{t+1}$, negative log-returns follow $L_{t+1} = \mu_{t+1} + \varepsilon_{t+1}$ and the innovations are $\varepsilon_{t+1} = \sigma_{t+1} Z_{t+1}$. The random variables $Z_{t+1}$ are assumed to be independently distributed with a common cumulative distribution function (CDF) $G$ that, for certain cases, depends on unknown parameters. We discuss several possibilities for $G$ in the next section. The parameter $\mu_{t+1}$ is modelled by an ARMA(1,1) process and a GARCH(1,1) process is employed for $\sigma_{t+1}$, that is,
$$\mu_{t+1} = \mu + \theta_1 L_t + \theta_2 \varepsilon_t,$$
$$\sigma^2_{t+1} = \omega + a_1 \varepsilon_t^2 + b_1 \sigma_t^2 + \gamma_1 v^{\mathrm{oil}}_t + \gamma_2 v^{\mathrm{gas}}_t + \gamma_3 v^{\mathrm{coal}}_t,$$
where $\mu$ is a constant, $\theta_1$ and $\theta_2$ are the parameters associated with the AR(1) and MA(1) terms, respectively, and $\omega$, $a_1$ and $b_1$ are the coefficients of the standard GARCH(1,1) model. Apart from the variables of the standard GARCH(1,1) model, the variances of oil, gas and coal price returns, $v^{\mathrm{oil}}_t$, $v^{\mathrm{gas}}_t$ and $v^{\mathrm{coal}}_t$, are considered as external regressors. Thus, our empirical results consider two methods of backtesting. One method excludes the external regressors (i.e., $\gamma_1 = \gamma_2 = \gamma_3 = 0$) from the GARCH model and the other method takes these variables into account in the variance equation of the GARCH model. Given a probability level $\alpha$, the VaR can be expressed as
$$\mathrm{VaR}^{t+1}_{\alpha} = \mu_{t+1} + \sigma_{t+1}\, q_\alpha,$$
where $q_\alpha$ is the $\alpha$ quantile of $G$. The ARMA-GARCH model is implemented by using the rugarch package in R [50].
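To make the variance recursion and the VaR expression concrete, the following Python sketch iterates a GARCH(1,1) variance equation with three optional external regressors and then forms a one-day-ahead VaR as location plus scale times a quantile. It is not the rugarch implementation used in the paper: the function names, the coefficient values, the starting variance and the toy regressor values are all assumptions chosen for illustration, and a normal quantile stands in for the quantile of G.

```python
from statistics import NormalDist

def garch_variance_path(eps, omega, a1, b1, sigma2_0,
                        gammas=(0.0, 0.0, 0.0), regressors=None):
    """Iterate sigma2[t+1] = omega + a1*eps[t]^2 + b1*sigma2[t] + sum_i gammas[i]*x[t][i].

    eps        : past innovations
    regressors : per-day tuples (oil, gas, coal variance proxies), or None
    Returns the variance path, including the one-step-ahead value at the end.
    """
    sigma2 = [sigma2_0]
    for t, e in enumerate(eps):
        x_t = regressors[t] if regressors is not None else (0.0, 0.0, 0.0)
        external = sum(g * x for g, x in zip(gammas, x_t))
        sigma2.append(omega + a1 * e * e + b1 * sigma2[-1] + external)
    return sigma2

def one_day_var(mu_next, sigma_next, q_alpha):
    """VaR as location plus scale times the alpha-quantile of the innovation law."""
    return mu_next + sigma_next * q_alpha

# Toy inputs (hypothetical, not estimates from the paper).
eps = [0.5, -1.2, 0.3, 2.0]
regs = [(1.0, 2.0, 0.5)] * len(eps)

plain = garch_variance_path(eps, omega=0.05, a1=0.10, b1=0.85, sigma2_0=1.0)
with_x = garch_variance_path(eps, omega=0.05, a1=0.10, b1=0.85, sigma2_0=1.0,
                             gammas=(0.01, 0.02, 0.0), regressors=regs)

q99 = NormalDist().inv_cdf(0.99)  # normal quantile used only as an example of q_alpha
var_99 = one_day_var(mu_next=0.0, sigma_next=plain[-1] ** 0.5, q_alpha=q99)
print(plain[-1], with_x[-1], var_99)
```

Setting all three gammas to zero recovers the standard GARCH(1,1) recursion, which corresponds to the first of the two backtesting methods described above.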
In the ARMA(1,1)-GARCH(1,1) setting above, the location and variability of negative log-returns are modelled through the parameters $\mu_{t+1}$ and $\sigma_{t+1}$. The distribution $G$ should be free of any such parameters (to avoid identifiability issues) and must account for other important features, such as asymmetry and/or kurtosis. In particular, the statistical models we consider are: (i) normal (used for comparative purposes), (ii) Student's t, (iii) α-stable and (iv) generalized Pareto.

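(i) Normal distribution
The CDF of a standard normal distribution is given by
$$\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx.$$

(ii) Student's t distribution
The CDF of a Student's t distribution is given by
$$G(z) = \int_{-\infty}^{z} \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{x^2}{\nu}\right)^{-\frac{\nu+1}{2}} dx,$$
where $\Gamma$ represents the gamma function and $\nu > 0$ is the degrees of freedom parameter that controls the kurtosis (small values of $\nu$ correspond to heavier tails). The Cauchy distribution is a particular case when $\nu = 1$.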
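(iii) α-stable distribution
The α-stable distribution is commonly described by its characteristic function, since the probability density function (PDF) is not available in closed form. In one common parametrization, with zero location and unit scale, it reads
$$E\left[e^{itZ}\right] = \begin{cases} \exp\left(-|t|^{\alpha}\left[1 - i\beta\,\mathrm{sign}(t)\tan\left(\frac{\pi\alpha}{2}\right)\right]\right), & \alpha \neq 1,\\ \exp\left(-|t|\left[1 + i\beta\,\frac{2}{\pi}\,\mathrm{sign}(t)\log|t|\right]\right), & \alpha = 1,\end{cases}$$
where the sign(t) function is defined as 1 if t > 0; 0 if t = 0 and −1 otherwise. The parameters in this distribution are the index of stability (characteristic exponent) α ∈ (0, 2] and a skewness parameter β ∈ [−1, 1]. There are three cases with known closed-form expressions for their densities: the normal (when α = 2 and β = 0), Cauchy (α = 1 and β = 0) and Lévy distributions (α = 1/2 and β = 1). The smaller the value of α, the heavier the distribution tail. The stable package for R developed by Nolan is employed to fit the stable distribution (Robust Analysis Inc. (2013). STABLE. R package, version 5.3.).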
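As a quick numerical check of the special cases just mentioned, the short Python sketch below evaluates a stable characteristic function and compares it with the normal and Cauchy cases. It assumes one common parametrization (often called S1) with zero location and unit scale; the STABLE package used in the paper may default to a different parametrization, and the function name is hypothetical.

```python
import cmath
import math

def stable_cf(t: float, alpha: float, beta: float) -> complex:
    """Characteristic function of a standardized alpha-stable law.

    Assumed form (S1 parametrization, zero location, unit scale):
      alpha != 1: exp(-|t|^alpha * (1 - i*beta*sign(t)*tan(pi*alpha/2)))
      alpha == 1: exp(-|t| * (1 + i*beta*(2/pi)*sign(t)*log|t|))
    """
    if t == 0:
        return 1 + 0j
    sign = 1.0 if t > 0 else -1.0
    if alpha == 1.0:
        return cmath.exp(-abs(t) * (1 + 1j * beta * (2 / math.pi) * sign * math.log(abs(t))))
    return cmath.exp(-abs(t) ** alpha * (1 - 1j * beta * sign * math.tan(math.pi * alpha / 2)))

# alpha = 2, beta = 0 gives exp(-t^2), the characteristic function of a normal law
# (with variance 2 under this scaling); alpha = 1, beta = 0 gives exp(-|t|), the Cauchy case.
t = 1.5
print(stable_cf(t, 2.0, 0.0), math.exp(-t * t))
print(stable_cf(t, 1.0, 0.0), math.exp(-abs(t)))
```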

(iv) The generalized Pareto distribution (GPD)
The CDF of the GPD is given by
$$G_{\xi,\beta}(x) = \begin{cases} 1 - \left(1 + \xi x/\beta\right)^{-1/\xi}, & \xi \neq 0,\\ 1 - \exp(-x/\beta), & \xi = 0,\end{cases}$$
where ξ is the shape parameter and β is the scale parameter. When ξ > 0, the GPD is the Pareto distribution; when ξ = 0, it is the exponential distribution; and when ξ < 0, the distribution is the Pareto type II distribution. Heavy-tailed empirical distributions usually follow a GPD with a positive shape parameter ξ > 0.
When G is either a normal or a Student's t distribution, the parameters of the ARMA(1,1)-GARCH(1,1) model and of the innovations are estimated jointly by Maximum Likelihood (ML). A two-step approach is used to estimate the parameters for the cases where G is either an α-stable or a generalized Pareto distribution. First, the Quasi-ML (QML) method is used to estimate the parameters of the ARMA(1,1)-GARCH(1,1) model, thus producing estimates of the underlying innovations, say, $\hat{\varepsilon}_{t+1}$. Specific methods are then applied in a second step to estimate the parameters in G:

• For the α-stable distribution, the ML approach is employed by using the direct integration method in Reference [51].
• For the generalized Pareto distribution, the peaks over threshold (POT) method is employed to estimate the parameters. According to [49], the VaR or α-quantile is obtained from
$$\mathrm{VaR}_\alpha = u + \frac{\beta}{\xi}\left[\left(\frac{1-\alpha}{T_u/T}\right)^{-\xi} - 1\right],$$
where u is the chosen threshold, β and ξ are the scale and shape parameters, respectively, $T_u$ is the number of threshold exceedances and T is the sample size. Therefore, $T_u/T$ is an empirical estimator of the probability of exceeding the threshold. In this paper, the threshold is chosen as the 10th percentile of the standardized residuals of the negative log-returns, as is typical in the literature [19,48,[52][53][54]. The evir package in R is employed to implement the EVT-GPD model [55].
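The POT quantile formula above can be written as a small Python function. The sketch below is illustrative only: the function names and all parameter values are hypothetical, and the ES expression is the standard POT result for ξ < 1, which is not stated explicitly in the text.

```python
import math

def gpd_var(alpha, u, beta, xi, n_exceed, n_total):
    """POT quantile: u + (beta/xi) * (((1 - alpha) / (T_u/T))**(-xi) - 1)."""
    tail_fraction = n_exceed / n_total      # empirical probability of exceeding u
    ratio = (1.0 - alpha) / tail_fraction
    if xi == 0.0:
        # Limit of the formula as xi -> 0 (exponential tail).
        return u - beta * math.log(ratio)
    return u + (beta / xi) * (ratio ** (-xi) - 1.0)

def gpd_es(var_alpha, u, beta, xi):
    """Standard POT expected shortfall for xi < 1 (assumption: not given in the text)."""
    return var_alpha / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)

# Hypothetical fitted values on standardized residuals.
u, beta, xi = 1.2, 0.6, 0.2
n_exceed, n_total = 100, 1000

var_975 = gpd_var(0.975, u, beta, xi, n_exceed, n_total)
var_99 = gpd_var(0.99, u, beta, xi, n_exceed, n_total)
es_975 = gpd_es(var_975, u, beta, xi)
print(var_975, var_99, es_975)
```

When 1 − α equals the empirical exceedance fraction, the formula returns the threshold u itself, which is a convenient sanity check on any implementation.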

Backtesting ES
As mentioned above, the method to be used to validate the results of the application of the ES remains an open question. [56] show that ES and VaR are jointly elicitable and propose a scoring function that is more complicated than the well-known scoring function for VaR. Comparative tests can then be performed following the Diebold-Mariano test (e.g., [57]). Based on Monte Carlo simulations, [58] propose other tests for ES. Following the argument that VaR and ES are jointly elicitable, in this paper we employ the simple approach proposed by [2] to validate ES calculations in an implicit manner. ES can be approximated by a weighted sum of VaR levels [59] and a multinomial test can then be performed rather than a binomial test for each VaR level. This paper extends the applications of [2] to traditional energy and sustainable indexes. Moreover, our work considers an ARMA-GARCH model with external regressors to filter the negative log-returns of the analysed assets, whereas [2] employ ARCH and GARCH models.
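The idea that ES can be approximated by an average of VaRs at several levels can be illustrated numerically. The Python sketch below is an assumption-laden example rather than the paper's procedure: it uses a standard normal loss, equally spaced levels between α and 1 with equal weights, and hypothetical function names, and it compares the average of the quantiles with the closed-form normal ES.

```python
from statistics import NormalDist

def var_levels(alpha: float, n_levels: int):
    """Equally spaced levels alpha, alpha + (1-alpha)/N, ..., alpha + (N-1)(1-alpha)/N."""
    return [alpha + j * (1.0 - alpha) / n_levels for j in range(n_levels)]

def approx_es(quantile_fn, alpha: float, n_levels: int = 4) -> float:
    """Approximate ES as the equally weighted average of VaRs at the levels above."""
    levels = var_levels(alpha, n_levels)
    return sum(quantile_fn(a) for a in levels) / n_levels

def normal_es(alpha: float) -> float:
    """Closed-form ES of a standard normal loss: pdf(q_alpha) / (1 - alpha)."""
    std = NormalDist()
    return std.pdf(std.inv_cdf(alpha)) / (1.0 - alpha)

alpha = 0.975
q = NormalDist().inv_cdf
es_4 = approx_es(q, alpha, 4)      # four VaR levels
es_400 = approx_es(q, alpha, 400)  # finer grid, closer to the integral
es_exact = normal_es(alpha)
print(var_levels(alpha, 4), es_4, es_400, es_exact)
```

Because the quantile function is increasing and the lowest level in each cell is used, the average sits below the exact ES and approaches it as the number of levels grows.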
Following the notation of [2], ES can be calculated as in Reference [60]:
$$\mathrm{ES}_\alpha(L) = \frac{1}{1-\alpha}\int_\alpha^1 q_u(L)\, du.$$
A simple approximation can be obtained from different quantiles [59]:
$$\mathrm{ES}_\alpha(L) \approx \frac{1}{4}\left[q_\alpha(L) + q_{0.75\alpha+0.25}(L) + q_{0.5\alpha+0.5}(L) + q_{0.25\alpha+0.75}(L)\right],$$
where $q_\alpha(L) = \mathrm{VaR}_\alpha(L)$. [2] then propose backtesting ES by simultaneously backtesting multiple VaR estimates. Backtesting is based on multinomial tests of VaR exceptions. It is worthwhile to mention that the approximation can be generalized as
$$\mathrm{ES}_\alpha(L) \approx \frac{1}{N}\sum_{i=1}^{N} q_{\alpha_i}(L),$$
where N is the number of quantiles to be used in the approximation. Although a higher N results in a better estimation of ES, simulations performed by [2] show that four quantiles provide reasonable size and power for the backtest. It is also noteworthy that the previous notation implies that risk measures are calculated over the loss distribution, that is, the right tail of the distribution. The number of exceptions (violations) is estimated given a certain model (distribution) and for each confidence level. As is typical in the literature, the exception indicator at each time t is defined as a function that takes the value 1 if a loss has exceeded the VaR level. That is,
$$I_{t,i} = 1_{\{L_t > \mathrm{VaR}_{\alpha_i,t}\}},$$
where $\alpha_i$ is as follows:
$$\alpha_i = \alpha + \frac{i-1}{N}(1-\alpha)$$
for i = 1, ..., N, with $\alpha_0 = 0$ and $\alpha_{N+1} = 1$. Then, the number of exceptions $X_t$ at each time t is given by
$$X_t = \sum_{i=1}^{N} I_{t,i}.$$
As the number of exceptions follows a multinomial distribution, the unconditional coverage property can be written as
$$X_t \sim \mathrm{MN}\left(1, (\alpha_1 - \alpha_0, \ldots, \alpha_{N+1} - \alpha_N)\right) \quad \text{for all } t.$$
Our interest is a measure that counts the outcomes {0, 1, ..., N} with probabilities $\alpha_1 - \alpha_0, \ldots, \alpha_{N+1} - \alpha_N$ that sum to one. The cell counts $O_j$ are then given by
$$O_j = \sum_{t=1}^{n} 1_{\{X_t = j\}},$$
where n is the length of the backtesting period and j = 0, 1, ..., N. Then, the random vector $(O_0, \ldots, O_N)$ should follow the multinomial distribution
$$(O_0, \ldots, O_N) \sim \mathrm{MN}\left(n, (\alpha_1 - \alpha_0, \ldots, \alpha_{N+1} - \alpha_N)\right).$$
More formally, suppose that
$$(O_0, \ldots, O_N) \sim \mathrm{MN}\left(n, (\theta_1 - \theta_0, \ldots, \theta_{N+1} - \theta_N)\right),$$
where $0 = \theta_0 < \theta_1 < \cdots < \theta_N < \theta_{N+1} = 1$ is an arbitrary sequence of parameters from a specific model. The null and alternative hypotheses are given by
$$H_0: \theta_j = \alpha_j \ \text{for } j = 1, \ldots, N, \qquad H_1: \theta_j \neq \alpha_j \ \text{for at least one } j \in \{1, \ldots, N\}.$$
There are several multinomial tests; the most common is the Pearson chi-squared test, for which the test statistic $S_N$ follows a $\chi^2_N$ distribution under the null hypothesis:
$$S_N = \sum_{j=0}^{N} \frac{\left(O_j - n(\alpha_{j+1} - \alpha_j)\right)^2}{n(\alpha_{j+1} - \alpha_j)} \sim \chi^2_N.$$
The null hypothesis is rejected at a prespecified type I error κ when $S_N > \chi^2_N(1 - \kappa)$. Another test is the Nass test, which is an improvement over the previous test when cell probabilities are small [2]. The test statistic is $\chi^2_\upsilon$ distributed under the null hypothesis:
$$c\,S_N \sim \chi^2_\upsilon, \qquad c = \frac{2E(S_N)}{\mathrm{var}(S_N)}, \qquad \upsilon = c\,E(S_N),$$
where $E(S_N) = N$ and
$$\mathrm{var}(S_N) = 2N - \frac{N^2 + 4N + 1}{n} + \frac{1}{n}\sum_{j=0}^{N}\frac{1}{\alpha_{j+1} - \alpha_j}.$$
The null hypothesis is rejected at a prespecified type I error κ when $c\,S_N > \chi^2_\upsilon(1 - \kappa)$.
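The counting and testing steps of the multinomial backtest can be sketched in pure Python as follows. This is a minimal illustration under stated assumptions, not the code used in the paper: the function names are hypothetical, the Nass correction uses the standard mean and variance of the Pearson statistic, the chi-squared tail probability is computed with a simple series for the incomplete gamma function, and the cell counts in the usage example are invented.

```python
import math

def cell_counts(losses, var_forecasts):
    """Count days with 0, 1, ..., N VaR exceptions.

    var_forecasts[t] holds the N VaR forecasts for day t at increasing levels.
    """
    n_levels = len(var_forecasts[0])
    counts = [0] * (n_levels + 1)
    for loss, vars_t in zip(losses, var_forecasts):
        x_t = sum(1 for v in vars_t if loss > v)  # number of levels breached
        counts[x_t] += 1
    return counts

def chi2_sf(x, df):
    """P(chi2_df > x) via the series for the regularized lower incomplete gamma."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a
    total = term
    k = 0
    while abs(term) > 1e-15 * abs(total):
        k += 1
        term *= z / (a + k)
        total += term
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def multinomial_backtest(counts, levels):
    """Return Pearson and Nass statistics with their degrees of freedom and p-values."""
    n = sum(counts)
    n_levels = len(levels)
    edges = [0.0] + list(levels) + [1.0]
    probs = [edges[j + 1] - edges[j] for j in range(n_levels + 1)]
    s_n = sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))
    var_s = (2 * n_levels - (n_levels ** 2 + 4 * n_levels + 1) / n
             + sum(1.0 / p for p in probs) / n)
    c = 2 * n_levels / var_s   # since E(S_N) = N
    nu = c * n_levels
    return {"pearson": s_n, "pearson_df": n_levels, "pearson_p": chi2_sf(s_n, n_levels),
            "nass": c * s_n, "nass_df": nu, "nass_p": chi2_sf(c * s_n, nu)}

# Hypothetical example: N = 4 levels starting at 97.5%, invented cell counts.
levels = [0.975 + j * 0.025 / 4 for j in range(4)]
observed = [2630, 20, 22, 18, 19]   # sums to 2709 backtesting days
result = multinomial_backtest(observed, levels)
print(result)
```

A small p-value from either test would suggest that the exceptions are not spread across the VaR levels as the nominal probabilities require.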

Data
In this section we compare the validation of the risk models of investments that either include or exclude shares of companies related to the fossil fuel industry. With this in mind, our data comprise two sets of daily prices, detailed as follows. We prepare one of the datasets to consider the companies that are not exposed to unsustainable assets. We refer to this as the sustainable index (SI). It should capture the stock return behaviour of sustainable companies. This first set of data corresponds to the FTSE Developed ex Fossil Fuel Total Return Index. This index is part of the Sustainability and Environmental, Social and Governance (ESG) indexes of FTSE Russell (other indexes with similar characteristics can be found on its website). This index is designed to represent the performance of FTSE All-World Index constituents after the exclusion of companies that have some exposure of revenues and/or reserves to fossil fuels. The second set of data is obtained from the S&P 500 Oil, Gas and Consumable Fuels Index ("Standard and Poor's 500 Oil, Gas and Consumable Fuels Index is a capitalization-weighted index. The index was developed with a base level of 10 for the 1941-43 base period. The parent index is SPXL3. This is a GICS Level 3 Industries. Standard and Poor's 500 (Industry) Index is a capitalization-weighted index. The index is designed to measure performance of the broad domestic economy through changes in the aggregate market value of 500 stocks representing all major industries. The index was developed with a base level of 10 for the 1941-43 base periods." Source: Bloomberg LP). This index includes companies in the energy sector engaged in the exploration, production, refining, marketing, storage and transportation of oil, gas, coal and consumable fuels. It is used as a proxy for the whole oil and gas industry and is referred to as the traditional oil and gas index (TI).
Both indexes are capitalization-weighted and enable us to study the risk of investing in broadly diversified portfolios that include or exclude the traditional energy industry. The price data comprise information from 31 July 2006 to 16 November 2018 for a total of 3210 price observations (The selection of the indexes and period is restricted by the availability of data from the Bloomberg terminal). Moreover, we are interested in the effect of the variability of the main stranded asset price returns on the variance behaviour of the indexes. To this end, prices of oil, gas and coal have been collected for the same period as the SI and TI indexes (The Generic 1st 'CL' Futures (CL1), Generic 1st 'NG' Futures (NG1) and Richards Bay Coal Futures (XO1) are obtained for oil, gas and coal prices, respectively). The abovementioned data are obtained from the Bloomberg terminal (Bloomberg Professional Service is an information service that, through subscription, provides economic and financial data at the level of individual securities and the entire market. The workstations with the installed service are traditionally called Bloomberg terminals).
The descriptive statistics (see Table 1) show that the analysed stock index returns exhibit the well-known stylized facts of daily financial asset returns. The mean and median returns of the TI are close to zero but the SI shows a mean of 0.024% and a median of 0.071%. In terms of daily volatility, the TI shows a standard deviation which is approximately two-thirds larger than that of the SI. The index return distributions display fat tails, since excess kurtosis is positive. Moreover, the distributions are negatively skewed, which implies more negative extreme values. Figures 2 and 3 depict the returns for both indexes. The index returns exhibit similar characteristics and are remarkably affected by the global financial crisis, as exhibited by the high volatility in approximately 2007 and 2008 and the financial problems faced by most companies in the oil and gas industry. For the stranded asset returns, gas presents the highest volatility, whereas coal exhibits the fattest tails among the three assets. All the analysed asset returns display positively skewed distributions.
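The summary measures discussed above (mean, median, standard deviation, skewness and excess kurtosis of daily log-returns) can be computed as in the following Python sketch. The function names and the toy price series are hypothetical, and the moment-based skewness and kurtosis estimators shown here are one common choice that may differ from the software used for Table 1.

```python
import math
import statistics

def log_returns(prices):
    """Daily log-returns in percent; the loss series used for VaR/ES is the negative of this."""
    return [100.0 * math.log(p1 / p0) for p0, p1 in zip(prices, prices[1:])]

def describe(x):
    """Mean, median, standard deviation, skewness and excess kurtosis (population moments)."""
    n = len(x)
    mean = statistics.fmean(x)
    sd = statistics.pstdev(x)
    skew = sum((v - mean) ** 3 for v in x) / (n * sd ** 3)
    excess_kurt = sum((v - mean) ** 4 for v in x) / (n * sd ** 4) - 3.0
    return {"mean": mean, "median": statistics.median(x), "sd": sd,
            "skewness": skew, "excess_kurtosis": excess_kurt}

# Toy price path (invented numbers, not index data).
prices = [100.0, 101.0, 99.5, 99.8, 97.0, 98.2, 98.0]
returns = log_returns(prices)
print(describe(returns))
```

Positive excess kurtosis signals fatter tails than the normal distribution, and negative skewness signals that large negative returns are more frequent than large positive ones.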
To show the temporal evolution of accumulated returns during the sample period, Figure 4 depicts the value of an initial investment of 100 in each of the indexes. The relationship between risk, as measured by the standard deviation, and the rate of return is worse for the traditional oil and gas industry index than for the sustainable asset index. Alternatively, we also analyse the risk-return combination considering ES as the measure of risk. The ratio of the value of the SI investment to its potential loss exceeds the corresponding ratio for the TI investment during the analysed period, when the loss is estimated based on the 97.5%-ES (Values of the 97.5%-ES estimated for the Stable model are employed when external regressors are considered in the GARCH model. Thus, potential loss is calculated as $\mathrm{Loss}_t = \mathrm{ES}^{97.5}_t\,(\text{Index Value}_t)$). Figure 5 shows this evidence.
Beyond other considerations relating to climate change awareness or the risk of further regulation of the fossil fuel industry, that is, from a strictly financial point of view, the global sustainable portfolio excluding fossil fuel industry assets performs better than the portfolio of assets related to the traditional oil and gas industry over the last decade. However, this relatively poor performance of the traditional oil and gas industry assets is not necessarily a bad result. An interesting feature for portfolio managers is the outperformance of this index during the global financial crisis and the subsequent period of uncertainty. During these periods, the correlation of these assets with the rest of the market tends to decrease, being very low or even negative. Therefore, these are assets to be included in a portfolio to diversify the overall risk.

Statistical Results
This section presents the results of computing VaR and ES for both datasets, the SI and TI stock indexes, and provides the ES backtesting analysis in comparison with traditional backtesting methods for VaR. The analysis here is primarily quantitative and of a strongly statistical nature. The discussion of the implications of these results is deferred to the next section.
The backtesting is carried out in two cases. The first considers the ARMA-GARCH model with the different innovations presented in Section 3.1 to filter the returns of the SI and TI indexes, whereas the second includes the variances of stranded asset (oil, gas and coal) returns as independent variables in the variance equation of the GARCH model. In-sample estimation results for the whole period are presented in Table 2. The results show that the parameters related to the oil, gas and coal variances are not statistically significant. In other words, the variance of stranded assets does not seem to have explanatory power for the variance of SI and TI returns. In fact, the estimated ARMA-GARCH parameters do not vary significantly between the specification without the variances of unsustainable assets in the variance equation and the specification that includes these external regressors. In what follows, we analyse whether the inclusion of these regressors helps improve the backtesting results.

The first 250 returns of the oil, gas and coal assets are employed to obtain the first values of their respective variances, and a rolling window of 250 observations is used to estimate the remaining variances. That is, the initial range of data from 1 August 2006 to 15 July 2007 is employed to calculate the first variances that act as external regressors. Figure 6 shows the estimated variances of the stranded asset returns. The variability of oil and coal returns was mainly affected by the global financial crisis, whereas high volatility in gas returns is observed after that date. The correlation between log-returns (estimated variances of log-returns) of oil and gas is 0.18 (0.40), and for oil and coal it is 0.16 … Table 3; Table 4) for both indexes when calculating 99%-VaR. Table 3 presents the results of testing VaR and ES for both the SI and the TI when the variance of unsustainable assets is not considered in the GARCH model.
We compute the well-known binomial test for VaR at 99% and the two new multinomial tests, that is, the Pearson and Nass statistics proposed by [2], for 97.5%-ES backtesting. In the case of the sustainable index, the binomial test for VaR rejects the Student's t and GPD models, since both models overpredict risk for the index returns. In most applications of market risk quantification, the results of EVT techniques based on the GPD model are favourable. However, in this case, the binomial test for 99%-VaR rejects this model. A plausible reason is the small number of observations (in the tail of the empirical distribution) employed to fit the GPD, which is 25 in each step of the rolling window. It is well known that parameter estimation depends on the threshold selection, which is still an open question in EVT; this drawback is discussed, for instance, in Reference [54]. When backtesting the ES of the SI, the Pearson and Nass tests do not reject the normal, Stable and GPD models, but the Student's t model does not perform satisfactorily, which is consistent with the VaR backtesting results for the same index (Table 3, Panel a).
In the case of the traditional oil and gas industry index returns, only the Student's t model does not perform well according to the binomial test for 99%-VaR (Table 3, Panel b), whereas all the models perform well for 97.5%-ES backtesting according to the Pearson and Nass statistics.
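The binomial backtest of VaR referred to above can be sketched as follows. This is a minimal illustration, not the exact procedure behind Table 3: it assumes a two-sided exact binomial test on the number of VaR exceedances, and the function names `binom_pmf` and `var_binomial_backtest` are hypothetical.

```python
from math import exp, lgamma, log


def binom_pmf(k, n, p):
    """Binomial probability mass, computed in log space to avoid overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))
    return exp(log_pmf)


def var_binomial_backtest(n_obs, n_exceed, level=0.99):
    """Two-sided exact binomial p-value for the count of VaR exceedances.

    Assumption: under a correct model, exceedances are i.i.d. Bernoulli
    with probability 1 - level, so their count is Binomial(n_obs, 1 - level).
    """
    p = 1.0 - level
    lower = sum(binom_pmf(k, n_obs, p) for k in range(0, n_exceed + 1))
    upper = sum(binom_pmf(k, n_obs, p) for k in range(n_exceed, n_obs + 1))
    return min(1.0, 2.0 * min(lower, upper))


# Illustrative counts only (not taken from the paper's tables).
for exceedances in (1, 10, 30):
    p_value = var_binomial_backtest(1000, exceedances, level=0.99)
    print(exceedances, round(p_value, 4))
```

With 1000 forecasts at the 99% level, about 10 exceedances are expected. Far fewer exceedances indicate that the model overpredicts risk, and far more that it underpredicts risk; the two-sided p-value is small in both situations.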
Table 4 replicates the analysis of Table 3 but considers the external regressors in the variance equation (Equation (1)). Although the binomial test for 99%-VaR still rejects the Student's t model, all other models now exhibit a reasonable performance in the VaR and ES tests. This is an important result, since it provides evidence that employing external regressors (the variances of stranded asset returns) helps improve risk model validation for the data analysed in our paper.
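The external regressors discussed here are variances of oil, gas and coal returns estimated over a rolling window of 250 observations. A minimal sketch of such a rolling estimate is given below; it assumes the unbiased sample variance over a trailing window, which may differ in detail from the estimator used for the paper, and `rolling_variance` is a hypothetical helper name.

```python
from statistics import variance


def rolling_variance(returns, window=250):
    """Sample variance over a trailing window of `window` returns.

    The first value uses the first `window` observations; each later
    value drops the oldest return and adds the newest one.
    """
    if len(returns) < window:
        raise ValueError("need at least `window` observations")
    return [variance(returns[end - window:end])
            for end in range(window, len(returns) + 1)]


# Toy series with a short window, for illustration only.
toy_returns = [1.0, 2.0, 4.0, 4.0]
print(rolling_variance(toy_returns, window=2))
```

A series of T returns yields T - window + 1 variance values, so the first 250 observations are consumed before the first regressor value becomes available.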
We also conduct the simple backtest of ES commonly used in the literature as a robustness check of the new multinomial test previously applied to validate ES. Table 5 shows the results of independent individual binomial backtests of VaR for four confidence levels equal to and higher than that used in the 97.5%-ES estimate. This analysis is performed for both indexes without considering external regressors in the variance equation. This methodology, based on independent testing at different confidence levels, provides results similar to those obtained with the multinomial test for three loss distributions. However, it indicates that the GPD model overpredicts risk when VaR is calculated at the 98.75% and 99.375% (97.5% and 98.125%) confidence levels for SI (TI) index returns, as can be seen in Table 5, Panel a (Panel b). In any case, the results of the Pearson and Nass tests for 97.5%-ES displayed in Table 3, which reject the Student's t model and show that the normal and Stable models perform well for both indexes, are also confirmed by the individual binomial tests.
Table 6 presents the same results as Table 5 but with the variances of unsustainable asset returns as independent variables in the variance equation of the GARCH model. Only the Student's t model is rejected, when 98.75%-VaR is calculated for SI index returns. Once again, risk model performance is enhanced when the variances of stranded asset returns are included as regressors in the variance equation to assess VaR at different confidence levels.
Finally, Figure 7 compares 99%-VaR and 97.5%-ES (with external regressors in the variance equation) for each analysed model applied to SI returns (similar results are obtained for TI returns, and for both indexes when external regressors are not included in the variance equation of the GARCH model; they are available upon request). As expected, 99%-VaR is similar to 97.5%-ES in the Gaussian case; however, 97.5%-ES is higher than 99%-VaR in the Stable and GPD cases. This result corroborates one of the arguments used by the Basel Committee to defend the use of ES to calculate the market risk of a financial institution.
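The closeness of 99%-VaR and 97.5%-ES in the Gaussian case can be checked with a short calculation. The sketch below assumes a zero-mean normal loss distribution with standard deviation `sigma` and uses the standard closed forms for the normal quantile and the normal expected shortfall; the function names are illustrative.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # zero mean, unit variance


def gaussian_var(level, sigma=1.0):
    """VaR of a zero-mean normal loss: sigma times the normal quantile."""
    return sigma * STD_NORMAL.inv_cdf(level)


def gaussian_es(level, sigma=1.0):
    """ES of a zero-mean normal loss: sigma * pdf(z_level) / (1 - level)."""
    z = STD_NORMAL.inv_cdf(level)
    return sigma * STD_NORMAL.pdf(z) / (1.0 - level)


var_99 = gaussian_var(0.99)
es_975 = gaussian_es(0.975)
print(f"99%-VaR   = {var_99:.4f} sigma")
print(f"97.5%-ES  = {es_975:.4f} sigma")
```

Both figures are roughly 2.33 to 2.34 standard deviations, so the two measures nearly coincide under normality. For heavier-tailed distributions the losses beyond the quantile are larger on average, which is why the 97.5%-ES can exceed the 99%-VaR.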

Discussion, Conclusions and Future Work
Energy assets have higher volatilities than other types of stocks and are affected by a trend towards divestment caused by greater global awareness of the environmental and financial impact of climate change. Nevertheless, portfolio managers and institutional investors in general have a strong interest in energy assets owing to their importance for diversification purposes, and the predominant role of fuel-based industries in energy portfolios is now clearly being taken over by sustainable alternatives. This paradigm change poses new challenges for policy makers and managers, who have to rely on trustworthy risk measures and accompanying backtesting procedures that agree with the guidelines on financial investments recently proposed by the regulators. To address this issue, as described in Section 4, we have constructed and analysed two very different indexes: one formed by companies in the traditional energy industry (TI) and a second one of a sustainable nature (SI). We have collected data on these portfolios over an extensive period of time. For these datasets, in Section 5 we have performed a large experiment in which we have used several probabilistic models to compute a number of risk measures, which we have tested using two different backtesting methods.
Our results show that, in general, sustainable investments behave statistically in a similar way to traditional assets, but in terms of returns they have outperformed traditional ones over the last decade. Nevertheless, and quite interestingly, the situation is reversed in periods of crisis, when, in addition, the returns of both portfolios show low or even negative correlation. These findings suggest that investments in sustainable industries may assume the role of traditional fuel-based energy industries in designing well-diversified portfolios. Our results also indicate that a combination of both could be of interest when trying to anticipate structural changes in investments during periods of turmoil. All this would imply that, by incorporating traditional energy industry stocks into portfolios of sustainable companies, the financing of the latter can be improved and, therefore, they can enjoy better financing of their sustainable development projects. In summary, our findings on proper risk measurement are potentially beneficial for sustainable development. Investors and portfolio managers may perceive investment in sustainable companies as more desirable if they can diversify or reduce the inherent market risk.
From a modeller's perspective, and in terms of methods to implement VaR and especially ES, our experiment provides several interesting conclusions and recommendations. The results of VaR and (implicit) ES backtesting seem to indicate that external regressors should be considered in the GARCH equation. In particular, we have found evidence that the variances of unsustainable assets (i.e., the exogenous regressors) help to improve the variance estimation of the TI and SI returns and therefore the backtesting performance. These results are in line with [61][62][63][64]. The Gaussian and Stable distributions perform comparatively better in all situations, and hence these are the statistical models we recommend for constructing risk measure methodologies in applications with substantial energy-based investments.
No less important, particularly for policy makers, is the question of ES backtesting. To the best of our knowledge, our paper is the first in which multinomial tests are applied to SI and TI assets and in which exogenous variables are included to improve the performance of the volatility forecasting model. The backtesting procedure we implement is based on multinomial tests over several VaR levels, rather than on a separate binomial test for each VaR level, since ES can be approximated in terms of multiple VaRs [59]. We obtain evidence that the multinomial test is an adequate and simple method to validate ES models as presented in this paper. This simple approach provides an implicit way of backtesting ES, and it is suggested for regulatory purposes.
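The idea of testing several VaR levels jointly can be sketched with a Pearson chi-square statistic. The code below is a minimal illustration under stated assumptions, not the implementation behind the reported tables: it assumes N equally spaced VaR levels starting at 97.5%, classifies each loss by the number of VaR levels it exceeds, and uses the asymptotic chi-square distribution with N degrees of freedom (closed form for even N only). The Nass correction is not shown, and all function names are hypothetical.

```python
from math import exp, factorial


def multinomial_cells(losses, var_forecasts):
    """Count days by the number of VaR levels exceeded (0 to N).

    `var_forecasts[t]` holds the N VaR forecasts for day t, one per level.
    """
    n_levels = len(var_forecasts[0])
    counts = [0] * (n_levels + 1)
    for loss, vars_t in zip(losses, var_forecasts):
        exceeded = sum(1 for v in vars_t if loss > v)
        counts[exceeded] += 1
    return counts


def pearson_multinomial_test(counts, alpha=0.975):
    """Pearson statistic and asymptotic p-value for the cell counts."""
    n_levels = len(counts) - 1
    if n_levels % 2 != 0:
        raise ValueError("closed-form p-value implemented for even N only")
    n_obs = sum(counts)
    tail = (1.0 - alpha) / n_levels          # probability of each tail cell
    probs = [alpha] + [tail] * n_levels      # expected cell probabilities
    stat = sum((o - n_obs * p) ** 2 / (n_obs * p)
               for o, p in zip(counts, probs))
    half = stat / 2.0
    # Chi-square survival function for even degrees of freedom 2m.
    p_value = exp(-half) * sum(half ** i / factorial(i)
                               for i in range(n_levels // 2))
    return stat, p_value


# Illustrative counts for 1000 days and four VaR levels (not paper data).
print(pearson_multinomial_test([975, 6, 6, 6, 7]))
print(pearson_multinomial_test([940, 15, 15, 15, 15]))
```

The first set of counts is close to the expected 975 and 6.25 per tail cell, so the test does not reject; the second has far too many exceedances at every level and is rejected. A single statistic thus replaces four separate binomial tests.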
Future research can compare other ES tests, such as those proposed by [58]. A possible limitation of our work is that ES is approximated by just four VaR levels. Although employing more VaR figures in the ES approximation can provide a more accurate ES measure, [2] show that four VaR levels produce good results regarding the size and power of the multinomial test, which is more powerful than a single binomial test. Future research can therefore focus on employing eight VaR levels in the multinomial test, which also performs well according to [2]. In addition, other ES tests proposed in the future can be compared with the multinomial test employed in this paper. Another limitation is that the window size (250 days) suggested by the Basel Committee may not be large enough when calculating VaR at high probability levels, as is also noted by [2]. Parameter estimation of GARCH models can be biased in small samples, and a window of 500 days is recommended when this type of model is employed [65]. Moreover, parameter estimates of the generalized Pareto distribution exhibit high variance in small samples when the peaks-over-threshold method is used, so risk measures may be wrongly assessed. This was evidenced in our empirical application, and several window sizes can be tested in future research. This paper analysed risk quantification for individual assets. Further research may focus on ES testing of traditional energy and sustainable investment portfolios, where multivariate models (such as DCC, BEKK and copulas, among others) can be employed to analyse the correlation and dependence structure of different assets. Finally, a main concern is to quantify the economic loss caused by climate change. A first attempt at climate VaR estimation is developed by [39]. Future work could estimate losses once they have exceeded climate VaR by employing some of the techniques in our paper.
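The limitation regarding the number of VaR levels can be illustrated numerically. The sketch below assumes that ES at level alpha is approximated by the equally weighted average of N VaRs at equally spaced levels starting at alpha (for alpha = 97.5% and N = 4 this gives the levels 97.5%, 98.125%, 98.75% and 99.375% used in the individual binomial tests). The equal weighting and the standard normal example are assumptions made for illustration.

```python
from statistics import NormalDist


def var_levels(alpha, n_levels):
    """Equally spaced VaR levels alpha, alpha + (1 - alpha)/N, ..."""
    step = (1.0 - alpha) / n_levels
    return [alpha + j * step for j in range(n_levels)]


def es_approx(quantile, alpha=0.975, n_levels=4):
    """Approximate ES as the average of VaRs at the levels above."""
    levels = var_levels(alpha, n_levels)
    return sum(quantile(a) for a in levels) / n_levels


std_normal = NormalDist()
# Closed-form ES of a standard normal loss at 97.5%, for comparison.
exact_es = std_normal.pdf(std_normal.inv_cdf(0.975)) / (1.0 - 0.975)
approx_4 = es_approx(std_normal.inv_cdf, n_levels=4)
approx_8 = es_approx(std_normal.inv_cdf, n_levels=8)

print([round(a, 5) for a in var_levels(0.975, 4)])
print(round(approx_4, 4), round(approx_8, 4), round(exact_es, 4))
```

In this example the four-level average lies below the exact ES and the eight-level average lies closer to it, which is consistent with the remark that more VaR levels give a more accurate ES figure.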

Acknowledgments:
We gratefully acknowledge the guest editors of the special issue, the comments and suggestions of the reviewers and the collaboration of the assistant editor, Nina Tian.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses or interpretation of data; in the writing of the manuscript or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript: