1. Introduction
Even before the initial Basel Committee consultative document (Bank for International Settlements 2012), risk managers and academics had been pushing to replace VaR (Value at Risk) with another risk measure that addresses VaR's deficiencies. In particular, coherent risk measures (Acerbi and Tasche 2002a, 2002b; Hall 2007) satisfy the basic properties required of a risk measure as outlined in Artzner et al. (1999). Expected Shortfall (ES) is the natural choice among the coherent risk measures, so it is no surprise that the Basel Committee has chosen it to replace VaR. However, unlike VaR, ES has no well-established backtesting framework. Indeed, the current Basel proposal for backtesting ES at the 97.5% quantile is to backtest the related VaR estimates at the 97.5% and 99% quantiles, which is a grossly insufficient test. Nevertheless, several backtesting methods have recently been proposed, including, but not limited to, Acerbi and Szekely (2014); Costanzino and Curran (2015); Du and Escanciano (2017); Fissler et al. (2015); Gordy et al. (2017); Kratz et al. (2016).
The main result of this note is a Traffic Light backtest for Expected Shortfall that extends the Traffic Light backtest for VaR. The test relies on critical values derived from the finite-sample distribution of the ES test statistic (9), first introduced in Costanzino and Curran (2015).
The note is organized as follows. In Section 2 we briefly review the VaR Traffic Light test to provide context for our corresponding test for ES. In Section 3 we define the Traffic Light test for ES, compute the finite-sample distribution of the test statistic, and calculate the critical values using a numerical root-finding algorithm. Finally, in Section 4 we discuss the test and some of its implications.
2. Review of the VaR Traffic Light Test
Let $t_1, \ldots, t_N$ be a sequence of historical trading days and $L_1, \ldots, L_N$ the corresponding realized trading losses, recorded as P&L so that losses take negative values. The most basic approach to assessing the accuracy of the VaR forecasts for those trading days is the VaR Coverage Test, which essentially counts the number of VaR breaches. This leads to the Traffic Light approach to backtesting VaR originally proposed by the Basel Committee on Banking Supervision (Basle Committee on Banking Supervision 1996), which we describe below.
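For each $i = 1, \ldots, N$, let $\mathrm{VaR}_{\alpha,i}$ denote the forecast VaR at level $\alpha \in (0,1)$, defined by
$$\mathrm{VaR}_{\alpha,i} = -F_{L_i}^{-1}(\alpha), \tag{1}$$
where $F_{L_i}$ is the cumulative distribution function of the random loss variable $L_i$. For each trading day $i$ we define the VaR breach indicator $B^{\mathrm{VaR}}_{\alpha,i}$ as
$$B^{\mathrm{VaR}}_{\alpha,i} = \mathbf{1}_{\{L_i \le -\mathrm{VaR}_{\alpha,i}\}}. \tag{2}$$
That is, $B^{\mathrm{VaR}}_{\alpha,i}$ keeps track of whether a breach occurred on trading day $i$. Then the total number of breaches over all $N$ trading days, denoted by $X^{\mathrm{VaR}}_{\alpha,N}$, is
$$X^{\mathrm{VaR}}_{\alpha,N} = \sum_{i=1}^{N} B^{\mathrm{VaR}}_{\alpha,i}. \tag{3}$$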
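As a minimal sketch of (2) and (3), the Python below counts VaR breaches, assuming the realized P&L values $L_i$ (losses negative) and the forecast values $\mathrm{VaR}_{\alpha,i}$ (positive numbers) are supplied as equal-length sequences. The function names are illustrative and not part of any library.

```python
from typing import List, Sequence


def var_breach_indicators(pnl: Sequence[float], var_forecasts: Sequence[float]) -> List[int]:
    """B^VaR_{alpha,i} = 1 if L_i <= -VaR_{alpha,i}, else 0 (equation (2)).

    pnl           -- realized P&L L_i; losses are negative numbers
    var_forecasts -- forecast VaR_{alpha,i}; reported as positive numbers
    """
    if len(pnl) != len(var_forecasts):
        raise ValueError("pnl and var_forecasts must have the same length")
    return [1 if pnl_i <= -var_i else 0 for pnl_i, var_i in zip(pnl, var_forecasts)]


def var_breach_count(pnl: Sequence[float], var_forecasts: Sequence[float]) -> int:
    """X^VaR_{alpha,N}: total number of VaR breaches (equation (3))."""
    return sum(var_breach_indicators(pnl, var_forecasts))


if __name__ == "__main__":
    pnl = [1.2, -0.5, -3.1, 0.4, -2.0]
    var_99 = [2.0] * 5
    print(var_breach_indicators(pnl, var_99))  # [0, 0, 1, 0, 1]
    print(var_breach_count(pnl, var_99))       # 2
```

Note that a loss exactly equal to the VaR level (the last day above) counts as a breach, matching the non-strict inequality in (2).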
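Under the null hypothesis that the VaR model is correct, $X^{\mathrm{VaR}}_{\alpha,N} \sim \mathrm{Bin}(N, \alpha)$. Thus, for the Basel parameters $\alpha = 0.01$ and $N = 250$, we expect $N\alpha = 2.5$ breaches. Of course, in any backtest it is very rare to observe exactly $N\alpha$ breaches (here it is in fact impossible, since $X^{\mathrm{VaR}}_{\alpha,N}$ must be an integer), and thus we appeal to statistical analysis to understand the probability of obtaining significantly fewer or more breaches than would be expected under a correct model. For fixed $N$ and level $\alpha$ we define the cumulative probability $P^{\mathrm{VaR}}_{\alpha,N}(x)$ of obtaining $x$ or fewer breaches as
$$P^{\mathrm{VaR}}_{\alpha,N}(x) = \Pr\left(X^{\mathrm{VaR}}_{\alpha,N} \le x\right) = \sum_{k=0}^{x} \binom{N}{k} \alpha^{k} (1-\alpha)^{N-k}. \tag{4}$$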
The Basel Committee on Banking Supervision proposed a Traffic Light approach to the statistical significance of VaR breaches in its 1996 document (Basle Committee on Banking Supervision 1996). Therein the Committee defines three color zones through the cumulative probabilities of the number of realized VaR breaches. The Green Zone consists of the numbers of breaches for which the cumulative probability under the null hypothesis of obtaining that many breaches or fewer is less than 95%. The Yellow Zone consists of the numbers of breaches for which this cumulative probability is at least 95% but less than 99.99%. Finally, the Red Zone is defined by a cumulative probability of 99.99% or more. Thus, the boundary between the Green and Yellow zones is the largest integer $x$ such that $P^{\mathrm{VaR}}_{\alpha,N}(x) < 0.95$, and the boundary between the Yellow and Red zones is similarly the largest integer $x$ such that $P^{\mathrm{VaR}}_{\alpha,N}(x) < 0.9999$.
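The zone boundaries follow directly from (4). The sketch below uses only the Python standard library; the function names are hypothetical. It evaluates the exact Binomial cumulative probabilities and scans for the largest integer below each cut-off.

```python
from math import comb
from typing import Tuple


def var_cumulative_probability(x: int, n: int, alpha: float) -> float:
    """P^VaR_{alpha,N}(x) = Pr(X <= x) with X ~ Binomial(n, alpha), equation (4)."""
    return sum(comb(n, k) * alpha**k * (1.0 - alpha) ** (n - k) for k in range(x + 1))


def largest_count_below(threshold: float, n: int, alpha: float) -> int:
    """Largest integer x with P^VaR_{alpha,N}(x) < threshold (-1 if there is none)."""
    x = -1
    # The CDF is nondecreasing, so scan upwards until it reaches the threshold.
    while x + 1 <= n and var_cumulative_probability(x + 1, n, alpha) < threshold:
        x += 1
    return x


def var_traffic_light_boundaries(n: int = 250, alpha: float = 0.01) -> Tuple[int, int]:
    """(last Green count, last Yellow count) for the 95% and 99.99% cut-offs."""
    return largest_count_below(0.95, n, alpha), largest_count_below(0.9999, n, alpha)


if __name__ == "__main__":
    print(var_traffic_light_boundaries())  # (4, 9)
    for x in range(11):
        print(x, f"{var_cumulative_probability(x, 250, 0.01):.4f}")
```

With $\alpha = 0.01$ and $N = 250$ it returns (4, 9). That is, 0 to 4 breaches fall in the Green Zone, 5 to 9 in the Yellow Zone, and 10 or more in the Red Zone.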
Table 1 gives the resulting color zones for different numbers of breaches under the VaR Basel parameters $\alpha = 0.01$ and $N = 250$ observations. The cumulative probabilities are computed from the exact Binomial null distribution rather than from the asymptotic Normal distribution.
3. Derivation of the Expected Shortfall Traffic Light Test
We now define a Traffic Light approach to backtesting Expected Shortfall based on the Coverage Test in Costanzino and Curran (2015). The test relies on an appropriate extension of the VaR breach indicator (2) to the case of ES. The resulting generalized breach indicator (6) takes into account the severity of the breach (i.e., the size of the loss beyond the VaR level) and is a continuous rather than a discrete variable.
We begin the derivation by defining Expected Shortfall as
$$\mathrm{ES}_{\alpha,i} = \frac{1}{\alpha}\int_0^{\alpha} \mathrm{VaR}_{p,i}\,dp. \tag{5}$$
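In analogy to $B^{\mathrm{VaR}}_{\alpha,i}$ (2), we define the ES generalized breach indicator $B^{\mathrm{ES}}_{\alpha,i}$ by
$$B^{\mathrm{ES}}_{\alpha,i} = \frac{1}{\alpha}\int_0^{\alpha} B^{\mathrm{VaR}}_{p,i}\,dp = \frac{1}{\alpha}\int_0^{\alpha} \mathbf{1}_{\{U_i \le p\}}\,dp = \frac{\alpha - U_i}{\alpha}\,\mathbf{1}_{\{U_i \le \alpha\}}, \tag{6}$$
where we have used (2) and have set $U_i = F_{L_i}(L_i)$, with $F_{L_i}$ the cumulative distribution implicitly defined in (1). Compared to $B^{\mathrm{VaR}}_{\alpha,i}$ (2), $B^{\mathrm{ES}}_{\alpha,i}$ (6) has an extra factor $(\alpha - U_i)/\alpha$, which determines the severity of the breach. For example, suppose $L_i = -\mathrm{VaR}_{\alpha,i}$, so that the loss just reaches the VaR level. Then $U_i = \alpha$, so $B^{\mathrm{ES}}_{\alpha,i} = 0$, whereas $B^{\mathrm{VaR}}_{\alpha,i} = 1$. On the other hand, suppose $L_i$ is very negative. Then $U_i \approx 0$, so that $B^{\mathrm{ES}}_{\alpha,i} \approx 1$, and similarly $B^{\mathrm{VaR}}_{\alpha,i} = 1$. Thus, $B^{\mathrm{ES}}_{\alpha,i}$ keeps track of whether a breach happened on trading day $i$ as well as its severity. The total severity of breaches over all $N$ trading days, denoted by $X^{\mathrm{ES}}_{\alpha,N}$, is
$$X^{\mathrm{ES}}_{\alpha,N} = \sum_{i=1}^{N} B^{\mathrm{ES}}_{\alpha,i}. \tag{9}$$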
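A sketch of (6) and (9), assuming the values $U_i = F_{L_i}(L_i)$ (the forecast CDF evaluated at the realized P&L) are already available; computing them depends on the forecasting model and is outside this sketch. It also checks the closed form of (6) against a direct midpoint-rule evaluation of the integral. All names are illustrative.

```python
from typing import Sequence


def es_breach_indicator(u: float, alpha: float) -> float:
    """B^ES_{alpha,i} = (alpha - U_i) / alpha * 1{U_i <= alpha}, equation (6).

    u is U_i = F_{L_i}(L_i); a small u corresponds to a large loss.
    """
    if not 0.0 <= u <= 1.0:
        raise ValueError("u must lie in [0, 1]")
    return (alpha - u) / alpha if u <= alpha else 0.0


def es_breach_indicator_by_quadrature(u: float, alpha: float, steps: int = 10_000) -> float:
    """Midpoint-rule value of (1/alpha) * integral_0^alpha 1{u <= p} dp (first form of (6))."""
    h = alpha / steps
    covered = sum(1 for m in range(steps) if u <= (m + 0.5) * h)
    return covered * h / alpha


def es_total_severity(us: Sequence[float], alpha: float) -> float:
    """X^ES_{alpha,N}: sum of the generalized breach indicators, equation (9)."""
    return sum(es_breach_indicator(u, alpha) for u in us)


if __name__ == "__main__":
    alpha = 0.025
    print(es_breach_indicator(0.025, alpha))   # 0.0: loss exactly at the VaR level
    print(es_breach_indicator(0.0125, alpha))  # 0.5
    print(es_total_severity([0.5, 0.0125, 0.0, 0.9], alpha))  # 1.5
```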
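For fixed $N$ and level $\alpha$ we define the cumulative probability $P^{\mathrm{ES}}_{\alpha,N}(x)$ of obtaining a total breach severity of $x$ or less as
$$P^{\mathrm{ES}}_{\alpha,N}(x) = \Pr\left(X^{\mathrm{ES}}_{\alpha,N} \le x\right). \tag{10}$$
Therefore, for any quantile level $q$, we can compute the corresponding Generalized Breach Value $x$ by inverting the equation
$$P^{\mathrm{ES}}_{\alpha,N}(x) = q. \tag{11}$$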
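Note that in the case of the VaR Traffic Light Test (see Table 1), it makes sense to compute the cumulative probabilities for integer breach values (i.e., $x = 0, 1, 2, \ldots$). For Expected Shortfall, the breach indicator is a continuous variable, so it no longer makes sense to choose the breach value and compute the associated cumulative probability. Rather, we choose the quantile level and then invert to obtain the corresponding breach value. In particular, we borrow the color zone boundaries from the VaR Traffic Light Test, which yield a Green Zone if $P^{\mathrm{ES}}_{\alpha,N}(x) < 0.95$, a Yellow Zone if $0.95 \le P^{\mathrm{ES}}_{\alpha,N}(x) < 0.9999$, and a Red Zone if $P^{\mathrm{ES}}_{\alpha,N}(x) \ge 0.9999$; i.e., the boundary between the Green and Yellow zones is the value $x_G$ given by
$$P^{\mathrm{ES}}_{\alpha,N}(x_G) = 0.95, \tag{12}$$
and the boundary between the Yellow and Red zones is the value $x_R$ given by
$$P^{\mathrm{ES}}_{\alpha,N}(x_R) = 0.9999. \tag{13}$$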
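To compute these boundaries, and other values of $x$, one needs the distribution of the test statistic $X^{\mathrm{ES}}_{\alpha,N}$ under the null hypothesis $H_0$ that the forecast distributions are correct, i.e., that $U_1, \ldots, U_N$ are i.i.d. $\mathcal{U}(0,1)$. Under $H_0$, each $B^{\mathrm{ES}}_{\alpha,i}$ has mean and variance
$$\mathbb{E}\left[B^{\mathrm{ES}}_{\alpha,i}\right] = \int_0^{\alpha}\frac{\alpha - u}{\alpha}\,du = \frac{\alpha}{2}, \qquad \mathrm{Var}\left[B^{\mathrm{ES}}_{\alpha,i}\right] = \frac{\alpha}{3} - \frac{\alpha^2}{4} = \alpha\left(\frac{1}{3} - \frac{\alpha}{4}\right),$$
so that, by the Central Limit Theorem, for large $N$
$$X^{\mathrm{ES}}_{\alpha,N} \overset{\text{approx.}}{\sim} \mathcal{N}\left(\frac{N\alpha}{2},\; N\alpha\left(\frac{1}{3} - \frac{\alpha}{4}\right)\right). \tag{18}$$
Hence, as a crude approximation, we can compute (12) and (13) using the asymptotic test distribution (18) to obtain approximate Green/Yellow and Yellow/Red boundaries. These values are only approximate because they use the asymptotic distribution of the test statistic rather than the finite-sample distribution $P^{\mathrm{ES}}_{\alpha,N}$. We now derive the finite-sample distribution $P^{\mathrm{ES}}_{\alpha,N}$ and use a numerical root-finding procedure to compute the critical values accurately.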
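The derivation of the ES Traffic Light test relies on the computation of the finite-sample cumulative distribution of the test statistic $X^{\mathrm{ES}}_{\alpha,N}$ (9). A key observation in the derivation is that under the null hypothesis, conditional on a breach having occurred ($U_i \le \alpha$), $U_i$ is uniformly distributed in the $\alpha$-tail $[0, \alpha]$, so that $B^{\mathrm{ES}}_{\alpha,i}$ is uniformly distributed on $[0,1]$. Conditional on the number of VaR breaches $K = X^{\mathrm{VaR}}_{\alpha,N} \sim \mathrm{Bin}(N, \alpha)$, the statistic $X^{\mathrm{ES}}_{\alpha,N}$ is therefore a sum of $K$ i.i.d. $\mathcal{U}(0,1)$ variables, and the law of total probability gives
$$P^{\mathrm{ES}}_{\alpha,N}(x) = \sum_{k=0}^{N} \mathrm{IH}_k(x)\, b_{N,\alpha}(k), \tag{19}$$
where $\mathrm{IH}_k$ is the Irwin–Hall distribution (cf. Hall 1927; Irwin 1927; Marengo et al. 2017), i.e., the cumulative distribution of a sum of $k$ i.i.d. $\mathcal{U}(0,1)$ variables, defined for $0 \le x \le k$ by
$$\mathrm{IH}_k(x) = \frac{1}{k!}\sum_{j=0}^{\lfloor x \rfloor} (-1)^j \binom{k}{j} (x - j)^k, \tag{20}$$
with $\mathrm{IH}_k(x) = 1$ for $x > k$ (and $\mathrm{IH}_0(x) = 1$ for $x \ge 0$), and $b_{N,\alpha}$ is the binomial probability mass function
$$b_{N,\alpha}(k) = \binom{N}{k} \alpha^k (1-\alpha)^{N-k}. \tag{21}$$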
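A sketch of (19)–(21) in plain Python, assuming double precision is adequate for the small breach values of interest. The alternating Irwin–Hall sum (20) cancels badly when $x$ is near $k/2$ for large $k$, but such terms carry negligible binomial weight here. A seeded Monte Carlo simulation of $X^{\mathrm{ES}}_{\alpha,N}$ under $H_0$ is included as an independent check of the mixture formula. Function names are illustrative.

```python
import math
import random


def irwin_hall_cdf(x: float, k: int) -> float:
    """IH_k(x): CDF of a sum of k i.i.d. Uniform(0,1) variables, equation (20)."""
    if k == 0:
        return 1.0 if x >= 0.0 else 0.0  # an empty sum is identically zero
    if x <= 0.0:
        return 0.0
    if x >= k:
        return 1.0
    log_k_factorial = math.lgamma(k + 1)
    total = 0.0
    for j in range(math.floor(x) + 1):
        gap = x - j
        if gap <= 0.0:  # (x - j)^k vanishes when x is an integer and j == x
            break
        log_term = math.log(math.comb(k, j)) + k * math.log(gap) - log_k_factorial
        total += (-1) ** j * math.exp(log_term)
    return min(max(total, 0.0), 1.0)  # clip round-off from the alternating sum


def binomial_pmf(k: int, n: int, alpha: float) -> float:
    """b_{N,alpha}(k), equation (21), evaluated in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(alpha) + (n - k) * math.log1p(-alpha))
    return math.exp(log_pmf)


def es_cdf(x: float, n: int, alpha: float) -> float:
    """P^ES_{alpha,N}(x): binomial mixture of Irwin-Hall CDFs, equation (19)."""
    if x < 0.0:
        return 0.0
    return sum(binomial_pmf(k, n, alpha) * irwin_hall_cdf(x, k) for k in range(n + 1))


def simulate_es_cdf(x: float, n: int, alpha: float, trials: int = 4000, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr(X^ES_{alpha,N} <= x) with U_i i.i.d. Uniform(0,1) (H0)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        severity = 0.0
        for _ in range(n):
            u = rng.random()
            if u <= alpha:
                severity += (alpha - u) / alpha  # generalized breach indicator (6)
        if severity <= x:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n, alpha = 250, 0.025
    print(f"P^ES(3.0): formula = {es_cdf(3.0, n, alpha):.4f}, "
          f"simulated = {simulate_es_cdf(3.0, n, alpha):.4f}")
```

Note that $P^{\mathrm{ES}}_{\alpha,N}$ has an atom at zero of size $b_{N,\alpha}(0) = (1-\alpha)^N$ (no breaches at all) and is continuous for $x > 0$.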
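We then use this probability calculation (19) and a root-finding algorithm to solve the equation
$$P^{\mathrm{ES}}_{\alpha,N}(x) = q \tag{22}$$
for $x$, where $q$ is the appropriate quantile level. In particular, assuming the Basel parameters $\alpha = 0.025$ and $N = 250$, we solve (22) for $q = 0.95$ and $q = 0.9999$ to obtain the finite-sample Green/Yellow and Yellow/Red boundaries.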
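The note does not prescribe a particular root-finding algorithm. The sketch below assumes plain bisection, which is safe because $P^{\mathrm{ES}}_{\alpha,N}$ is nondecreasing in $x$. It includes compact versions of the Irwin–Hall and binomial helpers so that it runs on its own. All names are hypothetical.

```python
import math


def irwin_hall_cdf(x: float, k: int) -> float:
    """IH_k(x) from (20); clipped to [0, 1] to absorb round-off."""
    if k == 0:
        return 1.0 if x >= 0.0 else 0.0
    if x <= 0.0:
        return 0.0
    if x >= k:
        return 1.0
    total, log_k_fact = 0.0, math.lgamma(k + 1)
    for j in range(math.floor(x) + 1):
        if x - j <= 0.0:  # (x - j)^k = 0
            break
        total += (-1) ** j * math.exp(math.log(math.comb(k, j)) + k * math.log(x - j) - log_k_fact)
    return min(max(total, 0.0), 1.0)


def es_cdf(x: float, n: int, alpha: float) -> float:
    """P^ES_{alpha,N}(x) from (19), with the binomial pmf (21) evaluated in log space."""
    if x < 0.0:
        return 0.0
    total = 0.0
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(alpha) + (n - k) * math.log1p(-alpha))
        total += math.exp(log_pmf) * irwin_hall_cdf(x, k)
    return total


def es_breach_value(q: float, n: int = 250, alpha: float = 0.025, tol: float = 1e-10) -> float:
    """Solve P^ES_{alpha,N}(x) = q for x by bisection, equation (22)."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while hi < n and es_cdf(hi, n, alpha) < q:  # grow the bracket until it contains the root
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:  # invariant: es_cdf(lo) < q <= es_cdf(hi)
        mid = 0.5 * (lo + hi)
        if es_cdf(mid, n, alpha) < q:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for q in (0.5, 0.95, 0.9999):
        print(f"q = {q}: x = {es_breach_value(q):.4f}")
```

With the defaults, `es_breach_value(0.95)` and `es_breach_value(0.9999)` give the finite-sample Green/Yellow and Yellow/Red boundaries for $\alpha = 0.025$ and $N = 250$, and `es_breach_value(0.5)` gives the finite-sample median. A bracketing method such as Brent's would converge faster; bisection is used only for transparency.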
Table 2 gives the resulting quantiles and color zones for different breach values under the ES Basel parameters $\alpha = 0.025$ and $N = 250$ observations, where the cumulative probabilities were computed using (19). Of particular note is that the breach values and cumulative probabilities for Expected Shortfall at the 97.5% quantile (i.e., $\alpha = 0.025$) are very similar to the VaR values at the 99% quantile (i.e., $\alpha = 0.01$). In addition, the finite-sample Breach Value at the 50% quantile (3.0276) is very similar to the asymptotic Breach Value at the 50% quantile ($N\alpha/2 = 3.125$). Furthermore, in the case of Expected Shortfall the breach values are continuous, and therefore an arbitrarily small change in breach value may result in a change of color zone.
4. Discussion
First, the values and quantiles for VaR at $\alpha = 0.01$ are similar to those for ES at $\alpha = 0.025$. This happens because there are more VaR breaches at $\alpha = 0.025$ than at $\alpha = 0.01$, but the severity of each ES breach is smaller than unity, so these two effects offset each other.
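A rough check of this offsetting effect under the null hypothesis with $N = 250$: moving from $\alpha = 0.01$ to $\alpha = 0.025$ raises the expected number of VaR breaches from 2.5 to 6.25, but since $B^{\mathrm{ES}}_{\alpha,i}$ is uniform on $[0,1]$ given a breach, each ES breach contributes on average only one half, so that

$$\mathbb{E}\left[X^{\mathrm{VaR}}_{0.01,250}\right] = 250 \times 0.01 = 2.5, \qquad \mathbb{E}\left[X^{\mathrm{ES}}_{0.025,250}\right] = \underbrace{250 \times 0.025}_{\text{expected breaches} \,=\, 6.25} \times \underbrace{\tfrac{1}{2}}_{\text{mean severity}} = 3.125.$$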
We also note that, along with the color zones, the Basel document (Basle Committee on Banking Supervision 1996) defines market risk capital multipliers based on the cumulative probability $P^{\mathrm{VaR}}_{\alpha,N}(x)$ of the number of realized exceptions $x$. In particular, a multiplier $m$ ranging from 0 to 1 is assigned depending on the number of breaches; i.e., $m = f\left(P^{\mathrm{VaR}}_{\alpha,N}(x)\right)$ for some function $f$. The same can be done for Expected Shortfall; i.e., $m = g\left(P^{\mathrm{ES}}_{\alpha,N}(x)\right)$ for some function $g$. However, because the breach values from (9) are continuous, $g$ must be a continuous function, to avoid the case where small changes in breach value give rise to large changes in the multiplier.
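To illustrate the continuity point, the sketch below contrasts the Basel (1996) step-function plus factors for the VaR test (0 for 0 to 4 exceptions; 0.40, 0.50, 0.65, 0.75 and 0.85 for 5 to 9; 1 for 10 or more) with one hypothetical continuous choice of $g$: linear interpolation in $P^{\mathrm{ES}}_{\alpha,N}$ between the Green/Yellow and Yellow/Red cut-offs. The interpolation is an assumption made purely for illustration, not a proposal from this note or from the Basel Committee.

```python
from typing import Dict

# Basel (1996) plus factors for the VaR backtest (N = 250, alpha = 0.01),
# keyed by the number of exceptions in the Yellow Zone.
BASEL_YELLOW_PLUS_FACTORS: Dict[int, float] = {5: 0.40, 6: 0.50, 7: 0.65, 8: 0.75, 9: 0.85}


def var_plus_factor(exceptions: int) -> float:
    """Step function f: 0 in the Green Zone, tabulated in the Yellow Zone, 1 in the Red Zone."""
    if exceptions <= 4:
        return 0.0
    if exceptions >= 10:
        return 1.0
    return BASEL_YELLOW_PLUS_FACTORS[exceptions]


def es_plus_factor(p: float, green_cutoff: float = 0.95, red_cutoff: float = 0.9999) -> float:
    """HYPOTHETICAL continuous g: linear in P^ES between the two zone cut-offs.

    Since P^ES_{alpha,N}(x) is continuous in x for x > 0, g(P^ES(x)) then has no
    jumps, unlike the step function above.
    """
    if p <= green_cutoff:
        return 0.0
    if p >= red_cutoff:
        return 1.0
    return (p - green_cutoff) / (red_cutoff - green_cutoff)


if __name__ == "__main__":
    print(var_plus_factor(4), var_plus_factor(5))  # 0.0 0.4: a jump for one extra breach
    print(es_plus_factor(0.95), es_plus_factor(0.9500001))  # both close to 0
```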