Probabilistic Forecasts: Scoring Rules and Their Decomposition and Diagrammatic Representation via Bregman Divergences

A scoring rule is a device for evaluation of forecasts that are given in terms of the probability of an event. In this article we will restrict our attention to binary forecasts. We may think of a scoring rule as a penalty attached to a forecast after the event has been observed. Thus a relatively small penalty will accrue if a high probability forecast that an event will occur is followed by occurrence of the event. On the other hand, a relatively large penalty will accrue if this forecast is followed by non-occurrence of the event. Meteorologists have been foremost in developing scoring rules for the evaluation of probabilistic forecasts. Here we use a published meteorological data set to illustrate diagrammatically the Brier score and the divergence score, and their statistical decompositions, as examples of Bregman divergences. In writing this article, we have in mind environmental scientists and modellers for whom meteorological factors are important drivers of biological, physical and chemical processes of interest. In this context, we briefly draw attention to the potential for probabilistic forecasting of the within-season component of nitrous oxide emissions from agricultural soils.


Introduction
A probabilistic forecast provides a forecast probability $p$ that an event will subsequently occur. Probabilistic forecasts are used extensively in meteorology, so it is there that we will look for example scenarios and data. Qualitatively, a forecast of "rain tomorrow" with probability $p = 0.7$ means that, on the basis of the forecast scheme, rain is rather more likely than not. Of course, we require definitions of "rain" and "tomorrow" in order to interpret the forecast properly, but let us assume these are available. Given these definitions, we are able, subsequent to the forecast, to observe whether or not there was rainfall in sufficient quantity to be designated "rain" during the hours designated "tomorrow". If we view the event as binary, the outcome is either true (it rained) or false (it did not rain). Suppose it rained. From the point of view of forecast evaluation, it would be natural to give a better rating to a preceding forecast that rain was rather more likely than not ($p = 0.7$, as above) than to one that rain was less likely (i.e., a smaller $p$). Quantitative methods for the calculation of such ratings in the context of forecast evaluation are called scoring rules [1]. This article discusses scoring rules for probabilistic forecasts. We will restrict our attention to the evaluation of forecasts for events with binary outcomes. Note that meteorologists often refer to forecast evaluation as forecast verification (e.g., [2]).
It is convenient to think of a scoring rule as a means of attaching a penalty score to a forecast; the better the forecast, the smaller the penalty (e.g., [3]). Returning to the example of a forecast of rain tomorrow with probability $p = 0.7$, the Brier score [4] is $(1 - p)^2 = 0.09$ if rain is subsequently observed and $(0 - p)^2 = 0.49$ if not. The logarithmic score (an early discussion is given in [5]) is $-\ln(p) = 0.36$ if rain is subsequently observed, and $-\ln(1 - p) = 1.20$ if not (we will use natural logarithms throughout). In practice, meteorologists are usually interested in the evaluation of a forecast scheme based on the average score for a data set comprising a sequence of forecasts and the corresponding observations. The Brier score and the logarithmic score apply different penalties; most notably, the logarithmic score attaches larger penalties than does the Brier score to forecasts for which $p$ is close to 0 or 1 when the outcome viewed as unlikely on the basis of the forecast turns out subsequently to be the case. However, both scoring rules are "strictly proper" [6,7].
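The two penalties quoted above for a forecast of $p = 0.7$ can be checked with a few lines of Python. This is only a minimal sketch; the function names `brier_score` and `log_score` are ours and do not come from the cited literature.

```python
import math

def brier_score(p: float, o: int) -> float:
    """Brier penalty (o - p)^2 for forecast probability p and outcome o in {0, 1}."""
    return (o - p) ** 2

def log_score(p: float, o: int) -> float:
    """Logarithmic penalty: -ln(p) if the event occurred, -ln(1 - p) if it did not."""
    return -math.log(p if o == 1 else 1.0 - p)

p = 0.7  # forecast probability of rain
scores = {o: (brier_score(p, o), log_score(p, o)) for o in (1, 0)}
for o, (bs, ls) in scores.items():
    print(f"o={o}: Brier={bs:.2f}, log={ls:.2f}")
```

Both penalties are smaller when rain (o = 1) follows the forecast, and the logarithmic penalty for the unexpected outcome is the larger of the two.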
In the case of binary events, strictly proper scoring rules allow a statistical decomposition of the overall score into terms that further characterize a forecast [8]. Murphy [9] provided a statistical decomposition of the Brier score into three components, which he termed uncertainty, reliability and resolution (see also [10]). Weijs et al. [11,12] provided a further analysis of the logarithmic score, resulting in the divergence score and its statistical decomposition into the equivalent three components. The cited articles discuss uncertainty, reliability and resolution in detail.
Gneiting and Katzfuss [13] provide an analytical overview of probabilistic forecasting. One way of looking at the present article is as a complement to recent analytical innovations in forecast evaluation [11,12]. Using Bregman divergences, we provide a new calculation template for analysis of the Brier score and the divergence score, and new explanatory diagrams. Our objective in so doing is to provide an analysis with a straightforward diagrammatic interpretation as a basis for the evaluation of probabilistic forecasts in environmental applications where meteorological factors are important drivers of biological, physical and chemical processes of interest.
The present article is set out as follows. We introduce an example meteorological data set that is available in the public domain, and review the original analysis based on the Brier score. Following a brief discussion of the use of zero and one as probability forecasts, there is further analysis of both the Brier score and the divergence score for this data set. We then introduce our approach to the Brier score and the divergence score based on Bregman divergences, and provide examples of the calculations of the scores and their statistical decompositions. In a final discussion, we briefly mention the potential application of probabilistic forecasting to modelling of N2O emissions from agricultural soils at the within-season time-scale.

Data, Terminology, Notation
In the interests of producing an analysis that allows a straightforward diagrammatic representation, we will restrict our attention here to binary outcomes. We discuss the evaluation of probability forecasts using a data set that is in the public domain. The full data set comprises 24-h and 48-h forecasts for probability of daily precipitation in the city of Tampere in south-central Finland, as made by the Finnish Meteorological Institute during 2003, together with the corresponding daily rainfall records [14]. Our analysis here is based on the 24-h rainfall forecasts. The forecasts given in [14] were made for three rainfall categories, but here, as in the original analysis, the two higher-rainfall categories were combined in order to produce a binary forecast: probability of no-rain (≤0.2 mm rainfall) and probability of rain (otherwise). The observations were recorded as mm precipitation but for the purpose of forecast evaluation (again as in the original analysis) the observed rainfall data were combined into the same two categories as the forecasts: observation of no-rain (≤0.2 mm rainfall) and observation of rain (otherwise). After excluding days for which data were missing, the full record comprised $N = 346$ probability forecasts (denoted $p_t$) and the corresponding observations ($o_t$), $t = 1, \ldots, N$, with $o_t = 0$ for observation of no-rain and $o_t = 1$ for observation of rain.
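The Brier score for an individual forecast is $(p_t - o_t)^2$, and the Brier score for the full data set is the average $\mathrm{BS} = \frac{1}{N}\sum_{t=1}^{N}(p_t - o_t)^2$. This is the definition given in the original data analysis, retained for consistency. For the original data analysis the probability forecasts utilized eleven "allowed probability" forecast categories: for $k = 1, \ldots, 11$; $p_k$ = 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9 and 1 ($p_k$ denotes the forecast probability of rain in category $k$, thus the forecast probability of no-rain is its complement $1 - p_k$). The number of observations in each category is denoted $n_k$ and the number of observations of rain in each category is denoted $o_k$. The average frequency of rain observations in category $k$ is $\bar{o}_k = o_k / n_k$, and the overall average frequency of rain observations is $\bar{o} = O/N$, where $O = \sum_k o_k$. The components of the decomposition of the Brier score are as follows: reliability, $\mathrm{REL}_{BS} = \frac{1}{N}\sum_k n_k (p_k - \bar{o}_k)^2$; resolution, $\mathrm{RES}_{BS} = \frac{1}{N}\sum_k n_k (\bar{o}_k - \bar{o})^2$; uncertainty, $\mathrm{UNC}_{BS} = \bar{o}(1 - \bar{o})$ (which is the Bernoulli variance); and then $\mathrm{BS} = \mathrm{REL}_{BS} - \mathrm{RES}_{BS} + \mathrm{UNC}_{BS}$. For the original data set, we calculate the Brier score: BS = 0.1445 (all calculations are shown correct to 4 d.p.). The components of the decomposition of the Brier score are: reliability, $\mathrm{REL}_{BS}$ = 0.0254; resolution, $\mathrm{RES}_{BS}$ = 0.0602; uncertainty, $\mathrm{UNC}_{BS}$ = 0.1793. As required, $\mathrm{REL}_{BS} - \mathrm{RES}_{BS} + \mathrm{UNC}_{BS} = \mathrm{BS}$, and the summary of results provided along with the original data set [14] is thus reproduced.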
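The reliability-resolution-uncertainty decomposition of the Brier score can be sketched in Python as follows. The category counts below are hypothetical (they are not the Tampere data, whose summary table is not reproduced here), and the function name `brier_decomposition` is ours; the sketch only illustrates that the three components recombine to the directly calculated score.

```python
def brier_decomposition(p, n, o):
    """Return (BS, REL, RES, UNC) for categorical probability forecasts.

    p[k]: allowed forecast probability for category k
    n[k]: number of forecasts issued in category k (must be > 0)
    o[k]: number of those forecasts that were followed by the event
    """
    N = sum(n)
    obar = sum(o) / N                            # overall event frequency
    obar_k = [ok / nk for ok, nk in zip(o, n)]   # event frequency per category
    rel = sum(nk * (pk - fk) ** 2 for pk, nk, fk in zip(p, n, obar_k)) / N
    res = sum(nk * (fk - obar) ** 2 for nk, fk in zip(n, obar_k)) / N
    unc = obar * (1.0 - obar)                    # Bernoulli variance
    # Direct score: events contribute (1 - p_k)^2, non-events (0 - p_k)^2.
    bs = sum(ok * (1.0 - pk) ** 2 + (nk - ok) * pk ** 2
             for pk, nk, ok in zip(p, n, o)) / N
    return bs, rel, res, unc

# Hypothetical counts, for illustration only.
p = [0.1, 0.5, 0.9]
n = [50, 30, 20]
o = [4, 16, 17]
bs, rel, res, unc = brier_decomposition(p, n, o)
print(f"BS={bs:.4f} REL={rel:.4f} RES={res:.4f} UNC={unc:.4f}")
print("BS = REL - RES + UNC:", abs(bs - (rel - res + unc)) < 1e-12)
```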

Probability Forecasts of Zero and One
In the original data set, the probability forecasts include $p_k = 0$ (for category $k = 1$) and $p_k = 1$ (for category $k = 11$); in words, respectively, "it is certain there will be no rain tomorrow" and "it is certain there will be rain tomorrow". Such forecasts can present problems from the point of view of evaluation. Whereas probability forecasts $0 < p_k < 1$ explicitly leave open the chance that an erroneous forecast may be made, probability forecasts $p_k = 0$ and $p_k = 1$ do not. The question that then arises is how to evaluate a forecast that was made with certainty but then proves to have been erroneous. This is not a hypothetical issue, as can be seen in the original data set. For category $k = 1$ ($p_k = 0$), we note that 1 out of the 46 forecasts made with certainty was erroneous, while for category $k = 11$ ($p_k = 1$), we note that 2 out of 13 forecasts made with certainty were erroneous [14]. If such an outcome were to occur when the logarithmic (or divergence) score was in use, an indefinitely large penalty score would apply. In routine practice our preference is to avoid the use of probability forecasts $p_k = 0$ and $p_k = 1$ (as a rule of thumb: only use a probability forecast of zero or one when there is absolute certainty of the outcome). There is a price to be paid for taking this point of view, which we discuss later. Notwithstanding, for further analysis in the present article, we will replace the probability forecast for category $k = 1$ by $p_k = 0.05$ (instead of zero) and the probability forecast for category $k = 11$ by $p_k = 0.95$ (instead of one); the observations remain unchanged. A summary of the data set incorporating this adjustment (to be used exclusively from this point on) is given in Table 1.
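The effect of the adjustment on the logarithmic score can be illustrated with a short Python sketch. The helper names `log_score` and `adjust_extremes` are ours. Clipping to the interval [0.05, 0.95] is used here as a convenient stand-in for the replacement described above; it gives the same result for the eleven allowed probabilities because only 0 and 1 lie outside that interval.

```python
import math

def log_score(p: float, o: int) -> float:
    """Logarithmic penalty; unbounded when a 'certain' forecast proves wrong."""
    q = p if o == 1 else 1.0 - p   # probability that was assigned to what happened
    return math.inf if q == 0.0 else -math.log(q)

def adjust_extremes(p: float, lower: float = 0.05, upper: float = 0.95) -> float:
    """Replace forecasts of 0 and 1 by 0.05 and 0.95; other allowed values pass through."""
    return min(max(p, lower), upper)

# A forecast of "certainly no rain" (p = 0) followed by rain (o = 1).
unadjusted = log_score(0.0, 1)
adjusted = log_score(adjust_extremes(0.0), 1)
print(unadjusted, round(adjusted, 4))
```

After adjustment the penalty for the erroneous forecast is large but finite, so an average score over the data set remains defined.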

The Divergence Score and its Decomposition
Weijs et al. [11,12] provide informative background on the provenance of the divergence score, and a detailed analysis of its derivation. We refer interested readers to this work, and present here only enough details to illustrate a template calculation of the score and its reliability-resolution-uncertainty decomposition. The divergence score is based on the Kullback-Leibler divergence, a kind of measure of distance between two probability distributions [15,16]. For binary forecasts and the corresponding observations, all the distributions required for calculating the divergence score and its decomposition are Bernoulli, so we can write:

$$D_{KL}(x_c \| x_r) = x_c \ln\left(\frac{x_c}{x_r}\right) + (1 - x_c)\ln\left(\frac{1 - x_c}{1 - x_r}\right) \qquad (1)$$

where variable $x$ is a place-holder and, in our analysis, represents particular comparison and reference values (here, $x_c$ and $x_r$, respectively) that will be replaced by a probability or a frequency, ranging between zero and one. The distribution $(x_c, 1 - x_c)$ is referred to as the comparison distribution, and the distribution $(x_r, 1 - x_r)$ is referred to as the reference distribution. Note that $D_{KL}(x_c \| x_r) \ge 0$ and that the divergence is not necessarily symmetric with respect to the arguments. For the purpose of numerical calculation, recall that $\lim_{x \to 0} x\ln(x) = 0$, so we take $0\cdot\ln(0) = 0$. The divergence score for an individual forecast is the Kullback-Leibler divergence between the observation (the comparison value) and the forecast (the reference value), and the overall divergence score for the data set is the average $\mathrm{DS} = \frac{1}{N}\sum_{t=1}^{N} D_{KL}(o_t \| p_t)$ = 0.4471.
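The components of the decomposition of the divergence score are calculated as follows: reliability, $\mathrm{REL}_{DS} = \frac{1}{N}\sum_k n_k D_{KL}(\bar{o}_k \| p_k)$ = 0.0712; resolution, $\mathrm{RES}_{DS} = \frac{1}{N}\sum_k n_k D_{KL}(\bar{o}_k \| \bar{o})$ = 0.1683; uncertainty (which in this case is characterized by the binary Shannon entropy [17]), $\mathrm{UNC}_{DS} = -[\bar{o}\ln(\bar{o}) + (1 - \bar{o})\ln(1 - \bar{o})]$ = 0.5442. Then we have (for full details see Appendix, Table 2):

$$\mathrm{DS} = \mathrm{REL}_{DS} - \mathrm{RES}_{DS} + \mathrm{UNC}_{DS} = 0.0712 - 0.1683 + 0.5442 = 0.4471 \qquad (2)$$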
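The template calculation for the divergence score can be sketched in Python in the same way as for the Brier score. The counts below are hypothetical and the function names are ours; the sketch assumes that every allowed probability lies strictly between zero and one (as in the adjusted data set) and that the overall event frequency is neither zero nor one.

```python
import math

def kl_bernoulli(xc: float, xr: float) -> float:
    """Kullback-Leibler divergence between Bernoulli(xc), the comparison, and
    Bernoulli(xr), the reference, with 0*ln(0) = 0. Requires 0 < xr < 1."""
    total = 0.0
    for a, b in ((xc, xr), (1.0 - xc, 1.0 - xr)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total

def divergence_decomposition(p, n, o):
    """Return (DS, REL, RES, UNC) from allowed probabilities p[k],
    forecast counts n[k] and event counts o[k]."""
    N = sum(n)
    obar = sum(o) / N
    obar_k = [ok / nk for ok, nk in zip(o, n)]
    rel = sum(nk * kl_bernoulli(fk, pk) for pk, nk, fk in zip(p, n, obar_k)) / N
    res = sum(nk * kl_bernoulli(fk, obar) for nk, fk in zip(n, obar_k)) / N
    unc = -(obar * math.log(obar) + (1.0 - obar) * math.log(1.0 - obar))
    # Direct score: each observation (0 or 1) is compared with its forecast.
    ds = sum(ok * kl_bernoulli(1.0, pk) + (nk - ok) * kl_bernoulli(0.0, pk)
             for pk, nk, ok in zip(p, n, o)) / N
    return ds, rel, res, unc

# Hypothetical counts, for illustration only.
p = [0.1, 0.5, 0.9]
n = [50, 30, 20]
o = [4, 16, 17]
ds, rel, res, unc = divergence_decomposition(p, n, o)
print(f"DS={ds:.4f} REL={rel:.4f} RES={res:.4f} UNC={unc:.4f}")
print("DS = REL - RES + UNC:", abs(ds - (rel - res + unc)) < 1e-12)
```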

Forecast Evaluation via Bregman Divergences
Here we discuss forecast evaluation for the example data set via the Brier score and the divergence score, but using a different route through the calculations. Using Bregman divergences [18,19], our calculations lead to numerical results identical to those outlined above, in terms of the scores and their decompositions. What we gain by the analysis presented here is a set of diagrams which usefully complement those used by Weijs et al. [11,12] to illustrate the statistical decomposition both of the Brier score and the divergence score. This is possible because of the availability of a simple diagrammatic format for the illustration of Bregman divergences (e.g., [19,20]). So, by expressing reliability, resolution and score as Bregman divergences, we are able to illustrate these quantities directly as distances on graphical plots. In addition, this approach enables us to write down the Brier score and the divergence score and their corresponding decompositions in a common format, thus clearly demonstrating their analytical equivalence.
Bregman divergences are properties of convex functions. In particular, the squared Euclidean distance (on which the Brier score is based) is the Bregman divergence associated with $f(x) = x^2$ and the Kullback-Leibler divergence (on which the divergence score is based) is the Bregman divergence associated with $f(x) = x\ln(x) + (1 - x)\ln(1 - x)$ (the negative of the binary Shannon entropy function).
Generically, a tangent to the curve $f(x)$ is drawn at $x_r$ (the reference value). The Bregman divergence between the tangent and the curve at $x_c$ (the comparison value) is then, for scalar arguments:

$$D_B(x_c \| x_r) = f(x_c) - f(x_r) - (x_c - x_r)\,f'(x_r) \qquad (3)$$

in which $f'(x_r)$ is the slope of the tangent at $x_r$. Recall that $0 \le x_c \le 1$, $0 \le x_r \le 1$; and note that $D_B(x_c \| x_r) \ge 0$ and that the divergence is not necessarily symmetric with respect to the arguments.
Where necessary for calculation purposes, we take $0\cdot\ln(0) = 0$ as previously.
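The Bregman divergence for scalar arguments translates directly into code. The Python sketch below (function names ours) evaluates it for both convex functions at reference value 0.4 and comparison values 0 and 1, which are the values quoted in the caption of Figure 1.

```python
import math

def bregman(f, fprime, xc: float, xr: float) -> float:
    """Gap at xc between the convex function f and its tangent drawn at xr."""
    return f(xc) - f(xr) - (xc - xr) * fprime(xr)

def f_brier(x: float) -> float:
    return x * x

def fprime_brier(x: float) -> float:
    return 2.0 * x

def f_neg_entropy(x: float) -> float:
    # x*ln(x) + (1 - x)*ln(1 - x), taking 0*ln(0) = 0
    return sum(v * math.log(v) for v in (x, 1.0 - x) if v > 0.0)

def fprime_neg_entropy(x: float) -> float:
    # Slope of the negative entropy; defined only for 0 < x < 1.
    return math.log(x / (1.0 - x))

p_k = 0.4  # reference value, where the tangent is drawn
results = {}
for o in (0.0, 1.0):  # comparison values
    results[("brier", o)] = bregman(f_brier, fprime_brier, o, p_k)
    results[("divergence", o)] = bregman(f_neg_entropy, fprime_neg_entropy, o, p_k)
for key, value in results.items():
    print(key, round(value, 4))
```

Only the pair (f, f') changes between the two scores; the divergence calculation itself is shared.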

Scoring Rules as Bregman Divergences
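Brier Score and Divergence Score Diagrams for Individual Forecast Categories
Figure 1 shows examples of scoring rules as Bregman divergences in diagrammatic form, for $p_k = 0.4$ and an observation $o \in \{0, 1\}$ (see Appendix, Tables 3 and 4, category $k = 5$, for details of calculations based on Equation (3)). For individual forecasts, smaller divergences (scores) are better, and from Figure 1A (Brier score) we can see that for reference value $p_k = 0.4$ the score for comparison value $o = 0$ ($D_B = 0.16$, Table 3A, Appendix) is smaller than the score for comparison value $o = 1$ ($D_B = 0.36$, Table 3B, Appendix). From Figure 1B (divergence score) we can see that for reference value $p_k = 0.4$ the score for comparison value $o = 0$ ($D_B = 0.5108$, Table 4A, Appendix) is smaller than the score for comparison value $o = 1$ ($D_B = 0.9163$, Table 4B, Appendix). In each case this is as we require, because the forecast probability $p_k = 0.4$ is closer to $o = 0$ than to $o = 1$. That is, a forecast of $p_k = 0.4$ gets a better evaluation score if $o = 0$ is subsequently observed than if $o = 1$ is subsequently observed.
To calculate directly as Kullback-Leibler divergences the divergence scores for individual forecast categories as illustrated in Figure 1B, we have: $D_{KL}(0 \| 0.4) = 0\cdot\ln(0/0.4) + 1\cdot\ln(1/0.6) = 0.5108$ and $D_{KL}(1 \| 0.4) = 1\cdot\ln(1/0.4) + 0\cdot\ln(0/0.6) = 0.9163$.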

Overall Scores
For the Brier score, the Bregman divergence for each individual forecast category (as calculated via Equation (3)) is the squared Euclidean distance between $o$ (the comparison value, where the divergence is calculated) and $p_k$ (the reference value, where the tangent is drawn) (Appendix, Table 3). For the divergence score, the Bregman divergence for each individual forecast category (as calculated via Equation (3)) is the Kullback-Leibler divergence between $o$ (the comparison value, where the divergence is calculated) and $p_k$ (the reference value, where the tangent is drawn) (Appendix, Table 4). In each case, the overall score for a forecast-observation data set is calculated as a weighted average of the individual Bregman divergences. For the Brier score, we have 49.8375/346 = 0.1440; for the divergence score we have 154.6859/346 = 0.4471 (for full details see Appendix, Tables 3 and 4).

Overall Reliability
For the Brier score reliability, the Bregman divergence for each individual forecast category (as calculated via Equation (3)) is the squared Euclidean distance between $\bar{o}_k$ (the comparison value, where the divergence is calculated) and $p_k$ (the reference value, where the tangent is drawn) (see Appendix, Table 5A). For the divergence score reliability, the Bregman divergence for each individual forecast category (as calculated via Equation (3)) is the Kullback-Leibler divergence between $\bar{o}_k$ and $p_k$ (see Appendix, Table 5B). In each case, the overall reliability score for a forecast-observation data set is calculated as a weighted average of the individual Bregman divergences. For the Brier score, we have $\mathrm{REL}_{BS} = \frac{1}{N}\sum_k n_k D_B(\bar{o}_k \| p_k)$ = 8.6204/346 = 0.0249; for the divergence score, we have $\mathrm{REL}_{DS} = \frac{1}{N}\sum_k n_k D_B(\bar{o}_k \| p_k)$ = 24.6440/346 = 0.0712 (for full details see Appendix, Table 5).

Interpreting Reliability
First, recall that reliability is defined so that smaller is better: perfect reliability corresponds to an overall reliability score equal to zero. From the formulation of the Bregman divergence $D_B(\bar{o}_k \| p_k)$, we can see that this occurs when $\bar{o}_k = p_k$ for all $k$ categories (see Appendix, Table 5). In fact, since $D_B(\bar{o}_k \| p_k) \ge 0$, we must have $\bar{o}_k = p_k$ for all $k$ categories for an overall reliability score equal to zero.
What this tells us is that for perfect reliability of our probability forecast, the average frequency of rain observations in each category must be equal to the probability forecast for that category. In practice, we typically accept (small) deviations of $\bar{o}_k$ from $p_k$ that contribute a small $D_B(\bar{o}_k \| p_k)$ to the overall calculation of $\mathrm{REL}_{BS}$ or $\mathrm{REL}_{DS}$.
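As a numerical illustration of a single reliability component, the Python sketch below uses the values given in the caption of Figure 2 (allowed probability 0.6, observed frequency 0.2727) and then confirms that the component vanishes when the observed frequency equals the forecast probability. The function names are ours.

```python
import math

def sq_dist(xc: float, xr: float) -> float:
    """Squared Euclidean distance: the Bregman divergence of f(x) = x^2."""
    return (xc - xr) ** 2

def kl_bernoulli(xc: float, xr: float) -> float:
    """Kullback-Leibler divergence for Bernoulli distributions, with 0*ln(0) = 0."""
    total = 0.0
    for a, b in ((xc, xr), (1.0 - xc, 1.0 - xr)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total

p_k = 0.6        # reference value: the allowed forecast probability
obar_k = 0.2727  # comparison value: observed frequency of rain in that category

rel_bs_component = sq_dist(obar_k, p_k)
rel_ds_component = kl_bernoulli(obar_k, p_k)
print(round(rel_bs_component, 4), round(rel_ds_component, 4))

# Perfect reliability in a category: observed frequency equals the forecast.
perfect = (sq_dist(p_k, p_k), kl_bernoulli(p_k, p_k))
print(perfect)
```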

Overall Resolution
For the Brier score resolution, each individual Bregman divergence (as calculated via Equation (3)) is the squared Euclidean distance between $\bar{o}_k$ (the comparison value, where the divergence is calculated) and $\bar{o}$ (the reference value, where the tangent is drawn) (see Appendix, Table 6A). For the divergence score resolution, each individual Bregman divergence (as calculated via Equation (3)) is the Kullback-Leibler divergence between $\bar{o}_k$ and $\bar{o}$ (see Appendix, Table 6B). In each case, the overall resolution score for a forecast-observation data set is calculated as a weighted average of the individual Bregman divergences. For the Brier score, we have $\mathrm{RES}_{BS} = \frac{1}{N}\sum_k n_k D_B(\bar{o}_k \| \bar{o})$ = 20.8205/346 = 0.0602; for the divergence score, we have $\mathrm{RES}_{DS} = \frac{1}{N}\sum_k n_k D_B(\bar{o}_k \| \bar{o})$ = 58.2471/346 = 0.1683 (for full details see Appendix, Table 6).
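A single resolution component can be checked in the same way. The Python sketch below uses the category $k = 9$ values from the caption of Figure 3 (observed frequency 0.6667) and takes the overall frequency as 81/346, the 81 rain observations among the 346 forecasts. The function names are ours.

```python
import math

def sq_dist(xc: float, xr: float) -> float:
    """Squared Euclidean distance: the Bregman divergence of f(x) = x^2."""
    return (xc - xr) ** 2

def kl_bernoulli(xc: float, xr: float) -> float:
    """Kullback-Leibler divergence for Bernoulli distributions, with 0*ln(0) = 0."""
    total = 0.0
    for a, b in ((xc, xr), (1.0 - xc, 1.0 - xr)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total

obar = 81 / 346   # reference value: overall frequency of rain observations
obar_k = 0.6667   # comparison value: frequency of rain in category k = 9

res_bs_component = sq_dist(obar_k, obar)
res_ds_component = kl_bernoulli(obar_k, obar)
print(round(res_bs_component, 4), round(res_ds_component, 4))

# A category whose frequency equals the overall frequency resolves nothing.
no_resolution = (sq_dist(obar, obar), kl_bernoulli(obar, obar))
print(no_resolution)
```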

Interpreting Resolution
Recall that resolution is defined so that larger is better. If forecasts and observations were independent (which is least desirable), resolution would be equal to zero; if forecasts were perfect (which is most desirable), resolution would be equal to uncertainty. Note that the conditions under which resolution is equal to uncertainty also fulfil the conditions for perfect reliability, equal to zero (as above, in the context of interpreting reliability). Resolution depends on our ability to define forecast categories for which the observed frequencies $\bar{o}_k$ are different from the overall average frequency $\bar{o}$, such that the average for a forecast category provides a better prediction of the eventual outcome than the average over all forecast categories. For both the Brier score and the divergence score, if any $\bar{o}_k$ is equal to $\bar{o}$, then the corresponding resolution component is equal to zero. If $\bar{o}_k = \bar{o}$ for all $k$, then overall resolution is equal to zero.
Consider first the scenario in which, as in the initial analysis of the original data set, probability forecasts of $p_k = 0$ and $p_k = 1$ are allowed. Further, let us suppose that all 265 observations of no-rain followed forecasts of $p_k = 0$ (in which case $\bar{o}_1 = 0$) and all 81 observations of rain followed forecasts of $p_k = 1$ (so $\bar{o}_{11} = 1$). If we calculate resolution based on squared Euclidean distance, we have $\mathrm{RES}_{BS} = 0.1793 = \mathrm{UNC}_{BS}$. Alternatively, if we calculate resolution based on the Kullback-Leibler divergence, we have $\mathrm{RES}_{DS}$ = 188.2875/346 = 0.5442 = $\mathrm{UNC}_{DS}$. That is to say, if we were to allow probability forecast categories $p_k = 0$ and $p_k = 1$, then use them exclusively in making forecasts and do so without error, resolution would be equal to uncertainty (i.e., $\mathrm{RES}_{BS} = \mathrm{UNC}_{BS}$ and $\mathrm{RES}_{DS} = \mathrm{UNC}_{DS}$).
Now consider instead the scenario in which, as in our analysis of the adjusted data set, the most extreme allowed probabilities are $p_k = 0.05$ and $p_k = 0.95$. Now, the best resolution we can achieve is if all 265 observations of no-rain followed forecasts of $p_k = 0.05$ and all 81 observations of rain followed forecasts of $p_k = 0.95$; for calculations based on the Kullback-Leibler divergence, this gives 130.5177/346 = 0.3772. Thus, the price we pay for restricting the extreme allowed probabilities to $p_k = 0.05$ and $p_k = 0.95$ is to reduce the achievable upper limit of resolution. In the present example the notional upper limit is reduced to about 80% of uncertainty for calculations based on squared Euclidean distance, and about 70% of uncertainty for calculations based on Kullback-Leibler divergence. The difference arises because of the larger penalty score that accrues with extreme discrepancies between forecast and observation for the divergence score compared with the Brier score (as mentioned in the Introduction).
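The two scenarios can be compared numerically with the Python sketch below. It rests on an assumption of ours: in the restricted scenario, the comparison values for the two categories are taken to be 0.05 and 0.95 (that is, the forecasts are treated as perfectly reliable at the extreme allowed probabilities). This reading reproduces the value 0.3772 quoted above, but it is our interpretation rather than a calculation documented in the text. The function names are also ours.

```python
import math

def sq_dist(xc: float, xr: float) -> float:
    return (xc - xr) ** 2

def kl_bernoulli(xc: float, xr: float) -> float:
    """Kullback-Leibler divergence for Bernoulli distributions, with 0*ln(0) = 0."""
    total = 0.0
    for a, b in ((xc, xr), (1.0 - xc, 1.0 - xr)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total

N_NO_RAIN, N_RAIN = 265, 81
N = N_NO_RAIN + N_RAIN
obar = N_RAIN / N

def resolution(f_no_rain: float, f_rain: float, divergence) -> float:
    """Weighted average divergence from obar when only two categories are used."""
    return (N_NO_RAIN * divergence(f_no_rain, obar)
            + N_RAIN * divergence(f_rain, obar)) / N

unc_bs = obar * (1.0 - obar)
unc_ds = -(obar * math.log(obar) + (1.0 - obar) * math.log(1.0 - obar))

# Scenario 1: forecasts of 0 and 1 allowed and used without error.
res_bs_perfect = resolution(0.0, 1.0, sq_dist)
res_ds_perfect = resolution(0.0, 1.0, kl_bernoulli)

# Scenario 2 (our assumption): comparison values 0.05 and 0.95.
res_bs_limit = resolution(0.05, 0.95, sq_dist)
res_ds_limit = resolution(0.05, 0.95, kl_bernoulli)

print(round(res_bs_perfect, 4), round(res_ds_perfect, 4))
print(round(res_ds_limit, 4))
print(f"{res_bs_limit / unc_bs:.0%} and {res_ds_limit / unc_ds:.0%} of uncertainty")
```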
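We note in passing that overall resolution, as formulated, may be characterized as a Jensen gap [21] for a convex function. Banerjee et al. [22] refer to this as the Bregman information. Thus generically we have $E[f(X)] - f(E[X]) \ge 0$, and in particular here, $\mathrm{RES} = \frac{1}{N}\sum_k n_k f(\bar{o}_k) - f(\bar{o})$. Then, with $f(x) = x^2$ (for the Brier score) we have $\mathrm{RES}_{BS} = \frac{1}{N}\sum_k n_k \bar{o}_k^2 - \bar{o}^2$.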
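The equivalence between overall resolution and the Jensen gap can be verified numerically. The Python sketch below uses hypothetical category counts and function names of our own choosing; it compares the weighted average of Bregman divergences from the overall frequency with the difference between the average of the convex function and the convex function of the average.

```python
import math

def bregman(f, fprime, xc: float, xr: float) -> float:
    return f(xc) - f(xr) - (xc - xr) * fprime(xr)

def f_brier(x: float) -> float:
    return x * x

def fprime_brier(x: float) -> float:
    return 2.0 * x

def f_neg_entropy(x: float) -> float:
    return sum(v * math.log(v) for v in (x, 1.0 - x) if v > 0.0)

def fprime_neg_entropy(x: float) -> float:
    return math.log(x / (1.0 - x))

# Hypothetical counts, for illustration only.
n = [50, 30, 20]   # forecasts per category
o = [4, 16, 17]    # events per category
N = sum(n)
obar = sum(o) / N
obar_k = [ok / nk for ok, nk in zip(o, n)]

checks = {}
for name, f, fp in (("brier", f_brier, fprime_brier),
                    ("divergence", f_neg_entropy, fprime_neg_entropy)):
    resolution = sum(nk * bregman(f, fp, fk, obar) for nk, fk in zip(n, obar_k)) / N
    jensen_gap = sum(nk * f(fk) for nk, fk in zip(n, obar_k)) / N - f(obar)
    checks[name] = (resolution, jensen_gap)
    print(name, round(resolution, 6), round(jensen_gap, 6))
```

The tangent term drops out because the weighted average of the category frequencies is the overall frequency.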

Uncertainty
We select an uncertainty function appropriate for the analysis, depending on the chosen convex function and its associated Bregman divergence. For the Brier score, uncertainty is calculated as the value of the uncertainty function (the Bernoulli variance) at $\bar{o}$: $\mathrm{UNC}_{BS} = \bar{o}(1 - \bar{o})$ = 0.1793 (Figure 4A). For the divergence score, uncertainty is calculated as the value of the uncertainty function (the binary Shannon entropy) at $\bar{o}$: $\mathrm{UNC}_{DS} = -[\bar{o}\ln(\bar{o}) + (1 - \bar{o})\ln(1 - \bar{o})]$ = 0.5442 (Figure 4B). We interpret uncertainty as a quantification of our state of knowledge in the absence of a forecast, so based only on the data set from which the overall average frequency of rain observations $\bar{o}$ is calculated.
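Both uncertainty functions are one-liners in Python. The sketch below (function names ours) evaluates them at the overall frequency 81/346 for the example data set, and at 0.5, where both functions reach their maximum.

```python
import math

def bernoulli_variance(o: float) -> float:
    """Uncertainty function paired with the Brier score."""
    return o * (1.0 - o)

def binary_entropy(o: float) -> float:
    """Uncertainty function paired with the divergence score (natural logarithms)."""
    return -sum(v * math.log(v) for v in (o, 1.0 - o) if v > 0.0)

obar = 81 / 346  # overall frequency of rain observations
unc_bs = bernoulli_variance(obar)
unc_ds = binary_entropy(obar)
print(round(unc_bs, 4), round(unc_ds, 4))

# Both functions are largest when rain and no-rain are equally frequent.
print(bernoulli_variance(0.5), round(binary_entropy(0.5), 4))
```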

Overview
Theil [23] used a logarithmic scoring rule to describe the inaccuracy of predictions, but also found it convenient to write prediction errors directly in terms of the difference between the observed and forecast probabilities. This was achieved by use of a Taylor series expansion to write a logarithmic scoring rule in terms of a quadratic approximation. More recently, Benedetti [24] has attributed the lasting application of the Brier score in forecast evaluation to its being an approximation of the logarithmic score; however, an analysis leading to the Brier score as an approximation of the logarithmic score does not reveal a hierarchy in which the latter is in some way more fundamental than the former (cf. [25]).
For an individual probability forecast, with $p_k$ an allowed probability and corresponding observation $o$, we can calculate the scoring rule:

$$\text{score} = D_B(o \| p_k) \qquad (4)$$

(see Figure 1). Equation (4) calculates either the Brier score or the divergence score, depending on our choice of convex function on which to base the Bregman divergence. For a data set comprising a number of forecasts and corresponding observations, we calculate the overall score as $\frac{1}{N}\sum_{t=1}^{N} D_B(o_t \| p_t)$ for either the Brier score or the divergence score. On this basis, neither scoring rule is inherently superior to the other. However, it is possible to establish further criteria against which the properties of such scoring rules may be judged [24].
The statistical decomposition of the scoring rule in Equation (4) also has a common format:

$$\mathrm{REL}_k = D_B(\bar{o}_k \| p_k), \quad \mathrm{RES}_k = D_B(\bar{o}_k \| \bar{o}), \quad \mathrm{UNC} = u(\bar{o}) \qquad (5)$$

(see Figures 2 and 3, respectively, for example illustrations of components of REL and RES; and Figure 4 for an illustration of UNC, which does not vary with $k$). Again, it is only the choice of convex function (and corresponding choice of an appropriate uncertainty function) that distinguishes the calculation of the components of the Brier score from those of the divergence score. For a data set comprising a number of forecasts and the corresponding observations, we calculate the overall reliability and overall resolution scores, respectively, as $\mathrm{REL} = \frac{1}{N}\sum_k n_k D_B(\bar{o}_k \| p_k)$ and $\mathrm{RES} = \frac{1}{N}\sum_k n_k D_B(\bar{o}_k \| \bar{o})$.
We can compare the information-theoretic analysis of a boundary-line model by Topp et al. [26] with the present analysis. When, as in [26], forecast probabilities are based on retrospectively-calculated relative frequencies, reliability is equal to zero (i.e., perfect reliability), uncertainty is equal to the Shannon entropy, and resolution is equal to the expected mutual information. In such a retrospective analysis, a normalized version of expected mutual information may be calculated as a measure of the proportion of uncertainty in the observations that is explained by the forecasts.
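The link between divergence score resolution and expected mutual information can be illustrated with a short Python sketch. The counts are hypothetical and the function names are ours. The sketch computes the mutual information between forecast category and observation from the joint counts, compares it with resolution calculated from Kullback-Leibler divergences, and normalizes by the Shannon entropy of the observations.

```python
import math

def kl_bernoulli(xc: float, xr: float) -> float:
    """Kullback-Leibler divergence for Bernoulli distributions, with 0*ln(0) = 0."""
    total = 0.0
    for a, b in ((xc, xr), (1.0 - xc, 1.0 - xr)):
        if a > 0.0:
            total += a * math.log(a / b)
    return total

def mutual_information(n, o) -> float:
    """Expected mutual information between forecast category and observation,
    from forecast counts n[k] and event counts o[k]."""
    N = sum(n)
    p_event = sum(o) / N
    mi = 0.0
    for nk, ok in zip(n, o):
        for count, p_obs in ((ok, p_event), (nk - ok, 1.0 - p_event)):
            if count > 0:
                p_joint = count / N
                mi += p_joint * math.log(p_joint / ((nk / N) * p_obs))
    return mi

# Hypothetical counts, for illustration only.
n = [50, 30, 20]
o = [4, 16, 17]
N = sum(n)
obar = sum(o) / N

res_ds = sum(nk * kl_bernoulli(ok / nk, obar) for nk, ok in zip(n, o)) / N
mi = mutual_information(n, o)
unc_ds = -(obar * math.log(obar) + (1.0 - obar) * math.log(1.0 - obar))
normalized = mi / unc_ds   # proportion of uncertainty explained by the forecasts

print(round(res_ds, 6), round(mi, 6), round(normalized, 4))
```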

Discussion
Figure 5 shows a diagrammatic summary of the overall divergence score and its components (see also Equation (2)), based on calculations using the example data set. Here, uncertainty (UNC) is characterized by the binary Shannon entropy at the overall average frequency of rain observations, $\bar{o} = 0.2341$. In this context, we can think of entropy as a measure of the extent of our uncertainty before use of the forecaster. A useful intuitive interpretation of reliability (REL) can be gained from the data summary set out in Table 1. There, the probabilities $p_k$ represent the allowed probability forecasts for rain. For a perfectly reliable forecaster, the observed frequencies of rain events, $\bar{o}_k$, will be equal to $p_k$ in each category $k$; then REL = 0. Resolution (RES) is a measure of the extent to which the forecaster accounts for uncertainty (but not reliability), i.e., RES ≤ UNC. As mentioned above, in the case of the divergence score, resolution is characterized by expected mutual information. Then, the divergence score (DS) characterizes the uncertainty not accounted for by the forecaster (UNC − RES) together with the reliability (REL), so that DS = UNC − RES + REL.
The evaluation of probabilistic weather forecasts is primarily of interest to meteorologists, of course; but the methodology for evaluation of probabilistic forecasts is also applicable more widely in those situations where weather factors are identified as drivers of processes contributing to risk. Weather factors are important drivers of N2O emissions from agricultural soils, but studies of management interventions aimed at greenhouse gas mitigation have mainly been concerned with emissions inventory, and mitigation options tend to be assessed on an integrated seasonal time-scale [27,28]. An interesting example of the potential for a probabilistic approach to describing short-term N2O flux dynamics was offered in discussion of a modelling study by Hawkins et al. [29], as follows: "The model depicts a realistic positive emissions response to soil moisture at the mean values of the other factors. This reflects the general understanding that N efficiency, in terms of lower N2O emission, may be promoted by drier conditions. The WETTEST and DRIEST scenarios were simulated to investigate the magnitude of this efficiency difference. Although these scenarios are hypothetical because in practice the wettest or driest day in a week in terms of soil moisture is not known until the end of the week, they are analogous to spreading fertiliser before or after a rainfall event."
We note here that although the wettest and driest day in a week in terms of soil moisture may only be known retrospectively, weather forecasts provide (probabilistic) advance warning of rainfall events. Rees et al. [28] highlight the importance of reducing the supply of nitrogen in the context of greenhouse gas mitigation, so that management interventions with potential to increase nitrogen-use efficiency are of interest. Increasing nitrogen-use efficiency ought to represent a contribution to measures that, in relation to mitigation, reduce both greenhouse gas emissions and farm costs, constituting a "win-win" scenario [30]. The goal therefore is practical implementation of meteorological information, in the form of forecasts that could be incorporated into decision making for within-season environmental management interventions. This depends first on our ability to show that such forecasts have the required levels of reliability and resolution, using appropriate evaluation methodology as outlined here.

Appendix
The Appendix contains the tables of results referred to in the text.


Figure 1. Scoring rules as Bregman divergences. The long-dashed curve is a convex function of $p$; the solid line is a tangent to the convex function at the reference value of $p$ ($p_k$), indicated by a short-dashed line between the curve and the horizontal axis. The short-dashed lines between the curve and the tangent indicate the Bregman divergence at the comparison values of $o$ (these lines coincide with sections of the vertical axes of the graphs, at comparison values $o = 0$ and $o = 1$). (A) Brier score (for calculations see Appendix, Table 3, $k = 5$). For this example, a tangent to the convex function $f(p) = p^2$ is drawn at probability forecast of rain $p_k = 0.4$. The score for this forecast depends on the subsequent observation. If no-rain is observed, the score is the Bregman divergence at $o = 0$, which is 0.16. If rain is observed, the score is the Bregman divergence at $o = 1$, which is 0.36. Bregman divergences for other forecast-observation combinations are given in the Appendix, Table 3. The overall score for a forecast-observation data set is calculated as a weighted average of the individual Bregman divergences; (B) Divergence score (for calculations see Appendix, Table 4, $k = 5$). For this example, a tangent to the convex function $f(p) = p\ln(p) + (1 - p)\ln(1 - p)$ is drawn at probability forecast of rain $p_k = 0.4$. The score for this forecast depends on the subsequent observation. If no-rain is observed, the score is the Bregman divergence at $o = 0$, which is 0.5108. If rain is observed, the score is the Bregman divergence at $o = 1$, which is 0.9163. Bregman divergences for other forecast-observation combinations are given in the Appendix, Table 4. The overall score for a forecast-observation data set is calculated as a weighted average of the individual Bregman divergences.

Figure 2 shows examples of reliability components as Bregman divergences in diagrammatic form, for reference value $p_k = 0.6$ and comparison value $\bar{o}_k = 0.2727$ (see also Appendix, Table 5, category $k = 7$, for details of calculations based on Equation (3)). From Figure 2A (for the Brier score reliability component) $D_B(\bar{o}_k \| p_k) = 0.1071$. From Figure 2B (for the divergence score reliability component) $D_B(\bar{o}_k \| p_k) = 0.2198$.

Figure 2. Reliability as a Bregman divergence. The long-dashed curve is a convex function of $p$; the solid line is a tangent to the convex function at the reference value of $p$ ($p_k$), indicated by a short-dashed line between the curve and the horizontal axis. A second short-dashed line, between the curve and the tangent, indicates the Bregman divergence at the comparison value of $o$ (for calculations see Appendix, Table 5). Overall reliability for a forecast-observation data set is calculated as a weighted average of individual Bregman divergences. (A) Brier score reliability. For this example, a tangent to the convex function $f(p) = p^2$ is drawn at probability forecast of rain $p_k = 0.6$. The reliability component depends on the corresponding $\bar{o}_k$, the average frequency of rain observations following such forecasts, which is 0.2727 for the example data set. The reliability component is the Bregman divergence at $\bar{o}_k = 0.2727$, which is 0.1071; (B) Divergence score reliability. For this example, a tangent to the convex function $f(p) = p\ln(p) + (1 - p)\ln(1 - p)$ is drawn at probability forecast of rain $p_k = 0.6$. The reliability component depends on the corresponding $\bar{o}_k$, which is 0.2727 for the example data set. The reliability component is the Bregman divergence at $\bar{o}_k = 0.2727$, which is 0.2198.

Figure 3. Resolution as a Bregman divergence. The long-dashed curve is a convex function of $o$; the solid line is a tangent to the convex function at the reference value of $o$ ($\bar{o}$), indicated by a short-dashed line between the curve and the horizontal axis. A second short-dashed line, between the curve and the tangent, indicates the Bregman divergence at the comparison value of $o$ (for calculations see Appendix, Table 6). Overall resolution based on a forecast-observation data set is calculated as a weighted average of the individual Bregman divergences. (A) Brier score resolution. For this example, a tangent to the convex function $f(o) = o^2$ is drawn at the overall average frequency of rain observations, $\bar{o} = 0.2341$. The components of resolution are calculated for each particular $\bar{o}_k$, the average frequency of rain observations in each category. For $k = 9$, $\bar{o}_k = 0.6667$ for the example data set. The corresponding resolution component is the Bregman divergence at $\bar{o}_k = 0.6667$, which is 0.1871; (B) Divergence score resolution. For this example, a tangent to the convex function $f(o) = o\ln(o) + (1 - o)\ln(1 - o)$ is drawn at the overall average frequency of rain observations, $\bar{o} = 0.2341$. The components of resolution are calculated for each particular $\bar{o}_k$, the average frequency of rain observations in each category. For $k = 9$, $\bar{o}_k = 0.6667$ for the example data set. The corresponding resolution component is the Bregman divergence at $\bar{o}_k = 0.6667$, which is 0.4204.
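If we calculate resolution based on squared Euclidean distance, we have $\mathrm{RES}_{BS} = \frac{1}{N} \sum_k n_k (\bar{o}_k - \bar{o})^2$. Alternatively, if we calculate resolution based on the Kullback-Leibler divergence, we have $\mathrm{RES}_{DS} = \frac{1}{N} \sum_k n_k \left[ \bar{o}_k \ln\frac{\bar{o}_k}{\bar{o}} + (1 - \bar{o}_k) \ln\frac{1 - \bar{o}_k}{1 - \bar{o}} \right]$.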

Figure 4. Uncertainty functions. The long-dashed curves are uncertainty functions, $u(o)$; the short-dashed lines indicate $\bar{o}$ (= 0.2341 for the example data set) and the corresponding value of $u(\bar{o})$. (A) The Bernoulli variance $u(o) = o(1 - o)$. For the example

Figure 5. The overall divergence score and its components. The overall divergence score is denoted DS, with components uncertainty (UNC), reliability (REL) and resolution (RES), such that DS = UNC − RES + REL, with RES ≤ UNC as indicated by the vertical dashed line.
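The identity DS = UNC − RES + REL can be verified numerically. The sketch below makes several assumptions: the data are a hypothetical three-category toy set (not the article's data set), the uncertainty term is taken to be the Bernoulli entropy of the overall rain frequency, and reliability and resolution are weighted averages of Kullback-Leibler divergences. The helper names `xlogx` and `kl` are illustrative.

```python
import math

def xlogx(x):
    """x * ln(x), using the limiting value 0 at x = 0."""
    return 0.0 if x == 0.0 else x * math.log(x)

def kl(a, b):
    """Kullback-Leibler divergence of Bernoulli(a) from Bernoulli(b), 0 < b < 1."""
    return (xlogx(a) - a * math.log(b)
            + xlogx(1.0 - a) - (1.0 - a) * math.log(1.0 - b))

# Hypothetical toy data, one tuple per forecast category:
# (forecast probability p_k, number of forecasts n_k, number followed by rain)
categories = [(0.1, 50, 3), (0.4, 30, 10), (0.8, 20, 17)]

N = sum(n for _, n, _ in categories)
o_bar = sum(r for _, _, r in categories) / N   # overall rain frequency

unc = -(xlogx(o_bar) + xlogx(1.0 - o_bar))                  # uncertainty (entropy)
rel = sum(n * kl(r / n, p) for p, n, r in categories) / N   # reliability
res = sum(n * kl(r / n, o_bar) for _, n, r in categories) / N  # resolution

# Overall divergence score: mean of -ln(probability given to the observed outcome)
ds = sum(-(r * math.log(p) + (n - r) * math.log(1.0 - p))
         for p, n, r in categories) / N

print(ds, unc - res + rel)
```

The two printed numbers agree up to floating-point rounding, and `res` does not exceed `unc`, which is the constraint the caption attributes to the vertical dashed line.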
the tangent to f(p) at $p_k$; $f(\bar{o}_k) - f(p_k) -$ (

Table 1. Summary of the data set.

(A) Brier score (for calculations see Appendix, Table 3, k = 5). For this example, a tangent to the convex function $f(p) = p^2$ is drawn at the probability forecast of rain $p_k = 0.4$. The score for this forecast depends on the subsequent observation. If no-rain is observed, the score is the Bregman divergence at o = 0, which is 0.16. If rain is observed, the score is the Bregman divergence at o = 1, which is 0.36. Bregman divergences for other forecast-observation combinations are given in the Appendix, Table 3. The overall score for a forecast-observation data set is calculated as a weighted average of the individual Bregman divergences; (B) Divergence score (for calculations see Appendix, Table 4, k = 5). For this example, a tangent to the convex function $f(p) = p \ln(p) + (1 - p) \ln(1 - p)$ is drawn at the probability forecast of rain $p_k = 0.4$. The score for this forecast depends on the subsequent observation. If no-rain is observed, the score is the Bregman divergence at o = 0, which is 0.5108. If rain is observed, the score is the Bregman divergence at o = 1, which is 0.9163. Bregman divergences for other forecast-observation combinations are given in the Appendix, Table 4. The overall score for a forecast-observation data set is calculated as a weighted average of the individual Bregman divergences.
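The scores quoted for a forecast of 0.4 can be reproduced by evaluating the Bregman divergence at the two possible observations, o = 0 and o = 1. This is a minimal sketch; the function names are illustrative, and the convention 0·ln(0) = 0 is assumed so that the convex function is defined at the end points.

```python
import math

def xlogx(x):
    """x * ln(x), using the limiting value 0 at x = 0."""
    return 0.0 if x == 0.0 else x * math.log(x)

def brier_score(p, o):
    # Bregman divergence for f(p) = p^2, tangent drawn at the forecast p;
    # algebraically this equals (o - p)^2
    return o * o - p * p - (o - p) * 2.0 * p

def divergence_score(p, o):
    # Bregman divergence for f(p) = p ln p + (1 - p) ln(1 - p), with
    # f'(p) = ln(p / (1 - p)); requires 0 < p < 1
    def f(x):
        return xlogx(x) + xlogx(1.0 - x)
    return f(o) - f(p) - (o - p) * math.log(p / (1.0 - p))

p = 0.4  # probability forecast of rain
for o in (0.0, 1.0):  # 0 = no-rain observed, 1 = rain observed
    print(o, round(brier_score(p, o), 4), round(divergence_score(p, o), 4))
```

For a binary observation the divergence score reduces to the negative logarithm of the probability assigned to what actually happened: −ln(0.6) when no rain is observed and −ln(0.4) when rain is observed.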

Table 2. Decomposition of the Brier score and the divergence score.
a … $/n_k$; $n_k/N$, normalized frequency of observations; $\mathrm{REL}_{BS,k}$ (components of $\mathrm{REL}_{BS}$) = …
b Column sums: 346

Table 4. Divergence score calculation via Bregman divergence.

Table 5. Reliability calculation via Bregman divergence.
a Notation: k, forecast category index; $p_k$, probability forecast for rain (reference value, at which the tangent is calculated), probability forecast for no-rain is the complement; $\bar{o}_k$, average frequency of rain observations (comparison value, at which the divergence is calculated); $n_k$, number of observations.

Table 6. Resolution calculation via Bregman divergence.
a Notation: k, forecast category index; $\bar{o}$, overall average frequency of rain observations (see Table 2) (reference value, at which the tangent is calculated); $\bar{o}_k$, average frequency of rain observations (comparison value, at which the divergence is calculated); $n_k$, number of observations; $f'(\bar{o})$, slope of the tangent to f(o) at $\bar{o}$.