Gutenberg–Richter B-Value Time Series Forecasting: A Weighted Likelihood Approach

: We introduce a novel approach to estimate the temporal variation of the b-value parameter of the Gutenberg–Richter law, based on the weighted likelihood approach. This methodology allows estimating the b-value based on the full history of the available data, within a data-driven setting. We test this methodology against the classical “rolling window” approach using a high-deﬁnition Italian seismic catalogue as well as a global catalogue of high magnitudes. The weighted likelihood approach outperforms competing methods, and measures the optimal amount of past information relevant to the estimation.


Introduction
The Gutenberg-Richter law [1] represents the building block of modern seismology, widely adopted by seismologists to describe the relationship between the magnitude and frequency of seismic events [2]. The formulation of this law reads: (1) where N is the number of events of magnitude greater than the threshold M, and a and b (usually called "b-value") are the two parameters that characterize the magnitudefrequency relation. The b-value regulates the frequency of smaller earthquakes relative to larger ones: the larger the b-value, the higher the frequency of smaller events. Numerous studies demonstrate that the b-value relates to the differential stress of Earth's crust: highly stressed zones, or faults, usually exhibit low b-values, whereas weakly stressed areas usually exhibit higher b-values [3][4][5].
In a recent study, Gulia et al. [6] show that during aftershock sequences the b-value measured nearby the mainshock's fault tends to increase with respect to its "background" value. This increase occurring during aftershock sequences is a direct consequence of the decrease of the differential stress in the fault due to the mainshock; a similar increase in b-value is observed near patches that experienced the largest slip after the Tohoku 2011 earthquake [4].
Nevertheless, a decrease in b-value on the mainshock's fault can indicate that the strongest event of the sequence has not yet occurred, and thus this information could be useful to forecast future stronger earthquakes [7].
For these reasons, analyzing the time series of the b-value can be a powerful tool to enhance seismologists' forecasting capability.
Statistical methods that estimate this value rely on the probabilistic representation of the Gutenberg-Richter law, which interpret the b-value as the rate parameter of an Forecasting 2021, 3 562 exponential distribution [8]. Currently, the widespread approach to estimate the b-value considers a fixed number of events, commonly in the range of 50 to 400 events [7,9]. However, since the earthquake rate can change very rapidly during a seismic sequence (and in general in any seismic catalogue), using a fixed number of events generally leads to time windows of unequal length. This unequal length leads to a difficult interpretation of the b-value time series. During a period of high seismic activity, e.g., the beginning of an aftershock sequence, more than 100 events can happen in few hours, while in a period of low seismic activity, e.g., the end of an aftershock sequence, a similar number of events can take up to a year to realize. This paper introduces a robust statistical framework to compute the temporal variation of the b-value, avoiding the subjective choice of the number of events to use in the estimation. In addition, this framework allows proper consideration of the natural variability of the seismic rate. We adopt the weighted likelihood approach [10,11], previously used to compute the spatial variation of b-value [12], to estimate the time variation of the parameter, along with the associated uncertainty. We test our model both on an Italian high-resolution catalogue of natural seismicity [13], and on a global catalogue [14,15], and we evaluate the estimation accuracy against the classical fixed-number estimation approach.

Methods
Aki [8] and Utsu [16] show that the Gutenberg-Richter law corresponds to an exponential distribution, and the b-value can be estimated by the maximum likelihood (ML) approach as follows:b where M, M min and ∆M are the mean magnitude of events in the catalogue, the completeness magnitude, and the binning of the magnitude, respectively. To capture the possible time-varying dynamic of the b-value, we propose to employ the weighted likelihood (WL) method of Hu and Zidek [10]: where W LL is the weighted log-likelihood, f (x) = βe −βx is the probability density function (pdf) of an exponential distribution with rate parameter β = ln (10) b, X i = M i − M min , and W i is the weight associated with the i-th event with a magnitude M i in the catalogue. Taroni et al. [17] show that Equation (3) can be generalized to accommodate weights as follows: The standard deviation ofb also depends on the weights: However, while Taroni et al. [17] investigate the spatial variation of the b-value (e.g., across locations), here we place our focus on its variation over time. We devise a weighting scheme that depends on the time lag between the i-th event, and the first event reported in the catalogue (∆t i ), ensuring W i > 0, ∀i. Specifically, we consider the following exponential kernel: Forecasting 2021, 3 563 with α being the forgetting parameter and κ being a normalizing constant to ensure ∑ n i=1 W i = 1. The parameter α gauges the amount of past information that is relevant to estimate the weights. This forgetting factor regulates the steepness of the rate at which past information is discounted. To provide an intuition for this, in Figure 1 we report the cumulative weights associated with different forgetting factors. As α increases, the relative weights associated with information far in the past decrease, suggesting that only recent events bear the relevant signal. For a forgetting factor of 0.01, almost 85% of past observations receive a cumulative weight greater than 0.1, while this reduces to only about 7% for α = 0.5.
reported in the catalogue (Δt ), ensuring > 0, ∀ . Specifically, we consider the following exponential kernel: with α being the forgetting parameter and κ being a normalizing constant to ensure ∑ = 1. The parameter α gauges the amount of past information that is relevant to estimate the weights. This forgetting factor regulates the steepness of the rate at which past information is discounted. To provide an intuition for this, in Figure 1 we report the cumulative weights associated with different forgetting factors. As α increases, the relative weights associated with information far in the past decrease, suggesting that only recent events bear the relevant signal. For a forgetting factor of 0.01, almost 85% of past observations receive a cumulative weight greater than 0.1, while this reduces to only about 7% for α = 0.5. At this point, we can rewrite Equation (4) as: The forgetting factor α is estimated using a grid-search approach. That is, we generate a grid of N α plausible values for α, i.e., α = [α 1 , α 2 … , α i , …,α N α ], with α ∈ ℝ and α 1 < α 2 < ⋯ < α i < ⋯ < α N α , and we evaluate the log-likelihood of the model for each value. The optimal α i is found in the correspondence of the maximum of the log-likelihood function. We outlined that the value of α also depends on the unit of measurement chosen for the time Δt i and on the total number of events in the catalogue.
Current methods rely on using a fixed number of the most recent observations to capture the evolution over time of the b-value, the so-called rolling window estimation. This approach, despite proving useful in several other disciplines, e.g., finance and economics [18], imposes an equal weight on every observation in the estimation sample, which directly depends on the choice of the number of observations used. Thus, the validity of these results heavily hinges upon researchers' inputs. At this point, we can rewrite Equation (4) as: The forgetting factor α is estimated using a grid-search approach. That is, we generate a grid of N α plausible values for α, i.e., α = [α 1 , α 2 . . . , α i , . . . ,α N α ], with α i ∈ R and α 1 < α 2 < · · · < α i < · · · < α N α , and we evaluate the log-likelihood of the model for each value. The optimal α i is found in the correspondence of the maximum of the log-likelihood function. We outlined that the value of α also depends on the unit of measurement chosen for the time ∆t i and on the total number of events in the catalogue.
Current methods rely on using a fixed number of the most recent observations to capture the evolution over time of the b-value, the so-called rolling window estimation. This approach, despite proving useful in several other disciplines, e.g., finance and economics [18], imposes an equal weight on every observation in the estimation sample, which directly depends on the choice of the number of observations used. Thus, the validity of these results heavily hinges upon researchers' inputs.
In contrast, the WL methodology posits a weighting scheme that assigns a weight to each available observation, and optimally sets the rate at which past information is discounted. Furthermore, the WL estimator accounts for unequally spaced observation, allowing events that occurred after long periods of quiet to receive higher weights.
We compare the performance between the WL approach and the rolling window estimation, considering several window sizes, through a pseudoprospective test. That is, we divide the catalogue into two subsamples of equal size, and we use the first to estimate the forgetting parameter α. In the second subsample, we use the weighting scheme obtained from the training sample to estimate the b-value. We compare this approach to several rolling window specifications by means of the Bayes Factor (BF), computed as the ratio of the predictive densities [19]. These are calculated as the likelihood function (L) of each observation in the second subsample, where the b-value is estimated by optimally weighting all the preceding; similarly, for the rolling window approach, the log-likelihood function is computed using the b-value estimated on a fixed number of previous observations. Hence: where X = {x 1 , . . . , x N } represents the set of magnitudes in the catalogue as in Equation (3), b W L and b RW are the b-values estimated with the weighted likelihood and rolling window methods, respectively. A BF > 1 provides statistical evidence in favor of the WL approach, while BF < 1 would suggest otherwise; Kass and Raftery [20] propose a threshold approach to determine the strength of the evidence in favor of either statistical model.

Data
We applied our methodology to two different catalogues. The first one is a highdefinition seismic catalogue concerning the central area of Italy (TABOO catalogue, Valoroso et al. 2017) [13]; the catalogue contains natural seismicity observations for about 4.5 years, from 2010 to 2014. This dataset is particularly suitable to study the time variation of the b-value as the available events were small in magnitude, being the maximum magnitude of ML 3.81. In fact, strong events (e.g., earthquake with magnitude higher than 5.5) might lead to incompleteness problems in the first days of the associated aftershock sequences [21,22], and thus induce bias in the estimation of the b-value [12,23]. The completeness magnitude of the catalogue we used is ML 0.5 [13], with 6453 events above the completeness threshold, as illustrated in Figure 2a. Following Marzocchi et al. [12], we verified the reliability of the completeness by testing the exponentiality of the magnitudes through a Lilliefors test [24]. A completeness of 0.5 is strongly supported by the data, with a p-value of 0.26.
The second catalogue we used is the global Centroid Moment Tensor catalogue (CMT, [14,15]). This catalogue contains worldwide events of stronger magnitude, from 1980 to 2019. We selected earthquakes with a minimum magnitude of Mw 5.5 [3], and a maximum depth of 50 km, in the Tonga trench zone, one of the most active seismic areas in the world. We refrained from using the whole global catalogue to focus on temporal variations of the b-value in a particular region. As for the previous case, we verified the reliability of the completeness by testing the exponentiality of the magnitudes. The completeness of 5.5 is strongly supported by the data, with a p-value of 0.17. This catalogue contains 1007 events above the completeness threshold, as illustrated in Figure 2b. The high level of completeness, Mw 5.5, allowed us to evaluate our method with data robust to the incompleteness problems in the first days of the aftershock sequences (usually related to smaller events, i.e., <Mw 5.0).

Results
We initially applied our method to the TABOO catalogue. We used the first subset of the catalogue, containing the first half of the data, as a training sample, estimating a forgetting factor α = 0.014 (see Figure 3a). We then used the testing sample, the second subsample, to forecast the evolution over time of the b-value, and to compare the forecasting performances against the rolling window method. As the number of events used for rolling window estimations ranged between 50 and 400, we used six different values to cover this range (50, 75, 100, 150, 200, 400). To assess the performances of the models, we computed the Bayes Factor as described in Equation (8), using the Bayes Factor thresholds of Kass and Raftery [20], for which "strong evidence" is associated to |ln(BF)| > 3. Results are summarized in Table 1. Four out of the six rolling windows we considered were significantly outperformed by the WL method, and in only one case the Bayes Factor provided weak evidence in favor of the competing benchmark.
In a similar fashion, we applied the WL method to the CMT catalogue, obtaining a forgetting factor α = 1.5 • 10 (see Figure 3b). Again, the WL method delivered good performances, achieving a "strong evidence" for two out of the six rolling windows (see Table 1).

Results
We initially applied our method to the TABOO catalogue. We used the first subset of the catalogue, containing the first half of the data, as a training sample, estimating a forgetting factor α = 0.014 (see Figure 3a). We then used the testing sample, the second subsample, to forecast the evolution over time of the b-value, and to compare the forecasting performances against the rolling window method. As the number of events used for rolling window estimations ranged between 50 and 400, we used six different values to cover this range (50, 75, 100, 150, 200, 400). To assess the performances of the models, we computed the Bayes Factor as described in Equation (8), using the Bayes Factor thresholds of Kass and Raftery [20], for which "strong evidence" is associated to |ln(BF)| > 3. Results are summarized in Table 1. Four out of the six rolling windows we considered were significantly outperformed by the WL method, and in only one case the Bayes Factor provided weak evidence in favor of the competing benchmark.
In a similar fashion, we applied the WL method to the CMT catalogue, obtaining a forgetting factor α = 1.5·10 −4 (see Figure 3b). Again, the WL method delivered good performances, achieving a "strong evidence" for two out of the six rolling windows (see Table 1). Forecasting 2021, 3 FOR PEER REVIEW 6

Discussion
The results reported in Table 1 highlight that the forecasting ability of the WL approach significantly and consistently outperforms classical, widespread methods by optimally choosing the amount of relevant information for estimating the b-value. In Figures 4 and 5 we illustrate how different estimation methods give rise to different time series of b-values. For both the 100 events (upper panels) and the 200 events (lower panels) case, the relative time series present higher volatility with respect to the one generated by the weighted likelihood estimation. Based on these results, it is possible to suspect that a large share of the variation visible in the rolling window time series is attributable to the natural statistical fluctuations of the b-value. Therefore, the weighted likelihood estimation represents a robust estimation method, able to filter out the non-physical fluctuations of the signal.
To further validate our methodology, we also compared the weighted likelihood approach with the novel MUST-B estimator introduced by García-Hernández et al. [25]. This estimator avoids subjective biases in the choice of the window size by taking the median values of different b-values estimated using rolling windows of different lengths. This method was successfully applied to the 2016-2017 central Italy seismic sequence; in our exercise, we retained the parameters' values described in García-Hernández et al. [25]. Bayes Factor values suggest that our methodology performed better than MUST-B, for both the TABOO (ln(BF) = 3.1), and the CMT (ln(BF) = 6.1) catalogues, respectively. It is, however, worth noting that tuning of the MUST-B approach refers to a single, different seismic sequence compared to the TABOO and CMT catalogues. A deeper investigation of the comparison between the WL and MUST-B is beyond the scope of this paper, and is left for future research avenues.
The TABOO catalogue that we used in our forecast exercise features a stable and time-independent magnitude of completeness of 0.5, with a maximum magnitude of 3.8.  Table 1. Bayes Factor (BF) for the comparison of fixed number b-value estimation and the weighted likelihood approach for TABOO (2nd column) and CMT (3rd column) catalogues; bold is used for "strong evidence" of the BF (ln(BF) > 3).

Discussion
The results reported in Table 1 highlight that the forecasting ability of the WL approach significantly and consistently outperforms classical, widespread methods by optimally choosing the amount of relevant information for estimating the b-value. In Figures 4 and 5 we illustrate how different estimation methods give rise to different time series of b-values. For both the 100 events (upper panels) and the 200 events (lower panels) case, the relative time series present higher volatility with respect to the one generated by the weighted likelihood estimation. Based on these results, it is possible to suspect that a large share of the variation visible in the rolling window time series is attributable to the natural statistical fluctuations of the b-value. Therefore, the weighted likelihood estimation represents a robust estimation method, able to filter out the non-physical fluctuations of the signal.
To further validate our methodology, we also compared the weighted likelihood approach with the novel MUST-B estimator introduced by García-Hernández et al. [25]. This estimator avoids subjective biases in the choice of the window size by taking the median values of different b-values estimated using rolling windows of different lengths. This method was successfully applied to the 2016-2017 central Italy seismic sequence; in our exercise, we retained the parameters' values described in García-Hernández et al. [25]. Bayes Factor values suggest that our methodology performed better than MUST-B, for both the TABOO (ln(BF) = 3.1), and the CMT (ln(BF) = 6.1) catalogues, respectively. It is, however, worth noting that tuning of the MUST-B approach refers to a single, different seismic sequence compared to the TABOO and CMT catalogues. A deeper investigation of the comparison between the WL and MUST-B is beyond the scope of this paper, and is left for future research avenues.
The TABOO catalogue that we used in our forecast exercise features a stable and time-independent magnitude of completeness of 0.5, with a maximum magnitude of 3.8.
The CMT catalogue, given its high threshold of completeness, Mw 5.5, is probably not affected by the incompleteness problems. For instrumental catalogues that contain strong events (i.e., events with a magnitude larger than 5.5) and with a relatively low magnitude of completeness (e.g., lower than 3.5), a proper estimation of such completeness is of paramount importance, in particular during aftershock sequences, and it strongly influences the b-value estimation [23].
A future improvement of the WL method could be the inclusion of temporal variations of the magnitude of completeness, using the b-value estimation approach of Taroni [26].
The CMT catalogue, given its high threshold of completeness, Mw 5.5, is probably not affected by the incompleteness problems. For instrumental catalogues that contain strong events (i.e., events with a magnitude larger than 5.5) and with a relatively low magnitude of completeness (e.g., lower than 3.5), a proper estimation of such completeness is of paramount importance, in particular during aftershock sequences, and it strongly influences the b-value estimation [23].
A future improvement of the WL method could be the inclusion of temporal variations of the magnitude of completeness, using the b-value estimation approach of Taroni [26].

Conclusions
In this paper, we introduce a novel method to estimate the temporal variations of the b-value of the Gutenberg-Richter law based on the weighted likelihood (WL) method. This framework avoids subjective judgment on the number of events to be used in the classical rolling window estimation, while exploiting the information content of the whole sample. We applied our new method both to a high-definition Italian seismic catalogue and to a global catalogue of strong events. Comparing the forecasting performances of the WL approach against the classical rolling window method using the Bayes Factor, our results show a clear advantage associated with the former: the WL method delivered significantly good performances with respect to the classical method. In line with the previous work of Taroni et al. [26], where the WL method is used to estimate the spatial variation of the b-value, we find evidence in favor of using this methodology also to estimate its temporal variation, in order to build b-value time series more coherent with the non-uniform temporal distribution of the seismicity.

Conclusions
In this paper, we introduce a novel method to estimate the temporal variations of the b-value of the Gutenberg-Richter law based on the weighted likelihood (WL) method. This framework avoids subjective judgment on the number of events to be used in the classical rolling window estimation, while exploiting the information content of the whole sample. We applied our new method both to a high-definition Italian seismic catalogue and to a global catalogue of strong events. Comparing the forecasting performances of the WL approach against the classical rolling window method using the Bayes Factor, our results show a clear advantage associated with the former: the WL method delivered significantly good performances with respect to the classical method. In line with the previous work of Taroni et al. [26], where the WL method is used to estimate the spatial variation of the b-value, we find evidence in favor of using this methodology also to estimate its temporal variation, in order to build b-value time series more coherent with the non-uniform temporal distribution of the seismicity.
Author Contributions: M.T., G.V. and A.D.P. conceived the analysis method, performed the data analysis, created the figures, and wrote the paper. All authors have read and agreed to the published version of the manuscript.