Global Precedent-Based Extrapolation Estimate of the M8+ Earthquake Hazard (According to USGS Data as of 1 June 2021)

The paper describes the algorithm and the results of a seismic hazard estimate based on the data of the seismological catalog of the US Geological Survey (USGS). The prediction algorithm is based on the search for clusters of seismic activity in which the current activity trends correspond to foreshock sequences recorded before strong earthquakes (precedents) that have already occurred. The time of potential hazard of a similar earthquake is calculated by extrapolating the detected trends to the level of activity that existed at the time of the precedent earthquake. It is shown that the lead time of such a forecast reaches 10-15 years, and its realization depends on the persistence and stability of the identified trends. The hazard assessment algorithm was tuned retrospectively on seven M8+ earthquakes that showed predictable foreshock preparation. The evolution of the potential seismic hazard from 1 January 2020 to 1 June 2021 has been traced. It is concluded that precedent-based extrapolation assessments are promising as a tool for the early detection and monitoring of potentially hazardous seismic activity.


Introduction
Our research is focused on assessing the predictive capabilities of the equation of Dynamics of Self-Developing Natural Processes (the DSDNP equation) and developing algorithms for its practical use. The first versions of this equation originated when A. Malyshev (one of the authors of this work) studied the plastic deformations that preceded and accompanied the eruptions of Bezymyannyi Volcano in 1981-1984 [1]. These plastic deformations were not accompanied by volcanic earthquakes. However, the analysis of changes in the volume of erupted material showed the presence of a direct ('the more, the faster') avalanche-like development before the culmination of eruptions and a reverse ('the less, the slower') avalanche-like development in the post-climactic eruptive process. Moreover, the absence of signs of avalanche-like development indicated an upcoming calm (non-climactic) eruption. These observations allowed A. Malyshev to successfully predict the directed blast of Bezymyannyi Volcano on 30 June 1985 [1-3], as well as a number of eruptions (with or without paroxysm) in 1986 to 1987 [1,3].
The expressions 'the more, the faster' and 'the less, the slower' are first-order differential equations expressed in words: the rate of change in the state of the system depends on its current state. At the same time, the fact of self-development of the system is essential. As a result of the analysis of the patterns of self-development of a wide range of natural processes, it was concluded that, in the case of Self-Developing Systems (SDS), the forces that change their state arise from the systems' own energy of motion [1]:

F_x = C·|E_x − E_0|^γ = C·|m_x·ẋ²/2 − m_x·ẋ₀²/2|^γ.

Here, m_x, F_x and E_x are, respectively, the 'measure of inertness', the 'force' and the 'energy of motion' of the system with respect to the parameter x; C is a constant of proportionality; γ is an exponent of nonlinearity; and ẋ and ẋ₀ are the rates of change of the parameter in the current and stationary states, respectively. This conclusion was transformed into the DSDNP equation [4,5]:

ẍ = k·|ẋ^λ − ẋ₀^λ|^(α/λ), (1)

where x is the quantitative parameter of the process, ẋ and ẍ are the rate and acceleration of changes in this parameter at time t (the first and second time derivatives), k is the proportionality coefficient and the exponents λ and α describe the nonlinearity of the process near the stationary state (ẋ ≈ ẋ₀) and at a significant distance from it (ẋ >> ẋ₀). The above conclusion about the self-development of natural systems corresponds to DSDNP Equation (1) at λ = 2, α = 2γ and k = C·m_x^(γ−1). Equation (1) is difficult to use in practice. Therefore, in our research, we used a simplified version of the equation as an approximation model:

ẍ = k·ẋ^α. (2)

Equation (2) corresponds to potentially catastrophic processes with a large range of changes in the parameter x (ẋ >> ẋ₀).
Equation (2) formally corresponds to the equation proposed by B. Voight [6] for describing the dynamics of brittle deformations (disjunctive dislocations) on the eve of the culmination of volcanic eruptions. The Voight equation is used in the Failure Forecast Method (FFM). In earthquake forecasting, however, Equation (2) was first used in the pioneering work by G. Papadopoulos [7]. The same equation is used in the Accelerated Moment Release (ARM) method based on the accumulated moment or Benioff strain [8][9][10]. Attempts to use these methods for predictive purposes have not been successful, and both methods have been criticized. In particular, a number of papers [11][12][13] have claimed that the FFM method is biased and inaccurate even in retrospective analysis. In turn, the statistical insignificance of the ARM method was demonstrated in Reference [14], where it was shown that the ARM practice carries the hazard of identifying patterns that are not real but are created by choosing free parameters so as to demonstrate the hypothesized pattern. This hazard is particularly high when the results are unstable.
Serious criticism has led to a reduction in the number of attempts to use Equation (2) for predictive purposes. Currently, only a few researchers using the ARM method (primarily References [15,16]) are trying to cope with the criticism (a 'gravestone' on this issue) of J. Hardebeck and coauthors [14]. In addition, the Self-Developing Process (SDP) method [17,18] is actively used to study the seismicity of Sakhalin Island and adjacent territories (Russia). The SDP method was developed by I. Tikhonov on the basis of the DSDNP equation for analyzing the flux of seismic events. Nevertheless, all the criticisms expressed in Reference [14] apply to this method as well.
In our opinion, both points of view (that of the supporters of the FFM and ARM methods and that of their opponents) have a right to exist. Each researcher can use intuition in their scientific research, but they must (1) consolidate the results obtained into objective and reproducible criteria and (2) confirm that similar results are obtained with these criteria on the maximum possible number of examples (wide test control). An analysis of the critical comments on the FFM and ARM methods showed that one of their main problems is the instability of the results obtained. We can confirm the seriousness of this problem from the experience of our own research. The instability of the results, in our opinion, is largely due to the choice of the optimization criterion: the standard least squares method commonly used by researchers (in particular, in References [11][12][13]) does not ensure the stability of approximation modeling and is therefore ineffective. The Revised-ARM method [15,16] does not solve this problem either. Therefore, the passage of the ARM method through wide test control seems unlikely. As for the Tikhonov method [17,18], in our opinion, it needs objective criteria for the selection of representative earthquakes and the subsequent production of results in automatic mode.
Our approximation algorithms were configured and successfully tested on synthetic catalogs during the second half of the 1990s. However, subsequent extensive testing on real seismic catalogs showed the instability of the results until, in the mid-2000s, the generally accepted optimization criterion of the smallest standard deviations was replaced by optimization for the minimum area deviations [5]. Further progress was hindered by the absence of a system for assessing the predictability of seismic trends; such an evaluation system was developed by us by the mid-2010s [19]. Optimization by area deviations proved inconvenient for predictive estimates. Therefore, it was replaced by optimization based on bicoordinate (geometric mean) deviations, which also provided the necessary stability of the results obtained. As a result, an algorithm was developed to identify seismic trends and assess their predictability (extrapolability). This algorithm has been extensively and successfully tested on real seismic catalogs scanned in full in automatic mode. The initial version of the algorithm was applied to localized volcanic seismicity [19,20]. This variant was then adapted to 3D space and used to study the predictability of seismic trends according to the Kamchatka Regional Catalog (KAGSR) [21], the seismic catalogs of the US Geological Survey (USGS) [22,23] and the Japan Meteorological Agency (JMA).
The results of the above works show a good predictability of the seismicity trends, including both trends of increasing activity before strong earthquakes and trends of the attenuation of activity after these earthquakes. However, it should be emphasized here that 'the predictability of the seismic trend before a strong earthquake' and 'the forecast of a strong earthquake' are not equivalent concepts, even if a strong earthquake fully corresponds to the extrapolation (forecast) part of the trend. Any earthquake that is part of the seismicity trend forms a deviation from the main trend pattern. The prediction of the time and magnitude of these random fluctuations (located in the band of permissible deviations) is difficult even with good predictability of the trend itself.
Thus, the use of Equation (2) in the prediction of strong earthquakes has natural limitations. Nevertheless, the approximation-extrapolation technique developed on the basis of the DSDNP equation is, in our opinion, a good tool for retrospectively studying the patterns of preparation of strong earthquakes and for identifying their hazards in real time. When generalizing the previously obtained results on the flux of seismic energy, it was found [24] that the range of values of the parameters α and k in Equation (2) that determined the predictability of strong earthquakes is shifted, in the α-lg k diagram (Figure A1), to the left of and below the entire predictability area (including the conditionally safe one), i.e., to where the figurative points of the most slowly developing and longest-lasting activation processes are located. This makes it possible to differentiate the activation trends of the seismic energy flux by the α and k parameters into, on the one hand, conditionally safe trends (without precedents of termination by strong earthquakes) and, on the other hand, potentially hazardous ones that deserve close attention due to the (often repeated) precedents of such trends ending in strong earthquakes. This paper describes an algorithm for estimating the hazard of strong earthquakes and illustrates its use for the flux of seismic energy based on an analysis of data from the USGS catalog as of 1 June 2021.
A note on terminology. Earthquake sequences are divided by the sign of the coefficient k in the approximation-extrapolation trend into activation sequences (k > 0), stationary sequences (k = 0) and attenuation sequences (k < 0). Foreshock sequences are activation sequences that end with strong earthquakes; foreshocks are all earthquakes of a foreshock sequence preceding the strong earthquake. Aftershock sequences are attenuation sequences that begin after a strong earthquake; aftershocks are all earthquakes of an aftershock sequence after the main shock. In this paper, only activation sequences and (among them) foreshock sequences are considered. In relation to the time of the main shock, foreshocks are divided [7,25] into close (from several hours to several days), short-term (up to 5 to 6 months) and long-term (several years) foreshocks. We examine all the listed types of foreshocks without exception.

The Model Equation
The solutions of Equation (2) can be expressed in the explicit form:

x = x₁ + ẋ₁·(t − t₁) at k = 0;
x = x₁ + (ẋ₁/k)·[e^(k(t − t₁)) − 1] at k ≠ 0, α = 1;
x = x₁ − (1/k)·ln[1 − k·ẋ₁·(t − t₁)] at k ≠ 0, α = 2;
x = x₁ + {ẋ₁^(2−α) − [ẋ₁^(1−α) − (α − 1)·k·(t − t₁)]^((α−2)/(α−1))}/[(α − 2)·k] at k ≠ 0, α ≠ 1, 2.

Thus, the solutions of Equation (2) are either directly a linear dependence (k = 0) or are reduced to linear dependences by taking the logarithm of the differences between the parameter values and/or time and the respective asymptotes X_a and T_a.
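As a sanity check on these closed-form solutions, the rate solution of Equation (2) for α ≠ 1, ẋ(t) = [ẋ₁^(1−α) − (α − 1)·k·(t − t₁)]^(−1/(α−1)), can be compared with a direct numerical integration of ẍ = k·ẋ^α. A minimal Python sketch (function names are ours, not the paper's):

```python
def rate_closed_form(t, t1, v1, k, alpha):
    """Closed-form rate solution of Eq. (2), dv/dt = k*v**alpha, for alpha != 1."""
    base = v1 ** (1.0 - alpha) - (alpha - 1.0) * k * (t - t1)
    return base ** (-1.0 / (alpha - 1.0))

def integrate_rate(t1, v1, k, alpha, t_end, n=20000):
    """Numerical (RK4) integration of dv/dt = k*v**alpha from t1 to t_end."""
    h = (t_end - t1) / n
    v = v1
    f = lambda u: k * u ** alpha
    for _ in range(n):
        k1 = f(v)
        k2 = f(v + 0.5 * h * k1)
        k3 = f(v + 0.5 * h * k2)
        k4 = f(v + h * k3)
        v += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
    return v

# Example: alpha = 2, k = 1, v1 = 1 at t1 = 0 gives v(t) = 1/(1 - t),
# with the time asymptote T_a = 1 at which the rate diverges.
v_num = integrate_rate(0.0, 1.0, 1.0, 2.0, 0.5)
v_ana = rate_closed_form(0.5, 0.0, 1.0, 1.0, 2.0)
```

The agreement of the two values illustrates the finite-time ('avalanche-like') divergence of the rate at the asymptote T_a for α > 1.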

The Optimization and Its Criteria
The linearity of the solutions of Equation (2) in ordinary or logarithmic coordinates simplifies optimization (finding the best match to the factual data). Direct optimization over the five values (α, k, x₁, ẋ₁ and t₁) is complex and requires large computing resources. However, when the solutions of Equation (2) are used in linear form (ordinary or logarithmic), optimization is reduced to the analysis and comparison of several variants of linear regression. In some cases, optimization may also be required with respect to one or two additional parameters: T_a and/or X_a. The optimal values (α, k, x₁, ẋ₁ and t₁) for each variant are easily determined analytically from the linearity constants (c₀, c₁) and the asymptotes (T_a and/or X_a). This greatly simplifies the optimization procedure and reduces the requirements for computing resources.
For the stability of the results, the choice of the optimization criterion, i.e., the quantitative characteristic of the correspondence between the factual data and their approximation model, is of great importance. Therefore, to solve the problems that arise (see their detailed description in Reference [22]), all variants of linear regression are compared in ordinary (nonlogarithmic) coordinates. The bicoordinate rms deviation

∆_xt = {Σ(∆x_i·∆t_i)/[n·(x_c − x_s)·(t_c − t_s)]}^0.5

is used as the optimization criterion to ensure the stability of the results. Here, (x_c − x_s) and (t_c − t_s) are the ranges of variation of the factual data used for optimization (these ranges normalize the coordinates to a range from 0 to 1), and ∆x_i and ∆t_i are the deviations of each factual data point from the calculated curve along the ordinate and abscissa axes, respectively. In a geometric sense, the bicoordinate deviation corresponds to the side of a square equal in area to a rectangle with sides ∆x_i and ∆t_i, i.e., the bicoordinate deviation is the geometric mean of these deviations. The bicoordinate root mean square deviation is not the only possible criterion for the stability of the optimization results (optimization by area deviations was successfully applied before), and it cannot be argued that this criterion is the best. However, it allows one to obtain stable results and is adequate for the available computing resources. For greater sensitivity, optimization is performed according to the maximum of the regularity coefficient (the inverse of the bicoordinate deviation): K_reg = 1/∆_xt.
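For the particular case of a straight-line trend x = c₀ + c₁·t, the criterion can be sketched as follows (an illustrative simplification; the actual algorithm compares several regression variants in ordinary and logarithmic coordinates, and the function names are ours):

```python
def bicoordinate_rms(ts, xs, c0, c1):
    """Bicoordinate rms deviation of points (t_i, x_i) from the line x = c0 + c1*t,
    with deviations normalised by the factual data ranges, as described in the text."""
    n = len(ts)
    t_rng = max(ts) - min(ts)
    x_rng = max(xs) - min(xs)
    s = 0.0
    for t, x in zip(ts, xs):
        dx = abs(x - (c0 + c1 * t))      # deviation along the ordinate
        dt = abs(t - (x - c0) / c1)      # deviation along the abscissa
        s += dx * dt
    return (s / (n * x_rng * t_rng)) ** 0.5

def regularity(ts, xs, c0, c1):
    """Regularity coefficient K_reg = 1 / delta_xt (undefined for a perfect fit)."""
    return 1.0 / bicoordinate_rms(ts, xs, c0, c1)
```

For a near-perfect linear data set the deviation is small and K_reg is correspondingly large, which is why the algorithm below discards trends with K_reg < 10.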

The Processing of Seismic Catalogs
A spatial analysis of the seismic data was carried out using spherical hypocentral samples with radii of 7.5, 15, 30, 60, 150 and 300 km. For each radius, the sample centers form a fixed grid over the entire surface of the Earth and at depths down to 1000 km. Within this grid, the sample centers are distributed by latitude, longitude and depth with an offset step 1.5 times smaller than the sample radius (i.e., 5, 10, 20, 40, 100 and 200 km, respectively), which provides spatial overlap of the samples. Catalog processing is executed automatically for each radius. Each event in the analyzed catalog is successively treated as the 'current' event (earthquake). The moment of this event is taken as the 'present'; the time preceding it is considered the 'past', and the subsequent time the 'future'. To analyze the seismicity preceding and following the 'current' event, the spherical sample whose center is closest to the hypocenter of the 'current' earthquake is used. Within this sample, an array of data on the studied flow parameter is formed (in our case, the flux of seismic energy, i.e., the cumulative energy of earthquakes).
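A simplified sketch of forming the data array for one spherical sample is shown below. For brevity it uses Cartesian hypocenter coordinates in km, whereas the actual processing works on a geographic grid; all names are illustrative:

```python
import math

def cumulative_energy_series(events, center, radius_km):
    """Select events inside a spherical hypocentral sample of the given radius
    and return their cumulative seismic-energy series ordered by time.
    `events` are (t, x, y, z, M) tuples with hypocentres in Cartesian km
    (a simplification of the paper's geographic grid).
    Single-event energy from magnitude via lg E = 1.5*M + 4.8 (E in joules)."""
    cx, cy, cz = center
    inside = [e for e in events
              if math.dist((e[1], e[2], e[3]), (cx, cy, cz)) <= radius_km]
    inside.sort(key=lambda e: e[0])  # chronological order
    ts, cum = [], []
    total = 0.0
    for t, _, _, _, m in inside:
        total += 10.0 ** (1.5 * m + 4.8)
        ts.append(t)
        cum.append(total)
    return ts, cum
```

The resulting monotone step series (t, E_cum) is the parameter x to which the test approximations described next are applied.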
The main trends of the 'current' seismicity are revealed by test approximations, searching the factual data for the intervals that demonstrate the best characteristics during optimization. The first test approximation is performed on the factual data represented by the 'current' earthquake and the 6 preceding ones. In subsequent test approximations, the nearest event from the 'past' is added to the approximated factual data until the first event in the sample is included. All approximations with K_reg < 10 are ignored. From all the test approximations, the three best variants are selected: the first is based on the maximum of K_reg, and the rest on the nearest and main maxima of the K_reg/K_lin ratio. The first variant is always determined, the rest depending on the presence and combination of current nonlinear trends. The nonlinearity variants make it possible to track new development trends that begin (and are therefore still weakly expressed) against the background of the main trends.

The Estimation of Extrapolation Predictability
The term 'trend predictability' is defined here as the finding of the factual data of the 'future' within the band of acceptable errors relative to the calculated curve in its extrapolation part. To estimate the trend predictability, the rms deviation σ of the factual points (t_f, x_f) from the calculated curve along the normal to it is used. It is calculated on the approximation section of the trend in coordinates normalized to a range from 0 to 1 from the first point (t_s, x_s) to the last point (t_c, x_c):

σ = {Σ[a_i·b_i/c_i]²/n}^0.5, (3)

where a_i and b_i are the deviations of the i-th factual point from the calculated curve along the abscissa and ordinate, respectively, and c_i = (a_i² + b_i²)^0.5. Equation (3) is obtained on the basis of elementary geometric constructions (Figure 1), in which the shortest distance from the factual point (t_f, x_f) to the calculated curve is estimated in the first approximation as the height h of the right triangle (t_r, x_f)-(t_f, x_f)-(t_f, x_r), dropped onto the hypotenuse (t_r, x_f)-(t_f, x_r) from the opposite vertex, which is the factual point (t_f, x_f): h = ab/c. As a result, Equation (3) is the expression for the root mean square value of these distances. Further, the approximation is extrapolated into the 'future' as long as the distance of each subsequent (predicted) factual point (t_p, x_p) from the calculated curve remains within the band of permissible errors ±3σ, i.e., as long as the following relation is fulfilled:

a_p·b_p/c_p ≤ 3σ. (4)

The width of the error band here is determined by the statistical '3 sigma' rule, according to which 99.73% of the results fall into such an error band in the case of a normal distribution.
Since relation (4) uses normalization to the range from 0 to 1 from the first point (t_s, x_s) to the point being tested (t_p, x_p), the rms deviation is first recalculated to the same normalization range; that is, σ is recalculated on the approximation section of the trend according to Equation (3) with t_c and x_c replaced by t_p and x_p.
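A sketch of the predictability test for a straight-line trend, combining the reconstructed Equation (3) with the ±3σ acceptance rule (an illustrative simplification of the procedure described above; names are ours):

```python
import math

def sigma_normal(ts, xs, c0, c1):
    """RMS of the shortest distances h = a*b/c from factual points to the line
    x = c0 + c1*t, with coordinates normalised to [0, 1] over the approximation
    section from (t_s, x_s) to (t_c, x_c), as in Equation (3)."""
    t_s, t_c = ts[0], ts[-1]
    x_s, x_c = xs[0], xs[-1]
    s = 0.0
    for t, x in zip(ts, xs):
        tn = (t - t_s) / (t_c - t_s)
        xn = (x - x_s) / (x_c - x_s)
        # the same line rewritten in normalised coordinates
        c1n = c1 * (t_c - t_s) / (x_c - x_s)
        c0n = (c0 + c1 * t_s - x_s) / (x_c - x_s)
        b = abs(xn - (c0n + c1n * tn))   # deviation along the ordinate
        a = b / abs(c1n)                 # deviation along the abscissa
        c = math.hypot(a, b)
        if c > 0:
            s += (a * b / c) ** 2
    return (s / len(ts)) ** 0.5

def in_error_band(h_p, sigma):
    """3-sigma acceptance test for a predicted point's normal distance h_p."""
    return h_p <= 3.0 * sigma
```

In the full procedure, σ is recomputed with the tested point (t_p, x_p) in place of (t_c, x_c) before each acceptance check.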

Quantitative Estimates in Prediction Precedents for Strong Earthquakes
The relative accuracy ∆ of the precedent predictions is estimated by Equation (5). The following classification of the accuracy of the retrospective predictions is used in this work: quantitative estimates at ∆ ≥ 5 (relative error < 20%), semi-quantitative estimates at 2 ≤ ∆ < 5 (relative error 20-50%) and qualitative estimates at ∆ < 2 (relative error > 50%). Here, the relative error is the inverse of the relative accuracy.
The predicted nonlinearity L_pn is calculated by Equation (6). Factual values (x_sh_f, t_sh_f) are used in the retrospective estimates. When predicting by precedent, the calculated values (x_sh_i, t_sh_i) for a strong earthquake are used in Equation (6) instead.
In essence, the predicted nonlinearity L_pn corresponds to the sign of the coefficient k in the DSDNP equation, expressed not as an integer but in real-valued terms. For extremely nonlinear activation sequences, parameter predictability dominates over time predictability, so the L_pn value is close to 1. As the ratio between the predictability of the trend in terms of parameter and time evens out, the L_pn value decreases, reaching 0 for sequences close to stationary development. A further shift of this ratio leads to a progressively increasing dominance of time predictability over parameter predictability, which corresponds to attenuation sequences. L_pn values tend to −1 for extremely nonlinear attenuation sequences, in which the time predictability significantly exceeds the parameter predictability. Thus, for the activation sequences considered here (foreshock sequences in the case of the completion of activation by a strong earthquake), the L_pn value varies from 0 to 1. As will be shown below using the example of retro-forecasts, the predicted nonlinearity L_pn determines the asymmetry of the band of permissible deviations and thereby reflects the stochasticity/determinism of the position (fluctuations) of the main shock within the predicted trend.
The approximation-extrapolation coefficient A shows how many times the general activation trend exceeds its approximation part. This coefficient is calculated in the coordinates of the full trend, normalized to a range from 0 to 1, and is used to estimate the limit of possible extrapolations. In general, the value of A can be determined as the ratio of the lengths of the corresponding sections of the calculated curve; however, in the case of stepped cumulative characteristics of the seismic flow (energy, Benioff strain or the number of events), the simpler and quite effective Equation (7) is used. When predicting by a precedent, the calculated values (x_sh_i, t_sh_i) for a strong earthquake are again used in Equation (7) instead of the factual values (x_sh_f, t_sh_f).

The Real-Time Predictive Estimation Algorithm
The use of precedents from retro-forecast data is based on the possibility of linking the time t_sh_f of the main earthquake to the rate ẋ_sh of change of the parameter at the point of the extrapolation curve closest to the main earthquake. In retrospective studies, this point is determined in parameter-time coordinates normalized to a range from 0 to 1 by the factual values from point (t_s, x_s) to point (t_sh_f, x_sh_f). If the factual point of a strong earthquake is located within the region of existence of the extrapolation curve (t_sh_f < T_a at α > 1 and x_sh_f < X_a at α > 2), then the distances from the factual point of a strong earthquake to the extrapolation curve along the abscissa (a) and ordinate (b) are determined. Geometrically, these distances correspond to the catheti of a right triangle with the vertex of the right angle at the point (t_sh_f, x_sh_f) (see Figure 1, with the point (t_sh_f, x_sh_f) in place of the factual point). The position of the point (t_sh_i, x_sh_i) is then approximately defined as the intersection of the hypotenuse with the perpendicular dropped onto it from the vertex of the right angle. Based on the proportions existing in a right triangle, the coordinates of this point are determined, and the rate ẋ_sh is calculated from them. If one of the coordinates of the factual point of a strong earthquake goes beyond the region of existence of the extrapolation curve, then the nearest point of the calculated curve is determined by the second coordinate (abscissa or ordinate, respectively): x_sh_i = x(t_sh_f) and t_sh_i = t_sh_f, or t_sh_i = t(x_sh_f) and x_sh_i = x_sh_f. After that, the rate ẋ_sh is calculated according to the same formulas. If both coordinates go beyond the limits of existence of the extrapolation curve (t_sh_f ≥ T_a and x_sh_f ≥ X_a at α > 2), the asymptotic point (T_a, X_a) turns out to be the closest to the earthquake; therefore, an extremely large value is conditionally assumed as the rate ẋ_sh.
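The right-triangle construction for the point of the curve nearest to the factual point of a strong earthquake can be sketched as follows. The altitude-foot formulas follow from the stated proportions of a right triangle; the function name is ours:

```python
def nearest_curve_point(t_f, x_f, t_r, x_r):
    """Approximate the point of the calculated curve nearest to the factual
    point (t_f, x_f), given the curve points (t_r, x_f) and (t_f, x_r) that
    share the factual point's ordinate and abscissa (the Figure 1 geometry).
    The foot of the altitude dropped from the right-angle vertex (t_f, x_f)
    onto the hypotenuse is returned."""
    a = t_f - t_r              # signed distance to the curve along the abscissa
    b = x_f - x_r              # signed distance to the curve along the ordinate
    c2 = a * a + b * b         # squared hypotenuse
    t_i = t_f - a * b * b / c2
    x_i = x_f - a * a * b / c2
    return t_i, x_i
```

For an isosceles right triangle with legs of length 1 the foot of the altitude lies at the midpoint of the hypotenuse, which gives a quick check of the formulas.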
The regularities of the precedent foreshock preparation of strong (M7+) earthquakes make it possible to identify similar trends of increasing seismic activity. Predictive estimates of the hazard of these trends assume the use of the data of precedent retrospective predictions and are based on the possibility of tying the time of the main earthquake to the rate of change ẋ_sh of the parameter at the point of the extrapolation curve closest to the main earthquake. For this purpose, a database of precedent retrospective predictions is created. This database includes the hypocentral radius of the sample, α, k and ẋ_sh, as well as information about the strong precedent earthquake itself (magnitude, time and place).
The essence of the precedent-based extrapolation assessment of seismic hazard is to identify potentially hazardous spatial zones in which the increase in seismic activity has historical precedents of ending with a strong earthquake. The quantitative aspect of the forecast is the calculation of the possible time of a similar earthquake based on the database of precedent retrospective predictions. For the activity in each spatial zone, analogs are possible in the preparation of several strong precedent earthquakes. In this case, calculations of the possible time of a similar earthquake in the considered spatial region are performed for each of the precedents.
The prediction extrapolation algorithm provides for the following operations:

1. Search in the catalog for unfinished prediction definitions (those that have not left the band of admissible errors by the time of the catalog end) in which a tendency towards an increase in seismic activity is found.

2. Comparison of the type of activity increase with the database of precedent retrospective predictions by the sample radius, exponent α (with an accuracy of 0.01) and coefficient k (comparing lg k with an accuracy of 0.1). All cases of an activity increase that have no analogs in the database of precedent retrospective predictions are ignored.

3. For each precedent retrospective prediction, based on the rate of change ẋ_sh of the parameter, the time t_sh and the value of the parameter x_sh are calculated at which, for the given type of activity increase, its level will correspond to the level of the precedent shock. Then, the values of L_pn, A and σ_t are estimated. Definitions for which the approximation-extrapolation ratio A exceeds the maximum value A_max for the retro-forecast precedents are considered to have no precedents.

4. The revealed precedent retrospective predictions are grouped by the main shock. For each group, the calculated average time of the strong earthquake and its standard deviation σ_sh, as well as the average values of L_pn, A and σ_t, are determined.
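Steps 2 and 4 of the algorithm can be sketched as follows. The dictionary field names and the precomputed t_sh values are illustrative assumptions, not the paper's data structures:

```python
import math
from collections import defaultdict
from statistics import mean, pstdev

def match_and_group(active_trends, precedent_db, alpha_tol=0.01, lgk_tol=0.1):
    """Match unfinished activation trends against the database of precedent
    retro-forecasts by sample radius, alpha (to within 0.01) and lg k (to
    within 0.1), then group the matches by precedent main shock and average
    the predicted times t_sh (step 3 is assumed to have been done already)."""
    groups = defaultdict(list)
    for trend in active_trends:
        for p in precedent_db:
            if (trend["radius"] == p["radius"]
                    and abs(trend["alpha"] - p["alpha"]) <= alpha_tol
                    and abs(math.log10(trend["k"]) - math.log10(p["k"])) <= lgk_tol):
                # t_sh: time at which this trend reaches the precedent's level
                groups[p["shock_id"]].append(trend["t_sh"])
    return {shock: (mean(times), pstdev(times) if len(times) > 1 else 0.0)
            for shock, times in groups.items()}
```

The mean and standard deviation per group correspond to the calculated average time of the strong earthquake and σ_sh in step 4.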

Initial Data
As the initial data for the precedent-based extrapolation estimate of the M8+ earthquake hazard, this work uses the worldwide United States Geological Survey (USGS) earthquake catalog [26], which includes data from 1900 to the end of May 2021. By this time, the catalog contained data on 3,889,120 earthquakes with magnitudes M from −1.0 to +9.5 and a modal value of 1.2. The results of processing the Japan Meteorological Agency (JMA, 1919-present) earthquake catalog [27] and the Earthquakes Catalogue for Kamchatka and the Commander Islands (ECKCI, 1962-present) [28] were also used to analyze foreshock predictability and to form the database of precedent retro-forecasts. By the end of August 2018, the JMA catalog contained data on 3,498,071 earthquakes with magnitudes M from −1.6 to +9.0 and a modal value of 0.6. By the end of March 2021, the ECKCI catalog contained data on 428,225 earthquakes with magnitudes M from −2.7 to +8.1 and a modal value of 0.5. The seismic energy flux E is considered as the parameter x, i.e., the cumulative energy of earthquakes. The energy of a single earthquake is estimated from the existing relationship between its magnitude M and energy class K [29]: K = lg E = 1.5M + 4.8, where the energy E is in joules. When processing the catalogs, all earthquakes for which energy characteristics (M or K) are available are used, regardless of their magnitude. Our research does not require the completeness of the seismic catalogs and does not depend on it: we study the flow of seismic energy as it is (where it is recorded and how it is recorded). This is of great importance for this work, since the USGS world catalog is compiled from many regional catalogs and is therefore extremely heterogeneous in completeness, both in space and in time.
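The magnitude-to-energy conversion and the resulting cumulative flux can be sketched as follows (a minimal illustration of the stated relationship; function names are ours):

```python
def energy_joules(magnitude):
    """Single-event energy from magnitude via the energy class
    K = lg E = 1.5*M + 4.8 (E in joules)."""
    return 10.0 ** (1.5 * magnitude + 4.8)

def energy_flux(magnitudes):
    """Cumulative seismic energy (the parameter x) over a magnitude sequence."""
    total, out = 0.0, []
    for m in magnitudes:
        total += energy_joules(m)
        out.append(total)
    return out
```

Note that an increase of two magnitude units multiplies the single-event energy by a factor of 10^3, which is why the cumulative flux is dominated by the largest events in a sample.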

Analysis of Retro-Precedents of Foreshock Forecasting
As a result of processing the ECKCI, JMA and USGS data, it was found that a predictable activation of the seismic energy flow precedes 721 out of 2082 earthquakes with M ≥ 7 (8 out of 17 for the ECKCI, 123 out of 676 for the JMA and 590 out of 1389 for the USGS). It was found that 116,700 precedents of these earthquakes (1553 ECKCI, 87,474 JMA and 27,673 USGS) fall into the band of acceptable errors when foreshock trends are retrospectively extrapolated into the 'future'. In particular, one of the most powerful earthquakes of this number, the Tohoku earthquake of 11 March 2011 (M = 9.0 JMA and M = 9.1 USGS), had 189 precedents (150 JMA and 39 USGS) of falling into the band of retro-forecast extrapolations. The significant number of strong earthquakes without predictable preparation is, in our opinion, due to the absence or insufficient level of seismicity registration in the period preceding these earthquakes.

The Relationship between the Predicted Nonlinearity of the Seismic Energy Flow and the Determinism of Strong Earthquakes
Examples of retrospective prediction extrapolations with a sufficiently high relative accuracy ∆ and significant differences in predicted nonlinearity L_pn are shown in Figure A2. Table A1 contains data on the main shock, the sample and some characteristics of the foreshock trend corresponding to these examples. It can be seen in Figure A2 that, for a low predicted nonlinearity L_pn (Figure A2a), the position of the strong earthquake's step in the energy flux is weakly determined by the band of admissible deviations. A strong earthquake in this band could have occurred both much earlier and much later than its factual time, i.e., the stochasticity of the time of a strong shock increases as the predicted nonlinearity decreases. The latter is typical for activation sequences close to stationary development. On the contrary, with an increase in the predicted nonlinearity (Figure A2b-f), the asymmetry of the band of admissible deviations in parameter and time increases. As a result, the step of a strong earthquake, requiring a large admissible deviation in the parameter, is more and more rigidly determined by a narrowing of the time interval in which the strong shock can occur. This variability of the stochasticity/determinism of the process, depending on the level of predicted nonlinearity L_pn, has to be taken into account in predictive extrapolation calculations of the time of strong earthquakes.

Statistics of Lead Time, Relative Accuracy and Approximation-Extrapolation Ratio in Retro-Forecasts of Strong Earthquakes
In the distribution of the retro-forecast precedents according to their lead time (Table A2), what first attracts attention is the high proportion of retro-forecasts separated from the main shock by large time intervals. Almost 20% of the predictions have a lead time of 3 to 10-15 years or more (e.g., Figure A2b,d). Almost 43% of the retro-forecasts have a lead time of 3 months to 3 years (e.g., Figure A2c,e).
Retro-forecasts of strong earthquakes have, on average, a semi-quantitative level of relative accuracy (average ∆ = 3.26, Table A3). However, a lead time of 3 years or more raises the average relative accuracy of forecasting to a quantitative level (average ∆ = 5.69 for (t_sh_f − t_c) > 1000 days; see Table A3). There is also a general tendency toward a further increase in the average relative accuracy for the most deterministic earthquakes (those with the highest level of predicted nonlinearity L_pn).
With a decrease in the lead time, the average relative accuracy of retro-forecasts decreases until the transition to a qualitative level at (t sh_f − t c ) < 10 days. Nevertheless, against the general background of this decline, retro-forecasts with a quantitative level of accuracy are noted at all time ranges. Thus, the distribution of retro-forecast definitions by their lead time and accuracy indicates the possibility of, at least, medium long-term quantitative predictions of strong earthquakes with the prospect of quantitative forecasting at all ranges of lead time.
When switching to predictive extrapolations in real time, it is necessary to take into account that almost all activation trends of the energy flow can, under unlimited extrapolation, reach a level sufficient for a strong earthquake. However, the real possibilities of predictive extrapolation are limited. As can be seen in Table A4, the approximation-extrapolation ratio in foreshock retro-forecasts does not exceed A_max = 2.50. It follows that, for sequences with weakly expressed predictive nonlinearity (and, accordingly, approximately proportional increments of the parameter and time in the predictive part), the time extrapolation cannot exceed the duration of the approximation component of the trend by more than 1.5 times. For extremely nonlinear sequences, in which almost all the increments of the parameter fall on the forecast part of the trend, the extrapolation possibilities in time are reduced further and are limited to 20% of the duration of the approximation component of the foreshock trend. In addition, the data in Table A4 indicate that both the maximum and average values of the approximation-extrapolation ratio A tend to increase with increasing lead time and predictive nonlinearity of the retro-forecasts.
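This screening rule can be sketched in code. A minimal sketch under an explicit assumption: the text does not define A term by term, so here A is taken as the mean of the relative parameter and time spans of the full trend versus its approximation part (mirroring the structure of the weight component W1 given in Appendix A); the function name and argument layout are ours.

```python
def extrapolation_admissible(x_s, x_c, x_p, t_s, t_c, t_p, a_max=2.50):
    """Check whether a predictive extrapolation stays within the empirical
    approximation-extrapolation limit A <= A_max.

    Assumption: A is the mean of the relative parameter and time spans of
    the full trend versus its approximation part. Indices: s = trend start,
    c = 'current' time, p = predicted (precedent) point.
    """
    ratio_x = (x_p - x_s) / (x_c - x_s)   # parameter span ratio
    ratio_t = (t_p - t_s) / (t_c - t_s)   # time span ratio
    a = (ratio_x + ratio_t) / 2.0
    return a <= a_max

# Near-proportional increments: a time ratio of 2.5 means the extrapolated
# interval is 1.5x the approximation duration -- still admissible.
print(extrapolation_admissible(0, 10, 25, 0, 100, 250))   # True
# Extreme nonlinearity: a large parameter ratio forces a short time
# extrapolation; the same long time extrapolation is rejected.
print(extrapolation_admissible(0, 1, 4, 0, 100, 250))     # False
```

Under this assumed form of A, the two limiting cases quoted above (1.5 times the approximation duration for near-proportional trends, about 20% for extremely nonlinear ones) both follow from the single bound A ≤ 2.50.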

Precedent-Based Extrapolation Estimation of Seismic Hazard in Retrospect
When assessing the seismic hazard, data from M8+ earthquake retro-forecasts with a sufficiently high level of predictive nonlinearity, L_pn ≥ 0.9, are used (Table 1). The seven strong earthquakes used as precedents according to the USGS catalog have 597 precedent forecasts. This corresponds to 20% of the number of precedent earthquakes and 79.5% of the number of precedent forecasts obtained from processing the USGS catalog. As can be seen in Figures A3, A5, A7, A9, A11, A13 and A15, column (a), the predictability of each of these earthquakes is noted simultaneously in several zones, which form a spatial cluster. However, similar activity is observed not only in the earthquake preparation zone, where it is maximal, but also outside it (see column (b) in the figures listed above). Therefore, the cluster of activity is identified from the zone with the largest number of unfinished potentially hazardous extrapolations with similar precedent activity. Together with this zone of maximum activity, all the nearest zones with similar activity that intersect the maximum zone are combined into a cluster. Two cluster zones are considered to intersect here when the distance between their hypocenters is less than the radius of the larger zone.
The cluster center (and the hypocenter of a possible earthquake) is calculated as the weighted average (by the number of potentially hazardous extrapolations) of the centers of all cluster zones. Similarly, the boundaries of the cluster ellipsoid in latitude, longitude and depth are calculated from the hypocentral radii of the zones included in the cluster. The results of the allocation of cluster zones and the determination of the cluster ellipsoid and the estimated position of the hypocenter of a possible earthquake are shown in column (c) of the figures listed above. The errors in determining the hypocenter, D_err (the distance between its factual and calculated positions), are given in Tables A5-A11. In all cases, the actual hypocenter falls within the calculated ellipsoid of cluster activity. This is the expected result, corresponding to the successful testing of the cluster algorithms for earthquakes on their own precedent forecasts. However, in a number of cases, when testing predictability on the data of one earthquake, similar foreshock preparation and cluster predictability for other earthquakes were found. In particular, when testing predictability on the example of the earthquake of 16 November 2000 (M8.0), a predictability cluster with a similar preparation for the earthquake of 1 April 2007 (M8.1) was additionally detected in all hazard estimates (see Figure A3); when testing on the example of the earthquake of 6 February 2013 (M8.0), the predictability clusters of three more earthquakes (9 January 2001, M7.1; 1 April 2007, M8.1; and 10 August 2010, M7.3) were also detected (see Figure A11). These facts can be considered an additional justification for the possibility of applying precedent-based extrapolation estimates in real time.
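The cluster-building steps described above (intersection test, seed zone, weighted-average hypocenter) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the dict-based zone representation, Cartesian hypocenter coordinates in kilometers, and the function names are our assumptions.

```python
import math

def zones_intersect(z1, z2):
    """Two zones intersect when the distance between their hypocenters is
    less than the radius of the larger zone (the criterion in the text).
    A zone is a dict with 'xyz' (hypocenter, km, Cartesian -- a
    simplifying assumption), 'radius' (hypocentral radius, km) and 'n'
    (number of potentially hazardous unfinished extrapolations)."""
    d = math.dist(z1["xyz"], z2["xyz"])
    return d < max(z1["radius"], z2["radius"])

def build_cluster(zones):
    """Seed the cluster with the zone holding the most hazardous
    extrapolations, add every zone intersecting the seed, and return the
    members together with the n-weighted average hypocenter, which serves
    as the estimated hypocenter of a possible earthquake."""
    seed = max(zones, key=lambda z: z["n"])
    members = [z for z in zones if z is seed or zones_intersect(seed, z)]
    total = sum(z["n"] for z in members)
    center = tuple(sum(z["n"] * z["xyz"][i] for z in members) / total
                   for i in range(3))
    return members, center

# Hypothetical zones: the third lies far from the seed and is excluded.
zones = [
    {"xyz": (0.0, 0.0, 30.0),   "radius": 100.0, "n": 40},
    {"xyz": (60.0, 0.0, 40.0),  "radius": 50.0,  "n": 20},
    {"xyz": (500.0, 0.0, 20.0), "radius": 80.0,  "n": 10},
]
members, center = build_cluster(zones)
print(len(members), center)
```

The cluster ellipsoid would be derived analogously by propagating the hypocentral radii of the member zones along each axis; it is omitted here for brevity.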
In a number of cases, cluster determinations of the time of a possible shock demonstrate a quantitative accuracy of the forecast: the relative error of the estimated time of the earthquake of 11 March 2011 (M9.1) is 2.58% in the estimate of 1 January 2009 (see Table A8); for the earthquake of 6 February 2013 (M8.0), it is 2-4% in the estimates of 2007-2010 and 2012 (see Table A9), etc. However, these accurate forecasts are often adjacent to, or interspersed with, semi-quantitative and qualitative estimates. Moreover, all the retro-forecast estimates for the time of the earthquake of 13 January 2007 (M8.1) lie in the 'past', although the earthquake itself occurs in the 'future' (see Table A6). This is nonetheless a correct location of the estimated time within the band of permissible deviations. Since the relative accuracy of the precedent calculations is unstable, and its real value can so far be estimated only in the future from the actual result, we are forced to assume the worst level of accuracy, i.e., the qualitative relative accuracy of precedent estimates of the time of a possible earthquake. Therefore, we can consider the formation of a cluster of potentially hazardous activity only as a precursor of a possible strong earthquake. Accordingly, all retro-forecasts of the time of a possible strong earthquake with a quantitative level of accuracy can, at this stage of research, be considered only random coincidences.

Precedent-Based Extrapolation Estimation of Seismic Hazard in Real-Time
We made four global hazard estimates based on the USGS catalog (Figure A17): as of 1 January 2020, 1 July 2020, 1 January 2021 and 1 June 2021. The first three were intended for control testing and tracking changes in the global hazard, and the last one was a real-time estimate at the end of the processed catalog. The test period covered 1 year and 5 months. During this period, 17 M7+ earthquakes occurred in the world, including one M8+ earthquake. Of this number, seven (41%) M7+ earthquakes (including the single M8+ earthquake) occurred in the identified clusters of hazardous activity and within the error range of the predicted trends (full predictability, Table A12; see also the black circles with a crosshair in Figure A17). Another seven (41%) M7+ earthquakes have partial predictability: they occurred in the identified clusters of activity (see the green circles in Figure A17). We believe that the ratios of 41%, 41% and 8% between full predictability, partial predictability and its absence are a sufficient basis for starting to test precedent-based extrapolation estimations of seismic hazards in real time. The main clusters of hazardous activity as of 1 June 2021 are listed in Table A13. The table includes the precedent clusters for which more than 20 potentially hazardous extrapolations have been found; that is, Table A13 contains information about the brightest clusters of precedent activity among those shown in Figure A17d. Limited by the scope of the article, we are unable to comment on the data in this table in detail; it is therefore given as an example. Some features of the results contained in this table will be discussed in the next section.

Discussion
The results obtained confirm the conclusions previously made [21][22][23] about the good predictability of the seismic energy flow when using the DSDNP equation as a model and about the prospects of using this equation for the prediction of strong earthquakes. However, in order to avoid misunderstandings, it is necessary here to specify the terminology used, as well as to return to the topic, touched upon in the Introduction, of the natural limitations of the DSDNP equation (and its analogs) in the prediction of strong earthquakes. A forecast is usually understood as determining the location, strength (magnitude), time and probability of occurrence of future earthquakes. An attempt to detail this definition is contained in the overview report by the International Commission on Earthquake Forecasting for Civil Protection [30] (p. 319): "A prediction is defined as a deterministic statement that a future earthquake will or will not occur in a particular geographic region, time window, and magnitude range, whereas a forecast gives a probability (greater than zero but less than one) that such an event will occur." The DSDNP equation, like its analogs, has natural limitations that take it beyond the above definitions of forecasts, both unambiguous and probabilistic. Firstly, using this equation, it is possible to successfully identify and predict trends in seismic activity but not fluctuation deviations from these trends. The trend forecast only allows one to determine the band of permissible deviations around it. Secondly, there is a 'magnitude uncertainty' in the forecasts of seismic trends in the energy flux: the same increment of seismic energy can be realized through a single M8 earthquake, 32 M7 earthquakes or one thousand M6 earthquakes. All these alternatives are equivalent from the point of view of trend predictability, provided they remain within the band of permissible deviations.
It is impossible to determine in advance exactly how the predicted potential of the seismic energy flow will be realized (as a single strong earthquake or a swarm of moderate ones). Thirdly, neither unambiguous nor probabilistic methods are applicable to the predictability of the trend itself. From the point of view of the dynamics of self-developing natural processes [4], the formation of activation trends in accordance with Equations (1) and (2) is due to the system's exit from a state of equilibrium (of stationary development) that has become unstable and the subsequent transition of the system to a new state of equilibrium (of stationary development). The change of mode from activation to attenuation in each specific case is determined by the internal state of the system, which cannot be estimated either by unambiguous or by probabilistic methods. It is only possible to track potentially hazardous trends, assessing the growth of possible threats or recording their termination after the fact.
The calculation of the precedent time by Equation (8) is not a real forecast. It is only the formal determination of a reference point in the increasing flow of seismic energy, in the vicinity (tolerance band) of which, in the case of a similar development of the process in accordance with Equation (2), a strong earthquake was once already registered. This allowed us to conclude that this process of increasing activity is potentially hazardous and to assess the possible time frame of this hazard. However, the probability of repeating the characteristics of a precedent earthquake in this process is negligible. If a strong earthquake occurs in a potentially hazardous cycle of increasing activity, it will create its own precedent with its magnitude, with its deviations (although permissible) from the calculated trends in parameter and time and, therefore, with its calculated level of activity at the time of the earthquake.
The described method is limited by the 'non-repeatability' of precedent earthquakes, the inability to determine the magnitude range for the future mainshock and the unpredictable completion of hazardous trends (often long before hazardous levels of activity). All this takes the described method beyond both unambiguous and probabilistic earthquake forecasts. Therefore, the only purpose of this method is to identify areas of potentially hazardous increases in the flux of seismic energy and their subsequent monitoring. In fact, precedent-based extrapolation estimates can be considered as a preparatory stage for future real earthquake forecasts. Nevertheless, it may seem that precedent-based extrapolation estimates fall in the category of seismic pattern methods. However, as we have already indicated in the introduction, the forecast of seismic trends cannot be correctly used to predict the time and magnitude of random deviations from these trends (actually the mainshocks). Equating our method with earthquake prediction methods from the category of seismic patterns in order to compare their effectiveness, in our opinion, is doubly wrong.
That is why we do not consider it possible to compare our precedent-based extrapolation estimates of the danger of seismic trends with existing earthquake forecasting methods [30].
Maps of the distribution of precedent clusters are convenient for monitoring spatial hazards (see Figure A17). The 'reddening' of the cluster ellipsoid shows an increase in the number of potentially hazardous trends in its composition. In particular, on the maps of Figure A17, it can be seen that, after two M7+ earthquakes, the hazard of stronger earthquakes (M8+) in the area of the Aleutian Islands has sharply increased. A decrease in 'redness' indicates a decrease in hazards, i.e., a decrease in the number of potentially hazardous trends due to exceeding the limits of permissible deviations or the limit of possible extrapolations (A > A max ).
Monitoring the hazard over time is complicated by the qualitative level of accuracy of the forecast extrapolations. Updating the database with additional information on corrections for the differences between the actual and calculated values of the parameter and time at the moment of the earthquake would significantly increase the accuracy of determining the earthquake time when testing on its own predictive precedents (see Tables A5-A11). However, the overall accuracy of the time estimates would most likely not change, since the low accuracy of extrapolating the flow of seismic energy is ultimately controlled by the large spread of the actual data for this parameter. Benioff strains show a more ordered dependence on time; therefore, studying them looks more promising in terms of the accuracy of the forecast in time.
In the estimates of the precedent time, there is another problem that remains, within the framework of this article, not only unresolved but not even fully disclosed: the multivariance of the precedent trends in a computational cluster. As an example, it is enough to note the large σ_sh values, especially against the background of low σ_t values. In particular, for cluster 17, the standard deviation of the estimated earthquake time (σ_sh) is 4.5 years, while the expected average time deviation of the actual data from the calculated curve (σ_t) is only about one day. This indicates the presence of several alternatives (or 'fans' of them) for the further development of the hazard, with significant discrepancies between them. Clusters with similarly large σ_sh values constitute a significant part of the total number, which calls for additional research to solve this problem.
Concluding the discussion, it should be noted that, in its current state, the methodology of precedent-based extrapolation estimates is a primary (research) variant that needs further adaptation, debugging, and the identification and elimination of minor shortcomings and (possibly) errors. Above, we showed the possibility of its practical use in predictive research both in retrospect and in real time. However, a long process of testing, improvement and debugging awaits this technique.

Figure A1 (caption, fragment) [24]. M is the magnitude of predicted strong earthquakes. Figure A1 is constructed in Reference [24] based on the analysis of over 30 million identified best variants of 'current' seismicity trends (see Section 2.3). The tendency of increasing activity in each sequence is checked for extrapolability (predictability). A weight W is assigned to each extrapolation: W = (W_1 × W_2 × W_3)^(1/3), the geometric mean of three independent weight characteristics. The first weight component characterizes the forecast range: W_1 = [(x_p − x_s)/(x_c − x_s) + (t_p − t_s)/(t_c − t_s)]/2 − 1. The second characterizes the nonlinearity of the forecast: W_2 = exp|ln[((x_p − x_s)/(x_c − x_s))/((t_p − t_s)/(t_c − t_s))]| − 1. The third characterizes the quality of the forecast, i.e., the correspondence of the factual data to the calculated approximation-extrapolation curve: W_3 = 0.1/σ − 1; all predictive extrapolations with σ > 0.1 are considered low-quality and rejected.
The analysis of the weight distribution of the predictive extrapolations over the parameters α and k of the DSDNP equation is carried out with α rounded to an accuracy of 0.01 and lg k to an accuracy of 0.1. The weights of the forecast extrapolations having the same rounded values of α and lg k are summed and divided by the total weight of all the forecast extrapolations. These specific weights for all available combinations of α and lg k are then sorted and accumulated from the smaller values to the larger ones. Thus, we obtain a cumulative characteristic of the distribution of the specific weight w_p of the predictive extrapolations over the combinations of α and lg k: w_p(α, lg k) = Σ_{α, lg k} (ΣW(α, lg k)/ΣW). w_p has values close to 1 at the points of the maximum specific weight of the forecast extrapolations and close to 0 for combinations of α and lg k that are insignificant from the point of view of the forecast.
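The binning and accumulation procedure above can be sketched as follows. This is a minimal sketch: the list-of-tuples input format and the function name are our assumptions, not the authors' implementation.

```python
from collections import defaultdict

def cumulative_specific_weight(extrapolations):
    """Sketch of the w_p(alpha, lg k) construction described above:
    round alpha to 0.01 and lg k to 0.1, sum the weights W per rounded
    (alpha, lg k) cell, normalize by the total weight, then accumulate
    the specific weights from the smallest cell to the largest.
    `extrapolations` is a list of (alpha, lg_k, W) tuples
    (hypothetical input format)."""
    cells = defaultdict(float)
    for alpha, lg_k, w in extrapolations:
        cells[(round(alpha, 2), round(lg_k, 1))] += w
    total = sum(cells.values())
    w_p = {}
    running = 0.0
    # Accumulate in ascending order of cell weight: insignificant cells
    # stay near 0, and the dominant cells end up near 1.
    for cell, w in sorted(cells.items(), key=lambda kv: kv[1]):
        running += w / total
        w_p[cell] = running
    return w_p

# Hypothetical data: the first two extrapolations fall into the same
# rounded (alpha, lg k) cell; the third is a weak, isolated cell.
sample = [(1.504, -3.14, 2.0), (1.498, -3.12, 3.0), (0.70, 1.0, 0.5)]
wp = cumulative_specific_weight(sample)
print(wp[(0.7, 1.0)])   # smallest cell: 0.5/5.5, close to 0
```

Note that because the rounded cells merge nearby parameter combinations, the dominant cell here accumulates a weight of 5.0 out of 5.5 and receives w_p = 1.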
The presence of a strong M5+ earthquake was detected in the extrapolation (forecast) part of 315,000 activation sequences. Combinations of the parameters corresponding to these foreshock sequences are shown in Figure A1b. Combinations of the parameters corresponding to 45,636 forecasts for M7+ earthquakes are shown in Figure A1c.

Figure A2 (caption, fragment): … Table A1. The intersection of the vertical and horizontal dotted lines on the graphs corresponds to the 'current' values of time and parameter; the 'past' lies to the left of and below this intersection and the 'future' to the right of and above it.
The graphs in Figure A2 only illustrate the variability of the stochasticity/determinacy of the process depending on the level of predicted nonlinearity L_pn; they do not prove this statement. Nevertheless, the graphs of all the other precedents of the predictability of strong earthquakes (27,673 from the USGS data; 116,420 from all catalogs) demonstrate a similar relationship between L_pn and the band of permissible deviations. All these results (the data for the graphs) were obtained automatically during catalog processing (see Section 2.3); that is, we did not choose the forecast moments for the graphs in Figure A2. When processing the catalog, for each radius of formation of the hypocentral samples, we checked all the catalog events exhaustively. In each case, the event was considered as the 'current time' for test approximations and subsequent predictive extrapolations. Among these extrapolations, trends corresponding to successful forecasts of strong earthquakes were detected automatically, including those shown in Figure A2. This made the subsequent analysis of the predictability of strong earthquakes possible and objective. In total, the analysis of the USGS catalog revealed ~18.8 million 'current' activation trends. For this purpose, two to three decimal orders of magnitude more test approximations were performed, whose results were not logged in order to save computing resources. At the same time, only in 27,673 cases among the main 'current' activation trends was it established that an M7+ earthquake hit the band of permissible deviations of the extrapolation (prediction) part of the trend. We do not consider it necessary to evaluate this result as 'good' or 'bad'. We consider it an objective reality that can and should be used to identify hazardous activation trends (as shown, for example, in Figure A1).
The examples for Figure A2 were selected (also automatically) from all the other cases of the predictability of strong earthquakes because each of them has the maximum predictability (by the magnitude of the ratio A) among the forecasts of a certain level of nonlinearity L_pn: Figure A2a: 0 < L_pn ≤ 0.5; Figure A2b: 0.5 < L_pn ≤ 0.8; Figure A2c: 0.9 < L_pn ≤ 0.95; Figure A2d: … The graphs in Figure A2 can thus be considered additional illustrations to Tables A2-A4. Additionally, on careful examination of the graphs in Figure A2, it can be noticed that the line of actual data in the forecast part of the graphs before a strong earthquake, as a rule, goes beyond the band of permissible deviations and returns to it at the time of the earthquake. This effect is due to the extrapolation estimation algorithm used in the work (see Section 2.4), according to which each of the actual points in the forecast part is successively monitored for falling into the band of permissible deviations according to ratio (4). In this case, a variable (increasing) normalization range is used; therefore, points that have previously passed the control by ratio (4) may lie outside the tolerance band of subsequent points. The tolerance band for the upper (terminal) point of a strong earthquake is shown in the graphs of Figure A2.

Table A1. Data on the main shock, sample and some characteristics of the foreshock trend for the examples of retrospective prediction extrapolations given in Figure A2.

Figure A3 (caption, fragment): … (the more, the redder) of potentially hazardous unfinished extrapolations found in the cluster. A predictability cluster of another earthquake (M8.1, 1 April 2007) with a similar foreshock preparation was also found. The size of the maps is 2222 × 2222 km (±10° latitude from the epicenter of the earthquake). The smaller map to the right of each geographical map is a north-south cross-section to a depth of 750 km.
(Caption fragments:) "… Figure A5, column 'a') number of successful predictive extrapolations (see Figure A4 for explanations)." "… Figure A9, column 'a') number of successful predictive extrapolations (see Figure A4 for explanations)."

Figure A13. Zones of factual predictability for the earthquake of 1 April 2014 (a), identified zones with similar development according to the parameters of the DSDNP equation (b) and a cluster of zones of the greatest activity (c) used to calculate the time and place of a precedent earthquake (see Figure A3 for explanations).

Figure A14. Retro predictability of the earthquake of 1 April 2014 in the zone with the highest (at the estimate date; see Figure A13, column 'a') number of successful predictive extrapolations (see Figure A4 for explanations).