Transfer Entropy of West Pacific Earthquakes to Inner Van Allen Belt Electron Bursts

Lithosphere-ionosphere non-linear interactions create a complex system in which links between different phenomena can remain hidden. The statistical correlation between strong West Pacific earthquakes and high-energy electron bursts escaping trapped conditions was demonstrated in past works. Here, it is investigated from the point of view of information theory. Starting from the conditional probability statistical model, which was deduced from the correlation, the Shannon entropy, the joint entropy, and the conditional entropy are calculated. Time-delayed mutual information and transfer entropy are also calculated analytically here for binary events, including correlations between consecutive earthquake events and between consecutive earthquakes and electron bursts. These quantities have been evaluated for the complex dynamical lithosphere-ionosphere system, although the expressions obtained from the probabilities proved valid for any pair of binary events. Peaks occurred at the same time delay as in the correlations, Δt = 1.5-3.5 h, as well as at a new time delay, Δt = −58.5 to −56.5 h, for the transfer entropy; the latter is linked, by this analysis, to EQ self-correlations. Even if the low number of self-correlated EQs makes this second peak insignificant in the present case, it is of interest for separating the non-linear contribution to the transfer entropy of binary events in the study of a complex system.


Introduction
Numerous experiments have followed the discovery of the Van Allen belts [1] to determine safe conditions for near-Earth space exploration [2]. Given that sudden fluxes of high-energy particles in the South Atlantic Anomaly region constitute a significant hazard for satellite outages [3], they have been thoroughly studied. Sudden variations in high-energy charged particle fluxes near the South Atlantic Anomaly were first associated with seismic activity by Voronov et al. [4]. Subsequently, flux variations of charged particles associated with strong earthquakes (EQs) were detected using the Intercosmos-24 satellite [5,6]. These variations were described as precipitating particles that escaped the trapped conditions of the geomagnetic field. Such sudden fluxes have been statistically analyzed in relation to seismic activity in various near-Earth space experiments, such as the MIR orbital station and the METEOR-3, GAMMA, and SAMPEX satellites [7]. These fluxes showed particle bursts correlated with strong EQs, the bursts occurring 2-5 h ahead of the shocks. Moreover, the SAMPEX database revealed a 3-4 h correlation in which precipitating high-energy electrons anticipated strong shallow mainshocks [8]. All of these studies considered only up to a few months of recordings. The NOAA-15 satellite high-energy particle database, collecting more than 16 years of counting rates, has also been studied: electron burst (EB) events occurred 2-3 h in advance of strong EQ events [9]. This correlation was statistically significant for mainshocks of M > 6 striking the Indonesian and Philippine areas at depths of less than 200 km. The possible disturbances above the EQ epicenters due to particle drift have also been estimated to occur in the range of 4-6.5 h before strong seismic events, suggesting a possible causal connection [10].

Materials and Methods
This study was based upon the large data set of particle counting rates, EQs, and geomagnetic indexes deposited in publicly available databases. A well-established method [9][10][11][12] makes it possible to define EBs from the 30-100 keV electron counting rate database during geomagnetically quiet days, themselves defined by geomagnetic indexes. EBs are then correlated to EQs over 16.5 years of data by filling histograms of the positive and negative delays between them. For electrons, the medium energy proton and electron detector onboard the NOAA-15 polar satellite [30] has collected counting rates since 1998: http://www.ngdc.noaa.gov/stp/satellite/poes/dataaccess.html (accessed on 28 February 2022). Because the electron counting rates were contaminated by protons [31] of the lower energy range, corrections were carried out using software downloaded from the Virtual Radiation Belt Observatory: http://virbo.org/POES#Processing (accessed on 28 February 2022). The geomagnetic field was re-evaluated, together with the L-shells on the NOAA-15 orbit, using the International Geomagnetic Reference Field (IGRF-12) model [32], downloaded at http://www.ngdc.noaa.gov/IAGA/vmod/igrf.html (accessed on 28 February 2022). Geomagnetic Ap indexes and Dst variations were downloaded at https://www.ngdc.noaa.gov/geomag/data.shtml (accessed on 28 February 2022) and http://wdc.kugi.kyoto-u.ac.jp/dst_final/index.html (accessed on 28 February 2022), respectively. The escaping conditions of trapped electrons were also determined with respect to the geomagnetic field by calculating their minimum mirror point altitudes through the UNILIB libraries [33], downloaded at https://www.mag-unilib.eu (accessed on 28 February 2022).
Finally, EQ events were downloaded at https://earthquake.usgs.gov/earthquakes/search/ (accessed on 28 February 2022), and the database was declustered using CLUSTER 2000 software [34], downloadable at https://www.usgs.gov/media/images/cluster2000 (accessed on 28 February 2022).
The only electron flux detected by NOAA-15 that gave positive results for a correlation with strong EQs was the difference between the 30 keV-2.5 MeV channel and the 100 keV-2.5 MeV channel coming from the zenith [9]. Electron fluxes of greater energy or coming from other directions, as well as any proton fluxes detected by NOAA, produced no detectable correlations with strong EQs. After correction for proton contamination, the electrons were verified to have mirror altitudes of less than 100 km along the drift period, meaning that they escaped the trapped conditions of the inner Van Allen belt. The dynamics of the electrons were described by adiabatic invariants retrieved from geomagnetic quantities and represented in a 4-dimensional matrix of time, L-shell, pitch angle, and geomagnetic field. As electron loss is principally induced by solar activity, time intervals when the Ap and Dst indexes exceeded appropriate thresholds were excluded. An electron counting rate (ecr) fluctuation was thus of statistical origin with less than 1% probability when the condition Poisson(ecr) < 0.01 was satisfied; such a fluctuation was considered an EB influenced by the Earth's surface. Furthermore, EBs detected along the same semi-orbit were counted as one EB. A correlation between EQs and EBs has recently been calculated with EBs defined unambiguously in this way, including only EBs with L-shells in a restricted interval [12]. A correlation peak appeared when considering EQs at depths of less than 200 km, located in a large portion of the Earth's crust in the West Pacific, with a minimum magnitude of 6. The peak occurred in the ∆t = 1.5-3.5 h interval. The conditional probability distribution of an EQ occurrence, given the observation of an EB, is obtained by:

P(EQ|EB) = G∆t P(EQ) = P(EQ) + corr(EQ,EB){P(EQ)[1 − P(EQ)][1 − P(EB)]/P(EB)}^(1/2) (1)

The peak of the conditional probability (1) corresponding to the correlation peak is shown in Figure 1.
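Equation (1) is straightforward to evaluate numerically; the following is a minimal Python sketch (the function names are ours, not from [12], and the probability values used in the example are only illustrative):

```python
import math

def p_eq_given_eb(p_eq: float, p_eb: float, corr: float) -> float:
    """Equation (1): conditional probability of an EQ given an EB observation,
    P(EQ|EB) = P(EQ) + corr(EQ,EB) * {P(EQ)[1-P(EQ)][1-P(EB)]/P(EB)}^(1/2)."""
    return p_eq + corr * math.sqrt(p_eq * (1 - p_eq) * (1 - p_eb) / p_eb)

def probability_gain(p_eq: float, p_eb: float, corr: float) -> float:
    """Probability gain G = P(EQ|EB)/P(EQ); G > 1 means EBs raise the EQ rate."""
    return p_eq_given_eb(p_eq, p_eb, corr) / p_eq
```

With corr(EQ,EB) = 0 the gain reduces to 1, recovering the unconditional P(EQ), as expected.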
It deviates significantly from the average probability of an EQ, the unconditional P(EQ). Information-theoretic measures are based on probability distributions; hence, they can be calculated independently of the model and are expected to be capable of capturing non-linear interactions [35]. Information is quantified by the Shannon entropy [14], which for a binary system of events E is defined as:

H(E) = −Σ{0,1} P(x) log2 P(x) (2)

The logarithm of base 2 was chosen because binary events E were considered, so the unit of information is the bit. H(E) measures the uncertainty associated with E and is maximum when the outcomes 0 and 1 are equally probable. Conversely, H(E) is zero when one of the two outcomes is certain.
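As a minimal sketch of definition (2) (Python; the function name is ours):

```python
import math

def shannon_entropy(p: float) -> float:
    """Binary Shannon entropy in bits: H = -p*log2(p) - (1-p)*log2(1-p).
    Returns 0 for a certain outcome (p = 0 or p = 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)
```

shannon_entropy(0.5) gives the maximum of 1 bit, while a rare event such as an EQ yields a much smaller uncertainty.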
H(E1,E2) = −ΣΣ{0,1} P(x,y) log2 P(x,y) (3)

is the joint entropy [14], and P(x,y) is the joint probability distribution. This describes the amount of uncertainty that characterizes both sets of events, E1 and E2. On the other hand, the conditional entropy H(E1|E2) [14],

H(E1|E2) = −ΣΣ{0,1} P(x,y) log2 P(x|y) (4)

measures the uncertainty about the event E1 which remains after the other event E2 is manifested; in analogy, P(x|y) is the conditional probability. H(E1|E2) = H(E1) when knowledge of the event E2 does not reduce the uncertainty about the event E1. Moreover, in the case of dependency between events, each one contains information about the other. To quantify the amount of information that the event E2 has about E1, the mutual information [14] is defined as:

I(E1;E2) = ΣΣ{0,1} P(x,y) log2 {P(x,y)/[P(x)P(y)]} (5)

which is symmetric, with I(E1;E2) = I(E2;E1), and is therefore unable to detect the direction of the information flux between the events. An asymmetric measure, named time-delayed mutual information [36], can be obtained when causality is of interest, by introducing a time-lag parameter ∆t between the events E1 and E2:

I(E1;E2)(∆t) = ΣΣ{0,1} P(x(tx),y(ty)) log2 {P(x(tx),y(ty))/[P(x(tx))P(y(ty))]} (6)

If ∆t = tx − ty, the sign of ∆t can be used to infer the direction of influence [14] when relation (6) exhibits a peak in ∆t. A negative ∆t for a peak in mutual information indicates that E1 shares a maximum amount of information with the future state of the event E2; consequently, it is possible to affirm that E1 drives E2. Instead, a positive ∆t indicates that E1 shares a maximum amount of information with the past of E2, which is equivalent to affirming that E1 is driven by E2 [37]. Time-delayed mutual information works also when non-linear links characterize the interactions between two event time series [38].
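Definitions (3)-(5) can be evaluated from any 2 × 2 joint distribution; the sketch below (Python, hypothetical names) also exploits the chain rule H(E1|E2) = H(E1,E2) − H(E2) and the identity I(E1;E2) = H(E1) − H(E1|E2):

```python
import math

def info_measures(joint):
    """joint[x][y] = P(E1 = x, E2 = y), with x, y in {0, 1}.
    Returns (H(E1,E2), H(E1|E2), I(E1;E2)) in bits."""
    plog = lambda p: -p * math.log2(p) if p > 0 else 0.0
    p1 = [joint[x][0] + joint[x][1] for x in (0, 1)]  # marginal P(E1)
    p2 = [joint[0][y] + joint[1][y] for y in (0, 1)]  # marginal P(E2)
    h1 = sum(plog(p) for p in p1)
    h2 = sum(plog(p) for p in p2)
    h_joint = sum(plog(joint[x][y]) for x in (0, 1) for y in (0, 1))
    h_cond = h_joint - h2   # chain rule: H(E1|E2) = H(E1,E2) - H(E2)
    mi = h1 - h_cond        # mutual information I(E1;E2)
    return h_joint, h_cond, mi
```

For independent events the joint distribution factorizes and the mutual information vanishes; for perfectly correlated binary events it equals the full 1 bit of shared uncertainty.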
While shared history and common external driving effects between two processes seem to escape identification by means of mutual information [39], Schreiber [40] pointed out that a condition on the past state of E2 must be fixed in Equation (6) using conditional mutual information. Transfer entropy is then defined as a conditional time-delayed mutual information [41] from the event E2 to E1 with respect to a time lag ∆t.
Transfer entropy from E 2 to E 1 is based on the past of both E 1 and E 2 , and quantifies how reliably the event E 2 can predict the occurrence of the event E 1 .
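As an illustration of the concept (not the analytical derivation used in this paper), a plug-in estimator of the transfer entropy for two binary event series, with history length 1, can be sketched in Python as:

```python
import math
from collections import Counter

def transfer_entropy(src, dst):
    """Transfer entropy TE_src->dst in bits for binary sequences (history 1):
    sum of P(d[t+1], d[t], s[t]) * log2( P(d[t+1]|d[t],s[t]) / P(d[t+1]|d[t]) ).
    Quantifies how much the past of src improves prediction of dst
    beyond dst's own past."""
    n = len(dst) - 1
    triples = Counter((dst[t + 1], dst[t], src[t]) for t in range(n))
    pairs_ds = Counter((dst[t], src[t]) for t in range(n))
    pairs_dd = Counter((dst[t + 1], dst[t]) for t in range(n))
    hist = Counter(dst[t] for t in range(n))
    te = 0.0
    for (d1, d0, s0), c in triples.items():
        p_full = c / pairs_ds[(d0, s0)]         # P(d[t+1] | d[t], s[t])
        p_self = pairs_dd[(d1, d0)] / hist[d0]  # P(d[t+1] | d[t])
        te += (c / n) * math.log2(p_full / p_self)
    return te
```

When dst simply copies src with a one-step delay, TE_src->dst approaches 1 bit, while the reverse direction stays near zero, reflecting the asymmetry that mutual information lacks.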

Results
Considering the binary events EQ and EB, the sum (2) has two terms, representing the presence or the absence of the event. The Shannon entropies are thus:

H(EQ) = −P(EQ) log2 P(EQ) − [1 − P(EQ)] log2[1 − P(EQ)] (8)

and

H(EB) = −P(EB) log2 P(EB) − [1 − P(EB)] log2[1 − P(EB)] (9)

These can be evaluated immediately because the unconditioned probabilities are known for the singly considered systems of West Pacific EQs and EBs precipitating from the inner Van Allen belt [12]. The total entropy of the two systems is the sum of their entropies, while the entropy of the combined system is the joint entropy:

H(EQ,EB) = −ΣΣ{0,1} P(EQ,EB) log2 P(EQ,EB)
= −P(EQ|EB)P(EB) log2[P(EQ|EB)P(EB)]
− [1 − P(EQ|EB)]P(EB) log2{[1 − P(EQ|EB)]P(EB)}
− [P(EQ) − P(EQ|EB)P(EB)] log2[P(EQ) − P(EQ|EB)P(EB)]
− [1 − P(EQ) − P(EB) + P(EQ|EB)P(EB)] log2[1 − P(EQ) − P(EB) + P(EQ|EB)P(EB)] (10)

where ΣEB={0,1} P(EQ,EB) = P(EQ) and ΣEQ={0,1} P(EQ|EB) = 1, with P(EQ = 1) = P(EQ) and P(EQ = 0) = 1 − P(EQ), were used. The origin of the rows reported in (10) can be clarified in a Venn diagram (see Figure 2, left).

Figure 2.
Venn diagrams: for EQ and EB to understand the origin of the rows of the basic Equation (10), on the left; for the entropies and the mutual information of EQ and EB events, on the right. The set of EBs is drawn larger than the set of EQs because EB events are here more numerous than EQ events.

Relation (10) can be rewritten as:

H(EQ,EB) = H(EB) + H(EQ|EB) (11)

where the relation (9), the logarithm properties, and the definition of conditional entropy are taken into account. Thus, introducing the probability gain G∆t = P(EQ|EB)/P(EQ) from (1), with a time delay ∆t = TEQ − TEB of the correlations [12] between EQs at the time TEQ and EBs at the time TEB, the conditional entropy can be rewritten in a more synthetic form, using the binary entropy function h(p) = −p log2 p − (1 − p) log2(1 − p), as:

H(EQ|EB) = P(EB) h[G∆t P(EQ)] + [1 − P(EB)] h{P(EQ)[1 − G∆t P(EB)]/[1 − P(EB)]} (12)

The joint entropy is a symmetric function, with H(EB,EQ) = H(EQ,EB), while the conditional entropy follows the Bayes rule [14]:

H(EQ|EB) = H(EB|EQ) + H(EQ) − H(EB) (13)

Relations among the entropies are represented by a Venn diagram in Figure 2 (right). The conditional entropy, as well as the joint entropy, can also be calculated for a time delay ∆t. Starting from (1) for EQs in the West Pacific and EBs detected by NOAA-15, the entropies are reported in Figure 3 for the same range of delays as in Figure 1. In this regard, it must be remembered that NEQ = 600 (page 11 of reference [12]) is the number of EQs with M ≥ 6 that struck the West Pacific during the entire 16.5-year period. However, only the EQs that occurred during the 6953 pairs of hours are considered here, which gives NEQ = 57 and P(EQ) = 0.0082 EQs for each pair of hours. The values of the alarm rate and the failure rate of [12] are consequently corrected to 0.77 and 0.23, respectively. NEB = 1892 EBs were considered in the same time interval, so that P(EB) = 0.27. The joint and conditional entropies were retrieved from (10) and (12) according to the time delay ∆t, and both showed a minimum at ∆t = 1.5-3.5 h, corresponding to the correlation peak.
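The quoted figures can be checked numerically; a short Python sketch using the counts given above (NEQ = 57 and NEB = 1892 over 6953 pairs of hours) reproduces, to within rounding, the entropy values discussed later (≈0.068 and ≈0.844 bits):

```python
import math

def h_bits(p):
    """Binary Shannon entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

N = 6953             # pairs of hours in the analyzed interval
p_eq = 57 / N        # P(EQ), roughly 0.0082
p_eb = 1892 / N      # P(EB), roughly 0.27

h_eq = h_bits(p_eq)  # about 0.069 bits
h_eb = h_bits(p_eb)  # about 0.845 bits
```

The sum h_eq + h_eb is the upper bound of the joint entropy (10), reached only for independent EQ and EB events.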
The time delay ∆t can also be introduced for the mutual information; the resulting time-delayed mutual information (6) can be rewritten using G∆t as:

I(EQ;EB)(∆t) = G∆t P(EQ)P(EB) log2 G∆t
+ [1 − G∆t P(EQ)]P(EB) log2{[1 − G∆t P(EQ)]/[1 − P(EQ)]}
+ P(EQ)[1 − G∆t P(EB)] log2{[1 − G∆t P(EB)]/[1 − P(EB)]}
+ [1 − P(EQ) − P(EB) + G∆t P(EQ)P(EB)] log2{[1 − P(EQ) − P(EB) + G∆t P(EQ)P(EB)]/[(1 − P(EQ))(1 − P(EB))]} (14)

Equivalently,

I(EQ;EB) = H(EQ) − H(EQ|EB) (15)

and

I(EQ;EB) = H(EQ) + H(EB) − H(EQ,EB) (16)

using the chain rule for the entropies.
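The gain-based expression (14) and the entropy identity (16) must agree; a Python sketch (hypothetical names, illustrative parameter values) verifying this for an arbitrary gain g:

```python
import math

def h(p):
    """Binary Shannon entropy in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p) if 0 < p < 1 else 0.0

def joint_from_gain(p_eq, p_eb, g):
    """Joint probabilities P(EQ, EB) written with the gain g = P(EQ|EB)/P(EQ)."""
    return {(1, 1): g * p_eq * p_eb,
            (0, 1): (1 - g * p_eq) * p_eb,
            (1, 0): p_eq * (1 - g * p_eb),
            (0, 0): 1 - p_eq - p_eb + g * p_eq * p_eb}

def mi_from_gain(p_eq, p_eb, g):
    """I(EQ;EB) summed term by term, as in (14)."""
    marg_eq = {1: p_eq, 0: 1 - p_eq}
    marg_eb = {1: p_eb, 0: 1 - p_eb}
    return sum(p * math.log2(p / (marg_eq[x] * marg_eb[y]))
               for (x, y), p in joint_from_gain(p_eq, p_eb, g).items() if p > 0)

def mi_from_entropies(p_eq, p_eb, g):
    """I(EQ;EB) = H(EQ) + H(EB) - H(EQ,EB), as in (16)."""
    h_joint = sum(-p * math.log2(p)
                  for p in joint_from_gain(p_eq, p_eb, g).values() if p > 0)
    return h(p_eq) + h(p_eb) - h_joint
```

Both routes give the same value, and the mutual information vanishes for g = 1, i.e., for independent EQ and EB events.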
Both (15) and (16) suggest that I(EQ;EB) = 0 when EQ and EB are independent events. Relations among the entropies and the mutual information are represented in the Venn diagram of Figure 2 (right). The plot of relation (14) is reported in Figure 4, together with H(EQ|EB) and H(EQ), with respect to the time delay ∆t. The time-delayed mutual information reveals a maximum for ∆t = 1.5-3.5 h, in analogy with the entropies.

Finally, the TE between the lithosphere and the ionosphere, described by the EQs and the EBs, respectively, was evaluated. TE_EB->EQ, based on the new P(EQ ∩ EQ+1|EB+∆t), is shown in Figure 8.


Discussion
The uncertainties associated with the variables EQ and EB are very different: a lower value, 0.068 bits, is measured for EQs with respect to 0.844 bits for EBs. The low H(EQ) depends on the EQ = 0 outcome being much more likely, whereas the higher frequency of EBs brings H(EB) near to 1 bit. The entropy of the joint system (10) is greater than either single entropy, as the total number of events increases and the probability of detecting either an EQ or an EB indifferently is closer to 0.5. The joint entropy is lower than the simple sum of the single-system entropies, as a low, non-significant correlation always exists by chance between two sets of uncorrelated measurements. Thus, when the correlation between EQs and EBs increases, H(EQ,EB) decreases. This is evident in Figure 3 when EQs and EBs with a time delay are considered, even if the physical meaning of the joint entropy of a physical system composed of two parts considered at different times is debatable. The same consideration applies to the physical meaning of the conditional entropy of such a system. The conditional entropy is the uncertainty remaining about EQ when the outcomes of EB are known, and it reaches a lower level at the ∆t of the maximum correlation.
The predominant direction of interaction cannot be directly searched for with mutual information because it is symmetric with respect to EQ and EB [40]; delayed mutual information is better adapted to analyzing driver-response relationships. Figure 7 shows a significant relative variation of the mutual information occurring for ∆t = 1.5-3.5 h, although its maximum value, 0.0068 bits, is about 1/10 of the uncertainty H(EQ). Mutual information has the physical meaning of how much information EB has about EQ, and vice versa; the time delay breaks this symmetry. In particular, it means that EB shares a maximum amount of information with the future state of EQ. It was shown that TE, which is based on transition probabilities, allows for better detection of information transfer [40]. Figure 8 reports the TE for the same time delay intervals ∆t as all the previous figures and is in agreement with an excess of information transfer from EB to EQ for ∆t = 1.5-3.5 h. Note that TE_EB->EQ is not always positive; this is due to the approximation of the conditional probability of self-correlated EQs, P(EQ ∩ EQ+1|EB+∆t), and to the contribution of fluctuations. In this regard, it must be emphasized that, although 600 EQs were considered during the 16.5 years of data, the Cluster 2000 software excluded many EQs in consecutive intervals. This significant reduction in the EQ self-correlation is a direct consequence of the Cluster algorithm, which is based on correlations among EQs [34]. However, the presence of a second peak, which appears neither in the correlation histogram nor in the entropies or the mutual information, seems strictly dependent on the EQ self-correlation. This can be deduced by looking at Figures 6 and 7, where the EQ self-correlation shows a peak at the same time delay.
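The driver-response logic described above can be illustrated on synthetic data: in the toy Python sketch below (the series and the three-step lag are purely hypothetical), a scan of the time-delayed mutual information recovers the lag at which one binary series drives the other, in analogy with the ∆t = 1.5-3.5 h peak:

```python
import math
import random
from collections import Counter

def delayed_mi(a, b, dt):
    """Time-delayed mutual information I(a(t); b(t + dt)) in bits
    for two binary sequences and a non-negative integer lag dt."""
    pairs = [(a[t], b[t + dt]) for t in range(len(a) - dt)]
    n = len(pairs)
    pj = Counter(pairs)
    pa = Counter(x for x, _ in pairs)
    pb = Counter(y for _, y in pairs)
    return sum((c / n) * math.log2(c * n / (pa[x] * pb[y]))
               for (x, y), c in pj.items())

# Toy driver-response system: "eq" echoes "eb" three steps later.
random.seed(1)
eb = [random.randint(0, 1) for _ in range(5000)]
eq = [0, 0, 0] + eb[:-3]

# The MI scan peaks at the true driving lag.
best_dt = max(range(1, 8), key=lambda dt: delayed_mi(eb, eq, dt))
```

The scan identifies the driving lag because only there does the driven series share the driver's full uncertainty; at all other lags the delayed mutual information stays near zero.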
The time delay ∆t = −58.5 to −56.5 h represents a real novelty arising from this analysis. It means that EB shares a maximum amount of information with past EQs occurring around 2 days and 9.5 h before the EBs. To better understand this result, it can be observed that the major contributions to this peak came from relations (17), (21), (22), and (23), where positive terms including P(EQ ∩ EQ+1|EB+∆t) are summed. In a similar way, the major contributions to the peak at ∆t = 1.5-3.5 h came from relations (20) and (25), which sum positive terms containing the correlation (1). Thus, linear and non-linear contributions could be separated, distinguishing between correlation and self-correlation, respectively, in the particular case of binary events. Concerning the possible physical interpretation of the unexpected TE peak shown in Figure 8, foreshock seismic activity should be investigated. If foreshocks occur, and an average of 2.5 h separates EBs from mainshocks, then foreshocks would anticipate mainshocks by 2 days and 7 h. Thus, the apparent paradox of TE indicating an information flux towards past events can be interpreted as foreshocks possessing information about mainshocks. However, it must be remembered that, in this case, the self-correlation is obtained with very few EQs, and its significance is very low, even if TE is very sensitive to self-correlations.

Conclusions
In this study, the entropies, the delayed mutual information, and the TE have been analytically calculated, for the first time, for the combined system of EQs in the lithosphere and EBs occurring in the ionosphere. This was possible thanks to the description of this system by conditional probability obtained previously [12]. The reported expressions are valid in the general case of any couple of binary event time series belonging to the respective parts of a combined system.
The statistical correlation between EQs in the West Pacific and EBs detected by NOAA-15 resulted in significant variations of the entropies of the coupled system, calculated with the same time delays. The delayed mutual information further supported the existence of this correlation. Finally, the TE revealed a consistent transfer of information from EBs to EQs corresponding to the correlation peak, together with a new and less pronounced exchange of information involving EQs about two and a half days earlier.
This last result can be traced back to the occurrence of foreshocks, although the finding appears to be of little significance in this case, as the EQ self-correlation concerns very few events. Through the analysis of the contributions to the TE, this minor information exchange makes it possible to separate the non-linear contribution from the linear one.