Prospective Neural Network Model for Seismic Precursory Signal Detection in Geomagnetic Field Records

: We designed a convolutional neural network application to detect seismic precursors in geomagnetic field records. Earthquakes are among the most destructive natural hazards on Earth, yet their short ‐ term forecasting has not been achieved. Stress loading in dry rocks can generate electric currents that cause short ‐ term changes to the geomagnetic field, yielding theoretically detectable pre ‐ earthquake electromagnetic emissions. We propose a CNN model that scans windows of geomagnetic data streams and self ‐ updates using nearby earthquakes as labels, under strict detectability criteria. We show how this model can be applied in three key seismotectonic settings, where geomagnetic observatories are optimally located in high ‐ seismicity ‐ rate epicentral areas. CNNs require large datasets to be able to accurately label seismic precursors, so we expect the model to improve as more data become available with time. At present, there is no synthetic data generator for this kind of application, so artificial data augmentation is not yet possible. However, this deep learning model serves to illustrate its potential usage in earthquake forecasting in a systematic and unbiased way. Our method can be prospectively applied to any kind of three ‐ component dataset that may be physically connected to seismogenic processes at a given depth.


Introduction
Earthquakes are among the most destructive natural hazards on Earth and occur as a result of plate motion and the accumulation of tectonic stress, predominantly in collisional or subduction settings. Predicting the time and location of an earthquake can fundamentally improve seismic resilience and preparedness. Seismic hazard estimates are helpful for civil engineering designs and risk planning, but they only provide a long-term probability for earthquake occurrence [1]. Earthquake early-warning systems based on the velocity difference of seismic waves are suitable only for locations sufficiently far from epicentral areas to receive a useful warning time (Allen et al., 2019). Earthquake forecasting is thus the only way to realistically reduce the risk of earthquakes. However, reported seismic precursors have so far been inconsistent in the literature [2].
Some of the most promising and frequent seismic precursors have been observed in the Earth's magnetic field [3]. The geomagnetic field is generated by the circular movement of liquid metals in the outer core in response to the Earth's rotation. The most prominent short-term variations are caused by ionospheric currents and solar storms [4], but some anomalous signals have been recognized as precursors to earthquakes. These include magnetic polarization variations [5,6], changes in the diurnal oscillation rates of the vertical components [7], and ultralow-frequency intensity anomalies [8].
In this study, we put forward a prospective machine learning method that reads continuous records of geomagnetic field components and converts the raw data into a binary earthquake prediction system. Specifically, we adapt a convolutional neural network (CNN)-a popular form of supervised machine learning algorithm primarily used for classification problems [9]. The CNN scans moving windows of three-component unfiltered geomagnetic field measurements and learns from local earthquake catalogs as training labels. We present a proof of concept for real geomagnetic datasets and local earthquake catalogs in three key tectonic zones in Europe (Figure 1), despite limited data availability. Our suggested application has not been tested before and offers the possibility to search for precursors in an unbiased and systematic way in future large datasets recorded in optimal source-receiver configurations.

Detectability of Electromagnetic Emissions from Pre-Earthquake Phenomena
Whether pre-earthquake electromagnetic emissions exist and are measurable is a matter of debate [10]. Laboratory experiments have shown that stress loading in dry rocks can generate electric currents [11], although the authors of [12] argued that the Earth's crust is saturated with fluid so electric currents are not expected to appear. However, several models of electromagnetic emissions have been proposed, such as the induction of secondary fields due to electrically conducting fluid flow (the magnetohydrodynamic effect [13]) or due to changes in ferromagnetic rocks' magnetization in response to an applied stress (the piezomagnetic effect [14]).
The evolution of an earthquake has been often described in three stages: nuclear, stick-slip, and impulsive sliding motion [15]. Numerous microfractures occur in the initial nuclear stage due to tectonic stress. The authors of [16] suggested that this process can generate high-frequency electromagnetic radiation, which decreases as the fracture size increases in the stick-slip stage [17]. Uncertainty exists regarding the measurability of these emissions, the ways in which they affect the geomagnetic field, and how they evolve with time as the earthquake nucleation transitions into dynamic sliding [18,19].
Several studies show that it may be possible to measure electromagnetic seismogenic precursory signals in the geomagnetic field in specific configurations. The authors of [20] theoretically showed that seismogenic ultralow-frequency emissions can be observed at a distance of 100-200 km from the epicenter for earthquakes with magnitudes (Ms) between 4 and 6, with hypocentral depths of less than 30 km. The authors assumed a cylindrical current source in the lithosphere, taking into account the heterogeneity of the crust and atmospheric conductivity, as well as the anisotropic ionosphere. Seismic precursors in the geomagnetic field in such configurations have been reported before several earthquakes [21]. The authors of [22] estimated the maximum horizontal distance (R) for which an earthquake of magnitude M generates a detectable effect on the geomagnetic field, assuming that the earthquake magnitude is proportional to the geoelectric current intensity. According to their proposed formulae, the electromagnetic effect of an Mw 6.4 earthquake has a detectability radius of ~800 km.

Data and Methods
The proof of concept is presented on real triaxial geomagnetic datasets recorded at observatories situated inside three key tectonic zones in Europe: Muntele Rosu Observatory (MLR) in Romania, Duronia station (DUR) in Italy, and Iznik (IZN) in Turkey, along with local earthquake catalogs from 2007 until 2021 ( Figure 1 and Table 1).  We propose an unbiased and systematic workflow for assessing whether preearthquake geomagnetic data contain consistent seismogenic precursory signals without constraining the anomaly pattern or the observation bandwidth. So far, attempts have been focused on visual correlations of geomagnetic anomalies and earthquake occurrence, without a robust validation procedure or opportunities to reproduce the observations statistically. Here, we generated an adapted convolutional neural network model to investigate whether magnetic data could be used as a basis for earthquake forecasting, while taking into account previously suggested theories of seismogenic electromagnetic emissions.
Neural networks are supervised machine learning algorithms inspired by the anatomy of the nervous system. The algorithm assumes that a set of observed data are associated with known labels, which the algorithm learns to predict through training based on a wide set of known flagged examples. The network contains several layers of "neurons", usually represented by sigmoidal functions that adapt their coefficients at each iteration. Complex convolutional neural networks comprise a series of convolutional filters, pooling layers, in which the dimensions of the matrices resulting from the previous layers are reduced, as well as dense layers, in which all neurons are interconnected. These complex network configurations are able to classify images based on a labeled dataset [23].
Neural networks have been previously applied in Earth sciences, including prediction of earthquake locations [24] and detection of seismic precursors in subionospheric radio data [25] or gravitational data [26]. The algorithm does not take into account the physics behind the production of signals, but only picks up repeated features associated with the imposed labels. Similar deep learning approaches of aftershock forecasting without a priori information on fault orientations were able to predict the location of aftershocks better than classic Coulomb failure stress change models, which capture incomplete knowledge of earthquake physics and Earth's structure [24]. This proves that we do not need full knowledge of the type of seismic precursor anomaly to be able to classify it, but only a large enough dataset on which to train the machine learning algorithm. CNNs in this case are a form of big data mining, providing the ability to detect and assimilate relationships between target variables without a priori knowledge. However, implementing the most efficient configuration of neural networks to identify seismic precursors is a challenge in terms of the possible combinations of layers, number and connectivity of neurons, filter types, etc. Using existing model configurations that are successful in image classification can be advantageous for labelling portions of the magnetic field as precursors of earthquakes or not. Here, we were inspired by the general architectural principles of VGG models (visual geometry groups [23]), which involve stacking convolutional layers interspersed with pooling layers. The original VGGNet consists of 16 convolutional layers, each with a 3 × 3 filter with stride 1 and the same padding, and maximum pooling layers with a 2 × 2 filter with stride 2. We drew our inspiration from the uniform architecture concept of VGG, but chose to reduce the number of consecutive convolutional layers so as to reduce the number of trainable parameters. Each layer uses a ReLU activation function-a rectified linear neuron activation function that overcomes the vanishing gradient problem [28]. The last layer uses a softmax activation function, which yields a normalized probability for each class label. Our model is presented in Table 2. To estimate the model parameters, we used a cross-entropy loss function and the Adam optimization algorithm-a modified stochastic gradient descent algorithm recommended for solving deep learning problems [29].

Proof of Concept Design
We present examples of application of our proposed method on real geomagnetic data and local earthquake catalogs in three key tectonic regions in Europe (Figure 1). However, taking into account the strict theoretical detectability criteria and the relatively short recording periods of both seismicity and the geomagnetic field, our examples only serve to illustrate the future applicability of our proposed method. CNNs require large datasets in order to be able to recognize and assimilate relationships between variables. Thus, we expect our model to self-update and become increasingly better trained with time, as more data become available and more magnetic observatories are placed in the right regions with respect to earthquake epicenters.
Our data consisted of three-component geomagnetic data streams from long-term observatories that are located in three key seismotectonic locations in Europe. The target regions were as follows: the Apennines thrust belt-a seismic zone comprising mostly upper crustal normal faulting events responsible for repeated damage to historical Italian towns [30]; the North Anatolian Fault System in Turkey, dominated by low-depth strikeslip earthquakes [31]; and the Vrancea seismic zone in Romania (Figure 1)-an intraplate seismic nest that releases the largest strain in continental Europe [32]. Each of these seismic zones contains a geomagnetic field observatory conveniently located close to the epicenters, providing excellent opportunities to investigate the presence of possible electromagnetic precursory signals. Figure 1 shows the locations of these observatories (IZN in Turkey, DUR in Italy, and MLR in Romania), along with their recording periods and local seismicity. The maximum recorded magnitude during the studied period never exceeded Mw = 6.0. We downloaded continuous magnetic data from IZN and DUR sampled at 1 sample/minute from INTERMAGNET (International Real-Time Magnetic Observatory Network) for 2016-2021 and 2007-2020, respectively. The MLR station was installed in 2013 in Vrancea by the National Institute for Earth Physics. The DUR and IZN observatories are equipped with fluxgate magnetometers. We did not consider records with missing recording periods of more than a few minutes. For data gaps of less than 3 min, we padded the recordings with the mean value of each component for the considered period.
We constructed an annotated dataset by labelling periods of three-component geomagnetic field amplitudes with a binary set of labels (1 for earthquake and 0 for no earthquake). We used local seismic catalogs (ROMPLUS in Vrancea, [33], continuously updated by the NIEP) and the ISC Seismic Bulletin [34] to extract events that occurred within a 200 km radius from each observatory, with a maximum hypocentral depth of 30 km and Mw > 4, generally following the [20] detectability radius. In Vrancea, we allowed a maximum depth of 100 km and a maximum epicentral distance of 100 km. These configurations could be tightened in the future as more data are added to train the model. The labelled time series were then split into training and validation subsets comprising 80% and 20% of the total dataset, respectively.
We constructed a few training sets with real geomagnetic data prior to nearby earthquakes for use as inputs. We carried out trials with different-sized inputs to capture possible precursors with a large range of frequencies. Our trials included one-week-, oneday-, and one-hour-long geomagnetic input data (Figure 2), with and without bandpass filtering of the diurnal variation before scaling (Figure 3). The observatories had seismicity counts of 40 (MLR), 119 (DUR), and 132 (IZN) for their recording period. We built a nonearthquake dataset by selecting periods of geomagnetic field measurements that were not associated with these earthquakes and were not contaminated by solar storm signals [35]. To avoid introducing bias due to training data imbalance and model overfitting, we built a set of geomagnetic field records during seismic quiescence-similar in size to the sets hypothetically affected by nearby earthquake electromagnetic emissions. The three components of geomagnetic field measurements (Bx, By, and Bz) were normalized by the maximum amplitude of Bz and were essentially input as greyscale images in the CNN (Figure 4).

Preliminary Results and Future Work
We carried out model training for both individual observatory datasets-filtered and unfiltered-as well as using all datasets combined. We compared the accuracy for a range of time windows: one hour, one day, one week, and one month. While the cross-entropy loss function decreased with each iteration (from 0.7 to 0.5), the accuracy of both the training and validation data remained equal to the amount of earthquake-labelled versus non-earthquake-labelled data counts (0.5874 and 0.6641, respectively). Figure 5 shows the evolution of training and validation test accuracy and loss, respectively, with epoch number. While prediction accuracy increased in the first 10 epochs for both the training and validation datasets, the average value became constant, although training loss continued to decrease slowly. Validation loss was still higher than training loss, suggesting data overfitting-likely due to insufficient training data size. To test the sensitivity of our results to varying neural network hyper parameters, we carried out a range of tests. Choosing the optimal combination of hidden layers, number of neurons, connectivity, number of dense layers, pooling size, convolutional filter size and stride, dropout regularization, number of epochs, optimization algorithm, and learning rate is an extremely time-consuming and computationally expensive operation. Various methods have been proposed to simplify the process, including crossconvolutional grid search and Bayesian optimization techniques [36]. Here, we limited the sensitivity testing to a range of popular optimizers, learning rates, and numbers of convolutional layers. We found that the Adam optimizer yielded the best loss function evolution compared to other popular choices, such as Adagrad, AdaMax, SGD, and RMSProp ( Figure 6). Learning rate changes yielded oscillatory behavior in the accuracy evolution with each iteration. Increasing the number of consecutive convolutional layers did not produce a visible effect on either the loss function or the accuracy functions. Figure 6. Evolution of the cross-entropy loss function for a range of popular optimizers, using the one-week-long training dataset as a testing ground.
Trials with varying geomagnetic time-series lengths-ranging from one hour, to one day, to one week-showed slight differences in the estimated test set accuracy and loss ( Figure 7). The highest accuracy was obtained for the one-week-long input data (0.664), followed by the one-hour-long (0.586) and one-day-long (0.414) time series, while loss decreased as a function of the input data length. These observations might suggest that one-week-long geomagnetic datasets could contain more information correlated with the occurrence of earthquakes, although the data size used in this example was still insufficient to provide a confident comparison. Predicting the sample size required for a good classification performance is a complex challenge in most machine learning algorithms and is strongly dependent on the model complexity, number of trainable parameters, feature size, quality and repeatability of the training data, and other factors [37]. There are no rules of thumb that directly relate the number of trainable parameters in a CNN with a required training dataset size.
Previous CNN examples included the original AlexNet model [9], which had 60 million parameters and achieved the best top-5 error rate for the ImageNet set [38], comprising 1.3 million images, amounting to a factor of 60 fewer training images than trainable parameters. The original VGG16 model comprising 138 million parameters obtained 60% accuracy on 16,185 images of 196 classes of cars, indicating a factor of 8625 fewer training images than weights that must be learned. These examples serve to show that most published criteria for the best outcome consider either target accuracy, classifier confidence, uncertainty estimation, or the minimum expected error, without an objectively defined prediction of sample size.
A popular method employed in biomedical data classification models with few labelled training sets employs an inverse power law fitted to points on a learning curve, empirically predicting sample sizes by extrapolation [39]. A learning curve describes how the performance of a classifier (in our case, a CNN) varies with the training sample size. Learning curves are typically divided into three sections: a steep curve showing a rapid increase in the classification performance, a turning point where the slope in performance is attenuated, and a final section showing no further performance improvement, implying that the model has reached its efficiency threshold. Figure 8 shows the evolution of accuracy and loss with increasing labelled sample counts. We randomly removed portions of data and restarted training the CNN using each data subset to construct learning curves. Generally, loss functions for the training and validation datasets should converge towards a small value when the model has been sufficiently trained. Their loss separation and constant accuracy suggests, to a first order model, that the data size needs to be increased. However, predicting the required increase by fitting an inverse power law to the estimated learning curve is difficult in this case, due to the high variance of the validation loss estimates. Future generation datasets should provide a better starting point for this kind of empirical prediction. With the limited dataset available, these observations can indicate one of the following: 1. Geomagnetic field variations are not correlated with the occurrence of smallmagnitude earthquakes at the predicted distances; 2. An observational bias exists, implying that other geomagnetic signals obscure precursory signals and that stricter conditions need to be imposed on the detectability threshold values; 3. The data available at present are insufficient to train the proposed convolutional neural network model.
The sparse coverage and relatively short-term measurements of the geomagnetic field in epicentral areas hinder consistent seismic precursor observations at present. Data augmentation is necessary for better model training and will need to include significantly more geomagnetic field records as more data become available and more earthquakes occur within the detectability radius. Since the recording periods are limited, capturing few medium-sized earthquakes that fit the desired criteria, we recommend caution in adding more non-earthquake-labelled datasets in order to avoid imbalanced classifications. This implies having a similar amount of earthquake versus non-earthquake class labels and ensuring that geomagnetic time series labeled as "non-earthquake" do not overlap with any seismically active periods within the detectability radius. Additionally, focus should be directed to developing algorithms that can simulate the magnetic field response to electrical currents caused by stress loading in different rock types, with the option to tune the distances and physical parameters related to earthquake sources. Synthetic training data will help train the proposed neural network, similar to image recognition training that uses combinations of artificially created images [40].

Conclusions
We proposed a systematic and unbiased procedure to detect seismic precursors in geomagnetic field data, using deep learning techniques, and illustrated its application on next-generation datasets from optimal source-receiver settings. We designed a complex multilayered convolutional neural network similar to architectures that have been successful in image recognition as well as a range of Earth science applications. We treated windows of geomagnetic data streams as greyscale images that were labeled as precursory to earthquakes or not, using local seismic catalogs. We considered strict criteria for the detectability of electromagnetic emissions, where observatories were within a 100 km distance from epicentral areas with high seismicity rates and earthquakes with a minimum magnitude of 4. Since the accuracy of neural networks is strongly dependent on large datasets, we expect our model to self-update and become increasingly better at detection and assimilating relationships in time, as more data become available. Our model is the first deep learning implementation of seismic precursor recognition on the geomagnetic field and provides a ground basis for further theoretical developments in this domain.