Identification of Electromagnetic Pre-Earthquake Perturbations from the DEMETER Data by Machine Learning

The low-altitude satellite DEMETER recorded many cases of ionospheric perturbations observed in association with large seismic events. In this paper, we explore 16 spot-checking classification algorithms, among which the top classifier, fed with low-frequency power spectra of the electric and magnetic fields, was used for ionospheric perturbation analysis. This study analyzed satellite data spanning over six years, during which about 8760 earthquakes with magnitude greater than or equal to 5.0 occurred worldwide. We find that among these methods, a gradient boosting-based method called LightGBM outperforms the others and achieves superior performance in a five-fold cross-validation test on the benchmarking datasets, showing a strong capability in discriminating electromagnetic pre-earthquake perturbations. The results show that electromagnetic pre-earthquake data within a circular region centered at the epicenter, with a radius given by Dobrovolsky's formula, and within a time window of about a few hours before the shocks are much better suited to discriminating electromagnetic pre-earthquake perturbations. Moreover, by investigating different earthquake databases, we confirm that some low-frequency electric and magnetic field frequency bands are the dominant features for identifying electromagnetic pre-earthquake perturbations. We have also found that the choice of the geographical region used to simulate the training set of non-seismic data influences, to a certain extent, the performance of the LightGBM model, reducing its capability in discriminating electromagnetic pre-earthquake perturbations.


Introduction
Earth observation by satellites has the advantages of global coverage, high dynamic range, all-weather observation and independence from ground conditions; these advantages greatly increase the chance of detecting earthquake events and enrich the information content of earthquake precursors [1]. Among these satellites, Detection of Electro-Magnetic Emissions Transmitted from Earthquake Regions (DEMETER) was the first satellite specifically devoted to seismic research and volcanic monitoring. It is very important to detect possible anomalies in the ionosphere before an earthquake occurs or a volcano erupts [2]. Over its many years of continuous operation, the DEMETER satellite accumulated a large amount of scientific data and yielded abundant research results. Case studies of individual earthquakes and statistical studies are the two main approaches to studying earthquake-related ionospheric anomalies with DEMETER data.
With the DEMETER data, many perturbations were found before strong earthquakes. The Kii island earthquake with a magnitude of 7.3 was the first case in which many ionospheric perturbations were found before a strong earthquake [3]. For the devastating Wenchuan earthquake of 12 May 2008, Zhang et al. [4] found that the ion density reached its lowest values three days prior to the earthquake and decreased suddenly on the day of the earthquake. Błeçki et al. [5] found plasma turbulence over the area of earthquakes, and Sarkar and Gwal [6] observed similar plasma anomalies together with electric field perturbations four to seven days before the earthquake. In the study of Ryu et al. [7], anomalies began to appear in the equatorial ionosphere about a month before the earthquake and peaked eight days before the main shock. Liu et al. [8] applied the boxplot method to the DEMETER data around the epicenter, and the results show that one to six days before the Wenchuan earthquake, the electron density (Ne) and ion density (Ni) at nighttime dramatically declined, while the ion temperature (Ti) at daytime increased. Walker et al. [9] took a statistical approach using ultralow-frequency (ULF) data and found a probable increase in the ULF wave power near the epicenter. Píša et al. [10] presented a statistical analysis of plasma density variations and showed that a large increase in the plasma density occurred before the Chile earthquake. Zhang et al. [11] noticed that low-frequency electromagnetic disturbances started to arise over a large range of latitudes and peaked about one week before the earthquake. Ho et al. [12,13] showed that anomalous enhancements of Ti, Ni and Ne exist particularly around the epicenter area. Louerguioui et al. [14] reported that anomalous changes in the computed ULF electric component along the Z-axis were clearly detected on the 1st, 11th and 17th days before the earthquake. In addition to the above earthquakes, many studies confirm that the DEMETER ionospheric perturbations are useful and sensitive for detecting anomalies related to earthquakes, such as the 2007 Pu'er earthquake [15][16][17], the L'Aquila earthquake [18,19], the Haiti earthquake [20], and so on.
Since ionospheric perturbations are not caused only by earthquakes, and since for some earthquakes no ionospheric perturbations can be found, some scholars began to use statistical analysis to exclude electromagnetic disturbances caused by non-seismic sources. Němec et al. [21,22] carried out a statistical study using electric field data up to 10 kHz, and the results show a clear decrease (several sigma) of the wave intensity in the few hours preceding earthquakes; a similar but less pronounced (about two sigma) decrease was later reported by Píša et al. [23,24]. Statistical studies investigating the relationship between seismicity and equatorial anomalies observed by DEMETER have been performed using electric field measurements [25] and equatorial plasma density [26]. He et al. [27,28] statistically showed that the electron density near the epicenter increased at nighttime and that the anomaly enhancement intensified with increasing magnitude. Yan et al. [29] studied the ionospheric ion density and showed that at 200 km from the epicenter and five days before the occurrence of an earthquake with a magnitude greater than 5, ionospheric anomalies can be found. Statistical analyses using the ion density peaks in the complete DEMETER dataset [30][31][32][33][34] demonstrated that the number of disturbances increases up to the day of the earthquake and then gradually decreases. A similar phenomenon was also observed in the region of the magnetically conjugated points [35].
As mentioned above, research on ionospheric disturbances before earthquakes often depends on one or a few specific parameters and can yield contradictory and confusing outcomes. The electromagnetic characteristics of ionospheric disturbances before earthquakes must therefore be carefully examined. Moreover, many of these studies apply only to specific earthquakes, lack universality and may draw very different conclusions. We achieve our objectives with an efficient discrimination analysis of electromagnetic pre-earthquake perturbations on a large amount of continuous satellite data by leveraging machine learning technologies, which are widely used in recent studies of earthquake prediction [36][37][38][39][40]. Machine learning also makes it possible to identify the most important features for discriminating electromagnetic pre-earthquake perturbations. Specifically, we investigate machine learning technologies to discriminate pre-earthquake ionospheric perturbations using electric and magnetic field data from the instrument champ electrique (ICE) [41] and the instrument magnetic search coil (IMSC) [42]. Sixteen spot-checking classification algorithms, including the deep neural network (DNN), the most commonly used deep learning model nowadays, are applied to determine which measurements or parameters assist the forecasting of earthquakes using the electric and magnetic field data of DEMETER.
This paper presents a comprehensive machine learning study of pre-earthquake electromagnetic perturbations in the DEMETER data. Section 2 describes the dataset and the data preprocessing. The machine learning technologies are described in Section 3. Section 4 presents and discusses the results. Finally, Section 5 presents the conclusions.

Dataset
The DEMETER satellite was launched on a near Sun-synchronous orbit in June 2004, with an inclination of 98° and an altitude of about 700 km [2]. The scientific mission ended in December 2010 and provided continuous data over about six and a half years. The satellite instruments measured electromagnetic waves and plasma parameters globally, except in the auroral zone (geomagnetic latitude > 65°), providing good coverage of the Earth's seismic zones. Among the parameters from these instruments, the power spectra of the electric and magnetic fields recorded during the DEMETER Survey mode were used in this study; they were computed onboard with a frequency resolution of 19.5 Hz and a time resolution of 2 s.

Data Preprocessing
We found that the power spectra of electric and magnetic fields are basically consistent temporally and spatially although they come from two different instruments. After the data integration, the initial data involve 27,347,148 records in total.
First of all, we carefully processed the data. (1) We removed the 2004 data and conducted our analysis on the data starting from 2005. This is because the magneto-torquers worked during three time intervals per half-orbit at the beginning of the DEMETER mission (the interference can be seen with the IMSC), and calibration signals had to be prepared each time the mode changed, i.e., from survey to burst or from burst to survey at the beginning of each half-orbit. In addition, DEMETER encountered a magnetic storm at the beginning of November 2004, and the large Sumatra earthquake in December 2004 came with many aftershocks. (2) We removed the frequencies from 1 up to 8 kHz with the IMSC, as the IMSC suffers from interference that appears as spectrogram lines at constant frequencies; it is due to the onboard electronics and cannot be removed. We also removed the man-made waves from the ground-based Very Low Frequency (VLF) transmitters, i.e., waves with frequencies above 11.5 kHz (the three Russian ALPHA transmitters start around 12 kHz), with both the IMSC and the ICE. (3) We identified and removed all half-orbits where the electronics were in trouble, mostly due to heavy particles or wrong telecommands. (4) We excluded the time intervals when troubles occurred in the electric data, as well as the very spectacular events called structured emissions with turbulence and interference (SETI) [43]. (5) To avoid density perturbations due to magnetic activity [44], we restricted our analysis to data with a Kp index smaller than or equal to 3. After the data preprocessing, a total of 25,546,213 records were used for training.
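The filtering steps above can be sketched as a simple per-record predicate. This is a minimal illustration, not the actual DEMETER processing chain: the record fields (`year`, `kp`, `instrument`, `freq_hz`, `half_orbit_flagged`) are hypothetical names chosen for the example.

```python
# Sketch of preprocessing filters (1)-(5); field names are illustrative only.

def keep_record(rec):
    """Return True if a record survives the preprocessing filters."""
    if rec["year"] < 2005:                      # (1) drop 2004 data
        return False
    if rec["kp"] > 3:                           # (5) geomagnetically quiet periods only
        return False
    if rec["instrument"] == "IMSC" and 1000.0 <= rec["freq_hz"] <= 8000.0:
        return False                            # (2) IMSC onboard interference band
    if rec["freq_hz"] > 11500.0:                # (2) ground-based VLF transmitters
        return False
    if rec["half_orbit_flagged"]:               # (3)/(4) electronics trouble, SETI
        return False
    return True

records = [
    {"year": 2004, "kp": 1, "instrument": "ICE",  "freq_hz": 500.0,  "half_orbit_flagged": False},
    {"year": 2006, "kp": 2, "instrument": "IMSC", "freq_hz": 2000.0, "half_orbit_flagged": False},
    {"year": 2006, "kp": 2, "instrument": "ICE",  "freq_hz": 500.0,  "half_orbit_flagged": False},
    {"year": 2006, "kp": 5, "instrument": "ICE",  "freq_hz": 500.0,  "half_orbit_flagged": False},
]
kept = [r for r in records if keep_record(r)]
```

Of the four toy records, only the quiet-time 2006 ICE record passes all five filters.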

Frequency Bands Logarithmically Spaced
Different from the approach presented by Němec et al. [21], which focuses on a wide frequency bandwidth, we focus on low frequencies in our study and select frequency bands logarithmically spaced in ULF and ELF. To focus on the low frequencies, the frequency range of our analysis is limited to below 3 kHz with the ICE and below 1 kHz with the IMSC. We have selected 11 and 6 frequency bands for the electric and magnetic fields, respectively (Figure 1b).

Figure 1. (a) Original power spectra of the electric field (right panel) and magnetic field (left panel); the frequency range of our analysis is limited to below 10 kHz. (b) Processing into 11 and 6 frequency bands for the electric and magnetic fields, respectively; the frequency range is limited to below 3 kHz with the instrument champ electrique (ICE) and below 1 kHz with the instrument magnetic search coil (IMSC). (c) Processing into 11 and 3 frequency bands for the electric field (3–10 kHz) and magnetic field (8–10 kHz), respectively.
For the power spectrum data of the electric field (ICE), we select 11 logarithmically spaced frequency bands limited to below 3 kHz (the band boundaries are shown in Figure 1b). We then calculate the value for each frequency band according to the following formula:

spectra_data = log_10(10^{a_m} + 10^{a_{m+1}} + ... + 10^{a_n}),

where a_m and a_n are the power spectrum values at the lower and upper bounds of the frequency band. Finally, a total of 17 features (11 electric and 6 magnetic field frequency band measurements) are used as inputs to train the machine learning tools. The purpose of this is to exclude the interference of the spacecraft and to cover the entire studied low-frequency range in ULF and ELF as uniformly as possible.
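The band aggregation is a sum of linear powers followed by a return to logarithmic units. A minimal sketch:

```python
import math

def band_value(spectrum_log):
    """Aggregate the log10 power-spectrum samples a_m..a_n of one frequency band:
    spectra_data = log10(10**a_m + 10**a_{m+1} + ... + 10**a_n)."""
    return math.log10(sum(10.0 ** a for a in spectrum_log))

# Three equal log-power samples: summing three equal linear powers
# adds log10(3) to the common log value.
v = band_value([2.0, 2.0, 2.0])
```

This illustrates why the sum must be done in linear units: averaging or summing log values directly would underweight the stronger spectral samples.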
In this study, a total of 8760 earthquakes with magnitudes greater than or equal to 5.0 and depths less than or equal to 40 km occurred worldwide from January 2005 to December 2010 according to the USGS catalogue (http://neic.usgs.gov/). To avoid mixing the effects of pre-seismic and post-seismic activities, we excluded values used more than once in the analysis, since otherwise it is impossible to determine which earthquake the observed data should be attributed to. To verify the reliability and improve the robustness of the machine learning tools, we generated 8760 artificial non-seismic events staggered in time and place with respect to the real earthquakes. The artificial non-seismic events are generated by selecting a period of time and a region of space that do not involve real earthquakes, and then randomly sampling the time, latitude and longitude within that spatio-temporal range under the following constraints: (1) the time is not within 10 days before or after a real earthquake occurrence time, (2) the longitude is not within 10° of the longitude of a real earthquake and (3) the latitude is not within 10° of the latitude of a real earthquake.
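The sampling of artificial non-seismic events can be sketched as rejection sampling; the three constraints are read here as all having to hold simultaneously with respect to every real earthquake, which is an assumption about the paper's procedure, and the ranges are toy values.

```python
import random

def make_artificial_event(real_events, t_range, lon_range, lat_range, rng):
    """Rejection-sample an artificial non-seismic event (time in days, lon, lat)
    keeping the stated distances from every real earthquake:
    >10 days in time, >10 degrees in longitude, >10 degrees in latitude."""
    while True:
        t = rng.uniform(*t_range)
        lon = rng.uniform(*lon_range)
        lat = rng.uniform(*lat_range)
        ok = all(abs(t - et) > 10 and abs(lon - elon) > 10 and abs(lat - elat) > 10
                 for et, elon, elat in real_events)
        if ok:
            return t, lon, lat

rng = random.Random(42)
quakes = [(100.0, 30.0, 10.0)]                 # (time_days, lon_deg, lat_deg)
event = make_artificial_event(quakes, (0, 365), (-180, 180), (-90, 90), rng)
```

With sparse real events the rejection rate is low; with a dense catalogue a pre-gridded exclusion mask would be more efficient.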
As reported by Němec et al. [21], a significant decrease in the wave intensity is observed several hours before earthquakes occur. Given that there is no universal standard for the temporal window, we set the temporal window to 48 h or 7 days before an earthquake for comparison. The spatial feature is the distance from the epicenter where the earthquake occurs. Again, since there is no agreed standard, the radius of the area expected to be affected by the earthquake is given by Dobrovolsky's formula:

R = 10^{0.43M},

where R is the radius of the earthquake preparation zone in kilometers and M is the earthquake magnitude [45]. Hence, circular regions centered at the epicenter with a radius given by Dobrovolsky's formula or by a deviation of 10° were selected as the spatial features in our study (Figure 2).
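Dobrovolsky's formula, commonly written R = 10^{0.43M} km, is a one-liner:

```python
def dobrovolsky_radius_km(magnitude):
    """Radius of the earthquake preparation zone, R = 10**(0.43 * M) km
    (Dobrovolsky's formula)."""
    return 10.0 ** (0.43 * magnitude)

r5 = dobrovolsky_radius_km(5.0)    # smallest magnitude in the catalogue, ~141 km
r73 = dobrovolsky_radius_km(7.3)   # ~1377 km
```

The radius grows roughly tenfold for each 2.3 units of magnitude, which is why the spatial window matters so much for large events.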
Moreover, the day- and nighttime data were treated separately. After the data preprocessing was completed, the datasets (Table 1) were generated and subsequently used as inputs to train the machine learning tools.

Overview of Our Methodology
The methodology carried out in this study is shown schematically in Figure 3. After data preprocessing, each dataset was carefully split into two contiguous parts: 90% for training the models, covering January 2005 to June 2010, and 10% for testing and final evaluation, covering June 2010 to December 2010. The top classifier is chosen through a quick assessment and comparison of 15 different spot-checking algorithms along with the deep neural network (DNN) method. Bayesian optimization [46] and five-fold cross-validation [47] are used in this process for hyperparameter tuning and performance evaluation, respectively. In this way, we can perform model selection with high confidence, ensuring that a robust model is selected and used. Moreover, the most important features of electromagnetic pre-earthquake perturbations can be obtained from the best classifier.
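The contiguous 90/10 split can be expressed as a simple cutoff on timestamps; the exact cutoff date here is an illustrative stand-in for the paper's June 2010 boundary.

```python
from datetime import date

def chronological_split(samples, cutoff):
    """Split time-stamped (date, payload) samples into a training part (before
    the cutoff) and a held-out test part. No shuffling: the test period is
    strictly later than the training period, avoiding temporal leakage."""
    train = [s for s in samples if s[0] < cutoff]
    test = [s for s in samples if s[0] >= cutoff]
    return train, test

samples = [(date(2005, 1, 15), "a"), (date(2008, 6, 1), "b"),
           (date(2010, 7, 2), "c"), (date(2010, 12, 9), "d")]
train, test = chronological_split(samples, date(2010, 7, 1))
```

Keeping the test set strictly later in time is the key design choice: a random split would let the model see measurements taken near the test earthquakes.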

Machine Learning Methods
In this study, we formulate the discrimination of pre-earthquake electromagnetic perturbations as a binary classification problem, where seismic-related data are labeled 1 and non-seismic-related data are labeled 0. Spot-checking is a quick assessment of a set of different algorithms on a problem, so that one knows which algorithms to focus on and which to discard [48]. For this classification problem, we try 15 spot-checking algorithms along with the DNN method on DataSet 01 with five-fold cross-validation, specifically: (1) linear algorithms: logistic regression [49], ridge regression [50], the stochastic gradient descent classifier [51], the passive aggressive classifier [52], linear discriminant analysis [53] and quadratic discriminant analysis [54]; (2) nonlinear algorithms: classification and regression trees [55], extra trees [56], the support vector machine [57] and naive Bayes [58]; (3) ensemble algorithms: AdaBoost [59], random forest [60], the gradient boosting machine [61], extreme gradient boosting [62] and the light gradient boosting machine [63]; (4) deep learning algorithms: the deep neural network [64,65]. Benchmarking was performed on a server equipped with two Intel® Xeon® E5-2650 v4 CPUs, 128 GB of memory and an NVIDIA Tesla V100 GPU.
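A spot-checking harness is just a loop over candidate models with a shared cross-validation protocol. The sketch below is dependency-free: the two toy classifiers (a majority-class baseline and a nearest-centroid model) stand in for the paper's 16 algorithms, and the data are synthetic.

```python
import statistics

class MajorityClass:
    """Baseline: always predict the most frequent training label."""
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
    def predict(self, X):
        return [self.label] * len(X)

class NearestCentroid:
    """Predict the class whose training-feature mean is closest (1-D features)."""
    def fit(self, X, y):
        self.centroids = {c: statistics.mean(x for x, lab in zip(X, y) if lab == c)
                          for c in set(y)}
    def predict(self, X):
        return [min(self.centroids, key=lambda c: abs(x - self.centroids[c]))
                for x in X]

def spot_check(models, X, y, k=5):
    """Evaluate each candidate model with k-fold CV; return mean accuracies."""
    scores = {}
    for name, make_model in models.items():
        accs = []
        for fold in range(k):
            test_idx = set(range(fold, len(X), k))
            Xtr = [x for i, x in enumerate(X) if i not in test_idx]
            ytr = [v for i, v in enumerate(y) if i not in test_idx]
            Xte = [x for i, x in enumerate(X) if i in test_idx]
            yte = [v for i, v in enumerate(y) if i in test_idx]
            m = make_model()
            m.fit(Xtr, ytr)
            pred = m.predict(Xte)
            accs.append(sum(p == t for p, t in zip(pred, yte)) / len(yte))
        scores[name] = sum(accs) / k
    return scores

X = [0.0, 0.1, 0.2, 0.3, 0.4, 1.0, 1.1, 1.2, 1.3, 1.4]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
scores = spot_check({"majority": MajorityClass, "centroid": NearestCentroid}, X, y)
best = max(scores, key=scores.get)
```

On this separable toy data the centroid model wins every fold while the majority baseline stays at chance, mirroring how spot-checking surfaces the algorithms worth tuning further.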
Among these methods, LightGBM is a fast, distributed, high-performance gradient boosting framework based on ensembles of tree-structured learners and has been widely used for ranking, classification and many other machine learning tasks [63]. After partitioning the training data into a number of subsets, LightGBM performs both local and global voting in each iteration; detailed explanations are given in Meng et al. [66]. To investigate whether the frequency bands are seismic-related or not, the selected frequency bands are represented as feature vectors and fed into the LightGBM-based classifier. LightGBM uses a histogram-based algorithm [67,68] to accelerate training, reduce memory consumption and optimize the parallel voting decision tree algorithm with advanced network communication. Training with LightGBM is usually much faster than with many other algorithms such as support vector machines, hence the "Light" in its name.

Bayesian Hyperparameter Tuning
Since the methods are sensitive to parameter selection, Bayesian hyperparameter tuning is used to choose the parameters that yield the best performance. We use the negative of the area under the curve (AUC) as the return value (loss) of the objective function, and Bayesian optimization [46] selects the most promising hyperparameters that minimize the objective function by building a probability model based on past evaluations of the objective. In this way, Bayesian optimization can achieve better performance on the test set while requiring fewer iterations than random and grid search. The search spaces of the major parameters for LightGBM and the DNN are provided in Tables S1 and S3, respectively, and the maximum number of iterations for each model is set to 100. We use the Hyperopt python package [69] for the implementation of Bayesian optimization.
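The key pattern is the objective function returning −AUC as the loss to minimize. To keep this sketch dependency-free, Hyperopt's TPE sampler is replaced by a plain seeded random search over the same kind of search space, and the AUC evaluation is mocked; the hyperparameter names are illustrative.

```python
import random

def objective(params, evaluate_auc):
    """Hyperopt-style objective: return the loss to minimize, here -AUC."""
    return -evaluate_auc(params)

def random_search(space, evaluate_auc, n_iter, rng):
    """Stand-in for Bayesian optimization: sample hyperparameters from the
    search space and keep the set with the lowest loss (highest AUC)."""
    best_params, best_loss = None, float("inf")
    for _ in range(n_iter):
        params = {name: rng.uniform(lo, hi) for name, (lo, hi) in space.items()}
        loss = objective(params, evaluate_auc)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss

# Mock evaluation: AUC peaks at learning_rate=0.1, num_leaves=31 (made-up).
def mock_auc(p):
    return 1.0 - abs(p["learning_rate"] - 0.1) - 0.001 * abs(p["num_leaves"] - 31)

space = {"learning_rate": (0.01, 0.3), "num_leaves": (8, 128)}
params, loss = random_search(space, mock_auc, n_iter=100, rng=random.Random(0))
```

With Hyperopt, `random_search` would be replaced by `fmin` with `tpe.suggest`, which spends its evaluation budget near previously promising regions instead of uniformly.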

Five-Fold Cross-Validation
Cross-validation is a resampling procedure used to assess how well a machine learning model performs. The classification results in this study were obtained with five-fold cross-validation [70]. The overall dataset was divided randomly into five subsets. In each verification step, one subset was selected as the test set while the remaining ones formed the training set. The model was trained on the training set and then verified on the test set. This process was repeated five times with a different test subset each time, so that every subset was used for testing exactly once. The final result was calculated as the average over the five folds.
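The defining property of k-fold cross-validation is that the folds partition the data: every sample lands in exactly one test fold. A minimal fold generator makes that property explicit:

```python
import random

def kfold_indices(n, k, rng):
    """Randomly partition indices 0..n-1 into k folds. Each index appears in
    exactly one fold, so every sample is used for testing exactly once."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]

folds = kfold_indices(20, 5, random.Random(1))
```

In each of the five rounds, one of these folds is held out and the other four are concatenated into the training set; the reported score is the mean over the rounds.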

Performance Evaluation
After the parameters for each method were determined, the performance of the methods based on these parameters was compared. We use five performance measures, namely accuracy, sensitivity, specificity, the AUC and the area under the recall-precision curve (AURPC), to evaluate the performance of each method.
AUC is the area under the receiver operating characteristic (ROC) curve, which is a plot of true positive rate (TPR) against false positive rate (FPR). AURPC is also used in our work to evaluate the performance, which is regarded as a good alternative to AUC [71].
The accuracy (ACC) is defined as

ACC = (TP + TN) / (TP + TN + FP + FN).

The sensitivity (TPR) is defined as

TPR = TP / (TP + FN).

The specificity (TNR) is defined as

TNR = TN / (TN + FP),

where TP is the number of true positives (a true positive test result is one that detects the condition when the condition is present), TN is the number of true negatives (a true negative test result is one that does not detect the condition when the condition is absent), FP is the number of false positives (a false positive test result is one that detects the condition when the condition is absent) and FN is the number of false negatives (a false negative test result is one that does not detect the condition when the condition is present).
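These confusion-matrix metrics follow directly from the counts; a minimal helper with made-up counts:

```python
def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, sensitivity (TPR) and specificity (TNR) from confusion counts."""
    acc = (tp + tn) / (tp + tn + fp + fn)   # fraction of all correct decisions
    tpr = tp / (tp + fn)                    # fraction of positives detected
    tnr = tn / (tn + fp)                    # fraction of negatives rejected
    return acc, tpr, tnr

acc, tpr, tnr = confusion_metrics(tp=90, tn=80, fp=20, fn=10)
```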

Feature Importance
The importance of a feature is computed as the (normalized) reduction in error brought by that feature; it is also known as the Gini importance. The importance NI of a single split node is defined as

NI(Node_split) = w_split I(Node_split) − w_left I(Node_left) − w_right I(Node_right),

where Node_split is a split (non-leaf) node in the decision tree, Node_right and Node_left are the right and left child nodes of Node_split, w is the weighted number of samples reaching the corresponding node and I(·) is the node impurity.
The importance of each feature over the whole model is then calculated as

FI(f) = (1/M) Σ_{m=1}^{M} ( Σ_{j=1}^{F} NI_j / Σ_{k=1}^{K} NI_k ),

where M is the number of trees in LightGBM, F is the number of non-leaf nodes that use the target feature to split the data and K is the total number of non-leaf nodes in the m-th tree.
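As a toy illustration of how impurity-based (Gini) importances aggregate over trees; the weights, impurities and the tiny tree summaries below are made-up numbers, not values from the paper's model.

```python
def node_importance(w_split, i_split, w_left, i_left, w_right, i_right):
    """Impurity reduction achieved by one split:
    NI = w_split*I(split) - w_left*I(left) - w_right*I(right)."""
    return w_split * i_split - w_left * i_left - w_right * i_right

def feature_importance(trees, feature):
    """Average, over trees, of the share of total impurity reduction contributed
    by splits on the given feature. Each tree is a list of (feature, NI) pairs."""
    shares = []
    for tree in trees:
        total = sum(ni for _, ni in tree)
        shares.append(sum(ni for f, ni in tree if f == feature) / total)
    return sum(shares) / len(trees)

# A perfect split of weight-1.0 data (Gini impurity 0.5) into two pure halves.
ni = node_importance(1.0, 0.5, 0.5, 0.0, 0.5, 0.0)

# Two toy trees: feature "f0" contributes 50% of the reduction in the first
# tree and 75% in the second.
trees = [[("f0", 0.5), ("f1", 0.5)], [("f0", 0.75), ("f1", 0.25)]]
fi_f0 = feature_importance(trees, "f0")
```

In LightGBM this per-feature aggregation is what `feature_importances_` reports (with the gain-based importance type), which is how the dominant frequency bands are ranked later in the paper.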

Evaluation of the Model Performance
As shown in Figure 3, all the benchmarking methods were used to construct classification models based on DataSet 01 with the generated features. Figure 4 shows histograms and summary statistics of the seismic and non-seismic input samples for the frequency band from 19.53 to 39.06 Hz (ICE), where the percentage corresponds to the count of each bin divided by the sum of the counts of all bins. From Figures S1 and S2, no significant imbalance was found between the positive (there was an earthquake) and negative (there was no earthquake) cases in any of the input data, supporting the credibility and stability of the machine learning models. The data were fed into the machine learning tools with or without normalization and with or without standardization, i.e., four cases in total [72]; these choices were treated as parameters in the Bayesian optimization process. The performance of each of the compared methods was summarized using the specificity, sensitivity, accuracy, precision and AUC measures on the training part of DataSet 01 (Table 2). For comparison, we plotted the ROC curves [73] (Figure 5a) and boxplots of the accuracy (Figure 5b) for each method. The receiver operating characteristic (ROC) curve depicts the relative trade-off between true positives (benefits) and false positives (costs), and the AUC (area under the ROC curve) indicates the distinguishing power of the different models: a higher AUC corresponds to a method that better distinguishes seismic from non-seismic electromagnetic data. From Figure 5 and Table 2, the average AUC of the spot-checking algorithms ranges from 0.576 to 0.98 and the average accuracy ranges from 0.657 to 0.939.
We found that the light gradient boosting machine (LightGBM) was the top classifier according to both the ROC curves (Figure 5a) and the accuracy boxplots (Figure 5b), and it was also the best method according to the evaluation metrics (Table 2). We also noticed that random forest (RF) performed almost as well as LightGBM on DataSet 01, except for a slightly lower specificity. To investigate further, we compared LightGBM with RF on the other four benchmarking datasets (Table 1). In general, as shown in Table 3, almost all performance metrics of LightGBM are slightly higher than those of RF on all the datasets in Table 1, which means that LightGBM is more accurate than RF in discriminating electromagnetic pre-earthquake perturbations. We also compared LightGBM with RF using the AUC. From Figure 6, the AUC of LightGBM is also slightly higher than that of RF on all the datasets, implying that LightGBM outperformed RF; this could be explained by the fact that hyperparameters in LightGBM, e.g., the number of trees, the depth and the learning rate, are tunable to a degree that is not possible for RF. The DNN applied to DataSet 01 achieved an accuracy of 0.830 and an AUC of 0.894, whereas LightGBM reached an accuracy of 0.947 and a higher AUC on DataSet 01 (Table 2). We also explored different DNN configurations (e.g., 7, 8, 9 and 10 layers; the number of iterations increased from 100 to 180) (Table S3, Supplemental Dataset S2) and did not find one that works better than LightGBM: the DNN sheet in DataSet S2 shows a best AUC of 0.8957, only slightly better than the 0.894 obtained with the smaller search space, whereas the AUC of LightGBM is 0.986.
Although the DNN is constructed with three fully connected layers, its performance is worse than that of LightGBM. This might be because deep learning architectures require significantly larger training sets (a large number of earthquakes) for optimization, which are not available in the current research. Moreover, to investigate whether there is a bias in the LightGBM model, a random-labels test was performed. Practically, based on DataSet 01, we generated a set of samples with random, i.e., "wrong", labels, and named the dataset DataSet 08. Figure 7 indicates that the LightGBM model is not able to learn on the "wrong" dataset, which means the LightGBM model passed this level 0 test. The AUCs of LightGBM on DataSet 01 and DataSet 08 are 0.985 and 0.507 (Table 4), respectively, and the accuracies on DataSet 01 and DataSet 08 are 0.950 and 0.370 (Table 4), respectively. Furthermore, as seen in Table 4 and Figure 8, all four benchmarking datasets were used to compare the results obtained by LightGBM with different spatial and temporal features. In this comparison, we employed both the AUC and the AURPC, evaluated with five-fold cross-validation. In general, the nighttime datasets (DataSets 01 and 02 in Table 4) lead to better classification performance than the daytime datasets (DataSets 03 and 04 in Table 4) for the same spatial and temporal features (Tables 1 and 4). The ROC and recall-precision curves of LightGBM for all the datasets are shown in Figure 8 for performance comparison. The daytime data show no significant difference, probably because the daytime ionospheric conditions are generally more disturbed, making the detection of seismo-electromagnetic effects difficult.
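The logic of the random-labels sanity check can be reproduced in a small self-contained sketch (not the paper's pipeline): with informative labels a rank-based AUC is high, while shuffling the labels drives the achievable AUC to chance level (~0.5) on average.

```python
import random

def auc_score(labels, scores):
    """Rank-based AUC (Mann-Whitney): probability that a random positive is
    scored higher than a random negative, counting ties as half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]
auc_true = auc_score(labels, scores)      # informative labels: perfect ranking

# "Level 0" test: average AUC over many random relabelings of the same scores.
rng = random.Random(0)
aucs = []
for _ in range(200):
    shuffled = labels[:]
    rng.shuffle(shuffled)
    aucs.append(auc_score(shuffled, scores))
auc_random = sum(aucs) / len(aucs)
```

A model (or evaluation pipeline) that still scores well after label shuffling is leaking information; the near-0.507 AUC on DataSet 08 is the expected chance-level outcome.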
This result is in good alignment with the statistical results of electromagnetic perturbations connected to seismic activities [21][22][23][24].

Considering Different Spatial Windows
We also observed that LightGBM performed better in the circular region centered at the epicenter with a radius given by Dobrovolsky's formula (DataSets 01 and 03 in Table 4) than in the circular region centered at the epicenter with a radius given by a deviation of 10° (DataSets 02 and 04 in Table 4), with the AUC improved from 0.960/0.873 to 0.985/0.919. Physically, the disturbance propagating upward from the ground surface can change the properties of the ionosphere, and the radius of the affected area agrees well with the value calculated from the earthquake magnitude using Dobrovolsky's formula.
To further demonstrate that data within a circular region centered at the epicenter, with a radius given by Dobrovolsky's formula and a time window of about a few hours before the shocks, are more useful in discriminating electromagnetic pre-earthquake perturbations, satellite datasets with spatial windows centered at the epicenter and deviations of 3° (DataSet 09), 5° (DataSet 10), 7° (DataSet 11) and 12° (DataSet 12) were generated (Table 5). Figure 9 provides the recall-precision and ROC curves of the five datasets (including DataSet 01) with different spatial windows. Table 6 presents the classification performance of LightGBM on these five datasets. From Table 6, the AUC ranges from 0.881 to 0.986 and the average accuracy ranges from 0.797 to 0.959 across the datasets. As the size of the spatial window increases, the AUC and accuracy decrease by about 0.084 and 0.027, respectively. Based on these results, we conclude that the choice of the spatial window size influences, to a certain extent, the performance of the LightGBM model. Although the LightGBM model is capable of discriminating electromagnetic pre-earthquake perturbations with different spatial window sizes, it performs best on the dataset centered at the epicenter with a radius given by Dobrovolsky's formula and a time window of about 48 h before the shocks.

Considering Different Frequency Bands
To study the influence of different frequency bands on discriminating electromagnetic pre-earthquake perturbations, the frequency range of the analysis is limited to 3–10 kHz with the ICE and 8–10 kHz with the IMSC (high frequencies), and the frequency bands are selected following the same approach as Němec et al. [21]. We have selected 11 and 3 frequency bands for the electric and magnetic fields, respectively (Figure 1c), which form a new dataset (DataSet 05). For the power spectrum data of the electric field (ICE), we select 11 logarithmically spaced frequency bands limited to 3–10 kHz. As illustrated in Table 4, DataSet 01 (frequency features below 3 kHz with the ICE and below 1 kHz with the IMSC) and DataSet 05 (frequency features at 3–10 kHz with the ICE and 8–10 kHz with the IMSC) were used for discriminating electromagnetic pre-earthquake perturbations.
We find that DataSet 01 yields better classification performance than DataSet 05 with the LightGBM classifier: the AUCs of LightGBM on DataSets 01 and 05 are 0.985 and 0.768, respectively, and the accuracies are 0.950 and 0.732, respectively. From this observation, we conclude that the datasets with low frequencies (below 3 kHz with the ICE and below 1 kHz with the IMSC) enable more accurate discrimination of electromagnetic pre-earthquake perturbations than those with high frequencies. The ROC and recall-precision curves of the LightGBM model on DataSets 01 and 05 are shown in Figure 10a for performance comparison.

Considering the Earthquake Geographical Region
Because the above method for generating artificial non-seismic events can place them far from the geographical regions of the earthquakes, we refine the method, in order to further verify the reliability of the machine learning models, by selecting "the same regions" but "outside the time windows" of real earthquakes. Practically, for each real earthquake, we select the earthquake-related data using two constraints: "time inside a 20-day window (ten days before/after the earthquake)" and "distance inside the Dobrovolsky radius"; all data selected in this way form a set A of seismic-related data. We then select a set B using only one constraint, "distance inside the Dobrovolsky radius" of a real earthquake. By excluding A from B, we obtain C, a non-seismic-related dataset located within the same earthquake geographical regions. The seismic-related data are then taken from A and the non-seismic-related data from C. After this data preprocessing, DataSet 06 (frequency features below 3 kHz with the ICE and below 1 kHz with the IMSC) and DataSet 07 (frequency features at 3–10 kHz with the ICE and 8–10 kHz with the IMSC) are generated (Table 1) and used as inputs to train the machine learning models. A performance comparison between the two datasets with the LightGBM method is illustrated in Table 4, and the corresponding ROC and recall-precision curves are shown in Figure 10b. From Table 4 and Figure 10b, the AUCs of LightGBM on DataSet 06 and DataSet 07 are 0.863 and 0.684, respectively, and the accuracies are 0.782 and 0.657, respectively. The LightGBM method therefore provides better classification performance on DataSet 06 than on DataSet 07.
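The A/B/C construction is plain set algebra: C = B − A. A minimal sketch with made-up records, each a (record_id, days_to_nearest_quake) pair:

```python
# B: all records within the Dobrovolsky radius of some earthquake.
# A: the subset of B also inside the +/-10-day time window (seismic-related).
# C = B - A: same regions, but outside the time windows (non-seismic-related).

B = {("r1", -3), ("r2", 15), ("r3", -25), ("r4", 8)}
A = {rec for rec in B if abs(rec[1]) <= 10}   # seismic-related
C = B - A                                      # non-seismic, same regions
```

By construction A and C are disjoint and together cover B, so every in-radius record is used exactly once, either as a positive or as a geographically matched negative.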
In summary, although we use a new method to select the seismic-related and non-seismic-related data, our results again show that the low-frequency datasets make it easier to discriminate the electromagnetic pre-earthquake perturbations, i.e., the data selection method has little, if any, effect on this conclusion.
To further explore the possible reason for the results above, we consider a new method for non-seismic-related data selection using a time window of fixed width (ten days). In practice, for each real earthquake we select earthquake-related data using three constraints, "time inside a 20-day time window (ten days before/after the earthquake)", "distance within the Dobrovolsky's radius" and "frequency features below 3 kHz for ICE and below 1 kHz for IMSC", and name the selected data D. We then select another set E in the same way as D, except that the first constraint is widened to "time inside a 40-day time window (twenty days before/after the earthquake)". Excluding D from E yields a set F whose data lie inside a time window of fixed width (10-20 days before/after the earthquake). We then augment the seismic-related data with all these non-seismic data and obtain DataSet I. Similarly, we obtain DataSet II (20-30 days before/after earthquakes), DataSet III (30-40 days), DataSet IV (40-50 days), DataSet V (50-60 days) and DataSet VI (60-70 days) (Table 1). The last six rows of Table 4 report LightGBM's performance on the six datasets, and Figure 10c shows the corresponding ROC and recall-precision curves. As Table 4 and Figure 10c show, the method performs similarly on the six datasets, e.g., the AUCs of LightGBM on all six datasets are around 0.89 and the accuracies around 0.81, i.e., LightGBM performs worse than on DataSet 01, whose AUC is 0.985 and accuracy 0.950. In other words, when the non-seismic data are simulated within the same geographical locations, the performance decreases by about 0.09 in AUC and 0.14 in accuracy.
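The fixed-width window construction can be sketched the same way: F is the set difference between a wider window E and a narrower window D, i.e., a "ring" of offsets around the shock time. A minimal illustration with hypothetical timestamps:

```python
# Hedged sketch of the D/E/F ring-window selection; timestamps are illustrative.
from datetime import datetime, timedelta

def ring_window(times, eq_time, inner_days, outer_days):
    """Keep times whose offset from the shock lies in (inner_days, outer_days]."""
    out = []
    for t in times:
        d = abs((t - eq_time).total_seconds()) / 86400.0
        if inner_days < d <= outer_days:
            out.append(t)
    return out

eq_time = datetime(2005, 8, 30)
times = [eq_time - timedelta(days=d) for d in (1, 5, 12, 18, 25, 35, 65)]

D = ring_window(times, eq_time, 0, 10)    # within +/- 10 days (seismic-related)
F = ring_window(times, eq_time, 10, 20)   # 10-20 days away: DataSet I negatives
F2 = ring_window(times, eq_time, 20, 30)  # 20-30 days away: DataSet II negatives

print(len(D), len(F), len(F2))
```

Shifting the (inner, outer) pair by ten days at a time produces the negatives for DataSets I through VI.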
That is, when the non-seismic data are simulated in the vicinity of the faults (the earthquake geographical region), the LightGBM performance is worse than when the non-seismic data are simulated far from the faults. Based on these results, we conclude that the choice of the geographical region used to simulate the training set of non-seismic data influences, to a certain extent, the performance of the LightGBM model, reducing its capability in discriminating electromagnetic pre-earthquake perturbations.

Considering the Earthquake Magnitude
Earthquake magnitude may play an active role in pre-earthquake perturbation identification. To assess the influence of magnitude, we have divided the earthquakes into three groups: 6818 earthquakes with magnitudes between 5.0 and 5.5, 1813 earthquakes with magnitudes between 5.5 and 6.0, and 589 earthquakes with magnitudes of 6.0 and over; the corresponding datasets are DataSet 13, DataSet 14 and DataSet 15 (Table 5). Table 6 reports the LightGBM method's performance on the four datasets (including DataSet 01). As shown in Table 6, the method performs similarly over the four datasets, e.g., the AUC of the LightGBM method on all four datasets is around 0.96 (ranging from 0.919 to 0.986), and the accuracy is around 0.93 (ranging from 0.839 to 0.8975). Figure 11 shows the recall-precision and ROC curves of LightGBM on these datasets; we observe a similar tendency in performance on the three magnitude-grouped datasets, which suggests that the LightGBM model discriminates electromagnetic pre-earthquake perturbations satisfactorily across different magnitudes. Moreover, Table 6 shows that the larger the magnitude, the better the classification performance, which is also in line with physical expectations. Although the datasets are quite different, these results indicate that the LightGBM model can produce reliable predictions for datasets with different magnitudes.
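The magnitude grouping is a simple binning step; a minimal sketch with an illustrative catalogue (the group boundaries are those from the text, the magnitudes below are made up):

```python
# Hedged sketch of splitting an earthquake catalogue into the three magnitude
# groups used for DataSets 13-15; the example magnitudes are made up.
def magnitude_group(m):
    if 5.0 <= m < 5.5:
        return "5.0-5.5"    # group for DataSet 13
    if 5.5 <= m < 6.0:
        return "5.5-6.0"    # group for DataSet 14
    if m >= 6.0:
        return ">=6.0"      # group for DataSet 15
    return None             # below the M5.0 catalogue threshold

catalogue = [5.1, 5.4, 5.6, 6.3, 4.8, 7.7]
groups = {}
for m in catalogue:
    g = magnitude_group(m)
    if g is not None:
        groups.setdefault(g, []).append(m)

print({k: len(v) for k, v in sorted(groups.items())})
```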

Considering an Unbalanced Dataset
The actual earthquake problem is always highly unbalanced: non-earthquake instances far outnumber earthquake instances. In order to provide a realistic performance overview, we therefore test our proposed method on unbalanced datasets in this section. To investigate whether our proposed method is capable of predicting an earthquake with unbalanced data, datasets with positive-to-negative ratios of 1:2 (DataSet 16), 1:3 (DataSet 17), 1:4 (DataSet 18) and 1:5 (DataSet 19) have been generated (Table 5). Table 6 reports the proposed method's performance on the five datasets (including DataSet 01). As shown in Figure 12, the method performs similarly on the five datasets, e.g., the PR AUCs of the proposed method on all five datasets are around 0.96 (ranging from 0.904 to 0.998) and the ROC AUCs are around 0.96 (ranging from 0.923 to 0.986). Figure 12 shows the ROC and recall-precision curves, and we observe a similar tendency in the performance of our method on the five datasets, suggesting that it discriminates pre-earthquake perturbations well on unbalanced datasets. Although the five unbalanced datasets differ, these results indicate that our method is relatively insensitive to the positive-to-negative ratio and can identify electromagnetic pre-earthquake perturbations from unbalanced datasets, i.e., it provides good realistic performance.
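Constructing the 1:k datasets amounts to sampling k negatives per positive from a larger non-seismic pool; a minimal sketch with synthetic sample identifiers (not the DEMETER data):

```python
# Hedged sketch: building a dataset with a fixed positive:negative ratio of 1:k
# by randomly undersampling a larger pool of negatives.
import random

def make_unbalanced(positives, negative_pool, k, seed=0):
    rng = random.Random(seed)
    negatives = rng.sample(negative_pool, k * len(positives))
    return positives, negatives

positives = list(range(100))             # stand-ins for seismic-related samples
negative_pool = list(range(100, 1100))   # stand-ins for non-seismic samples

for k in (2, 3, 4, 5):                   # ratios 1:2 .. 1:5 as for DataSets 16-19
    pos, neg = make_unbalanced(positives, negative_pool, k)
    print(f"1:{k} dataset -> {len(pos)} positives, {len(neg)} negatives")
```

Evaluation on such sets is best done with both ROC AUC and PR AUC, as in the text, since PR curves are the more sensitive of the two under class imbalance.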

Considering Different Temporal Windows
We have observed that the dataset with a temporal window of 48 h (DataSet 01) yields good classification performance. To investigate whether the LightGBM method can identify electromagnetic pre-earthquake perturbations with different temporal windows, datasets with temporal windows of 7 days (DataSet 20), 10 days (DataSet 21), 20 days (DataSet 22) and 30 days (DataSet 23) have been generated (Table 1). Figure 13 provides the ROC and recall-precision curves of the five datasets with different temporal windows, and Table 6 presents the corresponding classification performance with LightGBM. From Table 6, the AUC ranges from 0.899 to 0.986 and the AURPC from 0.964 to 0.995 across the datasets. As the temporal window widens, the performance decreases by about 0.087 in AUC and 0.031 in AURPC; that is, enlarging the temporal window degrades the LightGBM model's performance. Based on these results, we conclude that the choice of the temporal window size influences, to a certain extent, the performance of the LightGBM model, reducing its capability in identifying pre-earthquake perturbations. The LightGBM method also performed better on the smaller dataset, which is consistent with the nature of tree-based methods: inherently, fewer but cleaner samples can train a better tree structure and produce better performance. Although the LightGBM model can identify perturbations with different temporal window sizes, it gives the best performance on our initial selection of a 48 h temporal window.
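Varying the temporal window is again a filtering step on the time remaining before the main shock; a compact sketch (the hours-to-shock values are illustrative):

```python
# Hedged sketch: selecting pre-earthquake records for several temporal windows
# (48 h, 7 d, 10 d, 20 d, 30 d); the per-record offsets below are made up.
hours_to_shock = [3, 10, 40, 100, 200, 400, 600]  # hours before the main shock

windows_h = {"48h": 48, "7d": 168, "10d": 240, "20d": 480, "30d": 720}

sizes = {name: sum(1 for h in hours_to_shock if h <= limit)
         for name, limit in windows_h.items()}
print(sizes)  # wider windows admit more, but likely less seismic-related, data
```

The monotone growth in dataset size with window width illustrates why wider windows dilute the seismic signal with data far from the shock.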

Dominant Features from LightGBM
In comparison with the blind-feature-based DNN model, LightGBM allows the importance of features to be determined during classification, i.e., the contribution of each feature to distinguishing one class from another. Gini importance is used to measure the importance of a feature; it is calculated as the sum, across all trees, of the scores of the splits that use the feature, weighted by the number of samples each split partitions [63]. As shown in Figure 14, the ICE frequency intervals 371.07-468.72 Hz, 468.72-605.43 Hz and 292.95-371.07 Hz, together with the IMSC frequency intervals 488.25-624.96 Hz and 292.95-390.60 Hz, are the most important features for the LightGBM model. A machine learning method trained on earthquake and non-earthquake electromagnetic data can thus reveal how the electromagnetic data have been influenced by the earthquake process.
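Importances of this kind can be read directly off scikit-learn tree ensembles; the sketch below uses a RandomForestClassifier on synthetic data as a stand-in, since its `feature_importances_` attribute exposes exactly the impurity-based (Gini) importance described above (with LightGBM itself, the analogous call would be `booster.feature_importance(importance_type='gain')`).

```python
# Hedged sketch: impurity-based (Gini) feature importances from a tree
# ensemble, standing in for LightGBM's gain importances; the data are synthetic.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=20, n_informative=5,
                           random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
importances = forest.feature_importances_   # normalized Gini importances

top5 = np.argsort(importances)[::-1][:5]    # analogous to the top frequency bands
print("top features:", top5)
```

Mapping each feature index back to its frequency interval gives a ranking like the one plotted in Figure 14.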
The above results are well illustrated in Figures 15 and 16, which show low-frequency perturbations observed in the electric and magnetic fields near the epicenters of some earthquakes. Figure 15 presents perturbations three days before the Indonesian earthquake of 17 July 2006 with a magnitude of 7.7; the data come from orbits that passed very close to the epicenter (9.28°S, 107.38°E).
Perturbations are observed in the electric and magnetic fields near the epicenter in the form of an intensification with a cutoff frequency in the range of about 350-400 Hz in ELF, as indicated by the arrow and reported by Bhattacharya et al. [74]. In Figure 16, perturbations are observed five days before the earthquake of 30 August 2005 18:10:45 UT with a magnitude of 6.2, whose epicenter lies at 38.55°N, 143.06°E; similar perturbations, with an intensification at around 500 Hz in ELF, are observable in both the electric and magnetic fields. In addition, some examples of earthquake precursors in VLF temporary variations are demonstrated by Eppelbaum and Finkelstein [75]. Figure 17a shows the histograms and summary statistics of the distances from the epicenters for the seismic-related data; the distance interval from 100 to 200 km stands out as a key feature. The time interval from 10 to 11 h before the main shock carries the largest weight (Figure 17b).
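The distance histogram in Figure 17a is built from epicentral distances; a sketch of the great-circle (haversine) distance and 100 km binning, using the coordinates of the 2006 Indonesian epicenter above and a hypothetical satellite footprint:

```python
# Hedged sketch: great-circle (haversine) distance to an epicenter and 100 km
# binning, as used for the distance histograms; the footprint is illustrative.
import math

def haversine_km(lat1, lon1, lat2, lon2, r_earth_km=6371.0):
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r_earth_km * math.asin(math.sqrt(a))

epicenter = (-9.28, 107.38)   # 2006 Indonesian earthquake (9.28 S, 107.38 E)
point = (-8.5, 106.5)         # hypothetical satellite footprint

d = haversine_km(point[0], point[1], epicenter[0], epicenter[1])
bin_100km = int(d // 100) * 100   # histogram bin, e.g. the 100-200 km interval
print(f"distance ~{d:.0f} km, bin [{bin_100km}, {bin_100km + 100}) km")
```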

Discussion
We summarize previous studies using machine learning for pre-earthquake perturbation analysis of the DEMETER data in Table 7. The performance comparison among these studies shows that, among the popular machine learning methods, the LightGBM method outperforms the others, i.e., it gives the best performance on all the benchmarking datasets.
Compared with the existing studies on pre-earthquake perturbation analysis, this study has three main advantages. First, hyperparametric optimization and cross-validation are used for all the engaged methods, which allows us to find the best parameters for each. In this way, we can perform technology selection with high confidence, ensuring that a robust method is selected and used. Second, differing from traditional statistics and data mining methods, a more advanced machine learning architecture is used for pre-earthquake perturbation analysis. Last, most existing methods are validated on a small number of samples, such as Wang et al. [76], which may introduce a strong bias with respect to a larger area or a longer time interval. LightGBM, in contrast, is applied to different datasets, and global features are learned during the training process; hence, LightGBM has strong generalization ability. The hiss of the ionosphere at low altitude, with frequencies between 100 Hz and 1 kHz, can propagate along the magnetic field and penetrate into the topside ionosphere [80][81][82][83][84]. In our study, the frequency intervals from 371.07 to 468.72 Hz of ICE and from 292.95 to 390.60 Hz of IMSC are the most important features rendered by LightGBM when discriminating between earthquake and non-earthquake electromagnetic data. Our results suggest that sudden changes in the cutoff frequency of the ELF hiss, or the observation of whistlers near the epicenter of earthquakes, can help discriminate earthquake-related electromagnetic data. Since such phenomena exist before earthquakes, they could be regarded as short-term earthquake precursors.

Conclusions
We have explored 15 spot-checking classification algorithms along with the deep neural network model on the classification problem, and the top classifier based on a variety of combined features generated by low-frequency power spectra of the electric and magnetic fields was used to improve the performance of electromagnetic pre-earthquake perturbations discrimination from the DEMETER data. Our work shows that the LightGBM model outperforms the other models including DNN.
In our paper, the selection of temporal and spatial windows was applied in both the training and the testing process. For clearer presentation, we have provided Table S4, which summarizes the main variables and results for all datasets. From Table S4, we observe that electromagnetic pre-earthquake data inside the circular region centered at the epicenter with a radius given by the Dobrovolsky's formula, and within a few hours before the shocks, are better suited to discriminating electromagnetic pre-earthquake perturbations; the classification model also performs better on the nighttime datasets than on the daytime ones. The results further show that restricting the non-seismic data to the earthquake geographical region reduces the machine learning model's capability in discriminating electromagnetic pre-earthquake perturbations. Moreover, the distance interval from 100 to 200 km from the epicenter and the time interval from 10 to 11 h before the main shock emerge as key features from the histograms and summary statistics.
Observations of the dominant features of electromagnetic pre-earthquake perturbations are derived from LightGBM, and these observations, together with our study of the influence of different frequency band ranges on discriminating electromagnetic pre-earthquake perturbations, support the conclusion that earthquake-related electromagnetic data can be discriminated using low-frequency signals. A physical explanation could rest on lithosphere-atmosphere-ionosphere coupling [85][86][87][88][89], in which seismic crustal stress transforms mechanical energy into heat energy that is transmitted to the Earth's surface through pores in rocks. At the same time, local stress enhancement and the occurrence of micro-earthquakes can cause numerous micro-cracks and micro-fractures inside the lithosphere, so that geo-gases and fluids within the lithosphere, such as H2 and Rn, spill out along these channels. The increase in geo-gases in the atmosphere stimulates physical processes and chemical reactions from the ground surface up to the ionosphere, which can produce an additional electric field, penetrate into the ionosphere, cause a large-scale ionospheric perturbation, and finally give rise to electromagnetic pre-earthquake perturbations.
Overall, our results show that machine learning is a promising tool for processing high-dimensional electromagnetic pre-earthquake perturbations data and that it supports investigations of seismic development.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2072-4292/12/21/3643/s1, Table S1: Thirteen datasets are used as input to the models, Table S2: The search space of parameters for the LightGBM model, Table S3: The search space of parameters for the deep neural networks model, Figure S1: Histograms and summary statistics of the seismic input-related data, Figure S2: Histograms and summary statistics of the non-seismic-related input data. Dataset S1: Hyperparametric optimization trials on thirteen datasets by LightGBM when training the model.