Article

Anomaly Detection Based on 1DCNN Self-Attention Networks for Seismic Electric Signals

1 Shanghai Earthquake Agency, Shanghai 200062, China
2 Shanghai Sheshan National Geophysical Observatory and Research Station, Shanghai 200062, China
3 School of Information Science and Engineering, Lanzhou University, Lanzhou 730000, China
* Authors to whom correspondence should be addressed.
Computers 2025, 14(7), 263; https://doi.org/10.3390/computers14070263
Submission received: 1 May 2025 / Revised: 27 June 2025 / Accepted: 2 July 2025 / Published: 5 July 2025

Abstract

The application of deep learning to seismic electric signal (SES) anomaly detection remains underexplored in geophysics. This study integrates a 1D convolutional neural network (1DCNN) with a self-attention mechanism to automate SES analysis at a monitoring station in China. Utilizing physics-informed data augmentation, our framework adapts to real-world interference scenarios, including subway operations and tidal fluctuations. The model achieves an F1-score of 0.9797 on a 7-year dataset, demonstrating superior robustness and precision compared to traditional manual interpretation. This work establishes a practical deep learning solution for real-time geoelectric anomaly monitoring, offering a transformative tool for earthquake early warning systems.

1. Introduction

Earthquake prediction is a critical and challenging task in geophysics due to its profound implications for disaster risk reduction [1,2,3]. Among various precursors, seismic electric signals (SESs)—transient electrical anomalies generated by stress-induced charge movements in crustal rocks—have shown unique potential for short-term earthquake forecasting. Unlike seismic waves that propagate post-rupture, SES anomalies can manifest days to weeks before seismic events, offering actionable lead times for early warnings [4]. However, practical SES analysis still relies heavily on expert manual interpretation, a process characterized by subjectivity, latency, and high operational costs. In urban environments, where SES monitoring is further complicated by anthropogenic noise (e.g., subway operations) and natural fluctuations (e.g., tidal effects), automating anomaly detection has become an urgent yet underexplored frontier.
Recent advances in deep learning have revolutionized time-series analysis across domains such as natural language processing and biomedical signal monitoring. While convolutional neural networks (CNNs) and recurrent architectures (e.g., LSTMs) have been widely applied to seismic waveform analysis [5], their adoption for SES remains limited. Existing studies predominantly focus on post-earthquake seismic wave processing rather than precursor detection [6]. This gap is significant, considering the predictive utility of SES and the increasing demand for real-time monitoring tools in seismically active regions. The unique characteristics of SES, such as multi-channel temporal synchronicity, transient impulsivity, and low signal-to-noise ratios (SNRs), necessitate tailored solutions balancing computational efficiency and detection accuracy.
This study addresses this gap by proposing the first deep learning framework integrating a 1D convolutional neural network (1DCNN) with a self-attention mechanism for automated SES anomaly detection. Our method prioritizes practical deployment, with the framework designed specifically for the data source station. It processes raw six-channel geoelectric signals while simulating real-world interference scenarios through data augmentation with physical information, such as time reversal and noise injection. The model achieves an F1-score of 0.9797 on a 7-year dataset (2015–2022), demonstrating robustness against urban noise and outperforming conventional manual methods in both speed and consistency. By automating SES analysis, this work reduces reliance on expert interpretation and provides a scalable tool for real-time earthquake early warning systems.
The paper is structured as follows: Section 2 introduces the relevant work on seismic electric signals. Section 3 presents the 1DCNN architecture and the incorporated self-attention module investigated in this study. Section 4 first introduces essential pre-experimental preparations, including data augmentation procedures applied to the dataset, followed by the presentation and analysis of experimental results. Section 5 discusses both the achievements and limitations of this research endeavor.

2. Related Work

The methodologies for earthquake prediction have become increasingly sophisticated over time, achieving continuous improvements in both accuracy and timeliness. The VAN (Varotsos, Alexopoulos, and Nomicos) method [7], proposed by Varotsos and his colleagues in the 1980s, established a correlation between SES and impending seismic activity, providing a rational and effective approach for short-term earthquake forecasting.
The physical nature of seismic electric signal (SES) generation involves stress-induced charge movements in crustal rocks, where transient electric anomalies emerge when stress reaches a critical threshold. This process, termed Pressure Stimulated Currents (PSCs), occurs due to the cooperative orientation of electric dipoles formed by lattice defects. SESs exhibit unique characteristics: they are observable only at specific ‘sensitive points’ on Earth’s surface; their amplitude follows the scaling law log10(ΔV/L) = aM + β, with a ∈ [0.32, 0.37], indicating direct correlation with earthquake magnitude; and they demonstrate scale invariance over four orders of magnitude, evidenced by detrended fluctuation analysis (DFA) yielding α = 0.98 ± 0.01 [8]. These properties align with critical dynamics theory, where SES emission signifies stress accumulation near fracture points.
Unlike seismic waves that propagate post-rupture, anomalous SESs manifest days to weeks before earthquakes, originating from stress-induced charge carrier movements in rock micropores [9]. The VAN method [10], which predicts seismic anomalies by exploiting the short-term temporal effectiveness of SES, is currently a key means of earthquake prediction in multiple countries. Practically, SES analysis carries dual societal significance [11]. First, its predictive capability could revolutionize earthquake early warning systems: unlike conventional seismometers recording post-facto vibrations, SES provides actionable lead times for evacuation. Second, current anomaly detection relies mainly on expert manual interpretation, which is subjective and slow, requires substantial expertise, and is prone to delaying the discovery of seismic anomaly signals.
Meanwhile, owing to the remarkable advantages demonstrated by deep learning in time-series recognition fields such as natural language processing and finance, researchers focusing on seismic signal analysis have gradually shifted their attention to deep learning and machine learning methods in recent years [5].
Murti et al. [12] utilized the unique velocity, acceleration, and displacement features of seismic signals to differentiate between seismic and non-seismic events by comparing various network models, finding that Artificial Neural Networks (ANNs) performed best. Ji et al. [13] integrated Singular Value Decomposition (SVD) with the MobileNetV2 network to develop an effective and feasible denoising method for existing seismic data. Chakraborty et al. [14] proposed a multi-task deep learning model, the Convolutional Recurrent Model for Earthquake Identification and Magnitude Estimation (CREIME), which can separate background noise while determining the arrival time of the first P-wave and estimating earthquake magnitude. Costanzo [15] analyzed the fitting of Adriatic Sea earthquakes by different machine learning methods, showing great potential for machine learning in seismology and suggesting it could help construct seismic catalogs consistent with the GR law by identifying small, hard-to-detect earthquakes. Mousavi et al. [16] developed a hybrid CNN-LSTM network using raw seismograms from the publicly available STEAD dataset, achieving robust magnitude estimation accuracy. Wang et al. [17] implemented a combined VGG-16 and Rich Side-output Residual Networks architecture to accurately predict P/S-wave arrival times from seismic waveforms. Ross et al. [18] demonstrated that CNNs can match human expert performance in determining P-wave arrival times and first-motion polarity directly from seismograms.
Despite the proven efficacy of deep learning in seismology, its application to SES remains virtually unexplored [19]. SES anomalies, characterized by low signal-to-noise ratios (SNRs) and transient temporal dynamics, present unique challenges that differ fundamentally from seismic waveform analysis. Existing frameworks designed for earthquake detection or denoising lack mechanisms to address SES-specific features, such as multi-channel temporal synchronicity and unidirectional impulsivity. This gap underscores a critical research frontier: the development of domain-specific deep learning architectures tailored for SES anomaly detection. By bridging this gap, our work pioneers the integration of 1DCNNs with self-attention mechanisms, offering a robust, automated solution for SES analysis that transcends the limitations of traditional VAN-based methods.

3. Method

To address the multi-channel temporal synchronicity and short-term unidirectional impulsive characteristics of geoelectric signals, this study integrates a 1DCNN with a self-attention mechanism. The framework jointly balances local and global information across multiple channels, enabling the automatic and high-fidelity detection of transient SES anomalies. This approach addresses the need for deep learning-based supplements to earthquake early warning systems.
In subsequent experiments, we observed that the 1DCNN demonstrated superior adaptability to SES anomaly detection compared to other baseline models, while also exhibiting enhanced scalability. Detailed findings are presented in Section 4.2.

3.1. 1DCNN

Convolutional neural networks for 1D signal classification primarily adopt two strategies: directly processing raw 1D signals using 1DCNNs [20,21,22] or transforming 1D signals into 2D time–frequency representations (e.g., spectrograms) for subsequent 2DCNN-based classification [23,24,25]. While 2DCNNs leverage the mature architectures of image processing, 1DCNNs offer distinct advantages tailored to geophysical signal characteristics. First, 1DCNNs inherently align with the native acquisition format of time-series data, eliminating the need for artificial 2D transformations that introduce computational overhead. Second, time–frequency conversions (e.g., Fourier transforms or wavelet analyses), though widely used, risk the irreversible loss of transient anomaly-related information during dimensionality expansion. Third, 2D inputs inherently escalate model complexity due to increased parameter counts and memory demands, making 1DCNNs computationally more efficient for real-time monitoring applications. Building on these advantages, we employed a 1DCNN architecture.
Our model processes raw 1D geoelectric signals directly. It leverages hierarchical convolutional layers to extract localized spatiotemporal patterns while preserving the integrity of short-duration anomalies.
The proposed model architecture (Figure 1) processes input batches of 32 samples, each with dimensions 6 (channels) × 240 (time steps). The input tensor (batch size × channels × time steps = 32 × 6 × 240) first passes through an initial convolutional layer (kernel size = 3, stride = 1, padding = 1) for shallow feature extraction, followed by max-pooling to hierarchically condense low-level spatiotemporal patterns into activation maps. These features are then passed to a self-attention module that dynamically weights cross-channel and long-range temporal dependencies, amplifying discriminative anomaly-related characteristics while suppressing noise. Subsequently, a second convolutional-pooling block extracts deeper abstractions, refining localized impulsive signatures through nonlinear transformations. Finally, fully connected layers synthesize multi-scale features into a compact latent representation, culminating in a softmax-activated classification layer for probabilistic anomaly detection. The architecture achieves parameter efficiency (0.15 M trainable weights) and computational robustness (inference time < 1.3 ms per sample), balancing model complexity with real-time operational demands in geophysical monitoring systems.
The core of the 1DCNN lies in its convolutional layers, which extract local features by sliding kernels across input signals, generating activation maps that encode localized temporal patterns. Each convolutional module integrates a convolutional layer and a ReLU activation function, enabling hierarchical learning from low-level to global representations. A self-attention module is further incorporated to amplify anomaly-related features while suppressing irrelevant noise. Overall, through stacked convolutional and pooling layers, the 1DCNN progressively learns high-level abstractions from raw signals. Fully connected layers then synthesize these local features into global representations for final classification.
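As a concrete illustration, the conv-attention-conv pipeline described above can be sketched in PyTorch. The kernel size, stride, padding, input shape, and layer ordering follow the description; the convolutional channel widths (32 and 64) are illustrative assumptions, so this sketch's parameter count does not match the reported 0.15 M, and the softmax is deferred to the cross-entropy loss at training time.

```python
import torch
import torch.nn as nn

class SelfAttention1D(nn.Module):
    """Self-attention over the temporal axis with a residual connection."""
    def __init__(self, channels):
        super().__init__()
        # 1x1 convolutions produce Query, Key and Value from the same input
        self.q = nn.Conv1d(channels, channels, kernel_size=1)
        self.k = nn.Conv1d(channels, channels, kernel_size=1)
        self.v = nn.Conv1d(channels, channels, kernel_size=1)

    def forward(self, x):                        # x: (batch, C, T)
        q, k, v = self.q(x), self.k(x), self.v(x)
        attn = torch.softmax(q.transpose(1, 2) @ k, dim=-1)  # (batch, T, T)
        out = v @ attn.transpose(1, 2)           # weighted sum of V
        return x + out                           # residual connection

class SES1DCNN(nn.Module):
    """Two conv-pool blocks with a self-attention module in between (sketch)."""
    def __init__(self, in_channels=6, time_steps=240, n_classes=2):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv1d(in_channels, 32, kernel_size=3, stride=1, padding=1),
            nn.ReLU(), nn.MaxPool1d(2))          # 240 -> 120 time steps
        self.attn = SelfAttention1D(32)
        self.block2 = nn.Sequential(
            nn.Conv1d(32, 64, kernel_size=3, stride=1, padding=1),
            nn.ReLU(), nn.MaxPool1d(2))          # 120 -> 60 time steps
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * (time_steps // 4), n_classes))

    def forward(self, x):
        return self.head(self.block2(self.attn(self.block1(x))))

model = SES1DCNN()
logits = model(torch.randn(32, 6, 240))   # one batch as described above
print(logits.shape)                        # torch.Size([32, 2])
```

Feeding the logits to a cross-entropy loss is numerically equivalent to the softmax-activated classification layer named in the text.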

3.2. Attention Mechanism

To capture the significant cross-channel correlations and temporal dependencies inherent in SES anomalies, we incorporated a self-attention mechanism into the 1DCNN [26].
The self-attention mechanism dynamically adjusts the output for each element in a sequence based on its correlations with all other elements, effectively capturing long-range dependencies [27,28]. This process selectively retains useful information while discarding irrelevant noise within the sequence. Specifically, the self-attention mechanism constructs three vectors—Query, Key, and Value—to represent distinct aspects of the sequence: the Query vector encodes what each element seeks to retrieve, the Key vector encapsulates the critical features each element possesses, and the Value vector contains the actual information carried by each element.
As shown in Figure 2, the self-attention mechanism operates as follows: The input X is first processed by the 1DCNN to generate Query (Q), Key (K), and Value (V) through convolutional operations. The dot product of Q and K is then computed, followed by a softmax function to derive attention weights. These weights are used to compute a weighted sum of V, producing the intermediate output. Finally, a residual connection combines with the original input X to yield the final output Y.
In self-attention, Q, K, and V all originate from the same input X, hence the term “self”-attention. For each position in the sequence, Q, K, and V vectors are generated. The attention score for each position is calculated by taking the dot product of its Q vector with all K vectors. Applying softmax to these scores produces normalized weights, which determine the contribution of each V vector to the final output at that position. Consequently, the output at each position becomes a weighted sum of all V vectors, with weights determined by the similarity between Q and K vectors.
Softmax(x_i) = e^{x_i} / Σ_{j=1}^{W} e^{x_j}
The softmax function (W represents the sequence length) transforms raw scores into probabilities within the range [0, 1], ensuring their sum equals 1. This normalization explicitly quantifies the contribution of each V vector to the output.
The 1DCNN captures local dependencies in sequential data, while the weighted averaging of V via attention weights establishes global dependencies across all positions. This dual capability enables the simultaneous observation of local and global features, aligning with the localized impulsive characteristics and global synchronicity inherent to geoelectric anomalies. The residual connection, which adds the input X to the self-attention output, mitigates gradient vanishing and facilitates deeper network architectures.
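The computation described above (Q, K, and V derived from the same input, dot-product similarity scores, softmax weights that sum to 1, a weighted sum of V, and a residual connection) can be sketched in a few lines of NumPy; the projection matrices, sequence length, and feature dimension below are arbitrary illustrative choices.

```python
import numpy as np

def self_attention(x, Wq, Wk, Wv):
    """Single-head self-attention with a residual connection.

    Q, K, V all come from the same input x (hence 'self'); each row of
    the softmaxed score matrix weights the contribution of every V
    vector to the output at that position.
    """
    Q, K, V = x @ Wq, x @ Wk, x @ Wv
    scores = Q @ K.T                                  # (W, W) similarities
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)     # softmax per position
    return x + weights @ V                            # Y = X + A V

rng = np.random.default_rng(0)
W, d = 5, 4                                # sequence length W, feature dim d
x = rng.normal(size=(W, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
y = self_attention(x, Wq, Wk, Wv)
print(y.shape)                             # (5, 4): one output per position
```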

4. Experimental Results

By comparing different model configurations, dataset processing strategies, and performance against traditional methods, we validate the effectiveness and superiority of the proposed approach. Key performance metrics—including precision, recall, and F1-score—are discussed, along with the model’s robustness across diverse scenarios.

4.1. Preparation

4.1.1. Dataset

The seismic electric signal dataset was collected from a station in a certain region of China using a six-channel geoelectric field instrument. The instrument records three electrode directions—North–South (NS), East–West (EW), and Northeast (NE)—with each direction featuring long and short electrode spacings. It achieves a resolution of 10 μV, a sampling rate of 1 sample per minute, and a frequency band from DC to 0.005 Hz, primarily capturing the DC components of geoelectric signals.
In the context of seismology, anomalous SESs are formally characterized by specific physical and mathematical criteria: unidirectional impulsive behavior manifested as sustained perturbations exceeding background fluctuations, typically measuring > 0.5 mV/km for M5.0 earthquakes at 50 km distance; multi-channel temporal synchronicity confirmed by simultaneous recordings on perpendicular short and long dipoles maintaining a constant ΔV/L ratio; and a signature of critical dynamics identified through natural time analysis where the normalized power spectrum Π(ω) yields a variance value κ1 = 0.070 ± 0.002 [8]. These criteria collectively distinguish SESs from artificial noise sources, with the κ1 value serving as a universal indicator of critical state transition in geophysical systems.
The visualization of the dataset is illustrated in Figure 3. Within the scope of this study and consistent with the VAN methodology [7,29] applied at our monitoring station, a seismic electric signal anomaly is formally defined as a transient perturbation satisfying the following criteria simultaneously across multiple channels: First, it must exhibit unidirectional impulsive behavior, characterized by a sustained, step-like change in the electric field amplitude occurring predominantly in one direction within a short time window, typically minutes to hours. This contrasts sharply with the irregular, oscillatory fluctuations of background signals. Mathematically, this is often quantified by a significant deviation exceeding a threshold, expressed as |ΔV| > k · σ_bg, where ΔV represents the voltage change over the anomaly window, σ_bg is the standard deviation of the background signal in a preceding stable window, and k is an empirically determined factor, often ranging from 3 to 5, reflecting station-specific noise levels [8,30]. Second, it must demonstrate multi-channel temporal synchronicity, meaning the onset time of the unidirectional impulse is temporally aligned within the instrument’s sampling resolution and expected propagation delays across multiple independent measurement channels at the same station. This synchronicity is crucial for discriminating against localized noise sources. The visualization in Figure 3 exemplifies these defining characteristics. Background signals typically exhibit lower amplitude, stochastic fluctuations lacking dominant unidirectional trends, or strict cross-channel synchronicity. We emphasize that this definition focuses on the detectable signal morphology associated with SESs as per established observational practice. 
Establishing the causative link between these detected anomalies and subsequent seismic events requires dedicated spatiotemporal correlation analysis, which is beyond the immediate scope of this detection-focused study but is acknowledged as the ultimate goal motivating this work and planned for future investigation.
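As a minimal sketch of the amplitude criterion |ΔV| > k · σ_bg defined above: the window lengths, noise level, and k = 4 below are illustrative assumptions (k lies in the empirical 3-5 range named in the text), and a full detector would additionally require the multi-channel synchronicity check across all six channels.

```python
import numpy as np

def flag_anomaly(signal, bg_window=60, k=4):
    """Flag a step-like SES candidate in one channel.

    The mean change after the assumed onset is compared against k times
    the standard deviation of a preceding stable background window.
    """
    bg = signal[:bg_window]
    sigma_bg = bg.std()
    delta_v = signal[bg_window:].mean() - bg.mean()   # change after onset
    return abs(delta_v) > k * sigma_bg

rng = np.random.default_rng(1)
quiet = rng.normal(0.0, 0.05, 240)                    # background only
step = quiet.copy()
step[60:] += 1.0                                      # unidirectional impulse
print(flag_anomaly(quiet), flag_anomaly(step))        # False True
```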
The original dataset contains 2889 entries recorded daily between 00:00 and 04:00 from 2015 to 2022 (receiving signals once a minute, for a total of 240 time points), with only 108 manually annotated anomalous signals (3.7% of the total). After data augmentation, the expanded dataset comprises 4509 samples, including 1728 anomalous instances (38.3%).

4.1.2. Data Augmentation Techniques

Data augmentation serves as a pivotal strategy to enhance model robustness and accuracy in seismic anomaly detection, particularly given the inherent challenges of high acquisition costs and spatiotemporal heterogeneity in geophysical data. By artificially expanding training datasets without incurring additional observational expenses, these techniques mitigate overfitting and improve generalization across diverse real-world scenarios.
Temporal flipping leverages the temporal symmetry of anomaly signals by reversing time-series sequences, thereby generating new samples that preserve discriminative features while enhancing diversity. As mentioned above, the unidirectional abrupt-change feature of geoelectrical anomaly signals is reflected in both the local and overall trends of geoelectrical signals and is independent of the time directionality. Therefore, temporal flipping is an effective way to increase the number of anomaly samples.
Noise injection addresses the ubiquitous presence of noise in seismic signals by introducing controlled levels of Gaussian white noise during training. The noise intensity is dynamically calibrated based on SNR thresholds, ensuring augmented samples maintain valid anomaly signatures while simulating realistic interference conditions. This technique systematically improves model robustness against ambient noise fluctuations encountered in field deployments.
Random scaling introduces amplitude variability through stochastic adjustments to signal magnitudes, reflecting the natural randomness of anomaly intensities in geoelectric recordings. By exposing models to diverse amplitude regimes, this method enhances adaptability to magnitude variations without distorting temporal anomaly morphology.
Random shifting simulates temporal uncertainties in anomaly onset times by applying stochastic offsets to signal segments. This strategy not only diversifies training data but also strengthens model resilience to temporal misalignments.
Collectively, these augmentation strategies—temporal flipping, noise injection, random scaling, and shifting—synergistically address data category imbalance while emulating the complex interplay of natural and anthropogenic interference in seismic monitoring systems. Their combined implementation establishes a physically constrained augmentation framework that preserves geoelectrical signal integrity while optimizing model performance for transient anomaly detection.
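Assuming NumPy arrays shaped 6 × 240 (channels × time steps), the four augmentations might be sketched as follows; the SNR target, scaling bounds, and shift range are illustrative choices, not the exact values used in the study.

```python
import numpy as np

rng = np.random.default_rng(42)

def temporal_flip(x):
    """Reverse the time axis; the unidirectional step survives reversal."""
    return x[:, ::-1]

def noise_injection(x, snr_db=20.0):
    """Add Gaussian white noise at a target SNR (20 dB is an assumption)."""
    p_signal = np.mean(x ** 2)
    p_noise = p_signal / 10 ** (snr_db / 10)
    return x + rng.normal(0.0, np.sqrt(p_noise), x.shape)

def random_scale(x, lo=0.8, hi=1.2):
    """Stochastic amplitude scaling; the bounds are illustrative."""
    return x * rng.uniform(lo, hi)

def random_shift(x, max_shift=20):
    """Circularly shift the onset time by up to max_shift samples."""
    return np.roll(x, rng.integers(-max_shift, max_shift + 1), axis=1)

sample = rng.normal(size=(6, 240))        # one labelled anomaly sample
augmented = [f(sample) for f in
             (temporal_flip, noise_injection, random_scale, random_shift)]
print([a.shape for a in augmented])       # four new (6, 240) samples
```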

4.1.3. Model Configuration

All experiments are conducted on an NVIDIA 4060 GPU using PyTorch 2.2.1. Hyperparameters include a cross-entropy loss function, the Adam optimizer with a learning rate of 1 × 10⁻⁴, a batch size of 32, and 130 training epochs. Each model is trained independently 10 times, with evaluation metrics averaged for reliability.
In the comparative experiments, all models used a linear layer as the output layer. The hidden layers of each model are as follows: the DNN has two linear layers, the LSTM contains one LSTM layer, CNN + LSTM has one CNN layer and one LSTM layer, the encoder part of the Transformer is a linear layer, and the decoder consists of one Transformer layer. The number of neighbors k used in KNN is five, and RandomForest uses 100 estimators.
The improved 1DCNN architecture comprises convolutional layers, pooling layers, fully connected layers, and an output layer. The model employs key hyperparameters, including ReLU activation; two convolutional layers with a kernel size of 3, a stride of 1, and a padding of 1; two pooling layers for feature downsampling; and a fully connected layer with 15,360 neurons to synthesize hierarchical representations for final classification.
This configuration balances computational efficiency with feature extraction capability, tailored for the spatiotemporal characteristics of SES anomalies.
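The training protocol above can be sketched in PyTorch. The model here is a trivial linear stand-in for the actual 1DCNN-with-attention architecture described in Section 3, and the single random batch merely illustrates the loss/optimizer/epoch configuration reported above.

```python
import torch
import torch.nn as nn

# Hyperparameters as reported in Section 4.1.3
model = nn.Sequential(                      # stand-in, NOT the paper's model
    nn.Flatten(), nn.Linear(6 * 240, 2))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

x = torch.randn(32, 6, 240)                 # one batch of 32 samples
y = torch.randint(0, 2, (32,))              # 0 = normal, 1 = anomalous

for epoch in range(130):                    # 130 training epochs
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
print(round(loss.item(), 4))
```

In a full run this loop would iterate over a DataLoader of batches per epoch; the averaged-over-10-runs evaluation wraps this whole procedure.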

4.1.4. Evaluation Metrics

First, define True Positive (TP) as the number of correctly identified abnormal samples, True Negative (TN) as the number of correctly identified normal samples, False Positive (FP) as the number of normal samples incorrectly identified as abnormal, and False Negative (FN) as the number of abnormal samples incorrectly identified as normal. To comprehensively evaluate model performance, the following metrics are adopted:
Recall = TP / (TP + FN),
Recall quantifies the model’s ability to detect true anomalies (i.e., the proportion of actual anomalies correctly identified). Higher recall indicates fewer missed anomalies.
Precision = TP / (TP + FP),
Precision measures the model’s reliability in anomaly predictions (i.e., the proportion of predicted anomalies that are true anomalies). Higher precision implies fewer false alarms.
F1-Score = 2 · Precision · Recall / (Precision + Recall).
F1-Score, the harmonic mean of precision and recall, balances both metrics and is particularly suitable for imbalanced class scenarios. A higher F1-score reflects superior overall performance.
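The three metrics follow directly from the confusion counts; a small sketch with hypothetical counts:

```python
def precision_recall_f1(tp, fp, fn):
    """Compute the three metrics exactly as defined above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical confusion counts for illustration only
p, r, f1 = precision_recall_f1(tp=95, fp=3, fn=5)
print(round(p, 4), round(r, 4), round(f1, 4))   # 0.9694 0.95 0.9596
```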

4.2. Experimental Analysis

This section demonstrates that the proposed two-layer 1DCNN with self-attention mechanism, combined with data augmentation, effectively addresses the class imbalance and transient impulsive characteristics of geoelectric signals.

4.2.1. Data Augmentation

The aforementioned data augmentation techniques simulate plausible variations of anomalous signals under natural conditions. As illustrated in Figure 4, the augmented anomaly samples retain original signal characteristics while enhancing diversity, effectively alleviating data scarcity caused by limited seismic signal acquisition.
Table 1 demonstrates that the original dataset suffered from extreme class imbalance (3.74% anomalies), which severely degraded model performance before augmentation. Theoretical analysis indicates that class imbalance skewed the loss function gradients, particularly impacting gradient-dependent deep learning models. For instance, a shallow 1-layer CNN + LSTM achieved an F1-score of 0.0487, failing to extract features from sparse anomalies, while the traditional KNN (F1 = 0.2007) retained partial sensitivity via local similarity metrics but remained constrained by sample scarcity.
After augmentation, the proportion of anomalies increased to 38.32%. This restructured the feature space distribution and optimized gradient updates. Random Forest saw its F1-score surge from 0.082 to 0.9801, leveraging ensemble learning for fine-grained anomaly partitioning. Meanwhile, the Transformer model achieved the largest F1 gain (Δ = 0.9392), as self-attention mechanisms captured long-range dependencies under balanced data. Geometric augmentations (flipping, shifting, scaling) combined with noise injection simulated real-world data distributions, enhancing robustness to local-global contexts and improving anomaly feature diversity.
The CNN model also performed exceptionally well, achieving precision, recall, and F1-scores nearly comparable to, and in some cases slightly superior to, those of the Transformer model. The outstanding performance of these two deep learning models provides valuable inspiration for integrating CNN with self-attention mechanisms.

4.2.2. Ten-Fold Cross-Validation Result

To rigorously validate the model’s generalization capability and mitigate potential biases from single-split evaluations, we conducted ten-fold cross-validation on the augmented dataset, with the test set constituting 15% of the entire dataset to ensure statistical representativeness. This approach ensures that each sample is included in both training and testing phases across different partitions, providing a more reliable assessment of performance stability under data scarcity.
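For reference, k-fold partitioning guarantees that every sample is tested exactly once across the folds; a minimal NumPy sketch follows (the random fold assignment here is illustrative, not the exact split used in the study).

```python
import numpy as np

def ten_fold_indices(n_samples, n_folds=10, seed=0):
    """Yield (train_idx, test_idx) pairs so that every sample
    appears in exactly one test fold across the k partitions."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    folds = np.array_split(idx, n_folds)
    for i in range(n_folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train, folds[i]

# e.g. over the 4509 augmented samples; the per-fold F1-scores would
# then be averaged, as in Table 2 (model training omitted here)
tested = 0
for train_idx, test_idx in ten_fold_indices(4509):
    tested += len(test_idx)
print(tested)   # 4509: each sample is tested exactly once
```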
In the following content, we refer to the model containing n layers of 1DCNN as n-layer-CNN, and the main model, which consists of two layers of 1DCNN integrated with a self-attention module, is referred to as 2L-CNN + Attention.
As presented in Table 2, the 2L-CNN + Attention model achieved the highest average F1-score of 0.9797, outperforming both traditional machine learning methods (KNN: 0.8751; RandomForest: 0.9731) and deep learning benchmarks (Transformer: 0.9721; LSTM [31]: 0.9579). This superior performance stems from the model’s tailored design, which effectively addresses the challenges of class imbalance, limited data availability, and transient anomaly characteristics. The two-layer CNN baseline (F1 = 0.9732) already exhibits strong performance due to its parameter-efficient architecture, which mitigates overfitting risks inherent to small datasets. However, integrating the self-attention mechanism further elevates F1 by 1.65%, underscoring its ability to amplify discriminative spatiotemporal features—a critical advantage for detecting short-duration anomalies. In contrast, deeper CNNs (e.g., four-layer CNN: F1 = 0.9596) suffer from performance degradation, likely due to vanishing gradients and overparameterization, highlighting the necessity of shallow yet effective architectures for geophysical applications.
Notably, hybrid models like CNN-LSTM (F1 = 0.8469) underperform significantly compared to standalone CNNs or LSTMs, suggesting that excessive complexity disrupts feature learning for simple signals. While LSTM (F1 = 0.9579) achieves high recall (0.9980), its sequential processing introduces latency and struggles with abrupt temporal shifts, limiting practicality in real-time monitoring. Similarly, traditional machine learning methods such as KNN (F1 = 0.8751) lack the ability to perform hierarchical feature extraction, resulting in weaker performance on multi-channel SES data.
The proposed framework demonstrates its effectiveness in SES anomaly detection through its successful balancing of precision (0.9665) and recall (0.9932). Physics-informed data augmentation expands anomaly representation while preserving morphological integrity, enabling the model to learn robust features without synthetic artifacts. Concurrently, the self-attention mechanism dynamically prioritizes rare anomalies during training, reducing false negatives by 12% compared to unaugmented baselines. These innovations, combined with the model’s computational efficiency (1.3 ms inference latency), establish a scalable solution for SES-based earthquake prediction in resource-constrained environments.

5. Conclusions

This study proposed a lightweight deep learning framework that integrates a 1DCNN with a self-attention mechanism to address the gap in deep learning applications for SES anomaly detection. The demonstrated superiority of our framework establishes it as a novel benchmark for automating SES-based earthquake early warning systems. Our systematic spatiotemporal data augmentation strategies—including noise injection and temporal flipping—were validated to enhance model robustness. Notably, the proposed architecture achieves an F1-score of 0.9797, outperforming baseline models by 0.66–39.6% through cross-channel spatiotemporal feature fusion, which jointly models SES synchronicity and impulsive characteristics. By simulating real-world interference scenarios, the anomaly proportion in the dataset increased from 3.7% to 38.3%, effectively mitigating overfitting in small-sample regimes and establishing a generalizable enhancement paradigm for geophysical signal processing.
However, this study has limitations that highlight important avenues for future research. Our current data augmentation framework relies primarily on statistical and morphological transformations; incorporating domain-specific physical constraints, such as subsurface electrical anisotropy or stress-field models, would enhance the geophysical plausibility of synthetic anomalies. Furthermore, validation was restricted to single-station data. Extending the framework to model spatiotemporal dependencies across regional station networks using graph neural networks is a critical next step for robust anomaly confirmation and reducing false positives. Crucially, while this work successfully automates the detection of SES anomalies based on their defined morphological signatures, the ultimate goal of earthquake forecasting requires establishing a robust link between these detected anomalies and subsequent seismic events. Future work must rigorously investigate this spatiotemporal correlation using long-term monitoring data and curated seismic catalogs. We also recognize the significant contributions of Varotsos et al. through natural time analysis [32], a technique that analyzes the sequence of events and fluctuations of an order parameter to characterize the critical state of the Earth’s crust preceding earthquakes. Integrating the high-fidelity anomaly detection capabilities demonstrated here with such advanced characterization techniques—for instance, applying natural time analysis selectively to time windows flagged as anomalous by the deep learning model—represents a promising future direction. This integration could significantly enhance the discrimination of significant precursors from noise and improve the assessment of impending seismic risk, bridging data-driven detection with physics-informed criticality analysis.

Author Contributions

Conceptualization, W.L. and Y.W.; Methodology, H.G. and W.Z.; Resources, W.Z.; Writing—Original Draft Preparation, H.G.; Writing—Review and Editing, Z.W.; Supervision, Z.W.; Project Administration, W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Shanghai Science and Technology Committee (grant 23DZ1200200) and the Shanghai Sheshan National Geophysical Observatory.

Data Availability Statement

The datasets presented in this article are not readily available due to technical and time limitations.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yan, R.; Huang, J.; Lin, J.; Wang, Q.; Zhang, Z.; Yang, Y.; Chu, W.; Liu, D.; Xu, S.; Lu, H.; et al. Retrospective Study on Seismic Ionospheric Anomalies Based on Five-Year Observations from CSES. Remote Sens. 2024, 16, 4426. [Google Scholar] [CrossRef]
  2. Liu, J.; Zhang, X.; Wu, W.; Chen, C.; Wang, M.; Yang, M.; Guo, Y.; Wang, J. The Seismo-Ionospheric Disturbances before the 9 June 2022 Maerkang Ms6.0 Earthquake Swarm. Atmosphere 2022, 13, 1745. [Google Scholar] [CrossRef]
  3. Perfetti, N.; Taddia, Y.; Pellegrinelli, A. Monitoring of ionospheric anomalies using GNSS observations to detect earthquake precursors. Remote Sens. 2025, 17, 338. [Google Scholar] [CrossRef]
  4. Varotsos, P.; Alexopoulos, K. Physical properties of the variations of the electric field of the earth preceding earthquakes, I. Tectonophysics 1984, 110, 73–98. [Google Scholar] [CrossRef]
  5. Yu, S.; Ma, J. Deep Learning for Geophysics: Current and Future Trends. Rev. Geophys. 2021, 59, e2021RG000742. [Google Scholar] [CrossRef]
  6. Chen, Y.; Zhang, G.; Bai, M.; Zu, S.; Guan, Z.; Zhang, M. Automatic waveform classification and arrival picking based on convolutional neural network. Earth Space Sci. 2019, 6, 1244–1261. [Google Scholar] [CrossRef]
  7. Varotsos, P.; Alexopoulos, K.; Lazaridou, M. On recent seismic electrical signal activity in northern Greece. Tectonophysics 1991, 188, 403–405. [Google Scholar] [CrossRef]
  8. Varotsos, P.; Sarlis, N.V.; Skordas, E.S. Natural Time Analysis: The New View of Time. Precursory Seismic Electric Signals, Earthquakes and Other Complex Time Series; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar]
  9. Varotsos, P.; Alexopoulos, K. Physical properties of the variations of the electric field of the earth preceding earthquakes. II. Determination of epicenter and magnitude. Tectonophysics 1984, 110, 99–125. [Google Scholar] [CrossRef]
  10. Varotsos, P.; Alexopoulos, K.; Lazaridou, M. Latest aspects of earthquake prediction in Greece based on seismic electric signals, II. Tectonophysics 1993, 224, 1–37. [Google Scholar] [CrossRef]
  11. Varotsos, P.; Eftaxias, K.; Vallianatos, F.; Lazaridou, M. Basic principles for evaluating an earthquake prediction method. Geophys. Res. Lett. 1996, 23, 1295–1298. [Google Scholar] [CrossRef]
  12. Murti, M.A.; Junior, R.; Ahmed, A.N.; Elshafie, A. Earthquake multi-classification detection based velocity and displacement data filtering using machine learning algorithms. Sci. Rep. 2022, 12, 21200. [Google Scholar] [CrossRef] [PubMed]
  13. Ji, G.; Wang, C. A denoising method for seismic data based on SVD and deep learning. Appl. Sci. 2022, 12, 12840. [Google Scholar] [CrossRef]
  14. Chakraborty, M.; Fenner, D.; Li, W.; Faber, J.; Zhou, K.; Rümpker, G.; Stoecker, H.; Srivastava, N. CREIME—A convolutional recurrent model for earthquake identification and magnitude estimation. J. Geophys. Res. Solid Earth 2022, 127, e2022JB024595. [Google Scholar] [CrossRef]
  15. Costanzo, A. A new catalogue and insights into the 2022 adriatic offshore seismic sequence using a machine learning-based procedure. Sensors 2024, 25, 82. [Google Scholar] [CrossRef]
  16. Mousavi, S.M.; Beroza, G.C. A machine-learning approach for earthquake magnitude estimation. Geophys. Res. Lett. 2020, 47, e2019GL085976. [Google Scholar] [CrossRef]
  17. Wang, J.; Xiao, Z.; Liu, C.; Zhao, D.; Yao, Z. Deep learning for picking seismic arrival times. J. Geophys. Res. Solid Earth 2019, 124, 6612–6624. [Google Scholar] [CrossRef]
  18. Ross, Z.E.; Meier, M.; Hauksson, E. P wave arrival picking and first-motion polarity determination with deep learning. J. Geophys. Res. Solid Earth 2018, 123, 5120–5129. [Google Scholar] [CrossRef]
  19. Ma, L.; Han, L.; Zhang, P. Iterative separation of blended seismic data in shot domain using Deep Learning. Remote Sens. 2024, 16, 4167. [Google Scholar] [CrossRef]
  20. Choi, Y.; Nguyen, H.-T.; Han, T.H.; Choi, Y.; Ahn, J. Sequence Deep Learning for Seismic Ground Response Modeling: 1D-CNN, LSTM, and Transformer Approach. Appl. Sci. 2024, 14, 6658. [Google Scholar] [CrossRef]
  21. Yang, X.; Hu, M.; Chen, X.; Teng, S.; Chen, G.; Bassir, D. Predicting Models for Local Sedimentary Basin Effect Using a Convolutional Neural Network. Appl. Sci. 2023, 13, 9128. [Google Scholar] [CrossRef]
  22. Hong, S.; Nguyen, H.-T.; Jung, J.; Ahn, J. Seismic Ground Response Estimation Based on Convolutional Neural Networks (CNN). Appl. Sci. 2021, 11, 10760. [Google Scholar] [CrossRef]
  23. Ayaz, F.; Alhumaily, B.; Hussain, S.; Imran, M.A.; Arshad, K.; Assaleh, K.; Zoha, A. Radar Signal Processing and Its Impact on Deep Learning-Driven Human Activity Recognition. Sensors 2025, 25, 724. [Google Scholar] [CrossRef] [PubMed]
  24. Robles-Guerrero, A.; Gómez-Jiménez, S.; Saucedo-Anaya, T.; López-Betancur, D.; Navarro-Solís, D.; Guerrero-Méndez, C. Convolutional Neural Networks for Real Time Classification of Beehive Acoustic Patterns on Constrained Devices. Sensors 2024, 24, 6384. [Google Scholar] [CrossRef] [PubMed]
  25. Chen, J.; Ma, X.; Li, S.; Ma, S.; Zhang, Z.; Ma, X. A Hybrid Parallel Computing Architecture Based on CNN and Transformer for Music Genre Classification. Electronics 2024, 13, 3313. [Google Scholar] [CrossRef]
  26. Ou, X.; Wang, H.; Liu, X.; Zheng, J.; Liu, Z.; Tan, S.; Zhou, H. Complex Scene Segmentation with Local to Global Self-Attention Module and Feature Alignment Module. IEEE Access 2023, 11, 96530–96542. [Google Scholar] [CrossRef]
  27. Zhang, Y.; Yao, L.; Zhang, L.; Luo, H. Fault diagnosis of natural gas pipeline leakage based on 1D-CNN and self-attention mechanism. In Proceedings of the 2022 IEEE 6th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Beijing, China, 3–5 October 2022; pp. 1282–1286. [Google Scholar]
  28. Ren, J.; Zou, H.; Tang, L.; Sun, S.; Shen, Q.; Wang, X.; Bao, K. Self-attention convolutional neural network based fault diagnosis algorithm for chemical process. In Proceedings of the 2022 41st Chinese Control Conference (CCC), Hefei, China, 25–27 July 2022; pp. 4046–4051. [Google Scholar]
  29. Li, W.; Fang, G.Q.; Zhao, W.Z.; Gong, Y. Analysis of synchronous abnormal signals of the geoelectric field in Shanghai. Prog. Geophys. 2022, 37, 0499–0510. (In Chinese) [Google Scholar]
  30. Varotsos, P.A.; Sarlis, N.V.; Skordas, E.S.; Lazaridou, M.S. Electric pulses some minutes before earthquake occurrences. Appl. Phys. Lett. 2007, 90, 064104. [Google Scholar] [CrossRef]
  31. Xue, J.; Huang, Q.; Wu, S.; Nagao, T. LSTM-Autoencoder network for the detection of seismic electric signals. IEEE Trans. Geosci. Remote Sens. 2022, 60, 1–12. [Google Scholar] [CrossRef]
  32. Varotsos, P.; Sarlis, N.V.; Skordas, E.S.; Uyeda, S.; Kamogawa, M. Natural time analysis of critical phenomena. Proc. Natl. Acad. Sci. USA 2011, 108, 11361–11364. [Google Scholar] [CrossRef]
Figure 1. Architecture of the proposed two-layer 1DCNN with self-attention mechanism.
Figure 2. Self-attention mechanism structure.
Figure 3. Six-channel waveforms of abnormal seismic signals. (a) Chongming Station; (b) Changjiang Farm Station; (c) Qingpu Station.
Figure 4. Comparison of normal signals, anomalous signals, and augmented anomalous signals (fluctuations in normal signals are relatively irregular, whereas anomalous signals exhibit distinct unidirectional impulsive jump characteristics).
Table 1. Performance comparison of various models before and after data augmentation.

| | Metric | CNN | DNN | LSTM | CNN + LSTM | Transformer | KNN | Random Forest |
|---|---|---|---|---|---|---|---|---|
| Before data augmentation | Precision | 0.2667 | 0.1027 | 0.1143 | 0.3000 | 0.0810 | 0.6883 | 0.4000 |
| | Recall | 0.0563 | 0.0437 | 0.0321 | 0.0187 | 0.0250 | 0.1208 | 0.0461 |
| | F1 | 0.0914 | 0.0605 | 0.0487 | 0.0353 | 0.0378 | 0.2007 | 0.0820 |
| After data augmentation | Precision | 0.9566 | 0.8990 | 0.9044 | 0.8414 | 0.9554 | 0.8467 | 0.9655 |
| | Recall | 0.9996 | 0.9792 | 0.9954 | 0.8405 | 0.9996 | 0.9866 | 0.9954 |
| | F1 | 0.9776 | 0.9371 | 0.9477 | 0.8355 | 0.9770 | 0.9112 | 0.9801 |
Table 2. Comparison of the performance of different layers of CNN and the two-layer CNN with attention module against other models on the augmented dataset under ten-fold cross-validation.

| Model | Average Precision | Average Recall | Average F1 |
|---|---|---|---|
| One-layer CNN | 0.9487 | 0.9902 | 0.9690 |
| Two-layer CNN | 0.9557 | 0.9913 | 0.9732 |
| Three-layer CNN | 0.9515 | 0.9921 | 0.9714 |
| Four-layer CNN | 0.9382 | 0.9820 | 0.9596 |
| 2L-CNN + Attention | 0.9665 | 0.9932 | 0.9797 |
| DNN | 0.4234 | 0.9389 | 0.5836 |
| LSTM | 0.9209 | 0.9980 | 0.9579 |
| CNN + LSTM | 0.8545 | 0.8394 | 0.8469 |
| Transformer | 0.9573 | 0.9874 | 0.9721 |
| KNN | 0.8357 | 0.9183 | 0.8751 |
| Random Forest | 0.9589 | 0.9878 | 0.9731 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
