Article

Revolutionizing the Detection of Lightning-Generated Whistlers: A Rapid Recognition Model with Parallel Bidirectional SRU Network

1 School of Computer Science and Engineering, Institute of Disaster Prevention, Langfang 065201, China
2 Hebei Province University Smart Emergency Application Technology Research and Development Center, Langfang 065421, China
3 National Institute of Natural Hazards, Ministry of Emergency Management of China, Beijing 100085, China
4 Liaoning Earthquake Agency, Shenyang 110034, China
5 Institute of Geophysics, China Earthquake Administration, Beijing 100081, China
6 National Space Science Center, CAS, Beijing 100085, China
* Author to whom correspondence should be addressed.
Remote Sens. 2025, 17(12), 1963; https://doi.org/10.3390/rs17121963
Submission received: 30 April 2025 / Revised: 29 May 2025 / Accepted: 4 June 2025 / Published: 6 June 2025

Abstract

Lightning-generated whistlers (LW) play a crucial role in understanding magnetosphere–ionosphere coupling mechanisms and, potentially, in identifying precursor signals of natural disasters such as volcanic eruptions and earthquakes. Traditional frequency–time image recognition techniques would require approximately 40 years to analyze seven years of observational data from the China Seismo-Electromagnetic Satellite (CSES), which fails to meet the requirements for practical implementation. To address this issue, a novel and highly efficient model for LW recognition is proposed, integrating speech processing technology with a parallel bidirectional Simple Recurrent Unit (SRU) neural network. The proposed model significantly outperforms traditional methods in computational efficiency, reducing the parameter count by 99% to 0.1 M and improving processing speed by 99%, achieving 20 ms per sample. Despite these improvements, the model maintains excellent performance metrics, including 93% precision, 88.7% recall, and a 90.7% F1-score. As a result, the model can process seven years of data in just 33 days, a 442-fold increase in processing speed compared to conventional approaches.

1. Introduction

Lightning is a prevalent electrical phenomenon in Earth’s atmosphere, occurring at a global average rate of approximately 44 flashes per second [1]. The broadband electromagnetic waves it radiates propagate between the hemispheres along geomagnetic field lines in a right-hand polarized mode. Because high-frequency components travel faster than low-frequency ones, the resulting electromagnetic field data exhibit dispersion, manifested as a descending tone in frequency–time analysis (see Figure 1) [2,3,4,5]. Extensive empirical and theoretical investigations have consistently confirmed the remarkable sensitivity of lightning whistlers (LW) to variations in the space environment [6,7,8,9]. This distinctive attribute positions LW as a powerful diagnostic tool for characterizing the physical properties of the Earth’s ionosphere and magnetosphere [10,11,12,13,14,15,16]. Moreover, by serving as a natural coupling interface between lithospheric dynamics and the space electromagnetic environment, LW are emerging as a promising candidate for natural hazard early warning systems. Their applications span from earthquake precursor identification [17,18,19,20] to volcanic activity tracking [21,22], offering novel perspectives for advancing disaster prediction methodologies.
However, with the increasing prevalence of high-precision electromagnetic environmental monitoring networks, the volume of data has grown exponentially (exceeding 20 GB per day), and a substantial number of LW signals are submerged within it. Traditional manual methods for detecting LW have become the primary bottleneck impeding progress in this field due to their inefficiency and limited scalability [23,24]. Against this background, the continuous improvement of detection technologies has become a necessity, leading to three generations of detection systems thus far.
The first generation of the detection method used a template-matching recognition approach of falling-frequency tones for automated detection [23]. However, this method still required human intervention to eliminate interfering signals, resulting in high false alarm rates (FARs) and missed detection rates (MDRs).
The second-generation detection methods employ artificial feature engineering to improve detection speed by extracting features such as band–energy ratios and frequency–time slopes [11,25,26]. However, in complex electromagnetic environments, the accuracy is limited by the inadequate robustness of these features.
The third-generation detection methods leverage deep learning for recognition, utilizing convolutional neural network (CNN) frameworks [27,28,29,30], with the YOLO network [31] being particularly notable. The YOLO series of single-stage detectors excels at extracting features from LW frequency–time images through end-to-end learning, enhancing the model’s robustness in complex electromagnetic environments. However, its inference speed is insufficient for processing large-scale observational data, which hinders its practical engineering application. To address this limitation, Yuan et al. proposed a method that utilizes speech recognition technology for LW detection [32,33,34,35], resulting in a 99% improvement in inference speed. Its primary drawback, however, is its inability to differentiate multiple LW events within the same fragment, leading to a high rate of missed detections. In summary, while LW detection based on frequency–time images remains the dominant technology, as it advances toward engineering application it faces two key challenges:
(1) Computational Efficiency Issue: Transforming a single waveform sample (0.16 s of data) into a frequency–time image requires approximately 1.8 s, consuming 90% of the total processing time. This inefficiency drastically limits the inference speed, making it infeasible to process CSES’s seven-year in-orbit dataset within a practical timeframe—an estimated 40 years—thereby failing to meet the timeliness requirements of space science missions.
(2) Hardware Adaptation Challenge: Generating frequency–time images dramatically increases storage requirements. While the original waveform for a single sample occupies only 0.32 KB, its corresponding frequency–time representation demands 165 KB—an increase of 516 times. This substantial discrepancy conflicts with the constrained memory bandwidth and real-time processing demands of spaceborne platforms, posing significant challenges to deploying current frameworks on in-orbit edge computing units.
To overcome the above problems, this study draws inspiration from the human cochlea’s biological optimization mechanisms for joint frequency–time resolution [36]. By simulating the auditory system’s ability to distinguish LW events, we replicated the human ear’s nonlinear frequency perception characteristics in detecting LW signals [35,36,37,38]. Through critical band division and perceptual weighting, spectrum analysis based on the Mel scale achieves an efficient frequency–time feature representation [39]. Experiments show that this method retains the core frequency–time features of the LW (see Discussion Section (a) Bio-inspired Auditory Feature Encoder). Based on this, we propose the Parallel Bidirectional Whistler Detection Network Integrating Speech Technology (PBWDNIST). This is the first LW detection network that integrates speech-processing technology with a lightweight architecture. The technical breakthroughs are as follows:
(1) Cross-modal signal transformation: We design an electromagnetic–acoustic conversion protocol to resample electromagnetic waveforms in the ELF/VLF band (0.3–30 kHz) into 16 kHz acoustic signals, followed by the extraction of Mel Frequency Cepstral Coefficients (MFCCs) [39], which yield more compact, decorrelated features through the discrete cosine transform and compression of the Mel spectrogram and can be used for further audio analysis (see the sketch following this list). This preprocessing workflow reduces the time consumption by 99%, from 1.8 s to just 3 ms, compared to the traditional generation of frequency–time images.
(2) Parallel extraction of audio sequence features: We propose the lightweight Parallel Bidirectional Simple Recurrent Unit (BiSRU) architecture for rapidly extracting sequential features from MFCCs, which decouples gated computations from cyclic dependencies to enhance parallel computing. This design overcomes the limitation of traditional Recurrent Neural Networks (RNNs), such as the Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU), which cannot be fully parallelized due to time-step dependencies, thereby improving both training and inference speed.
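As an illustration of contribution (1), the following Python sketch resamples a raw ELF segment to a 16 kHz audio signal and extracts MFCCs. The helper name, MFCC count, and normalization are illustrative assumptions, not the authors' released code; SciPy and librosa are assumed available.

```python
# Hedged sketch of the electromagnetic-acoustic conversion protocol:
# resample an ELF waveform segment to 16 kHz audio, then extract MFCCs.
# Sampling rates follow the text; n_mfcc and normalization are assumptions.
import numpy as np
from scipy.signal import resample
import librosa

def waveform_to_mfcc(elf_segment: np.ndarray, fs_in: int = 10_240,
                     fs_out: int = 16_000, n_mfcc: int = 13) -> np.ndarray:
    # Resample to the 16 kHz rate expected by speech front-ends.
    n_out = int(len(elf_segment) * fs_out / fs_in)
    audio = resample(elf_segment, n_out).astype(np.float32)
    # Normalize amplitude, as audio toolchains expect signals in [-1, 1].
    audio /= (np.max(np.abs(audio)) + 1e-12)
    # MFCCs: Mel filter bank -> log -> discrete cosine transform [39].
    return librosa.feature.mfcc(y=audio, sr=fs_out, n_mfcc=n_mfcc)

mfcc = waveform_to_mfcc(np.random.randn(4 * 10_240))  # one 4 s segment
```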

2. Materials and Methods

CSES is China’s first electromagnetic monitoring satellite. It operates in a near-polar sun-synchronous orbit (inclination 97.4°, altitude 507 km) with an orbital period of approximately 94 min and a revisit period of 5 days, enabling quasi-global coverage of electromagnetic field data between 65°N and 65°S latitude [40]. The satellite is equipped with a variety of scientific payloads, including the Search Coil Magnetometer (SCM), Electric Field Detector (EFD), and High-Precision Magnetometer (HPM). Data acquisition occurs through segmented observation modes during both ascending (nighttime) and descending (daytime) orbits, lasting approximately 34 min per half-orbit. Observations are conducted in patrol mode (global scale, spatial resolution of 2000 km) and detailed survey mode (focused on China and surrounding regions, as well as the Pacific/Eurasian seismic belt, with a spatial resolution of 500 km). These observations encompass electromagnetic field waveform and power spectrum data in the Extremely Low Frequency (ELF, 3–3000 Hz) and Very Low Frequency (VLF, 3–30 kHz) bands [41]. The SCM is grounded in Faraday’s law of electromagnetic induction and has been specifically developed to capture transient electromagnetic phenomena such as LW. The sensitivity of the SCM in the 1–20 kHz frequency band reaches $0.1~\mathrm{pT}/\sqrt{\mathrm{Hz}}$ at 1 kHz [42], thereby providing a high-resolution observational foundation for investigating the coupling mechanism between the ionosphere and magnetosphere.
By 2025, CSES had been in orbit for seven years, collecting over 40 TB of ELF band data from the SCM payload, or over 20 GB daily. Traditional detection methods based on frequency–time images are inefficient: preprocessing a single 0.16 s waveform takes 1.8 s, and processing the full dataset with single-thread computation would take over 40 years. For real-time analysis, spaceborne platforms urgently need a lightweight, high-precision, low-latency detection algorithm for on-orbit intelligent detection and disaster warning of electromagnetic anomalies.
This study uses ELF band data obtained from the SCM payload of CSES. To address the stable transient characteristics of LW signals and the computational constraints of spaceborne platforms, the PBWDNIST Whistler Network framework is proposed (Figure 2).

2.1. Overall Framework Structure

PBWDNIST uses a cascading architecture for sequential data processing. The raw data is first processed using a Bio-inspired Auditory Feature Encoder (Figure 3). This involves splitting ELF waveforms with a 4 s sliding window, then converting them into audio data, applying a first-order high-pass filter to compensate for high-frequency attenuation, reducing spectral leakage via Hamming window-based frame splitting, and computing the power spectrum using a 512-point FFT to generate a 40-channel Mel filter bank matrix.
Subsequently, a Sandglass-Shaped DSConv Network (Figure 4) maps the single-channel Mel spectrum to a 128-dimensional feature space (Equation (8)), with three composite modules (Equations (9)–(11)) extracting local features across frequency bands and fusing them via residual connections.
Next, the Parallel Bidirectional Gated Temporal Unit (Figure 5) transforms gated state updates into matrix-based parallel operations by separating gated signal calculations and eliminating cyclic dependencies (Equations (13)–(19)). This reduces time-series modeling complexity from O(n) to O(1). The forget gate and reset gate (Equations (16) and (17)) generate gate signals based solely on current inputs, while element-level operations across time steps (Equation (18)) establish global context associations.
Finally, frequency–time characteristics are decoded via a double-layer fully connected network in the Multivariate Spatiotemporal Detection Head (Figure 6, Equations (20) and (21)). A posterior logical constraint (Equations (22)–(24)) ensures that both the start and continue state of LW comply with physical propagation laws, maintaining sequential coherence of the LW frequency scattering mode through continuous constraints.
The model reduces parameters to 0.08 M (1% of conventional models) and accelerates single-sample inference to under 20 ms (a 99% improvement over traditional methods). Experimental results show an F1-score of 90.7% in LW detection, which greatly improves the detection efficiency while maintaining high accuracy. This provides a novel approach for on-orbit intelligent processing of large-scale space electromagnetic data. The implementation details of each module are detailed in Section 2.2.

2.2. Detailed Implementation Process

2.2.1. Bio-Inspired Auditory Feature Encoder

This module implements the dispersion mode modeling of LW by employing bionic acoustic feature mapping. Its processing workflow, as illustrated in Figure 3, consists of four distinct stages:
(1) Cut the data and convert it into audio: A 4 s sliding window technique is employed to extract segments $q_a$ from the original ELF waveform $x(n)$, thereby generating audio fragments $q_n$ that conform to the input requirements of the LW speech detection model. The conversion from waveform data $q_a$ to audio data $q_n$ is represented by Equations (1) and (2).
$$ q_n = q_a(n t_s), \quad n = 0, 1, 2, \ldots \tag{1} $$
$$ s_i(n) = \mathrm{round}\!\left( \frac{q_n - q_{\min}}{q_{\max} - q_{\min}} \times (2^{16} - 1) \right) \tag{2} $$
In Equations (1) and (2), $q_n$ is the value of the $n$-th sampling point; $t_s = 1/f_s$ is the LW data sampling period, with $f_s = 10{,}240$ Hz in this paper; $s_i(n)$ is the quantized LW audio signal used for subsequent feature extraction; and $q_{\min}$ and $q_{\max}$ are, respectively, the minimum and maximum values of the input LW audio signal amplitude.
(2) Pre-emphasis filtering: The high-frequency part of the LW signal is weighted to increase the resolution of the high-frequency components (Equation (3)).
$$ s_i'(n) = s_i(n) - 0.97\, s_i(n-1) \tag{3} $$
Here, $s_i(n)$ is the LW audio signal, and $s_i'(n)$ is the pre-emphasized signal.
(3) Short-time framing and windowing: The 4 s audio segment is divided into 160 frames of 15 ms each. Hamming windows are then applied to the signal to avoid edge effects on the LW audio signal segments (the Gibbs effect) [43,44]; see Equations (4) and (5) for the definition of the windowing.
$$ s_i''(n) = s_i'(n) \times \omega(n) \tag{4} $$
$$ \omega(n) = \begin{cases} (1 - \alpha) - \alpha \cos\!\left( \dfrac{2\pi n}{N - 1} \right), & 0 \le n \le N - 1 \\ 0, & \text{otherwise} \end{cases} \tag{5} $$
The window function is denoted by $\omega(n)$, and $s_i''(n)$ represents the signal after applying the Hamming window. By default, $\alpha = 0.46$.
(4) Mel spectrum coding: For each frame, compute the power spectrum $P(f)$ using the FFT (Equation (6)). The frequency–time matrix $F = W_{mel} \cdot P(f) \in \mathbb{R}^{40\times160}$ is generated by compressing the frequency domain with a 40-channel Mel filter bank; the logarithmic energy coefficient $X = \log|F|$ effectively describes the LW dispersion mode [37], so it is used for subsequent computation:
$$ P(f) = \sum_{n=0}^{N-1} s_i''(n)\, e^{-j \frac{2\pi k n}{N}}, \quad 0 \le k \le N \tag{6} $$
$$ W_{mel}(k) = \begin{cases} 0, & k < f(m-1) \\ \dfrac{2\,(k - f(m-1))}{f(m) - f(m-1)}, & f(m-1) \le k \le f(m) \\ \dfrac{2\,(f(m+1) - k)}{f(m+1) - f(m)}, & f(m) \le k \le f(m+1) \\ 0, & k \ge f(m+1) \end{cases} \tag{7} $$
In Equations (6) and (7), $P(f)$ is the FFT-transformed signal; $N$ is the number of FFT points ($N = 512$ in this study); $m$ indexes the filters ($m = 1, \ldots, 40$ in this study); $k$ is the independent variable; $f(m)$ is the center frequency of the $m$-th filter; and $W_{mel}$ is the energy spectrum weight.
This method uses Mel scale nonlinear mapping to mimic the human ear’s auditory response [37]. Compared with traditional frequency–time image detection methods, it reduces computational burden and enhances the physical interpretation of LW dispersion modes.
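To make the four stages concrete, the following NumPy sketch traces a 4 s segment through quantization, pre-emphasis, framing/windowing, and Mel coding (Equations (1)–(7)). The hop size, tail-frame padding, and uniform Mel band spacing are simplifying assumptions, not the authors' exact pipeline.

```python
# Hedged NumPy sketch of the encoder stages in Equations (1)-(7).
# Constants follow the text (fs = 10,240 Hz, N = 512, 40 Mel channels,
# 160 frames); hop size, padding, and Mel band spacing are assumptions.
import numpy as np

def mel_encode(x, fs=10_240, n_fft=512, n_mels=40, n_frames=160, alpha=0.46):
    # Stage 1: 16-bit quantization of the segment (Equation (2)).
    q = np.round((x - x.min()) / (x.max() - x.min() + 1e-12) * (2**16 - 1))
    # Stage 2: pre-emphasis to boost high frequencies (Equation (3)).
    s = np.append(q[0], q[1:] - 0.97 * q[:-1])
    # Stage 3: framing plus a Hamming window (Equations (4)-(5)).
    hop = len(s) // n_frames
    w = (1 - alpha) - alpha * np.cos(2 * np.pi * np.arange(n_fft) / (n_fft - 1))
    frames = np.stack([np.resize(s[i * hop:i * hop + n_fft], n_fft) * w
                       for i in range(n_frames)])
    # Stage 4: power spectrum and Mel filtering (Equations (6)-(7)).
    P = np.abs(np.fft.rfft(frames, n_fft)) ** 2            # (160, 257)
    mel = 2595 * np.log10(1 + np.linspace(0, fs / 2, n_fft // 2 + 1) / 700)
    centers = np.linspace(0, mel.max(), n_mels + 2)        # uniform Mel bands
    W = np.maximum(0, 1 - np.abs(mel[None, :] - centers[1:-1, None])
                   / (centers[1] - centers[0]))            # triangular filters
    F = W @ P.T                                            # (40, 160)
    return np.log(F + 1e-12)                               # X = log|F|

X = mel_encode(np.random.randn(4 * 10_240))                # -> (40, 160)
```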

2.2.2. Sandglass-Shaped DSConv Network

In this study, a Sandglass-Shaped DSConv Network was designed based on the Sandglass architecture [45], specifically to extract the frequency distribution information of LW audio features. To achieve effective feature extraction while reducing the number of model parameters, a lightweight convolution strategy was adopted for processing the LW audio signals. This strategy not only captures the frequency distribution and local characteristics of LW effectively, but also minimizes the resource consumption of the inference process. The overall architecture of the module is shown in Figure 4a; it consists of a 1 × 1 convolution module and composite modules, where each composite module consists of a Sandglass block and MaxPooling.
(1) 1 × 1 Convolution Module: A 1 × 1 convolution maps the single-channel Mel spectrum $X \in \mathbb{R}^{40\times160\times1}$ to a 128-dimensional feature space. The computation process is given by Equation (8).
$$ F_c(i, j, k) = W_k \cdot X_{ij} + b_k, \quad i \in [1, 40],\ j \in [1, 160],\ k \in [1, 128] \tag{8} $$
In Equation (8), $X_{ij}$ is the audio feature $X = \log|F|$ at position $(i, j)$; $W_k \in \mathbb{R}^{128\times1}$ is the 1 × 1 convolution kernel of the $k$-th channel; $b_k$ is the bias of the $k$-th channel; and $F_c \in \mathbb{R}^{40\times160\times128}$ is the 128-channel audio feature generated by linear expansion. In this process, the audio feature $X$ is re-processed by linear combination to complete feature expansion and re-expression, thus obtaining richer audio features. In addition, the 1 × 1 convolution performs linear combinations along the channel dimension without changing the spatial resolution of the input feature map, which raises or reduces the feature dimension and effectively keeps the model lightweight.
(2) Composite Module: As shown in Figure 4a, the three composite modules achieve feature optimization through cascading operations to provide a high-resolution feature expression of LW. The input is the Mel spectrum feature $F_c \in \mathbb{R}^{40\times160\times128}$, while the output is the spatio-temporal enhancement feature $G \in \mathbb{R}^{H\times W\times C}$, i.e., the output feature processed by a composite module; the computation process is given below.
$$ G_1 = \Phi_{1,p}(\Phi_{1,d}(F_c)) \tag{9} $$
$$ G_2 = \Phi_{2,d}(\Phi_{2,p}(G_1)) + F_c \tag{10} $$
$$ G = \Phi_m(G_2) \tag{11} $$
In Equations (9)–(11), $\Phi_{1,p}$ and $\Phi_{2,p}$ denote 1 × 1 pointwise convolutions, used to reduce the dimension of the audio features, aggregate important features, and filter out minor ones. $\Phi_{1,d}$ and $\Phi_{2,d}$ denote 3 × 3 depthwise convolutions, used to raise the dimension of the audio features and improve their expressive ability by increasing the feature dimension. $F_c$ is the input audio feature, and $\Phi_m$ is a 1 × 5 max pooling operation, used to reduce the spatial dimension of the feature map and the computational complexity, so as to extract richer frequency distribution information from the LW audio features.
In Equation (9), the audio feature $F_c$ is first dimensionally expanded using the $\Phi_{1,d}$ operation to enhance the expressive capability of the audio features, followed by dimensional reduction via the $\Phi_{1,p}$ operation to aggregate important information, thereby generating the new audio feature $G_1$.
In Equation (10), the audio feature $G_1$ is further optimized by the $\Phi_{2,p}$ and $\Phi_{2,d}$ operations and fused with the original audio feature $F_c$ via a residual connection. In this process, high-frequency features and local features are fused. At this point, one Sandglass operation is complete and the feature $G_2$ is output.
In Equation (11), the feature $G_2$ is downsampled by $\Phi_m$ while retaining the important feature information as the final output. At this point, the processing within the composite module is complete, and the output feature is $G$.
The feature $F_c$ is fed into a network structure composed of three composite modules (shown as the composite modules in Figure 4a). After generating the output $G_0 \in \mathbb{R}^{2\times160\times128}$, feature fusion along the time dimension is conducted (shown as Concat in Figure 4a), resulting in the final output feature $G_{concat} \in \mathbb{R}^{256\times160}$, which is used for subsequent computation.
The key innovations of the Sandglass-shaped DSConv Network are as follows:
(1) High-dimensional feature retention: Residual connections fuse input and output features in a 128-channel high-dimensional space (Figure 4b), preventing information loss often seen in traditional DSConv’s [46] low-dimensional bottleneck layer.
(2) Temporal feature focusing: Features are superimposed only along the time dimension, and one-dimensional fusion is applied to the audio sequence features, thus preserving the most critical time-series features. Compared with convolution over both the frequency and time dimensions, this single-dimensional fusion strategy effectively reduces the convolution operations and keeps the model lightweight.
(3) Sandglass-Shaped DSConv: The Sandglass-Shaped DSConv structure (shown in Figure 4b, using 3 × 3, 1 × 1, 1 × 1, and 3 × 3 convolutions in sequence) achieves efficient computation through a hierarchical feature interaction mechanism of “dimension increase—dimension decrease—dimension increase again”. Specifically, in the first stage, a 3 × 3 convolution expands the number of channels (dimension increase), enabling the model to capture richer feature representations in the high-dimensional space; subsequently, the number of channels is gradually compressed (dimension decrease) through two linked 1 × 1 convolutions, retaining the key information and filtering out secondary information, thereby saving storage and computation costs; finally, a 3 × 3 convolution is used again to reconstruct the feature dimension. To prevent the loss of key information, residual connections are introduced to fuse the original features with the processed features. This significantly reduces the amount of computation while preventing network degradation.
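As a minimal PyTorch sketch (not the authors' released implementation), one composite module from Figure 4a,b and Equations (9)–(11) could be written as follows; the internal reduction ratio and the orientation of the 1 × 5 pooling (over the frequency axis) are assumptions:

```python
# Hedged PyTorch sketch of one Composite Module (Equations (9)-(11)):
# depthwise 3x3 convolutions, pointwise 1x1 convolutions, a residual
# connection, and 1x5 max pooling. Channel count follows the text; the
# internal reduction ratio and pooling orientation are assumptions.
import torch
import torch.nn as nn

class CompositeModule(nn.Module):
    def __init__(self, channels: int = 128, reduction: int = 2):
        super().__init__()
        hidden = channels // reduction
        # Phi_{1,d}: 3x3 depthwise convolution (spatial mixing per channel).
        self.phi1_d = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Phi_{1,p}: 1x1 pointwise convolution aggregating key features.
        self.phi1_p = nn.Conv2d(channels, hidden, 1)
        # Phi_{2,p}: 1x1 pointwise convolution restoring the channel width.
        self.phi2_p = nn.Conv2d(hidden, channels, 1)
        # Phi_{2,d}: 3x3 depthwise convolution before the residual merge.
        self.phi2_d = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
        # Phi_m: 1x5 max pooling, assumed to act along the frequency axis.
        self.phi_m = nn.MaxPool2d(kernel_size=(5, 1), ceil_mode=True)

    def forward(self, f_c: torch.Tensor) -> torch.Tensor:
        g1 = self.phi1_p(self.phi1_d(f_c))        # Equation (9)
        g2 = self.phi2_d(self.phi2_p(g1)) + f_c   # Equation (10), residual
        return self.phi_m(g2)                     # Equation (11)

x = torch.randn(1, 128, 40, 160)      # (batch, channels, Mel bins, frames)
print(CompositeModule()(x).shape)      # frequency axis pooled: (1, 128, 8, 160)
```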

2.2.3. Parallel Bidirectional Gated Temporal Unit

We propose a novel time-series feature-modeling method based on a parallelized bidirectional SRU to address the low efficiency of traditional RNNs in time-series feature modeling. Compared with conventional RNN architectures, this approach uses matrix decomposition and parallel calculations (Equation (19)) to decouple hidden-state dependencies, so that the computation at each time step is independent of previous ones. This resolves the computational bottleneck of serial RNNs (Equation (12)), reducing complexity from O(n) to O(1), while retaining the gating capabilities of LSTM and GRU for capturing LW audio sequence characteristics [47]. Figure 5 presents the complete architecture of the parallelized bidirectional gated temporal unit, with Figure 5b providing a detailed explanation of the core SRU module’s structural design. By decoupling gated signal calculations from cyclic dependencies, this design achieves a remarkable improvement in temporal modeling efficiency. Notably, as depicted in Figure 5c, the network employs a BiSRU recurrent neuron structure with the SRU as the fundamental unit, further enhancing the sequence feature modeling capability for LW events through a bidirectional feature extraction mechanism. This design not only overcomes the serial computation bottleneck of traditional RNNs but also preserves the model’s sensitivity to the dynamic characteristics of time series. The implementation is described in detail below.
The traditional RNN architecture faces a key challenge: serialized computation dependency. As Equation (12) shows, each time step $x_t$ must be computed in strict sequence ($x_1 \to x_2 \to \cdots \to x_t$), because the operation at the current step relies entirely on the hidden state $h_{t-1}$ from the previous step. Here, $U$ represents the cyclic weight matrix.
$$ h_t = f(W x_t + U h_{t-1} + b) \tag{12} $$
The method used in this study decouples the calculation process from the cyclic dependence. Each time step’s input $x_t$ (where $x_t$ is the $G_{concat}$ feature of frame $t$) is processed in parallel, generating the intermediate feature $e_t$ (Equation (13)) and eliminating dependence between time steps.
$$ e_t = g(W x_t) \tag{13} $$
The $c_t$ in Figure 5b is calculated through a lightweight cyclic mechanism that incorporates historical information solely through element-level operations, as shown in Equation (14). $\odot$ denotes element-by-element multiplication.
$$ c_t = f_t \odot c_{t-1} + (1 - f_t) \odot x_t \tag{14} $$
The gated status update mechanism of the SRU is shown in Equations (15)–(19):
$$ \tilde{x}_t = W x_t \tag{15} $$
$$ f_t = \sigma(W_f x_t + b_f) \tag{16} $$
$$ r_t = \sigma(W_r x_t + b_r) \tag{17} $$
$$ c_t = f_t \odot c_{t-1} + (1 - f_t) \odot \tilde{x}_t \tag{18} $$
$$ \overrightarrow{h}_t, \overleftarrow{h}_t = r_t \odot \mathrm{ReLU}(c_t) + (1 - r_t) \odot x_t \tag{19} $$
In this context, $f$ stands for the forget gate, $r$ represents the reset gate, $c$ denotes the intermediate state, and $x$ refers to the input. As the formulas above show, all gated signals ($f_t$, $r_t$) depend solely on the current input $x_t$, which removes the reliance on $h_{t-1}$. Only the memory unit $c_t$ requires sequential calculation; however, since it only involves element-level operations ($\odot$), the computational cost remains minimal.
The processing flow of this method involves four key steps. First, the forget gate $f_t$ and the reset gate $r_t$ for each time step $t$ (Equations (16) and (17)) are calculated independently, achieving parallel generation of the gate signals while eliminating the serial dependence on the history state. Second, element-level operations (Equation (18)) establish global context associations across time steps. Next, the memory unit $c_t$ and the input projection feature $\tilde{x}_t = W x_t$ are fused via residual connections to generate the hidden states $\overrightarrow{h}_t, \overleftarrow{h}_t$ (Equation (19)). Then, the forward and backward hidden states $\overrightarrow{h}_t$ and $\overleftarrow{h}_t$ are concatenated to form the hidden state $h_t$ of the BiSRU. Finally, the hidden states from all time steps are assembled into an enhanced set of LW audio features $FG = \{h_1, h_2, \ldots, h_t\} \in \mathbb{R}^{32\times160}$.
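A minimal PyTorch sketch of this recurrence (under our reading of Equations (15)–(19); the fused three-way projection and the feature width are assumptions, not the authors' released code) makes the parallel/serial split explicit: all matrix multiplications are batched over time, and only a cheap element-wise loop remains.

```python
# Hedged PyTorch sketch of the SRU recurrence (Equations (15)-(19)):
# every matrix multiplication is batched over all time steps up front,
# leaving only a cheap element-wise recursion for c_t.
import torch
import torch.nn as nn

class SRUSketch(nn.Module):
    def __init__(self, d: int):
        super().__init__()
        # One fused projection yields x~_t, forget-gate and reset-gate inputs.
        self.proj = nn.Linear(d, 3 * d)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (time, batch, d); the projection runs for all t in parallel.
        x_tilde, f_in, r_in = self.proj(x).chunk(3, dim=-1)  # Eqs. (15)-(17)
        f = torch.sigmoid(f_in)   # forget gate: depends on x_t only
        r = torch.sigmoid(r_in)   # reset gate: depends on x_t only
        c, cs = torch.zeros_like(x[0]), []
        for t in range(x.shape[0]):                  # Eq. (18): element-wise
            c = f[t] * c + (1 - f[t]) * x_tilde[t]   # operations only
            cs.append(c)
        c_all = torch.stack(cs)
        # Highway-style output mixing memory and input (Eq. (19)).
        return r * torch.relu(c_all) + (1 - r) * x

# Bidirectional use (Figure 5c): run a second SRU on the time-reversed
# sequence and concatenate the two hidden-state sequences.
h_fwd = SRUSketch(32)(torch.randn(160, 1, 32))
```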

2.2.4. Multivariate Spatiotemporal Detection Head

The Multivariate Spatiotemporal Detection Head consists of a two-layer fully connected network and a posterior logic constraint module (shown in Figure 6). It is designed to decode the start and continue states of LW from the audio feature $FG$. The module accepts the audio feature $FG = \{h_1, h_2, \ldots, h_t\} \in \mathbb{R}^{32\times160}$ as input, and its output is the two-row state matrix $P \in \mathbb{R}^{2\times160}$: the first row is the start state of LW, and the second row is the continue state of LW.
In the concrete implementation, the feature $h_t$ for each time frame is sequentially processed by the intermediate layer and the output layer. The intermediate layer extracts the nonlinear feature $h_t^{(1)}$, and the output layer generates the prediction result $p_t = [p_t^{start}, p_t^{occur}]$. Here, $p_t^{start}$ denotes the start state of LW at time $t$, and $p_t^{occur}$ denotes its continue state. For details, see Equations (20) and (21).
$$ h_t^{(1)} = \mathrm{ReLU}(W_1 h_t + b_1) \tag{20} $$
$$ p_t = \sigma(W_2 h_t^{(1)} + b_2) \tag{21} $$
In this context, $W_1$ and $W_2$ represent the weight matrices of the middle layer and the output layer, respectively, while $b_1$ and $b_2$ are the corresponding bias vectors. The symbol $\sigma$ denotes the Sigmoid activation function, and $h_t^{(1)}$ is the output of the feature $h_t \in \mathbb{R}^D$ for each time frame $t$ after it passes through the middle layer.
To better align the detection results with the physical characteristics of LW and improve accuracy, we designed a posteriori logic constraints that incorporate physical rules. This approach consists of two key constraints: (1) Response value constraint: a start event should always be accompanied by a continue event; that is, when a time frame is identified as the start of an LW event ($p_t^{start} = 1$), its corresponding continue state must also hold simultaneously ($p_t^{occur} = 1$). (2) Uniqueness constraint of the start point: when no LW event occurred at the previous time ($p_{t-1}^{occur} = 0$) and an LW occurs at the current time ($p_t^{occur} = 1$), the current frame is marked as a start. The specific judging conditions are given in Equations (22)–(24):
$$ p_t^x = \begin{cases} 1, & p_t^x > \theta \\ 0, & p_t^x \le \theta \end{cases} \tag{22} $$
$$ p_t^{occur} = \begin{cases} 1, & \text{if } p_t^{start} = 1 \\ p_t^{occur}, & \text{otherwise} \end{cases} \tag{23} $$
$$ p_t^{start} = \begin{cases} 1, & \text{if } p_{t-1}^{occur} = 0 \text{ and } p_t^{occur} = 1 \\ p_t^{start}, & \text{otherwise} \end{cases} \tag{24} $$
In Equations (22)–(24), $p_t^x$ represents $p_t^{start}$ or $p_t^{occur}$; in this study, $\theta$ is taken as 0.5. Ultimately, the output follows the physical rules represented by the binary matrix $P \in \{0, 1\}^{2 \times T}$.
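These constraints translate directly into a few lines of array logic. Below is a hedged NumPy sketch of Equations (22)–(24) under our reading (not the authors' code), applied to the 2 × T probability matrix from the detection head:

```python
# Hedged NumPy sketch of the posterior logic constraints
# (Equations (22)-(24)) applied to the 2 x T output of the head;
# row 0 holds start probabilities, row 1 holds continue probabilities.
import numpy as np

def apply_constraints(p: np.ndarray, theta: float = 0.5) -> np.ndarray:
    start, occur = (p > theta).astype(int)    # Eq. (22): binarize at theta
    occur[start == 1] = 1                     # Eq. (23): start implies continue
    prev_occur = np.concatenate(([0], occur[:-1]))
    # Eq. (24): mark a start wherever the continue state rises from 0 to 1.
    start[(prev_occur == 0) & (occur == 1)] = 1
    return np.stack([start, occur])           # binary matrix P in {0,1}^(2 x T)

P = apply_constraints(np.random.rand(2, 160)) # one 4 s segment (160 frames)
```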

2.2.5. Loss Function

During the training of the PBWDNIST model, we quantitatively evaluated its performance by calculating the deviation between real and predicted values. This guided the model to learn LW regional features, and improved its generalization ability. As determining the start and continue state of LW can be abstracted as a binary classification problem, Binary Cross Entropy (BCE Loss) was used as the loss function; see Equations (25)–(27).
Equation (25) is the loss function for the start state of LW; Equation (26) is the loss function for the continue state of LW; and Equation (27) is the overall loss function, summing the two losses.
$$ L_{start} = -\frac{1}{T} \sum_{t=1}^{T} \left[ P_t^{start} \log(p_t^{start}) + (1 - P_t^{start}) \log(1 - p_t^{start}) \right] \tag{25} $$
$$ L_{occur} = -\frac{1}{T} \sum_{t=1}^{T} \left[ P_t^{occur} \log(p_t^{occur}) + (1 - P_t^{occur}) \log(1 - p_t^{occur}) \right] \tag{26} $$
$$ L = L_{start} + L_{occur} \tag{27} $$
where $P_t^{start}$ and $P_t^{occur}$ represent the true labels (0 or 1) for the start and continue states of LW at time frame $t$, $p_t^{start}$ and $p_t^{occur}$ denote the model's predicted probabilities for these states, and $T$ represents the total number of time frames.
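In PyTorch, this objective is a straightforward sum of two averaged BCE terms; the following sketch (dimensions assumed per Section 2.2.4) illustrates Equations (25)–(27):

```python
# Hedged PyTorch sketch of the training objective (Equations (25)-(27)):
# two averaged binary cross-entropy terms, one per state channel, summed.
import torch
import torch.nn.functional as F

def lw_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # pred, target: (2, T); row 0 = start state, row 1 = continue state.
    l_start = F.binary_cross_entropy(pred[0], target[0])   # Equation (25)
    l_occur = F.binary_cross_entropy(pred[1], target[1])   # Equation (26)
    return l_start + l_occur                               # Equation (27)

loss = lw_loss(torch.rand(2, 160), torch.randint(0, 2, (2, 160)).float())
```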

2.3. Data Partitioning and Feature Engineering

Data source: This study uses ELF band data obtained from the SCM payload of CSES, covering the period from 1 April to 10 April 2020. The original data is a continuous waveform record; since the CSES revisit period is 5 days, this interval provides full coverage of LW events in the global space electromagnetic field.
Data set partitioning: To ensure temporal independence, we carefully divided the data into training and test sets in chronological order. See Table 1.
Feature extraction method: Building upon the Bio-inspired Auditory Feature Encoder outlined in Section 2.2.1, we perform the following feature engineering on the original waveform data. First, the continuous waveform is divided into 4-second-long segments. Then, an FFT operation is applied to each segment, followed by extracting frequency-domain features using a 40-channel Mel filter bank. Lastly, the output energy values from the filter bank are logarithmically transformed to produce the frequency–time feature map F. The time resolution of this feature map is determined by the slice length, while the frequency resolution is defined by the number of channels in the Mel filter bank.

2.4. Experimental Configuration and Model Setting

The experimental environment is presented in Table 2.
The hyperparameter settings for the model are presented in Table 3.

3. Results

3.1. Evaluation Indicators

To achieve the collaborative optimization of model lightweight and detection accuracy, this study designs an evaluation system considering computational efficiency and detection performance. Specifically, we focus on the following: (1) model accuracy metrics (Precision, Recall, and F-score); and (2) model inference speed and compression indicators (the number of model parameters and time cost). Below is a detailed explanation of these definitions.
(1) Precision:
$$ P = \frac{TP}{TP + FP} $$
TP indicates the number of LW events that were successfully and correctly detected. FP refers to the number of non-target events mistakenly identified as LW. Precision helps evaluate the model’s anti-interference capability. A higher value suggests not only a greater accuracy in identification but also a lower false positive rate, meaning that a vast majority of the predicted LW events truly occur (if P = 0.95, 95% of the predicted LW events are true).
(2) Recall:
$$ R = \frac{TP}{TP + FN} $$
where FN represents the number of true LW events that were not detected. Recall shows how well the model can identify weak signals. A higher value indicates a better detection recall rate and fewer missed events (for instance, R = 0.90 means that 90% of true LW events have been successfully detected).
(3) $F_1$-score:
$$ F_1 = \frac{2 \times P \times R}{P + R} $$
This is the harmonic mean of precision and recall, taking both into account simultaneously. The larger the $F_1$-score, the stronger the model's overall performance, which makes it suitable for evaluating the model under class imbalance. It is chosen as an evaluation indicator because it is more sensitive than the arithmetic mean when P and R differ significantly, and it provides a single quantitative indicator of the precision–recall trade-off.
(4) The number of model parameters: This reflects the model's storage space requirements.
(5) Time Cost: The total time required to process one month of data, including the data preprocessing and detection stages.
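For concreteness, the accuracy metrics above reduce to a few lines of code. The counts in the usage example below are illustrative only (chosen to roughly reproduce the reported 93%/88.7%/90.7% figures) and are not the study's actual confusion-matrix values:

```python
# Hedged sketch of the accuracy metrics defined above; the example counts
# are illustrative assumptions, not the study's measured values.
def prf1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    p = tp / (tp + fp)        # precision: fraction of detections that are real
    r = tp / (tp + fn)        # recall: fraction of real events detected
    f1 = 2 * p * r / (p + r)  # harmonic mean of precision and recall
    return p, r, f1

print(prf1(887, 67, 113))     # ~ (0.930, 0.887, 0.908)
```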

3.2. Analysis of Experimental Results

Model Performance Analysis: PBWDNIST achieves a co-optimization of precision, speed, and lightweight design in the LW detection task. As shown in Table 4, the model's precision reaches 93.0%, comparable to the detection level of YOLOv8l (94.0%) and higher than that of Mask R-CNN (85.1%) by 7.9 percentage points. Furthermore, under electromagnetic interference conditions, the false alarm rate is effectively suppressed. Although the recall rate (88.7%) is 6.5 percentage points lower than that of Mask R-CNN (95.2%), the a posteriori logic constraint module (Equations (22)–(24)) ensures effective control over the risk of missing critical events, avoiding misjudgments due to oversensitivity while maintaining reliable detection performance. In terms of overall performance metrics, the F1-score of the model is 90.7%, surpassing most comparison models. The empirical analysis shows the effectiveness of fusing a lightweight network architecture with the a posteriori logic constraint.
As shown in Figure 7, PBWDNIST demonstrates breakthrough computing efficiency: data processing takes only 10.8 h per month (Table 4), improving inference efficiency by 99% compared to the other models. Its millisecond-level real-time inference capability is well suited to the online processing requirements of satellite payloads, and the number of parameters and computation time are greatly reduced while maintaining high accuracy. This "high-precision, low-latency, micro-storage" characteristic allows the model to break through the engineering bottleneck of traditional methods and provides key technical support for building a new generation of intelligent space electromagnetic event monitoring systems. In Figure 8, we provide examples of the detection results of the different models in Table 4. The frequency range of all frequency–time images is 0–5120 Hz and the time range is 4 s (160 time frames).

4. Discussion

Verification of spatiotemporal positioning ability: As shown in Figure 9, the model demonstrates strong spatiotemporal positioning capabilities. In the single-event scenario (Figure 9a), it accurately identifies the start state and continue state of the LW event. The frequency–time characteristics match the detection results closely. For sparse multiple events (Figure 9b), the model successfully separates independent trajectories using posterior logic constraints (Equations (22)–(24)). In dense scenarios (Figure 9c,d), the model can still effectively detect the number of LW. However, since the next LW has occurred before the previous one has ended, the model cannot separate multiple dense LW events very well. It should be noted that this overlapping problem is common among all current detection methods and is not unique to this model. This reflects the inherent limitations of LW in terms of physical characteristics and detection principles.
To verify the effectiveness of feature extraction at each stage, Figure 10 illustrates the feature evolution process in the PBWDNIST model. After Bio-inspired Auditory Feature Encoder (Figure 10a), the frequency–time distribution of LW events becomes clear. In the Sandglass-Shaped DSConv Network (Figure 10b), the 1 × 1 convolutional layer preserves global features, while three composite modules enhance local features, achieving global–local feature fusion. In the Parallel Bidirectional Gated Temporal Unit (Figure 10c), LW event details are enhanced and non-target signals are suppressed, demonstrating strong local frequency–time feature extraction. Finally, the Multivariate Spatiotemporal Detection Head refines and integrates LW features, eliminates interference, and outputs the LW detection result (Figure 10d). The functions of each module in feature extraction are as follows:
(a) Bio-inspired Auditory Feature Encoder: After the 4 s ELF time-domain waveform was encoded using the Mel spectrum (Figure 10a), a 40 × 160 frequency–time matrix was generated. The results reveal that the LW event exhibits a characteristic dispersion trajectory (highlighted in the red box). Its energy distribution aligns well with the critical band theory proposed by Zwicker in 1961 [36], which confirms the effectiveness of the bionic modeling through Mel filter banks. Additionally, the energy of background noise is notably reduced, demonstrating that the combination of pre-emphasis filtering and the Hamming window effectively enhances the signal-to-noise ratio (SNR).
(b) Sandglass-Shaped DSConv Network: As illustrated in Figure 10b, the network performs a hierarchical extraction of LW frequency–time features via a three-stage feature transformation process. First, a 1 × 1 convolution layer is applied for global structure modeling, and the input features are expanded using 1 × 1 convolution (1→128) to capture the global continuity of the LW dispersion trajectory. Next, three composite modules enhance local features across different frequency bands. These modules, as shown in Figure 4b, focus on specific frequency ranges: the first module extracts high-frequency fast-attenuation components, the second captures intermediate-frequency harmonic resonances, and the third models low-frequency slow-variable components. Finally, the feature fusion stage takes both global and local features into account simultaneously.
(c) Parallel Bidirectional Gated Temporal Unit: The module primarily leverages the SRU structure (as shown in Figure 5b) to address the timing dependence issue commonly found in traditional RNNs, all while preserving the frequency–time domain characteristics of audio signals. As illustrated in Figure 10c, this approach not only effectively enhances the local detail features of LW but also significantly suppresses the signals from non-target regions.
(d) Multivariate Spatiotemporal Detection Head: As illustrated in Figure 10d, this module successfully identifies both the start and continue state of the LW event with remarkable accuracy. To further validate the effectiveness of incorporating the a posteriori logical constraint (Equations (22)–(24)), Figure 11 provides an insightful visual comparison that highlights the core strengths of this module. In Figure 11a, we present a frequency–time image for comparative purposes. Without introducing physical constraints, the detection regions of adjacent LW events tend to partially overlap, resulting in a fused banded energy block in the output heat map, which cannot be effectively segmented (as shown in Figure 11b). However, after applying the constraints (Figure 11c), the model demonstrates its ability to segment adjacent LW events more effectively through the implementation of the a posteriori logical constraint modeling (Equations (22)–(24)).
In this study, by analyzing the detection results to infer the power spectral density (PSD) of the LW event region, the sensitivity of the model to different LW events and its detection blind area were evaluated. The results show that the model can effectively detect the vast majority of LW with PSD $> 10^{-6}~\mathrm{nT}^2/\mathrm{Hz}$ in the main feature areas, but there is a certain missed-detection phenomenon for LW events with PSD $< 10^{-7}~\mathrm{nT}^2/\mathrm{Hz}$ in the main feature areas.
Engineering deployment of LW detection results: After completing the experiments, we proposed storing the LW event detection results directly in the original ELF data file. For ELF datasets with numerous LW events, we designed an efficient storage scheme to preserve the spatio-temporal correlations between the detection results and original signals. After the detection is completed, a data structure named Detection_Results is automatically created in the ELF file to store the latitude, longitude, time, and LW count. This structure generates audio segments for LW detection by extracting 10 lines of original data sequentially and stores results row by row, preserving the spatio-temporal correlation between the detection unit and the original data. This integrated storage mode supports subsequent research, such as electron density inversion, by providing comprehensive data in a unified format for studying spatial–physical phenomena correlations.
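Assuming the source files are HDF5 containers (the exact CSES file format and field layout are not specified here), a hedged sketch of appending such a Detection_Results structure could look as follows:

```python
# Hedged sketch of the integrated storage scheme. The HDF5 container and
# the exact field layout of Detection_Results are assumptions based on
# the description above, not the documented CSES file specification.
import h5py
import numpy as np

def store_results(path: str, lat, lon, utc, lw_count) -> None:
    with h5py.File(path, "a") as f:                 # open the source file in place
        grp = f.require_group("Detection_Results")  # created on first use
        for name, data in [("latitude", lat), ("longitude", lon),
                           ("time", utc), ("lw_count", lw_count)]:
            if name in grp:
                del grp[name]                       # overwrite stale results
            grp.create_dataset(name, data=np.asarray(data))

store_results("orbit_segment.h5", [12.5], [103.2],
              [b"2020-04-02T02:15:00"], [3])        # one detection unit
```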
The test results show that, after adding this step, the time consumption for detecting one month's data increases by approximately 5 s. Compared with the total time consumption of 10.8 h, the impact is negligible. The storage space of each source file increases by approximately 0.04 MB; compared with the original storage space of about 700 MB, the growth is almost negligible.

5. Conclusions

In this study, we introduce a lightweight and low-latency automatic detection model called PBWDNIST. This model is specifically designed to address the real-time processing needs of LW on spaceborne platforms. By integrating a Bio-inspired Auditory Feature Encoder, a Sandglass-shaped DSConv Network, a Parallel Bidirectional Gated Temporal Unit, and a Multivariate Spatiotemporal Detection Head, the model achieves significant improvements in computational efficiency and engineering applicability while maintaining a high detection accuracy. Our results demonstrate the following: (1) in terms of detection performance, PBWDNIST achieves a precision of 93%, comparable to mainstream frequency–time image detection models such as YOLOv8l (94%); and, (2) in terms of computational efficiency, the number of parameters of the model is reduced to only 0.08 M (traditional model parameters are at least 11.2 M), and the monthly data processing time is only 10.8 h (traditional method requires at least 570 h), which has greatly improved the detection efficiency.

Author Contributions

B.W.: writing—review and editing, writing—original draft, visualization, validation, methodology, investigation, and formal analysis. J.Y.: writing—review and editing, supervision, resources, project administration, methodology, investigation, and funding acquisition. D.Y.: investigation, resources, and writing—review and editing. H.Y.: investigation, resources, and methodology. Z.Z. (Zhihong Zhang), Q.W., J.W., Z.Z. (Zeren Zhima) and X.S.: provided consultation on the idea and the manuscript writing of the work. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science Research Project of Hebei Education Department grant number ZC2024028, Civil Aerospace Technology Pilot Research Project grant number D040203, DRAGON-6 Project grant number 95407, and Common Application Support Platform for National Civil Space Infrastructure Land Observation Satellites grant number 2017-000052-73-01-001735.

Data Availability Statement

The raw/processed data required to reproduce the above findings cannot be shared at this time, as the data also form part of an ongoing study.

Acknowledgments

We extend our gratitude to all members of the CSES team at the National Academy of Natural Disaster Prevention and Control under the Ministry of Emergency Management for their technical support and service in providing the research data utilized in this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Christian, H.J.; Blakeslee, R.J.; Boccippio, D.J.; Boeck, W.L.; Buechler, D.E.; Driscoll, K.T.; Goodman, S.J.; Hall, J.M.; Koshak, W.J.; Mach, D.M.; et al. Global frequency and distribution of lightning as observed from space by the Optical Transient Detector. J. Geophys. Res. Atmos. 2003, 108, ACL 4-1–ACL 4-15. [Google Scholar] [CrossRef]
  2. Barkhausen, H. Whistling Tones from the Earth. Proc. Inst. Radio Eng. 1930, 18, 1155–1159. [Google Scholar] [CrossRef]
  3. Storey, L.R.O. An investigation of whistling atmospherics. Philosophical Transactions of the Royal Society of London. Ser. A Math. Phys. Sci. 1953, 246, 113–141. [Google Scholar]
  4. Helliwell, R.A. Whistlers and Related Ionospheric Phenomena; Stanford University Press: Stanford, CA, USA, 1965. [Google Scholar]
  5. Smith, R.L.; Helliwell, R.A.; Yabroff, I.W. A theory of trapping of whistlers in field-aligned columns of enhanced ionization. J. Geophys. Res. 1960, 65, 815–823. [Google Scholar] [CrossRef]
  6. Armstrong, W.C. Lightning triggered from the Earth’s magnetosphere as the source of synchronized whistlers. Nature 1987, 327, 405–408. [Google Scholar] [CrossRef]
  7. Hayakawa, M.; Yoshino, T.; Morgounov, V.A. On the possible influence of seismic activity on the propagation of magnetospheric whistlers at low latitudes. Phys. Earth Planet. Inter. 1993, 77, 97–108. [Google Scholar] [CrossRef]
  8. Hayakawa, M. Association of whistlers with lightning discharges on the Earth and on Jupiter. J. Atmos. Terr. Phys. 1995, 57, 525–535. [Google Scholar] [CrossRef]
  9. Santolík, O.; Parrot, M.; Inan, U.S.; Burešová, D.; Gurnett, D.A.; Chum, J. Propagation of unducted whistlers from their source lightning: A case study. J. Geophys. Res. Space Phys. 2009, 114, A03212. [Google Scholar] [CrossRef]
  10. Collier, A.B.; Hughes, A.R.W.; Lichtenberger, J.; Steinbach, P. Seasonal and diurnal variation of lightning activity over southern Africa and correlation with European whistler observations. In Annales Geophysicae; Copernicus Publications: Göttingen, Germany, 2006; Volume 24, pp. 529–542. [Google Scholar]
  11. Fiser, J.; Chum, J.; Diendorfer, G.; Parrot, M.; Santolík, O. Whistler intensities above thunderstorms. In Annales Geophysicae; Copernicus Publications: Göttingen, Germany, 2010; Volume 28, pp. 37–46. [Google Scholar]
  12. Bayupati, I.P.A.; Kasahara, Y.; Goto, Y. Study of dispersion of lightning whistlers observed by Akebono satellite in the earth’s plasmasphere. IEICE Trans. Commun. 2012, 95, 3472–3479. [Google Scholar] [CrossRef]
  13. Gokani, S.A.; Singh, R.; Cohen, M.B.; Kumar, S.; Venkatesham, K.; Maurya, A.K.; Selvakumaran, R.; Lichtenberger, J. Very low latitude (L = 1.08) whistlers and correlation with lightning activity. J. Geophys. Res. Space Phys. 2015, 120, 6694–6706. [Google Scholar] [CrossRef]
  14. Záhlava, J.; Němec, F.; Santolík, O.; Kolmašová, I.; Hospodarsky, G.B.; Parrot, M.; Kurth, W.S.; Kletzing, C.A. Lightning contribution to overall whistler mode wave intensities in the plasmasphere. Geophys. Res. Lett. 2019, 46, 8607–8616. [Google Scholar] [CrossRef]
  15. Ripoll, J.F.; Farges, T.; Malaspina, D.M.; Cunningham, G.S.; Hospodarsky, G.B.; Kletzing, C.A.; Wygant, J.R. Propagation and dispersion of lightning-generated whistlers measured from the Van Allen Probes. Front. Phys. 2021, 9, 722355. [Google Scholar] [CrossRef]
  16. Sonwalkar, V.S.; Reddy, A. Specularly reflected whistler: A low-latitude channel to couple lightning energy to the magnetosphere. Sci. Adv. 2024, 10, eado2657. [Google Scholar] [CrossRef] [PubMed]
  17. Fujinawa, Y.; Noda, Y. Field Observations of the Seismo-electromagnetic Effect Related to Earthquakes. In Seismoelectric Exploration: Theory, Experiments, and Applications; AGU: Washington, DC, USA, 2020; pp. 437–450. [Google Scholar]
  18. Liu, J.Y.; Wang, K.; Chen, C.H.; Yang, W.H.; Yen, Y.H.; Chen, Y.I.; Hatorri, K.; Su, H.T.; Hsu, R.R.; Chang, C.H. A statistical study on ELF-whistlers/emissions and M≥ 5.0 earthquakes in Taiwan. J. Geophys. Res. Space Phys. 2013, 118, 3760–3768. [Google Scholar] [CrossRef]
  19. Parrot, M. Electromagnetic noise due to earthquakes. In Handbook of Atmospheric Electrodynamics (1995); CRC Press: Boca Raton, FL, USA, 2017; pp. 95–116. [Google Scholar]
  20. Hayakawa, M.; Schekotov, A.; Izutsu, J.; Yang, S.S.; Solovieva, M.; Hobara, Y. Multi-parameter observations of seismogenic phenomena related to the Tokyo earthquake (M = 5.9) on 7 October 2021. Geosciences 2022, 12, 265. [Google Scholar] [CrossRef]
  21. Freund, F. Pre-earthquake signals: Underlying physical processes. J. Asian Earth Sci. 2011, 41, 383–400. [Google Scholar] [CrossRef]
  22. Liu, S.; Han, Y.; Liu, Q.; Huang, J.; Li, Z.; Shen, X. Study on the CSES Electric Field VLF Electromagnetic Pulse Sequences Triggered by Volcanic Eruptions. Atmosphere 2025, 16, 208. [Google Scholar] [CrossRef]
  23. Lichtenberger, J.; Ferencz, C.; Bodnár, L.; Hamar, D.; Steinbach, P. Automatic whistler detector and analyzer system: Automatic whistler detector. J. Geophys. Res. Space Phys. 2008, 113, A12201. [Google Scholar] [CrossRef]
  24. Lichtenberger, J.; Ferencz, C.; Hamar, D.; Steinbach, P.; Rodger, C.J.; Clilverd, M.A.; Collier, A.B. Automatic Whistler Detector and Analyzer system: Implementation of the analyzer algorithm. J. Geophys. Res. Space Phys. 2010, 115, A12214. [Google Scholar] [CrossRef]
  25. Jacobson, A.R.; Holzworth, R.H.; Pfaff, R.F.; McCarthy, M.P. Study of oblique whistlers in the low-latitude ionosphere, jointly with the C/NOFS satellite and the World-Wide Lightning Location Network. In Annales Geophysicae; Copernicus Publications: Göttingen, Germany, 2011; Volume 29, pp. 851–863. [Google Scholar]
  26. Dharma, K.S.; Bayupati, I.P.; Buana, P.W. Automatic lightning whistler detection using connected component labeling method. J. Theor. Appl. Inf. Technol. 2014, 66, 638–645. [Google Scholar]
  27. Konan, O.J.E.Y.; Mishra, A.K.; Lotz, S. Machine learning techniques to detect and characterise whistler radio waves. arXiv 2020, arXiv:2002.01244. [Google Scholar]
  28. Maslej-Krešňáková, V.; Kundrát, A.; Mackovjak, Š.; Butka, P.; Jaščur, S.; Kolmašová, I.; Santolík, O. Automatic detection of atmospherics and tweek atmospherics in radio spectrograms based on a deep learning approach. Earth Space Sci. 2021, 8, e2021EA002007. [Google Scholar] [CrossRef]
  29. Pataki, B.Á.; Lichtenberger, J.; Clilverd, M.; Máthé, G.; Steinbach, P.; Pásztor, S.; Murár-Juhász, L.; Koronczay, D.; Ferencz, O.; Csabai, I. Monitoring space weather: Using automated, accurate neural network based whistler segmentation for whistler inversion. Space Weather 2022, 20, e2021SW002981. [Google Scholar] [CrossRef]
  30. Suarjaya, I.M.A.D.; Putri, D.P.S.; Tanaka, Y.; Purnama, F.; Bayupati, I.P.A.; Linawati; Kasahara, Y.; Matsuda, S.; Miyoshi, Y.; Shinohara, I. Deep Learning Model Size Performance Evaluation for Lightning Whistler Detection on Arase Satellite Dataset. Remote Sens. 2024, 16, 4264. [Google Scholar] [CrossRef]
  31. Jiang, P.; Ergu, D.; Liu, F.; Cai, Y.; Ma, B. A Review of Yolo algorithm developments. Procedia Comput. Sci. 2022, 199, 1066–1073. [Google Scholar] [CrossRef]
  32. Yuan, J.; Wang, Z.; Yang, D.; Wang, Q.; Zima, Z.; Han, Y.; Zhou, L.; Shen, X.; Guo, Q. Automatic Recognition of the Lighting Whistler waves from the Wave Data of SCM Boarded on ZH-1 satellite. In Proceedings of the EGU General Assembly 2021, Online, 19–30 April 2021. [Google Scholar] [CrossRef]
  33. Yuan, J.; Li, C.; Wang, Q.; Han, Y.; Wang, J.; Zeren, Z.; Huang, J.; Feng, J.; Shen, X.; Wang, Y. Lightning whistler wave speech recognition based on grey wolf optimization algorithm. Atmosphere 2022, 13, 1828. [Google Scholar] [CrossRef]
  34. Li, Y.; Yuan, J.; Cao, J.; Liu, Y.; Huang, J.; Li, B.; Wang, Q.; Zhang, Z.; Zhao, Z.; Han, Y.; et al. Spaceborne Algorithm for Recognizing Lightning Whistler Recorded by an Electric Field Detector Onboard the CSES Satellite. Atmosphere 2023, 14, 1633. [Google Scholar] [CrossRef]
  35. Wang, Z.; Yi, J.; Yuan, J.; Hu, R.; Peng, X.; Chen, A.; Shen, X. Lightning-generated Whistlers recognition for accurate disaster monitoring in China and its surrounding areas based on a homologous dual-feature information enhancement framework. Remote Sens. Environ. 2024, 304, 114021. [Google Scholar] [CrossRef]
  36. Zwicker, E. Subdivision of the audible frequency range into critical bands (Frequenzgruppen). J. Acoust. Soc. Am. 1961, 33, 248. [Google Scholar] [CrossRef]
  37. Mermelstein, P. Distance measures for speech recognition, psychological and instrumental. Pattern Recognit. Artif. Intell. 1976, 116, 374–388. [Google Scholar]
  38. Leutnant, V.; Krueger, A.; Haeb-Umbach, R. A new observation model in the logarithmic mel power spectral domain for the automatic recognition of noisy reverberant speech. In IEEE/ACM Transactions on Audio, Speech, and Language Processing; IEEE: Piscataway, NJ, USA, 2013; Volume 22, pp. 95–109. [Google Scholar]
  39. Molau, S.; Pitz, M.; Schluter, R.; Ney, H. Computing mel-frequency cepstral coefficients on the power spectrum. In Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, Salt Lake City, UT, USA, 7–11 May 2001; Proceedings (cat. No. 01CH37221). IEEE: Piscataway, NJ, USA, 2001; Volume 1, pp. 73–76. [Google Scholar]
  40. Zhou, B.; Yang, Y.Y.; Zhang, Y.T.; Gou, X.; Cheng, B.; Wang, J.; Li, L. Magnetic field data processing methods of the China Seismo-Electromagnetic Satellite. Earth Planet. Phys. 2018, 2, 455–461. [Google Scholar] [CrossRef]
  41. Zhima, Z.; Hu, Y.; Piersanti, M.; Shen, X.; De Santis, A.; Yan, R.; Yang, Y.; Zhao, S.; Zhang, Z. The seismic electromagnetic emissions during the 2010 Mw 7.8 Northern Sumatra Earthquake revealed by DEMETER satellite. Front. Earth Sci. 2020, 8, 572393. [Google Scholar]
  42. Huang, J.P.; Shen, X.H.; Zhang, X.M.; Lu, H.; Tan, Q.; Wang, Q.; Yan, R.; Chu, W.; Yang, Y.; Liu, D.; et al. Application system and data description of the China Seismo-Electromagnetic Satellite. Earth Planet. Phys. 2018, 2, 444–454. [Google Scholar] [CrossRef]
  43. Khir, A.W.; O’brien, A.; Gibbs, J.S.R.; Parker, K. Determination of wave speed and wave separation in the arteries. J. Biomech. 2001, 34, 1145–1155. [Google Scholar] [CrossRef]
  44. Mace, R.L.; Sydora, R.D. Parallel whistler instability in a plasma with an anisotropic bi-kappa distribution. J. Geophys. Res. Space Phys. 2010, 115, A7. [Google Scholar] [CrossRef]
  45. Zhou, D.; Hou, Q.; Chen, Y.; Feng, J.; Yan, S. Rethinking bottleneck structure for efficient mobile network design. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part III 16. Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 680–697. [Google Scholar]
  46. Nascimento, M.G.; Fawcett, R.; Prisacariu, V.A. Dsconv: Efficient convolution operator. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 5148–5157. [Google Scholar]
  47. Lei, T.; Zhang, Y.; Wang, S.I.; Dai, H.; Artzi, Y. Simple recurrent units for highly parallelizable recurrence. arXiv 2017, arXiv:1709.02755. [Google Scholar]
Figure 1. Three LW waveforms recorded within 0.8 s, as observed aboard the China Seismo-Electromagnetic Satellite (CSES) on 2 April 2020, and their frequency–time (f–t) spectrograms. The power spectral density of these signals is shown on the right-hand scale (this scale applies to the color interpretation of all frequency–time images in this article).
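For readers interpreting the descending tones in Figure 1, the classical Eckersley approximation from standard whistler theory (a point of reference rather than a result of this article) relates the arrival time of each frequency component f to a single dispersion constant:

\[ t(f) \approx t_0 + \frac{D}{\sqrt{f}}, \]

where t_0 is the time of the causative lightning stroke and D is the dispersion constant set by the plasma along the propagation path. Higher frequencies arrive earlier, which produces the falling f–t traces in the spectrograms.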
Figure 2. The PBWDNIST Whistler Network framework.
Figure 3. Bio-inspired auditory feature encoder.
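As a concrete illustration of the bio-inspired auditory front end in Figure 3, the following is a minimal log-mel feature extractor in PyTorch/torchaudio in the spirit of the mel-filterbank literature cited above [36,37,38,39]. All parameter values (sampling rate, FFT size, hop length, number of mel bands) are illustrative assumptions, not the paper's configuration.

import torch
import torchaudio

SAMPLE_RATE = 10240      # assumed rate: Nyquist of 5120 Hz matches the figures' band

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE,
    n_fft=512,           # short analysis window for millisecond-scale tones
    hop_length=128,      # 25 ms-scale frame step (illustrative)
    n_mels=40,           # critical-band-like mel filterbank [36]
)
to_db = torchaudio.transforms.AmplitudeToDB()

waveform = torch.randn(1, SAMPLE_RATE * 4)   # stand-in for a 4 s waveform window
features = to_db(mel(waveform))              # (1, 40, n_frames) log-mel map
print(features.shape)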
Figure 4. Overall framework of the Sandglass-Shaped DSConv Network: (a) Sandglass-Shaped DSConv Network; (b) sandglass structure details; and (c) legend for each module.
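The sandglass arrangement in Figure 4 can be sketched with standard depthwise-separable convolutions in the spirit of refs. [45,46]: depthwise convolutions at the wide ends, a pointwise reduce–expand pair at the narrow waist, and an identity shortcut on the high-dimensional path. The channel sizes, reduction ratio, and activation choices below are assumptions for illustration, not the paper's exact block.

import torch
import torch.nn as nn

class SandglassBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        mid = channels // reduction
        self.body = nn.Sequential(
            # depthwise conv at full width (top of the sandglass)
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU6(inplace=True),
            # pointwise reduction (narrow waist)
            nn.Conv2d(channels, mid, 1, bias=False),
            nn.BatchNorm2d(mid),
            # pointwise expansion back to full width
            nn.Conv2d(mid, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU6(inplace=True),
            # depthwise conv at full width (bottom of the sandglass)
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels, bias=False),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.body(x)   # identity shortcut on the high-dimensional path

x = torch.randn(1, 32, 40, 160)   # (batch, channels, mel bins, time frames)
print(SandglassBlock(32)(x).shape)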
Figure 5. Overall architecture of the Parallel Bidirectional Gated Temporal Unit: (a) Parallel Bidirectional Gated Temporal Unit; (b) single SRU neuron structure; and (c) BiSRU structure diagram.
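For readers unfamiliar with the SRU [47] underlying the unit in Figure 5, the sketch below shows why it parallelizes well: all matrix multiplications are hoisted out of the time loop, leaving only elementwise work in the recurrence. This is a simplified single-direction cell (the highway output connection of [47] is replaced here by a plain gated output) and is not the paper's implementation.

import torch
import torch.nn as nn

class SRUCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        # one projection yields candidate, forget, and reset terms for all
        # time steps at once, before the sequential recurrence begins
        self.proj = nn.Linear(input_size, 3 * hidden_size)
        self.hidden_size = hidden_size

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (time, batch, input_size)
        z, f, r = self.proj(x).chunk(3, dim=-1)
        f, r = torch.sigmoid(f), torch.sigmoid(r)
        c = torch.zeros(x.size(1), self.hidden_size, device=x.device)
        outputs = []
        for t in range(x.size(0)):
            # only elementwise operations remain inside the loop, which is
            # what makes the SRU faster than LSTM/GRU recurrences
            c = f[t] * c + (1.0 - f[t]) * z[t]
            outputs.append(r[t] * torch.tanh(c))   # simplified output gate
        return torch.stack(outputs)                # (time, batch, hidden_size)

seq = torch.randn(160, 8, 40)                      # (time frames, batch, features)
print(SRUCell(40, 64)(seq).shape)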
Figure 6. Overall architecture of the Multivariate Spatiotemporal Detection Head.
Figure 7. Model performance comparison diagram.
Figure 8. Examples of each model's LW detections. Each frequency–time image spans 0–5120 Hz and 4 s (160 time frames): (a) original frequency–time image (for reference only; not used in training this model); (b) detection results of PBWDNIST; (c) Mask RCNN; (d) Mask Scoring RCNN; (e) YOLOv5; (f) YOLOv5 Upgraded; (g) YOLOv8m; and (h) YOLOv8l.
Figure 9. Detection results; each frequency–time image spans a 4 s time window and a 0–5120 Hz frequency range: (a) a single LW; (b) multiple LW; and (c,d) multiple LW producing overlapping detections.
Figure 10. Output feature maps of each module: (a) Bio-inspired Auditory Feature Encoder output; (b) Sandglass-Shaped DSConv Network output; (c) Parallel Bidirectional Gated Temporal Unit output; and (d) Multivariate Spatiotemporal Detection Head output.
Figure 11. Comparison before and after incorporating the a posteriori logical constraint into the model: (a) frequency–time image (for reference; frequency range 0–5120 Hz); (b) results without the a posteriori logical constraint; and (c) results with the a posteriori logical constraint.
Table 1. Data set partitioning.

Datasets     Time Frame
Training     Continuous waveform from 1 April to 5 April
Testing      Continuous waveform from 6 April to 10 April
Table 2. Experimental environment.

Category                   Parameter
OS                         Ubuntu 22.04 (64-bit)
CPU                        Intel® Xeon(R) CPU E5-2680 v4 @ 2.40 GHz (64 GB)
GPU                        NVIDIA TITAN V (24 GB)
Deep Learning Framework    PyTorch 2.2.2 + CUDA 12.4
Programming Language       Python 3.9
Table 3. Model hyperparameters.

Hyperparameter    Value
Loss function     Binary cross-entropy loss
Optimizer         Adam
Learning rate     0.001
Dropout rate      0.3
Batch size        128
Epochs            50
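Read together, Table 3 describes a fairly conventional training setup. The hedged sketch below shows such a loop with the table's settings; the model and the data batches are placeholders, since the actual PBWDNIST training code is not reproduced here.

import torch
import torch.nn as nn

model = nn.Sequential(                 # stand-in for the PBWDNIST network
    nn.Flatten(), nn.Linear(40 * 160, 128), nn.ReLU(),
    nn.Dropout(p=0.3),                 # dropout rate from Table 3
    nn.Linear(128, 1), nn.Sigmoid(),
)
criterion = nn.BCELoss()               # binary cross-entropy loss
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(50):                # 50 epochs, batch size 128, per Table 3
    inputs = torch.randn(128, 1, 40, 160)           # dummy feature batch
    labels = torch.randint(0, 2, (128, 1)).float()  # dummy LW / non-LW labels
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()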
Table 4. Model performance comparison based on the same dataset.

Model                Precision (%)   Recall (%)   F1 (%)   Params (M)   Time Cost (h)
Ours                 93.0            88.7         90.7     0.081        0.8
Mask RCNN            85.1            95.2         89.8     44.32        842.16
Mask Scoring RCNN    85.2            96.3         90.2     62.75        935.61
YOLOv5 Upgraded      91.6            90.0         90.8     13.78        579.86
YOLOv8s              91.3            90.1         90.6     11.2         574.43
YOLOv8m              92.5            89.9         91.2     25.9         591.82
YOLOv8l              94.0            88.8         91.3     43.7         622.61
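As a quick consistency check on Table 4, F1 is the harmonic mean of precision and recall, F1 = 2PR/(P + R); recomputing it from the printed precision and recall columns reproduces the F1 column to within the rounding of those inputs.

def f1(precision: float, recall: float) -> float:
    # harmonic mean of precision and recall
    return 2 * precision * recall / (precision + recall)

print(round(f1(93.0, 88.7), 1))   # Ours: ~90.8 (table reports 90.7)
print(round(f1(85.1, 95.2), 1))   # Mask RCNN: ~89.9 (table reports 89.8)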
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
