1. Introduction
A lightning flash may include many discharge events, such as initial breakdown, leaders, return strokes, M components, continuing current, and K processes [1,2,3]. Although the distribution of lightning flashes differs among regions and thunderstorms, flashes produce similar discharge events, and the signals from the same type of discharge event have similar characteristics. Taking return strokes (RSs) as an example, the characteristics of 223 RSs in Brazil are almost the same as those of 209 RSs in the United States: the geometric mean interstroke intervals are 61.4 ms and 61.6 ms, respectively, and the geometric mean peak currents of first return strokes are −22.3 kA and −26.3 kA, respectively [4]. Other discharge events in different regions also have similar waveform characteristics [2,5,6,7,8,9,10,11].
Lightning discharges radiate signals ranging from very low frequency (VLF) and low frequency (LF) to very high frequency (VHF), and these signals are often used for lightning positioning. Lightning VHF radiation is abundant and lasts throughout the entire discharge process, so it can be used to obtain fine positioning results of the lightning channel [12,13,14]. However, compared with VLF/LF signals, VHF signals propagate along line-of-sight paths, are weak, and travel only short distances, so they are easily blocked by ground objects and affected by the local electromagnetic environment. Therefore, lightning VHF positioning systems [15,16,17] are used only in some key fields or in scientific research. In contrast, lightning VLF/LF signals are strong, propagate over long distances, and are relatively less affected by environmental factors [18,19]. Therefore, positioning technology based on VLF/LF signals remains the main means for operational applications and has been widely used for large-area lightning monitoring. Since the end of the last century, many countries and regions have built commercial two-dimensional cloud-to-ground (CG) lightning location networks based on VLF/LF signals [20,21,22,23]. With the progress of detection technology, some commercial CG lightning location systems have been upgraded or redeveloped since the beginning of this century to provide total lightning location capability, but their total lightning detection efficiency remains low [24,25,26,27,28].
In the past 10 years, various studies have been carried out on fine-positioning technology for low-frequency total lightning, which has played an important role in understanding lightning processes. For example, low-frequency fine-positioning technology has greatly advanced the understanding of lightning initiation [29,30]. Fine-positioning results have also been applied to thunderstorm research to improve the understanding of the detailed relationship between lightning activity and thunderstorm structure [31,32].
At present, the time-of-arrival (TOA) method is generally used in the low-frequency total lightning location technology applied in scientific research [33,34]. The basic principle of the TOA algorithm is as follows. When lightning discharges, its electromagnetic signal propagates freely through space and reaches the distributed stations at different times. The time and location of the discharge can then be calculated from the differences between the arrival times recorded by the stations. According to how pulse signals are matched among stations, TOA methods can be divided into two types [35,36].
One type uses simple pulse features to match pulses among stations and then applies the TOA method for location. During pulse matching, either only the time limit determined by the station spacing is considered [37], or both the time limit and the similarity of pulse amplitudes are considered [38,39,40]. Matching based on simple pulse features is easy to implement, but the positioning is not precise enough. Moreover, when pulses are abundant, the number of candidate combinations, and thus the amount of calculation, grows rapidly.
The other type of TOA method works on the original waveforms: it matches waveforms by cross-correlation while gradually narrowing the waveform window (referred to as waveform cross-correlation matching), and then applies the TOA method for location [41,42,43]. Compared with TOA methods based on pulse-peak feature matching, this approach improves location precision because it uses the original waveforms. Furthermore, empirical mode decomposition (EMD) has been used to filter the original waveforms before matching, which further improves positioning accuracy [44]. However, waveform cross-correlation and complex signal filtering greatly increase the amount of calculation and reduce the positioning speed.
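As a rough illustration of the cross-correlation step, the sketch below estimates the delay between two recordings of the same pulse from the peak of their full cross-correlation. It uses `numpy.correlate` at integer-sample resolution only; the iterative window narrowing and any sub-sample interpolation used in [41,42,43] are not reproduced, and the function name `xcorr_delay` is hypothetical.

```python
import numpy as np

def xcorr_delay(ref, other, fs):
    """Delay (s) of `other` relative to `ref`, from the peak of their full cross-correlation.

    Positive values mean `other` arrives later. Resolution is one sample (1/fs).
    """
    ref = np.asarray(ref, dtype=float)
    other = np.asarray(other, dtype=float)
    # Remove offset and scale so only waveform shape drives the correlation
    ref = (ref - ref.mean()) / ref.std()
    other = (other - other.mean()) / other.std()
    cc = np.correlate(other, ref, mode="full")  # output index j <-> lag j - (len(ref) - 1)
    lag = int(np.argmax(cc)) - (len(ref) - 1)
    return lag / fs
```

Direct correlation of two length-L windows costs on the order of L² operations, and it has to be repeated for every candidate pulse pair and every window size, which is the computational burden noted above.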
Time reversal technology can also be used for the three-dimensional location of low-frequency lightning discharges [45,46]. For example, Chen et al. [46] used the time reversal method to search for the optimal solution within a space constrained by the linear initial solution of the TOA method; waveform cross-correlation matching is still used in their processing. Even with a low signal-to-noise ratio (SNR), fewer matching stations, or lower timing accuracy, the method can still obtain accurate positioning results. However, the spatial optimization of the time reversal method is time-consuming, which makes it difficult to locate the lightning of an entire thunderstorm in real time.
Recently, a graphics processing unit-based grid traversal localization algorithm (GPU-GTA) has been proposed. The algorithm builds a time difference of arrival (TDOA) database on a spatial grid in advance and then searches for the grid point whose TDOA matches the measured TDOA [47]. This method locates quickly, but it does not improve pulse matching, so it is difficult to obtain fine lightning locations.
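To illustrate the idea of a precomputed TDOA grid, the sketch below builds the table with NumPy on the CPU and returns the grid point whose TDOA vector is closest, in the least-squares sense, to a measured one. The station layout, grid spacing, least-squares match criterion, and function names are assumptions for illustration; none of the GPU details of GPU-GTA [47] are reproduced.

```python
import numpy as np

C = 3.0e8  # speed of light (m/s)

def build_tdoa_table(grid, stations):
    """TDOAs relative to station 0 for every grid point. grid: (G, 3), stations: (N, 3)."""
    dist = np.linalg.norm(grid[:, None, :] - stations[None, :, :], axis=2)  # (G, N)
    toa = dist / C
    return toa[:, 1:] - toa[:, :1]  # (G, N - 1)

def grid_lookup(table, grid, measured_tdoa):
    """Grid point whose precomputed TDOA vector best matches the measured TDOA vector."""
    err = np.sum((table - measured_tdoa[None, :]) ** 2, axis=1)
    return grid[np.argmin(err)]
```

Because the table is computed once, each lookup reduces to a vectorized comparison; the result can only be as good as the measured TDOAs, that is, as good as the pulse matching that produced them.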
In addition, artificial intelligence technology has been applied to lightning detection and early warning, improving the classification of lightning types and the performance of lightning early warning. For example, Zhu et al. [48] used a support vector machine to classify the lightning electric field signals collected by the Cordoba Marx Meter Array and distinguished cloud flashes from ground flashes with an accuracy of 97%. Wang et al. [49] used a multi-layer one-dimensional convolutional neural network to automatically extract VLF/LF lightning waveform features and then classified lightning discharges based on these features, achieving an overall classification accuracy of 99.11% on their lightning dataset. Zhou et al. [50] built a deep-learning network that integrates satellite, radar, and lightning location data to predict lightning occurrence, and it performs well for short-term (0–1 h) lightning prediction. Since the beginning of this century, the autoencoder, an effective method for extracting target features, has been widely used in various fields; it obtains low-dimensional encoded features from high-dimensional data through a multi-layer neural network [51]. In recent years, convolutional neural networks have achieved excellent performance in extracting features of various targets [52,53], which directly promoted the development of convolutional autoencoders. The convolutional autoencoder integrates the convolutional neural network structure into the autoencoder, so that weights are shared across the input and local structure is preserved, thereby better preserving data features [54]. A convolutional autoencoder can therefore be used to extract low-dimensional features of lightning pulses.
In summary, the lightning 3D positioning systems currently used operationally locate discharge events based on simple features; they locate quickly but cannot finely locate the lightning channel. In scientific research, although fine positioning of the lightning channel has been achieved, most methods rely on the original waveforms and are relatively slow. At present, there is still no mature technology for low-frequency lightning signals that offers both fine and fast positioning. Meanwhile, artificial intelligence technology has achieved good results in feature extraction. Therefore, we introduce artificial intelligence technology into total lightning positioning based on low-frequency signals and propose a new algorithm: a TOA method based on deep-learning encoding feature matching. This method has fine positioning ability and greatly improves matching efficiency and positioning speed.
2. Materials and Methods
2.1. LFEDA System
LFEDA is a three-dimensional (3D) lightning positioning system for thunderstorm monitoring built by the Chinese Academy of Meteorological Sciences in Conghua, Guangzhou. Figure 1A shows a map of China, in which the red diamond marks the location of LFEDA. As shown in Figure 1B, LFEDA consists of 10 distributed substations with baselines of 6–60 km; the average distance between the other stations and the CHJ station is 22 km, and the network accurately locates lightning discharges within a range of 100 km. The SLC station was relocated to the ZTC station in 2017.
Figure 2 shows the equipment of each substation, which consists of a digitizer, a fast antenna, and a global positioning system (GPS) clock source. Each substation uses a fast antenna to detect electric field change signals from 160 Hz to 600 kHz and adopts segmented trigger acquisition to record the signal waveforms in segments, each with a time stamp. The principle of the fast antenna is similar to that of the Los Alamos Sferic Array sensor [38], but it receives the spatial electric field change through an upright capacitor plate and an active integration circuit with a time constant of 1 ms. The timing precision is ~100 ns, the sampling rate is 10 MS/s, the trigger record length is 1 ms, the pre-trigger length is 0.2 ms, and 12-bit analog-to-digital conversion is used. This paper presents algorithm research based on LFEDA observations of a typical cloud-to-ground lightning flash on 15 August 2015. Moreover, the performance of the new algorithm was tested with artificially triggered lightning flashes that occurred within the LFEDA network.
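The acquisition parameters above fix a few quantities that the later processing steps depend on. The short calculation below converts them into sample counts and distances; it uses only the numbers stated in this section plus c = 3 × 10^8 m/s, and it is illustrative arithmetic rather than part of the LFEDA processing chain.

```python
C = 3.0e8   # speed of light (m/s)
FS = 10e6   # sampling rate (samples/s)

record_samples = round(1e-3 * FS)            # 1 ms trigger record
pretrigger_samples = round(0.2e-3 * FS)      # 0.2 ms of each record precedes the trigger
pulse_window_samples = round(25.6e-6 * FS)   # 25.6 us pulse window used in Section 2.3
adc_levels = 2 ** 12                         # 12-bit digitizer
sample_interval_ns = 1e9 / FS                # one sample lasts 100 ns
light_path_per_100ns_m = C * 100e-9          # distance light travels in ~100 ns
max_baseline_delay_samples = round(60e3 / C * FS)  # 60 km baseline -> 200 us -> samples
```

For example, the 60 km maximum baseline corresponds to a propagation delay of at most 200 μs (2000 samples), which bounds the time window used when pulses are matched between stations.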
2.2. Experiments with Artificially Triggered Lightning
The triggered lightning flashes used to evaluate the detection performance were obtained from artificially triggered lightning experiments. Since 2006, the experiments have been jointly conducted in the Conghua district, Guangzhou, by the State Key Laboratory of Severe Weather of the Chinese Academy of Meteorological Sciences and the Guangzhou Institute of Tropical Marine Meteorology, China Meteorological Administration. Thus far, 189 lightning flashes have been successfully triggered.
Figure 3A shows the layout of the triggered lightning test site; the rocket launcher and lightning rod are located inside the LFEDA network. Figure 3B shows a photo of a triggered lightning flash, which was initiated at the lightning rod via a metal wire pulled aloft by a rocket. For each triggered lightning flash, the close-range electromagnetic field and the channel-base current were measured synchronously with a recording time of 5 s, which ensured complete acquisition of the lightning process. Meanwhile, optical observations at distances of 1.9 km and 600 m provided direct optical records of the return stroke channels. To compare the positioning results of this study with those of others, triggered lightning flashes with return strokes in 2015 and 2017 were used. Please refer to Chen et al. [55] for details on the artificially triggered lightning experiments.
2.3. Location Method
The new algorithm based on deep-learning encoding feature (hereinafter referred to as encoding feature) matching is divided into four main steps: pulse extraction, feature extraction, pulse matching, and positioning (Figure 4).
Pulse extraction: The lightning waveforms of all stations are bandpass filtered to remove frequency components above 100 kHz and below 5 kHz, and are then normalized. Pulse peaks are then identified using a noise threshold and a minimum peak-to-peak interval of 1 μs, and a 25.6 μs waveform segment containing each pulse peak is extracted.
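The pulse-extraction step can be sketched with NumPy as follows. Several details are not specified in the text and are assumptions here: the filter is an ideal (brick-wall) FFT bandpass rather than a particular filter design, normalization divides by the maximum absolute value, peaks are searched on the absolute value so that both polarities are found, the noise threshold is passed in directly, and each 256-sample (25.6 μs) window is centred on its peak. The function names are hypothetical.

```python
import numpy as np

FS = 10e6     # LFEDA sampling rate (samples/s)
WIN = 256     # 25.6 us window at 10 MS/s
MIN_SEP = 10  # 1 us minimum peak-to-peak interval, in samples

def bandpass_fft(x, fs=FS, f_lo=5e3, f_hi=100e3):
    """Ideal FFT-domain bandpass (assumed stand-in for the unspecified filter design)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def extract_pulses(x, threshold, fs=FS):
    """Filter, normalize, find peaks of |signal| above `threshold`, and cut WIN-sample windows."""
    y = bandpass_fft(np.asarray(x, dtype=float), fs)
    y = y / np.max(np.abs(y))  # assumed normalization: unit peak magnitude
    a = np.abs(y)
    # Local maxima of |y| that exceed the noise threshold
    mid = a[1:-1]
    cand = np.where((mid > threshold) & (mid >= a[:-2]) & (mid >= a[2:]))[0] + 1
    # Keep the largest peaks first; drop any peak closer than MIN_SEP samples to a kept one
    kept = []
    for i in cand[np.argsort(-a[cand])]:
        if all(abs(int(i) - k) >= MIN_SEP for k in kept):
            kept.append(int(i))
    # Keep only peaks whose centred window fits inside the record
    kept = sorted(k for k in kept if WIN // 2 <= k <= len(y) - WIN // 2)
    windows = np.array([y[k - WIN // 2:k + WIN // 2] for k in kept]).reshape(len(kept), WIN)
    return np.array(kept, dtype=int), windows
```

Keeping the largest peaks first and suppressing neighbours closer than 1 μs is one simple way to enforce the minimum peak-to-peak interval; the threshold could equally be estimated from the pre-trigger segment of each record.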
Feature extraction: The extracted pulse waveform is input into the trained convolutional autoencoder to obtain the encoding features, which together with the pulse-peak time constitute the pulse feature vector.
Pulse matching: First, one station is selected as the master station, and the pulses of the other stations are matched one by one with the master-station pulses according to the correlation coefficients of their encoding features. A pulse at another station is matched with a master-station pulse if the peak-time difference between the two stations does not exceed the light propagation time between them and, among the pulses satisfying this condition, its encoding features have the highest correlation with those of the master-station pulse. After the pulses are matched, a set of matched pulse-peak times is obtained from the pulse feature vectors.
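A minimal sketch of this matching rule, assuming that the encoding features are already available as NumPy arrays: for each master-station pulse, candidates at another station are restricted to peak-time differences no larger than the light travel time over the baseline, and the candidate whose encoding features have the highest Pearson correlation is selected. The function name and array layout are hypothetical, and the sketch does not enforce one-to-one matching, which the text does not specify.

```python
import numpy as np

C = 3.0e8  # speed of light (m/s)

def match_to_master(master_t, master_feat, t, feat, baseline_m):
    """Match each master-station pulse to one pulse at another station.

    master_t: (M,) peak times (s); master_feat: (M, K) encoding features
    t: (P,) peak times at the other station (s); feat: (P, K) encoding features
    baseline_m: distance between the two stations (m)
    Returns a list of indices into t, or None where no pulse lies in the time window.
    """
    window = baseline_m / C  # light travel time between the two stations
    matches = []
    for tm, fm in zip(master_t, master_feat):
        cand = np.where(np.abs(t - tm) <= window)[0]
        if cand.size == 0:
            matches.append(None)
            continue
        # Pearson correlation between encoding features of master pulse and each candidate
        r = [np.corrcoef(fm, feat[j])[0, 1] for j in cand]
        matches.append(int(cand[int(np.argmax(r))]))
    return matches
```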
Positioning: Using the matched pulse-peak times and the geographic locations of the stations, the TOA method is applied to calculate the occurrence time and location of each lightning discharge event. Taking signals matched at five stations as an example, the details are as follows. From the distance between the discharge event and each of the five stations, five equations in the form of Equation (1) can be written. Subtracting the equation of the master station from those of the remaining four stations yields four equations in the form of Equation (2), which are linear in the unknowns and can be written as a matrix equation to obtain a linear initial solution.
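$$(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2 = c^2 (t_i - t)^2 \qquad (1)$$

$$2x(x_m - x_i) + 2y(y_m - y_i) + 2z(z_m - z_i) + 2c^2 t (t_i - t_m) = (x_m^2 - x_i^2) + (y_m^2 - y_i^2) + (z_m^2 - z_i^2) + c^2 (t_i^2 - t_m^2) \qquad (2)$$

In Equation (1), $(x, y, z, t)$ represents the position and time of the lightning discharge, $(x_i, y_i, z_i, t_i)$ represents the location and recorded pulse-peak time of substation $i$, $c$ is the speed of light ($c = 3 \times 10^8$ m/s), and $i$ is the serial number of the substation. In Equation (2), $(x, y, z, t)$, $(x_i, y_i, z_i, t_i)$, $c$, and $i$ have the same meanings as in Equation (1), and $(x_m, y_m, z_m, t_m)$ represents the position and recorded pulse-peak time of the master station.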
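Because Equation (2) is linear in $(x, y, z, t)$, the initial solution can be obtained with an ordinary least-squares solve. The NumPy sketch below solves for $s = ct$ (in metres) instead of $t$ to keep the matrix well scaled, which turns $2c^2 t(t_i - t_m)$ into $2s(ct_i - ct_m)$ and $c^2(t_i^2 - t_m^2)$ into $(ct_i)^2 - (ct_m)^2$. The function name and the station coordinates used to exercise it are hypothetical.

```python
import numpy as np

C = 3.0e8  # speed of light (m/s)

def linear_toa(stations, t_arr, master=0):
    """Linear initial solution (x, y, z, t) from Eq. (2).

    stations: (N, 3) station coordinates (m); t_arr: (N,) pulse-peak times (s), N >= 5.
    """
    P = np.asarray(stations, dtype=float)
    d = C * np.asarray(t_arr, dtype=float)  # arrival times as light-travel distances (m)
    pm, dm = P[master], d[master]
    others = [i for i in range(len(P)) if i != master]
    # One row per non-master station: coefficients of (x, y, z, s) in Eq. (2)
    A = np.array([[2 * (pm[0] - P[i, 0]), 2 * (pm[1] - P[i, 1]),
                   2 * (pm[2] - P[i, 2]), 2 * (d[i] - dm)] for i in others])
    b = np.array([pm @ pm - P[i] @ P[i] + d[i] ** 2 - dm ** 2 for i in others])
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    return sol[0], sol[1], sol[2], sol[3] / C
```

With exactly five stations the system is square; with more stations, `numpy.linalg.lstsq` returns the least-squares solution. If all stations had the same altitude, the $z$ column of the matrix would be identically zero and this linear step alone would not determine $z$.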
Then, the Levenberg–Marquardt algorithm is used to fit Equation (3) to obtain the final, accurate numerical solution.

$$t_i = t + \frac{1}{c}\sqrt{(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2} \qquad (3)$$

In Equation (3), $(x, y, z, t)$ represents the location and time of the lightning discharge, and $(x_i, y_i, z_i, t_i)$, $c$, and $i$ have the same meanings as in Equations (1) and (2). Finally, the output positioning results are screened by the goodness of fit ($\chi^2$), and only solutions with $\chi^2 < 5$ are retained.
2.4. Establishment of Convolutional Autoencoder Model
To improve the matching ability, we establish a convolutional autoencoder for lightning signals, which consists mainly of an encoder and a decoder. The encoder extracts a fixed-length encoding feature from the input pulse waveform for subsequent matching and positioning. The encoding feature is an abstract representation of the pulse waveform from which the waveform can be recovered as fully as possible by decoding. The decoder is used only during training, to compute the reconstruction error of the encoder. The details of the convolutional autoencoder are as follows.
First, based on the LFEDA data, a fixed-length lightning signal is input into the encoder, and the shallowest features are extracted by the first layer of the neural network. Then, the features are condensed by a max-pooling layer, which halves the feature length. The next two layers repeat these operations on the features extracted by the previous layer, finally forming 32 encoding features. The decoder has a structure similar to that of the encoder but is used only to reconstruct the original signal in order to evaluate the encoding and training.
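The sketch below traces the encoder's tensor shapes with NumPy and untrained random weights, to show how three convolution and max-pooling stages reduce a 256-sample input (25.6 μs at 10 MS/s) to 32 values (256 / 2³ = 32). The kernel size, channel counts, ReLU activation, zero padding, and the assumption that the last layer has a single channel are illustrative choices, not the published architecture; the function names are hypothetical.

```python
import numpy as np

def conv1d_same(x, w, b):
    """1-D convolution with zero padding. x: (C_in, L), w: (C_out, C_in, K) with odd K, b: (C_out,)."""
    c_out, c_in, k = w.shape
    xp = np.pad(x, ((0, 0), (k // 2, k // 2)))
    L = x.shape[1]
    out = np.empty((c_out, L))
    for j in range(L):
        # Contract over input channels and kernel taps for output position j
        out[:, j] = np.tensordot(w, xp[:, j:j + k], axes=([1, 2], [0, 1])) + b
    return out

def maxpool2(x):
    """Non-overlapping max pooling of width 2 along time (halves the length)."""
    return x.reshape(x.shape[0], -1, 2).max(axis=2)

def encode(pulse, params):
    """Apply conv -> ReLU -> max-pool for each (w, b) layer; flatten the final feature map."""
    h = np.asarray(pulse, dtype=float)[None, :]  # (1, 256)
    for w, b in params:
        h = maxpool2(np.maximum(conv1d_same(h, w, b), 0.0))
    return h.ravel()  # (32,) when the last layer has a single channel
```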
The convolutional autoencoder is trained for 10 trials. During training, the adaptive moment estimation (ADAM) algorithm is used for optimization. ADAM is a simple and computationally efficient algorithm for gradient-based optimization of stochastic objective functions. It combines the advantages of the AdaGrad and RMSProp algorithms, which shortens training and keeps memory requirements low. The optimized loss function is shown in Equation (4), where $n$ is the length of the data, $i$ is the index of the data point, $E_{\mathrm{true}}$ is the true electric-field value, and $E_{\mathrm{pred}}$ is the predicted electric-field value. The network weights that minimize the loss are taken as the most suitable for extracting features of lightning signals.
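Equation (4) is not reproduced in this text; the sketch below assumes it is a mean-squared-error reconstruction loss over the $n$ samples, which matches the variables listed. It also spells out the standard Adam update of Kingma and Ba (bias-corrected first and second moment estimates) for a single parameter array, with the commonly used default hyperparameters. The names are hypothetical and this is not the authors' training code.

```python
import numpy as np

def mse_loss(e_true, e_pred):
    """Mean squared error between true and predicted field samples (assumed form of Eq. (4))."""
    e_true = np.asarray(e_true, dtype=float)
    e_pred = np.asarray(e_pred, dtype=float)
    return float(np.mean((e_true - e_pred) ** 2))

class Adam:
    """Minimal Adam optimizer for one parameter array."""

    def __init__(self, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.b1, self.b2, self.eps = lr, beta1, beta2, eps
        self.m = self.v = None
        self.t = 0

    def step(self, theta, grad):
        """Return updated parameters given the current gradient."""
        if self.m is None:
            self.m = np.zeros_like(theta)
            self.v = np.zeros_like(theta)
        self.t += 1
        self.m = self.b1 * self.m + (1 - self.b1) * grad       # first moment (mean of gradients)
        self.v = self.b2 * self.v + (1 - self.b2) * grad ** 2  # second moment (uncentred variance)
        m_hat = self.m / (1 - self.b1 ** self.t)               # bias corrections
        v_hat = self.v / (1 - self.b2 ** self.t)
        return theta - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)
```

On the first step the bias-corrected ratio reduces to roughly the sign of the gradient, so every parameter moves by about `lr` regardless of its gradient scale; this per-parameter normalization is what makes Adam relatively insensitive to gradient magnitude.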
Fixed-length waveform segments corresponding to 225,476 real lightning discharge events in a thunderstorm on 15 June 2017 were used as the training data set. Because the data include various types of lightning discharge events, the model can learn to extract the features of various lightning discharge pulses.
2.5. Selection of the Model Key Parameter
The length of the input pulse waveform and the length of the encoding feature are two key model parameters that must be determined by testing. Taking pulse CHJ-1 in Figure 5a as an example, we analyze the influence of different pulse waveform lengths and encoding feature lengths on feature extraction. The correlation coefficients between the original and decoded waveforms are calculated for each case to reflect the encoding effect: the larger the correlation coefficient, the more similar the waveforms and the better the feature representation.
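The similarity measure used here is the Pearson correlation coefficient. A minimal NumPy version is shown below (the function name is hypothetical). Because both waveforms are mean-centred and normalized, the coefficient ignores differences in offset and overall scale and reflects only waveform shape, so values near 1 indicate a faithful reconstruction.

```python
import numpy as np

def waveform_similarity(original, decoded):
    """Pearson correlation coefficient between two equal-length waveforms."""
    a = np.asarray(original, dtype=float)
    b = np.asarray(decoded, dtype=float)
    a = a - a.mean()  # remove DC offset
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))
```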
Table 1 shows the influence of the encoding feature length on the correlation coefficient between the original and decoded waveforms. As the encoding feature length increases, the correlation coefficient increases, meaning that the decoded waveform becomes more accurate. When the encoding feature length increases to 32 or 64, the correlation coefficients exceed 0.99, indicating that the decoded waveforms are essentially consistent with the original waveforms. However, a longer encoding feature significantly increases the amount of computation. Therefore, balancing decoding quality and computing speed, we chose 32 as the encoding feature length.
Table 2 shows the influence of the pulse waveform length on the correlation coefficient between the original and decoded waveforms. Although the correlation coefficient increased with the pulse waveform length, the change was not obvious. However, because the durations of lightning pulses detected by LFEDA range from several to tens of microseconds, we chose 25.6 μs as the input waveform length of the model so that complete, rather than partial, features could be extracted for most pulses.
2.6. Matching Ability of New Algorithm
Taking two consecutive pulse signals recorded synchronously by four stations as an example (Figure 5), we compared the matching ability of three methods: encoding feature matching, pulse-peak feature (peak amplitude and time) matching, and pulse-multiple feature (rising edge, falling edge, energy, peak, half width, duration, 10–90% rising edge, 10–90% falling edge, 30–70% rising edge, and 30–70% falling edge) matching. The pulse-multiple feature matching method calculates the correlation coefficients of the multiple features between pulses that satisfy the inter-station propagation time limit and selects the pulses with the highest correlation coefficients as matches. The pulse-peak feature matching method finds the best match in the amplitude and time sequences by considering the similarity of pulse amplitudes within the time limit.
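For comparison, a few of the hand-crafted features listed above can be computed as in the sketch below for a single positive pulse sampled at 10 MS/s. The exact feature definitions used by the pulse-multiple feature method are not given in the text, so the threshold-crossing conventions here (first and last samples at or above 10%, 50%, and 90% of the peak) are assumptions, only a subset of the listed features is shown, and the function name is hypothetical.

```python
import numpy as np

FS = 10e6  # sampling rate (samples/s)

def pulse_features(w, fs=FS):
    """Peak, 10-90% rise time, 90-10% fall time, half width, and energy of a positive pulse."""
    w = np.asarray(w, dtype=float)
    peak = float(w.max())

    def first_at_or_above(level):
        # index of the first sample reaching `level`
        return int(np.argmax(w >= level))

    def last_at_or_above(level):
        # index of the last sample reaching `level`
        return len(w) - 1 - int(np.argmax(w[::-1] >= level))

    return {
        "peak": peak,
        "rise_10_90": (first_at_or_above(0.9 * peak) - first_at_or_above(0.1 * peak)) / fs,
        "fall_90_10": (last_at_or_above(0.1 * peak) - last_at_or_above(0.9 * peak)) / fs,
        "half_width": (last_at_or_above(0.5 * peak) - first_at_or_above(0.5 * peak)) / fs,
        "energy": float(np.sum(w ** 2) / fs),
    }
```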
For the pulse-peak feature matching method, because the lightning signals received by the substations travel along different propagation paths, the pulse amplitudes differ between stations, resulting in an incorrect match between XTC-1 and CHJ-2. The pulse-multiple feature matching method matches pulses using the correlation coefficients between the multiple features of pulses at the CHJ station and at the other three stations; only pulses whose feature correlation coefficient exceeds a set threshold are regarded as matching pulses. When the pulses of other stations are matched with those of the master station (CHJ), the larger the difference between the correlation coefficients of correctly and wrongly matched pulses, the better the matching ability, because different pulses can then be clearly distinguished. As shown in Table 3, the correlation coefficients of multiple features do not differ significantly between different pulses, and mismatched pulses sometimes have higher correlation coefficients. For example, the correlation coefficients between the XTC-1 and CHJ-1 pulses and between the XTC-1 and CHJ-2 pulses are 0.999999 and 0.999995, respectively. The match is correct, but the difference between the correlation coefficients of the correct and wrong matches is only 0.000004 (0.999999 − 0.999995 = 0.000004). The correlation coefficients between the ZCJ-2 and CHJ-1 pulses and between the ZCJ-2 and CHJ-2 pulses are 0.999950 and 0.999898, respectively; the higher correlation coefficient between ZCJ-2 and CHJ-1 leads to an incorrect match.
For the encoding feature matching method, the correlation coefficients between the encoding features of pulses at the CHJ station and at the other three stations were calculated. As shown in Table 4, no matching errors occurred, and the correlation coefficients for different discharge signals differed considerably. For example, the correlation coefficients between the XTC-1 and CHJ-1 pulses and between the XTC-1 and CHJ-2 pulses were 0.95147 and 0.78790, respectively, a difference of 0.16357, which is much larger than the difference of 0.000004 obtained with pulse-multiple features.
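The comparison above amounts to a matching margin: the score of the correct candidate minus the best score among the wrong candidates, where a negative margin means the wrong pulse would be selected. The sketch below computes it from the correlation coefficients quoted in this section; the dictionary layout and the function name `match_margin` are hypothetical.

```python
def match_margin(scores, correct):
    """Correct-candidate score minus the best wrong-candidate score (negative means mismatch)."""
    best_wrong = max(v for k, v in scores.items() if k != correct)
    return scores[correct] - best_wrong

# Correlation coefficients with the master-station (CHJ) pulses, as quoted in the text
xtc1_multi = {"CHJ-1": 0.999999, "CHJ-2": 0.999995}  # pulse-multiple features (Table 3)
zcj2_multi = {"CHJ-1": 0.999950, "CHJ-2": 0.999898}  # pulse-multiple features (Table 3)
xtc1_code = {"CHJ-1": 0.95147, "CHJ-2": 0.78790}     # encoding features (Table 4)

margin_xtc1_multi = match_margin(xtc1_multi, "CHJ-1")  # approximately 4e-6
margin_zcj2_multi = match_margin(zcj2_multi, "CHJ-2")  # approximately -5.2e-5 (mismatch)
margin_xtc1_code = match_margin(xtc1_code, "CHJ-1")    # approximately 0.16357
```

On these numbers the encoding-feature margin for XTC-1 is roughly 4 × 10^4 times the pulse-multiple-feature margin, and the negative ZCJ-2 margin reproduces the mismatch described for Table 3.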
The above examples show that when the traditional pulse-peak feature or pulse-multiple features are used for matching, the difference between the correlation coefficients of correctly and incorrectly matched pulses is small, and incorrect matches can even occur. In contrast, with encoding features, the correlation coefficients differ significantly and the pulses are matched correctly. Since pulse matching is the key to fine lightning location, positioning based on encoding features can further improve the fine location ability of LFEDA.