Experimental Study of Cloud-to-Ground Lightning Nowcasting with Multisource Data Based on a Video Prediction Method

The evolution of lightning generation and extinction is a nonlinear and complex process, and the nowcasting results based on extrapolation and numerical models largely differ from the real situation. In this study, a multiple-input and multiple-output lightning nowcasting model, namely Convolutional Long- and Short-Term Memory Lightning Forecast Net (CLSTM-LFN), is constructed to improve the lightning nowcasting results from 0 to 3 h based on video prediction methods in deep learning. The input variables to CLSTM-LFN include historical lightning occurrence frequency and physical variables significantly related to lightning occurrence from numerical model products, which are merged with each other to provide effective information for lightning nowcasting in time and space. The results of batch forecasting tests show that CLSTM-LFN can achieve effective forecasts of 0 to 3 h lightning occurrence areas, and the nowcasting results are better than those of the traditional lightning parameterization scheme and only inputting a single data source. After analyzing the importance of input variables, the results show that the role of numerical model products increases significantly with increasing forecast time, and the relative importance of convective available potential energy is significantly larger than that of other physical variables.


Introduction
Lightning is a strong discharge phenomenon that occurs in the atmosphere and can cause large damage to power, transportation, communication and other facilities. The occurrence of lightning is often accompanied by heavy rain, strong winds and other severe convective weather, causing a large amount of casualties and property damage. Therefore, obtaining lightning forecast results with high spatial and temporal resolution is a research priority for meteorological departments in various countries.
Nowcasting usually refers to weather forecasting from 0 to 6 h [1] and is a forecast service for guarding against catastrophic weather, such as sudden local severe storms. The occurrence of lightning is closely related to meteorological conditions. Observations from satellites [2] and sounding balloons [3] make it possible to determine whether meteorological conditions are conducive to lightning. The K index indicates the static stability of the layer from 850 hPa to 500 hPa, and lightning usually occurs in areas with high K index values [4]. Strong thunderstorms usually occur in areas of high convective available potential energy (CAPE) [5] and in strong shear areas of deep convection [6]. Since sounding data are observations of a single station at a single moment, they cannot represent the large-scale weather state or its change over time, and nowcasting results based on them have some limitations.
Radar data have high spatial and temporal resolution and a wide detection range, which can quickly provide real-time weather observations and, therefore, are often used for monitoring and warning of catastrophic weather [7,8]. Numerous studies have shown that lightning usually occurs in regions where large vertical velocities [9] and high concentrations of ice crystal particles are present [10]. According to the statistics of lightning and radar echoes, two conditions can be regarded as the basis for lightning occurrence: radar echoes of 40 dBZ located at altitudes greater than 7 km [11], or 30 dBZ radar echoes in the −15 °C to −20 °C isothermal layer observed in more than two consecutive scans [12]. Dual-polarization radar is good at detecting information on ice crystal particle content and location, and the volume of ice crystal particles obtained from the inversion of dual-polarization radar can be used to forecast lightning according to different thresholds [13]. Methods for identifying lightning occurrence areas from radar data are derived from statistics of historical samples in particular areas and therefore have limited applicability in other areas. Meanwhile, a study pointed out that the correlation between lightning and radar echoes in linear convective systems is weaker than that of supercells [14]. Lightning develops rapidly, and the accuracy of nowcasting results based on statistical methods decreases quickly, with large errors beyond a certain lead time.
In contrast to the above methods for lightning nowcasting, the use of numerical weather prediction models provides a more accurate dynamical and thermal state of the atmosphere. The PR92 lightning parameterization scheme establishes a relationship between lightning occurrence frequency and cloud top height, which has different forecast formulas for land and ocean areas [15]. The potential electrical energy establishes a relationship with the lightning density based on the microphysical variables within the cloud and considers the occurrence of lightning when its dissipation exceeds a prespecified threshold [16]. Wang et al. (2010) used the Global/Regional Assimilation and Prediction Enhanced System to construct a forecast equation based on the relationship between radar echoes and lightning occurrence frequency while using the cloud top temperature as a limiting condition. After testing two cases of lightning in South China, the nowcasting results of lightning occurrence location and density were more consistent with the observed results [17]. Based on the study of lightning initiation and discharge theory [18,19], some scholars have coupled the initiation and discharge parameterization scheme into the numerical model so that the numerical model has the ability to simulate the electric field characteristics within clouds and forecast lightning [20,21]. In recent years, numerical weather models have improved initial forecast model fields by assimilating sounding, radar, satellite and other observations, but there are still some biases in the nowcasting results [22], and the bias is not regular in time and space.
The continuous development of artificial intelligence technology in recent years has led to a series of breakthroughs in image recognition, speech recognition, autonomous driving, etc. The application of artificial intelligence technology to weather forecasting is gradually becoming popular. A deep neural network is a simulation of biological neurons, a dynamic network with strong nonlinear expression ability and self-adaptive capability generated by the superposition of large-scale neurons. The analysis of meteorological observations and products from numerical models using deep learning methods can identify areas of catastrophic weather occurrence at future moments, and the mathematical-physical process is relatively similar to application scenarios such as image recognition and video prediction. The more commonly used existing methods are convolutional neural networks (CNNs) and convolutional long-and short-term memory networks (ConvLSTMs), and some researchers have made a series of progress in pattern recognition [23,24] and lightning forecasting [25,26] using the above methods. Semantic segmentation is used for the classification of images at the pixel level and is commonly used in areas such as the detection of organ lesions in medical images and the classification of remote sensing images. Zhou et al. (2019) used NCEP Final Operational Global Analysis data (FNL) and Global Forecast System forecast data (GFS) to conduct convolution operations for grid points around the forecast grid points and judge whether lightning, hail and other strong convective weather will occur after binary classification in the fully connected layer [27]. Zhou et al. (2020) used Himawari-8 satellite and radar echo data and replaced two-dimensional convolution in a semantic segmentation network with three-dimensional convolution to obtain the probability of lightning occurrence in the forecast area [28]. 
The model obtained high forecast scores for the 0 to 1 h lightning nowcasting results.
Video prediction usually refers to analyzing object movements in the first few frames of still pictures or videos and predicting future images based on changes in their positions and forms [29]. For example, by analyzing the movement trajectory of pedestrians, the surrounding building environment and other factors, the target is tracked and the trajectory is predicted so that the driver can be alerted to react in time to avoid traffic accidents. Compared with image classification and semantic segmentation, video prediction methods are more suitable for dealing with dynamic image changes because they maintain continuity in the temporal dimension, and they have been successfully applied in autonomous driving [30] and short-term prognosis of precipitation [31,32].
Lightning usually occurs in small- and mesoscale convective systems, which change rapidly and are short-lived, and its development process is nonlinear and complex. Results obtained solely by extrapolating historical observation data, or solely from parameterization schemes applied to numerical model products, often differ largely from actual observations. Merging observation data with numerical model products is a more promising way to improve lightning nowcasting results. With a video prediction method, the future trend can be inferred from lightning characteristics in the recent past and then refined using the meteorological conditions indicated by the numerical model products, which can effectively improve the lightning nowcasting results.
Based on the above research progress, in this study, a CNN module is added to the ConvLSTM to construct the lightning nowcasting model CLSTM-LFN, where the CNN module is used to extract the factors affecting lightning occurrence from the numerical model products and the ConvLSTM module is used to predict the spatiotemporal sequence forward composed of historical lightning occurrence frequency and factors affecting lightning occurrence, and it thus obtains lightning occurrence areas from 0 to 3 h in the future. Finally, the importance of the input variables is assessed using the permutation importance method. Section 2 describes the data and the study area. Section 3 introduces the structure of the deep learning model and the experimental design. In Section 4, we illustrate the nowcasting results, and Section 5 demonstrates the variable importance analysis results. The conclusions and discussion are provided in Section 6.

Lightning Data
The lightning data used in this study are from the National Lightning Monitoring Network of China, which can process the received data in real time. The observation range covers most areas of China and is used to detect the time, location and intensity of cloud-to-ground lightning flashes. The average observation range of a single lightning monitoring station is approximately 300 km [33], and the detection efficiency is close to 90% [34], which provides an excellent prerequisite for lightning research. In this study, the lightning occurrence frequency (number of lightning occurrences in a 1-h period) is obtained by preprocessing the lightning observation data, which is one of the input variables to the CLSTM-LFN.

WRF Model Prediction Products
The numerical model product is one of the input variables of CLSTM-LFN, and the numerical model used in this study is the Weather Research and Forecasting Model (WRF), which is a new generation of mesoscale weather forecasting models developed by the National Center for Atmospheric Research (NCAR) and other institutions and is highly portable and easy to maintain. The WRF model used in this paper is driven by the GFS forecast data, which are activated once a day at 12:00 (UTC) and have a forecast time of 36 h. The first 12 h is the spin-up time, and the forecast results of the next 24 h are used as the input variables to the CLSTM-LFN, with a 1-h interval between output results. On the same day, the start-up time of the numerical model products used by CLSTM-LFN is the same, and the numerical model products with different forecast times are selected to forecast lightning at different times. For example, to forecast lightning for the time period 1 July 2020 00:00 (UTC, same below) to 1 July 2020 03:00 and 1 July 2020 09:00 to 1 July 2020 12:00, the numerical model is activated at 30 June 2020 12:00 and the forecast times are 12 h to 15 h and 21 h to 24 h, respectively.
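The mapping from a target time to a WRF run and forecast lead time described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and it assumes that every valid time on a given UTC day uses the run initialized at 12:00 UTC on the previous day, which is how the example in the text reads.

```python
from datetime import datetime, timedelta

SPIN_UP_HOURS = 12  # first 12 h of each run are spin-up (from the text)


def wrf_init_time(valid_time):
    """Initialization time of the WRF run used for `valid_time` (UTC).

    Assumption: all valid times on one UTC day share the run started
    at 12:00 UTC on the previous day.
    """
    midnight = valid_time.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight - timedelta(hours=SPIN_UP_HOURS)


def lead_hours(valid_time):
    """Forecast lead time in hours between the run start and `valid_time`."""
    return (valid_time - wrf_init_time(valid_time)) / timedelta(hours=1)


# Example from the text: lightning forecasts for 1 July 2020.
start, end = datetime(2020, 7, 1, 0), datetime(2020, 7, 1, 3)
print(wrf_init_time(start), lead_hours(start), lead_hours(end))
```

With this convention, the usable lead times run from 12 h (00:00 UTC) upward, so the spin-up period is never read.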
The center point of the simulated area is 37.52°N, 101.33°E, with a grid resolution of 9 km × 9 km and a horizontal grid size of 570 × 500, covering most of the land area and part of the marine area of China (Figure 1a). Considering the detection efficiency of the lightning observatory and the quality of the WRF model products, the region selected for this study is 19°N-42°N, 98°E-126°E, with a grid number of 256 × 256. The area considered for this study is an excerpt of the WRF model domain, covering most of central-eastern China and parts of western China (Figure 1b). The model has 41 vertical levels, and the upper model level is at 50 hPa. The Purdue Lin scheme is used for the microphysical scheme, and the longwave and shortwave radiation schemes are the RRTM and Goddard schemes, respectively. The boundary layer scheme is the MYNN2.5 scheme, and the land surface scheme is the Noah-MP scheme.
Thirteen variables from WRF model products, such as maximum vertical velocity and CAPE, are selected in this study. They were chosen because lightning is strongly indicative of strong convective weather, and physical variables, such as storm helicity, CAPE and precipitation, provide indications for lightning nowcasting on a macroscopic scale. Several studies have shown that physical variables, such as radar echo [35], water vapor [36], ice crystal particles [13] and vertical velocity [37], have strong correlations with the occurrence of lightning. The details of the input variables to the CLSTM-LFN from numerical model products are shown in Table 1.
Table 1 (fragment; only two rows are recoverable): radar reflectivity at 6 km above ground level (dBZ); $R_9$, radar reflectivity at 9 km above ground level (dBZ).

Preprocessing of Lightning Data
The preprocessing of lightning data consists of two main parts: quality control and increasing the density of the lightning occurrence frequency. The observations of the lightning detection network contain some errors [38], and it cannot be determined whether an isolated lightning detection corresponds to a real flash. Even if such flashes do exist, the associated convection is weak and they are difficult to forecast. Therefore, quality control mainly consists of rejecting isolated lightning: a detection is rejected if no other lightning is observed within 20 km of its location.
Lightning occurs mainly in mesoscale weather systems, typically on spatial scales of tens to hundreds of kilometers. The common method used in previous studies to mark grid points is to convert the lightning to the nearest grid point. Due to the high resolution of the grid, the meteorological conditions between the grid points adjacent to the lightning occurrence location are relatively similar. After labeling by the above method, the grid points with similar meteorological conditions will be labeled with the opposite result (detection vs. non-detection), which will have a large impact on the training of the neural network model and is not conducive to the convergence of the model. On the other hand, the occurrence of lightning is a small probability event, and statistically, the number of grids in which lightning occurs in the sample used in this study only accounts for 4% of the total number of grids. The process of neural network training gradually reduces the error between the output result and the grid labeling result. The neural network model will recognize lightning as noise if the number of grid points with lightning occurrence is small, resulting in the area of lightning occurrence forecast by the neural network model being smaller than the actual observation. It is necessary to increase the density of the lightning observation frequency to improve the model training results. Therefore, in this study, the number of lightning observations within a 20 km radius of the grid point is used as the lightning occurrence frequency of the grid point.
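The two preprocessing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the processing chain used in the study: the function names are hypothetical, distances are great-circle distances on a spherical Earth, flashes are (latitude, longitude) pairs that are assumed to belong to the same 1-h period, and the brute-force pairwise search is only suitable for small examples.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle (haversine) distance between two points in km."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def reject_isolated(flashes, radius_km=20.0):
    """Quality control: keep a flash only if another flash lies within radius_km."""
    kept = []
    for i, (lat, lon) in enumerate(flashes):
        has_neighbor = any(
            j != i and distance_km(lat, lon, lat2, lon2) <= radius_km
            for j, (lat2, lon2) in enumerate(flashes)
        )
        if has_neighbor:
            kept.append((lat, lon))
    return kept


def frequency_at(grid_point, flashes, radius_km=20.0):
    """Lightning occurrence frequency: flashes within radius_km of a grid point."""
    lat, lon = grid_point
    return sum(
        1 for flat, flon in flashes
        if distance_km(lat, lon, flat, flon) <= radius_km
    )


flashes = [(30.0, 110.0), (30.05, 110.0), (35.0, 120.0)]
kept = reject_isolated(flashes)  # the flash at (35.0, 120.0) is isolated
print(kept, frequency_at((30.0, 110.0), kept))
```

Because the 20 km radius is larger than the 9 km grid spacing, one flash contributes to several neighboring grid points, which is what raises the density of nonzero labels.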

Training Set and Test Set
The input variables of the CLSTM-LFN included historical lightning occurrence frequency in the past 3 h at the starting forecast moment and hourly WRF model products in the next 3 h. For example, to forecast lightning from 1 July 2020 03:00 to 1 July 2020 06:00, the input variables to the CLSTM-LFN are the hourly lightning occurrence frequency from 1 July 2020 00:00 to 1 July 2020 03:00 and the hourly WRF model products from 1 July 2020 03:00 to 1 July 2020 05:00 (WRF model is activated at 30 June 2020 12:00).
The target output of traditional methods for forecasting lightning occurrence areas is generally the presence or absence of lightning (1 or 0). Grid points with higher lightning occurrence frequencies are often accompanied by stronger convective weather, and several studies have shown a strong positive correlation between lightning occurrence frequency and both radar echoes [39,40] and CAPE [41]. Among grid points with lightning, large differences in lightning occurrence frequency correspond to large differences in meteorological conditions, so marking all of these grid points as 1 is, to some extent, not conducive to model convergence. Therefore, although this study focuses only on the location of lightning occurrence, the output of the CLSTM-LFN is the lightning occurrence frequency in the next 3 h, and lightning is considered to occur when the output is greater than a threshold N. A small value of N leads to a higher false alarm rate, and a large value reduces the hit rate. After a series of tests, it is concluded that the CLSTM-LFN achieves a higher hit rate and a lower false alarm rate when N = 5.
Summer is the high lightning period in the forecast area [42], and the samples selected for this study are from June to August 2020. The June and July samples form the training set (1270 samples), and the August samples form the test set (216 samples); each sample contains an hourly lightning forecast for the next 3 h. Since the CLSTM-LFN has 660,252 parameters, too few training samples can easily lead to overfitting, so it is necessary to use data augmentation methods to improve the model performance [43][44][45]. In this study, the input variables and labels of the training set were simultaneously rotated by 90°, 180° and 270° in the horizontal direction to generate three additional training sets (5080 samples in total).
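The rotation-based augmentation can be sketched as follows. This is an illustrative sketch only: the sample layout (a dictionary with hypothetical "inputs" and "labels" keys, each holding a list of 2-D grids) is an assumption, and plain nested lists stand in for the arrays a real pipeline would use. The key point is that the same rotation is applied to every input grid and every label grid of a sample.

```python
def rot90(grid):
    """Rotate a 2-D grid (list of rows) by 90 degrees counterclockwise."""
    return [list(row) for row in zip(*grid)][::-1]


def rotate_sample(sample, quarter_turns):
    """Apply the same rotation to all input and label grids of one sample."""
    def turn(grid):
        for _ in range(quarter_turns):
            grid = rot90(grid)
        return grid

    return {key: [turn(grid) for grid in grids] for key, grids in sample.items()}


def augment(samples):
    """Return each sample rotated by 0, 90, 180 and 270 degrees."""
    return [rotate_sample(s, k) for s in samples for k in range(4)]


sample = {"inputs": [[[1, 2], [3, 4]]], "labels": [[[1, 0], [0, 0]]]}
print(len(augment([sample] * 1270)))  # 1270 samples become 5080
```

Rotation by multiples of 90° preserves the grid shape only because the study area is square (256 × 256).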

Neural Network Structure
According to the characteristics of video prediction and the data structure used in this study, the neural network model needs to obtain the lightning trends from historical lightning occurrence frequencies and merge them in time and space with the numerical products from the WRF model. Therefore, this study designs the lightning nowcasting model Convolutional Long- and Short-Term Memory Lightning Forecast Net (CLSTM-LFN) based on ConvLSTM. It consists of two networks: CNN Net, which extracts features from the WRF model products, and ConvLSTM Net, which performs forward prediction of the spatiotemporal sequence.
In the input variables to the CLSTM-LFN, the temporal dimension of the historical lightning occurrence frequency is 3, and the north-south and east-west dimensions are consistent with the number of grid points in the study area, which is 256 × 256. Thus, the dimension of the historical lightning occurrence frequency is [3,256,256,1], namely [time, rows, columns, variables].
The WRF model products are hourly weather conditions, including 13 variables, such as maximum reflectivity ($R_{max}$) and CAPE. Since the number of variables of the lightning occurrence frequency and the WRF model products are different and cannot be merged to form a spatiotemporal sequence in the time dimension, a two-dimensional convolutional network called CNN Net is first used to convolve the WRF model products to perform feature extraction and compress the number of variables to 1. Since the CNN Net performs feature extraction for each hourly WRF model product separately, its input does not contain a temporal dimension; the north-south and east-west dimensions are also 256 × 256, so the dimensions of the WRF model products are [256,256,13], namely [rows, columns, variables]. The convolved results are merged with the lightning occurrence frequency in the time dimension, covering the period from 3 h before to 3 h after the starting forecast moment and forming a spatiotemporal sequence of dimensions [6,256,256,1]. Subsequently, ConvLSTM Net is used to extract features in time and space and then perform spatiotemporal sequence prediction to achieve hourly lightning occurrence area nowcasting for the next 0 to 3 h. The network structure and data flow are detailed in Figure 2.
The structures of CNN Net and ConvLSTM Net are shown in Figure 3. CNN Net consists of 4 convolutional layers: the numbers of convolutional kernels in the layers are (128, 64, 32, 1), the convolutional kernel size is 2 × 2, the stride is 1 and the activation function is a rectified linear unit (ReLU) [46]. Padding is applied in the convolution process to keep the grid resolution constant before and after the convolution. After the convolution and reshaping operations of CNN Net, the 13 variables of the WRF model products are compressed into 1 variable and a time dimension is added, so the tensor dimension changes from [256,256,13] to [1,256,256,1].
The ConvLSTM Net consists of the ConvLSTM2D and Conv3D modules with (128, 64, 32, 1) and (32, 16, 1) convolution kernels, respectively, and both use ReLU activation functions; ConvLSTM2D has a convolutional kernel size of 2 × 2 and a stride of 1. The last layer of the network is set to return all sequences (return_sequences = True) to obtain a spatiotemporal sequence of time length 6. Since the purpose of this study is to forecast lightning hourly for the next 3 h, the Conv3D module is added with a convolution kernel size of 2 × 2 × 1 and a stride of 1. The convolution operation is performed on pairs in the time dimension to compress the length of the time series from 6 to 3.
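The tensor dimensions quoted above can be checked with a small shape calculator. This sketch only does bookkeeping and is not the network itself. The helper names are hypothetical, and one point is an explicit assumption: the text does not state the padding mode or axis order of the Conv3D kernels, so the sketch assumes that each of the three Conv3D layers uses a kernel extent of 2 along time with no padding along that axis, which is one reading that reproduces the stated reduction from 6 to 3 time steps at stride 1.

```python
def conv_out(n, kernel, stride=1, padding="valid"):
    """Output length of a convolution along one axis."""
    if padding == "same":
        return -(-n // stride)  # ceil(n / stride); size is preserved at stride 1
    return (n - kernel) // stride + 1


def cnn_net_shape(shape, filters=(128, 64, 32, 1), kernel=2):
    """Shape after CNN Net: padded 2 x 2 convolutions, then an added time axis."""
    rows, cols, channels = shape
    for n_filters in filters:
        rows = conv_out(rows, kernel, padding="same")
        cols = conv_out(cols, kernel, padding="same")
        channels = n_filters
    return (1, rows, cols, channels)  # reshape adds the time dimension


def merge_in_time(lightning_shape, wrf_shapes):
    """Concatenate lightning history and convolved WRF products along time."""
    steps = lightning_shape[0] + sum(s[0] for s in wrf_shapes)
    return (steps,) + tuple(lightning_shape[1:])


def conv3d_time_length(steps, n_layers=3, kernel_t=2, stride_t=1):
    """Time length after stacked Conv3D layers (ASSUMED unpadded along time)."""
    for _ in range(n_layers):
        steps = conv_out(steps, kernel_t, stride_t, padding="valid")
    return steps


wrf = cnn_net_shape((256, 256, 13))                   # one hourly WRF product
sequence = merge_in_time((3, 256, 256, 1), [wrf] * 3)  # 3 h history + 3 h WRF
print(wrf, sequence, conv3d_time_length(sequence[0]))
```

Under this assumption the time axis shrinks by one step per Conv3D layer (6, 5, 4, 3), so three layers yield the three hourly output frames.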

2D and 3D Convolution Layers
Convolutional neural networks are one of the important models in the field of deep learning and can extract features from the original image. The early classical CNN model LeNet-5 was designed using the error back propagation algorithm, and after continuous improvement, progress has been made in face recognition and robot navigation [47][48][49][50][51].
The output feature map of a convolution layer is obtained by convolving the feature maps of the previous layer with a set of convolution kernels and applying the activation function. The two-dimensional and three-dimensional convolution layers act on the planar graph and the cube, respectively, and the formula is expressed as follows:
$$x_j^l = f\left(\sum_{i \in M_j} x_i^{l-1} * k_{ij}^l + b_j^l\right)$$
where $x_j^l$ is the output feature map, $f(\cdot)$ is the activation function and $*$ is the convolution operation. $k_{ij}^l$ is the convolution kernel, $b_j^l$ is the bias and $M_j$ is the set of input variables.

ConvLSTM
Shi et al. (2015) proposed ConvLSTM networks based on fully connected LSTM networks (FC-LSTM) [31]. FC-LSTM is good at processing data that are strongly correlated between preceding and following elements of a sequence and is often used for time series prediction. In addition to their temporal continuity, radar and precipitation data also have strong spatial characteristics, and processing them with FC-LSTM can lead to the loss of spatial information. The convolution operation is used in ConvLSTM to extract features instead of the fully connected method in FC-LSTM, which can capture sufficient spatial information, and the structure diagram is shown in Figure 4. Here, $i_t$, $f_t$ and $o_t$ represent the input gate, forget gate and output gate, respectively; $x_t$, $h_t$ and $c_t$ represent the input variable, hidden variable and storage unit, respectively; $t$ represents the step of the network; $\sigma$ is the sigmoid function with the output range of [0, 1]; $\tanh$ represents the hyperbolic tangent function with the output range of [−1, 1]; $W$ and $b$ are the weights to be trained and the bias, respectively.
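For reference, the symbols defined above correspond to the ConvLSTM formulation of Shi et al. (2015) [31], which can be written as below. This is the commonly cited form given as an aid to reading the definitions, not a statement of the exact equations used in this study: the weight subscripts are illustrative, and the peephole terms of the original formulation (element-wise products with $c_{t-1}$ and $c_t$ inside the gates) are omitted, as they are in many implementations.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{xi} * x_t + W_{hi} * h_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * x_t + W_{hf} * h_{t-1} + b_f\right) \\
o_t &= \sigma\left(W_{xo} * x_t + W_{ho} * h_{t-1} + b_o\right) \\
c_t &= f_t \circ c_{t-1} + i_t \circ \tanh\left(W_{xc} * x_t + W_{hc} * h_{t-1} + b_c\right) \\
h_t &= o_t \circ \tanh\left(c_t\right)
\end{aligned}
```

Here $*$ denotes convolution and $\circ$ denotes the element-wise (Hadamard) product.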


Network Training
In this study, the deep learning framework Keras, which integrates several mainstream deep learning algorithms and has the advantages of simplicity and high modularity, was used for network construction and training. The Adam optimizer was used in the training process, and the default settings were used for the parameters [52]. The loss function is the mean square error (MSE), expressed as follows:
$$\mathrm{MSE} = \frac{1}{m}\sum_{i=1}^{m}\left(y_i - f(x_i)\right)^2$$
where $m$ is the number of samples, and $y_i$ and $f(x_i)$ are the observed value (true value) and the neural network model prediction of the $i$th sample, respectively. The neural network model is trained and run using NVIDIA's general-purpose parallel computing architecture CUDA and a Tesla V100 graphics processor. Experimental results show that CLSTM-LFN can complete the read-in of input data and the prediction of hour-by-hour lightning occurrence areas for the next 3 h in less than 5 min, which meets operational requirements.

Controlled Experimental Design
To investigate the effect of a single input data source on the CLSTM-LFN model and to compare the model with traditional lightning forecasting methods, the control experiments consist of a lightning parameterization scheme, an empirical forecasting method and CLSTM-LFN variants that receive different input data sources. The experimental design is as follows:
PR92: The PR92 parameterization scheme predicts lightning according to the relationship between lightning occurrence frequency F and cloud top height H [15]. The cloud top height is determined based on the thresholds of radar echo (20 dBZ) and temperature (0 °C), and the forecast results of this parameterization scheme are included in the WRF model. The PR92 scheme contains both land and ocean scenarios, and the forecast equation on land is
$$F = 3.44 \times 10^{-5} \times H^{4.9}$$
dBZ_from_WRF: Based on the statistical relationship between lightning and radar echoes, a study has shown that lightning usually occurs if a reflectivity of 40 dBZ occurs at altitudes where temperatures are <0 °C [53]. Therefore, the forecast results of the numerical model products for radar echo and temperature were used to forecast the lightning occurrence area.
CLSTM-LFN-O: A variant of the CLSTM-LFN model with historical lightning occurrence frequency as single input data, trained using only ConvLSTM Net due to the lack of WRF model products.
CLSTM-LFN-W: The same network structure as CLSTM-LFN-O, with input data containing only WRF model products.
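The two non-learning baselines above reduce to simple array rules, which the following minimal sketch illustrates. It is not the WRF implementation: the function names, the (levels, ny, nx) array layout and the use of `>=` for the 40 dBZ threshold are assumptions, and the units (H in km, F in flashes per minute) are taken from the original PR92 formulation rather than from the text.

```python
import numpy as np


def pr92_land_flash_rate(cloud_top_km):
    """PR92 land relation F = 3.44e-5 * H**4.9.

    ASSUMPTION: H is the cloud top height in km and F is in flashes per minute.
    """
    h = np.asarray(cloud_top_km, dtype=float)
    return 3.44e-5 * np.power(h, 4.9)


def dbz_rule_mask(reflectivity_dbz, temperature_c, dbz_threshold=40.0):
    """Flag grid columns where reflectivity reaches the threshold at a sub-freezing level.

    Both inputs are hypothetical arrays shaped (levels, ny, nx).
    Returns a boolean (ny, nx) mask of predicted lightning occurrence.
    """
    refl = np.asarray(reflectivity_dbz, dtype=float)
    temp = np.asarray(temperature_c, dtype=float)
    # A column is flagged if any single level is both strong enough and below 0 °C.
    return np.any((refl >= dbz_threshold) & (temp < 0.0), axis=0)
```

Because of the 4.9 exponent, the PR92 flash rate is very sensitive to errors in the forecast cloud top height, so bias in the model cloud top height can dominate the result of such a scheme.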

Nowcasting Results and Scoring Test
The nowcasting results were evaluated with three types of scores: threat score (TS) [54], false alarm rate (FAR) and probability of detection (POD) [55]. As mentioned in the previous section, lightning usually occurs in mesoscale convective systems with spatial scales typically of tens to hundreds of kilometers, so the test scores are calculated with the neighborhood method in this study. If the forecast result for a grid point is greater than the threshold N and lightning is observed within 2 grid points from the forecast point, the hit is considered successful. For different experiments, a series of tests are conducted to determine the threshold N so that the forecast results satisfy a high hit rate and a low false alarm rate.
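The neighborhood scoring described above can be sketched as follows. This is an illustrative reading, not the authors' code: it assumes a square window of ±2 grid points, treats any positive observed frequency as lightning, and counts a miss when an observed point has no forecast point within the same window. The function names are hypothetical.

```python
import numpy as np


def dilate(mask, radius):
    """Mark every cell lying within `radius` grid points (square window) of a True cell."""
    h, w = mask.shape
    padded = np.pad(mask, radius, mode="constant", constant_values=False)
    out = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out


def neighborhood_scores(forecast, observed, threshold, radius=2):
    """Return TS, POD and FAR using a neighborhood tolerance of `radius` grid points."""
    fcst = np.asarray(forecast) > threshold   # forecast exceeds threshold N
    obs = np.asarray(observed) > 0            # ASSUMPTION: any lightning counts
    near_obs = dilate(obs, radius)
    near_fcst = dilate(fcst, radius)
    hits = int(np.sum(fcst & near_obs))           # forecast with lightning nearby
    false_alarms = int(np.sum(fcst & ~near_obs))  # forecast with no lightning nearby
    misses = int(np.sum(obs & ~near_fcst))        # lightning with no forecast nearby

    def ratio(num, den):
        return num / den if den > 0 else float("nan")

    return {
        "TS": ratio(hits, hits + misses + false_alarms),
        "POD": ratio(hits, hits + misses),
        "FAR": ratio(false_alarms, hits + false_alarms),
    }
```

Scanning `threshold` over a range of values and comparing the resulting POD and FAR is one way to carry out the threshold selection described in the text.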
The test scores of CLSTM-LFN and the control experiments are shown in Table 2. Regardless of which data source is used, the forecasts of the deep learning models are better than those of the empirical formula based on radar echo and of the PR92 parameterization scheme, which indicates that CLSTM-LFN is more suitable for identifying the complex nonlinear evolution of lightning. The TS scores of CLSTM-LFN and CLSTM-LFN-O decreased with increasing forecast time, in both cases by more than 50%. The TS scores of CLSTM-LFN-W were approximately 0.11 at all forecast times, with a small range of variation. At a forecast time of 0 to 1 h, both CLSTM-LFN and CLSTM-LFN-O produce better forecasts, because the lightning position and morphology vary little at the beginning of the forecast and the historical lightning observations provide more valid information. At forecast times of 1 to 3 h, CLSTM-LFN, which merges the WRF model products, outperforms the other models. Compared with CLSTM-LFN-O, the TS scores of the hourly forecasts from 0 to 3 h are improved by 9.75%, 5.23% and 10.09%, respectively. PR92 and dBZ_from_WRF failed to achieve effective lightning forecasts, with both having TS scores less than 0.1, because the numerical model has some bias in its forecasts of radar echoes and cloud top height and the statistical relationships between these physical variables and lightning occurrence do not apply well.

Case Study
At 15:00 UTC on 24 August 2020, lightning occurred mainly in south-central Guizhou, eastern Yunnan and northwestern Hubei, with a zonal distribution, and the convective system then gradually moved to the southeast (Figure 5). The nowcasting results of CLSTM-LFN and CLSTM-LFN-O show that the changes in the convective system are small at the initial forecast time, and both achieve effective forecasts of 0 to 1 h lightning (Figure 5a,d). In the 1 to 3 h period, the forecast range of CLSTM-LFN-O is smaller than the observed range, and more misses occur in the border area of Guizhou and Guangxi (red lines in Figure 5e,f). The CLSTM-LFN forecasts of lightning are closer to the observations owing to the indication of the Rmax products from the numerical model (Figure 5b,c). However, in the border area of Jiangsu and Anhui, both CLSTM-LFN and CLSTM-LFN-O produced false alarms (blue lines in Figure 5) because of the moderate CAPE values from the numerical model products (Figure 5j-l) and a small amount of lightning at the initial forecast moment.
Performance diagrams, which include the TS, POD, FAR and bias scores, were further used to quantitatively evaluate the forecast results (Figure 6) [56].
In the case of 7 August 2020 03:00, lightning mainly occurred in the central-eastern part of the Sichuan region (Figure 7). Because the lightning occurrence frequency input to CLSTM-LFN-O was small in this area, historical lightning occurrence frequency gave little indication, and CLSTM-LFN-O failed to achieve an effective forecast of the lightning occurrence area (Figure 7d-f). The numerical model products show strong CAPE at the lightning occurrence location (Figure 7g,i), and the CLSTM-LFN forecast is significantly improved by the interaction of multisource data. At 05:00 on 7 August 2020 in the southern part of Zhejiang, the numerical model products also forecasted strong CAPE, but CLSTM-LFN and CLSTM-LFN-O failed to forecast lightning in this region because of the absence of lightning at the initial moment (blue lines in Figure 7).
The performance diagram for this case is shown in Figure 8. The convective systems in the above two cases evolved slowly, the WRF model products provided effective information, and the CLSTM-LFN forecast results were more satisfactory. In the period from 06:00 to 08:00 on 19 August 2020, CLSTM-LFN successfully hit a wide range of lightning from northeastern Shandong to eastern Guizhou while achieving a more accurate forecast for the gradually expanding range of lightning in northern Anhui (blue line in Figure 9c). However, CLSTM-LFN failed to forecast strip lightning in southeastern Shaanxi and cluster lightning in southeastern Jiangxi (red lines in Figure 9b,c). The reason is that there was only scattered lightning in southeastern Shaanxi at 06:00, and the lightning range expanded rapidly from 07:00 to 08:00. The cluster lightning in southeastern Jiangxi was concentrated at the border between southern Jiangxi and Fujian at the initial moment, and the convective system then moved rapidly to the northwest, which weakened the indication of historical lightning observation frequency. In the WRF model products, the lightning miss areas also do not show strong radar echoes or high CAPE values (Figure 9d,e). Considered together, the weakening of the indication of historical lightning frequency and the bias of the numerical model products are important reasons for the CLSTM-LFN lightning misses.

Variable Importance Analysis
During neural network model training, different forecast variables contribute differently to the model, and finding the variables with greater influence is important for understanding the development process of lightning. Permutation importance is a common method for variable importance analysis [57]. The main idea is to first input all the variables into the neural network model for training and calculate the original forecast result test scores. Subsequently, each forecast variable is randomly reordered and then input to the trained model to obtain new forecast results and then calculate the forecast result scores. The importance of the input variables is judged according to the degree of change in the scores before and after the reordering. According to the input variables of CLSTM-LFN, three sets of tests were designed as follows.
Exp_obs: Randomly reordering historical lightning occurrence frequency as a whole to produce false observations in areas where no lightning occurs or to consider no lightning to occur in areas where it does occur.
Exp_WRF_whole: The 13 variables of the WRF model products are randomly reordered as a whole to compare their relative importance with that of historical lightning occurrence frequency (Exp_obs).
Exp_WRF_sequence: The 13 variables of the WRF model products were sequentially reordered to compare the influence of individual physical variables on the forecast results.
The relative importance $TS^{t,p}_{relative}$ was used to determine the importance of the variables based on the change in TS scores before and after the disruption (4), with higher relative importance indicating that the variable is more important [58]. Furthermore, based on the calculation of $TS^{t,p}_{relative}$, we define the variable $r^{t}_{relative}$ to compare the relative importance of historical lightning occurrence frequency and WRF model products at different forecast times (5). Here, the reference score is the original forecast score without random reordering (TS scores for CLSTM-LFN in Section 4.1).
The results of Exp_obs and Exp_WRF_whole showed that the relative importance of both variables decreased with increasing forecast time in the 0 to 3 h period (color bars in Figure 10). The values of $r^{t}_{relative}$ increase with increasing forecast time, indicating that the importance of numerical model products increases gradually compared with historical lightning occurrence frequency (blue line in Figure 10). The reason is that, as the forecast time increases, the lightning location and intensity keep changing, which leads to a decrease in the valid information provided by historical observations and a greater reliance on the meteorological conditions represented by the numerical model.
The results of Exp_WRF_sequence showed that the importance of each variable decreases with increasing forecast time (Figure 11). The relative importance of CAPE remains above 0.5 and is significantly greater than that of other physical variables, while the relative importance of microphysical variables, such as QGRAUP and Rmax, decreases more rapidly with increasing forecast time. Generally, a larger CAPE tends to produce stronger convective activities [59], and several studies have shown that there is a strong correlation between CAPE values and thunderstorms [60,61], which provides an indication for the occurrence of lightning in atmospheric circulation. With increasing forecast time, the deviation of microphysical variables forecasted by numerical model products gradually increases, causing their relative importance to decrease rapidly. However, the type, content and spatial distribution of ice-phase particles have a certain correlation with the location of lightning [62–64], which still provides a reference for the occurrence of lightning at the microscopic level, so these variables are indispensable in the nowcasting model.
Combined with the case in Section 4.2 (Figure 9c,e), lightning usually occurs in areas of high CAPE values. However, this does not mean that lightning will necessarily occur in high-CAPE areas; the correspondence between them is not strong enough, and effective nowcasting of lightning still needs the joint indication of multisource data.
Figure 11. Relative importance of each variable of the WRF model products (Exp_WRF_sequence) at different forecast times.
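The permutation tests in this section can be illustrated with a minimal sketch. Several choices here are assumptions rather than the authors' procedure: the reordering is done across samples, the relative importance is taken as the fractional drop in TS, (original − permuted) / original, and the toy model, array shapes and function names are hypothetical.

```python
import numpy as np


def threat_score(obs, fcst):
    """Grid-point threat score for boolean arrays: hits / (hits + misses + false alarms)."""
    hits = np.sum(obs & fcst)
    misses = np.sum(obs & ~fcst)
    false_alarms = np.sum(~obs & fcst)
    return hits / (hits + misses + false_alarms)


def relative_importance(predict, x, y, channels, rng):
    """Fractional TS drop after shuffling the given input channels across samples.

    ASSUMPTION: importance = (TS_original - TS_permuted) / TS_original.
    `x` has shape (samples, ny, nx, channels); `predict` is an already trained model.
    """
    base = threat_score(y, predict(x))
    order = rng.permutation(x.shape[0])
    x_perm = x.copy()
    # Replace only the selected channels with values taken from other samples.
    x_perm[..., channels] = x[order][..., channels]
    permuted = threat_score(y, predict(x_perm))
    return (base - permuted) / base


# Toy demonstration: the "model" looks only at channel 0 and ignores channel 1.
rng = np.random.default_rng(0)
x = rng.random((50, 8, 8, 2))
toy_model = lambda inputs: inputs[..., 0] > 0.5
y = toy_model(x)
imp_used = relative_importance(toy_model, x, y, [0], rng)
imp_unused = relative_importance(toy_model, x, y, [1], rng)
```

Passing a list with all WRF channel indices corresponds to a whole-group test such as Exp_WRF_whole, while passing one index at a time corresponds to a per-variable test such as Exp_WRF_sequence. In the toy case, shuffling the ignored channel leaves the score unchanged, so its importance is zero.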

Conclusions
The lightning nowcasting model CLSTM-LFN is obtained by merging the lightning observation frequency and WRF model products to construct a spatial-temporal sequence and training with feature extraction and video prediction methods. After testing the batch forecasts for August 2020, the results show that CLSTM-LFN can achieve effective forecasting for 0 to 3 h lightning occurrence areas after merging multisource data, which is a significant improvement compared with the single input data source and traditional lightning parameterization scheme. However, both historical lightning occurrence frequency and numerical model products are less indicative during the incipient and extinction phases of lightning, and the CLSTM-LFN forecast results for such processes still need to be improved.
The relative importance analysis of the CLSTM-LFN input variables showed that the indication of numerical model products gradually increases with increasing forecast time compared with historical lightning occurrence frequency, but both provide effective indication within the 0 to 3 h forecast time. The CAPE forecasts of numerical models have smaller errors than the forecasts of microphysical variables, so the relative importance of CAPE is significantly greater than that of other input variables.
With the increase in forecast time, the forecast of lightning depends more on the numerical model products. How to extract useful information from the large number of numerical model products and extend the forecast time will be the focus of future work.