Algorithm for Improved QPE over Complex Terrain Using Cloud-to-Ground Lightning Occurrences

Lightning and deep convective precipitation have long been studied as closely linked variables, the former being viewed as a proxy, or estimator, of the latter. However, to date, no single methodology or algorithm exists for estimating lightning-derived precipitation in a gridded form. This paper, the third in a series, details the specific algorithm where convective rainfall was estimated with cloud-to-ground lightning occurrences from the U.S. National Lightning Detection Network (NLDN), for the North American Monsoon region. Specifically, the authors present the methodology employed in their previous studies to get this estimation, noise test, spatial and temporal neighbors and the algorithm of the Kalman filter for dynamically derived precipitation from lightning.


Introduction
Precipitation is one of the most important elements of the atmospheric hydrological cycle as well as the climate system [1,2].Precipitation must be accurately measured over a large range of spatio-temporal scales, for example, from convective storms to the global climate system as well as provide accurate measures for applications to water resources, agriculture and civil protection [1,[3][4][5][6].On the larger end of the scale (continental to planetary), Quantitative Precipitation Estimation (QPE) is necessary for accurate assessement of the atmospheric energy budget and hydrological cycle [7].On higher space and time scales, it is also a key component in short-term forecasts (nowcasting) of flooding and other extreme weather events (e.g., [4,5]).
However, due to its high spatio-temporal variability, precipitation is one of the most difficult variables to monitor [8].Rain gauges, under ideal conditions, are considered the most accurate point measurement of precipitation; however, for estimating QPE, geo-statistical methods are necessary and the accuracy of the estimation depends on gauge network density, which in most cases is not adequate [9][10][11][12].Weather radars, on the other hand, are characterized as having high spatio-temporal resolution and coverage and have been considered the best operational option for estimation of rainfall [5,13,14].Radars, nevertheless, also have limitations, such as poor coverage in mountainous regions, no unique Z-R relationship, and evaporation of rain from higher altitudes, among others [5,[14][15][16].
In light of the fact that each of these techniques has limitations for QPE estimation, some researchers have offered merged products, for example, rain gauges and weather radar data [17][18][19][20] to generate gridded QPE.To these data, others have added climatologies of orographic precipitation [21] to produce a multi-sensor QPE (MQPE) and improve QPE accuracy.However, even with this combination of all these data sources and error reduction methodologies, precipitation estimation still suffers from problems with accuracy in regions with complex terrain [21][22][23].
In the last three decades, other techniques, such as satellite-derived precipitation, have been proposed and incorporated for QPE.Satellite-derived precipitation is not limited by coverage problems in mountainous regions as radars are, but it has its deficiencies [23].Mehran and AghaKouchak [24], in comparing satellite QPE with products like PERSIANN [25], CMORPH [26], and TRMM Multi-Satellite Precipitation Analysis Real Time (TMPA-RT) [27] revealed errors for strong precipitation events associated with topography; that is, precisely where multi-sensor methods also produce inaccuracies.Another limitation is that, at least for quantitative precipitation forecasting (QPF), its temporal resolution and latency (which is about 15 min) make these products useful for short-term QPF [28], but not for better precipitation nowcasts.Even with existing modern techniques and methods for QPE, the problems with precipitation estimation persist, especially in regions with poor sensor coverage such as mountainous terrain [22][23][24] where rainfall can be more frequent and copious.

Lightning-Precipitation Relationship (LPR)
Since the mid-twentieth century, many authors have proposed lightning as an estimator of precipitation.Workman and Reynolds [29] reported on the relationship between cloud electrical activity and precipitation.Battan [30] later calculated a rain-yield per flash by visually counting cloud-to-ground (CG) lightning events in southern Arizona.A number of studies such as [31][32][33][34][35]; among many others, have shown that lightning can be used as an estimator of convective precipitation (for a more extended review of these studies please see [36]).Research findings show that the lightning-precipitation relationship varies over a wide range of scales; temporal (minutes to seasonal) and spatial (storm scale, e.g., [34]) to complete regions (about 104 km 2 ; e.g., [33]).

Summary of Previous Work
This study is the third in a series of three papers examining LPR to estimate QPE.In the first paper, Minjarez and co-authors [22] investigated the relationship between convective precipitation and cloud-to-ground lightning in southwest Arizona and northern Sonora, Mexico employing STAGE IV NCEP data for the summer of 2005.The authors developed a LPR-based QPE method for seasonal, daily and hourly precipitation estimation in regions with poor sensor coverage, as well as a method to spatially smooth discrete lightning counts to estimate convective rainfall thereby improving quality and spatial coverage of radar-derived precipitation estimation.The effect of using poor sensor coverage negatively affects LPR, when non-covered regions were included, which is revealed in the decrease in explained variance R 2 between CG lightning and precipitation from 0.39 to 0.23.Finally, by using 1-h accumulated lightning data, the study estimated convective precipitation, finding that about 67% of total precipitation for the 2005 season can be related with CG lightning.Their method was successful in tracking air mass storms and more complex multicellular convective systems in regions with poor sensor coverage.
In [23], the authors improved their methodology for QPE estimation by using lightning at a space and time resolution of 5 km and 5 min.In order to address the time and space displacement that occurs when LPR is correlated at high resolutions, a linear space and time-invariant model was developed, leading to improved results in comparison to the two parameter model in the 2012 study.An optimal time and space neighboring for the gridded data that considered four time lags and one future time step and two coincident spatial neighbors was obtained.Since this linear model is fixed, the time variation in LPR that may occur within a storm was addressed by applying a time variant Kalman filter, which significantly improved the fixed time model.
Performing a data denial analysis in a radar-covered region, this study also provided more and stronger evidence that in a region with proper precipitation sensor coverage, like western Texas, the model can be calibrated by using the covered grids and precipitation can then be estimated in grids with poor sensor coverage.
With the aim of contributing to the development and promoting the use of lightning for estimating precipitation, this article presents, in detail, the algorithm developed by Minjarez and co-authors [23] to estimate QPE by cloud-to-ground lightning, as well as the methodology developed to test the model and select the parameters that were employed in the 2017 study.

Materials and Methods
Here, we present a linear multispatial-temporal model for estimating precipitation from cloud-to-ground lightning counts.In general, linearly estimated precipitation at time t and the (i, j) grid can be defined as: where ] is a feature vector of dimension k at time t from lightning observations L, and θ = [θ 1 , . . ., θ k , b] is a vector of estimation parameters.
For example, in [22], a model is developed by comparing the mean of convective precipitation in the sensor-covered domain at every time step with the corresponding lightning flash sum in the same domain.For this case f i,j (L, t) = L i,j (t) is established with only one feature as where L i,j (t) are the Gaussian lightning counts at time t and (i, j) grid.Thus, the model is The model parameters θ and b can be estimated in two ways.First, in order to calculate the model parameters, a vector of the mean of precipitation and lightning are calculated as: where N(t) is the number of grids with Gaussian counts different from zero at time t.The parameters are obtained by resolving the equation using least squares The model parameters are calculated by comparing the total seasonal convective precipitation (as defined by [23]) accumulation with the total lightning accumulated per grid.
Next, we describe the linear model with spatial and time neighbors, denominated [23] the Spatial and Time Invariant (STI) Model, or seasonal model.Let Nb V represent the spatial neighboring of the grid i, j at time t with V spatial neighbors in the vicinity: Let Λ i,j (t) be the temporally associated vector of lightning observations where I N are the previous time steps (negative time lags) and I P are the latter time steps (positive time lags).Λ ij (t) is a row vector that contains the Gaussian lightning counts of the grid i, j and its neighbors for each time lag considered in the STI model.The STI model can now be described as: and θ = (θ 1 , . . ., θ k , b) is the model parameter vector, where k = (2V + 1) 2 • (I P + I N + 1).The vector θ is adjusted by a least squares criteria by using the grids where is determined that convection is occurring in the relevant dataset.
A static STI model, fixed over all times, is most certainly not a very robust assumption.The model can substantially vary from one convective event to another and even during the evolution of the storm and this is not accounted for by the STI model.A more realistic model would dynamically respond to slowly varying LPR changes in time.In this respect, one of the methods that has been widely used for data assimilation in the atmospheric sciences, particularly for precipitation estimation, is the Kalman filter [37,38].In its original form, the Kalman filter is a linear recursive algorithm that minimizes the mean-squared error to estimate the state of a process.Use of the Kalman filter is thus a straightforward and efficient solution for constructing a time-varying LPR.
The basic idea is to consider the vector parameter θ as a discrete dynamical system of the form where W(t) is a parameter estimation error, modeled as a Gaussian random vector with zero mean and a covariance matrix Q.The estimation algorithm is then used to minimize the error between the observed precipitation at time t and the estimated precipitation obtained with the model and the parameter vector θ.The typical use for a Kalman filter is using a known system and observation operator to estimate the system's state given noisy observations.In our approach the Kalman filter is used in a different way, namely to update the system itself, similar to the one used by [39].
In order to employ Kalman filter, the estimated precipitation for the whole dataset at time t is calculated as: where N is the number of rows and M is the number of columns of grids that define the Cartesian regions to be analyzed and is the estimated precipitation vector.However, θ estimation should be realized only with covered grids (grids with valid precipitation observations).Then, let Y(t) = (Y 1 (t), . . ., Y C (t)) + V(t) be the observed precipitation vector at time t for just the covered grids and V(t) is the observation noise, assumed Gaussian with zero mean and covariance R. The estimated precipitation of covered grids is assumed, similar to the STI model, to be: where H is a matrix of dimension (C, N × M).Entries H a,b = 1 if Y a (t) correspond to the jth element of the vector P(t), else H ij = 0. H is then a matrix used to reduce a vector of dimension N × M that represents the observed precipitation as a vector with only the covered grids.Figure 1 shows the scheme of the procedure.The Kalman filter algorithm steps performed for each new time-step t are: 1.
Update the error matrix E(t) and the covariance matrix Ā(t) 2.
Update the Kalman gain matrix K(t) 3.
Update the model parameter vector estimation θ(t) 4.
Prepare covariance A(t) for the next time-step 5.
Produce updated precipitation estimate P(t) where A(t) is the a posteriori error covariance matrix.In order to apply the Kalman filter there is a need to impose an initial condition A(0) and the covariance matrix Q and R. The value of A(0) determines the initial estimation rate of the Kalman filter and is normally used as a diagonal matrix with large values.In our implementation of the algorithm, we initiate the Kalman filter with the STI model parameters derived from long-term observations.Once the Kalman estimation commences, it dynamically adjusts the coefficients at each time step (5 min) until 5 consecutive time steps report no lightning in the covered domain.At this point, the coefficients slowly (5 time steps) revert back to those of the STI which can be viewed as the best guess in the absence of new information.The rate at which we revert back is based loosely on the time evolution of a typical thunderstorm (air mass type) which lasts about 45-60 min, including developing, mature, and decay phases.Thus, we assert that our time lags assumption (25 min) is sufficiently long to assure that the next lightning strike will be associated with a new cell/storm.Although somewhat arbitrary, this period is certainly consistent with the physics of the problem.

Time and Space Neighboring
A novel way to treat the time and space misplacement in LPR is to use neighbors for higher resolutions.Figure 2 compares the correlation (vertical axes) for several time lags (horizontal axes) and space neighboring situations.It shows the variation in correlation coefficients between STI estimated and observed precipitation, considering all 10 lags and 4 space neighbor scenarios.
The results from Figure 2 show that, at high space and time resolutions, if precipitation and lightning at grid i, j are correlated without space and time neighbors, the relationship results are poor (between 0.1 and 0.25 for this case).This model can work for larger spatio-temporal resolutions, as shown by [22], for the seasonal case and 8 km grid size.By observing and evaluating the correlation plots, the benefit that results from the time and space neighbors is evident, although this varies from season to season and from region to region.For instance, for the southern Arizona area, comparing the correlation with and without use of spatial neighboring, the maximum benefit from the spatial neighbors is (over an average of 2 years) ∼16%, practically the same as the Texas result, a totally sensor covered region.The use of time lags has a maximum average impact of ∼20% for both regions.With just the addition of one past lag (000110000 lag scenario) the correlation increases up to 13% in comparison with 0-lag scenario.
If we compare this with the increment in correlation for the time lag case 000111000 (one future and one past time lags), we find that the past time lags are more relevant than the future ones.The impact on the correlation of using all time lags with respect to using past and one future lag is not more than 3% for both sites and for both seasons.Given the negligible computational benefit, this last lag criteria was chosen for our study.
This last result can be used as the threshold to differentiate the benefit from the improvement of the physics of the LPR and the mathematical benefit of only having more model parameters.The time neighboring more or less matches with the times reported by other authors where precipitation and CG lightning are correlated [34,35,40].In the combination of time lags and space neighbors there were no significant differences (regarding the 3% referred in the last paragraph) for more than 2 space neighbors.For grid spacing of about 5 km, representing an area of 25 square km, this suggests that at least one propagating air mass thunderstorm with its convective core and some of the transient stratiform precipitation can be captured.

Kalman Filter Testing Model Robustness
Overall, one of the factors that must be considered for a Kalman filter is the sensitivity to noise and model errors.In general, a robust model that does not track the high frequency variations in the Kalman error (E(t) above) is required.
For this study, a test of model robustness was performed.This test consisted of running the Kalman filter using real lightning observations as input, but scaled up by 2, starting at a specific point in time (shown in the top panel of Figure 3 below).It is expected that with the appropriate parameters in the correlation matrixes R and Q, this division will cause the Kalman filter to smoothly reduce the sum of the filter coefficients by a factor of 2 (after some time steps).Early attempts at this analysis found that the absence of persistent, excitation of the Kalman filter (few or no lightning input) can affect this testing, so Gaussian noise was added to the input in order to have all temporal values different to zero. Figure 3 shows a panel with the steps of this test for the event at the Texas site for the 150th day of 2009.The top plot shows Gaussian counts time series, the middle panel shows the ratio for the total parameter sum and, at the bottom panel, the time series of P per each lightning entry is presented.
Figure 3 shows disagreement between the P from the scaled input with that of the true input once the lightning strikes are degraded at around minute 150.It is clear that the parameters slowly adjust themselves to compensate for this "step" change in input, and when the ratio converges to 2 the estimated precipitation converge for both conditions (estimated from degraded input and another).This demonstrates that the selected Kalman parameters are appropriate for our use.

Discussion
In this article, the details of the methodology for estimating precipitation through use of cloud-to-ground lightning occurrences was presented.A linear model was developed for this purpose, spatial and time neighbors were used as a static filter which initiates the Kalman filter.A study of what was the optimal space and time neighboring was performed obtaining that the relation cost-benefit in terms of correlation and computational cost was 2 space and 5 time neighbors.As stated in [23]; this neighbor configuration provides better results, in general, for the season and individual convective storms in Texas and Arizona.
As presented, we intend for this methodology to be used as a new precipitation estimator which can also be integrated to the other techniques for calculating QPE, especially, but not exclusively for, those areas of strong convective precipitation and poor sensor coverage (e.g., southern Arizona).Finally, it should be clearly stated that this methodology just estimates convective precipitation associated or explained by lightning (as [22,23] defined) and thus does not estimate stratiform precipitation.This is one weaknesses of this method.Given that some authors have pointed out that current multi-sensor products still miss some strong convective events [24,41], our lightning-derived precipitation methodology can be utilized as another source of estimation for precisely these convective events and, hence, integrated as another source of Muli-sensor derived precipitation.

Figure 3 .
Figure 3. KMAF 2009 event on the 150th day of the year.Lightning time series (top panel), parameters ratio (middle panel) and estimated precipitation (bottom panel).