The Retrieval Relationship between Lightning and Maximum Proxy Reflectivity Based on Random Forest

Abstract: Using the SWAN (Severe Weather Automatic Nowcasting) maximum reflectivity mosaic product and the lightning positioning observations (LPOs) from the ADTD (Advanced Direction and Time of Arrival Detection) system obtained during the 2018–2020 warm season (May to September), and adding multi-characteristic LPO parameters in addition to lightning density, the retrieval relationship between lightning and maximum proxy reflectivity, denoted FRST, is constructed using a random forest. The FRST is compared with two empirical relationships from the GSI (Gridpoint Statistical Interpolation) assimilation system, and the results show that the FRST retrieved result better reflects the frequency distribution structure and peak interval of maximum reflectivity. The correlation coefficient between the FRST retrieved result and the observed maximum reflectivity is 0.7037, which is 3.38 (3.12) times higher than those of the two empirical GSI relationships. The root mean square error and the mean absolute error are 50.85% (28.05%) and 57.15% (35.19%) lower than those of the empirical GSI relationships, respectively. The equitable threat score (ETS) and bias score (BIAS) for FRST are better than those of the empirical GSI relationships in all three maximum reflectivity intervals.


Introduction
Lightning positioning observations (LPOs) can provide effective convective activity information, and lightning data assimilation studies can improve the forecasting of disaster weather [1,2]. Lightning is a good indicator of thunderstorms [2,3]. With the advantages of high spatial resolution, wide coverage, low influence of topography, and continuous monitoring, LPOs can be used to monitor the development of thunderstorms [4] and to carry out lightning data assimilation in numerical models, thus improving the ability to forecast disaster weather [5,6]. Therefore, lightning data assimilation has research value in disaster weather forecasting [7–11].
Since an LPO is not a conventional model variable and cannot be applied to model initialization directly, it is necessary to convert LPOs to a model variable or a related diagnostic variable using empirical or semiempirical relationships. Some previous studies analyzed the relationship between lightning and related variables [12,13] or compared the related variables with and without lightning [14] but did not give a retrieval relationship between them. Other studies established a retrieval relationship between lightning and related variables in order to assimilate the lightning data [15–17]. Lightning density is the number of LPO records within a certain area in a certain period. In recent studies, lightning density was correlated with three-dimensional (3D) proxy reflectivity [6], precipitation rate [18], specific humidity [19], relative humidity [20], vertical velocity [15], ice-phase particle content [21], water vapor and graupel mixing ratio [22], ice-phase particle concentration, and water vapor content [23]. Through these relationships, LPOs were assimilated into numerical models to improve disaster weather forecasting.
Notably, the conversion of lightning density to 3D proxy reflectivity was performed with the GSI assimilation system of the Rapid Update Cycle operational forecasting system in the USA [16,17]. The specific steps were as follows: (1) the lightning was converted to the maximum proxy reflectivity based on the empirical relationship between the lightning density and maximum proxy reflectivity; (2) the maximum proxy reflectivity was multiplied by the vertical profile coefficient to obtain the 3D proxy reflectivity. Many studies have used empirical GSI relationships to convert lightning density to 3D proxy reflectivity [5,24–27], and some scholars have established retrieval relationships between lightning density and maximum proxy reflectivity or 3D proxy reflectivity. Sun et al. [28] used LPO and radar mosaic products from 2014 to 2018 in Central China to establish a retrieval relationship between lightning density and maximum proxy reflectivity on a 13 km grid and a 3 km grid; they noted that the maximum proxy reflectivity for the 3 km grid was more accurate than that for the 13 km grid, and the maximum proxy reflectivity for the 13 km grid was closer to the observations than that retrieved from the empirical GSI relationship. Chen et al. [6] established a logarithmic relationship between lightning density and maximum proxy reflectivity through six disaster cases in Beijing in the summer of 2017. The maximum proxy reflectivity was retrieved by the relationship and converted to 3D proxy reflectivity through a real-time profile retrieval scheme. The results showed that the 3D proxy reflectivity reflected the actual observations well. Previous studies have used lightning density to establish a relationship between lightning and maximum proxy reflectivity, which was obtained by fitting [6,28]. In particular, in the GSI system, there are two empirical relationships between lightning density and the maximum proxy reflectivity [16,17].
In addition to the lightning density, the time, location, intensity, and polarity of lightning can be used to identify complex thermodynamic processes in thunderstorm clouds [29–32], which are related to the maximum reflectivity. Yang et al. [33] noted that the location of lightning did not necessarily correspond to the location of strong radar echoes and indicated that lightning density and radar echo intensity often appeared to be related in time series. Yan et al. [34] showed that there were two peaks in positive cloud-to-ground lightning density, with the primary peak occurring during the development phase of the convective system and the secondary peak occurring during the dissipation phase of the convective system. A study by Zajac and Rutledge [35] revealed that negative cloud-to-ground lightning mainly occurred in convective clouds, while positive cloud-to-ground lightning mainly occurred in stratiform clouds.
The retrieval relationship between lightning and maximum proxy reflectivity affects the accuracy of the retrieved maximum proxy reflectivity and hence the effect of lightning data assimilation. Previous studies have mainly used lightning density to construct retrieval relationships and have not considered other lightning features linked to maximum reflectivity. Therefore, we attempt to fully use information on lightning density, time, location, intensity, and polarity to study the retrieval relationships in this paper. Because considering more lightning features implies more complicated relationships, we learn the retrieval relationship with a random forest [36–39], a machine learning method, instead of a traditional fitting method. The SWAN maximum reflectivity mosaic product and ADTD LPOs obtained during the 2018-2020 warm season (May to September) in Hebei Province (Figure 1) are used in this paper. In addition to the lightning density, multi-characteristic LPO parameters (such as the temporal coefficient, spatial coefficient, and current intensity) are considered. These LPO parameters and the random forest are used to construct the retrieval relationship between lightning and maximum proxy reflectivity for a high-resolution model grid. The new retrieval relationship is compared with the empirical GSI relationships. The article is organized as follows: Section 1 gives the introduction. In Section 2, the materials and methods are introduced, and a new retrieval relationship between lightning and maximum proxy reflectivity is constructed. In Section 3, the effect of the new retrieval relationship is assessed in the context of the empirical GSI relationships. Section 4 discusses the results and highlights several future research directions. The conclusions are presented in Section 5. For ease of understanding, Table 1 lists all the abbreviations used in this paper and their full definitions.

Table 1. Abbreviations and their full definitions (only the following entries survive in this text).
GSI1: Linear relationship between lightning density and maximum proxy reflectivity in the GSI system
GSI2: Nonlinear relationship between lightning density and maximum proxy reflectivity in the GSI system

Materials
The ADTD lightning positioning system was developed by the Institute of Space Science and Applications of the Chinese Academy of Sciences and mainly detects cloud-to-ground lightning [40,41]. It can detect multiple return strokes of flashes [42,43], and the detection efficiency is above 80% [27,42–47]. There are 11 ADTD lightning positioning stations in Hebei Province, and the average minimum distance between two stations is 113 km. In this paper, elements such as time, latitude, longitude, current strength, maximum steepness of the return stroke, and the positioning methods of the ADTD LPOs are used. Following the quality control method of Wang et al. [48], LPO records positioned using two or fewer stations are excluded, and only LPO records with absolute values of the current intensity within (5, 500) kA and absolute values of the maximum steepness of the return stroke within (0, 500) kA/µs are retained. SWAN was initially developed by the State Key Laboratory of Disaster Weather of the Chinese Academy of Meteorological Sciences and was further developed and applied by the Numerical Forecasting Center of the China Meteorological Administration. SWAN has been applied in real-time quality control and networking for CINRAD-SA, CINRAD-SB, and CINRAD-CB radars [49]. In this article, the maximum reflectivity mosaic product of SWAN, which has a horizontal resolution of 0.01° and a temporal resolution of 6 min, is applied. The distributions of ADTD lightning positioning stations and weather radar stations in the SWAN network are shown in Figure 1b. Considering radar data gaps and lagged values and the poor quality of radar data in some areas due to terrain blockage, a data set of the complete maximum reflectivity mosaic during lightning activity is obtained by focusing on the southern plain area of Hebei Province (the area south of latitude 39.5° N and west of longitude 117.85° E within Hebei Province, with a terrain height of less than 70 m, as shown in Figure 1b).

Two Empirical Relationships between the Lightning Density and Maximum Proxy Reflectivity in the GSI System
There are two empirical relationships between lightning density and maximum proxy reflectivity in the GSI system: linear (Weygandt et al. [16], known as GSI1) and nonlinear (Weygandt et al. [17], known as GSI2). The linear relationship is given by Equation (1), and the nonlinear relationship is shown in Table 2. LTG is the number of LPO records in a given grid cell (approximately 13 km × 13 km) summed over a 40-min period around the analysis hour (from 30 min before to 10 min after), and REFL is the maximum proxy reflectivity [16,17,28]. Considering the horizontal resolution of the SWAN product and the definition of lightning density LTG in the empirical GSI relationships, the lightning density is set as the number of LPO records within a radius of 0.08° centered on the grid point of maximum reflectivity during the time window from 30 min before to 10 min after the whole hour. An analysis of the lightning density in the 2018-2020 warm season indicates that the frequency of lightning density displays an exponentially decreasing trend, with a lightning density of 1 accounting for 53.83% and a lightning density of 9 accounting for less than 1% (Figure 2a). When the lightning density is 10, the cumulative frequency reaches 95% (Figure 2b). Thus, the 10 LPO records around a grid point effectively reflect the lightning activity near that grid point and are used in the process of constructing the relationship between lightning and maximum proxy reflectivity.
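The lightning density definition above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function names, the argument layout, and the use of a planar distance in degrees are assumptions. Equation (1) is not reproduced in this text, so the linear form REFL = min(40, 15 + 2.5 × LTG) used for GSI1 below is also an assumption; it is chosen because it matches the GSI1 intervals reported later for lightning densities of 1~2, 3~6, and 7~10.

```python
import numpy as np

def lightning_density(lpo_time_min, lpo_lon, lpo_lat, grid_lon, grid_lat,
                      radius_deg=0.08, before_min=30.0, after_min=10.0):
    """Count LPO records near a grid point around the whole hour.

    lpo_time_min holds stroke times in minutes relative to the whole hour
    (negative = before). The planar distance in degrees is an assumption
    of this sketch.
    """
    t = np.asarray(lpo_time_min, dtype=float)
    dist = np.hypot(np.asarray(lpo_lon, dtype=float) - grid_lon,
                    np.asarray(lpo_lat, dtype=float) - grid_lat)
    in_window = (t >= -before_min) & (t <= after_min)
    return int(np.count_nonzero(in_window & (dist <= radius_deg)))

def gsi1_reflectivity(ltg):
    """ASSUMED linear GSI1 form: REFL = min(40, 15 + 2.5 * LTG), in dBZ."""
    return np.minimum(40.0, 15.0 + 2.5 * np.asarray(ltg, dtype=float))
```

Under this assumed form, a lightning density of 1 maps to 17.5 dBZ and any density of 10 or more maps to the 40 dBZ cap, which is consistent with the statement that GSI1 cannot retrieve values above 40 dBZ.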
The process of constructing the relationship between lightning and maximum proxy reflectivity is shown in Figure 3. The total data set is constructed using the maximum reflectivity, lightning density, and multi-characteristic parameters of the 10 LPO records. The lightning density and multi-characteristic parameters are used as independent variables, and the maximum reflectivity is used as the target variable. The multi-characteristic parameters include the temporal coefficient, spatial coefficient, and current intensity. The 10 LPO records are ranked in ascending order of spatiotemporal coefficient. A portion of the total data set is randomly selected as the test set, and the rest is used as the training set. The training set and the random forest are used to train the retrieval relationship between lightning and maximum proxy reflectivity, and the test set is used to compare the effect of the new retrieval relationship with that of the empirical GSI relationships. The details are described below.
Considering that the time and location of lightning are related to the maximum reflectivity, a temporal coefficient (t_c) and a spatial coefficient (r_c) are defined on the basis of the definition of lightning density LTG in the empirical GSI relationships. The temporal coefficient is the difference between the time of lightning and the time of maximum reflectivity (on the whole hour) divided by the standardized duration of 30 min. The spatial coefficient is the distance between the location of lightning and the maximum reflectivity grid point divided by the standardized distance of 0.08°. The current intensity is also used to reflect the intensity and polarity of lightning. A spatiotemporal coefficient (tr_c) is defined to consider the relationship between the time and location of lightning and the maximum reflectivity. The temporal coefficient, spatial coefficient, and spatiotemporal coefficient are calculated with Equations (2)-(4), where t_lgt is the time of lightning, t_radar is the time of maximum reflectivity, lon_lgt is the longitude of the lightning, lon_radar is the longitude of the maximum reflectivity grid point, lat_lgt is the latitude of the lightning, and lat_radar is the latitude of the maximum reflectivity grid point. The 10 LPO records around a grid point are filtered and arranged in ascending order according to the spatiotemporal coefficients, preserving the multi-characteristic LPO parameters (including temporal coefficient, spatial coefficient, and current intensity) in the process. In addition to the lightning density, the multi-characteristic parameters of 10 LPO records are added to construct a data set that includes 31 independent variables, with maximum reflectivity as the target variable. The total data set obtained for the 2018-2020 warm season contains 780,273 records.
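To make the 31-variable layout concrete (1 lightning density plus 3 parameters for each of 10 LPO records), a hedged Python sketch follows. Equations (2)-(4) are not reproduced in this text, so several choices here are assumptions: t_c keeps its sign, r_c uses a planar distance in degrees, and the spatiotemporal coefficient is taken as sqrt(t_c^2 + r_c^2) purely as a hypothetical stand-in for Equation (4). The function name `build_features` and the zero padding used when fewer than 10 records exist are likewise illustrative and not from the paper.

```python
import numpy as np

def build_features(t_lgt_min, lon_lgt, lat_lgt, current_ka,
                   lon_radar, lat_radar, n_keep=10, pad_value=0.0):
    """Return a 31-element feature vector for one grid point.

    Times are minutes relative to the whole hour (so t_radar = 0).
    """
    # Temporal coefficient: time difference over the 30-min standard duration.
    t_c = np.asarray(t_lgt_min, dtype=float) / 30.0
    # Spatial coefficient: distance over the 0.08-degree standard distance.
    r_c = np.hypot(np.asarray(lon_lgt, dtype=float) - lon_radar,
                   np.asarray(lat_lgt, dtype=float) - lat_radar) / 0.08
    # ASSUMED spatiotemporal form; the paper's Equation (4) is not available.
    tr_c = np.hypot(t_c, r_c)
    # Ascending order of spatiotemporal coefficient, keep at most n_keep.
    order = np.argsort(tr_c, kind="stable")[:n_keep]
    current = np.asarray(current_ka, dtype=float)
    per_record = np.column_stack([t_c, r_c, current])[order]
    # ASSUMED padding when fewer than n_keep records are present.
    padded = np.full((n_keep, 3), pad_value, dtype=float)
    padded[:len(order)] = per_record
    density = float(t_c.size)
    return np.concatenate([[density], padded.ravel()])
```

The signed current intensity is passed through unchanged so that polarity is preserved as a predictor, in line with the text's remark that current intensity reflects both intensity and polarity.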
The random forest is an ensemble machine-learning algorithm that was proposed by Breiman [50] for solving classification and regression problems. The scikit-learn toolkit in Python covers nearly all mainstream machine-learning algorithms. In this paper, we use RandomForestRegressor, the random forest regressor in the toolkit; the parameter settings are shown in Table 3. A random selection of 134,524 records (approximately 17.24%) from the total data set is used as the test set, and the remaining records are used as the training set. The training set and random forest are used to train the retrieval relationship between lightning and maximum proxy reflectivity, denoted FRST, and the test set is used to compare the retrieval effects of the FRST and empirical GSI relationships (GSI1 and GSI2). A comparative analysis of the FRST, GSI1, and GSI2 results is presented in Section 3.
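The training step described above can be sketched with scikit-learn as follows. This is an illustrative outline only: the data are synthetic stand-ins for the 31 predictors and the observed maximum reflectivity, and the hyperparameters are placeholders because the Table 3 settings are not reproduced in this text. Only `RandomForestRegressor` and the approximate 17.24% test fraction come from the paper; the use of `train_test_split` for the random selection is an assumption.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Synthetic stand-in for the 31 independent variables (NOT real LPO data).
X = rng.normal(size=(2000, 31))
# Synthetic stand-in for the observed maximum reflectivity in dBZ.
y = 30.0 + 5.0 * X[:, 0] + rng.normal(size=2000)

# Random split with roughly 17.24% of the records held out as the test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1724, random_state=0)

# Placeholder hyperparameters; the Table 3 values are not available here.
frst = RandomForestRegressor(n_estimators=50, random_state=0)
frst.fit(X_train, y_train)

# Retrieved maximum proxy reflectivity for the held-out records.
y_pred = frst.predict(X_test)
```

Fixing `random_state` makes the split and the forest reproducible, which is convenient when the same test set must be reused to score GSI1 and GSI2.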

Verification Methods
To compare the new retrieval relationship with the empirical GSI relationships, three verification metrics, the correlation coefficient, the root mean square error, and the mean absolute error, are used. The correlation coefficient reflects the strength of the correlation between the retrieved results and the observed maximum reflectivity. The magnitude of the difference between the retrieved results and the observed maximum reflectivity is measured by the root mean square error and mean absolute error. In addition, the ETS and BIAS are used to reflect the retrieval effects of different relationships in different maximum reflectivity intervals. In a given interval, the ETS can vary from poorly retrieved results (when ETS = 0) to optimally retrieved results (when ETS = 1), and the BIAS represents systematic overestimation (when BIAS > 1) or underestimation (when BIAS < 1). We use the above verification metrics to provide a comprehensive evaluation of the retrieval relationships. For example, a high ETS indicates a good retrieval effect only if it is accompanied by a BIAS close to 1, a high correlation coefficient, and a low root mean square error and mean absolute error. The correlation coefficient, root mean square error, and mean absolute error are calculated with Equations (5)-(7). Based on the observations and the retrieved results, statistical analysis is conducted using a 2 × 2 contingency table (Table 4), and the ETS and BIAS are calculated using Equations (8)-(10). O_j and R_j are the actual observations and retrieved results, Ō and R̄ are the means of the actual observations and retrieved results, and N is the number of samples involved in the test.
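The verification metrics above can be sketched in Python as follows. Equations (5)-(10) and Table 4 are not reproduced in this text, so the code uses the standard definitions of the Pearson correlation coefficient, root mean square error, mean absolute error, ETS, and BIAS, on the assumption that these match the paper's equations. The function names and the treatment of an interval as the event "low < value <= high" are illustrative choices.

```python
import numpy as np

def continuous_scores(obs, ret):
    """Correlation coefficient, RMSE, and MAE between observed and retrieved values."""
    obs = np.asarray(obs, dtype=float)
    ret = np.asarray(ret, dtype=float)
    cc = np.corrcoef(obs, ret)[0, 1]
    rmse = np.sqrt(np.mean((ret - obs) ** 2))
    mae = np.mean(np.abs(ret - obs))
    return cc, rmse, mae

def ets_bias(obs, ret, low, high=np.inf):
    """ETS and BIAS for the event low < value <= high (standard definitions)."""
    obs = np.asarray(obs, dtype=float)
    ret = np.asarray(ret, dtype=float)
    obs_yes = (obs > low) & (obs <= high)
    ret_yes = (ret > low) & (ret <= high)
    hits = np.sum(obs_yes & ret_yes)
    false_alarms = np.sum(~obs_yes & ret_yes)
    misses = np.sum(obs_yes & ~ret_yes)
    # Hits expected by chance, given the marginal totals.
    hits_random = (hits + misses) * (hits + false_alarms) / obs.size
    denom = hits + misses + false_alarms - hits_random
    ets = (hits - hits_random) / denom if denom > 0 else np.nan
    bias = (hits + false_alarms) / (hits + misses) if (hits + misses) > 0 else np.nan
    return ets, bias
```

With these definitions, a relationship that never produces values in an interval has no hits and no false alarms there, giving ETS = 0 and BIAS = 0, which is the pattern reported for GSI2 in the (0, 20] dBZ interval.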

Maximum Reflectivity Frequency
The maximum reflectivity frequency for the total data set and test set versus the maximum proxy reflectivity frequency from the retrieved results of the three relationships are shown in Figure 4. For the total data set, the maximum reflectivity ranges from 0 to 75 dBZ when lightning occurs, with a single-peak structure and a maximum frequency of 17.07% in the (30, 35] dBZ interval (where 32.5 dBZ represents the (30, 35] dBZ interval, and so on for other intervals). The test set reflects the maximum reflectivity frequency distribution of the total data set well. The FRST retrieved result not only reflects the single-peak structure but also indicates a peak interval of (35, 40] dBZ, close to the actual peak interval of (30, 35] dBZ. However, the frequency is higher than observed in the (25, 45] dBZ interval and lower in other intervals. The GSI1 retrieved result shows a bimodal distribution, with the main peak interval at (15, 20] dBZ and a frequency of 71.92% in this interval, which is a large shift from the actual peak interval at (30, 35] dBZ. Although the GSI2 retrieved result reflects the actual peak interval of (30, 35] dBZ, the frequency is too high (88.85%) in this interval and too low in other intervals. The cumulative frequency of lightning density influences the peak interval and maximum frequency of the GSI1 and GSI2 retrieved results. For the test set, the cumulative frequency is 71.92% when the lightning density is 1~2, and the GSI1 retrieved result calculated from Equation (1) falls within the (15, 20] dBZ interval; the cumulative frequency is 88.85% when the lightning density is 1~5, and the GSI2 retrieved result obtained based on Table 2 is in the (30, 35] dBZ interval. In addition to the lightning density, the FRST relationship encompasses other lightning characteristics related to maximum reflectivity. This is a possible reason why the FRST retrieved result better reflects the single-peak structure and peak interval of the maximum reflectivity frequency.
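The interval frequencies discussed above use right-closed 5 dBZ bins labelled by their centers, for example 32.5 dBZ for (30, 35]. A minimal Python sketch of that binning is given below; the function name and the choice to express frequencies as a percentage of all samples are assumptions for illustration, not taken from the paper.

```python
import numpy as np

def interval_frequency(values_dbz, width=5.0, vmax=75.0):
    """Percentage of samples in right-closed bins (0, w], (w, 2w], ..., up to vmax."""
    edges = np.arange(0.0, vmax + width, width)
    values = np.asarray(values_dbz, dtype=float)
    # With right=True, index i means edges[i-1] < x <= edges[i].
    idx = np.digitize(values, edges, right=True)
    n_bins = len(edges) - 1
    # Drop index 0 (x <= 0) and index n_bins + 1 (x > vmax).
    counts = np.bincount(idx, minlength=n_bins + 2)[1:n_bins + 1]
    centers = edges[:-1] + width / 2.0
    return centers, 100.0 * counts / values.size
```

`np.digitize` with `right=True` is used instead of `np.histogram` because the latter treats bins as left-closed, which would place a value of exactly 35 dBZ in the (35, 40] bin rather than (30, 35].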

Frequency Distribution of Maximum Reflectivity at Different Lightning Densities
Figure 5a shows the frequency distribution of maximum reflectivity at different lightning densities based on the total data set. For a fixed lightning density, the frequency distribution of the maximum reflectivity displays a unimodal structure, with the highest frequency occurring in the (30, 40] dBZ interval (where 35 dBZ represents the (30, 40] dBZ interval, and so on for other intervals), accounting for more than 30% of the samples at that lightning density. When lightning activity occurs around a grid point, the maximum reflectivity may be large (above 65 dBZ) or small (below 5 dBZ) at that grid point. The frequency distributions of the maximum reflectivity for the test set and the total data set are approximately the same for different lightning densities (Figure 5a,b). When the lightning density is 1~10, the FRST retrieved results reflect the single-peak structure of the maximum reflectivity frequency distribution, with the maximum frequency in the (30, 40] dBZ interval. However, the frequency is higher than observed in the (30, 40] dBZ interval and lower in the (10, 30] dBZ and (40, 60] dBZ intervals (Figure 5b,c). According to the empirical GSI relationships, the lightning density and the maximum proxy reflectivity exhibit a one-to-one correspondence, such that the GSI1 retrieved result is concentrated in the (10, 20] dBZ, (20, 30] dBZ, and (30, 40] dBZ intervals when the lightning density is 1~2, 3~6, and 7~10, respectively (Figure 5d); the GSI2 retrieved result is concentrated in the (30, 40] dBZ interval when the lightning density is 1~10 (Figure 5e). For a given lightning density, the retrieved results of the empirical GSI relationships occur only within a fixed interval and not within other intervals, resulting in a frequency that is too high or too low (Figure 5b,d,e). Compared to the GSI1 and GSI2 retrieved results, the FRST retrieved results better reflect the frequency distribution structure and peak interval of the actual maximum reflectivity at different lightning densities.

Correlation Coefficient
The maximum proxy reflectivity is retrieved from the test set, and the correlation coefficient between it and the observed maximum reflectivity is calculated. The comparison indicates that the correlation between the FRST retrieved result and the observed maximum reflectivity is the best. The correlation coefficients for GSI1 and GSI2 are comparable at 0.1608 and 0.1709, respectively, and the correlation coefficient for FRST is 0.7037 (Figure 6). The correlation coefficient for FRST is 3.38 and 3.12 times higher than those for GSI1 and GSI2, respectively. The empirical GSI relationships use only lightning density to construct a retrieval relationship for the maximum proxy reflectivity. Figure 5a shows that the lightning density alone does not fully reflect this relationship. Even when the lightning density is 1, the maximum reflectivity may be large (above 65 dBZ). It is not sufficient to consider only the lightning density when constructing a retrieval relationship between lightning and maximum proxy reflectivity. Unlike empirical GSI relationships, the FRST relationship adds other lightning features linked to maximum reflectivity, which improves the correlation coefficient.

Root Mean Square Error and Mean Absolute Error
The root mean square error and mean absolute error of the FRST are smaller than those of GSI1 and GSI2. The root mean square error of FRST (8.13 dBZ) is 50.85% and 28.05% lower than those of GSI1 and GSI2, respectively (Figure 7a). Additionally, the mean absolute error of the FRST (5.93 dBZ) is 57.15% and 35.19% lower than those of GSI1 (13.84 dBZ) and GSI2 (9.15 dBZ), respectively (Figure 7b). Both the root mean square error and mean absolute error of the FRST are also smaller than those of the empirical GSI relationships at each fixed lightning density (Figure 7a,b). Thus, considering more lightning characteristics, i.e., more complicated relationships, and learning the retrieval relationship with a machine learning method instead of the traditional fitting method could be the reason for the reduced root mean square error and mean absolute error in FRST.
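The percentage reductions quoted above, and the 3.38 and 3.12 multiples reported for the correlation coefficient, are consistent with a relative difference taken with respect to the GSI value. The short Python check below illustrates that reading using only numbers stated in the text; the function name is hypothetical.

```python
def relative_difference(value, reference):
    """Relative difference of value with respect to reference."""
    return (value - reference) / reference

# Mean absolute error: FRST (5.93 dBZ) against GSI1 (13.84 dBZ) and GSI2 (9.15 dBZ).
mae_vs_gsi1 = -100.0 * relative_difference(5.93, 13.84)   # percent lower
mae_vs_gsi2 = -100.0 * relative_difference(5.93, 9.15)    # percent lower

# Correlation coefficient: FRST (0.7037) against GSI1 (0.1608) and GSI2 (0.1709).
cc_vs_gsi1 = relative_difference(0.7037, 0.1608)          # times higher
cc_vs_gsi2 = relative_difference(0.7037, 0.1709)          # times higher
```

Read this way, "3.38 times higher" describes the increase relative to the GSI1 coefficient, so the FRST coefficient is roughly 4.4 times the GSI1 value in absolute terms.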


ETS and BIAS
In previous studies, the regions with maximum reflectivity of 20-40 dBZ and >40 dBZ were defined as stratiform and convective cloud regions, respectively [51–54]. In this paper, maximum reflectivity is divided into three intervals, namely, (0, 20] dBZ, (20, 40] dBZ, and >40 dBZ, and BIAS and ETS are calculated in each interval. Overall, the FRST is the best of the three relationships (Figure 8).

ETS and BIAS
In previous studies, the regions with maximum reflectivity of 20-40 dBZ and >40 dBZ are defined as stratiform and convective cloud regions, respectively [51][52][53][54].In this paper, maximum reflectivity is divided into three intervals, namely, (0, 20] dBZ, (20,40] dBZ, and >40 dBZ, and BIAS and ETS are calculated in different intervals.Overall, the FRST is the best of the three relationships (Figure 8).
The BIAS of the FRST is closest to 1 in all three intervals. In the (0, 20] dBZ, (20, 40] dBZ, and >40 dBZ intervals, the BIAS values are 0.5250, 1.2508, and 0.7087 for the FRST, 4.9038, 0.4784, and 0 for the GSI1, and 0, 1.6766, and 0.0593 for the GSI2, respectively (Figure 8a). Although the FRST overestimates the frequency in the (20, 40] dBZ interval (BIAS > 1) and underestimates it in the other two intervals (BIAS < 1), its BIAS is the most reasonable in all three intervals. The BIAS values of the three relationships in the different intervals are consistent with the earlier analyses of the maximum reflectivity frequency and of the maximum reflectivity frequency distribution at different lightning densities (Figures 4, 5, and 8a).
The FRST displays the highest ETS. In the (0, 20] dBZ, (20, 40] dBZ, and >40 dBZ intervals, the ETS values are 0.2875, 0.6693, and 0.3285 for the FRST, 0.0268, 0.3027, and 0 for the GSI1, and 0, 0.5364, and 0.0134 for the GSI2, respectively (Figure 8b). The FRST displays the highest retrieval skill in all three intervals, whereas the GSI1 and GSI2 perform well only in the (20, 40] dBZ interval. The GSI1 cannot retrieve maximum proxy reflectivity above 40 dBZ and therefore misses this interval completely. Lightning densities of 1 to 2 account for a cumulative frequency of 71.92%, which corresponds to the frequency of the GSI1 retrieved results in the (0, 20] dBZ interval (Figure 4); in reality, however, the maximum reflectivity lies mainly in the (20, 40] dBZ interval (Figure 5b), indicating that the GSI1 greatly over-predicts the (0, 20] dBZ interval. Although the hit rate is high in the (0, 20] dBZ interval, this over-prediction results in a low ETS for the GSI1 in this interval. The GSI2 retrieved results range from 30.13 to 43.74 dBZ, so the GSI2 misses the (0, 20] dBZ interval completely and has no skill there. Only 1.58% of the GSI2 retrieved results (those with lightning density > 18) fall in the >40 dBZ interval, which leads to a low hit rate, severe misses, and a low ETS. Possibly because it considers more lightning features, the FRST retrieved results are closer to the observed maximum reflectivity (Figure 7b), and the ETS and BIAS of the FRST are the best in all three intervals.
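The interval scores discussed above can be illustrated with a short sketch. It assumes the standard 2x2 contingency-table definitions, BIAS = (hits + false alarms) / (hits + misses) and ETS = (hits - hits_random) / (hits + misses + false alarms - hits_random), where the event is "the value falls in the interval"; the function name and sample values are hypothetical and are not taken from the paper's data.

```python
import math

def interval_scores(observed, retrieved, lower, upper):
    """Return (BIAS, ETS) for the event 'value falls in (lower, upper]'."""
    hits = misses = false_alarms = correct_negatives = 0
    for obs, ret in zip(observed, retrieved):
        obs_in = lower < obs <= upper
        ret_in = lower < ret <= upper
        if obs_in and ret_in:
            hits += 1
        elif obs_in:
            misses += 1
        elif ret_in:
            false_alarms += 1
        else:
            correct_negatives += 1
    total = hits + misses + false_alarms + correct_negatives
    if total == 0:
        raise ValueError("no paired samples")
    n_observed = hits + misses
    n_retrieved = hits + false_alarms
    # BIAS is undefined when the event is never observed.
    bias = n_retrieved / n_observed if n_observed else float("nan")
    # Hits expected by chance, given the observed and retrieved event counts.
    hits_random = n_observed * n_retrieved / total
    denominator = hits + misses + false_alarms - hits_random
    ets = (hits - hits_random) / denominator if denominator else float("nan")
    return bias, ets

# Hypothetical paired values (dBZ), for illustration only.
observed = [15, 25, 35, 45, 30, 50]
retrieved = [25, 30, 38, 35, 18, 42]
for lower, upper in [(0, 20), (20, 40), (40, math.inf)]:
    print((lower, upper), interval_scores(observed, retrieved, lower, upper))
```

Under these definitions, a relationship that never produces values in an interval has a BIAS of 0 there, which matches the behavior reported for the GSI1 above 40 dBZ and for the GSI2 in the (0, 20] dBZ interval.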

Test Case
To visualize the effects of the three relationships, a convective process at 14:00 UTC on 7 June 2022 is selected for comparative analysis. The results show that the FRST works best for this individual case. Two centers of strong maximum reflectivity, with intensities greater than 40 dBZ, were present at this time in Cangzhou City and Hengshui City, and lightning was active near these strong centers (Figure 9). The lightning density distribution at this time is shown in Figure 10a. A comparison of the observed maximum reflectivity in the lightning ranges with the retrieved results of the three relationships indicates that the FRST retrieved result reflects the two strong centers well, although the retrieved intensity is comparatively weak at the strong centers and stronger than observed in the other lightning ranges (Figure 10b,c). The GSI1 reflects only the strong center in Hengshui City, and its retrieved result is weaker overall (Figure 10b,d). The GSI2 retrieved result is concentrated in the (30, 40] dBZ interval and does not reflect the two strong centers (Figure 10b,e). For this case, the FRST retrieved result best reflects the centers and intensity distribution of the maximum reflectivity and is the closest to reality among the three relationships.

Advantages of the FRST
The FRST yields a better retrieval effect and higher practical value, which is beneficial for monitoring and forecasting disaster weather. First, when lightning occurs around a grid point, the maximum reflectivity at that grid point ranges from 0 to 75 dBZ (Figure 4), and the maximum proxy reflectivity retrieved by the FRST best reflects this feature (Figure 5b,c), whereas the maximum proxy reflectivity retrieved by the GSI1 and GSI2 ranges only from 17.5 to 40.0 dBZ and from 30.13 to 43.74 dBZ, respectively. This is beneficial not only for monitoring thunderstorms but also for assimilating lightning data using radar reflectivity as an observation operator. In assimilation systems such as the Weather Research and Forecasting Model Data Assimilation System, 25 dBZ is the threshold for radar reflectivity assimilation: if the reflectivity is above 25 dBZ, radar reflectivity assimilation is initiated, and lower reflectivity is not used in the assimilation system. When lightning occurs around a grid point, the maximum proxy reflectivity retrieved by the GSI2 always exceeds 25 dBZ (its minimum is 30.13 dBZ) and is therefore used in the assimilation system, resulting in an excessively large area of reflectivity assimilation. The FRST can retrieve a smaller maximum proxy reflectivity, potentially reducing false alarms in model forecasting. The retrieved results of the empirical GSI relationships do not exceed 43.74 dBZ, so the models cannot assimilate strong maximum proxy reflectivity and may forecast convective centers that are too weak. The FRST is able to retrieve a larger maximum proxy reflectivity, potentially enhancing forecasts of strong convective centers.
Second, since the FRST retrieved result is strongly correlated with the observed maximum reflectivity (Figure 6a), with low root mean square error and mean absolute error (Figure 7), the maximum proxy reflectivity retrieved by the FRST can be considered a substitute for the observed maximum reflectivity in radar blind zones, such as mountains and oceans, compensating for the shortcomings of radar in the monitoring of thunderstorms. In addition, the FRST can retrieve maximum proxy reflectivity at a 0.01° resolution, facilitating convective-scale assimilation for high-resolution models.
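The effect of the 25 dBZ assimilation threshold described above can be sketched as follows. This is only an illustration under stated assumptions: the function name and the sample values are hypothetical, and the only quantities taken from the text are the 25 dBZ threshold and the 30.13 to 43.74 dBZ range of the GSI2 retrieved results.

```python
# Threshold for radar reflectivity assimilation quoted in the text (dBZ).
ASSIMILATION_THRESHOLD_DBZ = 25.0

def assimilated_fraction(proxy_reflectivity, threshold=ASSIMILATION_THRESHOLD_DBZ):
    """Fraction of lightning grid points whose proxy reflectivity exceeds the threshold."""
    used = [z for z in proxy_reflectivity if z > threshold]
    return len(used) / len(proxy_reflectivity)

# Hypothetical retrieved values (dBZ) at four lightning grid points.
gsi2_like = [30.13, 33.0, 36.5, 43.74]   # every value lies within the GSI2 range
frst_like = [12.0, 22.5, 31.0, 47.0]     # a wider range, as the FRST can produce

print(assimilated_fraction(gsi2_like))  # 1.0: every lightning grid point is assimilated
print(assimilated_fraction(frst_like))  # 0.5: weak points are filtered out
```

Because every GSI2-like value exceeds the threshold, all lightning grid points enter the assimilation, whereas a relationship that can return values below 25 dBZ lets the threshold exclude weak points.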

Limitations of the FRST
The parameter settings in the FRST are somewhat subjective. First, the calculation of the lightning density is subjective. In this paper, the calculation period is a 40-min window (from 30 min before to 10 min after the whole hour), and the calculation area is the area within a 0.08° radius centered at the grid point. Second, the selection of the number of LPO records used to construct the retrieval relationship is subjective: the 95% cumulative frequency of the lightning density is used as the cutoff for selecting the number of LPO records. Third, the parameter settings of the random forest (e.g., n_estimators and criterion) are subjective. These parameters may affect the retrieval relationship and its performance, and no parameter sensitivity analysis is conducted in this paper.
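To make the first of these settings concrete, the lightning density calculation can be sketched as a count of LPO records within the time window and radius around a grid point. This is a minimal sketch under assumptions that the text does not confirm: the density is taken as a simple record count, the distance is the Euclidean distance in degrees of latitude and longitude, the window endpoints are inclusive, and the function name and sample records are hypothetical.

```python
import math
from datetime import datetime, timedelta

def lightning_density(grid_lat, grid_lon, hour, records,
                      radius_deg=0.08, before_min=30, after_min=10):
    """Count LPO records near a grid point within the window around the whole hour.

    records: iterable of (time, latitude, longitude) tuples.
    """
    start = hour - timedelta(minutes=before_min)
    end = hour + timedelta(minutes=after_min)
    count = 0
    for time, lat, lon in records:
        in_window = start <= time <= end
        # Assumption: distance measured directly in degrees, not in kilometers.
        in_radius = math.hypot(lat - grid_lat, lon - grid_lon) <= radius_deg
        if in_window and in_radius:
            count += 1
    return count

# Hypothetical records around 14:00 UTC on 7 June 2022.
hour = datetime(2022, 6, 7, 14, 0)
records = [
    (datetime(2022, 6, 7, 13, 35), 38.00, 116.00),  # inside window and radius
    (datetime(2022, 6, 7, 14, 8), 38.05, 116.03),   # inside window and radius
    (datetime(2022, 6, 7, 14, 15), 38.00, 116.00),  # after the window closes
    (datetime(2022, 6, 7, 13, 50), 38.10, 116.00),  # outside the 0.08 degree radius
]
print(lightning_density(38.0, 116.0, hour, records))  # 2
```

Changing `radius_deg`, `before_min`, or `after_min` changes the counts directly, which is why these settings could affect the retrieval relationship built on them.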

Future Studies
In this paper, only the retrieval relationship between lightning and maximum proxy reflectivity is constructed. Further study of a retrieval scheme for the vertical profile is needed to convert maximum proxy reflectivity to 3D proxy reflectivity and thereby achieve lightning data assimilation in numerical models. Particular attention should be paid to the impact of the FRST on model forecasting after lightning data assimilation in radar blind zones. In addition, 3D proxy reflectivity should be effectively integrated with 3D radar reflectivity before assimilation, and this topic should be explored in future work. Finally, the retrieval relationship between other lightning data (such as the Lightning Mapping Imager products of Fengyun-4A) and maximum proxy reflectivity is worth investigating.

Conclusions
The retrieval relationship between lightning and maximum proxy reflectivity affects the performance of lightning data assimilation. Previous studies have mainly used lightning density to construct a retrieval relationship between lightning and maximum proxy reflectivity and have not considered other lightning features. In this paper, in addition to the lightning density, the multi-characteristic parameters (temporal coefficient, spatial coefficient, and current intensity) of the first 10 LPO records (in ascending order of spatiotemporal coefficients) are added. These LPO parameters and random forest are used to construct the retrieval relationship between lightning and maximum proxy reflectivity for a high-resolution model grid. A comparison of the FRST with two empirical GSI relationships shows the following. (i) The FRST retrieved result reflects the single-peak structure of the maximum reflectivity frequency well, and its peak interval, (35, 40] dBZ, is close to the actual peak interval, (30, 35] dBZ. At different lightning densities, the FRST retrieved result also reflects the frequency distribution structure and the peak interval of the maximum reflectivity. (ii) The correlation coefficient of the FRST is 3.38 (3.12) times greater than that for the empirical GSI relationships. The root mean square error and the mean absolute error of the FRST are 50.85% (28.05%) and 57.15% (35.19%) lower than those for the empirical GSI relationships, respectively. Among the three relationships, in the three maximum reflectivity intervals of (0, 20] dBZ, (20, 40] dBZ, and >40 dBZ, the ETS of the FRST is the highest, and the BIAS of the FRST is closest to 1.

Figure 1. (a) Geographical location of the study area. The red box shows the area displayed in (b). (b) Terrain height (unit: m). The black solid lines indicate the city/provincial borders. The black dashed lines labeled 39.5°N and 117.85°E and the red line labeled 70 are used for selecting the cases described in Section 2.1. The weather radar stations in the SWAN network are indicated with red-filled circles, and the ADTD lightning positioning stations are marked with orange-filled triangles. 'Hebei' represents Hebei Province.

Figure 2. (a) Frequency of the lightning density. (b) Same as (a) but for the cumulative frequency. The horizontal coordinate is the lightning density, and the vertical coordinate is the frequency (cumulative frequency).

Figure 3. Process of constructing the relationship between lightning and maximum proxy reflectivity.

Figure 4. Frequency of the observed maximum reflectivity for the total data set (TOTAL) and test set (TEST) versus the FRST, GSI1, and GSI2 retrieved results. The horizontal coordinate is the maximum reflectivity (maximum proxy reflectivity) (unit: dBZ), and the vertical coordinate is the frequency.

Figure 5. (a) Frequency distribution of the maximum reflectivity at different lightning densities for the total data set. (b) Same as (a) but for the test set. (c) Same as (a) but for the FRST retrieved results. (d) Same as (a) but for the GSI1 retrieved results. (e) Same as (a) but for the GSI2 retrieved results. The horizontal coordinate is the lightning density, and the vertical coordinate is the maximum reflectivity (maximum proxy reflectivity) (unit: dBZ).

Figure 6. (a) Scatter plot of the observed maximum reflectivity compared with the FRST retrieved results. (b) Same as (a) but for the GSI1 retrieved results. (c) Same as (a) but for the GSI2 retrieved results. The horizontal coordinate is the observed maximum reflectivity (unit: dBZ), and the vertical coordinate is the maximum proxy reflectivity (unit: dBZ).

Figure 7. (a) Root mean square errors of the FRST, GSI1, and GSI2 retrieved results with respect to the observed maximum reflectivity. (b) Same as (a) but for the mean absolute error. The horizontal coordinate is the lightning density, and the vertical coordinate is the root mean square error (mean absolute error) (unit: dBZ). ALL represents all lightning densities, and 1 to 10 denote the individual lightning densities.

Remote Sens. 2024, 16, 719

Figure 9. Distribution of maximum reflectivity (unit: dBZ) at 1400 UTC on 7 June 2022. The gray lines indicate the city/provincial borders. The red box shows the area analyzed in Figure 10. The crosses denote the lightning locations within the time window from 1330 UTC to 1410 UTC. 'CZ' and 'HS' represent Cangzhou City and Hengshui City, respectively.

Figure 10. (a) Lightning density within the time window from 1330 UTC to 1410 UTC on 7 June 2022. (b) Observed maximum reflectivity (unit: dBZ) at 1400 UTC on 7 June 2022. (c) Same as (b) but for the FRST retrieved results. (d) Same as (b) but for the GSI1 retrieved results. (e) Same as (b) but for the GSI2 retrieved results. The gray lines indicate the city/provincial borders. 'CZ' and 'HS' represent Cangzhou City and Hengshui City, respectively.

Table 1. Abbreviations and their full definitions.

Table 2. The nonlinear relationship between lightning density and maximum proxy reflectivity in the GSI system.

Table 3. The parameter settings of the random forest.