Impact of Lightning Data Assimilation on the Short-Term Precipitation Forecast over the Central Mediterranean Sea

Lightning data assimilation (LDA) is a powerful tool to improve the weather forecast of convective events and has been widely applied for this purpose in the past two decades. Most of these applications refer to events hitting coastal and land areas, where people live. However, a weather forecast over the sea has many important practical applications, and this paper focuses on the impact of LDA on the precipitation forecast over the central Mediterranean Sea around Italy. The 3 h rapid update cycle (RUC) configuration of the weather research and forecasting (WRF) model has been used to simulate the month of November 2019. Two sets of forecasts have been considered: CTRL, without lightning data assimilation, and LIGHT, which assimilates data from the LIghtning detection NETwork (LINET). The 3 h precipitation forecast has been compared with the Integrated Multi-satellitE Retrievals for GPM (IMERG) dataset of the Global Precipitation Measurement (GPM) mission and with rain gauge observations recorded on six small Italian islands. The comparison of the CTRL and LIGHT precipitation forecasts with the IMERG dataset shows a positive impact of LDA. The correlation between predicted and observed precipitation improves over wide areas of the Ionian and Adriatic Seas when LDA is applied. Specifically, the correlation coefficient averaged over the whole domain increases from 0.59 to 0.67, and the anomaly correlation (AC) improves by 5% over land and by 8% over the sea when lightning is assimilated. The impact of LDA on the 3 h precipitation forecast over the six small islands is also positive: LDA improves the forecast both by decreasing the false alarms and by increasing the hits, although with variability among the islands. The case study of 12 November 2019 (00–03 UTC) shows how large the impact of LDA can be in practice. In particular, the shift of the main precipitation pattern from land to sea caused by LDA gives a much better representation of the precipitation field observed by the IMERG precipitation product.


Introduction
Precipitation is one of the key elements of the hydrological cycle, and it is an important phenomenon that affects human lives. It is therefore critical to provide precipitation forecasts for different time horizons. A short-range precipitation forecast is fundamental for flood forecasting, early warning, and hydrological applications [1][2][3][4].
Numerical weather prediction (NWP) models provide a physical description of the many atmospheric processes and have been used for the short-term forecast of precipitation [5][6][7][8]. NWP models have improved significantly in recent years, both in their physical parameterizations and in their spatial resolution, thanks to continuous advances in scientific knowledge and computational resources [9][10][11].
Despite this continuous improvement, the accurate prediction of high-impact weather remains one of the challenges of state-of-the-art NWP models [12][13][14]. For short-range forecasting, NWP models require more observational data and effective assimilation methods to overcome the spin-up problem [10,11]. Thanks to the ever-increasing computing power and to the possibility of assimilating different types of observations, the forecast of severe weather events has improved constantly in the past decades [15,16].
Lightning data have been widely used in the past two decades and have some advantages over other data sources [17,18]. For example, radars (reflectivity and radial winds) are an important source of data for effectively improving the short-range precipitation forecast [19][20][21][22]. However, compared to lightning data assimilation (LDA), the assimilation of radar data requires more sophisticated and computationally expensive techniques, such as three-dimensional variational assimilation (3DVAR) [23,24] and the ensemble Kalman filter [20,25,26]. Flashes are observed with a sub-kilometric position error, which is smaller than the horizontal grid spacing of the meteorological model used in this paper (3 km), and with a negligible timing error, since the detection of the strokes is instantaneous and therefore much shorter than the time step of meteorological models. Since lightning is associated with convective activity, locating lightning in space and time locates the most intense convective areas within meteorological systems [27,28]. Lightning data are simple to transfer and do not require a broadband network connection. Computing the lightning position from the electromagnetic signals detected by the sensors is fast, making lightning observations available in real time.
For the above reasons, LDA is suitable for improving the forecast of intense convective weather events in rapid update cycle (RUC) data assimilation systems. Broadly speaking, LDA can be divided into two classes: LDA in models parameterizing convection (horizontal resolution > 3-5 km) and LDA in models explicitly resolving the convection (horizontal resolution < 3-5 km) [18,29,30].
The first attempts to assimilate lightning in NWP models were based on relationships between lightning and the rainfall rate estimated by microwave sensors on board polar-orbiting satellites [31][32][33][34]. These studies demonstrated not only the ability of LDA to improve the precipitation forecast but also an improved forecast of large-scale fields, such as sea-level pressure. Benjamin et al. [35] used rain rates as a proxy for lightning and implemented LDA in an RUC data assimilation system. Papadopoulos et al. [36] used lightning to locate convection, and the simulated water vapor profile was nudged toward vertical profiles recorded during convective events. Mansell et al. [17] used lightning data to control the triggering function of the Kain-Fritsch cumulus scheme [37] as a function of the flash rate. This method can also suppress the activation of the cumulus parameterization scheme where no lightning is observed. The same method was also applied by Giannaros et al. [38] to eight events over Greece, showing an improvement of the precipitation forecast given by LDA up to 24 h.
Fierro et al. [29] were the first to consider LDA in a convection-resolving model (the weather research and forecasting (WRF) model at 3 km horizontal resolution), using a methodology that imposed incremental increases in water vapor mass in the mixed-phase region (the 0 °C to −20 °C layer) based on the observed lightning flash rate density and the simulated graupel mixing ratio. The same nudging method was applied for over 70 days during the 2013 warm season over the contiguous United States, showing an important and positive impact of LDA over a prolonged forecast period [39]. Fierro et al. [40] improved the forecast of a derecho event up to 6 h with LDA and showed that the impact of LDA can be comparable to that of the 3DVAR assimilation of radar data (radial winds and reflectivity factor).
Qie et al. [41] adapted the nudging function of Fierro et al. [29] to adjust graupel, snow, and ice crystal mixing ratios and applied it to a quasi-linear convective system over northern China, showing that the assimilation of graupel, snow, and ice crystals is important for outflow-dominated events.
Chen et al. [42] proposed a data assimilation scheme, based on the functions of Fierro et al. [29] and Qie et al. [41], which takes into account the dynamical and thermodynamic conditions. Federico et al. [8] implemented the method of Fierro et al. [29] in the regional atmospheric modeling system (RAMS) and obtained a substantial improvement of the quantitative precipitation forecast (QPF) for 20 cases over Italy that occurred during the first Special Observing Period (SOP1) of the HYdrological cycle in the Mediterranean EXperiment (HyMeX) [43]. Recently, Comellas Prat et al. [30], using the WRF model, applied the method of Fierro et al. [29] to evaluate the impact of LDA for three case studies of severe weather that occurred over Italy. The three cases span a range of convective precipitation, from a long-lived, wide-area intense-precipitation event to a heavy and localized rainfall case. WRF was used in an RUC data assimilation system (3 h and 6 h forecasts). LDA improved the quantitative precipitation forecast in all cases and for both forecast ranges.
The Mediterranean area is prone to heavy precipitation, and floods and flash floods are the most common and dangerous natural hazards in the area [44]. Several research studies have explored these events in the Mediterranean. The HyMeX project [43,45] and, previously, the MEDEX (The Mediterranean Experiment) project [46] considered the water cycle in the Mediterranean basin, and the study of floods and flash floods was among the main topics of both projects.
Heavy-rainfall events over the Mediterranean occur mainly in fall and winter and have been reported in numerous papers ([44,47-55], among many others). The role of the Mediterranean Sea in heavy-precipitation storms has also been considered in the literature [56][57][58][59]. Because the Mediterranean is a warm sea, it feeds cyclones with large amounts of water vapor, reinforcing them and increasing their lifetime [60]. Often, storms form over the Mediterranean Sea [61][62][63] or spend most of their lifetime over the sea, as in the case of Medicanes [64][65][66], and are then advected toward land, causing floods, windstorms, and storm surges. In these cases, the air-sea interaction plays a major role in the storm evolution.
A better forecast of high-impact weather over the sea would result in a more realistic simulation of its impact once it hits the coast and would be of practical importance for many sea-related activities (fishery, navigation, tourism, etc.). LDA is particularly important for improving the weather forecast over the Mediterranean Sea, because ground-based radar coverage is insufficient for monitoring weather systems more than 100-150 km away from the radar, so that the evolution of weather systems over the Mediterranean Sea is only partially observed by the Italian radars. For this reason, the impact of radar data assimilation over the Mediterranean Sea is limited, and LDA can play a significant role in improving the weather forecast there.
Thanks to the advent of the international Global Precipitation Measurement (GPM) mission [67,68], a new set of global precipitation products has become available to the meteorological scientific community. The GPM satellite constellation is made up of an ever-growing number of low-Earth-orbit (LEO) satellites equipped with cross-track and conical passive microwave (PMW) radiometers. The main satellite of the constellation is the NASA/JAXA (National Aeronautics and Space Administration/Japan Aerospace Exploration Agency) GPM Core Observatory (GPM-CO), which carries the most technologically advanced radiometer, the GPM Microwave Imager (GMI), and the dual-frequency Ku/Ka-band precipitation radar (DPR). The potential of the GPM constellation for analyzing precipitation systems over the Mediterranean region (land and sea) has been demonstrated in several studies [69]. For instance, Panegrossi et al. [70] demonstrated the value of precipitation retrievals from the GPM constellation for monitoring the evolution of different severe weather systems, in particular over sea regions or coastal areas with complex orography. In addition, Marra et al. [62] showed that a single GPM-CO overpass over a violent hailstorm that developed offshore in front of Naples captured in fine detail the basic characteristics of this exceptional event, which was completely missed by numerical weather forecasting models. Furthermore, also using RAMS model forecasts, Marra et al. [64] documented how the PMW and DPR data from the GPM constellation can be used to analyze the life cycle of the tropical-like cyclone (TLC) Numa, which developed over the Ionian Sea during 15-19 November 2017.
Beyond the insight that these space-borne platforms provide into individual weather events, the combination of measurements from the different sensors offers unique and consistent precipitation estimates, which, combined with geosynchronous Earth orbit (GEO) infrared (IR) observations, yield global precipitation products of unprecedented spatial and temporal resolution, such as the Integrated Multi-satellitE Retrievals for GPM (IMERG) [71].
In this paper, we focus on the impact of LDA on the short-term (3 h) precipitation forecast over the Mediterranean Sea using the WRF model. The rainfall forecast is verified against the IMERG dataset (Final Run) and against the rain gauge precipitation recorded at six stations located on small islands around the Italian peninsula.
The paper is organized as follows: In Section 2, we introduce the WRF model configuration and the IMERG dataset. Section 3 shows the results comparing the WRF precipitation forecast, with or without LDA, with the data from IMERG and from the rain gauges located on small Italian islands in the central Mediterranean Sea. Conclusions are reported in Section 4.

Satellite Precipitation Data
To evaluate the 3 h precipitation forecast over the sea, observational data were obtained from the IMERG Final Run (version 06B) at a horizontal resolution of 0.1° × 0.1° (about 10 × 10 km²) and a temporal resolution of 30 min from the Goddard Earth Sciences Data and Information Services Center (GES DISC) website. We used the complete calibrated precipitation field from the post-real-time Final Run product, which is the satellite-gauge combination available about 3.5 months after the observation month. To compute it, precipitation estimates are first retrieved with the Goddard Profiling Algorithm (GPROF2017) from the various passive microwave sensors of the GPM constellation. These estimates are then intercalibrated to the Combined Ku Radar-Radiometer Algorithm (CORRA) product, forward- and backward-morphed, and combined with microwave-calibrated GEO-IR precipitation fields, and are finally adjusted with the seasonal Global Precipitation Climatology Project (GPCP) Satellite-Gauge (SG) surface precipitation data to provide the global half-hourly dataset [71].
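Because the forecasts are verified as 3 h totals while IMERG is distributed every 30 min, the half-hourly fields have to be accumulated. The sketch below assumes that each half-hourly value is a precipitation rate in mm h⁻¹ and that a 3 h window [t, t + 3 h) contains exactly six steps. The function name `imerg_3h_accumulation` is hypothetical; reading the IMERG files and handling missing values are left out.

```python
from datetime import datetime, timedelta

STEP_HOURS = 0.5  # one IMERG time step is 30 min


def imerg_3h_accumulation(rates_mm_per_h, step_starts, window_start):
    """Accumulate half-hourly IMERG rates (assumed mm/h) into a 3 h total (mm).

    rates_mm_per_h: precipitation rate of each half-hourly step
    step_starts:    start time (datetime) of each step, same length as rates
    window_start:   start of the 3 h verification window
    """
    window_end = window_start + timedelta(hours=3)
    selected = [r for r, t in zip(rates_mm_per_h, step_starts)
                if window_start <= t < window_end]
    if len(selected) != 6:
        raise ValueError(f"expected 6 half-hourly steps, got {len(selected)}")
    # rate (mm/h) x step length (h) = depth (mm) for each step
    return sum(r * STEP_HOURS for r in selected)


if __name__ == "__main__":
    t0 = datetime(2019, 11, 12, 0, 0)
    steps = [t0 + timedelta(minutes=30 * k) for k in range(12)]
    print(imerg_3h_accumulation([2.0] * 12, steps, t0))  # 6.0
```

The strict check on six steps is a design choice: a silently incomplete window would bias the 3 h totals low.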

Model Setup and Lightning Data Assimilation Procedure
The numerical model used in this study is the Advanced Research WRF (WRF-ARW) model, version 4.1.3 [72]. The simulations were conducted on a single domain, with 531 × 531 grid points and 42 vertical levels, and a model top at 50 hPa. The model domain (Figure 1) covers the central Mediterranean and the whole Italian territory with a horizontal grid spacing of 3 km.
The physical schemes employed include the new Thompson microphysics scheme [73], the Yonsei University boundary layer scheme [74], the five-layer thermal diffusion land surface scheme, the revised MM5 Monin-Obukhov surface layer scheme [75], and the Dudhia [76] and rapid radiative transfer model (RRTM) [77] schemes for shortwave and longwave radiation, respectively. No cumulus parameterization was activated.
The 3 h, 0.25° operational analysis/forecast cycle of the integrated forecast system (IFS) global model of the European Centre for Medium-Range Weather Forecasts (ECMWF), issued at 12 UTC on the day before each forecast day, was used for the initial and lateral boundary conditions.
To assess the impact of LDA on the precipitation forecast over the sea, we selected the period from 2 to 30 November 2019, because this period was characterized by abundant precipitation over Italy, especially in the first 20 days of the month.
For all the simulations, we employed a 3 h rapid update cycle (RUC) data assimilation scheme. The RUC configuration was chosen because LDA impacts the weather forecast mainly in the short range [38,39,78]. We ran two configurations for each forecast: the first, named CTRL, without LDA, and the second, named LIGHT, with LDA. Each simulation spanned a 9 h period after its initialization time. The first 6 h of each simulation were used for LDA in the LIGHT simulations and to spin-up the model for both configurations, while the last 3 h were used for the forecast. Therefore, eight simulations were needed to cover the forecast for a full day.
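The cycling described above can be made concrete with a small schedule calculation. The sketch assumes that runs are started every 3 h so that their 3 h forecast windows tile the day from 00 UTC; under that assumption, the run covering the window starting at t is initialized at t − 6 h. The function `ruc_runs_for_day` is illustrative and is not part of WRF.

```python
from datetime import datetime, timedelta

SPIN_UP = timedelta(hours=6)   # LDA (LIGHT) or spin-up only (CTRL)
FORECAST = timedelta(hours=3)  # verified forecast at the end of each run
RUNS_PER_DAY = 8               # 8 x 3 h = 24 h


def ruc_runs_for_day(day):
    """List (init, forecast_start, forecast_end) for the runs covering `day`.

    Assumes the 3 h forecast windows tile the day starting at 00 UTC.
    Each run lasts SPIN_UP + FORECAST = 9 h.
    """
    runs = []
    for k in range(RUNS_PER_DAY):
        fc_start = day + k * FORECAST
        init = fc_start - SPIN_UP
        runs.append((init, fc_start, fc_start + FORECAST))
    return runs


if __name__ == "__main__":
    for init, start, end in ruc_runs_for_day(datetime(2019, 11, 12)):
        print(f"init {init:%d %H} UTC -> forecast {start:%d %H}-{end:%d %H} UTC")
```

Under this assumption, the 00–03 UTC forecast of 12 November 2019 comes from the run initialized at 18 UTC on 11 November.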
Lightning data used for assimilation were provided by the LIghtning detection NETwork (LINET) [79]. LINET includes more than 500 stations worldwide, more than 200 of which are in Europe. This network covers the Italian territory and the western Mediterranean Sea. LINET sensors detect very low frequency (VLF) and low-frequency (LF) waves emitted during the flash. LINET is a total lightning system, since it measures both intra-cloud (IC) and cloud-to-ground (CG) discharges. The data processing uses a 3D time-of-arrival (TOA) method [80], which also yields the height of IC lightning. The position accuracy is about 75 m for CG lightning. All discharges recorded within a 1 s period and within a 10 km radius are considered as a single flash for data assimilation [81].
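The grouping of discharges into flashes (1 s, 10 km) can be sketched as a greedy clustering. The criteria are not specified beyond these two thresholds, so this sketch assumes that both are measured from the first stroke of a flash and uses great-circle distances on a spherical Earth. The function names are hypothetical and do not describe the actual LINET processing.

```python
import math

EARTH_RADIUS_KM = 6371.0
MAX_DT_S = 1.0      # strokes within 1 s ...
MAX_DIST_KM = 10.0  # ... and within 10 km form one flash


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km on a spherical Earth."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def group_strokes(strokes):
    """Group (time_s, lat, lon) strokes into flashes (lists of strokes).

    A stroke joins the most recent flash whose first stroke is at most
    MAX_DT_S earlier and at most MAX_DIST_KM away; otherwise it opens a
    new flash. Flashes are kept in order of their first stroke.
    """
    flashes = []
    for stroke in sorted(strokes):
        t, lat, lon = stroke
        target = None
        for flash in reversed(flashes):
            t0, lat0, lon0 = flash[0]
            if t - t0 > MAX_DT_S:
                break  # all earlier flashes started even longer ago
            if haversine_km(lat, lon, lat0, lon0) <= MAX_DIST_KM:
                target = flash
                break
        if target is None:
            flashes.append([stroke])
        else:
            target.append(stroke)
    return flashes
```

Counting the resulting flashes per grid cell in each 10 min interval would then provide the quantity X used in the nudging equation.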
For LDA, we followed the nudging method of Fierro et al. [29]. The modeled water vapor mixing ratio is compared with that calculated by the following equation:

$$ q_v = A\,q_s + B\,q_s \tanh(C X)\left[1 - \tanh\left(D\,q_g^{\alpha}\right)\right] \qquad (1) $$

In this experiment, the coefficients were set to A = 0.95, B = 0.07, C = 0.25, D = 0.25, and α = 2.2. The term q_s represents the saturation water vapor mixing ratio at the model atmospheric temperature, while q_g is the graupel mixing ratio (g kg⁻¹). X is the total number of flashes per grid cell in each assimilation interval (10 min). The water vapor mixing ratio of Equation (1) increases with the flash rate (X) and decreases with the graupel mixing ratio (q_g). The simulated water vapor is nudged toward the value of Equation (1) when the latter is higher than the modeled one. The physical meaning of Equation (1) and of the nudging is that (a) the water vapor mixing ratio imposed on WRF increases as the flash rate increases, and (b) when graupel is simulated, a convective environment is already represented in the model, so the nudging term must decrease as the WRF graupel mixing ratio increases.
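A minimal per-grid-point sketch of the nudging equation, q_v = A q_s + B q_s tanh(CX)[1 − tanh(D q_g^α)], with the coefficients of this study. It assumes that q_s and q_g are given in g kg⁻¹, that the adjustment is applied only where at least one flash is observed, and, as a simplification, that the model value is directly replaced by the equation value rather than relaxed toward it over a nudging time scale. The function names are hypothetical and are not WRF routines.

```python
import math

# Coefficients of the nudging equation used in this study
A, B, C, D, ALPHA = 0.95, 0.07, 0.25, 0.25, 2.2


def target_qv(q_s, q_g, x):
    """q_v = A q_s + B q_s tanh(C x) [1 - tanh(D q_g**alpha)].

    q_s: saturation mixing ratio at the model temperature (g/kg, assumed)
    q_g: model graupel mixing ratio (g/kg), clipped at zero
    x:   number of flashes in the grid cell during the 10 min interval
    """
    graupel_term = 1.0 - math.tanh(D * max(q_g, 0.0) ** ALPHA)
    return A * q_s + B * q_s * math.tanh(C * x) * graupel_term


def nudge_qv(q_v, q_s, q_g, x):
    """Adjusted model mixing ratio at one grid point inside the LDA layer.

    Simplification: the model value is replaced (not relaxed) by the
    target value, and only where lightning is observed and the target
    exceeds the model value, so moisture is only ever added.
    """
    if x <= 0:
        return q_v
    return max(q_v, target_qv(q_s, q_g, x))


if __name__ == "__main__":
    # Dry grid point (5 g/kg) under intense lightning, no graupel simulated
    print(round(nudge_qv(5.0, 10.0, 0.0, 100), 3))  # 10.2
    # Same point when the model already produces abundant graupel
    print(round(nudge_qv(5.0, 10.0, 10.0, 100), 3))  # 9.5
```

Where no flash is observed, the model value is returned unchanged; elsewhere the model can only be moistened, never dried.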
Setting A = 0.95 in Equation (1) gives a water vapor mixing ratio that is saturated or close to saturation, depending on the flash rate and the graupel mixing ratio. When lightning is observed, the atmospheric column is saturated or close to saturation between the lifting condensation level (LCL) and −20 °C, the latter being the top of the charging zone [29]. For this reason, we change the modeled mixing ratio only when it is necessary to bring it close to saturation. In Fierro et al. [29], this substitution is done between 0 °C and −20 °C. However, recent studies [18,82] have employed a vertical level below 0 °C as the lower level for LDA; in particular, Fierro et al. [18] used the LCL. When 0 °C is used, the level at which the water vapor adjustment starts can be too high, resulting in worse performance than with the LCL. In this study, the assimilation is performed between the LCL and −20 °C, and LDA is performed every 10 min within the assimilation period (the first 6 h of each run).
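The vertical extent of the adjustment can be sketched as a selection of model levels. The paper does not state how the LCL is diagnosed, so this sketch assumes the common approximation of about 125 m of LCL height per kelvin of near-surface dewpoint depression; it also assumes that levels are ordered from the bottom up and stops at the first level colder than −20 °C. All names are hypothetical.

```python
T_TOP_K = 253.15     # -20 °C, top of the charging zone
LCL_M_PER_K = 125.0  # assumed: ~125 m of LCL height per K of dewpoint depression


def lcl_height_m(t_k, td_k, z_sfc_m=0.0):
    """Approximate LCL height (m) from near-surface temperature and dewpoint (K)."""
    return z_sfc_m + LCL_M_PER_K * max(0.0, t_k - td_k)


def lda_levels(z_m, t_k, z_lcl_m):
    """Indices of model levels from the LCL up to the -20 °C isotherm.

    z_m, t_k: level heights (m) and temperatures (K), ordered bottom-up.
    The scan stops at the first level colder than -20 °C.
    """
    levels = []
    for k, (z, t) in enumerate(zip(z_m, t_k)):
        if t < T_TOP_K:
            break
        if z >= z_lcl_m:
            levels.append(k)
    return levels


if __name__ == "__main__":
    z = [100.0, 600.0, 1500.0, 3000.0, 5000.0, 7000.0, 9000.0]
    t = [290.0, 287.0, 281.0, 272.0, 260.0, 248.0, 235.0]
    print(lda_levels(z, t, lcl_height_m(290.0, 284.0)))  # [2, 3, 4]
```

With updates every 10 min over the 6 h assimilation window, this selection (and the water vapor adjustment) would be evaluated 36 times per run.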
The idea behind this setting is to maximize the impact of LDA on the forecast. Comellas Prat et al. [30], using a similar WRF setting, showed that choosing A = 0.85 (B = 0.17, i.e., A + B = 1.02) resulted in worse performance than A = 0.95 (B = 0.07), the latter giving more weight to LDA. We performed similar tests for some cases of November 2019 (not shown), and the results reveal that A = 0.95 (B = 0.07) gives a better precipitation forecast than A = 0.85 (B = 0.17). Values of A greater than 0.95 give even more weight to LDA, but the model becomes numerically unstable for some cases. For these reasons, A = 0.95 (B = 0.07) has been used in this study. The coefficients C, D, and α were set by Fierro et al. [29] after careful research; a different setting of these values would require a large number of simulations and is beyond the scope of this paper.
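The effect of choosing A = 0.95 rather than A = 0.85 can be read from the range of values that the nudging equation, q_v = A q_s + B q_s tanh(CX)[1 − tanh(D q_g^α)], can impose. For X > 0 and q_g ≥ 0, both tanh-based factors lie between 0 and 1, which bounds the target value:

```latex
\begin{aligned}
&0 < \tanh(CX) < 1, \qquad 0 < 1 - \tanh\!\left(D\,q_g^{\alpha}\right) \le 1
\;\Longrightarrow\; A\,q_s < q_v < (A + B)\,q_s, \\
&A = 0.95,\ B = 0.07: \quad 0.95\,q_s < q_v < 1.02\,q_s, \\
&A = 0.85,\ B = 0.17: \quad 0.85\,q_s < q_v < 1.02\,q_s .
\end{aligned}
```

Both settings share the same upper limit, so they differ mainly in the lower limit: with A = 0.95, any column where lightning is observed is brought to at least 95% of saturation in the adjusted layer, which is one way to understand why this setting gives more weight to LDA.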
In this work, WRF was set up with 42 vertical levels, while in a similar study [30] we used 60 vertical levels. Increasing the number of levels would likely improve the forecast but would also increase the computational cost; the number of vertical levels used here is a compromise between forecast accuracy and computational resources. To give an idea of the vertical distribution of the WRF levels, there are between 7 and 15 levels in the 0 °C to −20 °C layer, depending on the date.

Pearson Correlation Coefficient, Anomaly Correlation, and Taylor Diagram
In this section, we compare the WRF 3 h forecast with the corresponding IMERG precipitation. For this purpose, the WRF 3 h forecasts for the whole period were regridded onto the IMERG 0.1° × 0.1° grid with a remapping procedure that conserves, to a desired degree of accuracy, the total precipitation forecast on the WRF native grid [83,84].
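The remapping of [83,84] is not detailed here. As an illustration of the principle, a first-order conservative remap weights each source cell by its overlap area with the destination cell. This sketch assumes axis-aligned rectilinear cells and planar overlap areas (no spherical geometry, no map projection), whereas the real WRF grid lies on a map projection; the function names are hypothetical.

```python
def overlap_lengths(src_edges, dst_edges):
    """w[j][i] = length of the overlap between dst cell j and src cell i (1D)."""
    return [
        [max(0.0, min(d1, s1) - max(d0, s0))
         for s0, s1 in zip(src_edges[:-1], src_edges[1:])]
        for d0, d1 in zip(dst_edges[:-1], dst_edges[1:])
    ]


def conservative_remap(field, src_x, src_y, dst_x, dst_y):
    """First-order conservative remap of field[iy][ix] between rectilinear grids.

    src_x, src_y, dst_x, dst_y are increasing cell-edge coordinates.
    Each destination value is the overlap-area-weighted mean of the source
    cells it intersects, so sum(value * area) is preserved wherever the
    destination grid is fully covered by the source grid.
    """
    wx = overlap_lengths(src_x, dst_x)
    wy = overlap_lengths(src_y, dst_y)
    out = []
    for wy_row in wy:
        row = []
        for wx_row in wx:
            num = area = 0.0
            for iy, ly in enumerate(wy_row):
                if ly == 0.0:
                    continue
                for ix, lx in enumerate(wx_row):
                    if lx > 0.0:
                        num += field[iy][ix] * lx * ly
                        area += lx * ly
            row.append(num / area if area > 0.0 else float("nan"))
        out.append(row)
    return out
```

For example, remapping the row [1, 2, 3, 4] from 3-unit cells onto 4-unit cells gives [1.25, 2.5, 3.75], and the total (value times cell width) stays 30 on both grids.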
The average precipitation over the whole period is shown in Figure 2. Comparing CTRL and LIGHT, the precipitation increase given by LDA, both over the sea and over land, is apparent. This increase is larger over the Tyrrhenian Sea, especially close to the Italian mainland, and over the northern and southern Adriatic Sea. The IMERG dataset shows higher precipitation values over the sea than CTRL and LIGHT, revealing a rainfall underestimation by both model configurations. This underestimation is reduced in LIGHT, showing the positive contribution of LDA to the short-term precipitation forecast. Even if better than CTRL, LIGHT substantially underestimates the rainfall compared to IMERG, especially over the Tyrrhenian Sea (west of Sardinia) and over the Ionian Sea. The opposite behavior is observed over land, where CTRL and LIGHT show larger precipitation than IMERG. In particular, the IMERG dataset does not represent the impact of orography on precipitation (western and north-eastern Italian mainland), which is evident in both CTRL and LIGHT.
The Pearson correlation coefficient was computed for each grid point of the domain:

$$ r = \frac{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)\left(o_i - \bar{o}\right)}{\sqrt{\sum_{i=1}^{N}\left(y_i - \bar{y}\right)^{2}}\,\sqrt{\sum_{i=1}^{N}\left(o_i - \bar{o}\right)^{2}}} \qquad (2) $$

In Equation (2), y_i is the 3 h precipitation forecast, o_i is the corresponding observation given by the IMERG dataset, N is the total number of data for each grid point (N = 232), and ȳ and ō are, respectively, the forecast and observed averages for each grid point.
The correlation results are shown in Figure 3. The result for CTRL (Figure 3a) shows a good performance of the WRF forecast over Italy and the central Mediterranean, because for most of the grid points (more than 98% of a total of 27,602 grid points, for both CTRL and LIGHT), the correlation between the forecast and the observed precipitation is positive. About 1.5% of the grid points have a negative correlation coefficient for both LIGHT and CTRL.
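The per-grid-point Pearson correlation and the domain statistics used in this section (fraction of positive correlations, number of grid points above a threshold, domain-average correlation) can be computed as sketched below. It assumes that each grid point carries two equally long time series and that grid points with zero variance (for example, points where it never rained) are treated as undefined and excluded; the paper does not specify this. The function names are hypothetical.

```python
import math


def pearson(y, o):
    """Pearson correlation between forecast series y and observed series o."""
    n = len(y)
    if n != len(o) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    y_bar, o_bar = sum(y) / n, sum(o) / n
    cov = sum((yi - y_bar) * (oi - o_bar) for yi, oi in zip(y, o))
    var_y = sum((yi - y_bar) ** 2 for yi in y)
    var_o = sum((oi - o_bar) ** 2 for oi in o)
    if var_y == 0.0 or var_o == 0.0:
        return math.nan  # undefined, e.g. a point where it never rained
    return cov / math.sqrt(var_y * var_o)


def correlation_summary(forecasts, observations, threshold=0.5):
    """Per-point correlations and domain statistics.

    forecasts, observations: dicts mapping a grid-point key to its time
    series of 3 h precipitation (N = 232 values per point in this study).
    Undefined correlations are excluded from the statistics (an assumption).
    """
    r = {k: pearson(forecasts[k], observations[k]) for k in forecasts}
    valid = [v for v in r.values() if not math.isnan(v)]
    return {
        "r": r,
        "fraction_positive": sum(v > 0.0 for v in valid) / len(valid),
        "count_above_threshold": sum(v > threshold for v in valid),
        "domain_mean": sum(valid) / len(valid),
    }
```

Running the summary once for CTRL and once for LIGHT against the same regridded IMERG series gives directly comparable counts and averages.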
Performance is improved by LDA (Figure 3b). A comparison of Figure 3a,b shows that LIGHT performs better, especially over the Adriatic and Ionian Seas. The differences between the LIGHT and CTRL Pearson correlations (Figure 3c) help to visualize these results: the improvement of the LIGHT forecast over CTRL is apparent both over the sea and over land.
To highlight the added value of the LIGHT forecast compared to CTRL, we counted the grid points where the correlation coefficient exceeds a reference threshold of 0.5. For the CTRL forecast, the correlation coefficient is above 0.5 for 5476 grid points, while for the LIGHT forecast, it is above the 0.5 threshold for 8067 grid points. The same behavior is found for different correlation thresholds (Table 1). In addition, the average of the correlation coefficient over the domain of Figure 3 is 0.59 for CTRL and 0.67 for LIGHT.
The anomaly correlation (AC) [85] is a measure of association that operates on pairs of grid-point values in the forecast and observed fields; it is designed to reward good forecasts of the pattern of the observed field, with less sensitivity to the correct magnitudes of the field variable.
The centered version of the AC was computed:

$$ \mathrm{AC} = \frac{\sum_{m=1}^{M}\left(y'_m - \overline{y'}\right)\left(o'_m - \overline{o'}\right)}{\sqrt{\sum_{m=1}^{M}\left(y'_m - \overline{y'}\right)^{2}}\,\sqrt{\sum_{m=1}^{M}\left(o'_m - \overline{o'}\right)^{2}}}, \qquad y'_m = y_m - c_m, \quad o'_m = o_m - c_m \qquad (3) $$

where c_m is the monthly average of the 3 h precipitation of the IMERG dataset at grid point m, y_m and o_m are the WRF and IMERG 3 h precipitation at that grid point, y'_m and o'_m are their anomalies with respect to c_m, and M is the total number of land or sea grid points, depending on whether the AC is computed for land or for sea. Overbars are averages over the M grid points. The averages of the AC for CTRL and LIGHT over the whole period for the 3 h precipitation forecast are shown in Table 2. Positive values of the AC show the ability of WRF to represent the observed spatial pattern of precipitation. LDA improves the WRF performance over both sea and land, with an increase in the AC of 8% and 5%, respectively. The AC is higher for land grid points, showing a better performance of WRF over land. The performance gap between land and sea is reduced when LDA is applied, because LDA has a greater impact over the sea.
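A sketch of the centered AC for a single forecast time: it is the Pearson correlation, taken over the M grid points, between the anomalies y'_m = y_m − c_m and o'_m = o_m − c_m. It assumes that forecast, observation, and climatology are given as equally long lists over the land (or sea) points; the function name is hypothetical.

```python
import math


def anomaly_correlation(y, o, c):
    """Centered anomaly correlation for one forecast time.

    y, o: forecast and observed 3 h precipitation at the M land (or sea) points
    c:    climatology at the same points (IMERG monthly mean of 3 h precipitation)
    """
    if not (len(y) == len(o) == len(c)) or len(y) < 2:
        raise ValueError("y, o, c must have the same length >= 2")
    ya = [yi - ci for yi, ci in zip(y, c)]  # forecast anomalies y'_m
    oa = [oi - ci for oi, ci in zip(o, c)]  # observed anomalies o'_m
    m = len(ya)
    ya_bar, oa_bar = sum(ya) / m, sum(oa) / m
    num = sum((p - ya_bar) * (q - oa_bar) for p, q in zip(ya, oa))
    den = math.sqrt(sum((p - ya_bar) ** 2 for p in ya)
                    * sum((q - oa_bar) ** 2 for q in oa))
    return num / den if den > 0.0 else math.nan
```

Averaging the defined values over all 3 h forecast times of the period would give period means of the kind reported in Table 2.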
The AC is positive and shows a good performance of WRF for most of the considered days, both for CTRL and for LIGHT. Values vary considerably among different times, being in some cases low and sometimes negative over land.
The AC is improved by LDA, as shown by its larger values for LIGHT with respect to CTRL. This improvement is larger until 20 November. From 20 to 30 November, a lower number of convective events occurred over Italy and, therefore, the impact of LDA is lower.
The impact of LDA is larger over the sea than over land. This result is partially determined by the lower performance of the WRF forecast over the sea, as confirmed by the average values of the AC over land and sea (Table 2 and Figure 4). Other factors, however, could contribute to this result. The smoothing of the precipitation field given by the regridding applied to the WRF forecast can penalize the impact of LDA over land, where, because of the complex orography of Italy, small-scale thunderstorms develop due to local forcing conditions. These thunderstorms are likely smaller than those developing over the sea, and regridding can impact unevenly the performance of LDA over land and sea.
The time series of the AC over land and sea are shown in Figure 4. The AC is positive and indicates a good performance of WRF on most of the considered days, for both CTRL and LIGHT. Values vary considerably over time; they are low in some cases and occasionally negative over land.
The AC is improved by LDA, as shown by its larger values for LIGHT than for CTRL. This improvement is larger until 20 November. From 20 to 30 November, fewer convective events occurred over Italy and, therefore, the impact of LDA is smaller.
The impact of LDA is larger over the sea than over land. This result is partially due to the lower performance of the WRF forecast over the sea, as confirmed by the average AC values over land and sea (Table 2 and Figure 4). Other factors, however, could contribute to this result. The smoothing of the precipitation field caused by regridding the WRF forecast can penalize the impact of LDA over land, where, because of the complex orography of Italy, small-scale thunderstorms develop under local forcing. These thunderstorms are likely smaller than those developing over the sea, so regridding can affect the performance of LDA unevenly over land and sea.
The Taylor diagram was computed for the whole period for land and sea grid points and is shown in Figure 5. The correlation between IMERG observations and the WRF forecast is improved by LDA both over land and over the sea. Moreover, the precipitation error is reduced when flashes are assimilated, because the LIGHT points are closer to the observation point on the x axis (coordinates (1,0)). This improvement is larger over the sea than over land. Figure 5 shows that the observed standard deviation is well represented by WRF over land, while it is underestimated over the sea. This may be caused not only by an underestimation of the standard deviation of the WRF precipitation field but also by an overestimation of the precipitation standard deviation in the IMERG dataset. The IMERG Final Run dataset is calibrated using the GPCC monthly gauge analysis (https://www.dwd.de/EN/ourservices/gpcc/gpcc.html; last access on 12 February 2021), which is available mostly over land. The intense precipitation gradients across the coastline (see Figure 2 or Section 3.3) point to a possible overestimation of the IMERG precipitation over the sea, because the values over land are corrected by the calibration with rain gauge measurements.
Indeed, other studies [86] have reported that the uncalibrated IMERG product overestimates precipitation with respect to the calibrated product. Larger rainfall values usually yield a larger precipitation standard deviation, which could partially explain the WRF underestimation of the precipitation standard deviation over the sea.
The intense precipitation gradients across the coastline are not present when the (uncalibrated) IMERG Early Run is considered, and we have noticed this difference in other case studies as well. In this regard, we plan to perform a comparison with other precipitation products (e.g., microwave (MW)-based products provided within the EUMETSAT H SAF project, http://hsaf.meteoam.it; last access on 12 February 2021) that will become operational in the near future and that will provide daily precipitation rates over a regular grid.
Overall, the above results indicate that LDA improves the representation of the rainfall field over both land and sea.
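The reading of the Taylor diagram in Figure 5 rests on a geometric identity: a model point plotted at radius σf/σr and azimuth arccos R lies at a normalized distance E'/σr from the observation point (1,0), where E' is the centered root-mean-square difference, because E'² = σf² + σr² - 2σfσrR. The Python sketch below computes these statistics for paired forecast and observed values. It assumes population standard deviations and normalization by the observed standard deviation σr, which may differ from the exact settings used to draw Figure 5; the function names and sample values are hypothetical.

```python
import math
from typing import Sequence, Tuple


def taylor_statistics(
    forecast: Sequence[float], reference: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (R, sigma_f/sigma_r, E'/sigma_r) for a normalized Taylor diagram.

    R is the Pearson correlation, sigma_f and sigma_r are population standard
    deviations, and E' is the centered root-mean-square difference.
    """
    n = len(reference)
    f_bar = sum(forecast) / n
    r_bar = sum(reference) / n
    df = [f - f_bar for f in forecast]
    dr = [r - r_bar for r in reference]
    sigma_f = math.sqrt(sum(d * d for d in df) / n)
    sigma_r = math.sqrt(sum(d * d for d in dr) / n)
    corr = sum(a * b for a, b in zip(df, dr)) / (n * sigma_f * sigma_r)
    # Centered RMS difference: mean biases removed before differencing
    e_centered = math.sqrt(sum((a - b) ** 2 for a, b in zip(df, dr)) / n)
    return corr, sigma_f / sigma_r, e_centered / sigma_r


def diagram_point(corr: float, sigma_ratio: float) -> Tuple[float, float]:
    """Cartesian position of a model point; the observations plot at (1, 0)."""
    theta = math.acos(max(-1.0, min(1.0, corr)))  # azimuth = arccos(R)
    return sigma_ratio * math.cos(theta), sigma_ratio * math.sin(theta)


# Hypothetical paired values (forecast vs. observation) at a few grid points
fc = [1.0, 3.0, 2.0, 5.0, 4.0]
ob = [2.0, 3.0, 1.0, 6.0, 3.0]
corr, s, e = taylor_statistics(fc, ob)
x, y = diagram_point(corr, s)
```

A point closer to (1,0) therefore means a smaller centered error, and a radius below 1 means the forecast varies less than the observations, which is how the underestimated standard deviation over the sea shows up in the diagram.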

Impact on Small Islands
To further explore the impact of LDA over the sea, we consider the performance of WRF, both with and without lightning data assimilation, over six small Italian islands in the central Mediterranean Sea that have rain gauges. Rain gauge data come from the precipitation database of the Italian Department of Civil Protection (DPC), which includes more than 3000 rain gauges over Italy [8]. The selected islands are Elba, Giglio, Lipari, Montecristo, Pantelleria, and Ponza, and their locations are shown in Figure 1.
The performance of the 3 h precipitation forecast of the model is evaluated using RMSE (Root Mean Square Error) and the Pearson correlation coefficient (r).
To obtain the WRF rainfall corresponding to a rain gauge, the nearest-neighborhood method is used. More specifically, we consider the four grid points surrounding the rain gauge and select the one whose predicted value best agrees with the observation. This method tolerates a spatial error of the predicted precipitation field of up to ∆x√2 ≈ 4.2 km, where ∆x = 3 km is the WRF grid spacing. The nearest-neighborhood method is often used in the literature [61,87]. It yields higher model scores, at the price of tolerating a spatial error in the precipitation forecast (4.2 km in our case).
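The gauge matching and the two scores can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the forecast is taken to lie on a regular grid with spacing dx in projected kilometre coordinates (the map projection of the WRF grid is not handled), "best agreement" is interpreted as the smallest absolute difference from the gauge value, and all function names are hypothetical.

```python
import math
from typing import Sequence

GRID_SPACING_KM = 3.0
# Largest distance between a gauge and the selected corner: the cell diagonal
MAX_SPATIAL_ERROR_KM = GRID_SPACING_KM * math.sqrt(2.0)  # about 4.2 km


def best_of_four(
    field: Sequence[Sequence[float]],
    x0: float,
    y0: float,
    dx: float,
    gauge_x: float,
    gauge_y: float,
    gauge_value: float,
) -> float:
    """Forecast value, among the four grid points surrounding the gauge,
    closest to the gauge observation.

    field[j][i] is the forecast at (x0 + i * dx, y0 + j * dx), in km.
    """
    i = math.floor((gauge_x - x0) / dx)
    j = math.floor((gauge_y - y0) / dx)
    if not (0 <= j < len(field) - 1 and 0 <= i < len(field[0]) - 1):
        raise ValueError("gauge is not inside the interior of the grid")
    # The four corners of the grid cell that contains the gauge
    corners = [field[j + dj][i + di] for dj in (0, 1) for di in (0, 1)]
    return min(corners, key=lambda v: abs(v - gauge_value))


def rmse(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Root mean square error of matched forecast/observation pairs."""
    return math.sqrt(
        sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)
    )


def pearson_r(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Pearson correlation coefficient of matched pairs."""
    n = len(observed)
    f_bar = sum(forecast) / n
    o_bar = sum(observed) / n
    cov = sum((f - f_bar) * (o - o_bar) for f, o in zip(forecast, observed))
    var_f = sum((f - f_bar) ** 2 for f in forecast)
    var_o = sum((o - o_bar) ** 2 for o in observed)
    return cov / math.sqrt(var_f * var_o)
```

Applied to all matched 3 h values of the month at one island, `rmse` and `pearson_r` give scores of the kind listed in Table 3. Choosing the corner that best matches the observation is what raises the scores: it builds in a tolerance of up to one cell diagonal.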
Statistics are presented for the 3 h precipitation forecast for the whole period. For clearer visualization, daily rainfall time series for the Lipari and Elba islands are also shown; daily (24 h) precipitation is obtained by summing the eight 3 h rainfall values of the same day. Table 3 shows the RMSE and correlation coefficient of the WRF 3 h rainfall forecast for the six islands. LDA improves both the RMSE and r for all islands except Ponza. The improvement given by LDA is larger for Elba and Lipari and smaller for Giglio, showing a notable variability from one island to another. An inspection of the forecast for Ponza shows that LIGHT overestimates the precipitation on 11 November: about 55 mm/day (20 mm/day) is forecast by LIGHT (CTRL), while the observed value is 10 mm/day. This results in a worse performance of LIGHT compared to CTRL for Ponza. The time series of observed and forecast daily precipitation for the CTRL and LIGHT simulations are shown in Figure 6, and the corresponding statistics are shown in Table 4.

Considering the time series for Lipari (Figure 6a), LIGHT produces fewer false alarms than CTRL. This is especially evident on 6 and 11 November, when the LIGHT precipitation forecast is closer to the observed amount (about 15 mm/day and 28 mm/day, respectively). For Elba, there are days when LDA increases the rainfall forecast, better matching the observations (3, 9, and 24 November), and days when LDA reduces the false alarms (12, 15, and 17 November).
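The distinction between fewer false alarms and more hits can be made explicit with a 2 x 2 contingency table on the daily series. In the sketch below, daily totals are formed by summing eight consecutive 3 h values, as described above; the 1 mm/day rain/no-rain threshold and the function names are hypothetical choices for illustration, since the text does not specify a threshold.

```python
from typing import List, Sequence, Tuple

STEPS_PER_DAY = 8  # eight 3 h intervals make one day


def daily_totals(three_hourly: Sequence[float]) -> List[float]:
    """Sum each block of eight consecutive 3 h values into a daily total."""
    if len(three_hourly) % STEPS_PER_DAY:
        raise ValueError("series length must be a multiple of eight")
    return [
        sum(three_hourly[k : k + STEPS_PER_DAY])
        for k in range(0, len(three_hourly), STEPS_PER_DAY)
    ]


def contingency(
    forecast: Sequence[float], observed: Sequence[float], threshold: float = 1.0
) -> Tuple[int, int, int, int]:
    """Count (hits, false_alarms, misses, correct_negatives).

    A day counts as rainy when its total is >= threshold (mm/day); the
    1 mm/day default is a hypothetical choice.
    """
    hits = false_alarms = misses = correct_negatives = 0
    for f, o in zip(forecast, observed):
        forecast_rain, observed_rain = f >= threshold, o >= threshold
        if forecast_rain and observed_rain:
            hits += 1
        elif forecast_rain:
            false_alarms += 1
        elif observed_rain:
            misses += 1
        else:
            correct_negatives += 1
    return hits, false_alarms, misses, correct_negatives
```

Comparing the counts for CTRL and LIGHT at each island would show whether LDA acts mainly by removing false alarms, as at Lipari, or by adding hits, as on some days at Elba.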

A Case Study: 12 November 2019
The purpose of this section is to show the impact of LDA on the 3 h precipitation forecast over the sea for a specific case. The event occurred on 12 November 2019, and we focus on the 3 h precipitation between 00 and 03 UTC. For this case study, the WRF model native grid is considered, with a horizontal grid spacing of 3 km, and no remapping onto the IMERG grid is performed.
In this time period, precipitation is located mainly over the southern part of the Apulia region (around 17.5° E, 40.35° N), especially on the Salentine Peninsula, as recorded by rain gauges managed by DPC (Figure 7a).
GPM-IMERG precipitation (Figure 7b) is intense in some regions, in particular over the Ionian Sea. Moderate to intense precipitation is observed over the Adriatic Sea, and the IMERG estimate is in good agreement with rain gauge records on the Adriatic coast. As noted in Section 3.1 when discussing the Taylor diagram (Figure 5), the IMERG rainfall changes markedly from sea to land, with smaller values over land (due to calibration). This precipitation gradient is sometimes observed in the IMERG dataset for the period considered in this paper and suggests that the rainfall over the central Mediterranean Sea may be overestimated by IMERG.
The CTRL run (Figure 7c) simulates intense precipitation over the eastern Calabria region (about 16.5-17° E and 38-40° N) and over the central part of the Apulia region, north of Salento. This intense precipitation over Calabria is observed neither by rain gauges nor by satellite. Furthermore, the CTRL run forecasts intense precipitation over land, while the IMERG dataset shows heavy precipitation over the Ionian Sea. Overall, for this 3 h period, the CTRL run places the precipitation pattern to the west of the observed rainfall.
Results for the LIGHT run (Figure 7d) show a significant improvement in the precipitation forecast: the precipitation pattern over Calabria is shifted eastward, over the Ionian Sea. This precipitation shift reduces the rainfall amount over Calabria, in better agreement with both rain gauge and satellite observations, and increases the precipitation over the Ionian Sea, in better agreement with the IMERG dataset (although IMERG precipitation is likely overestimated).
Both CTRL and LIGHT runs give a reasonable representation of the precipitation maximum between northern Sardinia and the Italian peninsula (about 11° E, 41.5° N). It is interesting to note that the LIGHT forecast shows a second maximum over the Tyrrhenian Sea, at about 14.5° E and 39.5° N, which is not forecast by the CTRL run. This maximum is evident in the IMERG precipitation.
In summary, the IMERG dataset (Figure 7b) shows a precipitation maximum over the Ionian Sea. The rainfall pattern corresponding to this maximum is better depicted by the LIGHT run compared to CTRL. In addition, LDA gives improvements over the Tyrrhenian Sea.

Conclusions
In this paper, we considered the impact of lightning data assimilation on the 3 h precipitation forecast over the central Mediterranean Sea. To achieve this goal, a 3 h rapid update cycle with lightning data assimilation was set up for the WRF model for the month of November 2019. This month was chosen because it was characterized by several precipitation events over Italy and the central Mediterranean area. A total of 232 three-hour rainfall forecasts were considered for two different configurations of the WRF model, CTRL and LIGHT, the latter assimilating lightning data from the LINET network. To verify the model's performance, two different datasets were used: the IMERG gridded dataset at 0.1° horizontal resolution and six rain gauges on six small Italian islands in the central Mediterranean Sea.
LDA improves the precipitation forecast over the sea for the analyzed period. A comparison of the WRF forecast with the IMERG dataset clearly shows the positive impact of LDA, especially over the Ionian and Adriatic Seas.
The anomaly correlation shows a good ability of WRF to predict rainfall both over land and sea, with few exceptions. LDA improves the forecast, especially for the first 20 days of the month, when convective events occurred more frequently compared to the last part of the month. The impact of LDA is larger over the sea than over land. This is confirmed by the analysis of the Taylor diagram.
The Taylor diagram analysis, however, shows that the WRF model underestimates the precipitation standard deviation over the sea, which could be partially explained by a possible overestimation of the rainfall in the IMERG dataset over the central Mediterranean Sea for the period selected in this paper.
The comparison between the WRF forecast and the rainfall recorded on the small islands reveals a positive impact of LDA for all islands except Ponza. Both the reduction in false alarms and the increase in correct forecasts contribute to the overall improvement in performance.
The case study of 12 November for the period 00-03 UTC shows how important LDA can be for specific cases. The CTRL forecast predicted the precipitation patterns a few tens of kilometers west of the real occurrence. This shift, however, has a notable impact on the precipitation forecast (double penalty error) because heavy precipitation is predicted over Calabria, where small amounts are recorded by rain gauges (false alarm), while the precipitation over the Ionian Sea is missed by the CTRL forecast (miss). Both errors are considerably reduced by LDA.
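The double penalty can be illustrated with a toy one-dimensional transect (hypothetical values, not data from this study): a rain cell forecast with the right intensity but displaced a few grid points to the west produces both misses and false alarms when scored point by point, and therefore receives a larger RMSE than a forecast with no rain at all. The function names and the 1 mm threshold are assumptions made for the sketch.

```python
import math
from typing import List, Sequence, Tuple


def point_scores(
    forecast: Sequence[float], observed: Sequence[float], threshold: float = 1.0
) -> Tuple[int, int, int]:
    """Point-by-point (hits, misses, false_alarms) for rain >= threshold."""
    hits = misses = false_alarms = 0
    for f, o in zip(forecast, observed):
        if f >= threshold and o >= threshold:
            hits += 1
        elif o >= threshold:
            misses += 1
        elif f >= threshold:
            false_alarms += 1
    return hits, misses, false_alarms


def rmse(forecast: Sequence[float], observed: Sequence[float]) -> float:
    return math.sqrt(
        sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(observed)
    )


# Hypothetical west-to-east transect of 12 grid points (mm per 3 h)
observed: List[float] = [0.0] * 12
for k in (5, 6, 7):
    observed[k] = 10.0                    # observed cell, over the sea
displaced = observed[3:] + [0.0] * 3      # same cell, 3 points to the west
dry = [0.0] * 12                          # forecast with no rain at all

print(point_scores(displaced, observed))  # (0, 3, 3): misses AND false alarms
print(point_scores(dry, observed))        # (0, 3, 0): misses only
print(rmse(displaced, observed), rmse(dry, observed))  # about 7.07 vs 5.0
```

Scores that tolerate small displacements, such as the best-of-four matching used for the island gauges, partly relieve this penalty; the toy numbers only illustrate the mechanism and are not taken from Figure 7.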
While the aim of the paper is to show the impact of LDA over the sea, the forecast was also verified over land with the IMERG dataset. Results show a positive impact of LDA over the Italian peninsula, in agreement with the results already found by the same authors in past studies with WRF and RAMS@ISAC numerical weather prediction models [8,30,55].
The findings of this paper are of practical importance for two main reasons. First, the rainfall forecast over the sea is important for navigation, fishing, and many other sea-related activities; second, over Italy, and more generally over the Mediterranean region, convection can be generated or can evolve over the sea and then be advected over land, sometimes causing high-impact weather. A better representation of convection over the sea can help to improve the forecast of these events and to better predict their effects on the coastal regions, which can often be devastating.
While the results of this paper are encouraging, the seasonal variability of the Mediterranean climate requires further analysis to show the impact of LDA over the sea in different seasons. A dedicated study to evaluate the impact of LDA for convective events originating over the sea and advected over land is also necessary. These events are partially observed by radars, and LDA can contribute significantly to the improvement of precipitation forecast.
It is important to note that this paper limits its analysis to LDA. This likely causes an overestimation of the impact of LDA on the precipitation forecast, because no additional observations are assimilated in this study. The assimilation of other observations over land (e.g., radar, GNSS-ZTD (Global Navigation Satellite System Zenith Total Delay), SYNOP) would be propagated over the sea by the model, while the assimilation of observations over the sea (e.g., SAR (Synthetic Aperture Radar), water vapor maps) would contribute directly to the precipitation forecast over the sea. Future studies, considering the assimilation of other observations in addition to LDA, will quantify more precisely the contribution of LDA to the improvement of the precipitation forecast over the central Mediterranean Sea. Another limitation of this study is that the impact of LDA is evaluated only for precipitation. The positive impact of LDA on the prediction of other surface parameters has been shown in some papers [17,21,88] and will be considered in future studies.