Overview of the New Version 3 NASA Micro-Pulse Lidar Network (MPLNET) Automatic Precipitation Detection Algorithm

Precipitation modifies atmospheric column thermodynamics through the process of evaporation and serves as a proxy for latent heat modulation. For this reason, a correct precipitation parameterization (especially for low-intensity precipitation) within global scale models is crucial. In addition to improving our modeling of the hydrological cycle, this will reduce the associated uncertainty of global climate models in correctly forecasting future scenarios, and will enable the application of mitigation strategies. In this manuscript we present a proof of concept algorithm to automatically detect precipitation from lidar measurements obtained from the National Aeronautics and Space Administration Micropulse lidar network (MPLNET). The algorithm, once tested and validated against other remote sensing instruments, will be operationally implemented into the network to deliver a near real time (latency <1.5 h) rain masking variable that will be publicly available on MPLNET website as part of the new Version 3 data products. The methodology, based on an image processing technique, detects only light precipitation events (defined by intensity and duration) such as light rain, drizzle, and virga. During heavy rain events, the lidar signal is completely extinguished after a few meters in the precipitation or it is unusable because of water accumulated on the receiver optics. Results from the algorithm, in addition to filling a gap in light rain, drizzle, and virga detection by radars, are of particular interest for the scientific community as they help to fully characterize the aerosol cycle, from emission to deposition, as precipitation is a crucial meteorological phenomenon accelerating atmospheric aerosol removal through the scavenging effect. Algorithm results will also help the understanding of long term aerosol–cloud interactions, exploiting the multi-year database from several MPLNET permanent observational sites across the globe. The algorithm is also applicable to other lidar and/or ceilometer network infrastructures in the framework of the Global Aerosol Watch (GAW) aerosol lidar observation network (GALION). Remote Sens. 2020, 12, 71; doi:10.3390/rs12010071 www.mdpi.com/journal/remotesensing Remote Sens. 2020, 12, 71 2 of 16


Introduction
Human life is strongly dependent on the water cycle [1]. In particular, precipitation is a key-player in pairing the Earth-atmosphere water and energy cycle, through modulating atmospheric column latent heat and affecting cloudiness and cloud lifetime. For this reason, long-term precipitation datasets are needed to analyze spatial and temporal trends and variability, especially at the global scale [2]. In the last two decades, thanks to the internet, a ground-based network of instruments has started to develop and measure important climate-related variables [3], including columnar and atmospheric profiles of aerosol optical and micro-physical properties through passive and active optical sensors (i.e., sunphotometers and lidars). Nevertheless, elastic [4] multi-wavelength and Doppler lidar observations containing raining events are usually unjustifiably disregarded in standard monitor activities, even if light rain events are clearly detectable on lidar data [5].
Some recent studies show that a correct precipitation parameterization [6] will drastically improve global climate models to forecast future scenarios that can help define the best mitigation and adaptation strategies. Further, precipitation studies are crucial to assessing aerosol indirect and semi-direct effects, since aerosols influence both cloud formation and precipitation that in turn removes aerosols from the atmosphere by scavenging effect. Isolated case studies using lidar data (together with ancillary instrumentation) to quantitatively assess the atmospheric profile of precipitation micro-physical and optical characteristics are shown in [5,[7][8][9][10]. Nevertheless, these efforts, due to their intrinsic complexity, are not suitable to be operationally implemented in a network of instruments.
Several studies showing the development of aerosol [11,12] and cloud [13] masking algorithms exist, but, to our knowledge, none demonstrate automatically detecting light rain events using lidar observations. In this paper we present a proof-of-concept rain masking algorithm and report results of an intercomparison with a disdrometer to prove the efficacy of the algorithm in detecting light rain, drizzle, and virga events from lidar observations. The algorithm, once extensively tested and validated also against other remote sensing instruments (i.e., high-frequency radars) will be implemented in the National Administration and Space Agency (NASA) Micropulse lidar network (MPLNET https://mplnet.gsfc.nasa.gov/; see Section 3) and will provide, when available, a new complimentary rain mask variable that can be used either as the starting point to further investigating scientifically interesting-precipitation cases (i.e., to assess their optical and micro-physical properties, [5,[7][8][9]) or simply to better characterize precipitation patterns and its variability at different spatial scales.
The developed algorithm, based on image processing techniques, applies morphological filters on composite plots of the Volume Depolarization Ratio (VDR) variable, as defined in [14] and this algorithm will permit MPLNET to fill the gap left by the joint NASA and Japan Aerospace Exploration Agency (JAXA) missions, as the Tropical Rainfall Measuring Mission (TRMM) followed by the Global Precipitation Measurement (GPM) [15], in detecting low intensity precipitation [5], especially at mid and higher latitudes [16]. Our paper is outlined as follows: in Section 3 we describe in detail the NASA MPLNET network and its products used as input by the rain masking algorithm. Section 4 shows the algorithm flowchart and all the different phases from input to output are carefully described. Section 5 reports the algorithm intercomparison and validation through co-located ground-based observations by disdrometer, while in Section 6 discussion and future perspectives are reported.

Materials and Methods
The lidar full dataset is publicly available at NASA MPLNET website (https://mplnet.gsfc.nasa. gov), while the 18 events of disdrometer data are available upon request.

The MicroPulse Lidar Network
The NASA MPLNET network [17], active since 1999, is a federated network of commercially-available Micropulse lidar (MPL) systems [18], produced by LEICA Geosystems, Lanham, MD, USA (formerly SigmaSpace). The instruments, active optical devices that have developed since CO 2 laser invention [19], are globally deployed to support the NASA Earth Observing System (EOS) program [20]. The MPLNET lidar network continuously monitors the atmosphere every 60 s, from the surface up to 30 km with a software-adjustable vertically-resolved spatial resolution (depending on the station, 0.030-0.075 km), under any meteorological condition and to the limit of laser signal attenuation. Both temporary and permanent observational sites are globally deployed, and are located at polar, mid-latitudes, tropical, and equatorial regions to retrieve the aerosol [21][22][23], cloud optical [24,25], and geometrical properties together with their radiative effects. The single-wavelength MPL lidar system, is co-located, when possible, together with the NASA Aerosol Robotic Network (AERONET; [26]) sunphotometer to reduce error in retrievals [27]. MPLNET products are freely and publicly available at the MPLNET website, which follows the modified EOS convention as: Level 1, Level 1.5, and Level 2, are all available in near real time (NRT). The only difference between L1 and L15 products are that data failing to meet the L15 Quality Assurance (QA) criteria are screened and replaced with Not a Number (NaN) in the files. The primary difference between L2 and L1/15 files is that L2 may have additional post-calibrations applied as well as corrections to instrument temperatures.
Since 2017, MPLNET has fully integrated polarized MPL systems into the network, which provides information about particle shape. Each instrument relies on the collection of two-channel measurements (i.e., the signal measure P co (z) and P cr (z)). A detailed description of the depolarization channel can be found in [14]. Even a half degree tilting of the lidar instrument with respect to the vertical direction, needed to avoid cirrus cloud specular reflection, is sufficient to substantially increase the VDR of the precipitation (even for spherical raindrops), as shown in [28]. This rain enhanced contrast in lidar VDR composite images facilitates its detection. Figure 1 shows a front descent on 27 March 2018 with multiple rainfall episodes showing higher VDR values (green bins). The start of the precipitation event is highlighted by red arrows. The proposed algorithm, uses, as the input composite image, the new Version 3 (V3) VDR variable, paired with the cloud mask [13] variable found in the L15 Normalized Relative Backscatter (NRB) and Cloud (CLD) data products [27,29], respectively. The cloud mask is an array of integer numbers where cloudy bins are labeled as 2, non-cloudy bins as 1, while bins with an indistinguishable signal-to-noise ratio are labeled as 4. For image-based detection techniques, the L15 NRB VDR variable is preferred to the L15 NRB variable (i.e., the backscattered energy by the atmosphere) as in the VDR variable composite images the rainfall bins show a higher volume depolarization ratio. This translates into higher contrast, as shown in Figure 2b.
It is clearly visible that, with respect to the strong red depolarizing structures (VDR>35%) (e.g., clouds containing ice), the signal assumes a well-defined rectangular shape that can be identified as rainfall. In contrast, during non-rain episodes, the signal does not assume a particular shape and the VDR shows lower values.

Processing Chain for Rain Detection
The proposed algorithm (flowchart shown in Figure 3) is based on image processing techniques. The algorithm processes VDR composite images from the MPLNET L15 NRB product. Then, the VDR composite plot is paired with the cloud masking variable found in L15 CLD product, as precipitation detection is uniquely carried out under the cloud base, given by the cloud mask [13] (natural rain does not exist under clear sky conditions). Thus, the proposed rain detection algorithm will analyze only the VDR bins under the clouds. Afterwards, these bins are preliminarily labeled as "rain" if, and only if, they are above a certain threshold value (0.07, see Section 4.2.1). Non-rainy bins are those either not topped by a cloud or with a VDR value below the threshold (0 < VDR < 0.07). This preliminary rain mask is used to estimate the parameters of the Laplace distributions under the hypotheses rain and non-rain and the rain a priori probability in order to use a maximum a posteriori (MAP) detector for estimating an accurate rain mask, see Section 4.2. Finally, a post-processing phase based on morphological operators is applied to reduce the image noise due to the signal extinction above the clouds and to remove any non-rectangular shaped detection, thus producing the final rain mask.
The algorithm is currently set-up to detect rain events for cloud bases at least 400 m above the surface and for rainfall episodes that last a minimum of 7 min. As a final step, all the detected rainfall events without any physical meaning are filtered out (i.e., precipitation not originating from the cloud base).

Maximum a Posterior Detector
To detect rainfall events, we use the previously described volume depolarization ratio composite image, which can be represented in a vectorial form as VDR = [VDR 1 , . . . , VDR i , . . . , VDR n ] with VDR i ∈ R + , where n is the total number of bins. We also denote c = [c 1 , . . . , c i , . . . , c n ], as the vector of the labels in the set C = {non-rain, rain}, where rain means that a generic VDR i is classified as "rain", otherwise it belongs to the class "non-rain". The detection problem is formalized into the Bayesian framework. Hence, the minimum Bayesian risk is achieved by the maximum a posterior probability (MAP) rule [30], i.e., we have where p(VDR | c) is the likelihood and p(c) represents thea priori probability. The likelihood p(VDR | c) in Equation (1) can be simplified by the additional assumption of conditional independence among data. This yields its factorized form as follows Thus, given c = [ c 1 , . . . , c i , . . . , c n ], the problem of its maximization can be simply solved by maximizing each term, i.e., for each pixel i ∈ [1, . . . , n], we have: This leads to a binary hypothesis test that can be written as where the likelihood ratio , VDR i is classified as rain, otherwise VDR i is associated to the class non-rain.
Under the assumption of Laplace distributed data for both the hypotheses (which have been experimentally validated, see Figure 4), the solution for each VDR i is where log is the natural logarithm, µ r and b r are the location and scale parameters, respectively, which characterize the Laplace distribution under the rain hypothesis and µ nr and b nr are the related parameters for the non-rain hypothesis.
(a) (b) Figure 4. Cumulative distribution functions (CDFs) for (a) non-rain VDR data and (b) rain VDR data. The empirical CDFs are the estimated CDFs from the data, i.e., considering non-rain data in (a) and rain data in (b).

Parameter Estimation
In order to estimate all the parameters in Equation (5), we exploit the lidar image under analysis and we find a first rough detection map, i.e., roughly identifying the rain bins. Thus, we apply a thresholding of the image's bins to solve the rain detection problem. The used threshold has been experimentally set to 0.07. After this preliminary step, we use the data labeled either as rain or as non-rain to estimate the parameters for both classes. Thus, given that n r independent and identically distributed samples belonging to a class rain, obtained selecting in VDR the rain bins, i.e., the ones detected as rain by the above-mentioned thresholding approach, we have that VDR r = [VDR r 1 , . . . , VDR r i , . . . , VDR r N ]. In order to estimate the values of the a priori probability for both the classes, we count all these bins, which belong to the class rain in the rough detection map.
Hence, if count(·) is the counter operator, p(c i = rain) = count(VDR r ) n . p(c i = non − rain) is simply equal to 1 − p(c i = rain).
The other parameters to be estimated are the ones related to the Laplace distribution. Focusing on the problem of the parameter estimations for the Laplace distribution under the class rain (all the considerations are equivalent even in the case of the estimations of the same parameters under the hypothesis of non-rain), the maximum likelihood estimators (MLEs) for µ r and b r are: • The MLE,μ r , of µ r isμ r = med(VDR r ), where med(·) is the sample median operator.

•
The MLE,b r , of b r is the mean absolute deviation from the median, i.e.,

Post-Processing Based on Morphological Operators
In this section we highlight and describe the post-processing step to raise the MAP detector performance presented in Section 4.2. Morphological operators basis are first described referring to their application to a generic image, I, see Section 4.3.1. Their use for the problem at hand is instead described in Section 4.3.2.

Basics of Morphological Operators
An image I : E ⊆ Z 2 → V ⊆ Z is analyzed by the morphological operators through the so-called structuring element (SE) B [31], which can be defined through its spatial support N B (x), i.e., the neighborhood with respect to the position x ∈ E in which B is centered, and by its values.
where S and S are the infimum and supremum values within the set S, respectively. The erosion (respectively, dilation) application has a filtering effect that suppresses bright (respectively, dark) regions smaller than B and the enlargement of dark (respectively, bright) ones. For bright and dark regions we mean that the local contrast in a certain region has intensity values respectively greater or lower with respect to the surrounding ones. Erosion and dilation operators can be recast into minimum and maximum operators on B, respectively, if I is a binary image. We also introduce, for convenience, the opening and closing that correspond to the two possible sequential compositions of erosion and dilation: withB denoting the SE obtained by reflecting B with respect to its origin. A closing removes dark regions smaller than B while an opening suppresses bright ones. For further details about morphological operators, the interesting readers can refer to the related literature [31].

Use of Morphological Operators for Rain Detection Post-Processing
Two kinds of post-processing based on morphological operators are applied to the rain detection map coming from the MAP detector described in Section 4.2. In particular, given the detection map (rain/non-rain) in matrix form, i.e., C, we first apply a low-pass morphological filter to remove salt and pepper noise into C to achieve a new low-pass version image denoted as C LP . The low-pass morphological filter can be formulated by sequentially applying a closing operator and an opening operator with the same structuring element B 1 , i.e., where B 1 is a disk structuring element with a radius experimentally set to 4.
Last, but not least, the post-processing operation is based on an opening operator with a rectangular structuring element B 2 , i.e., where C PP is the final map after the morphological post-processing. This last processing is performed to delete detected objects that do not follow a rectangular shape with some constraints about the minimal size. In particular, the minimal sizes of the sides of the rectangle are tunable parameters that are selected to 7 (i.e., 7 min) that is the minimal temporal resolution to consider that is raining (i.e., we are not able to detect rains that last less than 7 min), and about 2.67 for the other side. This value takes into account of the minimal vertical dimension of the rains and coming from the fact that the data has a spatial resolution of 75 m, thus, we have a minimum vertical dimension of the rains equal to 200 m. Therefore, only these kinds of detected objects have the right spatial features to be considered rain. These parameters have been tuned for this specific problem and type of data.
Obviously, considering lidar data with different features will lead to a new tuning of parameters in order to achieve high performance.

Intercomparison with Ground-Based Disdrometer Measurements
The algorithm results are compared against rain intensity measurements obtained from a co-located disdrometer, an in-situ measurement device designed to measure the drop size distribution (DSD) [10], represented as the number of drops per unit of volume and per unit of raindrop diameter. Disdrometers can be based on different measurement principles (high-speed cameras, Doppler effect, laser-optical, impact, etc.). The second generation Parsivel (Parsivel 2 ) laser-optical disdrometer manufactured by OTT [32] is used in this work. Parsivel systems were originally developed by PM Tech Inc., Germany. The instrument has a laser diode (emitting wavelength of 780 nm) generating a horizontal flat beam with a measurement area of 54 cm 2 .
The disdrometer principle of operation is based on laser technology. When a hydro-meteor passes through a volume uniformly illuminated by a laser beam , it produces a temporal attenuation proportional to its size with a duration depending on its fall speed. A relationship between the laser beam occlusion by the falling particle is applied to estimate the particle size. Parsivel instruments can measure particle diameters up to about 25 mm classifying them in 32 size classes of different widths. The instrument also estimates the hydro-meteor fall velocity by measuring the time necessary for the particle to pass through the laser beam, and thus stores particles in 32 × 32 matrices. The disdrometers high temporal resolution (60 s for this work) permits study in great detail of physical precipitation variability.

Rain Detection Algorithm Working under Simpler and Complex Meteorological Conditions
In this section we show how the rain masking algorithm works in different meteorological conditions, i.e., for light and stronger precipitation intensities. The comparison results are carried out at the NASA Goddard Space Flight Center (GSFC) MPLNET permanent observational site (Lat: 38.9 N, Lon: 76.3 W, Alt: 50 m a.s.l.), where measurement data from a co-located Parsivel 2 disdrometer are also available. The rain masking algorithm can also detect the so-called virga streaks ( Figure 5), a kind of precipitation that, due to the strong evaporation, will never reach the ground [5]. As can be easily understood, virga episodes cannot be detected by the disdrometers. The algorithm retrieves precipitation only on blue regions topped by clouds (cyan bins). Middle 2: New rain mask product (rain plotted in yellow). Lower: Co-located precipitation intensity measured by disdrometer. The green rectangular shapes help in visualizing the detected rain events.

A Simple Case: 22 April 2016
As described in Section 4.2, the rain detection algorithm first step consists in pairing VDR composite images with the L15 cloud masking variable. Then, a first guess of rain probability is produced only for the VDR signal above a certain threshold and below a cloud base, i.e., on deep blue regions topped by cyan cloud regions of upper middle plots of Figure 5. The detection in this case does not show any critical aspects. The cloud base is never below 400 m and the precipitation intensity is very low, i.e., 0.25 mm h −1 on daily average. Those intensities cannot completely attenuate the lidar signal. Two virga streaks are detected by the algorithm in the second half of the day. The retrieved disdrometer rain rate ( Figure 5; bottom), shows very low values, with a maximum of 0.76 mm h −1 at 1701 UTC. Globally, there is a partial agreement between the disdrometer and the rain masking algorithm: After 1815 UTC rain intensity drops so much that the disdrometer is unable to detect the precipitation, while after 2200 UTC, the rain masking algorithm fails to detect the rainfall up to the ground. Precipitation events from lidar data are then necessary to fill a gap in detecting very low precipitation intensity events (<0.05 mm h −1 ) that are crucial to study the aerosol effects and interactions on clouds and rainfall [33].

Intermittent Rain: 12 April 2016
The meteorological situation is more complex with respect to the previous case. In the upper composite plot (the volume depolarization ratio) of Figure 6 we can notice the transit of a front over the observational site with a progressive descent of the cloud base from 0000 UTC to about 0530 UTC. The precipitation is intermittent and at least four events are detected by the algorithm. The rainfall intensity is still low, as shown by in-situ disdrometer observations ( Figure 6): the average intensity during the day is 0.9 mm h −1 , with a peak of 12 mm h −1 around 1300 UTC that lasts only less than 5 min. The precipitation event at 0600 UTC in the rain mask is not detected by the disdrometer as the precipitation intensity is below its sensitivity (0.05 mm h −1 ) . During this day, the precipitation is more intense with respect to the the previous two cases, with an average intensity recorded, from 0000 UTC to 0700 UTC, of about 3.7 mm h −1 . Higher peaks of rain rates of about 8.8 mm h −1 , lasting more than 30 min, completely soak the telescope and receiver optics making it impossible to perform any further detection by the algorithm, as shown in Figure 7. The disdrometer shows that precipitation, even with a lower intensity, lasted almost the whole day. But after 0600 UTC the cloud base drops below 400 m making the detection by the algorithm impossible. This rainfall event highlights the limits of the lidar technology in detecting precipitation under very low signal to noise ratio [34]. From Figure 7 it is possible to fix a detection limit threshold depending on rain intensity and duration, i.e., rain rate > 8 mm h −1 lasting > 30 min.

Overall Intercomparison
The rain detection algorithm has been intercompared with ground-based disdrometer observations over 7 days between November 2015 and April 2016. The intercomparison, as stated before, was carried out at NASA Goddard Space Flight Center. Table 1 summarizes all the precipitation events, together with their lengths used, for the intercomparison.
As shown in Table 1, the algorithm performances are tested vs. the disdrometer observations. The detected precipitation events by the rain masking algorithm and the ground based disdrometers have different sensitivities, i.e., the lidar instrument can detect rainfall episodes not reaching the ground while the disdrometer can not detect intensities lower than 0.05 mm h −1 . On the contrary, higher intensity precipitation (at least 30 min with a rain rate > 8 mm h −1 , see Figure 7) can not be detected by the rain algorithm as the lidar signal is completely extinguished. It is important to stress that the rain masking algorithm has a 100% (14/14) success rate in detecting disdrometer observed precipitation. A detection is considered successful if the disdrometer observation and the algorithm detection share at least one minute in common, independently of the total precipitation duration, which can be different for the reasons previously explained.
Analysis from Table 1, shows that the algorithm is more sensitive at detecting lower intensity precipitation by 22% (4/18). If we examine the bias with respect to precipitation start and stop (see Table 1, fifth column), we found that the detection algorithm precedes the disdrometer observations by about 9 min, while the opposite is true in detection of the precipitation end, where the disdrometer lasts 5.5 min longer. This can be partially attributed to the water soaking the lidar telescope window with a consequent fully attenuated signal. Table 1. Precipitation events observed by Parsivel 2 at NASA GSFC used to validate the rain detection algorithm. The three values reported in the fifth column are the differences in minutes between disdrometer and algorithm in detecting precipitation. The first value represents the difference in minutes with respect to the precipitation start, the second one the same but for the precipitation end, while the third represents the difference in minutes on overall precipitation duration. Overall, the disdrometer measured 1084 min of precipitation vs. 1095 min of detected precipitation by the algorithm. We also calculated the root mean square error (RMSE) with respect to the precipitation global duration as defined in [4], and we found that the average absolute difference in precipitation total duration detected by the algorithm and measured by the disdrometer is of about 10.23 min. Considering the different characteristics of the ground-based instrument and the detection algorithm, the agreement is within reason.

Discussion and Conclusions
Automated networks of instruments started to develop in the last two decades aiming to continuously monitor crucial atmospheric physical, thermodynamic, geometrical, and optical variables. Among them, the NASA MicroPulse Lidar NETwork (MPLNET), active since 1999, has globally deployed more than 21 worldwide observational sites in the tropics, mid-latitudes, and polar regions in both hemispheres to automatically retrieve 24/24 the geometrical and optical properties of aerosol and cloud atmospheric profiles under any meteorological conditions. Despite that lidar has proven to be very effective in detecting especially light precipitation and drizzle, lidar data containing precipitation episodes are currently unjustifiably disregarded. As a proof of concept, in this study we developed a rain masking algorithm, based on the volume depolarization ratio variable, which is proven to be effective in detecting light rain, drizzle, and virga episodes. Once rigorously validated and operationally implemented into the NASA MPLNET lidar network, the rain masking algorithm will consistently help in understanding how light precipitation contributes to cloud formation and will fill a gap left by TRMM and GPM missions in detecting low intensity rainfall episodes. This is crucial to improving global climate model forecasts and for aerosol-clouds and in turn, precipitation interactions. Finally, as future development, the algorithm will be also tested on simpler elastic lidar instruments without the depolarization channel, i.e., the ceilometers, to assess the rain detection feasibility. In more detail, precipitation is a fundamental meteorological phenomenon that is the principal responsible for atmospheric aerosol removal. Analyzing a large database of lidar measurements will help in fully characterizing the aerosol cycle, from emission to deposition, and validate global model observations that show a strong negative correlation between Aerosol Optical Depth (AOD) and precipitation due to wet scavenging [35]. A synergy between both passive and active satellite NASA missions, i.e., the Moderate Resolution Imager Spectrometer (MODIS; [36] and the Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observation (CALIPSO; [37]), and the ground based lidar networks, such as the European Aerosol LIdar NETwork (EARLINET; [38]) part of the Aerosols, Clouds and Trace gases Research Infrastructure (ACTRIS http://www.actris.eu) and in North America as MPLNET, and more in general in the frame of Global Aerosol Watch (GAW) aerosol lidar observation network (GALION; [39]) will strongly contribute to quantitatively assess how the above cloud aerosol load influences clouds and then rainfalls. The synergy will then assess the "all-sky" aerosol contribution to clouds and precipitation.
The image-based technique methodology used in developing the proposed algorithm, will be tested in a future work over different instruments, i.e., ceilometers, where precipitation still looks like a higher-contrasted feature in the range corrected backscattered energy.