Measurements of Rainfall Rate, Drop Size Distribution, and Variability at Middle and Higher Latitudes: Application to the Combined DPR-GMI Algorithm

: The Global Precipitation Measurement mission is a major U.S.–Japan joint mission to understand the physics of the Earth’s global precipitation as a key component of its weather, climate, and hydrological systems. The core satellite carries a dual-precipitation radar and an advanced microwave imager which provide measurements to retrieve the drop size distribution (DSD) and rain rates using a Combined Radar-Radiometer Algorithm (CORRA). Our objective is to validate key assumptions and parameterizations in CORRA and enable improved estimation of precipitation products, especially in the middle-to-higher latitudes in both hemispheres. The DSD parameters and statistical relationships between DSD parameters and radar measurements are a central part of the rainfall retrieval algorithm, which is complicated by regimes where DSD measurements are abysmally sparse (over the open ocean). In view of this, we have assembled optical disdrometer datasets gathered by research vessels, ground stations, and aircrafts to simulate radar observables and validate the scattering lookup tables used in CORRA. The joint use of all DSD datasets spans a large range of drop concentrations and characteristic drop diameters. The scaling normalization of DSDs deﬁnes an intercept parameter N W , which normalizes the concentrations, and a scaling diameter D m , which compresses or stretches the diameter coordinate axis. A major ﬁnding of this study is that a single relationship between N W and D m , on average, uniﬁes all datasets included, from stratocumulus to heavier rainfall regimes. A comparison with the N W –D m relation used as a constraint in versions 6 and 7 of CORRA highlights the scope for improvement of rainfall retrievals for small drops (D m < 1 mm) and large drops (D m > 2 mm). The normalized speciﬁc attenuation– reﬂectivity relationships used in the combined algorithm are also found to match well the equivalent relationships derived using DSDs from the three datasets, suggesting that the currently assumed lookup tables are not a major source of uncertainty in the combined algorithm rainfall estimates.


Introduction
Two of the main instruments onboard the Global Precipitation Measurement (GPM) satellite are the Dual Precipitation Radar (DPR) and the GPM-Microwave-Imager (GMI). The retrieval algorithms for DPR and combined DPR-GMI (referred in the abstract as CORRA, (henceforth referred to as the Combined Algorithm or CMB) have adopted the scaled-normalized drop size distribution (hereafter referred to as the DSD) formulation of Testud et al., Bringi et al., and Lee et al. [1][2][3], respectively, as the basis for the rainfall retrieval procedure. The advantage, as pointed out numerous times in the literature, is that the inverse relation between N W and D m , which has been verified in numerous GPM-GV field programs [26,27]. In the new formulation, D m is first estimated without accounting for the relation between N W and D m , and, in a second step, N W is nudged towards an N W -D m climatological relationship. Retrieval of the parameter, mainly N W , using the state-of-the-art NASA GPM-GV polarimetric radar, is not entirely satisfactory [28].
In view of these recent CMB algorithmic developments, our main goal in this study is to compare the statistical properties of DSD parameters and statistical relationships between radar observables and DSD parameters using our measured DSDs with assumptions currently held in the CMB algorithm. The focus of our study is on characterizing such properties in order to complement recent studies where such properties and relationships have been derived for a variety of other different rain types [22,23,27,29].
Here, we investigate unconditionally and jointly two main DSD parameters (the normalized intercept N W and the mass-weighted mean diameter D m ) of rain drop size distributions from three different rain regimes (a fourth separate regime is the outer rain bands of organized oceanic systems after landfall). Our motivation is driven by the need to better understand and mitigate uncertainties in the GPM radar-radiometer precipitation combined algorithm.
This paper is organized as follows: Section 2 describes the DSD datasets collected by various instruments in different locations and climates. Section 3 describes the data analysis procedures and results, including the statistical relationships between DSD parameters as well as scattering simulations of radar reflectivities and specific attenuations at ku and ka bands. These are compared with the lookup tables used by the CMB algorithm, which are based on the normalized gamma DSD model. Section 4 focuses on the correlation between N W and D m and provides a discussion on the nudging of N W and its impact on the retrievals. We conclude with a discussion of our results and summarize the key points in Section 5.

Instrumentation and Data Collection
Below is a brief summary of the measured DSD sets included in this study (and DSDs from simulated gamma with uncorrelated parameters): (a) Measured DSDs from semi-arid Greeley (GXY), Colorado, USA, and sub-tropical Huntsville (HSV), Alabama, USA, are collated together to form about 2928 3-min averaged DSDs. At each site the 2DVD and MPS instruments were placed inside a 2/3-scaled DFIR (Double Fence Intercomparison Reference windshield [30]). An identical instrument suite was recently installed at the Wallops Precipitation Research Facility (henceforth WFF), to represent a mid-latitude coastal location. Since all three sites had identical instruments, the comparison between them would not have the uncertainty and other complications of using different sensors. The data quality procedures typically follow Schoenhuber et al. [7,8], with some caveats noted by [31]. (b) The NCAR C-130 was operated off the coast of Chile [32] equipped with a 'fast' 1-s 2D-C probe in stratocumulus drizzle (warm rain). The total number of 1-s DSDs was 4142, all quality controlled (J. Jensen, NCAR, personal communication). (c) A very large (arguably the largest ever quality controlled) set of DSDs acquired over the open ocean (OceanRain) described by Klepp et al. [33] and Protat et al. [22,23]. (d) Simulations of gamma DSDs with uncorrelated N W , D m , and shape parameter (µ). (e) The outer rain bands of: (a) Category-1 Hurricane Dorian, described by Thurai et al. [34] and modeled using a cloud particle model by Bringi et al. [35], which traversed the WFF disdrometer network site for ≈8 h; (b) tropical storm Irma (<14 h) near the Huntsville site; (c) tropical depression Nate, which was very shallow at times with negligible echo above the melting layer and 'pure' warm rain at times (overall <16 h) near Huntsville. Figure 1 shows the locations marked as WFF and HSV. The outer rainbands were typically stratiform in nature and occurred in the down shear left quadrant. The reason for including these DSDs is because the dynamics to be very different from the stratiform rain produced by mesoscale convective complexes.
A global location map of the regions where data (a-c) were collected is depicted in Figure 1, and Table 1 gives some brief information relating to the type of instruments, references to the data quality, and other salient features. As can be easily seen in Figure 1, the OceanRain project with an ODM-470 optical disdrometer 'hardened' for shipborne deployment (sited on top of the mast, with a wind vane to orient the instrument perpendicular to the main flow) has the largest number of 1-min DSD data collected during the voyages of the Australian Research Vessel Investigator from 2016 to 2018 covering the middle-to-upper latitudes of the S hemisphere.  A global location map of the regions where data (a-c) were collected is depicted in Figure 1, and Table 1 gives some brief information relating to the type of instruments, references to the data quality, and other salient features. As can be easily seen in Figure 1, the OceanRain project with an ODM-470 optical disdrometer 'hardened' for shipborne deployment (sited on top of the mast, with a wind vane to orient the instrument perpendicular to the main flow) has the largest number of 1-min DSD data collected during the voyages of the Australian Research Vessel Investigator from 2016 to 2018 covering the middle-to-upper latitudes of the S hemisphere. Table 1. Brief summary of the ground-based/OceanRain and C-130 instrumentation.

Instruments Number of DSDs Location
Ground-based (green, blue, and orange '+' marks in Figure 1 We followed the methodology of Lee et al. [3] for the normalization of the DSDs. The reference moments chosen here are M 3 and M 4 . Recall that h(x) is defined via N(D) = N 0 h(x), where x = D/D m . The functional form of h(x) does not necessarily have to be specified as long as its moments are finite. A flexible form used by Thurai and Bringi [9], Protat et al. [22], and Duncan et al. [24] is the generalized gamma, which has two shape parameters. The N 0 is termed the 'normalized' intercept parameter and D m , defined as M 4 /M 3 , is termed the mass-weighted mean diameter. Figure 2 shows the double-moment normalization [3] applied to the combined Greeley (GXY) and Huntsville (HSV) datasets (scatter plot depicted as black-filled circle markers). The color contours depict the 2D histogram or density plot of h(x) using the 4412 1-s stratocumulus drizzle DSDs. The color bar shows the contours of log N. The maximum D m of the drizzle DSDs was about 0.5 mm (see Figure 3a later). Hence, to compare the h(x) between the drizzle and the GXY-HSV data, we chose to threshold the latter at D m = 0.5 mm for consistency. The colored contours of the density of h(x) for the drizzle are compact and the scatter of the GXY-HSV h(x) are noted to be well-confined by the contour at N = 100, which is indicative of h(x) stability across very different rainfall regimes, e.g., [9,22]. Also shown in Figure 2 is the h(x) for the gamma model with μ = 3. The agreement is quite good for x > 0.5, below which the h(x) drops off very rapidly, whereas when two disdrometers are used, such as MPS and 2DVD, the x < 0.5 region is shaped concave up (as in [36]). The total drop concentration in the gamma model with μ = 2 to 3 can be low by an order of magnitude, which will bias, for example, the rain water content and the Ka-band specific attenuation, which are linearly related (we are referring to below cloud base, thus excluding cloud droplets). This is bound to impact the CMB algorithm, where the Ka-band reflectivity and path integrated attenuation are simulated from the attenuation-corrected Ku-band radar data. The colored contours of the density of h(x) for the drizzle are compact and the scatter of the GXY-HSV h(x) are noted to be well-confined by the contour at N = 100, which is indicative of h(x) stability across very different rainfall regimes, e.g., [9,22]. Also shown in Figure 2 is the h(x) for the gamma model with µ = 3. The agreement is quite good for x > 0.5, below which the h(x) drops off very rapidly, whereas when two disdrometers are used, such as MPS and 2DVD, the x < 0.5 region is shaped concave up (as in [36]). The total drop concentration in the gamma model with µ = 2 to 3 can be low by an order of magnitude, which will bias, for example, the rain water content and the Ka-band specific attenuation, which are linearly related (we are referring to below cloud base, thus excluding cloud droplets). This is bound to impact the CMB algorithm, where the Ka-band reflectivity and path integrated attenuation are simulated from the attenuation-corrected Ku-band radar data.

Histograms of DSD Parameters
The histograms of the DSD parameters [log 10 (N W ); D m ] give valuable information when different rain regimes are compared. Note that these parameters can be computed for any measured DSD. Figure 3 panel (a) shows the histogram of D m and panel (c) shows log 10 N W from stratocumulus drizzle using a 'fast' 2D-cloud probe (on aircraft) with a 25 µm resolution. The shape o latter is quite symmetric about the mode which is high at 10 6 mm −1 m −3 . The aircraft also had a cloud droplet probe (CDP) to measure in-cloud size spectra. The stratocumulus clouds were warm, and both cloud droplets and drizzle were present. However, the size spectra from the 2D-C probe, when compared with the droplet spectra from the CDP, showed spectral separation at sizes of about 50 µm. Hence, we have no reason to believe that the log 10 N W histogram of drizzle in panel (c) is incorrect. The D m histogram of drizzle in panel (a) is very positively skewed and of unusual shape. Panels (b) and (c) of Figure 3 compare the histograms of D m and log N W from overland and over ocean. The differences in the shape, mode, and width indicate that oceanic DSDs from the Southern Hemisphere high latitudes are characterized by lower concentrations of slightly bigger drops than DSDs over land. The difference in log (N w ) is consistent with the latitudinal variability observed between oceanic DSDs from the Southern Hemisphere high-latitude and from the Northern Hemisphere mid-latitude bands discussed in [22]. However, no difference was observed between the oceanic D m in those two latitude bands. Therefore, the observed variability between the two datasets seems to be attributable to a mixture of latitudinal and land versus ocean variability. trations of slightly bigger drops than DSDs over land. The difference in log (Nw) is consistent with the latitudinal variability observed between oceanic DSDs from the Southern Hemisphere high-latitude and from the Northern Hemisphere mid-latitude bands discussed in [22]. However, no difference was observed between the oceanic Dm in those two latitude bands. Therefore, the observed variability between the two datasets seems to be attributable to a mixture of latitudinal and land versus ocean variability. Liao et al. [29] obtained histogram shapes and modal values very close to our GXY-HSV combined histogram. Their database had >216,000 1-min DSDs using Parsivel disdrometers from NASA Ground Validation field projects such as MC3E, IFLOODs, and Wallops, representing regimes corresponding to deep summer convection in central Oklahoma, springtime convection in Iowa, and mid-latitude coastal regimes, respec- Liao et al. [29] obtained histogram shapes and modal values very close to our GXY-HSV combined histogram. Their database had >216,000 1-min DSDs using Parsivel disdrometers from NASA Ground Validation field projects such as MC3E, IFLOODs, and Wallops, representing regimes corresponding to deep summer convection in central Oklahoma, springtime convection in Iowa, and mid-latitude coastal regimes, respectively.
It is clear that marine drizzle cannot be detected by the Ku-or Ka-band radars. However, the plot of Z ku versus D m (see Figure 4) from drizzle and GXY-HSV illustrates that perhaps the drizzle branch 'joins' the more intense rainfall region fairly continuously. The reflectivity of drizzle at Ku-band varies very sharply as D m increases (the drizzle is shown as a density plot) like a nearly 'vertical' tower, with D m approximately constant at around 0.2 mm, while N W increases with Z ku . This feature of very tight compression generally occurs only when Z ku is normalized by N W prior to plotting against D m . The scatter plot in Figure 4 is from GXY-HSV, which shows that increasing Z ku is accompanied by increasing D m , along with much higher variability.

NW Versus Dm
As discussed previously, the most important covariability (next to Dm versus R) of the DSD gamma parameters used in the CMB algorithm is between log(NW) and Dm, which enables one to choose the a priori state as well as to nudge the log(NW) in the 'correct' direction to minimize the cost function, as in the latest version of the CMB algorithm. Section 4 deals exclusively with this topic, so only the salient points are given here. Figure 5 shows the log(NW) versus Dm variations derived from the ground-based, OceanRain, and airborne DSDs. A few caveats (see Table 1) of the interpretation of this figure are: (a) the varying number of data points in each rainfall regime (locations); (b) the differences in instrument types and measurement volumes and if theoretical drop fall speeds or measured values are used; (c) no classification of rain type is done, e.g., stratiform or convective rain.

N W versus D m
As discussed previously, the most important covariability (next to D m versus R) of the DSD gamma parameters used in the CMB algorithm is between log(N W ) and D m , which enables one to choose the a priori state as well as to nudge the log(N W ) in the 'correct' direction to minimize the cost function, as in the latest version of the CMB algorithm. Section 4 deals exclusively with this topic, so only the salient points are given here. Figure 5 shows the log(N W ) versus D m variations derived from the ground-based, OceanRain, and airborne DSDs. A few caveats (see Table 1) of the interpretation of this figure are: (a) the varying number of data points in each rainfall regime (locations); (b) the differences in instrument types and measurement volumes and if theoretical drop fall speeds or measured values are used; (c) no classification of rain type is done, e.g., stratiform or convective rain. In spite of these caveats, the following points can be made: (a) From the high resolution (25 microns) 'fast' 2D cloud probe aircraft DSD data, we found that the drizzle Dm is generally <0.5 mm with NW spanning at least two orders of magnitude for any given Dm; (b) The NW-Dm points from GXY-HSV appear to smoothly merge with the drizzle data for Dm < 0.35 mm; (c) The mean power law fit from [22], from their OceanRain DSDs, shown as 'squares', is an excellent fit through the entire size range covered by all datasets (with the exception of drizzle; see Appendix A), despite the large variability of NW for any given Dm. This is a major finding of this paper.

The Relationship between Rain Rate and DSD Gamma Parameters
The rain rate (R) variability is not only due to variations in the parameters that describe the parametric form of the DSD but also the intrinsic or 'underlying' DSD shape termed h(x), which arises as a result of the scaling normalization framework [3,4]. If the h(x) is stable across different rain types and climatologies, it forms a strong constraint that can be used in satellite (and ground-based) radar rainfall retrievals. For example, if the parametric form of h(x) is gamma, then a constant shape factor (e.g., μ = 2 to 3) defines h(x), provided the small drop truncation is negligible. If the disdrometer cannot measure the small drop end accurately, the h(x) can be stable but incorrect. In any case, NW and Dm are the two unknown DSD parameters that need to be retrieved. The ideal scenario is the selection of the number and order of the reference moments, whereby the scatter in h(x) is greatly reduced. In practice, the integral quantities that need to be estimated are R and rain water content (W) and perhaps other moments along with their error statistics. The GPM mission has adopted the Testud et al. [1] formulation (i.e., the scaled-normalized form) of the DSD.
From a theoretical (normalized) gamma model with parameters (NW, Dm, and μ) along with the simplified fall speed relation v = 3.78 D 0.67 [37], where v is in ms −1 and D in mm, it follows from [1] that: = .
(2) In spite of these caveats, the following points can be made: (a) From the high resolution (25 microns) 'fast' 2D cloud probe aircraft DSD data, we found that the drizzle D m is generally <0.5 mm with N W spanning at least two orders of magnitude for any given D m ; (b) The N W -D m points from GXY-HSV appear to smoothly merge with the drizzle data for D m < 0.35 mm; (c) The mean power law fit from [22], from their OceanRain DSDs, shown as 'squares', is an excellent fit through the entire size range covered by all datasets (with the exception of drizzle; see Appendix A), despite the large variability of N W for any given D m . This is a major finding of this paper.

The Relationship between Rain Rate and DSD Gamma Parameters
The rain rate (R) variability is not only due to variations in the parameters that describe the parametric form of the DSD but also the intrinsic or 'underlying' DSD shape termed h(x), which arises as a result of the scaling normalization framework [3,4]. If the h(x) is stable across different rain types and climatologies, it forms a strong constraint that can be used in satellite (and ground-based) radar rainfall retrievals. For example, if the parametric form of h(x) is gamma, then a constant shape factor (e.g., µ = 2 to 3) defines h(x), provided the small drop truncation is negligible. If the disdrometer cannot measure the small drop end accurately, the h(x) can be stable but incorrect. In any case, N W and D m are the two unknown DSD parameters that need to be retrieved. The ideal scenario is the selection of the number and order of the reference moments, whereby the scatter in h(x) is greatly reduced. In practice, the integral quantities that need to be estimated are R and rain water content (W) and perhaps other moments along with their error statistics. The GPM mission has adopted the Testud et al. [1] formulation (i.e., the scaled-normalized form) of the DSD.
From a theoretical (normalized) gamma model with parameters (N W , D m , and µ) along with the simplified fall speed relation v = 3.78 D 0.67 [37], where v is in ms −1 and D in mm, it follows from [1] that: (2) Figure 6 shows the theoretical plot of α = f(µ), showing that α varies very slowly with µ in the range of −2 to 10. Figure 7 shows the variation in R/N W versus D m obtained by Remote Sens. 2021, 13, 2412 9 of 20 (a) simulating a gamma distribution with the parameters being uncorrelated from each other but uniformly distributed in the intervals 200 to 6 × 10 6 mm −1 m −3 , 0 to 4 mm, and −2 to 4 for N W , D m , and µ, respectively, and (b) using the theoretical equation with µ = 0, i.e., exponential DSD. The agreement between using (a) random gamma (i.e., with random parameters being uniform in their respective intervals) and (b) a fixed gamma with shape parameter 0 is a point to be noted. To the best of our knowledge, this point has not been made in the literature. A simple explanation is that N W cancels off in the ratio R/N W . The residual variability in R/N W is then due to (µ, D m ). However, in the random gamma simulations, there is no imposed correlation between log(N W ) and D m or R and D m . To complete this sub-section, Figure 8 compares R/N W versus D m from the measured DSDs from the 1-s marine stratocumulus, OceanRain, and GXY-HSV. The curves are practically identical, with differences in R/N W being of the order of <1% in log scale.
Remote Sens. 2021, 13, x FOR PEER REVIEW 10 of 24 Figure 6 shows the theoretical plot of α = f(μ), showing that α varies very slowly with μ in the range of −2 to 10. Figure 7 shows the variation in R/NW versus Dm obtained by (a) simulating a gamma distribution with the parameters being uncorrelated from each other but uniformly distributed in the intervals 200 to 6*10 6 mm −1 m −3 , 0 to 4 mm, and −2 to 4 for NW, Dm, and μ, respectively, and (b) using the theoretical equation with μ = 0, i.e., exponential DSD. The agreement between using (a) random gamma (i.e., with random parameters being uniform in their respective intervals) and (b) a fixed gamma with shape parameter 0 is a point to be noted. To the best of our knowledge, this point has not been made in the literature. A simple explanation is that NW cancels off in the ratio R/NW. The residual variability in R/NW is then due to (μ, Dm). However, in the random gamma simulations, there is no imposed correlation between log(NW) and Dm or R and Dm. To complete this sub-section, Figure 8 compares R/NW versus Dm from the measured DSDs from the 1-s marine stratocumulus, OceanRain, and GXY-HSV. The curves are practically identical, with differences in R/NW being of the order of <1% in log scale.

The Relation between Normalized Radar Quantities and Dm
The CMB algorithm uses measured Zku and the generalized Hitschfeld-Bordan algorithm [19], where the specific attenuation kku versus Zku relation is expressed as a power law, to derive the initial profile of Dm. In this context, Figure 9 is important as the DSD-based simulations of kku and Zku (using the T-matrix scattering code; [38]) and their ratio (from various regimes described in the legend) fall on an 'invariant' curve without the need for rain type classification. Note that we have added DSDs from the outer rain bands of tropical storms Nate and Irma (after landfall near Huntsville) and hurricane Dorian (as it traversed over Wallops [34,35]). These DSDs are primarily stratiform with bright-band and some embedded weak convection (in total covering > 25 h of rainfall), with Nate having the lowest R (shallow and warm rain at times). For 0.5 < Dm < 2.5 mm, their overlap with OceanRain and GXY-HSV is visually excellent. The most relevant is the excellent fit of the CMB look-up table values to the measured DSDs (black dashed line). The variability of kku/Zku for a given Dm is due to the numerator and denominator being proportional to different moments of the DSD (approximately M3 and M6).
We now turn our attention to the relation between R and radar observables, such as Zku, that have both been normalized by NW. This relation is used in the CMB retrieval algorithm, and therefore, it is important to compare the lookup tables used in the CMB with measured DSDs and our T-matrix scattering calculations. The lookup tables are based on theoretical gamma DSDs with the parameters Dm and μ being varied systematically (and NW set to 1 mm −1 m −3 ). Figure 10 shows the relation between R/NW and Zku/NW for the different regimes. The straight line fits are remarkably parallel to each other. Note that a straight line fit to log(R/NW) versus log[Zku/NW] gives the slope (β) and intercept (α) of a power law of the form R = α NW (1-β) Zku β . It certainly appears that a single relation, with mean α = 0.0014 and mean β = 0.6514, would be appropriate for all the DSDs considered. Table 2 lists the (α, β) values, including the values for the random gamma simulated DSDs. It can be shown that there is a close relationship between NW and the parameter ε described by Iguchi et al. [39].

The Relation between Normalized Radar Quantities and D m
The CMB algorithm uses measured Z ku and the generalized Hitschfeld-Bordan algorithm [19], where the specific attenuation k ku versus Z ku relation is expressed as a power law, to derive the initial profile of D m . In this context, Figure 9 is important as the DSDbased simulations of k ku and Z ku (using the T-matrix scattering code; [38]) and their ratio (from various regimes described in the legend) fall on an 'invariant' curve without the need for rain type classification. Note that we have added DSDs from the outer rain bands of tropical storms Nate and Irma (after landfall near Huntsville) and hurricane Dorian (as it traversed over Wallops [34,35]). These DSDs are primarily stratiform with bright-band and some embedded weak convection (in total covering > 25 h of rainfall), with Nate having the lowest R (shallow and warm rain at times). For 0.5 < D m < 2.5 mm, their overlap with OceanRain and GXY-HSV is visually excellent. The most relevant is the excellent fit of the CMB look-up table values to the measured DSDs (black dashed line). The variability of k ku /Z ku for a given D m is due to the numerator and denominator being proportional to different moments of the DSD (approximately M 3 and M 6 ).
We now turn our attention to the relation between R and radar observables, such as Z ku , that have both been normalized by N W . This relation is used in the CMB retrieval algorithm, and therefore, it is important to compare the lookup tables used in the CMB with measured DSDs and our T-matrix scattering calculations. The lookup tables are based on theoretical gamma DSDs with the parameters D m and µ being varied systematically (and N W set to 1 mm −1 m −3 ). Figure 10 shows the relation between R/N W and Z ku /N W for the different regimes. The straight line fits are remarkably parallel to each other. Note that a straight line fit to log(R/N W ) versus log[Z ku /N W ] gives the slope (β) and intercept (α) of a power law of the form R = α N W (1-β) Z ku β . It certainly appears that a single relation, with mean α = 0.0014 and mean β = 0.6514, would be appropriate for all the DSDs considered. Table 2 lists the (α, β) values, including the values for the random gamma simulated DSDs. It can be shown that there is a close relationship between N W and the parameter ε described by Iguchi et al. [39].  At Ka-band, shown in Figure 11, the relationship of normalized R with normalized Zka is bound to be affected by non-Rayleigh scattering, as indicated by the enhanced scatter for the OceanRain DSDs. Both the GXY-HSV and OceanRain show non-linear behavior in the log-log domain. The slopes vary depending on the DSD source, the stratocumulus drizzle showing the smallest β = 0.64, whereas the GXY-HSV, the random gamma, and the CMB table show that the corresponding β values are very close to 0.73. The CMB fit at Ka-band matches the observations in terms of the slope up to Ze/NW~ 1, but a more complex piecewise fitting would be required to better match the non-linearity    Figure 11. As in Figure 10, but for Ka-band.
Another important plot in the context of evaluating the current assumptions held in the CMB algorithm is k/NW vs. Ze/NW (see Figures 12 and 13), which, if linear in the log-log domain, will result in (as before) k = α NW (1-β) Ze β . The linearity between the various measured DSDs is not expected to be perfect, i.e., some non-linearity is to be expected at the drizzle end (where different moments are correlated: M6 versus M3) and at the upper end (non-Rayleigh scattering versus M3). This is confirmed in Figures 12 (for Ku-band) and 13 (for Ka-band). Tables 2 and 3 give the coefficient α and the exponent β for the four DSD regimes, as represented by the three datasets. The CMB table values are close to the others, except for the drizzle fits. Additionally, note that the non-linearity is more evident at Ku-band ( Figure 12) relative to Ka-band ( Figure 13). Atlas et al. [37] showed that at Ka-band, the specific attenuation is linear with rain rate but not at lower frequencies, which would account for the non-linearity at Ku-band. The 'best' match at Ku-band is between the measured DSDs from GXY-HSV and the CMB table. From the radar profiling perspective, normalized k-Ze relationships are crucial for attenuation correction. The CMB lookup tables are based on a set of theoretical gamma DSDs (and not measured DSDs), yet the slope and exponent from the CMB tables are in good agreement with measured DSDs (exceptions noted above). Note also from Table 3, that the fit to random gamma-simulated DSDs is nearly identical with the CMB lookup table values. At Ka-band, shown in Figure 11, the relationship of normalized R with normalized Z ka is bound to be affected by non-Rayleigh scattering, as indicated by the enhanced scatter for the OceanRain DSDs. Both the GXY-HSV and OceanRain show non-linear behavior in the log-log domain. The slopes vary depending on the DSD source, the stratocumulus drizzle showing the smallest β = 0.64, whereas the GXY-HSV, the random gamma, and the CMB table show that the corresponding β values are very close to 0.73. The CMB fit at Ka-band matches the observations in terms of the slope up to Ze/N W~1 , but a more complex piecewise fitting would be required to better match the non-linearity caused by non-Rayleigh scattering.
Another important plot in the context of evaluating the current assumptions held in the CMB algorithm is k/N W vs. Z e /N W (see Figures 12 and 13), which, if linear in the log-log domain, will result in (as before) k = α N W (1-β) Z e β . The linearity between the various measured DSDs is not expected to be perfect, i.e., some non-linearity is to be expected at the drizzle end (where different moments are correlated: M 6 versus M 3 ) and at the upper end (non-Rayleigh scattering versus M 3 ). This is confirmed in Figure 12 (for Ku-band) and Figure 13 (for Ka-band).  The non-linearity of k/NW versus Ze/NW, while introducing complications in the attenuation correction algorithm, does not fundamentally change the methodology. A major source of uncertainty in the attenuation correction process is obviously NW. The large variability of NW and the NW-Dm relationships in the same (or different) climate regimes was documented in the prior section for our datasets, but these are still limited and need to be expanded to reduce uncertainties. The first results of using the research version of the CMB algorithm seem to indicate that over the Southern Ocean, the CMB algorithm starts the retrieval process with an underestimated value of NW. This could potentially explain the large discrepancies reported between different satellite products at high southern hemisphere latitudes [17].   Tables 2 and 3 give the coefficient α and the exponent β for the four DSD regimes, as represented by the three datasets. The CMB table values are close to the others, except for the drizzle fits. Additionally, note that the non-linearity is more evident at Ku-band ( Figure 12) relative to Ka-band ( Figure 13). Atlas et al. [37] showed that at Ka-band, the specific attenuation is linear with rain rate but not at lower frequencies, which would account for the non-linearity at Ku-band. The 'best' match at Ku-band is between the measured DSDs from GXY-HSV and the CMB table. From the radar profiling perspective, normalized k-Z e relationships are crucial for attenuation correction. The CMB lookup tables are based on a set of theoretical gamma DSDs (and not measured DSDs), yet the slope and exponent from the CMB tables are in good agreement with measured DSDs (exceptions noted above). Note also from Table 3, that the fit to random gamma-simulated  DSDs is nearly identical with the CMB lookup table values.   Table 3. Fitted coefficient α and the exponent β, corresponding to Figures 12 and 13. The non-linearity of k/N W versus Z e /N W , while introducing complications in the attenuation correction algorithm, does not fundamentally change the methodology. A major source of uncertainty in the attenuation correction process is obviously N W . The large variability of N W and the N W -D m relationships in the same (or different) climate regimes was documented in the prior section for our datasets, but these are still limited and need to be expanded to reduce uncertainties. The first results of using the research version of the CMB algorithm seem to indicate that over the Southern Ocean, the CMB algorithm starts the retrieval process with an underestimated value of N W . This could potentially explain the large discrepancies reported between different satellite products at high southern hemisphere latitudes [17].
From the plots in Figures 10-13, a strong result, hitherto not reported in the literature, is that after normalization, the CMB tables are in very good agreement with observational data and rather surprisingly with random gamma DSDs as well. The marine drizzle DSDs cannot be detected by GPM sensors due to sensitivity limits at Ku-and Ka-bands. Furthermore, the biases and uncertainties in CMB retrievals could arise from the incorrect specification of the a priori state and how representative it is of the climatology of the retrieval regime. One useable constraint is the relation between log(N W ) and D m (see Figure 5), which on average is close to the fit from the OceanRain DSDs alone [22,23]. Notwithstanding the large spread due to different rain regimes, we have indirect evidence that N W -D m constraints are useful, but not fully proven as of yet, in improving the retrieval of the rain rate using the CMB Algorithm.

Application to the CMB Algorithm
As mentioned in the previous sections, the combined algorithm employs a nudging procedure to weakly enforce N W -D m climatological constraints in the retrievals. The reason why N W -D m climatological relationships are useful is that even when dual-frequency observations are available (i.e., within the matched swath before June 2018 and across the entire swath after the change in the DPR pattern scanning strategy), N W and D m cannot be uniquely and unambiguously estimated. Climatological N W -D m information improves the estimation process by eliminating estimates inconsistent with a priori observations. The N W -D m constraining procedure was introduced in Version 6 of the combined algorithm [25]. Specifically, the relation between N W and D m was derived from the GPM Validation Network (VN; [40]) polarimetric radar products and implemented in the algorithm in a two-step procedure. In the first step, a radar-only retrieval is performed, assuming a nominal N W value that is independent of D m . In the second step, the N W is nudged towards the value predicted by the VN relations for the D m derived in step 1. Specifically, N W is updated by an increment proportional to the difference between the value predicted by the VN relation and its first step value. In V6, the procedure is applied only to stratiform rain and the nudging coefficient is determined through trial and error. The nudging is not applied to convective precipitation, as the VN N W -D m relationships predicts N W values that tend to result in substantial underestimation of severe convection.
A two-dimensional N W -D m histogram (or density plot) derived from one month (August 2018) of V6 data is shown in Figure 14a. The average VN and OceanRain N W -D m relationships are also shown in the figure. As apparent in the figure, although the (color contoured) density plot of the retrieved N W -D m exhibits an inverse relationship qualitatively similar to those derived from direct observations, there are also significant discrepancies, especially for large D m values. Some of these discrepancies may be explained by the differences between the spatial scales of the satellite and ground products. However, notable differences exist between the VN and OceanRain fits as well. As the nudging coefficient was determined by trial and error to minimize biases with respect to the same ground reference products as those in [15], the use of the VN-based N W -D m relationship (which is highly biased for a small D m and less biased for a large D m with respect to the OceanRain relationship) did not result in increased biases for D m smaller than 0.7 mm and larger than 2.0 mm. This is also the reason the OceanRain fit (which is not used per se in the Nw adjustment) is in better agreement with the density plot of the V6 N W -D m data than the VN relationship (which is actually used in the adjustment). As the details of this research were still emerging at the time the NW-Dm nudging procedure was revisited in V7 of the combined algorithm, the VN NW-Dm rather than OceanRain curve (which is also consistent with the GXY-HSV dataset) was used. Nevertheless, several changes were introduced in the algorithm. Specifically, the nudging procedure was extended to convective precipitation as well. As the VN stratification by precipitation types did not result in significantly different NW-Dm curves (except for very large Dm), the same curve shape was used for both convective and stratiform. However, the convective curve was shifted upwards by log10(NW) = 0.5 relative to the stratiform curve. That is, NW was nudged towards a relationship that predicts a value of log10(NW) that is larger by 0.5 than the curve shown in Figure 14 (which is valid only for stratiform precipitation). The shift was determined by trial and error and evaluation against ground estimates. In retrospect, the OceanRain NW-Dm curve (which, as mentioned before, is consistent with the GXY-HSV dataset) is a better option, but other items requiring attention in the combined algorithm (such as the mitigation of ground clutter and uncertainties in orographic) precluded additional NW-Dm updates in V7. Nevertheless, the current NW-Dm updates resulted in an overall increase of 7% of surface rain rates over land, which appear to offset some known biases in V6, and better consistency between single and dual frequency retrievals. The V7 histogram (density plot) of retrieved NW and Dm is shown in Figure 14b. As apparent in the figure, the agreement between the combined retrievals and OceanRain is improved for the same reason mentioned in the previous paragraph. The most significant changes in the V7 NW-Dm two-dimensional histogram As the details of this research were still emerging at the time the N W -D m nudging procedure was revisited in V7 of the combined algorithm, the VN N W -D m rather than OceanRain curve (which is also consistent with the GXY-HSV dataset) was used. Nevertheless, several changes were introduced in the algorithm. Specifically, the nudging procedure was extended to convective precipitation as well. As the VN stratification by precipitation types did not result in significantly different N W -D m curves (except for very large D m ), the same curve shape was used for both convective and stratiform. However, the convective curve was shifted upwards by log 10 (N W ) = 0.5 relative to the stratiform curve. That is, N W was nudged towards a relationship that predicts a value of log 10 (N W ) that is larger by 0.5 than the curve shown in Figure 14 (which is valid only for stratiform precipitation). The shift was determined by trial and error and evaluation against ground estimates. In retrospect, the OceanRain N W -D m curve (which, as mentioned before, is consistent with the GXY-HSV dataset) is a better option, but other items requiring attention in the combined algorithm (such as the mitigation of ground clutter and uncertainties in orographic) precluded additional N W -D m updates in V7. Nevertheless, the current N W -D m updates resulted in an overall increase of 7% of surface rain rates over land, which appear to offset some known biases in V6, and better consistency between single and dual frequency retrievals. The V7 histogram (density plot) of retrieved N W and D m is shown in Figure 14b. As apparent in the figure, the agreement between the combined retrievals and OceanRain is improved for the same reason mentioned in the previous paragraph. The most significant changes in the V7 N W -D m two-dimensional histogram (density plot) relative to the V6 density plot appear to occur for D m values around 1.0 mm and smaller. Nevertheless, the discrepancies between retrieved N W -D m and the fit to the OceanRain data for D m > 2.0 mm are large for both V6 and V7 and are suggestive of Non-Uniform Beam Filling (NUBF) effects that may impact both the combined retrievals and the Ocean-Rain analysis. The reconciliation of these discrepancies is left to future studies, as it likely requires a systematic revision of the NUBF parameterizations in the combined algorithm and an extended methodology to mimic NUBF effects in the N W -D m relation derived from the OceanRain data.

Discussion and Summary
Multiple studies [41,42], have shown that dual frequency space-borne radar observations are generally insufficient to unambiguously estimate rain DSDs. The inclusion of Path Integrated Attenuation (PIA) estimates from the Surface Reference Technique (SRT; [43]) improves the accuracy of N W and D m estimates [44], but the SRT PIA estimates are not always reliable and usable in the estimation process. Similarly, the inclusion of over-ocean radiometer observations in the estimation process, although beneficial [45], does not fully address the ambiguity problem, as in some situations, the signal from the surface and cloud water cannot be reliably separated from that of rain in the radiometer observations. As a consequence, additional information, such as the inverse climatological relation between N W and D m , is necessary to further improve the accuracy of precipitation estimates from space-borne radar observations. Jointly, the four DSD datasets investigated in this study span a large range of normalized intercepts N W and mass-weighted mean diameters D m . However, a unifying mean N W -D m relation appears to exist. Specifically, a relationship between N W and D m was derived exclusively from the OceanRain dataset. At the same time, the joint N W -D m distributions from the OceanRain and the GXY-HSV datasets overlap significantly, which, consequently, makes the OceanRain N W -D m relationship a good fit for the GXY-HSV dataset as well. This suggests that the OceanRain N W -D m relationship would be an appropriate constraint for global space-borne radar-based precipitation algorithms, such as the GPM combined algorithm. A comparison between the operational N W -D m relation used as a constraint in versions 6 and 7 of the combined algorithm and the OceanRain relation shows deficiencies in the combined relation for small drops (D m < 1 mm) and large drops (D m > 2 mm). However, given that the constraining procedure involves nudging coefficients that have been determined by trial and error and validated against ground precipitation products, the impact of inadequacies in the combined N W -D m constraint is rather limited. Nevertheless, the implementation of the OceanRain relation in the combined algorithm is likely to result in more accurate precipitation estimates and a better agreement between the N W -D m distributions estimated by the combined algorithm and the N W -D m distributions from the three datasets investigated in this study. Other priorities in the combined algorithm, such as the mitigation of ground clutter effects, precluded the implementation of the OceanRain relation in version 7 of the combined algorithm, but it is anticipated that the OceanRain N W -D m will be implemented in the next version of the combined algorithm.
We have devoted considerable effort to comparing the statistical relationships between the parameters of the measured gamma DSD distributions, such as N W -D m , R/N W versus D m , and k ku /Z ku versus D m . These statistical relationships are in excellent agreement with the mean fit from other similar studies [27,29]. Figures 10-13 showed important normalized relationships between specific attenuation and reflectivity, and R/N W versus k/Z e , which were compared to the equivalent relationships in the CMB lookup tables. Although this study has demonstrated that some improvements could be brought to the ka-band relationships to better account for non-Rayleigh scattering, the overall agreement between CMB lookup tables and the simulations from measured DSDs was excellent in terms of mean power law fit values as compared in Tables 2 and 3. This suggests that the combined lookup tables [15] are not a major source of uncertainty in the combined estimates, which is an important finding of this paper (which we believe has not been reported in other studies). It appears that the limited and hard to untangle amount of information in the observations along with other sources of uncertainties such as multiple scattering and non-uniform beam filling effects are the main causes of random and systematic errors in the combined precipitation estimates. Acknowledgments: We wish to thank J. Vivekanandan and J. Jensen from NCAR for supplying quality-controlled aircraft-installed 2D-C 1-s data and G-J Huang for assistance with data processing of the aircraft data.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of this study; in the collection, analyses, or interpretation of its data; in the writing of this manuscript, and in the decision to publish these results. Figure 5 shows that the power-law fit from [22] was a good overall representation of the variation of N W with D m for all datasets for the entire range, despite the large spread, e.g., in N W for any given D m . To quantify this spread, we consider here the differences between the N W from the DSDs and from the power law fit in [22] using logarithmic transformation. If we denote N W and D m from the DSDs as N W (measured) and D m (measured) , then: ∆ log 10 N W = log 10

Appendix A. N W versus D m Fitted Curve
Histograms of ∆ log 10 N W for the GXY-HSV dataset and the OceanRain dataset are shown in panel (a) of Figure A1, in green and red, respectively. Since the two histograms resemble a Gaussian-like distribution, a non-linear least-squares fit to a function with four unknown parameters, p 0 , p 1 , p 2 , and p 3 , was computed for each of the histograms. The function, denoted as f(x), used here is the following: In Equations (A3) and (A4), p 0 will represent the maximum fitted value of the histograms, p 1 the mode (position) of the histograms, p 2 the standard deviation, and p 3 any offsets connected with the histograms. Values of the fitted parameters are given in Table A1.
The fitted curve for the OceanRain Data shows somewhat lower value for p 2 (i.e., narrower) and higher value for p 0 (taller) than for the GXY-HSV; furthermore, it has an offset p 3 very close to zero. This is to be expected since the power-law fit given in Equation (A2) was originally done for the OceanRain datasets. Both have a p 1 close to zero, indicating that there is very little systematic bias. The overall conclusion here is that Equation (A2) is an excellent representation for the mean variation for both datasets, but that the GXY-HSV curve shows more spread. Note that the N W versus D m also has a dependence on rain type, as shown for example in [46], where stratiform and convective rain were shown to be separated by a clear demarcation line. Hence, one can expect further improvement if the two rain types are considered separately.
For the drizzle dataset, however, it was found that Equation (A2) was not a good representation of the N W -D m variation; the blue curves in panel (b) of Figure A1 show this clearly. A very clear bias is seen, having the mode of ∆ log 10 N W at~−0.85. for the GXY-HSV (green) and OceanRain (rain) datasets, and their fitted curves; (b) the same as (a) but with drizzle histograms included (blue). Table A1. Values of the parameters fitted to the histograms in Figure A1a.  Table A1. Values of the parameters fitted to the histograms in Figure A1a.