Coupling a Neural Network with a Spatial Downscaling Procedure to Improve Probabilistic Nowcast for Urban Rain Radars

Rainfall forecasting plays a key role in mitigating environmental risks in urban areas, which are subject to increasing hydrogeological risk due to transformations of the urban landscape. We present a new technique for probabilistic precipitation nowcasting that generates an ensemble of equiprobable forecasts; it is especially useful for weather radars with limited spatial range and can be used operationally on devices with low computational capacity. The ensemble members are obtained by a novel stochastic noise generation process, consistent with the spatial scales not resolved by the prediction model, which allows continuous downscaling of the output of a deep generative neural network. Through an in-depth analysis of the results for precipitation accumulated in the first hour, measured by the most robust skill indicators and extended to an entire year of data at 5-min time resolution, we demonstrate that the proposed procedure provides calibrated, reliable, and sharp ensemble rainfall forecasts of a quality comparable or superior to the state-of-the-art classical alternative, the optical flow technique. The ensemble generation procedure we propose is sufficiently general to be applied, in principle, to other deterministic architectures as well, thus enabling their generalization in probabilistic terms.


Introduction
Precipitation nowcasting, following the definition of the World Meteorological Organization (WMO), is a high-resolution forecasting technique up to two hours in the future that is essential for weather-dependent decision-making processes in many sectors, such as emergency services, flood warning systems, air traffic control, and energy management to name a few [1].
Nowcasting systems based on the evolution of the precipitation field estimated by weather radar have been shown to be remarkably effective Quantitative Precipitation Nowcasting (QPN) tools. The most common classical nowcast methodologies are based on the concept of optical flow (OF) and make use of a sequence of images prior to the issue of the forecast to advect precipitation fields with radar-based wind estimates [2][3][4]. These methods suffer from increasing uncertainty as the forecast time increases, because they lack a physical description of thunderstorm growth and decay, which is considered the major deficiency of such nowcasting systems [5]. One approach to mitigate this drawback is the fusion of Numerical Weather Prediction (NWP) data with radar data [6] or even the assimilation of radar data into NWP models [7]. A different approach for characterizing the uncertainties in the evolution of nowcasting has been introduced with STEPS [8], where the S-PROG scheme [9] produces a cascade of forecasts, the small-scale features of which are randomly generated consistently with the uncertainty expressed by the nowcasting model in the period preceding the forecast. STEPS has proved to generate skillful rainfall forecasts in a variety of rainfall conditions [10] and, in addition to being currently used operationally, has also become a de facto benchmark for any data-driven nowcast procedure and a study and development platform for the science community [11].
The potential offered by the highly nonlinear neural network approach to the problem of rainfall radar nowcasting, starting from the pioneering studies of the Hong Kong Meteorological Observatory group [12,13], has also been the subject of widespread interest in recent years [14][15][16][17][18]. In [15], neural networks have been used to detect growth and decay patterns, achieving significant performance improvements in terms of RMSE compared with a forecast based on classical OF methods, but also highlighting the need for a probabilistic approach to quantify the uncertainty of the forecast. In [14], a U-Net type network architecture is proposed to implement a data-driven rain radar nowcasting system, achieving good performance but highlighting the progressive alteration of the spectral characteristics of the solution in the forecast. In [16], the same U-Net architecture is enriched with attention modules, obtaining good results while limiting the size of the model needed. In [19], rain radar images and wind forecasts are used in a deep learning model to transform the rainfall intensity regression problem into a classification problem in which the probability of exceeding a certain precipitation threshold is estimated. In [17], a deep architecture with a self-attention mechanism is proposed for the realization of a probabilistic forecast, integrating ground-based radar information with satellite images and weather model forecasts to obtain a forecast over the entire territory of the US. Using significant computational resources, the probabilistic forecast is achieved by generating precipitation probabilities for different thresholds. An alternative approach for making probabilistic forecasts is proposed in [18], where the problem of loss of detail and spectral alteration is mitigated by using a generative network to produce different scenarios from the learned distributions. More recently, in [20], by combining a discriminator and an attention module, the authors show that their NN architecture outperforms the current operational QPN at lead times up to 3 h.
As also highlighted in [20], the forecasts issued by current deep learning systems express uncertainty, at increasing lead times, through an increasing level of smoothness of the predicted precipitation fields. It is therefore not surprising that such forecasts may completely miss some small-scale weather phenomena that, as mentioned earlier, are critical to the operational value of the forecast itself [13]. Another common feature of all neural network-based methods is the huge computational resources required and the challenging replicability of the architectural solutions, also because they are often not released in the public domain. OF solutions such as STEPS, on the other hand, exhibit a problem that cannot be eliminated when applied to small domains: the advection of information out of the domain as the prediction time increases.
The basic idea of this work is to verify the feasibility and effectiveness of an approach that, by overcoming these limitations, manages to fully exploit the unquestionable predictive potential demonstrated by neural networks, and by PredNet [21,22] in particular. The chosen operational model is based on the interpretation of the neural network prediction as an optimal estimate of the mean of an ensemble of nowcasts, whose spread is in turn representative of the associated uncertainty. The approach, which we believe can also be generalized to nowcast systems based on deterministic neural networks other than PredNet, consists of injecting, during the extrapolation phase, stochastic noise at scales not explicitly resolved by the network.
This, as we shall see, makes it possible to generate an ensemble of forecasts that we verify to be, to a good approximation, equally likely, in an operational configuration similar to that of the STEPS procedure, which is thus the natural candidate to be used as a benchmark for testing performance.
In the following, Section 2 presents data and methods used to obtain the results discussed in Section 3. Section 4 hosts the conclusions and a discussion of potential future developments.

Materials and Methods
This section describing Materials and Methods is organized as follows: the data used for simulation and verification are presented in Section 2.1, and Section 2.2 describes the neural network used in this analysis, while the procedure for its probabilistic exploitation and the STEPS methodology used as a benchmark are presented in Sections 2.3 and 2.4, respectively.

Data
Our motivation for undertaking this study was to test the feasibility of a short-range precipitation forecasting procedure that could cover the nowcasting needs of a metropolitan area [23]. To this end, we used data measured by a meteorological radar located in the region of the cities of Osaka and Kyoto in Japan. These data, in the public domain and already used in [24], were collected from an online archive available on Yahoo (https://developer.yahoo.co.jp/webapi/map/openlocalplatform/v1/js/; the service is no longer available as of Wednesday, 6 April 2022), and cover three consecutive years, from 2015 to 2017. The spatial resolution is 1 km², the temporal frequency is 5', and the domain, shown in Figure 1, covers an area of about 10⁴ km². The climate of the area is classified as warm-temperate, with an annual accumulation of 1730 mm and particularly heavy rainfall during the summer season. The resolution of the radar images obtained through an API is higher than the actual resolution of the data, so an upscaling of the image is performed as the first operation, resulting in square matrices of 96 × 96 elements. The data are a false-color representation of precipitation intensity. For each pixel, colors can then be mapped to a progressive index of rainfall intensity ranging from 0 to 14 as in Table 1, to which increasing values of precipitation rate, from 0 mm/h up to 100 mm/h, correspond. The array containing the color index (class) is normalized to a maximum unit value for processing in the neural network and is converted back to the rainfall intensity value during post-processing. The use of rainfall intensity, or its processing through a logarithmic transformation, proved to be less effective than the direct processing of the color-scale index. The data were also filtered by removing all weather events in which the radar images were not complete, either because they were missing or because portions of the image had been lost.
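As a sketch of this pre- and post-processing step, the class-to-rate conversion and the normalization can be written as follows. The rate values below are placeholders of our own (only the endpoints, 0 and 100 mm/h, and the 0-14 index range come from the text; the actual correspondence is given in Table 1):

```python
import numpy as np

# Hypothetical mapping from the 15 color classes (0-14) to rain rate in mm/h;
# only the endpoints (0 and 100 mm/h) are stated in the text, the rest is illustrative.
CLASS_TO_RATE = np.array(
    [0.0, 0.1, 0.5, 1.0, 2.0, 4.0, 8.0, 12.0, 16.0, 24.0,
     32.0, 48.0, 64.0, 80.0, 100.0]
)

def normalize_classes(class_img):
    """Scale the integer color-class image (values 0-14) to [0, 1] for the NN."""
    return class_img.astype(np.float32) / 14.0

def denormalize_to_rate(norm_img):
    """Map the network output back to a rain rate in mm/h (post-processing)."""
    classes = np.clip(np.rint(norm_img * 14.0), 0, 14).astype(int)
    return CLASS_TO_RATE[classes]
```

The network thus always operates on the dimensionless index in [0, 1]; the mm/h values appear only at verification time.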

PredNet
The radar image nowcasting problem can be considered a supervised learning problem, in which the parameters of a neural network are optimized to minimize an objective function that depends on the differences between the estimated precipitation image and the actual field. Among the various network architectures applied to the weather nowcasting problem, we opted for the PredNet model. This paper is the evolution of a previous study [22] in which we analyzed the performance of the PredNet architecture, highlighting its excellent performance in merely deterministic terms, but also its limitations and the need for a revisiting in probabilistic terms. As specified, the architecture is borrowed from computer vision, like the optical flow methodology, and represents a state-of-the-art architecture in that field. It was chosen because of the originality of the implemented approach and because of its generality. Other more recent approaches have introduced improvements that are not directly applicable to the case of precipitation nowcasting, because they are specific to the prediction of video frames, either assuming that the moving objects are immutable in shape or introducing the concepts of image depth and perspective, clearly not applicable to our application.
The PredNet architecture consists of a deep, recursive convolutional network with connections from large to small scale (top-down) and vice versa.The approach leverages the experience of similar models based on recursive and convolutional networks for predicting video frames.It also draws particular inspiration from the concept of 'predictive coding' in the neuroscience literature, which hypothesizes that the brain continuously makes predictions by anticipating incoming stimuli.
The PredNet architecture employed is sketched in Figure 2; it consists of a ladder structure with a descending branch that generates the prediction of the precipitation field image at progressively more detailed scales and an ascending branch in which the prediction is compared with actual data. Further descriptive details of the architecture can be found in [22], to which we refer the reader for brevity, and in Lotter, Kreiman, and Cox's article [21].
The training is conducted with a gradient descent procedure in which the network parameters are modified to minimise e_0, i.e., the Mean Absolute Error (MAE) between the calculated and the real image. Since we are interested in predicting not only the next frame, the loss function used in the training phase is obtained as the average MAE of the 24 images h_0 = \hat{X}_t generated by the network with respect to the real precipitation images a_0 = X_t:

\mathcal{L} = \frac{1}{24} \sum_{t=1}^{24} \mathrm{MAE}(X_t, \hat{X}_t),

where MAE is an operator that returns the average absolute error between the real radar image and the one estimated by the model; the sum is extended to both the 12 instants used for learning the current weather situation and the 12 instants of the forecast, in order to train the network both for the reconstruction of the radar image and for the rainfall forecast.
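The sequence loss described above can be sketched as follows (a NumPy stand-in for the actual PyTorch training loss; array shapes are assumed from the 96 × 96 domain and the 24-frame sequence):

```python
import numpy as np

def sequence_mae_loss(pred_frames, true_frames):
    """Mean absolute error averaged over all 24 frames (12 assimilation +
    12 forecast) and all pixels; both inputs have shape (24, 96, 96)."""
    return float(np.mean(np.abs(pred_frames - true_frames)))
```

In the actual training, the same quantity is computed on tensors and minimised with the Adam optimiser.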
The optimal weights of the neural network were obtained using a gradient descent procedure with the Adam optimiser. The neural network model and training procedures were implemented in Python [25] using the PyTorch library [26]. The calculation was performed on an NVIDIA Tesla K80 GPU with 12 GB of built-in RAM. The model will hereafter be referred to as NN.
The choice of the number of layers (4) and their dimensions (see Table 2) is directly related to the size of the image to be processed (96 × 96) and to the scale of the weather phenomena that it is reasonable to expect to resolve given the field size and time horizon. The depth of the layers derives from a process of optimization of the achievable performance while limiting the size of the network so that it can be processed by a single GPU card, assuming a possible operational implementation in a small-scale city radar with reduced computational capabilities.

Proposed Methodology: Stochastic PredNet
In order to illustrate the procedure proposed in this paper, it is necessary to understand the way PredNet operates, both in the phase preceding the emission of the forecast and in the prediction phase. During the hour preceding the forecast, PredNet 'digests' the 12 images at 5' intervals, starting from zero knowledge at instant zero and progressively reducing the differences with the measured input data, as illustrated in Figure 3. This initial phase is similar to the data assimilation procedure that is part of the operational chain of a numerical weather service, and the duration of this pre-processing can be considered similar to the assimilation window. We will often refer to this phase in the following as 'assimilation'.

A careful analysis of Figure 3, comparing the measured data, shown in the first column, with the fields estimated by the PredNet in the hour before the issuing of the nowcast, shows that the difference between the 'assimilated' fields and the observations is, at least visually, minimal. A deeper analysis shows that the substantial difference lies in the resolved scales. In fact, as Figure 4 shows, the resolved power spectrum exhibits a marked loss of power from scales of the order of 16 km. This behaviour is not peculiar to the example shown, but represents a systematic feature for all the events studied and is caused by the minimisation process of the neural network's optimisation function. This weakness nevertheless allows us to address the problem of the uncertainty in the assimilation phase. Since the internal representation of the network is unable to resolve scales below a certain limit, we can add to the spectral representation of the fields a stochastic component in this range of scales. This makes it possible to obtain a set of scenarios, all indistinguishable from the point of view of the scales resolved by the neural network, representing an ensemble of initial conditions for the subsequent prediction phase.
The procedure used to inject stochastic noise at scales not explicitly resolved by the neural network takes inspiration from the downscaling procedure called RainFARM [27].
In detail, let us denote by X_t and \hat{X}_t the observed and reconstructed images at n instants preceding the nowcast emission, and by |P(X_t)_K|² and |P(\hat{X}_t)_K|² the power spectra, as a function of wave number K, averaged over these n instants. We then calculate the associated Fourier spectrum, defined up to a random multiplicative component of unitary norm:

\hat{g}_K = |P(X_t)_K| \, e^{i\Phi_K},

where Φ_K is a uniformly distributed random phase. Applying the inverse Fourier transform, we obtain a random field g(r) defined in the spatial domain, which by construction is Gaussian distributed. After normalisation of its distribution, an exponential transform is applied:

H(r) = \exp(g(r)),

to obtain a field that will therefore have a log-normal distribution. We can think of H(r) as the realisation of a random precipitation field with the same spatial correlation as the field X_t from which it originates, since, in a weak sense and up to a random phase of unitary norm in the Fourier transform, it has the same power spectrum.
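The generation of H(r) can be sketched as follows (a NumPy illustration under our own naming; for an exactly real field one would impose Hermitian symmetry on the random phases, while here the real part of the inverse transform is taken as a shortcut):

```python
import numpy as np

def lognormal_field(avg_power_spectrum, rng, phases=None):
    """Generate H(r) = exp(g(r)): a log-normal random field whose spatial
    correlation follows avg_power_spectrum (|P(X_t)_K|^2 on the 2D FFT grid)."""
    if phases is None:
        phases = rng.uniform(0.0, 2.0 * np.pi, size=avg_power_spectrum.shape)
    # Amplitude from the prescribed spectrum, phase uniformly random.
    g_hat = np.sqrt(avg_power_spectrum) * np.exp(1j * phases)
    g = np.real(np.fft.ifft2(g_hat))          # Gaussian field g(r)
    g = (g - g.mean()) / (g.std() + 1e-12)    # normalise its distribution
    return np.exp(g)                          # log-normal field H(r)
```

Passing an explicit `phases` array allows two fields built from different spectra to share the same random phases, which is what the procedure below requires.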

Figure 4. In red, the power spectrum for the 12 images at 5' intervals of the precipitation field in the hour preceding the nowcast; in green, the power spectrum of the PredNet estimate for the image preceding the nowcast; in blue, the spectra of the field obtained by downscaling the NN output field using the procedure described in Section 2.3.
Applying the same procedure starting from the average spectrum derived from the reconstructed images \hat{X}_t, and denoting by R(r) the random field obtained using the same random phases Φ as in the generation of H(r), we define the field:

m(r) = \frac{H(r)}{R(r)}. (5)

m(r) represents a dimensionless random field with spatial correlation similar to that of the fields from which it is derived, always bounded because R(r) > 0, with approximately unitary values proportional to the ratio between a random version of the field with the same level of detail as the measured data X_t and the corresponding lower-resolution field \hat{X}_t. This field, used as a multiplier for the output \hat{X}_t of the neural network, is able to inject random noise at scales not resolved by the network and with the correct spatial correlation. The effect of the multiplicative process in the 'assimilation' phase can be evaluated by comparing the power spectra before (green lines) and after downscaling (blue lines) with those of the measured data (red lines) in Figure 4, and visually by comparing the 3 rows of 12 fields (observed, 'assimilated' by NN, and downscaled) in Figure 3.
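Consistent with this definition, the multiplier field can be sketched as the ratio of two log-normal fields that share the same random phases (a NumPy sketch with our own names and shapes):

```python
import numpy as np

def downscaling_multiplier(obs_spectrum, recon_spectrum, rng):
    """m(r) = H(r) / R(r): ratio of two log-normal fields built with the SAME
    random phases, from the observed (high-detail) and reconstructed (smooth)
    average power spectra respectively."""
    phases = rng.uniform(0.0, 2.0 * np.pi, size=obs_spectrum.shape)

    def field(power_spectrum):
        g_hat = np.sqrt(power_spectrum) * np.exp(1j * phases)
        g = np.real(np.fft.ifft2(g_hat))
        g = (g - g.mean()) / (g.std() + 1e-12)
        return np.exp(g)  # strictly positive, so the ratio is bounded

    return field(obs_spectrum) / field(recon_spectrum)
```

The perturbed output is then the elementwise product of the NN field with m(r): noise is injected only at the scales where the observed spectrum carries power that the reconstructed one lacks; where the two spectra agree, m(r) is close to 1.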
During the extrapolation stage, the network proceeds recursively, predicting each next step from the previous radar images, with 12 time steps of 5' to make a one-hour forecast, and also using the forecasts at the previous instants as input for all steps after the first.
As a result, the problem of unresolved scales becomes more and more apparent, as shown in Figure 5, where the green lines for the spectra of the predicted field at the 12 time steps show a progressive loss of spatial detail that eventually spreads over all scales. To overcome this issue, we operate in the same way as in the assimilation step, dividing the extrapolation process into 5' time steps and adding at each step a stochastic component coherent with the scales not explicitly resolved by PredNet, via a multiplicative process with the coefficients m(r) defined in Equation (5). In this procedure, the only free parameter is the number n of timesteps used to compute the average power spectra |P(X_t)_K|². Taking n = 1, i.e., without averaging, one will generate fields with a high but unrealistic level of detail. Conversely, excessively large values of n will make the downscaling procedure ineffective. Empirically, choosing n = 3 was found to be optimal. This value is used both in the definition of the perturbation scenarios (assimilation) and in the downscaling phase during the forecast. The effect of the multiplicative process on the unperturbed fields can be visually appreciated by comparing the fields in the three groups of stamps (obs, NN, and m0 to m9) in Figure 6 (see the caption for explanatory details).
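The averaging of the power spectra over the last n frames can be sketched as follows (our own minimal helper; with n = 3 as found empirically above):

```python
import numpy as np

def averaged_power_spectrum(frames):
    """Average 2D power spectrum |P(X_t)_K|^2 over the last n frames
    (frames has shape (n, H, W); n = 3 in the proposed procedure)."""
    frames = np.asarray(frames, dtype=float)
    return np.mean(np.abs(np.fft.fft2(frames, axes=(-2, -1))) ** 2, axis=0)
```

The resulting array is what feeds the noise generation at each 5' step, for both the observed and the reconstructed sequences.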
Figure 5. In blue, the power spectra for the nowcasts of the 10 perturbed members.

In conclusion, and schematically, the proposed procedure can be summarised as follows (see the pseudocode representation in Algorithm 1):
1. A set of equally probable scenarios S is generated in the hour before the forecast is issued, injecting into the neural network's estimate D of the measured data a noise component consistent with the difference in spatial detail between the measured data and those 'assimilated' by the network itself, using the multipliers N defined in Equation (5);
2. With the same method, at each 5' nowcast time step, a stochastic component, consistent with the uncertainty evaluated under similar conditions in the period immediately preceding the emission, is added to the prediction;
3. We impose that the mean value of the ensemble members is equal to the deterministic unperturbed prediction of PredNet (em_adjust procedure), and that the Cumulative Distribution Function of the predicted field is equal to that of the last observed field (cdf_adjust procedure);
4. Starting from step 2, the procedure is iterated for all 12 time steps of 5' each, and for each ensemble member, obtaining the NN one-hour forecast for the chosen number M of scenarios.

The constraint on the ensemble mean, at step 3 of the described procedure, binds the ensemble mean solution to remain identical to the deterministic prediction. Without this constraint, each of the previously determined scenarios, with its associated stochastic component and thus uncertainty, could evolve independently of the others and could thus in principle diverge from, and in any case be completely different from, the optimal solution obtained from the unperturbed deterministic forecast (see an example in the second row of images in Figure 6). Theoretical support for the proposed methodology rests on the conjecture that the deterministic forecast, which we have already verified in [22] to give better results than any other state-of-the-art procedure in terms of RMSE, can be considered optimal on average. By constructing around the deterministic solution an ensemble whose variance (spread) is on average directly proportional to the RMSE of the deterministic forecast (which we have assumed to coincide with the ensemble mean), we implicitly verify the plausibility of the initial conjecture. This is precisely what is discussed in Section 3.1 of the article.
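The two adjustment steps can be sketched as follows (minimal NumPy stand-ins: an additive shift for em_adjust and an empirical quantile mapping for cdf_adjust; the paper does not specify the exact implementation, so both are assumptions):

```python
import numpy as np

def em_adjust(members, deterministic):
    """Shift each member so the ensemble mean equals the unperturbed NN
    forecast (one simple, additive way to impose the mean constraint).

    members: (M, H, W); deterministic: (H, W)."""
    return members - members.mean(axis=0) + deterministic

def cdf_adjust(member, reference):
    """Quantile-map one member onto the empirical CDF of the last observed
    field: the i-th smallest pixel of the member takes the value of the
    i-th smallest pixel of the reference."""
    order = np.argsort(member.ravel())
    mapped = np.empty_like(member.ravel())
    mapped[order] = np.sort(reference.ravel())
    return mapped.reshape(member.shape)
```

Both operations preserve the spatial pattern of each member while constraining, respectively, the first moment of the ensemble and the marginal distribution of each field.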

The Benchmark Nowcast Technique: STEPS
One of the most effective approaches, able to provide realistic and high-quality predictions and widely used in operational contexts, is the method called STEPS [8]; see also [10], where more recent developments are discussed.
STEPS is available in the public domain through a Python library called pySTEPS [28]. pySTEPS is used by a large community and, in addition to serving as an operational tool for nowcasting, it is used as a development, benchmarking, and study platform for implementing new techniques. For the estimation of the advection field, pySTEPS implements three methodologies: a local one based on the classical Lucas-Kanade (LK) technique [29,30], a global variational one (VET) [31,32], and a spectral one (DARTS) [33]. Extrapolation by advection is performed with a backward-in-time semi-Lagrangian scheme [32]. The most sophisticated technique in the pySTEPS library, used here as a benchmark, is called STEPS and takes into account the dependence of the predictability of precipitation phenomena on the spatial scale. In practice, using an approach called S-PROG [9], after a logarithmic transformation, the initial field is decomposed into a multiplicative cascade of fields at decreasing spatial scales, and each of these fields is advanced with an advection field obtained consistently during the period preceding the nowcast. Field decomposition at different scales can be achieved using techniques of varying degrees of complexity, such as the discrete cosine transform or wavelets. In this case, an FFT was used in which the different portions of the spectrum are weighted by Gaussian functions [11] and then, via inverse FFT, returned to the spatial domain. The additional uncertainty due to the growth, decay, creation, and/or disappearance of precipitating cells during the period of validity of the prediction, along with uncertainties related to the different spatial scales, is taken into account through stochastic noise built on the spatial and temporal variability observed in the period prior to the emission, which is added to each partial field. For implementation details, see [28] and the related bibliography. This approach makes it possible to obtain not a single deterministic prediction, but a set of equally probable predictions (an ensemble), and thus also to quantify their spatio-temporal uncertainty. The results obtained with this methodology will be indicated as STEPS in the following.

Results
The performance of the proposed method was evaluated for the year 2017. Data from 2015 and 2016 were used for training and validation of the proposed NN method. The performance indicators were evaluated against the precipitation value, at the time of forecast verification, estimated by the radar itself and expressed in mm/h. Of course, they do not represent a direct measure of error with respect to actual rainfall. The implicit assumption is that systematic errors in rainfall estimation will be offset by equivalent biases in the forecast. Ultimately, the error in the radar forecast is expressed in units transformed to mm/h.
The indicators used to verify the global quality of the probabilistic forecast are the Continuous Ranked Probability Score (CRPS) and the Rank Histogram (RH), and the class of indicators related to exceeding a fixed precipitation threshold: the Reliability Diagram (RD), the Receiver Operating Characteristic (ROC) curve, and the associated Area Under the Curve (AUC) [34].
The performance study was carried out on about 2500 one-hour-long precipitation events (equivalent to more than 100 rainy days) in 2017, and the corresponding NN and STEPS nowcasts were compared systematically to verify the effectiveness of the proposed methodology. Here, we define as an event any occurrence of rain during the year that allows the nowcast procedure to be used. In practice, this means that, in the three images before the forecast, the fraction of the domain covered by precipitation (Wet Area Ratio) is at least 10%. A single "event" is one hour long (the maximum nowcast time) and does not overlap with the following one. This choice also incorporates into the performance evaluation events of weak intensity, which are usually discarded in similar benchmark studies found in the literature. The analysis of a full year of data allows the effectiveness of the proposed approach to be verified in the most statistically robust way, leaving the optimization of the learning process for specific high-intensity events to further study.

The usefulness of radar-based precipitation nowcasting is related to its use in operational contexts, for example to estimate the hydrologic risk following intense precipitation events. From this point of view, rather than the instantaneous rain rate at a given forecast horizon, one is interested in the cumulative value of precipitation. For this reason, verification was made on the precipitation accumulated in the first hour of the forecast and, only in some cases specified in the text, on that accumulated in the first half hour. For ease of reading, we divide the discussion of the results into specific sections, one for each of the indicators mentioned. We begin by discussing a key characteristic that an ensemble must possess in order to describe the uncertainty of the particular event being predicted: it must be able to adjust its variability between members (spread) in such a way as to be proportional to the forecast error. This attribute of an ensemble is known as the spread-skill relationship and ultimately represents an a posteriori verification of the plausibility of the assumptions made to construct our ensemble.
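The event-selection rule described above can be sketched as follows; the 0.1 mm/h rain/no-rain cutoff and the function names are illustrative assumptions, since the text fixes only the 10% Wet Area Ratio requirement over the three images preceding the forecast:

```python
import numpy as np

def wet_area_ratio(field, rain_threshold=0.1):
    """Fraction of the radar domain where the rain rate reaches the threshold (mm/h)."""
    return float(np.mean(field >= rain_threshold))

def is_event(last_three_images, war_min=0.10, rain_threshold=0.1):
    """Selection rule: all three images preceding the forecast must have
    a Wet Area Ratio of at least war_min."""
    return all(wet_area_ratio(img, rain_threshold) >= war_min
               for img in last_three_images)

# toy 4x4 fields: half of each domain is rainy (WAR = 0.5), so an event starts
rainy = np.zeros((4, 4)); rainy[:2, :] = 1.0
dry = np.zeros((4, 4))
```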

Spread-Skill
In [35], Palmer showed that a perfect ensemble must possess on average the characteristic that its spread, evaluated as the square root of the ensemble variance, is identical to the root mean square error (RMSE) of the ensemble mean. Figure 7 shows the scatter plot of the two variables for all the test cases and the 12 forecast times, together with the correlation coefficient between the two time series. As can be seen, the spread is well correlated with the RMSE. A more careful analysis shows that its values are slightly but systematically lower, especially for higher RMSEs (see also the average values in the legend of the figure). This means that on average the NN ensemble is slightly underdispersive. This issue, which is also common to the STEPS ensemble, is nonetheless of minor importance because, as we will see from the analysis conducted later on the rank histograms in Section 3.3, the ensemble's overall characteristics in terms of spread are very good.
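Palmer's spread-skill check can be computed per forecast as a minimal sketch; the function name and the toy Gaussian fields below are illustrative assumptions:

```python
import numpy as np

def spread_and_rmse(ensemble, obs):
    """ensemble: array (M, ny, nx) of members; obs: array (ny, nx).
    Returns the spatially averaged ensemble spread (square root of the
    ensemble variance) and the RMSE of the ensemble mean w.r.t. obs."""
    spread = float(np.sqrt(ensemble.var(axis=0, ddof=1).mean()))
    rmse = float(np.sqrt(((ensemble.mean(axis=0) - obs) ** 2).mean()))
    return spread, rmse

# statistically consistent toy ensemble: obs drawn from the same
# distribution as the members, so on average spread ~ RMSE
rng = np.random.default_rng(0)
members = rng.normal(size=(100, 32, 32))
obs = rng.normal(size=(32, 32))
spread, rmse = spread_and_rmse(members, obs)
```

For a perfect ensemble, the two returned values coincide on average over many cases, which is exactly the relation inspected in the scatter plot of Figure 7.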

CRPS
The CRPS is a generalization of the mean absolute error for a deterministic forecast. For a continuous variable, the instantaneous CRPS_t is defined as the quadratic difference between the forecast CDF and the empirical CDF of the observed value. When the forecast CDF is known only through an ensemble, different estimators of the instantaneous CRPS can be written. Following the indication in [36], we choose to use the expression:

CRPS_t = (1/M) Σ_{m=1}^{M} |X_t^m − X_t| − (1/(2M²)) Σ_{m=1}^{M} Σ_{k=1}^{M} |X_t^m − X_t^k|,

where M = 100 is the number of ensemble members, X_t is the observed radar image at time t, and X_t^m is the m-th ensemble nowcast image verifying at the same time t. The first type of probabilistic verification was performed by comparing the CRPS, averaged over the whole domain, for the one-hour accumulated precipitation evaluated for all the events. Results are shown in Figure 8. Despite the completely different approach, it is interesting to note that the two methods have, on average, very similar CRPS values. However, the full-year 2017 average CRPS of NN (0.54 mm/h) is slightly better than that of STEPS (0.56 mm/h), as shown in the legend within the figure. This indicates a slightly better overall performance of the probabilistic forecast by NN than by STEPS. It should also be noted that the CRPS value for NN is evaluated over the entire domain, while that for STEPS is necessarily limited to the domain of existence of the STEPS forecast.
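At a single grid point, the energy-form ensemble estimator of the CRPS can be transcribed directly; this sketch assumes the standard estimator (mean absolute member-observation distance minus half the mean pairwise member distance), consistent with the symbols defined above:

```python
import numpy as np

def crps_ensemble(members, obs):
    """Energy-form ensemble estimator of the CRPS at a single point:
    mean |X_m - y| minus half the mean pairwise |X_m - X_k|."""
    x = np.asarray(members, dtype=float)
    term1 = np.abs(x - obs).mean()
    term2 = np.abs(x[:, None] - x[None, :]).mean()
    return float(term1 - 0.5 * term2)
```

For M = 1 the expression reduces to the absolute error, which is why the CRPS is read as a probabilistic generalization of the MAE; the domain-averaged CRPS of Figure 8 is the mean of this quantity over all grid points.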

Rank Histogram
If the ensemble members are equiprobable, as hypothesized, the observed values should be uniformly distributed among them. The rank histogram, showing the frequencies at which the verifying observation falls in each of the 101 bins defined by the 100 ensemble members, should then be flat in the ideal case. It is interesting to compare the rank histograms for the two methodologies, obtained by accumulating the prediction-observation comparisons over all the events studied, so as to obtain an idea of the average spread and thus of how reliable the related ensemble forecasts are. Due to the oft-mentioned limitations of the STEPS procedure in relation to the size of the domain covered by the radar, and to highlight the trend of this indicator as a function of forecast time, in Figure 9 we first show the rank histogram for NN and STEPS for the precipitation accumulated in the first 30 min of the forecast. Since the performance of each method can vary substantially depending on the event, the distribution over the test cases is shown in the figure as box plots, with whiskers delimiting the 5% and 95% quantiles. From the analysis and comparison of the two figures, it can be seen that, on average, over all the events the rank histograms of both methods give indications of good quality for the probabilistic prediction, even at such a high level of detail in probability space. Careful analysis confirms a slight under-dispersion bias for both methods. STEPS also shows a slight bias, with a tendency to overestimate the frequency of the medium-high intensity events.
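The rank histogram itself amounts to counting, at every grid point, how many members lie below the verifying observation. A minimal sketch, with function names chosen for illustration:

```python
import numpy as np

def rank_histogram(ensemble, obs):
    """ensemble: (M, n_points); obs: (n_points,).
    Counts how often the observation falls in each of the M+1 bins
    delimited by the sorted ensemble members at each point."""
    ranks = (ensemble < obs).sum(axis=0)          # rank of obs among members
    return np.bincount(ranks, minlength=ensemble.shape[0] + 1)

# exchangeable members and observations -> approximately flat histogram
rng = np.random.default_rng(1)
flat = rank_histogram(rng.normal(size=(9, 50000)), rng.normal(size=50000))
```

A U-shaped histogram would signal under-dispersion, a dome shape over-dispersion; the flat case corresponds to the ideal equiprobable ensemble discussed above.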


Reliability Diagram
The evaluation of the performance related to the ability to predict the exceedance of given thresholds of cumulative precipitation is also of considerable interest. Figure 11 shows the Reliability Diagram (RD) for the exceedance of the 0.1 mm/h threshold for cumulative precipitation in the first hour of the nowcast, which we can take as an indication of the ability to discriminate rainy from non-rainy areas. The shaded areas depict the region where the forecast has predictive value from a probabilistic point of view, while the lower subplots show the numerosity with which each of the probability classes is predicted (probability histograms). Comparison of the plots on the left for NN with those on the right for STEPS reveals perhaps a slight prevalence of STEPS, which remains very close to the ideal behavior represented by the dashed diagonal. From the respective probability histograms, it is important to note that intermediate probabilities are predicted much less frequently than extreme values (note the logarithmic scale for N), a clear indication that the ensembles are very selective in assigning high or low probabilities and thus effective (this characteristic is called "sharpness"). The same plots are shown in Figure 12 for the exceedance of gradually increasing precipitation thresholds: 1 mm/h, 5 mm/h, and 20 mm/h. These threshold values can be considered representative of the detection of low-, medium-, and high-intensity events. It can be seen that the diagram for NN and the associated probability histogram give a reliable forecast indication on all the thresholds considered, staying always well within the shaded area. The noisiness of the lines of the diagram for the 20 mm/h threshold is related not only to the greater difficulty in predicting such events, which occur with very low probability, but also to the small size of the sample against which the verification can be carried out. Note also the greater sharpness of NN when compared with STEPS, which "tries" to predict these extremely rare events even with a high probability.
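A reliability diagram pairs each forecast-probability class with the observed frequency of the event in that class. The following sketch assumes 10 equally spaced probability bins, which is an illustrative choice:

```python
import numpy as np

def reliability_curve(prob, event, n_bins=10):
    """Bin the forecast probabilities and, for each bin, return the number
    of cases N (the probability histogram) and the observed frequency of
    the event among the cases in that bin."""
    prob = np.asarray(prob, dtype=float)
    event = np.asarray(event, dtype=float)
    bins = np.minimum((prob * n_bins).astype(int), n_bins - 1)
    counts = np.bincount(bins, minlength=n_bins)
    hits = np.bincount(bins, weights=event, minlength=n_bins)
    freq = np.divide(hits, counts, out=np.full(n_bins, np.nan), where=counts > 0)
    return counts, freq
```

Plotting the observed frequency against the bin-center probability gives the diagonal-style diagram of Figures 11 and 12; `counts` on a logarithmic scale gives the lower subplots used to judge sharpness.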


ROC Curve
A further indication of the quality of the probabilistic forecast comes from the ROC curve, which expresses a measure of the ability to characterize the forecast correctly in terms of expected probability. The graph is obtained by plotting the frequency of forecast success (POD) versus the false-alarm frequency (POFD) and can be used, for example, to identify the probability threshold that offers the best trade-off between costs and benefits. Figure 13 shows the plot for NN and STEPS for the exceedance of the cumulative precipitation threshold of 1 mm/h in the first hour of the forecast. As can be seen, NN is much closer to the ideal behavior (POD = 1 and POFD = 0) than STEPS, again indicating a better quality of the NN nowcast. An overall measure of forecast quality is expressed by the area under the curve (AUC), which in the ideal case should be equal to 1. AUC values less than 0.5 (the area under the dashed diagonal) indicate zero probabilistic forecast utility. The variability of the AUC estimate for STEPS as the rainfall threshold increases, which in theory should be monotonically decreasing, is most likely related to the reduction in the nowcast area of STEPS. The NN procedure, on the other hand, is designed precisely to circumvent this drawback, and in any case it consistently shows higher AUC values than those of STEPS.
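The ROC construction above can be sketched by sweeping a probability threshold over the ensemble-derived probabilities; the function names and the trapezoidal AUC estimate are illustrative assumptions:

```python
import numpy as np

def roc_points(prob, event, thresholds):
    """POD (hit rate) and POFD (false-alarm rate) obtained by issuing a
    yes/no forecast whenever the ensemble probability reaches a threshold."""
    prob = np.asarray(prob, dtype=float)
    obs = np.asarray(event, dtype=bool)
    pod, pofd = [], []
    for t in thresholds:
        warn = prob >= t
        pod.append((warn & obs).sum() / max(obs.sum(), 1))
        pofd.append((warn & ~obs).sum() / max((~obs).sum(), 1))
    return np.array(pod), np.array(pofd)

def auc(pod, pofd):
    """Area under the ROC curve by trapezoidal integration."""
    order = np.lexsort((pod, pofd))        # sort by POFD, then POD
    x, y = pofd[order], pod[order]
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))
```

Sweeping thresholds from 0 to above 1 traces the curve from (POFD, POD) = (1, 1) down to (0, 0); the closer the curve hugs the upper-left corner, the closer the AUC is to its ideal value of 1.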

Discussion
This paper is the natural continuation of [22] in which the same PredNet neural network used here was applied to the same data to verify the performance in deterministic terms of the prediction up to one hour of the precipitation field estimated from the images of a weather radar.
The proposed method uses the aforementioned network in a configuration capable of providing a set of forecasts, all equally likely, through which the uncertainty of the predictor can be characterized in probabilistic terms.
The procedure has proven to succeed in providing a probabilistic forecast of a quality comparable or superior to that obtainable through STEPS even for a radar with a limited spatial coverage, such as a small urban radar, and thus capable of providing the decision maker with a powerful forecasting tool to mitigate risks from severe precipitation events.
Summarising the major achievements of this work, we have the following:
• We proposed a physical-statistical interpretation of the result obtained from the application of the PredNet network to the nowcasting of radar images, as the ensemble mean of a probabilistic forecast;
• This interpretation was tested by constructing around the unperturbed forecast an ensemble of scenarios obtained by adding noise, with the correct spatial correlation, compatible with the scales not explicitly resolved by the PredNet;
• The detailed analysis made on an extended data set leads us to conclude that the method we propose has performance superior or equal to the probabilistic STEPS method, which represents a well-recognized benchmark;
• The proposed solution, based on a generative model, succeeds in providing an overall good-quality probabilistic prediction for the entire domain covered by the radar and thus solves the problem that occurs when implementing a prediction procedure based on optical flow for a radar of limited spatial range;
• For a domain of small size, the extrapolation phase of the procedure can be performed with limited computational resources, allowing it to run on low-cost edge-computing devices that can be included in the same radar apparatus.
The main motivation that inspired this study was to verify the feasibility of a nowcasting procedure for a rain radar installed in the city of Cagliari (Sardinia, Italy) [37], for which we do not yet have sufficient data to make a specific in-depth study. Of course, the results discussed are not directly exportable to another radar or another geographic area. However, there is also no valid reason to assume that the procedure could not be applied with similar success to other radars and other geographic areas by repeating the network learning phase and using the same operating procedure. Of course, in the future, this assumption will have to be explicitly verified for each specific case.
Since the training of the network was carried out using a general "event" selection criterion, it is very likely that the forecast performance for intense events could be improved, even significantly, by using a more specific "event" selection criterion during the training phase.
Comparison with multiple alternative methods, which is lacking here, is essential to verify the quality of the results of a study. In this specific case, however, we were interested in the development of an open-source solution to the probabilistic nowcasting problem, for which, to our knowledge, the only one, moreover unanimously recognized by the meteorological community as one of the best, is indeed STEPS. Another aspect that characterizes the proposed method is the capability of running on a single GPU card, with a view to a possible operational implementation in a city radar operating on a small scale and with reduced computational capabilities. The other probabilistic, neural-network-based nowcasting procedures discussed in the introduction are not available, to our knowledge, in the public domain and moreover rely on the use of significant computational resources.
To our knowledge, this operational approach and the underlying interpretive idea are a completely new development and may represent a small step forward from the state of the art both in the field of operational procedures for radar nowcasting, especially for small radars, and in the interpretive aspect of the results obtainable through the use of a generative neural network.

Figure 1. The map of Japan, in which the area representing the domain covered by the weather radar whose data are used in this study is highlighted.

Figure 2. PredNet: on the left, the general architecture with the Discriminator and Generator blocks; on the right, the details of the two blocks.

Figure 3. Sequence of 12 images at 5' intervals of the precipitation field during the hour preceding the nowcast. The first row shows the observed data (obs), the second row the values estimated by the PredNet (NN), and the third row one of the scenario fields obtained by applying the spatial downscaling process described in Section 2.3 to the corresponding fields of the second row.

Figure 4. In red, the power spectrum for the 12 images at 5' intervals of the precipitation field in the hour preceding the nowcast; in green, the power spectrum of the PredNet estimate for the image preceding the nowcast; in blue, the spectra of the field obtained by downscaling the NN output field using the procedure described in Section 2.3.

Figure 5. In red, the power spectrum of the 12 images at 5' intervals of the precipitation field during the nowcast hour; in green, the power spectrum related to the nowcast of the unperturbed PredNet; in blue, the power spectrum for the nowcast of 10 perturbed members.

Figure 6. Sequence of 12 images at 5' time intervals of the precipitation field for the nowcast phase. The first row (obs) shows the observed data, the second the deterministic unperturbed nowcast (NN), and the next 10 rows (m0 to m9) the nowcasts of 10 perturbed members.

Algorithm 1. Pseudocode of the stochastic PredNet algorithm.

Input:  O[-11:0]         // 12 radar images of the past hour
Output: NN[1:M, 1:12]    // ensemble of M forecasts, 1 hour long

// PredNet 12-time-step deterministic nowcast D[1:12]
D[-11:12] = PredNet(O[-11:0], steps = 12)
// for each time step in the past hour, an ensemble of M scenarios S is built
// adding noise N at scales not resolved by the deterministic PredNet nowcast D
for f = -11 to 0 do
    for m = 1 to M do
        S[m,f] = D[f] · N[m,f]
        // cdf is adjusted to that of the observed field
        S[m,f] = cdf_adjust(S[m,f], O[f])
// evolve the perturbed scenarios S with PredNet
for f = 1 to 12 do
    // evolve each scenario independently
    for m = 1 to M do
        NN[m, -11:f] = PredNet(S[m, -11:f-1], steps = 1)
    for m = 1 to M do
        // ensemble mean of NN is adjusted to the nowcast D
        NN[m,f] = em_adjust(NN[m,f], D[f])
        // noise is added compatible with unresolved scales
        NN[m,f] = D[f] · NN[m,f]
        // cdf is adjusted to that of the last observed field
        NN[m,f] = cdf_adjust(NN[m,f], O[0])
        // last NN is added as predictor to the scenario m
        S[m,f] = NN[m,f]
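The cdf_adjust step of the algorithm can be read as an empirical quantile mapping. The sketch below is an illustrative assumption (the actual operator is defined in Section 2.3 and is not reproduced here): it imposes the value distribution of a reference field while preserving the spatial ranking of the input field.

```python
import numpy as np

def cdf_adjust(field, reference):
    """Rank-based quantile mapping: each pixel of `field` is replaced by the
    value of `reference` occupying the same quantile, so the output keeps
    the spatial pattern of `field` with the value distribution of `reference`."""
    flat = field.ravel()
    adjusted = np.empty_like(flat)
    # smallest field value gets smallest reference value, and so on
    adjusted[np.argsort(flat)] = np.sort(reference.ravel())
    return adjusted.reshape(field.shape)
```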

Figure 7. Scatter plot of the spatial average of the ensemble-mean root mean square error against the ensemble spread for all the test cases and the 12 forecast times. The labels show the correlation coefficient between the two time series and their respective mean values.

Figure 8. Plot of the STEPS against the NN CRPS for the nowcast of the one-hour accumulated rain for all the precipitation events of 2017.

Figure 10 shows the cumulative rank histogram for pySTEPS and NN for the prediction of the cumulative rainfall in the first hour of the forecast. As can be seen, both methodologies have on average a class distribution that is close to ideal, although the rank histogram of NN is slightly more uniform.

Figure 9. Boxplot of the rank histogram for the 100-member ensemble of half-hourly cumulative precipitation forecasts by the NN method and by the STEPS method for all rainfall events in 2017. The reach of the whiskers beyond the first and third quartiles is set to the 5% and 95% percentiles.

Figure 10. As in the previous figure, but for the precipitation accumulated in the first hour.

Figure 11. Reliability Diagram for exceeding the cumulative precipitation threshold of 0.1 mm/h in the first half hour of the forecast. On the right is the plot for NN and on the left is the plot for STEPS. The lower subplots represent, on a logarithmic scale, the numerosity with which each forecast probability class was issued. The gray horizontal line corresponds to the frequency with which the event is observed (climate), while the shaded areas highlight the region where the ensembles have good reliability and thus predictive value from a probabilistic point of view.

Figure 12. Reliability diagram for NN (to the right) and STEPS (to the left), for exceeding precipitation thresholds of 1 mm/h (first row), 5 mm/h (second row), and 20 mm/h (third row).

Figure 13. ROC curve for NN and STEPS for exceeding the cumulative precipitation threshold of 1 mm/h in the first hour of nowcast.

Table 1. Lookup table between the RGB colors of the radar image representation, the precipitation intensity, and the numerical value assigned to the class.

Table 2. PredNet tensors: size of the tensors for the different levels of the PredNet architecture, expressed as (channels, width, height).

Table 3 shows comparisons of AUC values for NN and STEPS for different cumulative thresholds.

Table 3. Area under the ROC curve (AUC) for NN and STEPS for exceeding different cumulative precipitation thresholds in the first hour of the forecast.