A Comparison between One-Step and Two-Step Nesting Strategy in the Dynamical Downscaling of Regional Climate Model COSMO-CLM at 2.2 km Driven by ERA5 Reanalysis

Abstract: Recently, the European Centre for Medium-Range Weather Forecasts (ECMWF) has released a new generation of reanalysis, known as ERA5, which at present gives the most plausible picture of the current climate. Despite the ERA5 enhancements, its coarse spatial resolution (~31 km) could still discourage, in some cases, a direct use of precipitation fields. Such a gap could be addressed by dynamically downscaling ERA5 at convection-permitting scale (resolution < 4 km). In this regard, the selection of the most appropriate nesting strategy (direct one-step against nested two-step) represents a pivotal issue for saving time and computational resources. Two questions may be raised within this context: (i) can the dynamical downscaling of ERA5 accurately represent past precipitation patterns? and (ii) to what extent is the performance of the direct nesting strategy adequate for this purpose? This work addresses these questions by evaluating two ERA5-driven experiments at ~2.2 km grid spacing over part of central Europe, run using the regional climate model COSMO-CLM with different nesting strategies, for the period 2007–2011. Precipitation data are analysed at different temporal and spatial scales with respect to gridded observational datasets (i.e., E-OBS and RADKLIM-RW) and existing reanalysis products (i.e., ERA5-Land and UERRA). The present work demonstrates that the one-step experiment tends to outperform the two-step one when there is no spectral nudging, providing results at different spatial and temporal scales in line with the other existing reanalysis products. However, the results can be highly model and event dependent, as several aspects (e.g., the nesting strategy) need to be considered during the configuration phase of the climate experiments. For this reason, a clear and consolidated recommendation on this topic cannot be stated.
Such a level of confidence could be achieved in future works by increasing the number of cities and events analysed. Nevertheless, these promising results represent a starting point for assessing the optimal experimental configuration in the frame of future climate studies.


Introduction
A proper spatial and temporal characterization of past precipitation regimes represents a pivotal challenge in climate research [1][2][3]. Such a challenge is more relevant when this information is adopted for driving impact models [4][5][6][7][8][9][10][11] and evaluating their performances [12][13][14]. The required accuracy and reliability are however hardly retrievable by in situ observation networks, especially in those areas characterized by a scarce homogeneity and density of observation points.
A first step forward in this view relies on the use of climate reanalyses. A climate reanalysis provides a physically consistent and reliable global reconstruction of past weather, without any gap in space or in time, through the data assimilation of historical observations for both atmospheric and soil variables. Recently, the European Centre for Medium-Range Weather Forecasts (ECMWF) has released a new generation of reanalysis, known as ERA5 [15], which provides a picture of the current climate from 1979 onwards at hourly resolution. ERA5 relies on 4D-Var data assimilation using Cycle 41r2 of the Integrated Forecasting System (IFS). Compared to the previous generation of reanalysis, known as ERA-Interim [16] and widely used as a reference by the climate community, ERA5 features an enhanced spatial and temporal resolution (~31 km compared to the previous ~80 km), an improved representation of the troposphere and of tropical cyclones, and a better global balance of precipitation and evaporation.
Although ERA5 is nowadays the most plausible description of the current climate, its coarse resolution and the assumptions made in sub-grid parameterisations could in some cases discourage a direct use of these data as input for impact models, especially for precipitation. Such a gap could be partly bridged by a dynamical downscaling of ERA5 at convection-permitting scale (resolution < 4 km, hereinafter also referred to as "very high resolution" or VHR) over specific areas of interest.
Some European projects and initiatives (e.g., H2020 European Climate Prediction, EUCP; Coordinated Downscaling Experiment for Flagship Pilot Study at Convection Permitting Scales, CORDEX-FPS CPS) and an increasing number of scientific works [17][18][19][20][21][22][23] have investigated the benefits of VHR simulations, showing their ability to provide a space- and time-consistent description of past events. In general, these studies have demonstrated that convection-permitting models provide an added value for the evaluation of sub-daily rainfall characteristics, such as the diurnal cycle and the intensity of hourly precipitation extremes [17,18,20,24-29]. An additional improvement is a more accurate characterization of interactions with complex orography [17,24]. Finally, refining the spatial resolution may also enable the use of specific parameterizations for modelling cities. It is indeed recognized that urban environments are usually warmer than their surroundings [30]. A proper parameterization of cities could affect the development of precipitation through an easier activation of convection dynamics due to large surface sensible heat fluxes [31].
A comprehensive desk review has revealed that only a few works have been devoted to the downscaling of reanalysis at VHR. Among the most interesting, there are some national activities for smaller regions (e.g., the high-resolution reanalysis system COSMO-REA6 [32]). In addition, advanced experiments in this field are the ERA5-Land reanalysis [33], a refined version of ERA5 with a spatial resolution of ~9 km, and the UERRA (Uncertainties in Ensembles of Regional Reanalyses) reanalysis [34] over Europe, a refined version of ERA40/ERA-Interim at ~5.5 km.
Moving from a coarser resolution to a finer one, a pivotal issue is the selection of the optimal strategy to nest the finer simulation into the parent analysis. Traditionally, finer simulations are nested in an intermediate step run with the same model at coarser resolution, in order to respect a prescribed spatial resolution ratio (usually up to 10:1) [12]. However, recent studies on this topic [35][36][37][38] and experiences [39,40] in building operational chains have investigated the performance of the finer simulation when such an intermediate step is avoided. Specifically, Brisson et al. [36] have investigated how to reduce computational costs without a reduction of model performance; they have concluded that removing one nesting step does not significantly influence the representation of precipitation at convection-permitting scale. On the same topic, Marsigli et al. [39] have analyzed the effect of the direct nesting approach compared with the one based on the intermediate step, for ensemble forecasting over Italy in terms of precipitation; they highlight a slightly better performance of the direct-nesting ensemble for intense precipitation. Matte et al. [37] have concluded that the double nesting approach reduces the effective resolution jump, drastically decreases the effect of spatial spin-up, and allows a reduction of the optimal domain size of the high-resolution simulation, resulting in important computational savings. Finally, Tolle et al. [38] have addressed the influence of the horizontal resolution of the intermediate simulation on extremes obtained at convection-permitting scale, finding that the overestimation of precipitation persists in winter even when a different horizontal resolution is used in the intermediate simulation.
The direct nesting strategy would be extremely attractive from several viewpoints, especially for saving time and computational resources, which are already strained by running VHR climate simulations. It is viable as long as it achieves a level of accuracy and reliability comparable to that obtained with traditional strategies.
Within this framework, two questions may be raised about the dynamical downscaling of ERA5 at convection-permitting scale:
1. can the dynamical downscaling of ERA5 reproduce past precipitation dynamics reliably and coherently?
2. to what extent is the performance of the direct nesting strategy adequate for the scope in hand?
These questions are addressed in this work by evaluating two ad-hoc climate experiments at ~2.2 km driven by ERA5, performed with an optimized configuration of the regional climate model COSMO-CLM [41] switching on the module TERRA-URB tailored for urban environments [42]. The former relies on a one-step nesting strategy, in which the simulation at 2.2 km is directly "one-way nested" in ERA5; the latter on a "two-step nesting strategy", in which the simulation at 2.2 km is one-way nested in a 12 km grid spacing which in turn is one-way nested in ERA5.
Both experiments cover part of central Europe over the period 2007-2011. The evaluation is carried out at different spatial and temporal scales by comparing COSMO-CLM results with those provided by gridded observational datasets, such as E-OBS [43] and the Radar-based Precipitation Climatology with Gauge-adjusted one-hour precipitation sum (RADKLIM-RW) [44], and by existing reanalysis products, such as ERA5-Land and UERRA.
This downscaling exercise represents a preliminary test whose ambition is to define the nesting strategy for a more comprehensive ERA5 downscaling at ~2.2 km over Europe for the period 1989-2018. For this reason, the computational domain and the investigated period have been chosen as a trade-off between computational effort and scope of the work.
The study first describes the climate experiments (§ 2.1) and the datasets (§ 2.2) considered to evaluate VHR enhancements at different spatial and temporal scales, as well as the methodology used for the evaluation at each scale (§ 2.3). Then, it presents (§ 3) and discusses (§ 4) the main results for the scales investigated (§ 3.1, 3.2, 3.3), focusing on the differences between the investigated nesting strategies.

Climate Experiments
ERA5 is dynamically downscaled in this study at 0.02° (~2.2 km) over part of Central Europe (Lon = 3° W-10.5° E; Lat = 47.5° N-53.5° N) with the regional climate model COSMO-CLM [41], switching on the TERRA-URB module to account for urban parameterizations [42].
Two experiments are performed to test different nesting strategies (Figure 1). The former, labelled "CCLM002-Direct", adopts a one-step nesting strategy, in which the simulation at 2.2 km is directly "one-way nested" in ERA5 (1:15 resolution jump); the latter, labelled "CCLM002-Nest", relies on a traditional "two-step nesting strategy", in which the simulation at 2.2 km is one-way nested in a 12 km grid spacing which in turn is one-way nested in ERA5 (1:3:6 resolution jump).
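The resolution jumps quoted above can be checked with simple arithmetic. The sketch below assumes the nominal grid spacings given in the text (~31 km for ERA5, 12 km for the intermediate step, 2.2 km for the target grid); the constant and function names are illustrative only.

```python
# Nominal grid spacings in km, as quoted in the text (approximate values).
ERA5_KM = 31.0
INTERMEDIATE_KM = 12.0
TARGET_KM = 2.2


def jump(parent_km, child_km):
    """Resolution ratio between a parent grid and the grid nested in it."""
    return parent_km / child_km


direct = jump(ERA5_KM, TARGET_KM)          # one-step nesting: ERA5 -> 2.2 km
step1 = jump(ERA5_KM, INTERMEDIATE_KM)     # two-step nesting: ERA5 -> 12 km
step2 = jump(INTERMEDIATE_KM, TARGET_KM)   # two-step nesting: 12 km -> 2.2 km

print(f"direct: 1:{direct:.1f}")
print(f"two-step: 1:{step1:.1f} then 1:{step2:.1f}")
```

With these nominal values the ratios are of the same order as the rounded 1:15 and 1:3:6 jumps quoted above, and only the direct strategy exceeds the 10:1 ratio mentioned in the Introduction.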
Both CCLM002 experiments share parameterizations (Table 1), computational domain (Figure 2) and investigated period (2007-2011, with 2006 as spin-up). This period can be considered long enough for sensitivity tests [45]. The optimized COSMO-DE setup is adopted for both CCLM002 experiments. It results from the protocol established in the frame of the Coordinated Downscaling Experiment (CORDEX) [46,47] of the World Climate Research Programme (WCRP) for the Flagship Pilot Study (FPS) on convection [19], which focuses on the investigation of convective-scale events in a few key regions of Europe and the Mediterranean basin with convection-permitting regional climate models. The intermediate simulation (Table 1) is run with COSMO-CLM over the common domain (Figure 2) investigated within the European branch of CORDEX, known as EUROCORDEX [47]. The same setup as in EUROCORDEX is adopted for COSMO-CLM at 0.11°. None of the configurations reported in Table 1 includes spectral nudging.

[Table 1, only partially recoverable (three configuration columns): Ritter and Geleyn [48] in all three columns; convection scheme: deep and shallow convection based on Tiedtke [49] in the first column, shallow convection based on Tiedtke [49] in the other two; microphysics: entry not recoverable.]

• E-OBS [43,54]: it is a daily gridded land-only observational dataset over Europe at a horizontal resolution of 0.1° (~11 km). It contains data for precipitation amount, mean/maximum/minimum temperature, sea level pressure, and surface shortwave downwelling radiation. Its latest version (v.21), delivered by the Copernicus Climate Data Store, covers the period 1950-2019. E-OBS relies on the "blended" time series from the station network of the European Climate Assessment & Dataset (ECA&D) project. It is calculated following a two-stage process to derive the daily field and the uncertainty in these daily estimates. The limitations due to the interpolation method are the underestimation (typically 10-20%) of high intensities (smoothing effect) and the overestimation of low intensities (moist extension into dry areas), while systematic errors are more substantial for convective rainfall [17,55].
• RADKLIM-RW [44]: it is a radar-based dataset for Germany (region of 1100 km × 900 km), available at the DWD Open Data Portal, at a horizontal resolution of 1 km. It provides hourly precipitation adjusted to rain gauge measurements. RADKLIM-RW is a reanalysed and temporally extended version of RADOLAN-RW. It relies on consistent processing techniques, new correction algorithms (e.g., for distance- and height-dependent signal reduction and for spokes) and more rain gauges for adjustment. The dataset currently covers the period from 2001 to 2017.

Existing Reanalysis Dataset
In addition to the observational datasets, two reanalyses available on the Copernicus Climate Data Store (CDS) are considered for the evaluation of the climate experiments:
• ERA5-Land [33]: it is an hourly land-only ERA5-driven reanalysis. It gives a consistent view of the evolution of land variables from 1981 onwards at an enhanced horizontal resolution (~9 km) compared to ERA5. ERA5-Land is essentially an offline simulation of the ERA5 surface scheme with improved forcing, making it computationally affordable for relatively quick updates. Although its resolution is enhanced with respect to ERA5, ERA5-Land does not derive from a dynamical downscaling, so precipitation is not expected to be much improved.
• UERRA (MESCAN-SURFEX option) [34,56]: it is a reanalysis at ~5.5 km providing estimates of the climate in Europe from 1961 to 2019 at 00, 06, 12, and 18 UTC. It descends from UERRA-HARMONIE, a reanalysis (~11 km) based on a 3-D data assimilation system that uses, along the lateral boundaries, data from ERA40 for the years before 1979 and from ERA-Interim for the years until 2019. Operationally, it combines UERRA-HARMONIE with the MESCAN system and the SURFEX land surface platform to derive daily accumulated precipitation. To this aim, additional surface observations are considered.

Levels of Analysis
To inspect a multitude of features and potentialities, the CCLM002 experiments are investigated at different spatial and temporal scales.
Specifically, three levels of analysis are defined: (i) areal scale, (ii) city scale, and (iii) event scale.

Precipitation data are processed considering a selection of indicators (see Table 2) based on the recommendations of the Expert Team on Climate Change Detection and Indices (ETCCDI) [57]. These indicators traditionally assume a 30-year reference period. Such a period reliably accounts for the intrinsic inter-annual variability, reducing the effect of external forcing that may induce statistically significant trends and thus undermine the homogeneity of the data. Owing to the reduced reference period (2007-2011), in this study they merely provide an indication of the reliability and coherence of the climate experiments, and not a specific climate characterization of the investigated area.
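As an illustration of how the three indicators used in the results (PRCPTOT, RR1, and R95p) can be derived from a daily precipitation series, a minimal sketch follows. Table 2 is not reproduced here, so the definitions are assumptions based on the usual ETCCDI-style conventions: a wet day is a day with at least 1 mm, PRCPTOT is the total on wet days, RR1 is the number of wet days, and R95p is taken as the 95th percentile of wet-day precipitation (consistent with the mm/day units used in the results). Function names and the percentile interpolation rule are illustrative choices, not the authors' code.

```python
WET_DAY_MM = 1.0  # assumed wet-day threshold (mm/day)


def wet_days(daily_mm):
    """Return the daily totals that qualify as wet days."""
    return [p for p in daily_mm if p >= WET_DAY_MM]


def prcptot(daily_mm):
    """Total precipitation accumulated on wet days (mm)."""
    return sum(wet_days(daily_mm))


def rr1(daily_mm):
    """Number of wet days."""
    return len(wet_days(daily_mm))


def r95p(daily_mm):
    """95th percentile of wet-day precipitation (mm/day), linear interpolation."""
    wet = sorted(wet_days(daily_mm))
    if not wet:
        return float("nan")
    pos = 0.95 * (len(wet) - 1)      # fractional rank of the percentile
    lo = int(pos)
    hi = min(lo + 1, len(wet) - 1)
    return wet[lo] + (pos - lo) * (wet[hi] - wet[lo])


# Synthetic ten-day series (mm/day), for illustration only.
sample = [0.0, 0.4, 1.0, 2.5, 0.0, 12.0, 30.0, 0.9, 5.0, 8.0]
print(prcptot(sample), rr1(sample), r95p(sample))
```

In the paper's setting, PRCPTOT and RR1 would be computed year by year and then averaged, whereas R95p would be computed once over the whole period.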

Statistical Tools
The predictive skills of CCLM002 experiments are assessed with respect to observations (i.e., E-OBS or RADKLIM) by using as statistical tools the normalized Taylor diagram [58], the distribution added value (DAV) index [59] and the Kling-Gupta Efficiency (KGE) index [60]. These tools are also adopted for deriving the performances of CCLM experiments with respect to ERA5-Land and UERRA.
The Taylor diagram quantifies and displays the degree of correspondence between a variable simulated by a model and its observed counterpart according to three statistics: the Pearson correlation coefficient ρ, the centred root-mean-square deviation (RMSD) E', and the normalized standard deviation σ. These three statistics are related as follows:

$$E'^2 = \sigma_{obs}^2 + \sigma_m^2 - 2\,\sigma_{obs}\,\sigma_m\,\rho \qquad (1)$$

where obs and m refer to observation and model, respectively.

DAV provides an objective and normalized measure of the added value, in terms of potential gain in the performance of climate models, due to the use of a higher resolution, comparing the probability density functions (PDFs) of the higher- and coarser-resolution simulations to the observational PDF. In this perspective, DAV accounts for the difference in Perkins skill scores between high resolution (subscript hr) and low resolution (subscript lr), assuming the observations (subscript obs) as reference:

$$DAV = \frac{S_{hr} - S_{lr}}{S_{lr}}, \qquad S_{hr} = \sum_{i=1}^{n} \min\left(Z_{hr,i},\, Z_{obs,i}\right), \qquad S_{lr} = \sum_{i=1}^{n} \min\left(Z_{lr,i},\, Z_{obs,i}\right) \qquad (2)$$

where S_hr and S_lr are the Perkins skill scores for high and low resolution, respectively; n is the number of bins considered to obtain the PDF; Z_hr, Z_lr and Z_obs are the frequencies of values in a given bin for high resolution, low resolution and observations, respectively. DAV allows estimating the benefit associated with a higher resolution. Specifically, DAV = 0 indicates that no gain is found; DAV < 0 points out a loss associated with the use of a higher resolution; DAV > 0 expresses the beneficial impact of refining the grid spacing.
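A minimal sketch of the DAV computation follows, under stated assumptions: the Perkins skill score is taken as the sum over bins of the minimum of the model and observed relative frequencies, and DAV as the relative difference between the high- and low-resolution scores. The bin edges, sample values, and function names are hypothetical and chosen only to keep the example small.

```python
def bin_frequencies(values, edges):
    """Relative frequency of values in the bins defined by consecutive edges.

    Values outside [edges[0], edges[-1]] are ignored; at least one value
    must fall inside the range.
    """
    counts = [0] * (len(edges) - 1)
    for v in values:
        if v == edges[-1]:           # include the right edge in the last bin
            counts[-1] += 1
            continue
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]


def perkins_score(model, obs, edges):
    """Overlap of two binned distributions (1 = identical, 0 = disjoint)."""
    z_m = bin_frequencies(model, edges)
    z_obs = bin_frequencies(obs, edges)
    return sum(min(a, b) for a, b in zip(z_m, z_obs))


def dav(hr, lr, obs, edges):
    """Relative gain of the high-resolution score over the low-resolution one."""
    s_hr = perkins_score(hr, obs, edges)
    s_lr = perkins_score(lr, obs, edges)
    return (s_hr - s_lr) / s_lr


# Synthetic samples, for illustration only.
edges = [0, 2, 4, 6]
obs = [1, 2, 2, 3, 3, 3, 4, 4, 5]
hr = [5, 4, 4, 3, 3, 3, 2, 2, 1]     # same distribution as obs
lr = [3] * 9                         # all values in the central bin
print(dav(hr, lr, obs, edges))
```

Multiplying the returned value by 100 gives the gain as a percentage.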
It should be emphasized that the Taylor diagram statistics and DAV are computed on a common grid. To this aim, all datasets are interpolated onto the coarsest grid (i.e., the E-OBS grid).
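Once the fields share a common grid, the three Taylor statistics can be computed from the co-located grid-point values. The sketch below assumes flattened lists of values, population (not sample) standard deviations, and a centred RMSD normalized by the observed standard deviation; the helper names and sample values are illustrative.

```python
import math


def mean(x):
    return sum(x) / len(x)


def std(x):
    """Population standard deviation."""
    m = mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (std(x) * std(y))


def taylor_statistics(model, obs):
    """Return (correlation, normalized std, normalized centred RMSD)."""
    rho = pearson(model, obs)
    sigma_norm = std(model) / std(obs)
    mm, mo = mean(model), mean(obs)
    crmsd = math.sqrt(
        sum(((a - mm) - (b - mo)) ** 2 for a, b in zip(model, obs)) / len(obs)
    )
    return rho, sigma_norm, crmsd / std(obs)


# Synthetic co-located values, for illustration only.
obs_field = [1.0, 2.0, 3.0, 4.0, 5.0]
model_field = [1.5, 1.8, 3.6, 3.9, 5.2]
rho, sigma_norm, rmsd_norm = taylor_statistics(model_field, obs_field)
print(rho, sigma_norm, rmsd_norm)
```

The three returned values satisfy the law-of-cosines relation that underlies the diagram, which provides a quick consistency check on any implementation.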
KGE is a goodness-of-fit measure, traditionally adopted for an objective evaluation of runoff model performance. In general, this index evaluates the performance of a model time series (subscript m) with respect to an observed one (subscript obs):

$$KGE = 1 - \sqrt{\left(\rho - 1\right)^2 + \left(\frac{\sigma_m}{\sigma_{obs}} - 1\right)^2 + \left(\frac{\mu_m}{\mu_{obs}} - 1\right)^2} \qquad (3)$$

where ρ is the Pearson correlation coefficient, while σ and µ represent, respectively, the standard deviation and the mean for model m and observation obs. KGE = 1 indicates a perfect agreement between observed and simulated data; KGE < −0.41 indicates that model data underperform the mean of observed data [61].
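The KGE definition described above (correlation, ratio of standard deviations, ratio of means) can be sketched as follows. This assumes the standard three-component formulation and population standard deviations; the function names and the sample series are illustrative.

```python
import math


def mean(x):
    return sum(x) / len(x)


def std(x):
    """Population standard deviation."""
    m = mean(x)
    return math.sqrt(sum((v - m) ** 2 for v in x) / len(x))


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    return cov / (std(x) * std(y))


def kge(model, obs):
    """Kling-Gupta Efficiency of a model series against an observed one."""
    rho = pearson(model, obs)
    alpha = std(model) / std(obs)     # variability ratio
    beta = mean(model) / mean(obs)    # bias ratio
    return 1.0 - math.sqrt((rho - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Synthetic monthly series (mm/month), for illustration only.
obs_series = [40.0, 35.0, 50.0, 60.0, 45.0, 30.0]
doubled = [2.0 * v for v in obs_series]
print(kge(obs_series, obs_series), kge(doubled, obs_series))
```

A series that doubles the observations keeps a perfect correlation but has variability and bias ratios of 2, so its KGE equals 1 − √2 ≈ −0.41, which is the same value as the threshold quoted above.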

First Level of Analysis: Evaluation at Areal Scale
In the first level of analysis, the CCLM002 experiments are evaluated with respect to the E-OBS observational dataset and to the existing reanalysis products (i.e., ERA5-Land and UERRA) in terms of spatial distribution of PRCPTOT, RR1, and R95p. Operationally, PRCPTOT and RR1 are calculated on a yearly basis and then averaged over 2007-2011, while R95p is derived as a statistic over the whole investigated period. Figures 4-6 show the spatial distribution of PRCPTOT, RR1, and R95p with the related normalized Taylor diagrams. In these representations, the maps are plotted considering all the grid points belonging to the evaluation domain at their native resolution (4089, 4089, 11484, and 69360 grid points for E-OBS, ERA5-Land, UERRA, and CCLM002 both direct and nested, respectively). Such an approach tends to penalize coarser-resolution models (which feature a smaller spatial variability by construction); however, it aims at highlighting the actual spatial variability at a finer scale. The statistics for the Taylor diagrams are instead obtained by processing data interpolated onto a common grid (i.e., the E-OBS grid).

[...] In the Taylor diagram (Figure 5f), all the datasets feature higher variability than E-OBS. In terms of correlation, UERRA and ERA5-Land (correlation = ~0.8) slightly outperform the CCLM002 experiments (correlation = ~0.7).
Moving to R95p (Figure 6), E-OBS (Figure 6a) returns values with a spatial average of 15.4 mm/day and a standard deviation of 2.11 mm/day. A generalized increase in R95p is detected from coarser to finer resolution, with the spatial averages varying between 14.2 mm/day for ERA5-Land (Figure 6b) and 17.5 mm/day for CCLM002-Direct (Figure 6e). While ERA5-Land overestimates PRCPTOT, it underestimates R95p across the evaluation domain (Figure 6b) with respect to E-OBS (Figure 6a), with a reduced variability (standard deviation = 1.17 mm/day). In the Taylor diagram (Figure 6f), the variability increases with the refinement of resolution. In terms of correlation, UERRA and CCLM002-Direct present a higher correlation (~0.7 and ~0.6, respectively) than ERA5-Land and CCLM002-Nest (~0.5). Finally, the root-mean-square deviation (RMSD) varies between 0.5 and 1 for CCLM002-Direct, UERRA and ERA5-Land, while it is > 1 for CCLM002-Nest.
To objectively rank the performances of the CCLM002 experiments with respect to the other reanalysis products, the probability density functions (Figure 7) of PRCPTOT, RR1, and R95p are evaluated in terms of DAV score (Equation (2)) and reported in Table 3. For this elaboration, the same interpolated data as for the Taylor diagrams are considered. The DAV is computed for each indicator by first comparing ERA5-Land, taken as lr, to the CCLM002 experiments, taken as hr, and then comparing UERRA, taken as lr, to the CCLM002 experiments, taken as hr.

Moving from ERA5-Land (about 9 km resolution) to CCLM002 (about 2.2 km resolution), both investigated nesting strategies show an added value. In general, CCLM002-Nest outperforms CCLM002-Direct for RR1 (~+19% against ~+4%) and R95p (~+28% against ~+4%), whereas the opposite behaviour is found for PRCPTOT (~+29% against ~+38%).

Second Level of Analysis: Evaluation at City Scale
The second level of analysis focuses on the cities of Paris and Cologne (Table 4).

From an operational viewpoint, the mean annual cycles of monthly precipitation and monthly number of wet days are averaged over the city domain (Figure 3b) to obtain a single time series. It is worth noting that E-OBS represents a consolidated reference for German cities, as they are widely covered by the ECA&D station network; the same does not hold for French cities, for which the available observed data are more limited [62,63].
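The mean annual cycle described above can be sketched as follows, assuming the input is a daily series already averaged over the city domain and stored as (year, month, daily precipitation) records. The 1 mm wet-day threshold, the record layout, and the function name are assumptions made for illustration.

```python
from collections import defaultdict


def mean_annual_cycle(records, wet_day_mm=None):
    """Mean annual cycle from daily records given as (year, month, daily_mm).

    With wet_day_mm=None the monthly value is the precipitation total;
    otherwise it is the number of days with at least wet_day_mm.
    """
    monthly = defaultdict(float)  # (year, month) -> monthly value
    for year, month, mm in records:
        if wet_day_mm is None:
            monthly[(year, month)] += mm
        else:
            monthly[(year, month)] += 1.0 if mm >= wet_day_mm else 0.0

    by_month = defaultdict(list)  # month -> one value per available year
    for (year, month), value in monthly.items():
        by_month[month].append(value)
    return {m: sum(v) / len(v) for m, v in sorted(by_month.items())}


# Synthetic records, for illustration only.
records = [(2007, 1, 2.0), (2007, 1, 0.5), (2008, 1, 4.0), (2007, 2, 0.0)]
precip_cycle = mean_annual_cycle(records)
wet_day_cycle = mean_annual_cycle(records, wet_day_mm=1.0)
print(precip_cycle, wet_day_cycle)
```

Each month is first aggregated within its own year and only then averaged across years, so that months with missing years do not distort the cycle.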
In this perspective, the uncertainty in the observational dataset is highlighted by comparing E-OBS data with high-resolution observational data (i.e., SAFRAN [64] for Paris and RADKLIM-RW for Cologne). Such a comparison confirms the reliability of E-OBS for a German city (i.e., Cologne), while slight differences arise for a French city (i.e., Paris).
To objectively assess the performances of the different datasets, the Kling-Gupta Efficiency (KGE) index is calculated for monthly PRCPTOT and monthly RR1 over Paris and Cologne, assuming E-OBS as reference, and reported in Table 5.

Regarding the city of Paris (Figure 8), E-OBS monthly precipitation (Figure 8a) varies between 20 and 60 mm/month, with the maximum value in August. In terms of monthly number of wet days (Figure 8b), the evolution follows the same temporal pattern as monthly precipitation, with values ranging between 6 and 14 wet days per month. In this case, the maximum value is recorded in November. Compared to E-OBS, CCLM002-Direct underestimates monthly precipitation (Figure 8a) in the summer period (June, July, and August); CCLM002-Nest also underestimates monthly precipitation (Figure 8a) in the summer period and, in addition, returns a remarkable peak in November (~80 mm/month). In terms of monthly number of wet days (Figure 8b), both CCLM002 experiments show substantially lower values in the summer and in the early fall (values between 6 and 8 wet days per month). Also in this case, CCLM002-Nest shows a peak in November. Regarding the other reanalyses, UERRA provides the most consistent evolution with respect to E-OBS both for monthly precipitation (Figure 8a) and for monthly number of wet days (Figure 8b). Finally, ERA5-Land overestimates both indicators; specifically, the overestimation is slight for monthly precipitation (Figure 8a) in June, November, and December, while it is remarkable in terms of monthly number of wet days (Figure 8b) from March to July.
Moving to the city of Cologne (Figure 9), E-OBS monthly precipitation (Figure 9a) ranges between 20 and 120 mm/month, with the peak recorded in August. In terms of monthly number of wet days (Figure 9b), the values vary between 4 and 16 wet days per month, with maximum values in July and December. This time, CCLM002-Direct is able to correctly reproduce the observed evolution both for monthly precipitation (Figure 9a) and for monthly number of wet days (Figure 9b). Such an improvement is detectable in terms of KGE (Table 5; ~0.91 for monthly PRCPTOT and ~0.87 for monthly RR1). On the other hand, CCLM002-Nest fails to detect precipitation in July and August; this is reflected in lower values of KGE (~0.52 for monthly PRCPTOT). Regarding the other reanalyses, UERRA (KGE = ~0.95 for monthly PRCPTOT and ~0.89 for monthly RR1) provides an evolution in line with E-OBS and CCLM002-Direct; ERA5-Land, instead, once again overestimates the monthly precipitation (except for the summer period) and the monthly number of wet days, returning a reduced performance with respect to CCLM002-Direct.

Third Level of Analysis: Evaluation at Scale of Event
The last section focuses on the evaluation of the experiments' accuracy in reproducing the spatial pattern and the amount of different extreme precipitation events that occurred during the investigated time span. To this aim, instead of E-OBS, the RADKLIM-RW dataset is assumed as reference for its finer spatial resolution. Such a dataset partially covers the evaluation domain (Figure 3), providing data for Germany and some surrounding areas.
Two summer events (labelled as "Event 1" and "Event 2") featuring different dynamics are investigated: the former (Figure 10 (Figure 11a). Both events are analysed as performed in Coppola et al. [19] identifying a region of maximum precipitation as indicated by observations (i.e., specific area of interest), and calculating the hourly accumulated precipitation averaged over each box (Figures 12 and 13 for the Event 1 and Event 2, respectively). Such an evaluation strategy is adopted to assess the timing and intensity of the events. To support the analysis, the KGE scores are computed and reported in Table 6 for hourly precipitation.  The Event 1 (Figure 10) involves the north-eastern part of the domain (Figure 10a) with total accumulated precipitation ranging between 35 mm and 95 mm. The spatial dis-  The Event 1 (Figure 10) involves the north-eastern part of the domain (Figure 10a) with total accumulated precipitation ranging between 35 mm and 95 mm. The spatial distribution of this event is well recognized by CCLM002-Direct (Figure 10e, spatial correla- remark that all the time-series start from 06:00 UTC to ensure a consistency between 480 UERRA and the other datasets, as well as that UERRA is reported as a step plot since it 481 provides only daily information. The performances of datasets for the Event 1 are also synthetically examined by using 493 the KGE index for hourly data ( Table 6). The KGE indications reflect those returned by 494 Figure  formances in this sense. In general, UERRA seems also to be the only dataset capable of 504 reaching the observed peaks. 505 By looking at the selected specific area of interest (Figure 11a), Figure 13 plots the 506 evolutions of hourly accumulated precipitation returned by each dataset. In terms of rep-507 resentation, the same remarks as for Figure 12 are valid. 508

Event 1 (Figure 10) affects the north-eastern part of the domain (Figure 10a), with total accumulated precipitation ranging between 35 mm and 95 mm. The spatial distribution of this event is well captured by CCLM002-Direct (Figure 10e, spatial correlation = 0.56), ERA5-Land (Figure 10b, spatial correlation = 0.68) and UERRA (Figure 10c, spatial correlation = 0.85), even though ERA5-Land returns lower values of total accumulated precipitation. Finally, CCLM002-Nest (Figure 10d) fails to reproduce the spatial pattern of this event (spatial correlation = −0.05).
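The spatial correlation values quoted for each dataset can be illustrated with a minimal sketch. It assumes that the correlation is the Pearson coefficient computed cell by cell between a simulated and an observed field of total accumulated precipitation, both already available on a common grid; the function name `pattern_correlation` and the use of `None` for cells outside the RADKLIM-RW coverage are illustrative choices, not details taken from this work.

```python
import math


def pattern_correlation(sim, obs):
    """Pearson correlation between two 2-D fields (lists of rows).

    Assumption: both fields share the same grid. Cells where either
    field is None (e.g., outside the observational coverage) are skipped.
    """
    pairs = [
        (s, o)
        for sim_row, obs_row in zip(sim, obs)
        for s, o in zip(sim_row, obs_row)
        if s is not None and o is not None
    ]
    n = len(pairs)
    mean_s = sum(s for s, _ in pairs) / n
    mean_o = sum(o for _, o in pairs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in pairs)
    var_s = sum((s - mean_s) ** 2 for s, _ in pairs)
    var_o = sum((o - mean_o) ** 2 for _, o in pairs)
    # Undefined (division by zero) if either field is spatially constant.
    return cov / math.sqrt(var_s * var_o)


# Hypothetical 2x2 fields of total accumulated precipitation (mm).
observed = [[35.0, 50.0], [80.0, None]]
simulated = [[30.0, 45.0], [75.0, 60.0]]
r = pattern_correlation(simulated, observed)
```

In this toy case the simulated field differs from the observed one only by a constant offset over the valid cells, so the pattern correlation is 1 although the amounts are biased; this is why the correlation is discussed together with the accumulated totals.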
Focusing on the selected specific area of interest (Figure 10), Figure 12 plots the evolution of the hourly accumulated precipitation returned by each dataset. Note that all the time series start at 06:00 UTC to ensure consistency between UERRA and the other datasets, and that UERRA is reported as a step plot since it provides only daily information.
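The box-averaged accumulation curves can be sketched as follows. This is only an illustration under stated assumptions: each hourly field is a plain list of rows on a regular grid, the area of interest is given by hypothetical row and column index ranges, and the box average is an unweighted mean over the selected cells (no area weighting).

```python
from itertools import accumulate


def box_mean(field, rows, cols):
    """Unweighted mean of a 2-D field over the selected rows and columns."""
    values = [field[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


def accumulated_box_series(hourly_fields, rows, cols):
    """Running total of the box-averaged hourly precipitation.

    hourly_fields: sequence of 2-D fields, one per hour, ordered in time
    (assumed here to start at the first hour of the event window).
    """
    hourly_means = [box_mean(field, rows, cols) for field in hourly_fields]
    return list(accumulate(hourly_means))


# Hypothetical three-hour sequence on a 2x2 grid (mm per hour).
fields = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [2.0, 2.0]],
    [[4.0, 4.0], [4.0, 4.0]],
]
curve = accumulated_box_series(fields, rows=range(0, 2), cols=range(0, 2))
```

Comparing such curves across datasets shows both when the accumulation starts (timing) and which total is reached at the end of the event (intensity).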
CCLM002-Direct captures both the timing and the intensity of the event well. On the contrary, ERA5-Land identifies the observed timing but fails in terms of intensity, underestimating the accumulation with respect to RADKLIM-RW. CCLM002-Nest fails both in timing and in intensity: it delays the onset of precipitation and underestimates the total accumulated amount in comparison with RADKLIM-RW. Finally, UERRA cannot be evaluated in terms of timing, while it slightly overestimates the observed total accumulated precipitation.
The performance of the datasets for Event 1 is also summarised by means of the KGE index for hourly data (Table 6). The KGE values are consistent with Figure 12: CCLM002-Direct outperforms the other datasets (KGE~0.95). Conversely, CCLM002-Nest completely fails, with KGE < 0.20. It is worth noting that for UERRA only a daily KGE estimate is possible (KGE~0.83) and that this value hints at a good performance of this dataset.
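For reference, a minimal sketch of the KGE computation is given below. It assumes the original Kling-Gupta formulation, KGE = 1 − sqrt((r − 1)² + (α − 1)² + (β − 1)²), where r is the linear correlation, α the ratio of standard deviations and β the ratio of means between simulated and observed series; the exact variant adopted in this work is not restated in this section, so the choice is an assumption.

```python
import math


def kge(sim, obs):
    """Kling-Gupta efficiency of a simulated series against observations.

    Assumption: original formulation with
      r     = Pearson correlation between sim and obs,
      alpha = std(sim) / std(obs),
      beta  = mean(sim) / mean(obs).
    Undefined if obs has zero mean or either series has zero variance.
    """
    n = len(obs)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    std_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    std_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs)) / n
    r = cov / (std_s * std_o)
    alpha = std_s / std_o
    beta = mean_s / mean_o
    return 1.0 - math.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Hypothetical hourly series (mm): a perfect match and a doubled signal.
observed = [0.0, 1.0, 3.0, 2.0]
perfect = kge(observed, observed)
doubled = kge([2.0 * o for o in observed], observed)
```

A perfect simulation gives KGE = 1. In the doubled case the correlation is still 1, but both α and β equal 2, so KGE = 1 − sqrt(2), which is negative: a dataset can follow the observed timing and still score poorly when the intensity is wrong.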
Event 2 (Figure 11) affects the northern part of the domain (Figure 11a), with total accumulated precipitation up to 160-170 mm. The spatial distribution of this event is well captured by UERRA (Figure 11c, spatial correlation = 0.86), while ERA5-Land (Figure 11b, spatial correlation = 0.26), CCLM002-Direct (Figure 11e, spatial correlation = 0.13), and CCLM002-Nest (Figure 11d, spatial correlation = 0.05) perform worse in this respect. In general, UERRA also seems to be the only dataset capable of reaching the observed peaks.
Focusing on the selected specific area of interest (Figure 11a), Figure 13 plots the evolution of the hourly accumulated precipitation returned by each dataset. In terms of representation, the same remarks as for Figure 12 apply.
This time, both ERA5-Land and CCLM002-Direct anticipate the accumulation and underestimate the total accumulated precipitation with respect to RADKLIM-RW. Conversely, UERRA overestimates the observed total accumulated precipitation (~90 mm for UERRA against ~70 mm for RADKLIM-RW). Finally, CCLM002-Nest is completely unable to reproduce this event.
The KGE index is also used for Event 2 to summarise the performance of the datasets at the hourly scale (Table 6). ERA5-Land and CCLM002-Direct return comparable performances according to this score, while CCLM002-Nest returns even negative values of KGE. In this case too, only a daily KGE estimate is possible for UERRA (KGE~0.88); this value indicates that UERRA generally outperforms the other datasets for the event at hand.

Discussion and Conclusions
Recent projects and activities aimed at enhancing the horizontal resolution of Regional Climate Models (~1 to 3 km) are stimulating an open debate on the redefinition of simulation protocols. This topic is becoming relevant because end-users increasingly request data tailored for adaptation purposes and capable of reassessing past precipitation events down to the city scale. In this view, one issue is the selection of the optimal nesting strategy for the dynamical downscaling of reanalysis.
Multiple dynamical downscaling might offer beneficial perspectives for achieving high-resolution regional climate simulations, although its use may affect the quality of the simulations. On the one hand, nesting techniques are responsible for an important part of the RCM bias [36]; on the other hand, since the integration scale of global models (e.g., GCMs, reanalysis) differs largely from the convection permitting scale, a multiple nesting strategy is required to carry out such simulations. Moreover, additional nesting steps tend to increase the computational cost. It is therefore of interest to investigate the impact of different multiple nesting strategies on model performance.
This work contributes to this issue by evaluating two downscaling experiments of ERA5 at the convection permitting scale (~2.2 km). These experiments differ in their nesting strategy, testing a traditional two-step against a direct one-step nesting strategy. They have been evaluated in terms of precipitation at different spatial and temporal scales in comparison with observational datasets (i.e., E-OBS and RADKLIM-RW) and existing reanalysis products (i.e., ERA5-Land and UERRA).
Despite the limited investigated period (2007-2011) and spatial coverage (part of central Europe), the results of this work provide some preliminary insights for a more comprehensive ERA5 downscaling activity.
The two-step nested experiment (i.e., CCLM002-Nest) shows capabilities similar to those of the one-step nested experiment (i.e., CCLM002-Direct) when it is evaluated for climate statistics (e.g., PRCPTOT, RR1, and R95p in §3.1). Specifically, according to the Taylor diagram statistics and the DAV evaluation, CCLM002-Direct returns improved performance for PRCPTOT and degraded performance for RR1 with respect to CCLM002-Nest. On the other hand, the same level of confidence cannot be reached for R95p, as the Taylor diagram statistics suggest that CCLM002-Direct outperforms CCLM002-Nest while the opposite holds in terms of DAV. In general, both CCLM002 experiments show an added value with respect to ERA5-Land (Table 3); this improvement vanishes when the CCLM002 experiments are compared with UERRA (Table 3). Indeed, in this work UERRA appears to be the optimal reference, as it directly includes observations through a data assimilation procedure; however, this is not a general rule, as it strictly depends on the investigated variables and spatial domains [65].
The reliability of CCLM002-Nest decreases when it is evaluated at the city (§3.2) and event (§3.3) scales. In these cases, CCLM002-Direct outperforms CCLM002-Nest. Specifically, at the city scale, it provides coherent and reliable results, capturing the trend and peaks of the monthly precipitation amounts and of the monthly number of wet days, also in comparison with ERA5-Land and UERRA (Table 5). At the event scale, CCLM002-Direct is able to recognize the timing and intensity of the rainfall events (Figures 12 and 13), while CCLM002-Nest completely fails to detect them.
Such a different behaviour is in line with previous works. Specifically, CCLM002-Nest is driven through the Lateral Boundary Conditions (LBC) by a freely evolving (i.e., not nudged) intermediate simulation (i.e., CCLM011), which allows internal variability to develop [18]. For this reason, events at the meteorological scale (§3.3) and over a very limited area of interest (§3.2) may not be correlated with the ERA5 reanalysis. CCLM002-Nest therefore has a low probability of reproducing the right timing and intensity of localized heavy events. This tendency seems to be attenuated when CCLM002 is directly nested into ERA5. It is worth stressing that the results can be highly model and event dependent [19], so the number of cities and events should be increased to obtain a comprehensive evaluation of this topic. For this reason, although the dynamical downscaling of ERA5 at 2 km seems able to reproduce past precipitation dynamics reliably and coherently, no clear recommendation can be made about the extent to which the performance of the direct nesting strategy is adequate for the purpose at hand. In this perspective, this work, based on the ERA5 reanalysis, is in line with previous experiences on this topic [36][37][38][39], showing in any case a neutral or improved performance of the finer simulation when the intermediate step is avoided.
In general, then, it seems that the selection of the most appropriate nesting strategy depends mainly on the goals for which the data are produced (i.e., climate statistics or event-based analysis). This should be kept in mind when a new downscaling activity targeting the past climate is planned. Ultimately, this work should be regarded as the first step of a deeper analysis, in which more cases will be considered, especially by including a larger number of cities and areas of analysis and a longer simulation period.