Investigating the WRF Temperature and Precipitation Performance Sensitivity to Spatial Resolution over Central Europe

The grid size resolution effect on the annual and seasonal simulated mean, maximum and minimum daily temperatures and precipitation is assessed using the Advanced Research Weather Research and Forecasting model (ARW-WRF, hereafter WRF) that dynamically downscales the National Centers for Environmental Prediction’s final (NCEP FNL) Operational Global Analysis data. Simulations were conducted over central Europe for the year 2015 using 36, 12 and 4 km grid resolutions. Evaluation is done using daily E-OBS data. Several performance metrics and the bias adjusted equitable threat score (BAETS) for precipitation are used. Results show that model performance for mean, maximum and minimum temperature improves when increasing the spatial resolution from 36 to 12 km, with no significant added value when further increasing it to 4 km. Model performance for precipitation is slightly worsened when increasing the spatial resolution from 36 to 12 km while further increasing it to 4 km has minor effect. However, simulated and observed precipitation data are in quite good agreement in areas with precipitation rates below 3 mm/day for all three grid resolutions. The annual mean fraction of observed and/or forecast events that were correctly predicted (BAETS), when increasing the grid size resolution from 36 to 12 and 4 km, suggests a slight modification on average over the domain. During summer the model presents significantly lower BAETS skill score compared to the rest of the seasons.


Introduction
Earth system models (ESMs) and general circulation models (GCMs) are still the principal tools of the scientific community for projecting future climate [1,2]. Nonetheless, neither can simulate local scales, as they currently run at grid spacings of approximately 100 km or coarser, while many important climate phenomena occur at spatial scales of less than 10 km (e.g., convective cloud processes, turbulence, wind patterns over complex terrain and sea breeze effects). In addition, ESMs and GCMs do not satisfactorily represent vegetation variability, complex topography and coastlines, which are significant components of the physical system that govern the climate change signal on a local or regional scale. To cope with these deficiencies, dynamical downscaling techniques have been developed to adapt the large-scale projections provided by an ESM or a GCM to regional or local scales. They do so by explicitly solving the process-based physical dynamics of the regional climate system at high spatial resolution, driven by the large-scale, low-resolution data of the ESM/GCM [3,4].
Regional climate models (RCMs) are among the most effective tools for dynamically downscaling global climate projections to local scales [5], but their ability to reproduce current climate conditions needs to be evaluated, in the first place, against observations before being used for such studies. This exercise allows the identification of potential inherent drawbacks related to the assumptions being made by the setup of the RCM, the parameterizations used and their associated uncertainties, as well as an extensive evaluation of the RCM ability to reproduce significant climate features over the domain of interest. Two approaches can be adopted in order to assess RCM's ability to reproduce current climate, either the use of GCM/ESM data or the use of reanalysis data as initial and boundary conditions of the RCM (e.g., [6][7][8]). GCM/ESM data are obtained using current greenhouse gases (GHG) forcings. The first approach is not flawless, given that GCMs are not forced by observed data, therefore possible systematic errors developed by a GCM/ESM may propagate into the RCM outputs. As a result, the added value of an RCM used for downscaling purposes might be diminished. Still, such simulations are very useful, as they allow climate projections and assessments for various climate change scenarios on finer resolutions than those resolved by the GCMs/ESMs. On the other hand, reanalysis data are the most accurate representation of the archived climate observations at high temporal and spatial resolution forced by climatic observation, therefore they are extensively used for evaluation purposes of the current climate.
Among the climatic parameters assessed by RCMs, temperature and precipitation are of crucial relevance for society and ecosystems. Being able to correctly capture or project their temporal, spatial and quantitative distribution is of high importance. However, their simulation is still very challenging given the wide range of processes involved. Increasing the spatial resolution of RCMs can, to some extent, address these deficiencies when downscaling techniques are applied. Multi-nesting approaches in downscaling procedures can add more detail to the assessments, while ensuring that the outputs at the finer scales resolved by the nests of the RCM are dynamically consistent with the large-scale flows. However, the use of several consecutive nests is computationally demanding and does not guarantee that the RCM outputs improve, rather than merely replicate or even degrade, the coarser results. As a result, a number of studies have assessed the effect of spatial resolution on model performance when multi-nesting approaches are used.
Long-term studies have been conducted over Europe at grid spacings down to 12 km. Vautard et al. [9] examined heatwave prediction within the EURO-CORDEX project at 50 and 12 km resolutions using a multi-model ensemble. They found no significant improvement in maximum temperature prediction, especially in mountainous regions. Kotlarski et al. [10], within the same framework, examined air temperature and precipitation at grid resolutions of 50 and 12 km and could not find a clear benefit from the decrease in grid spacing. This was also the conclusion of Jaeger et al. [11] for temperature and precipitation within the ENSEMBLES project at 0.44° and 0.22° spatial resolutions, as well as of van Roosmalen et al. [12] for Denmark. However, Heikkilä et al. [13] found noteworthy added value comparing 30 and 10 km simulations over Norway.
A number of studies [14][15][16] focus only on precipitation sensitivity with respect to various spatial resolutions and regions. Their results address the estimated biases, which do not seem to be clearly improved. Giorgi and Marinucci [14] found that precipitation amount, intensity, and distribution depend on the grid size. Leung and Qian [15] found that the increase of the resolution (from 40 to 13 km) did not cause a uniform improvement of precipitation assessments over complex terrain. Li et al. [16] found that increasing the horizontal resolution from 30 to 10 km improved the forecasting ability for precipitation. Rauscher et al. [17], in the framework of the ENSEMBLES project, found that both patterns and temporal evolution of precipitation during summer are improved when decreasing grid spacing from 50 to 25 km. Chan et al. [18] also found added value in capturing precipitation events in topographically complex regions as a result of decreasing grid spacing. Precipitation spatial pattern representation is improved, but only a small or non-significant improvement can be detected for mean biases along coastlines [7]. Prein et al. [19] examined the representation of mean and extreme precipitation within the EURO-CORDEX project at 0.44° and 0.11° resolutions and found that increased resolution adds value, while in regions with complex terrain (e.g., the Alps and the Carpathians) added value in precipitation biases tends to cancel out by averaging. Similar were the findings of Torma et al. [20], who, within the EURO-CORDEX framework, found improvement in the spatial representation and the extremes of precipitation at finer grid resolutions.
Despite the large number of studies examining the effect of spatial resolution on model performance, two questions remain inadequately addressed: up to what spatial scale downscaling global data can actually improve the local representation of temperature and precipitation, and whether very fine resolutions (below 10 km) are necessary for that improvement. Addressing this challenge, in this study we assess the sensitivity of the Advanced Research Weather Research and Forecasting model (ARW-WRF, hereafter WRF) [21,22] temperature and precipitation performance to the grid size resolution. Performance sensitivity is examined for WRF downscaled simulated mean, maximum and minimum daily temperatures as well as precipitation, and we compare the model's predictions against daily data from the E-OBS database [23], in order to draw conclusions on the added value of their representation at higher grid resolutions. The grid size resolutions selected here, i.e., 36, 12, and 4 km, extend beyond the typical size of 0.11° (~12 km) used in previous studies (e.g., [24][25][26]). The selected domain extends over central Europe, owing to the large amount of observational data available there, and both annual and seasonal impacts are assessed.

Modeling Setup
The Weather Research and Forecasting (WRF) model [21,22] version 3.9.1 is used to simulate meteorological variables. WRF is one of the most widely used Regional Climate Models (RCMs) for downscaling global data to regional scales. It is a mesoscale numerical weather prediction system used for reproducing local weather and climate at high spatial resolutions. It has extensively been used for climate and meteorological applications over Europe (e.g., [25][26][27]).
In this study, the coarse parent model domain is centered at (49° N, 10.5° E) and consists of 50 grid cells in both the west-east and south-north directions with a grid cell size of 36 km. The two nested domains have 100 and 250 grid cells in the west-east and south-north directions with grid cell sizes of 12 and 4 km, respectively. The finer nested domain covers the central European region (Figure 1). The nests are one-way interactive to avoid feedback from the inner to the outer domains, so that the results represent the resolution effect only. In the vertical direction, the model uses 40 layers. The NCEP FNL (Final) Operational Model Global Tropospheric Analyses data at 1-degree resolution are used as the single source of initial and lateral boundary conditions for the parent domain, with the latter updated every 6 h throughout the model simulations. The modeling setup is similar to the one used in the WRF EURO-CORDEX framework [10,25] for all grid resolutions, employing the WSM-5 microphysics scheme, the RRTMG radiation scheme, the YSU PBL scheme, and the NOAH land surface scheme. Simulations cover the period July 2014 to December 2015, with the first 6 months used as spin-up time, allowing a more realistic development of snow cover [28].
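As a quick arithmetic check of the nesting geometry described above, the following minimal sketch computes the side length of each domain and the parent-to-child grid spacing ratios. The dictionary layout and function names are hypothetical and chosen only for illustration; the cell counts and grid spacings are those stated in the text, and square domains are assumed.

```python
# Domain layout taken from the text; key and function names are hypothetical.
DOMAINS = {
    "D1": {"cells_per_side": 50, "dx_km": 36},
    "D2": {"cells_per_side": 100, "dx_km": 12},
    "D3": {"cells_per_side": 250, "dx_km": 4},
}


def side_length_km(domain):
    """Side length of a square domain: number of cells times grid spacing."""
    return domain["cells_per_side"] * domain["dx_km"]


def nest_ratio(parent, child):
    """Parent-to-child grid spacing ratio, required here to be an integer."""
    ratio, remainder = divmod(parent["dx_km"], child["dx_km"])
    if remainder:
        raise ValueError("child spacing does not divide parent spacing")
    return ratio


for name, dom in DOMAINS.items():
    print(name, side_length_km(dom), "km per side")
print("D1:D2 ratio", nest_ratio(DOMAINS["D1"], DOMAINS["D2"]))
print("D2:D3 ratio", nest_ratio(DOMAINS["D2"], DOMAINS["D3"]))
```

Under these assumptions the three domains span 1800, 1200 and 1000 km per side, and each nest refines its parent by a factor of 3.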

Observational Data
Comparison between predicted and observed values for mean, maximum, minimum temperatures and precipitation is performed for the year 2015 using daily data from the E-OBS dataset [23,29]. The E-OBS data are based on the European Climate Assessment and Dataset (ECA&D) project station observation data (https://www.ecad.eu/download/ensembles/download.php#datafiles (accessed on 5 February 2021)) that covers the entire European domain. The E-OBS dataset has extensively been used in the past for comparison studies over Europe (e.g., [10,26,30-35]). To evaluate model performance, results were compared with the ensemble mean of the regular 0.25° grid version of the E-OBS v20.0e observational dataset. Therefore, the E-OBS grid of 0.25 degrees is used as a reference onto which all WRF domain grids are interpolated. After interpolating model-derived temperature and precipitation to the E-OBS grid within the investigation area, i.e., the D3 domain, mean, maximum and minimum temperatures and precipitation were calculated. One could argue that the specific database is too coarse to compare against the finer domain's outputs; however, as pointed out by Prein et al. [19] and Fantini et al. [36], it is anticipated that if processes are captured better at higher resolution, improvements are still visible when regridded to coarser resolution. As a result, in order to ensure a fair intercomparison among the three grid resolutions, we chose to regrid variables to the grid of E-OBS.

Despite the extensive use of the E-OBS database, there are some known shortfalls related to the spatial coverage of its station network and the quality of the data where station density is sparse, affecting the magnitude of daily extremes in temperature (e.g., [32,37-41]) and possibly the total precipitation that is underpredicted [41], especially in mountainous and snow-covered regions [42].
However, given that E-OBS has a dense station network with good temporal coverage over central Europe, it has been selected for our study, as its known inefficiencies will not affect the comparison with our simulated data.
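The text does not state which interpolation method was used to bring the WRF fields onto the 0.25° E-OBS grid. Purely as an illustration of the regridding step, the sketch below applies bilinear interpolation between two regular latitude-longitude grids. Both the choice of bilinear interpolation and the regular source grid are assumptions (WRF output normally sits on a projected, curvilinear grid), and all function names are hypothetical.

```python
from bisect import bisect_right


def bilinear(src_lats, src_lons, field, lat, lon):
    """Interpolate field[i][j], given on ascending src_lats x src_lons, to (lat, lon)."""
    if not (src_lats[0] <= lat <= src_lats[-1] and src_lons[0] <= lon <= src_lons[-1]):
        raise ValueError("target point lies outside the source grid")
    # Lower-left corner of the enclosing cell, clamped so i + 1 and j + 1 stay valid.
    i = min(bisect_right(src_lats, lat) - 1, len(src_lats) - 2)
    j = min(bisect_right(src_lons, lon) - 1, len(src_lons) - 2)
    ty = (lat - src_lats[i]) / (src_lats[i + 1] - src_lats[i])
    tx = (lon - src_lons[j]) / (src_lons[j + 1] - src_lons[j])
    south = (1 - tx) * field[i][j] + tx * field[i][j + 1]
    north = (1 - tx) * field[i + 1][j] + tx * field[i + 1][j + 1]
    return (1 - ty) * south + ty * north


def regrid(src_lats, src_lons, field, dst_lats, dst_lons):
    """Evaluate the source field at every point of the destination grid."""
    return [[bilinear(src_lats, src_lons, field, la, lo) for lo in dst_lons]
            for la in dst_lats]


# Toy example: a 2 x 2 source cell sampled at its centre.
lats, lons = [45.0, 45.5], [10.0, 10.5]
temps = [[0.0, 1.0], [2.0, 3.0]]
print(regrid(lats, lons, temps, [45.25], [10.25]))  # centre value is the mean of the corners
```

For a field such as precipitation, a conservative (area-weighted) remapping may be preferable to bilinear interpolation when moving from a fine to a coarse grid; the sketch is meant only to show where a common reference grid enters the evaluation.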

Performance Metrics
Mean bias (MB), separated into positive and negative biases to avoid misleading results caused by the cancellation of positive and negative values, mean absolute error (MAE), root mean square error (RMSE) and the index of agreement (IoA) (Table 1) are the statistical indices used to assess the impact of grid size resolution on the model's simulated temperature and precipitation. These metrics are widely used and easily reproducible, allowing a rigorous assessment of model performance. The statistical analysis is based on the daily values for each individual model grid cell, assessing both annual and seasonal impacts. In Table 1, X_predicted and X_observed stand for the daily gridded predicted and observed values, with n being the total number of grid points, while overbars denote mean values.
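The metrics named above can be sketched as follows. Since the exact expressions of Table 1 are not reproduced here, this is only an illustration under stated assumptions: the positive and negative mean biases are averaged over their own subsets of points, and the IoA is taken in Willmott's original form. The function name and dictionary keys are hypothetical.

```python
from math import sqrt


def performance_metrics(predicted, observed):
    """Return MB+, MB-, MAE, RMSE and IoA for paired predicted/observed values."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(predicted)
    diffs = [p - o for p, o in zip(predicted, observed)]
    positive = [d for d in diffs if d > 0]
    negative = [d for d in diffs if d < 0]
    obs_mean = sum(observed) / n
    squared_error = sum(d * d for d in diffs)
    # Assumed IoA denominator (Willmott): sum of (|P - mean(O)| + |O - mean(O)|)^2.
    potential_error = sum((abs(p - obs_mean) + abs(o - obs_mean)) ** 2
                          for p, o in zip(predicted, observed))
    return {
        # Assumption: each signed bias is averaged over its own subset of points.
        "MB+": sum(positive) / len(positive) if positive else 0.0,
        "MB-": sum(negative) / len(negative) if negative else 0.0,
        "MAE": sum(abs(d) for d in diffs) / n,
        "RMSE": sqrt(squared_error / n),
        "IoA": 1.0 - squared_error / potential_error if potential_error else float("nan"),
    }


metrics = performance_metrics([11.0, 11.0, 15.0, 15.0], [10.0, 12.0, 14.0, 16.0])
for name, value in metrics.items():
    print(name, round(value, 4))
```

In this toy case the plain mean bias would be zero even though every point is off by 1, which is the cancellation effect that motivates reporting positive and negative biases separately.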
In addition, the bias adjusted equitable threat score (BAETS) [43] for precipitation is used in order to assess how well the forecast "yes" events correspond to the observed "yes" events. Table 2 presents the 2 × 2 contingency table in the form required for this analysis, used in verifying dichotomous forecasts.
The BAETS is given by the formula:

$$\mathrm{BAETS} = \frac{H_A - \frac{O^2}{N}}{2O - H_A - \frac{O^2}{N}}$$

where $H_A$ is the bias adjusted number of hits (H),

$$H_A = O - \frac{F - H}{\ln\left(\frac{O}{O - H}\right)}\,\mathrm{lambertw}\left(\frac{O}{F - H}\,\ln\left(\frac{O}{O - H}\right)\right)$$

where lambertw stands for the Lambert W-function or omega function, F denotes the forecast event area (correctly forecast area or "hits" plus the "false alarms"), O denotes the observed area and N denotes the total number of verification points or events. BAETS has a value between −1/3 and 1, with 0 indicating no skill and 1 being the perfect score.
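As a minimal sketch of how such a score can be evaluated, the code below assumes that BAETS takes the dH/dF form of the bias adjustment proposed in [43]: the hits H are first adjusted to the value H_A they would take for an unbiased forecast (F = O), and the standard equitable threat score is then computed with H_A and F = O. The Lambert W function is evaluated with a small Newton iteration so that only the standard library is needed; the function names are hypothetical and degenerate contingency tables are simply rejected.

```python
from math import exp, log


def lambertw0(x, tol=1e-12, max_iter=100):
    """Principal branch of the Lambert W function for x >= 0 (solves w * e^w = x)."""
    if x < 0:
        raise ValueError("this sketch only supports x >= 0")
    w = log(1.0 + x)  # starting guess; W(x) <= log(1 + x) for x >= 0
    for _ in range(max_iter):
        ew = exp(w)
        step = (w * ew - x) / (ew * (w + 1.0))  # Newton step
        w -= step
        if abs(step) <= tol * (1.0 + abs(w)):
            break
    return w


def baets(hits, forecast, observed, total):
    """Bias adjusted ETS from H (hits), F (hits + false alarms), O (observed), N (total)."""
    if not (0 < hits < observed and hits < forecast and observed < total):
        raise ValueError("degenerate contingency table not handled in this sketch")
    # Assumed dH/dF adjustment: H_A = O - (1/b) * W(b * O), b = ln(O / (O - H)) / (F - H).
    inv_b = (forecast - hits) / log(observed / (observed - hits))
    hits_adjusted = observed - inv_b * lambertw0(observed / inv_b)
    chance = observed * observed / total  # hits expected by chance when F = O
    return (hits_adjusted - chance) / (2.0 * observed - hits_adjusted - chance)


print(baets(hits=30, forecast=50, observed=50, total=1000))  # unbiased forecast
print(baets(hits=30, forecast=80, observed=50, total=1000))  # over-forecasting
```

With this formulation an unbiased forecast (F = O) is left unchanged, so its BAETS equals its ordinary equitable threat score, whereas a forecast that reaches the same number of hits only by over-forecasting is scored lower.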

Mean Temperature
The model reproduces the observed annual domain mean temperature over all three grid size resolutions used in this study. As can be seen in Table 3, there is an average underestimation of the observed mean temperature of 0.13 °C for the 36 km grid size domain (D1), of 0.12 °C on average for the 12 km grid size domain (D2), and of 0.10 °C on average for the 4 km grid size domain (D3). An overestimation is found mainly over the north and northeast region of the domain and an underestimation over the southern part of the domain. This finding, i.e., the cold bias at the north and the warm bias at the south part of Europe has also been stated in other RCM studies [10,26]. The highest positive and negative differences are found in regions characterized by complex orographic features, e.g., the Alps and northern Italy (Figure 2). This trend, as well as the spatial pattern, does not change, in general, with the increase in the spatial resolution. However, model performance is better when decreasing the grid spacing from 36 (D1) to 12 km (D2) but no significant change is found when the spatial resolution is further increased (i.e., 4 km (D3)).

This is also supported by the domain wide average values of the statistical measures for D1-D3 (Table 3). These measures show statistically significant improvement in the biases, the RMSE and the MAE when increasing the spatial resolution from 36 to 12 km, while a minor improvement is seen with further increase of the resolution to 4 km. From a statistical point of view, this implies that simulations with a grid size of 12 km might be adequate to describe annual temperature trends, derived from daily data, over large regions, avoiding computationally demanding simulation with fine grid spacing. Comparing the grid data between domains D1 against D2 and D2 against D3 (Table 3, columns Δij(D2-D1) and Δij(D3-D2)), it is clear that there is a statistically significant change in the average values of biases and RMSE between D1 and D2, and a minor change between D2 and D3, as the grid spacing is reduced. The closer agreement between D2 and D3 grid data implies that there is no clear statistical evidence of improvement when downscaling data to the finer grid resolution used here (i.e., 4 km).
Investigating the climatological variability of annual average temperatures in combination with the grid size effect, we compare the grid biases, i.e., simulated minus observed daily values for each grid, for domains D1 against D2 (Figure 3a) and D2 against D3 (Figure 3b). The D2 simulation tends to reduce biases compared to D1 with slightly higher temperatures (below the diagonal). The biases between the D2 and D3 simulations are very similar (mostly fall on the diagonal) and smaller in range than the D1 simulation (Figure 3). As a result, the improvement is higher for D2 compared to D1, with no significant added value being seen for D3 compared to D2. In addition, the spatial error variability, defined as the difference between the first (25th) and third (75th) quartile, is derived for the three domain resolutions. Results show improvement in the spatial error variability (0.85 °C for D1, 0.70 °C for D2 and 0.68 °C for D3) as the grid spacing is reduced. Although a clear improvement is seen when comparing D1 against D2, there is no clear evidence of improvement when comparing D2 against D3. These findings also support the conclusion derived from the statistical analysis that simulations with a grid size resolution of 12 km are sufficient for describing annual temperature trends over large domains.
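The spatial error variability used above, i.e., the interquartile range of the grid-cell biases, can be sketched as follows. The quantile convention is not specified in the text, so the "inclusive" method of Python's `statistics.quantiles` is an assumption, as are the function name and the toy values.

```python
from statistics import quantiles


def spatial_error_variability(simulated, observed):
    """Interquartile range (Q3 - Q1) of the per-cell biases, simulated minus observed."""
    biases = [s - o for s, o in zip(simulated, observed)]
    # Assumed quantile convention: linear interpolation with min and max included.
    q1, _median, q3 = quantiles(biases, n=4, method="inclusive")
    return q3 - q1


# Toy example with five grid cells and a uniform observed value of 10 degrees C.
simulated = [9.0, 9.5, 10.0, 10.5, 11.0]
observed = [10.0] * 5
print(spatial_error_variability(simulated, observed))  # 1.0
```

A smaller value means that the biases of the individual grid cells are more tightly clustered, which is the sense in which the values reported for D1, D2 and D3 are compared.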
The seasonal mean temperature plots present a similar spatial pattern between the three domains for each season (Figures S1-S4 of Supplementary Material). During autumn, the model, for all grid resolutions, overestimates mean temperature over a major part of the domain and underestimates it mainly over the northwest and central parts of Italy (Figure S1). Increasing the spatial resolution from 36 (D1) to 4 km (D3) improves the statistical metrics (Table S1), suggesting that the finer domain better represents autumn mean temperature. This is related to both the positive and negative MBs, which are improved when moving from the coarser to the finer resolution. During winter, the model, in all three grid resolutions, underestimates mean temperature mainly over northwest Italy and the region over the Alps (Figure S2). Increasing the spatial resolution from 36 (D1) to 12 km (D2) leads to improved statistical measures (Table S2) but no significant change is found when the spatial resolution is further increased (i.e., 4 km (D3)). During spring and summer, the model, in all grid resolutions, underestimates mean temperature in most parts of the domain with the exception of the north-northeast region (Figures S3 and S4). Increasing the spatial resolution from 36 (D1) to 12 km (D2) leads to improved statistical measures (Tables S3 and S4) but no significant change is found when the spatial resolution is further increased (i.e., 4 km (D3)).
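The seasonal statistics rely on splitting the daily series for 2015 by season. The text does not define the seasons, so the sketch below assumes meteorological seasons (winter as December, January and February, and so on), with winter built from the January, February and December days of the same calendar year. The names are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

# Assumed meteorological seasons, keyed by calendar month.
SEASON_OF_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_means(start, daily_values):
    """Average a daily series that begins on `start` over each season."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for offset, value in enumerate(daily_values):
        season = SEASON_OF_MONTH[(start + timedelta(days=offset)).month]
        sums[season] += value
        counts[season] += 1
    return {season: sums[season] / counts[season] for season in sums}


# Toy series for 2015 in which each day's value is simply its month number.
first_day = date(2015, 1, 1)
values = [float((first_day + timedelta(days=k)).month) for k in range(365)]
print(seasonal_means(first_day, values))
```

The same grouping can be applied to the daily biases of each grid cell before computing the seasonal metrics.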

Maximum Temperature
The model underestimates annual maximum temperature in a major part of the domain. The highest differences are found in the Alps region (Figure 4). This trend, as well as the spatial pattern, does not change with the increase of the spatial resolution. However, model performance is better when increasing the spatial resolution from 36 (D1) to 12 km (D2), while a minor change is found when the spatial resolution is further increased (i.e., 4 km (D3)). This is also supported by the domain wide average values of the statistical metrics (Table 4).

The seasonal maximum temperature plots present a similar spatial pattern between the three domains for each season (Figures S5-S8 of Supplementary Material). During autumn, the model overestimates maximum temperature over the domain with the exception of the Alps region (Figure S5). Increasing the spatial resolution from 36 (D1) to 4 km (D3) improves the statistical measures (Table S5), suggesting that the finer domain better represents autumn maximum temperature. However, the improvement between 36 (D1) and 12 km (D2) is larger than the improvement between 12 (D2) and 4 km (D3). During winter, the model underestimates maximum temperature in a major part of the domain (Figure S6). Underestimation is mainly noted over Italy, the Eastern Alps and most of Switzerland. Increasing the spatial resolution from 36 (D1) to 12 km (D2) improves the statistical measures (Table S6) but no significant change is found when the spatial resolution is further increased (i.e., 4 km (D3)). During spring and summer, the model underestimates maximum temperature in most parts of the domain (Figures S7 and S8). Increasing the spatial resolution from 36 (D1) to 12 km (D2) improves statistical measures (Tables S7 and S8) but no significant change is found when the spatial resolution is further increased (i.e., 4 km (D3)).

Minimum Temperature
The model underestimates annual minimum temperature in a major part of the domain for all three grid resolutions used. The highest differences are found over northern Italy and the Alps region (Figure 5). This trend, as well as the spatial pattern, does not change with the increase of the spatial resolution. Model performance does not change in the higher spatial resolution grids (i.e., 12 (D2) and 4 km (D3)) compared to the 36 km (D1) domain, except for the positive bias, which is improved on the finer nested domain (Table 5).

The seasonal minimum temperature plots present a similar spatial pattern between the three domains for each season (Figures S9-S12 of Supplementary Material). For all seasons, mean predicted values are lower than the observed ones, with domain D1 in closer agreement with observations than D2 and D3, mainly as a result of the gradual increase in the negative bias when moving from D1 to D2 and D3. This finding might be related to the stronger negative bias in precipitation (Tables S13-S16) when moving to finer grid resolutions that leads to gradually larger evaporative cooling. During autumn, the model overestimates minimum temperature over the eastern border of the domain and north-east Italy (Figure S9). Increasing the spatial resolution does not improve the statistical measures except for the positive bias (Table S9). During winter, the model underestimates minimum temperature mainly over north-west Italy and the Alps region, while there is a mixed trend for the rest of the domain (Figure S10). Increasing the spatial resolution from 36 (D1) to 4 km (D3) causes a slight improvement in the positive bias but the rest of the statistical measures do not improve (Table S10). During spring and summer, the model underestimates minimum temperature in most of the domain (Figures S11 and S12). Increasing the spatial resolution from 36 (D1) to 4 km (D3) does not improve the statistical measures except for the positive bias (Tables S11 and S12).

Precipitation
The model overestimates annual precipitation in most of the domain except for the regions at the western and south-eastern borders (Figure 6). Simulated and observed data are in quite good agreement in areas with precipitation rates below 3 mm/day, with the model able to represent the precipitation range to within ±25%. However, for high-precipitation areas such as the alpine and mountainous regions, differences are quite high (up to 2.5 mm/day overestimation by the model); these are probably related to known E-OBS deficiencies in properly capturing the correct range of precipitation in regions with sparse and uneven station coverage. This trend, as well as the spatial pattern, does not change with the increase of the spatial resolution. The statistical analysis suggests that model performance is slightly better when increasing the spatial resolution from 36 (D1) to 12 km (D2), while further increasing spatial resolution to 4 km (D3) has a negligible effect (Table 6). Minor improvements in model performance on a daily, seasonal or annual basis when decreasing the grid spacing to convection-permitting scales have also been found by other studies (e.g., [18,24,44,45]), suggesting that sub-daily timeframes need to be considered in such cases for improvements to be seen.
Investigating the grid size effect, we compare the biases, i.e., simulated minus observed daily values, for domains D1 against D2 (Figure 7a) and D2 against D3 (Figure 7b). The biases between the D2 and D3 simulations are very similar (mostly fall on the diagonal) with slightly lower precipitation rates for D3 domain (below the diagonal) and smaller in range than the D1 simulation (Figure 7a). As a result, the projection improvement is higher for D2 compared to D1, with no significant added value being seen for D3 compared to D2. In addition, the spatial error variability derived for the three domain resolutions shows only a minor improvement in the spatial error variability (0.58 mm/d for D1, 0.56 mm/d for D2 and 0.55 mm/d for D3) with the reduction of the grid resolution. Investigating the grid size effect, we compare the biases, i.e., simulated minus observed daily values, for domains D1 against D2 (Figure 7a) and D2 against D3 (Figure 7b). The biases between the D2 and D3 simulations are very similar (mostly fall on the diagonal) with slightly lower precipitation rates for D3 domain (below the diagonal) and smaller in range than the D1 simulation (Figure 7a). As a result, the projection improvement is higher for D2 compared to D1, with no significant added value being seen for D3 compared to D2. In addition, the spatial error variability derived for the three domain resolutions shows only a minor improvement in the spatial error variability (0.58 mm/d for D1, 0.56 mm/d for D2 and 0.55 mm/d for D3) with the reduction of the grid resolution. The seasonal precipitation plots present a similar spatial trend between the three domains for each season (Figures S13-S16 of Supplementary Material). During autumn, the model underestimates precipitation mainly at the north-west and south-east part of the domain ( Figure S13) in all resolutions considered, while the statistical analysis suggests negligible effect of the grid size on the results (Table S13). 
Figure 7. Scatter plots presenting (a) D1 (x-axis) against D2 (y-axis) and (b) D2 (x-axis) against D3 (y-axis) simulated annual precipitation biases for all grid cells considered.
The seasonal precipitation plots present a similar spatial trend across the three domains for each season (Figures S13-S16 of Supplementary Material). During autumn, the model underestimates precipitation mainly in the north-west and south-east parts of the domain (Figure S13) at all resolutions considered, while the statistical analysis suggests a negligible effect of the grid size on the results (Table S13). During winter, the model underestimates precipitation mainly in the north-west part of the domain (Figure S14). The statistical analysis suggests that model performance is negligibly affected when increasing the spatial resolution from 36 (D1) to 12 km (D2), while further increasing it to 4 km (D3) does not improve the statistics (Table S14). During spring, the model overestimates precipitation over most of the domain (Figure S15). The statistical analysis suggests that increasing the spatial resolution from 36 (D1) to 12 km (D2), as well as from 12 (D2) to 4 km (D3), does not significantly modify model performance. During summer, the model overestimates precipitation over most of the domain, with the largest values over the high-elevation regions of northern Italy (Figure S16). This positive bias is caused by the cumulus parameterization of the model, which overestimates convective precipitation. The statistical analysis suggests that model performance worsens when increasing the spatial resolution from 36 (D1) to 12 km (D2), while further increasing it to 4 km (D3) slightly improves the statistics compared to 12 km (D2) (Table S16).
Increasing the spatial resolution from 36 (D1) to 4 km (D3) only slightly affects the annual mean BAETS, which measures the fraction of observed and/or forecast events that were correctly predicted, on average over the domain (Table 7). Spatially, the changes are small and of mixed sign, apart from a few cells across the domain (Figures 8 and 9). The number of these cells is higher when increasing the spatial resolution from 36 (D1) to 12 km (D2), where a negative impact is dominant among them, than when increasing it from 12 (D2) to 4 km (D3). The seasonal BAETS analysis shows the same trend as the annual one (Figures S17 and S18, Table S17). However, during summer, the model presents a significantly lower BAETS skill score than in the rest of the seasons.
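For orientation, the categorical score underlying BAETS can be sketched as follows. This is the standard (unadjusted) equitable threat score computed from a 2x2 contingency table; the bias adjustment that turns it into BAETS is not shown, and the event threshold of 1 mm/day, the function names and the sample values are assumptions made for illustration only.

```python
import numpy as np

def contingency_counts(sim, obs, threshold):
    """Count hits, false alarms and misses for events >= threshold."""
    sim = np.asarray(sim, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    valid = ~np.isnan(sim) & ~np.isnan(obs)  # drop pairs with missing data
    sim_event = sim[valid] >= threshold
    obs_event = obs[valid] >= threshold
    hits = int(np.sum(sim_event & obs_event))
    false_alarms = int(np.sum(sim_event & ~obs_event))
    misses = int(np.sum(~sim_event & obs_event))
    total = int(valid.sum())
    return hits, false_alarms, misses, total

def equitable_threat_score(hits, false_alarms, misses, total):
    """Standard ETS: threat score corrected for hits expected by chance."""
    if total == 0:
        return float("nan")
    hits_random = (hits + misses) * (hits + false_alarms) / total
    denom = hits + misses + false_alarms - hits_random
    return float("nan") if denom == 0 else (hits - hits_random) / denom

# Illustrative daily precipitation at one grid cell (mm/day), made-up values.
sim = [0.0, 2.0, 3.0, 0.0, 5.0, 0.0, 0.0, 0.0]
obs = [0.0, 2.0, 0.0, 4.0, 5.0, 0.0, 0.0, 0.0]
counts = contingency_counts(sim, obs, threshold=1.0)  # threshold is assumed
print(counts, equitable_threat_score(*counts))
```

A score of 1 corresponds to a perfect forecast and values near 0 to no skill beyond chance, so a drop in the score between two domains at a given cell indicates a loss of categorical skill there.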

Conclusions
WRF performance over central Europe for mean and maximum temperature, both annually and seasonally, improves when increasing the spatial resolution from 36 to 12 km, while only a minor change is found when the grid resolution is further increased to 4 km, as shown by the statistical analysis performed in this study. The exception is the maximum temperature during autumn, which is further improved when the spatial resolution is increased to 4 km. However, the improvement between 36 and 12 km is much more important than the improvement between 12 and 4 km. Model performance for both annual and seasonal minimum temperatures does not change in the finer spatial resolution grids (i.e., 12 and 4 km) compared to the 36 km domain, except for the negative bias, which is improved on both nested domains. Model performance for annual and seasonal mean precipitation, as well as for annual and seasonal mean BAETS, which measures the fraction of observed and/or forecast events that were correctly predicted, is only slightly affected when increasing the spatial resolution from 36 to 4 km. The model's statistical performance for precipitation is quite good in areas with low precipitation rates, but not in high-precipitation areas such as the mountainous regions. Precipitation predictability is slightly worsened when increasing the spatial resolution from 36 to 12 km, while further increasing it to 4 km has a negligible or minor effect. BAETS depends only weakly on the spatial resolution and behaves similarly over the three domains; during summer, the model presents a significantly lower BAETS skill score than in the rest of the seasons.
WRF captures the basic features of temperature and precipitation in magnitude, space and time over central Europe for all three grid resolutions used in the present study. The results highlight some seasonal deficiencies and suggest that these are better represented in the 12 km domain than in the 36 km one. The model's skill does not improve when the grid spacing is further decreased (i.e., when comparing the results of the 4 km domain against the 12 km domain). This implies that downscaling produces skillful information down to the 12 km grid size used in this study; however, the finer 4 km grid does not provide a statistically significant improvement in the representation of annual or seasonal temperature and precipitation. This finding does not necessarily mean that model performance is not improved at the 4 km resolution. As a matter of fact, the better representation of vegetation variability, complex topography and coastlines at the fine resolution, which are significant components of the physical system, is anticipated to improve the model's performance. The statistically small improvement found here for the finest domain could be related to the comparison against E-OBS data, whose resolution is coarser than the 4 km used here. We acknowledge that an evaluation based on high-resolution data would potentially preserve the finer-resolution details, and that the reduced improvement seen in the statistical analysis for the 4 km domain compared to the 12 km domain could be a result of the averaging. On the other hand, if processes are better captured at higher resolution, improvements are expected to be visible even when regridded to a coarser resolution. Still, this points to the crucial need for high-resolution, high-quality observations over the European domain for improved representation of such parameters at very fine scales.
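The effect of averaging mentioned above can be illustrated with a minimal sketch of block averaging, in which a fine-resolution field is aggregated onto a coarser grid by taking the mean over non-overlapping blocks of cells. The regridding method actually used for the evaluation is not described in this section, so the function name, the aggregation factor and the synthetic field are all assumptions.

```python
import numpy as np

def block_mean(field, factor):
    """Aggregate a 2-D field to a coarser grid by averaging
    non-overlapping factor x factor blocks of cells.

    Rows and columns that do not fill a complete block are trimmed.
    """
    ny, nx = field.shape
    ny_c, nx_c = ny // factor, nx // factor
    trimmed = field[:ny_c * factor, :nx_c * factor]
    return trimmed.reshape(ny_c, factor, nx_c, factor).mean(axis=(1, 3))

# Synthetic fine-grid field with small-scale variability (not model output).
rng = np.random.default_rng(1)
fine = rng.gamma(shape=2.0, scale=1.5, size=(120, 120))

# Hypothetical aggregation factor between the fine and the coarse grid.
coarse = block_mean(fine, factor=3)

# Averaging smooths small-scale extremes, so the coarse field spans a
# narrower range than the fine one.
print(fine.shape, coarse.shape)
print(fine.max() - fine.min(), coarse.max() - coarse.min())
```

Because the block mean cannot exceed the largest value or fall below the smallest value within a block, local extremes resolved on the fine grid are damped before they are compared with coarser observations.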
Supplementary Materials: The following are available online at https://www.mdpi.com/2073-4433/12/2/278/s1, Figure S1. Spatial distribution plots for autumn mean temperature: observed data (upper panel), differences between observed and simulated data for the three nested domains (middle row), and the related MAE (lower row); Figure S2. Spatial distribution plots for winter mean temperature: observed data (upper panel), differences between observed and simulated data for the three nested domains (middle row), and the related MAE (lower row); Figure S3. Spatial distribution plots for spring mean temperature: observed data (upper panel), differences between observed and simulated data for the three nested domains (middle row), and the related MAE (lower row); Figure S4. Spatial distribution plots for summer mean temperature: observed data (upper panel), differences between observed and simulated data for the three nested domains (middle row), and the related MAE (lower row); Figure S5. Spatial distribution plots for autumn maximum temperature: observed data (upper panel), differences between observed and simulated data for the three nested domains (middle row), and the related MAE (lower row); Figure S6. Spatial distribution plots for winter maximum temperature: observed data (upper panel), differences between observed and simulated data for the three nested domains (middle row), and the related MAE (lower row); Figure S7. Spatial distribution plots for spring maximum temperature: observed data (upper panel), differences between observed and simulated data for the three nested domains (middle row), and the related MAE (lower row); Figure S8. Spatial distribution plots for summer maximum temperature: observed data (upper panel), differences between observed and simulated data for the three nested domains (middle row), and the related MAE (lower row); Figure S9. Spatial distribution plots for autumn minimum temperature: observed data (upper panel), differences between observed and simulated data for the three nested domains (middle row), and the related MAE (lower row); Figure S10. Spatial distribution plots for winter minimum temperature: observed data (upper panel), differences between observed and simulated data for the three nested domains (middle row), and the related MAE (lower row); Figure S11. Spatial distribution plots for spring minimum temperature: observed data (upper panel), differences between observed and simulated data for the three nested domains (middle row), and the related MAE (lower row); Figure S12. Spatial distribution plots for summer minimum temperature: observed data (upper panel), differences between observed and simulated data for the three nested domains (middle row), and the related MAE (lower row); Figure S13. Spatial distribution plots for autumn mean precipitation: observed data (upper panel), differences between observed and simulated data for the three nested domains (middle row), and the related MAE (lower row); Figure S14. Spatial distribution plots for winter mean precipitation: observed data (upper panel), differences between observed and simulated data for the three nested domains (middle row), and the related MAE (lower row); Figure S15. Spatial distribution plots for spring mean precipitation: observed data (upper panel), differences between observed and simulated data for the three nested domains (middle row), and the related MAE (lower row); Figure S16. Spatial distribution plots for summer mean precipitation: observed data (upper panel), differences between observed and simulated data for the three nested domains (middle row), and the related MAE (lower row); Figure S18. Seasonal mean BAETS change spatial distribution plots; Table S1. Autumn mean temperature statistical analysis (°C); Table S2. Winter mean temperature statistical analysis (°C); Table S3. Spring mean temperature statistical analysis (°C); Table S4. Summer mean temperature statistical analysis (°C); Table S5. Autumn maximum temperature statistical analysis (°C); Table S6. Winter maximum temperature statistical analysis (°C); Table S7. Spring maximum temperature statistical analysis (°C); Table S8. Summer maximum temperature statistical analysis (°C); Table S9. Autumn minimum temperature statistical analysis (°C); Table S10. Winter minimum temperature statistical analysis (°C); Table S11. Spring minimum temperature statistical analysis (°C); Table S12. Summer minimum temperature statistical analysis (°C); Table S13. Autumn mean precipitation statistical analysis (mm/day); Table S14. Winter mean precipitation statistical analysis (mm/day); Table S15. Spring mean precipitation statistical analysis (mm/day); Table S16. Summer mean precipitation statistical analysis (mm/day); Table S17. Seasonal mean BAETS.
Funding: This work was supported by the EU LIFE CLIMATREE project "A novel approach for accounting & monitoring carbon sequestration of tree crops and their potential as carbon sink areas" (LIFE14 CCM/GR/000635).

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Data Availability Statement: The simulation data presented in this study may be obtained on request from the corresponding author.