This section presents and discusses the results of the WRF sensitivity experiments, evaluated against weather station datasets. The analysis is organized by variable: daily precipitation, and daily maximum and minimum temperatures. These variables are the most important for defining the climate indices used in future impact studies, such as those for heat waves, cold waves and drought events.
3.1. Precipitation
Monthly precipitation time-series are illustrated in
Figure 2. The seasonal pattern of monthly precipitation is well captured by the majority of the schemes, with the highest precipitation in winter and the lowest in summer, as also reported by García-Díez et al. [
57]. The black dashed line shows the seasonal variability of monthly precipitation averaged over the available stations, which is reproduced in the model results. All experiments show clear similarities with the observed precipitation patterns over the 5 years of comparison and reproduce the inter-annual variability of precipitation, especially during the wettest months and the dry summer periods. Nevertheless, all setups generally overestimate precipitation relative to ground data, probably because of excessive wintertime precipitation [
85]. In contrast, ERA-Interim appears to underestimate winter precipitation from November to January. The few cases of underestimation during the study period are associated with the PP3 setup, which also shows the smallest overestimation among the setups.
This is confirmed by the statistical errors computed from daily precipitation values over the 5-year period in
Table 4, where PP3 shows the best performance, with a positive percentage BIAS of about 19%, while the other models have values of over 40%. The RMSE, MAE and COR results also show slightly better performance for PP3 than for the other simulations. It is worth noting that the PP3 bias (19%) is considerably smaller than the ±50% reported for RCMs in the Third Assessment Report of the Intergovernmental Panel on Climate Change by Giorgi et al. [
86] and, while not small, it is close to the range of the best-performing RCMs shown in several studies [
3,
23,
71,
87]. PP3 uses the Betts–Miller–Janjic cumulus parameterization, WSM6 for microphysics, the Mellor–Yamada–Janjic PBL scheme and Monin–Obukhov similarity theory for the surface layer (
Table 1). Comparing the statistical metrics of ERA-Interim with those of the WRF results indicates some loss of performance in the WRF model, as ERA-Interim shows an underestimation (PBIAS) of only 5.5% over all station grid points.
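The error measures quoted above can be illustrated with a minimal sketch. It assumes the common definitions of PBIAS, RMSE, MAE and the Pearson correlation (COR), with positive PBIAS meaning overestimation, as the text implies; the exact formulas used in this study are not given in this section, and the function name and sample values are hypothetical.

```python
import math

def verification_metrics(sim, obs):
    """Return PBIAS (%), RMSE, MAE and Pearson correlation for paired daily series.

    Assumed sign convention: positive PBIAS means the model overestimates.
    """
    if len(sim) != len(obs) or len(obs) < 2:
        raise ValueError("sim and obs must be paired series of length >= 2")
    n = len(obs)
    errors = [s - o for s, o in zip(sim, obs)]

    # Percentage bias: total error relative to total observed amount.
    pbias = 100.0 * sum(errors) / sum(obs)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mae = sum(abs(e) for e in errors) / n

    # Pearson correlation from anomalies about each series' mean.
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(sim, obs))
    var_s = sum((s - mean_s) ** 2 for s in sim)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    cor = cov / math.sqrt(var_s * var_o)

    return {"PBIAS": pbias, "RMSE": rmse, "MAE": mae, "COR": cor}

# Hypothetical daily precipitation values in mm, for illustration only.
obs_mm = [0.0, 2.0, 4.0, 10.0]
sim_mm = [0.0, 3.0, 5.0, 12.0]
metrics = verification_metrics(sim_mm, obs_mm)
```

With these illustrative values, the simulated total exceeds the observed total by a quarter, so PBIAS is 25%.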
The spatial distribution of the 5-year mean precipitation in the WRF simulations and observations shows a clear topographical dependency (
Figure S1). Precipitation amounts differ strongly between plain areas and higher elevations, with the maximum annual values found in the western part of the country, where frontal passages are accompanied by increased vertical lift due to orographic enhancement at mountainous locations. All simulations depict a similar spatial pattern of precipitation, with excessive rain found only at mountainous locations; however, there are no representative stations to validate such precipitation amounts. The same behavior is observed for all model setups, although PP7 seems to produce more precipitation over the examined 5-year period than PP2 and PP3. This could be related to the interaction of the microphysics scheme with the PBL (MYJ) scheme, in line with other studies that report higher precipitation totals and more convective precipitation [
88,
89]. Additionally, a wet bias over areas of complex topography is found to be common to several RCMs [
90,
91], and is probably caused by an overestimation of orographic precipitation enhancement [
92] and/or by an inaccurate PBL simulation [
7,
11,
93].
The Taylor diagrams shown in
Figure 3 provide a comparative assessment of how the four model experiments, which differ in their physical parameterizations, simulate the seasonal spatial pattern of daily precipitation during the examined period. The simulated results are compared to the observational data from all 23 stations. The best simulation is the one with the highest correlation, the smallest CRMSE and a standard deviation closest to the observed one. The highest correlations, in the range of 0.5–0.6, are found in winter and spring, and the poorest in summer, although summer also shows the lowest centered RMSE values (~3.5 mm). It is well known that representing summer precipitation satisfactorily is demanding for any model; because convective processes prevail, it is not easy to determine the appropriate cumulus scheme with confidence. The highest CRMSE values, and hence the poorest performance in terms of error, are found in winter and autumn. All simulations perform similarly; however, PP3 has slightly lower errors in all seasons and thus performs somewhat better than the other setups.
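The three quantities summarized in a Taylor diagram can be sketched as below. This is a minimal illustration that assumes population (divide-by-n) statistics; the function name and sample values are hypothetical and are not taken from the study.

```python
import math

def taylor_statistics(sim, obs):
    """Return (std_sim, std_obs, correlation, centered RMSE) for paired series."""
    if len(sim) != len(obs) or len(obs) < 2:
        raise ValueError("sim and obs must be paired series of length >= 2")
    n = len(obs)
    mean_s = sum(sim) / n
    mean_o = sum(obs) / n

    # Anomalies: the mean of each series is removed, so a constant bias
    # does not affect any of the statistics below.
    anom_s = [s - mean_s for s in sim]
    anom_o = [o - mean_o for o in obs]

    std_s = math.sqrt(sum(a * a for a in anom_s) / n)
    std_o = math.sqrt(sum(a * a for a in anom_o) / n)
    corr = sum(a * b for a, b in zip(anom_s, anom_o)) / (n * std_s * std_o)
    crmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(anom_s, anom_o)) / n)
    return std_s, std_o, corr, crmse

# Hypothetical values, for illustration only.
std_s, std_o, corr, crmse = taylor_statistics([1.0, 4.0, 2.0, 7.0],
                                              [0.0, 3.0, 3.0, 6.0])
```

These quantities satisfy CRMSE² = σ_sim² + σ_obs² − 2 σ_sim σ_obs R, which is why a single point on the diagram encodes all three. Because the means are removed, the CRMSE is insensitive to a constant bias, so the diagram complements rather than replaces the BIAS values in the tables.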
Because of the heterogeneous spatial pattern of precipitation in Greece, which is strongly associated with the orography, the extended coastline (see
Figure 4) and the limited number of stations for comparison, statistical errors are also displayed per station in detail in
Figure S4.
This analysis also makes it possible to explore which simulation performs best at each station, which stations are statistically best reproduced by the WRF model, and finally which setup best represents the majority of the stations in Greece.
It is evident that PP3 is the simulation with the lowest errors at most stations, showing a significant improvement over the others.
In general, regional models still misrepresent daily precipitation and precipitation extremes because of resolution and/or parameterization deficiencies [
3]. The precipitation analysis was therefore extended to the intensity of daily precipitation by calculating the 80th, 90th, 95th and 99th percentiles. This analysis is very important for climate change assessments of extreme weather, drought and flooding events, which have significant socioeconomic impacts on the global community. The results are shown in
Figure 5, where the WRF percentiles are plotted against the observed ones for the domain of Greece and for several indicative stations distributed across the country over these 5 years. The straight (blue) line represents perfect agreement; deviations from it indicate over- or underestimation of the simulated values relative to the observations.
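The percentile comparison described above can be sketched as follows. The sketch assumes linear interpolation between ranked values and uses all days in the series; the interpolation rule and any wet-day filtering applied in the study are not stated here, and the function names and sample values are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between ranks."""
    data = sorted(values)
    if not data:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 100:
        raise ValueError("q must be between 0 and 100")
    pos = (len(data) - 1) * q / 100.0   # fractional rank of the percentile
    lower = int(pos)                    # floor, since pos is non-negative
    upper = min(lower + 1, len(data) - 1)
    frac = pos - lower
    return data[lower] + frac * (data[upper] - data[lower])

def percentile_pairs(sim, obs, levels=(80, 90, 95, 99)):
    """Return (level, observed percentile, simulated percentile) triples.

    Plotting the simulated value against the observed one for each level
    gives the points that are compared with the 1:1 line.
    """
    return [(q, percentile(obs, q), percentile(sim, q)) for q in levels]

# Hypothetical daily precipitation series in mm, for illustration only.
obs_mm = [0.0, 0.0, 1.0, 2.5, 0.0, 8.0, 15.0, 0.0, 3.0, 30.0]
sim_mm = [0.0, 0.5, 1.5, 4.0, 0.0, 9.0, 20.0, 0.2, 2.0, 28.0]
pairs = percentile_pairs(sim_mm, obs_mm)
```

A simulated percentile larger than the observed one at the same level indicates overestimation of that part of the distribution.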
The percentiles obtained for the area of Greece show that all WRF simulations follow the observed percentiles very well. The PP3 setup reproduces the extreme 99th percentile remarkably well, while the other setups tend to slightly overestimate rainfall at the 90th and 95th percentiles. Inspection of the percentiles by station shows that the WRF dynamical downscaling simulations overestimate extreme precipitation events, with few exceptions at the 99th percentile, and that the PP3 simulation tends to reproduce rainfall extremes better than the other setups. In
Table 5, precipitation verification statistics derived from the contingency table are presented for four distinct precipitation thresholds, compared against the 5-year observational data. The table lists the results by station, only for the best-performing simulation, PP3. The simulations show reasonable skill for both low- and medium-intensity rainfall days, with POD values of 0.7–0.86 and 0.6–0.85, respectively. For extreme rainfall events, the majority of the stations show POD values close to 0.5–0.6 together with FAR values of 0.5–0.7, meaning that a much smaller share of these rain events (observed and/or predicted) was correctly forecast. Rare episodes of convective precipitation are probably often missed or underestimated by the model and its convective scheme.
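The contingency-table scores can be illustrated with a short sketch. It uses the standard definitions POD = hits / (hits + misses) and FAR = false alarms / (hits + false alarms); the threshold values and the daily series below are hypothetical and are not the ones used in Table 5.

```python
def contingency_scores(sim, obs, threshold):
    """Return the 2x2 contingency counts plus POD and FAR for one threshold.

    An "event" is a day with precipitation at or above the threshold.
    """
    if len(sim) != len(obs):
        raise ValueError("sim and obs must have the same length")
    hits = misses = false_alarms = correct_negatives = 0
    for s, o in zip(sim, obs):
        forecast_event = s >= threshold
        observed_event = o >= threshold
        if forecast_event and observed_event:
            hits += 1
        elif observed_event:
            misses += 1
        elif forecast_event:
            false_alarms += 1
        else:
            correct_negatives += 1

    # Scores are undefined (NaN) when the denominator is zero.
    nan = float("nan")
    pod = hits / (hits + misses) if (hits + misses) else nan
    far = false_alarms / (hits + false_alarms) if (hits + false_alarms) else nan
    return {"hits": hits, "misses": misses, "false_alarms": false_alarms,
            "correct_negatives": correct_negatives, "POD": pod, "FAR": far}

# Hypothetical daily precipitation in mm, for illustration only.
obs_mm = [0.0, 5.0, 12.0, 0.2, 20.0, 0.0]
sim_mm = [1.5, 6.0, 3.0, 0.0, 25.0, 11.0]
light = contingency_scores(sim_mm, obs_mm, threshold=1.0)
heavy = contingency_scores(sim_mm, obs_mm, threshold=10.0)
```

In this toy series the higher threshold gives a lower POD and a higher FAR, which is the same qualitative behavior reported for the extreme rainfall events.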
3.2. Maximum Temperature/Minimum Temperature
Annual and seasonal changes in the daily minimum (TN) and maximum (TX) temperatures were analyzed for the selected period. In general, physics parameterizations were found to have a less noticeable effect on temperature than on precipitation [
56].
In
Figure 6, the inter-annual cycle of the monthly averages of daily minimum and maximum temperature is displayed, averaged over all station grid points in the study region. The colored lines show the results of the simulations, and the dashed black lines indicate the observational data. In general, the observed seasonality is captured well during 2000–2004, and the summer and winter peaks are clearly identified. All physical schemes represent the temperatures in a similar way. Both temperature measures follow the observations over the study period, but for all physical schemes the WRF TX results are consistently colder and TN is much warmer than the observational data. This bias appears to result mainly from the summertime over-prediction of the daily minimum temperature and the summertime under-prediction of TX; the daily-maximum biases tend to be smaller in magnitude and seasonally invariant, while the warm bias is mainly confined to the minimum temperatures. Additionally, the spatial distribution (
Figure S2) of the simulated 5-year mean daily TX is characterized by a gradient of decreasing temperature from the coasts and low-altitude regions to the mountain chains, which is verified by the weather station values despite the limited number of observations. Similar results are found for the minimum temperature (not shown). ERA-Interim also reproduced the inter-annual cycle well for both temperatures, but with lower values than both the observed and the modeled ones at the summer peaks of maximum temperature.
Table 6 presents the statistical metrics calculated for daily values. A high correlation coefficient of 0.96 is found between the station and the simulated daily maximum temperature TX, with an overall negative BIAS from −1.1 to −1.4 °C and a slightly better performance for the PP3 simulation (−1.06 °C).
The RMSE and MAE are close to 2.6 °C and 2 °C, respectively, with similar values for all simulations. For the daily TN, a high correlation coefficient of 0.92 is found between observations and model data, with a consistent positive bias of around 2 °C, an RMSE of ~3.5 °C and an MAE of ~2.7 °C. These findings are in good agreement with the high-resolution climate analysis for temperature by Berg et al. [
4] for Germany, and are slightly higher in RMSE/MAE (and especially in TX BIAS, which was found to be a negligible −0.4 °C) than those in a similar study by Soares et al. [
3] for Portugal. Relative to the ERA-Interim dataset, the results show an improvement in maximum temperature and a higher bias in minimum temperature, with no significant discrepancies in the other statistical errors.
The overall performance of the simulations is illustrated in
Figure 7 by Taylor plots for the seasonal (winter, spring, summer and autumn; 7a) and annual (7b) time scales during 2000–2004.
The diagrams for maximum temperature show a good match between model results and observations at seasonal time scales. The lowest performance was obtained during the summer months, with correlations close to 0.75, increasing to 0.8 for the PP5 simulation. In addition, the metrics by station in
Table 5 show high RMSE values of about 3–3.5 °C for several stations. The initial study [
26] found that the model grid cells corresponding to some station locations are assigned the sea-dominant land use category (e.g., Corfu, Heraklion, Mitilini, Argostoli, etc.), which could affect the results during the summer period, when larger temperature differences lead to stronger sea-land interaction combined with more intense thermal instability.
The correlations in the other seasons are much higher, in the range of 0.9–0.95, and the errors are lower, with very similar values for all models. At the yearly time scale, the Taylor diagram shows very good agreement among the models, with no distinct differences between them during the 5-year period. The correlation coefficients match well the values above 0.9 reported in the climatological study of Marta-Almeida et al. [
94] for seasonal time scales in Spain.
Minimum temperatures showed a slightly lower performance than TX, with correlations of around 0.85 in spring, autumn and winter for all simulations, and below 0.8 in the summer months, with no significant changes in the errors. Agreement with observations is nevertheless good, and no statistically significant differences are found among the PP setups for TN, as illustrated in the Taylor plot of
Figure 7b for the yearly time scale. There the correlations are higher than 0.9, because a larger number of days is averaged. For all simulations, the correlation coefficients of TN take lower values (0.92–0.93) for high-resolution domains, which is in agreement with other works [
3,
71]. It can be deduced that the model performs best for TN with the PP2 setup, with a positive BIAS near 2 °C.
Table 7 shows the best setup for daily maximum temperature at each of the 28 stations, based on the statistical metrics of daily values (RMSE and MAE; BIAS and the correlation coefficient are not shown) over the 5 years and the four experiments. The same calculations were carried out for TN (not shown). Some exceptions in
Table 5 concern 6–7 stations whose BIAS is in the range of 2–3.5 °C, which is probably related to the selection of a nearest model point that is not a land cell or that has a significant height difference from the station. From this analysis, it is evident that PP3 is the setup with the lowest errors at most stations, showing a significant improvement over the others and thus best representing the majority of the stations in Greece.
The errors range from 1.2 to 4 °C. The model underestimation is also consistent over the entire area, and is close to −1.2 °C for the majority of the observational stations used in this comparison.
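The grid-point selection issue mentioned above, where the nearest model cell to a coastal station may be a sea cell, can be illustrated with a minimal sketch that restricts the search to land cells. This is one possible approach under stated assumptions, not the method used in this study; the function names, the `(lat, lon, is_land)` tuple layout and the coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def nearest_land_point(station, grid_points):
    """Return the closest grid point flagged as land.

    station: (lat, lon); grid_points: iterable of (lat, lon, is_land).
    """
    land = [p for p in grid_points if p[2]]
    if not land:
        raise ValueError("no land grid points available")
    return min(land, key=lambda p: haversine_km(station[0], station[1], p[0], p[1]))

# Hypothetical coastal station and three neighbouring model cells.
station = (39.62, 19.92)
cells = [(39.62, 19.93, False),   # closest cell, but classified as sea
         (39.65, 19.90, True),
         (39.80, 19.70, True)]
chosen = nearest_land_point(station, cells)
```

Here the closest cell is skipped because it is a sea cell, and the nearest land cell is returned instead. A height-difference criterion could be added in the same way, but that would be a further assumption.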
The comparison of model results and observations in terms of probability distribution (see
Figure 8) for some indicative stations (for all stations see
Figure S3) shows satisfactory agreement at the majority of the stations for daily TX over the 5-year period, as all WRF simulations follow the distribution of the observed data with no distinct differences among setups.
At a few stations, the correspondence between model and observations is weaker. More specifically, the Argostoli, Corfu, Florina, Heraklion, Kithira, Methoni, Milos and Mitilini stations show a large shift towards colder values in the medium temperature range, with higher density values. At the Chania station, all simulations have higher density values at hotter temperatures, while at Ierapetra the opposite behavior is observed. On the other hand, the TN probability distributions of the WRF simulations (not shown) show a large shift towards hotter values in the temperature range, at both higher and lower densities. This behavior is consistent with the model's systematic overestimation, especially during the summer period.
Percentiles of the daily TX and TN values (the 1st, 5th, 25th, 75th, 95th and 99th) were calculated for the 28 stations, as well as for the whole of Greece, in order to examine how the different simulations describe the extremes. Percentiles of the WRF simulations versus the observational percentiles are shown indicatively in
Figure 9 and
Figure 10, for TX and TN at several stations and for their average over the domain of Greece. As in the case of precipitation, the blue line represents perfect agreement, and deviations from it indicate over- or underestimation by the simulations. It is evident that the maximum temperature is very well reproduced by WRF, with no significant differences between the simulations and with a slight underestimation, mostly for percentiles higher than the 50th. Some stations, e.g., Argostoli or Mitilini, show larger deviations in the extreme percentiles, while others, like Naxos, show very good performance in predicting the extremes. Likewise, for the minimum temperature the model setups show no significant differences in simulating the percentiles; however, an almost systematic overestimation is observed overall. As Pérez et al. [
56] point out in their case study of the Canary Islands, these deviations are probably due to an insufficient temperature correction in the representation of mountainous areas, because the altitude difference between the model and station points has a strong influence.
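A height correction of the kind referred to above is often approximated with a constant lapse rate. The sketch below assumes the standard-atmosphere value of 6.5 °C per km; this value, the function name and the sample heights are assumptions for illustration and are not taken from this study or from Pérez et al.

```python
# Assumed constant lapse rate (standard atmosphere), in degrees C per metre.
STANDARD_LAPSE_RATE_C_PER_M = 0.0065

def adjust_to_station_height(t_model_c, z_model_m, z_station_m,
                             lapse_rate=STANDARD_LAPSE_RATE_C_PER_M):
    """Adjust a model 2 m temperature to the station elevation.

    If the model cell lies above the station, the model value is too cold
    for the station height and is warmed; if it lies below, it is cooled.
    """
    return t_model_c + lapse_rate * (z_model_m - z_station_m)

# Hypothetical case: model cell at 800 m, station at 300 m.
t_adjusted = adjust_to_station_height(10.0, 800.0, 300.0)
```

A fixed lapse rate is only a first approximation. Near-surface lapse rates vary with season and time of day, which may be one reason such corrections are insufficient in mountainous terrain.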
Across the set of simulations carried out in this study, clear similarities were found between the simulated and observed precipitation patterns during the 5-year period, and the inter-annual variability of precipitation was reproduced, especially during the wettest months and the summer periods. The lowest positive percentage BIAS, of about 19%, was obtained for the PP3 combination of physics parameterizations, while the other setups gave values of over 40%, which highlights PP3 as the best configuration for simulating precipitation. Furthermore, all simulations showed improved statistical metrics for the maximum temperature with respect to ERA-Interim, and no significant differences for the minimum temperature. Although the comparison between ERA-Interim and the WRF results indicated some loss of model performance for daily precipitation, the results demonstrated the need for a high-resolution RCM in climate studies for Greece, owing to the difficult geomorphology of the region (complex orography, irregular coastlines and regions with heterogeneous land cover).
Statistical analysis of the maximum and minimum temperatures for all tested simulations showed satisfactory results, with a good match between modeled and observed data: correlations above 0.9, and a negative bias of around 1–1.5 °C for the maximum and a positive bias of around 2 °C for the minimum temperature. Moreover, the extreme percentiles of temperature and precipitation were reproduced well, with some deviations due to complex topography. PP3 performed slightly better for the maximum temperature at the majority of the stations, and PP2 for the minimum temperature.
Given the small discrepancies between these two setups for temperature, and the noticeably better results of PP3 for precipitation, we recommend PP3 as a good choice for the upcoming climate simulations. Several studies have supported the PP3 WRF setup (MYJ, WSM6, RRTMG, NOAH, BMJ) for climate or forecasting applications, as it displays an overall more balanced behavior for both surface variables: the annual precipitation cycle is captured adequately, and closer agreement with the observational datasets is found for temperatures and their extreme values [
2,
37,
38,
39,
44,
88,
94,
95].
It should be mentioned that this study was based on previous research that had already examined combinations of physics parameterizations and performed sensitivity tests for the area of interest, analyzing the effect of the chosen schemes; an in-depth inter-comparison of physical schemes was therefore outside the scope of the current work. The use of RCMs for the simulation of historical, current and future climate is continuously increasing, particularly in view of the warming climate, as it is considered important for studying regional climate changes. The current study aimed at identifying an appropriate WRF model setup for a future high-resolution historical climatology simulation, obtained by downscaling the ERA-Interim reanalysis to the domain of Greece. In the future work on the hindcast evaluation of the WRF model, different interpolation methods will be considered for determining the nearest model points to the stations, in order to decrease statistical errors, particularly for stations located in coastal and mountainous areas. In that analysis, we will also consider applying bias correction methods to obtain more reliable precipitation results. Additionally, according to Emery [
96], if the mean error for temperature exceeds the ±0.5 threshold, bias correction methods should be applied before the results are used in other studies.