An Assessment of Observed and Simulated Temperature Variability in Sierra de Guadarrama

: This work provides a ﬁrst assessment of temperature variability at interannual and decadal timescales in Sierra de Guadarrama, a high mountain protected area of the Central System in the Iberian Peninsula. Observational data from stations located in the area and simulated data from a high-resolution simulation (1 km) with the Weather Research and Forecasting (WRF) model, fed from ERA Interim reanalysis, are used in order to analyse the temperature variability in the period 2000–2018. Comparison among all datasets allows evaluation of the realism of the model simulations. The results show that the model tends to underestimate the observational mean temperatures and anomalies at high-altitude stations. A linear mean temperature vertical gradient of − 5.81 ◦ C/km is observed, but it is overestimated by the model ( − 6.56 ◦ C/km). The variability of the daily temperature anomalies for both observations and, to a lesser extent, simulations increases with height. The added value that the WRF offers against the use of the ERA Interim is evaluated. The results show that the WRF provides a better performance than the reanalysis, as it shows smaller biases with respect to observational temperature anomalies. Finally, the study of temperature trends over the Sierra de Guadarrama and its surroundings for the period 2000–2018 shows a warming in the area, signiﬁcantly pronounced in autumn. When extended to the last decades, observations show that this warming has been happening since the ﬁrst half of the 20th century, especially during the period 1970–2018, but not as much as during 2000–2018.


Introduction
Mountains are treasures of natural and cultural heritage. They offer a useful space for research and educational and leisure activities, as well as natural resources needed by society. The main resource provided by mountains is water, much of which comes in the form of snow or ice. Winter snow is a source of water for spring and summer [1]. This water is used both for consumption and generation of hydroelectric energy and sustainability of crops [2]. However, mountains are not only important for interests and human needs. They are also greatly important for biological diversity, since they serve as a habitat for a large number of species, both animals and plants [3]. Thus, it is increasingly necessary to have a deeper knowledge of high mountain climates that can be used to help managing ecosystems, risks, and other environmentally-related activities. environmental and natural values and resources for the capital city and all the minor towns located on both sides of the mountain range. Therefore, it is of practical relevance to understand different aspects of climatology of this region that sits between the Atlantic and Mediterranean influences. This will be accomplished by using a high-resolution WRF [17] simulation and reanalysis data from ERA Interim [18], as well as observational data from weather stations located at both medium-and high-mountain locations in Sierra de Guadarrama and the lowlands around it.
Throughout the study, the skill of the model in reproducing temperature over the area will be tested by comparing the instrumental data to the WRF simulations and reanalysis data. This comparison is performed at the few sites with available data and is informative of the added value provided by the increased resolution of the RCM simulation. Additionally, the simulated model fields will provide valuable information beyond the limited coverage of the observational network.
Furthermore, it is of interest to address the evolution of this high-altitude continental environment in the context of the more intense global warming experienced over the last decades [19]. This work provides a first analysis of the simulated and observed climatology of temperature in Sierra de Guadarrama for the last 20 years, as well as a first evaluation of multidecadal trends that allows to quantify annual and seasonal warming rates at regional and local scales during the second half of the 20th century. Based on other studies, like [20] or [21], where a mean temperature trend of 0.10 • C/decade was found over the Iberian Peninsula for the period 1850-2005, a warming is to be expected over the area of interest in the present work. This will allow for comparing the spatial homogeneity of multidecadal trends at the center of the Iberian Peninsula.
This text is structured as follows: In Section 2, all datasets (observations, simulations, and reanalysis) are described; the methodology is described in Section 3; and results are presented in Section 4. An analysis of the regional climatology from the model simulations and observational data is presented in Section 4.1 for the 2000-2018 period. Section 4.2 focuses on regional temperature trends and extends the period of analysis to multidecadal timescales up to the 1950s. A discussion of results is included in Section 5.

WRF Model Simulations
The WRF simulation [17] used in this study is an experiment that was spatially configured in four domains ( Figure 1) with different grid spacings down to the finest horizontal resolution of 1 km. The outermost domain, D1, is configured with a horizontal resolution of 27 km; the second domain, D2, with 9 km; D3 with 3 km; and lastly, the innermost domain, D4, presents a resolution of 1 km, covering the Sierra de Guadarrama and a large area of the provinces of Madrid and Segovia. This allows for representing orography at a very high resolution of 1 km, slightly beyond the limit of the turbulent eddy scales [22]. The WRF model was initialised as a cold start at 0 hours every day and was run for 48 h, using a temporal step of 120 s, storing every hour output. The first 24 h were discarded as model spin-up, so as to obtain as a stable simulation as possible and the outputs for the following 24 h were retained. This process was repeated until a complete simulation, spanning the period January 2000 to July 2018, was obtained. ERA Interim reanalysis data were used as both initial and boundary conditions [18,23].
The selected physical configuration of the WRF model in this experiment involved parameterisations of the 3.6.1 model version, including long-wave radiation as represented in a new version of the Rapid Radiative Transfer Model, RRTMG [24], that takes into account the effects of the detailed absorption spectrum of water vapour, carbon dioxide, and ozone. The short-wave radiation scheme is based on Iacono et al. (2008) [24]. A modified version of the Kaine et al. (1990,1993) [25,26] scheme is used for the cumulus parameterisation, although it is activated only in the first three domains. The Yonsei University (YSU) scheme [27] was used to parameterise the planetary boundary layer (PBL) in the four domains. For microphysics, the WRF Single-Moment 6-Class scheme was adopted, similar to Lin et al. (1983) [28]. The Noah land-surface model was used to provide the heat and moisture fluxes at land points [29]. The chosen surface layer model is based on Monin-Obukhov's theoretical analysis [30] with Carslon-Boland [31] viscous sub-layer and standard similarity functions from look-up tables [17,32]. Finally, the land-use and land-cover data used to determine the physical properties of the surface [33] and the topographic data for all domains were obtained from the USGS GTOPO30 dataset, which has a 30" latitude x longitude grid resolution [34][35][36].
It should be noted that the time span of the simulation, 2000-2018, was decided on the basis of covering the most recent period and extending backwards as long as possible, allowing for computational constraints. The decision to set up a configuration in four domains was made to achieve the highest possible resolution and the most accurate representation of orography. Nevertheless, increasing resolution does not necessarily grant more accuracy in simulations [37]. Nevertheless, the WRF set up described in Figure 1 allows for evaluating the added value of increasing resolution. In addition, the set up of this model is based on previous experience [32,37] and not necessarily optimal for this region. This work establishes an initial reference configuration that can lay the ground for improved modelling and model-data comparison.  Table 1). Grey shadings depict orographic height. Note that MRT (Table 1) depicts the location of Madrid city.
For the purpose of the present study, the simulated temperature from the D3 and D4 domains has been used with a daily temporal resolution. Data from the D1 and D2 domains are also used for the specific evaluation of the effects of increasing resolution. In addition, to ease comparisons with observations, data of the nearest WRF grid points (center of the grid cells in Figure 1) in both domains to the observational stations have been selected and treated for model-data comparison. These grid points are referred to as co-located grid points to the observational sites. The information of the horizontal coordinates and height of the co-located WRF grid points to the stations (denoted by cWRF) can be found on Table 2.

ERA Interim Reanalysis Data
ERA Interim [18,23], referred to as ERAIT in the text, is a reanalysis product developed by the European Centre for Medium-Range Weather Forecasts (ECMWF). Additional to its use as boundary conditions to the WRF experiment (see Section 2), this has been used here to evaluate the added value of the WRF model experiments. The spatial resolution of ERAIT data is approximately 80 km, which is a much coarser resolution than the grid of the exterior domain of the WRF simulation, D1 (Figure 1), and as it can be noticed, most of the closest grid points to the observational stations are located outside the domain D4 ( Figure 1 and Table 2). Furthermore, for the purpose of comparison, all observational sites are represented by different co-located grid points in the reanalysis. Due to the already mentioned coarse resolution, there are only four ERAIT grid points that can stand as candidates for all sites. These points are referred to as cERAIT, in the same way as done with cWRF. ERAIT daily temperature data have been used for this study in the period January 2000-July 2018.

Observations
Daily temperature records are available at 19 sites in the region of interest (Table 1 and Figure 1). Observational data come from two sources. On the one hand, stations from the GuMNet (Guadarrama Monitoring Network) infrastructure [38] have been used (orange circles in Figure 1). This infrastructure is formed by 10 ground-based and subsurface monitoring stations distributed over the Sierra de Guadarrama. Most of these stations are located on the Madrilian side of the Sierra (to the southeast), except for a couple of them settled on the Segovian side (to the northwest). All stations are located in mountainous environments up to 2225 masl within the innermost WRF domain, except for two of them which are found below 1000 masl. Due to the recent commencement of operation in 2014 and the lack of continuous records of some stations, only data from 8 of the GuMNet stations are used herein. Some of these stations also have records for the earlier period 2000-2014, which were registered by the Parque Natural de Peñalara (PNP, Table 1). A quality control was applied to the GuMNet data for the period 2014-2018 in order to exclude unphysical values following guidelines in [39,40].
On the other hand, 10 stations from the Spanish National Meteorological Agency (AEMet) have been used (red squares in Figure 1). Half of these stations are located in and around the Sierra de Guadarrama, below 1900 masl. The rest of AEMet stations are located in plateau areas in and around the city of Madrid. The AEMet sites provide, therefore, information from the lower lands and also longer records than the GuMNet sites, thus offering valuable insights into trends in Section 4.2. The set of observational stations are addressed to as OBS hereafter.
Some of the analysis can be influenced by the presence of missing data in the OBS. In order to make model-data comparison independent of missing data, cWRF and cERAIT have been masked out in some specific analysis, which means that they have been filled in by the same missing data gaps as the OBS. When this masking is used, it will be identified by a letter m in the dataset name, i.e., cmWRF or cmERAIT.

Methodology
As a first step, temperature data from the WRF model, ERAIT reanalysis, and OBS have been compared. The annual cycle of daily temperature was calculated and compared for every station and every dataset. These cycles were estimated by calculating the temperature average for every day of the year and by low-pass filtering it after (using first a 31-day centered moving average in order to discard the intra-monthly variability). In this way, a smoothed annual cycle was obtained. By substracing the annual cycles from the daily temperature raw data, temperature anomalies were calculated. This comparison was performed over the period January 2000-July 2018, since this is the common period for every dataset, which comes determined by the length of the WRF simulation and the availability of data from the shortest timeseries at the highest sites. From the resulting daily temperature anomalies, frequency distribution (box-whiskers plot) and regional averages were obtained and used to compare models and observations. The relative performance of the WRF model in comparison to the use of the ERAIT reanalysis data was assessed by the use of several metrics. Taylor diagrams [41] allow for two datasets to be compared through their correlations, their root mean square error (RMSE) and their standard deviation ratios. In this way, it is possible to know if the high resolution simulation constitutes an added value in the area of the Sierra de Guadarrama with respect to the reanalysis data. A modified version of the Brier Skill Score (BSS) [13] has also been used to compare ERAIT and the increase of resolution in the WRF domains. This index is calculated based on the ratio of the RMSE of a reference simulation and a test simulation relative to observations: where RMSE re f is the RMSE of the reference simulation with respect to observations and RMSE test is the RMSE of a test simulation with respect to OBS that is compared to the reference.
When BSS = 0, the new "test" model does not introduce any improvement. If BSS > 0, the residuals of the new model have a smaller RMSE than RMSE re f , so it provides an improvement relative to the reference. Otherwise, if BSS < 0, the reference model provides a better performance. The BSS index has been used to compare the performance of D1, D2, D3, and D4 domains data points co-located to the observational sites against the co-located ERAIT points and, on the other hand, the relative performance when increasing the resolution in each domain transition. This is D1 against ERAIT, D2 against D1, D3 against D2, and D4 against D3. This analysis was performed over the daily temperature anomalies during the spring (March, April, and May), summer (June, July, and August), autumn (September, October, and November), and winter (December, January, and February) months, as well as in the whole year.
After the evaluation of the model performance, a study of the temperature trends in the region was carried out. For that purpose, monthly temperature anomalies were calculated for the period January 2000-July 2018. In the first instance, the trends of anomalies were analysed in the D3 and D4 WRF domains, as well as at the observational stations, annually and seasonally, using a linear fit. The significance of these trends was evaluated using a Student's t-test. Finally, for the longer observational series (see Table 1), the monthly anomalies were calculated for their complete record with respect to the period January 2000-July 2018, annually and seasonally. From these anomalies, temperature trends for the last decades were calculated for different time intervals: (1)  The confidence interval of the trends was calculated for a significance level of p = 0.05 both with and without taking into consideration the autocorrelation of the timeseries [42]. The first method only takes into account the number of independent terms of the series [43] and is more restrictive than the second.

Model-Observations Comparison
Mean annual and seasonal temperatures from OBS, WRF model, and ERAIT reanalysis are shown in Figure 2. These maps provide a basic climatological description of the temperature in the Sierra de Guadarrama, where orography is dominant, so the coldest temperatures are found at the highest altitudes. Moreover, temperatures in the northern plateau, in the north-west side of the mountain range, seem to be milder than temperatures in the southern plateau, in the south-east side, for every dataset considered, either annually or seasonally. Regional averages (squares in Figure 2) show that OBS and the WRF simulation are very similar at this spatial scale during summer, autumn, and winter seasons, whereas cmWRF shows a small cold bias during the spring and annual period. The reanalysis shows a warmer bias in general, except for the winter and annual cases, whose averages fall within the same range as the observational ones. Local values of observed mean temperatures are in agreement with the simulated spatial distribution. Nevertheless, stations located at the high mountain valley area, Alameda (ALM, Table 1) and Ontalva (ONT), tend to be colder than the simulated values in all cases.
The capability of the model's high resolution to realistically capture the terrain orography, where it has already been seen that the temperature dependency on height is evident, is shown in Figure 3, where mean seasonal and annual values have been plotted only for D4, thus avoiding the changing resolution with D3. The linearity of the altitude gradient for WRF and observations is noticeable, with a value of −6.56 • C/km for the annual temperatures in the WRF simulation (light grey crosses in Figure 3), −6.79 • C/km in the simulated co-located grid points (black pentagons), and −5.81 • C/km in the observations (black triangles in Figure 3). It seems that temperatures provided by the model for the co-located grid points overestimate the slope of the observed temperatures by about 1 • C/km. It should also be noted that this is due to an underestimation of temperatures at higher altitudes and that the higher the stations are, the greater the difference in altitude between the real station and the WRF co-located points. Thus, the representativeness of the co-located grid points in the model is limited in a highly complex-terrain frame as in our case. This could have been at least partially avoided by post-processing the model output and correcting it by altitude, but in this work the focus has been given on addressing the direct model output. This pattern seems to be repeated for every season, being slightly more pronounced in summer. As for the reanalysis performance, since observational stations are related to only four grid points from the ERAIT reanalysis, only one of which is located within the domain D4 and none of them over 1000 masl, the altitude gradients for this dataset (squares in Figure 3) are much more pronounced (−14.75 • C/km for the annual period), so that the difference between observations and reanalysis is very noticeable for high-altitude stations. In this way, the 1-km resolution of the model implies an improvement, given the greater number of grid points with altitudes closer to those of the observations.   This climatological analysis has been carried out within the WRF D4 domain using the 1-km high resolution data. However, it is relevant to know whether this high resolution provides an improvement in the representativeness of the terrain with respect to the previous domains. In Figure 4, the comparison of the observed versus the simulated mean temperatures for every domain and reanalysis can be seen, both annually and seasonally. A simulation identical to the observations would spread its mean values over the diagonal. WRF improves the simulation of the mean temperatures with respect to reanalysis in all cases. This difference is greater (in the range of 4 • C) for lower temperatures, thus for higher altitudes. The largest improvement takes place between ERAIT and the WRF D1 domain and it happens in all seasons. The improvement progressively decreases for the D2, D3, and D4 domains, being noticeable at the highest-altitude sites. Therefore, the WRF simulation produces a progressive improvement, particularly at higher altitudes. For the D4 domain, values distribute around the diagonal with slight underestimations at the highest sites, in consistency with Figure 3. This supports the rationale of using higher resolution over complex terrain. For lower altitudes, the improvement with resolution saturates with the D1 or D2 coarser-resolution domains. For higher-altitude sites, discrepancies with observations may also be the result of representation errors [44] and limitations of this WRF configuration in reproducing local features realistically enough. From here on, we continue to use the D4 domain data even if this is not strictly arguably outperforming the D1 to D3 domains at every single point.  After analysing the mean state, the deterministic part of the temperature variability is considered first by addressing the annual cycle. Figure 5 shows a description of the annual cycles calculated for every site and dataset as explained in Section 3. Annual cycles of four individual stations are shown at the top as an example. They show how observations, model, and reanalysis behave in different regions inside and outside the D4 WRF domain. For the cases selected, it can be seen that the annual cycle is comparable among the different sources of data. At the station located in the southern subplateau in the D3 domain, Madrid Retiro (MRT, Table 1), and at another station located in the northern subplateau in the D4 domain, Segovia (SGV), observed and simulated annual cycles hardly show any difference, reanalysis temperatures being slightly lower than observations at MRT. In the example of the station located in the valley, ONT, it can be seen, as in Figures 2 and 3, that both WRF and ERAIT overestimate temperatures of this site, the overestimation of WRF being smaller. As for the station located at the mountain top, Dos Hermanas (DHS), it can be seen that WRF is again in better agreement with observations and a quite remarkable warm bias appears in the reanalysis. The dashed lines in Figure 5a-d show regional averages in observations, WRF, and reanalysis. Notice that the higher-altitude sites present cycles progressively below the regional average. Notice also that the sites with more data (MRT and SGV) present cycles that are smoother than those of ONT and DHS. It is noteworthy how masked WRF and ERAIT reproduce those features, thus related to the shortness of the records. The bottom panel of Figure 5 shows box whiskers plots for the estimated annual cycles of every site, including those in the top panel. A distinction between those sites within the D3 WRF domain from those of the D4 domain is also made. Sites are ordered by the increasing altitude to the right. The regional averages of both domains are also represented (AVE3 and AVE4). The bulk of sites seems to reproduce the behaviour already evidenced at the four example locations in Figure 5a-d. Annual cycles of the observations agree to a great extent with those calculated from the model, their range being smaller as the altitude becomes higher. It can be noticed how temperatures of both OBS and WRF simulations are very consistent. The warm bias of ERAIT at ONT and DHS (Figure 5a-d) becomes now progressively evident towards higher altitudes above the ALM site. ERAIT is unable to reproduce the colder range of the annual cycle with overestimated, too-warm temperatures both in summer and winter. In general, ERAIT is biased slightly cold with respect to OBS and WRF in D3 and warm in D4 (see AVE3 and AVE4). This is consistent with the overestimation of altitude gradients in Figure 3. The WRF simulation does adjust the annual range of temperatures to increasing altitudes (decreasing temperature range). WRF tends to show colder summers and winters of the order of 2-3 • C at some sites (e.g., Raso del Pino I -RPI-, Navacerrada -NVC-, and Zabala -ZBL-), in consistency with Figures 3 and 4. As discussed earlier, this can be the result of the representation of errors [44] and limitations of this WRF configuration in reproducing local features with more detail. Recall that, at every site, the closest, co-located grid points have been selected and the model series masked (cmWRF and cmERAIT) to include the same missing data as observations to ensure a proper comparison in these terms. It is also arguable that a more ad-hoc comparison might have included height corrections or slope orientation considerations for grid-point selection. These comparatively more adaptive options have not been considered herein.
The daily distribution of observed and simulated temperature anomalies after substracting the annual cycle is represented in Figure 6. This figure allows for comparing the variability of temperatures beyond the deterministic regularity of the annual cycle and shows that the variability of OBS increases with altitude, with the exception, again, of the stations located in the mountain valleys (ONT and ALM). WRF always presents a smaller temperature range or very similar to that of the observations, in every case, both annually and seasonally. This is more pronounced at the stations within the D4 domain. This does not occur, however, at the stations in the valley area. In general, it can be appreciated that, while WRF is able to better reproduce the extreme events (black dots in Figure 6), ERAIT fails in capturing the most extreme variability, especially at high altitude stations. However, at these stations the variability of the WRF model seems to be still below that of OBS, both annually and seasonally.  (e) Figure 6. Box-plots of temperature anomalies with respect to the annual cycle for stations in both the D3 and D4 domains during spring (a), summer (b), autumn (c), winter (d), and year (e). Stations are ordered by altitude. Regional averages are separated by a vertical line on the right side of the graphs. OBS anomalies are shown in orange, cmWRF in blue, and cmERAIT in green.
The temperature variability at regional scale is represented by regional averages in Figure 7 for the OBS, cmWRF, and cmERAIT anomalies within the D4 domain. They evidence a high level of covariability, correlations being above 0.95 (p < 0.05). As in Figure 6, it is also shown that neither the model nor the reanalysis is able to reproduce the most extreme anomalies, thus slightly underestimating the variability of OBS. For the WRF model, the regional average of the complete D4 domain (not masked) is also shown (brown dashed line in Figure 7). The resemblance of the regional anomalies from OBS, cmWRF, and cmERAIT can be extended to the series calculated without masking out the WRF temperature with the observational sites. Correlations between WRF and OBS, cmWRF and cmERAIT are above 0.9 in every case. This suggests that the behaviour of the temperature variability is reasonably homogeneous over the D4 domain and just a few points are enough to represent the spatial distribution of temperatures in this region. The high correlations among observations and models can be observed in Figure 8, where the Taylor diagrams [41] evaluating the performance of the WRF model and ERAIT reanalysis with respect to the observational data are shown. The evaluation is carried out for both the WRF and ERAIT co-located grid points (diamonds) and for their regional averages (triangles) inside the D4 domain. The higher the correlation is and the closer the data are to 1 standard deviation ratio, the lower the RMSE and better the capability of the model to reproduce the observations. It can be seen how WRF introduces an improvement in the representation of regional temperature anomalies over the ERAIT temperatures, barely noticeable in summer, but very remarkable in winter. Except for summer, with correlations higher than 0.95 for both WRF and ERAIT, WRF outperforms the ERAIT regional correlation, this being particularly remarkable in winter when WRF correlates above 0.95 and ERAIT above 0.75. Annual standard deviations are well represented by WRF, with ratios close to 1, while ERAIT overstates variability. Seasonally, WRF and ERAIT tend to underestimate variability in all seasons, particularly in winter, an exception being ERAIT overestimation of regional standard deviation in autumn.   The temperature anomalies of the co-located grid points follow a pattern similar to the regional averages in which WRF provides better results than ERAIT, especially during year, autumn and winter. In every case, most of the points show correlations above 0.90, except for the ERAIT grid points in winter. Despite winter being the season with poorer performance with respect to OBS, correlations are still high (above 0.8 at most sites). Correlations are also high for the reanalysis in this season, except for some sites corresponding to the highest altitude stations, i.e. Cabeza Mediana (CBM , Table 1), RPI, Cotos (CTS), NVC, Hoyas (HYS), ZBL, and DHS (notice the symbols fall out of the frame in winter). Both WRF and reanalysis show a worse performance in summer at ONT and ALM, both located in a valley. This supports the mentioned difficulties of both WRF and ERAIT to represent the temperature variability over the valley. In general terms, WRF introduces an improvement in the representation of temperature anomalies over the ERAIT result, especially during the annual season, autumn, and winter, when these improvements are significant. Note that this assessment is made on the anomalies after filtering out the annual cycle. Otherwise, the values of this metric would be inflated by the presence of a regular signal.
Some ERAIT values are spread over larger RMSEs than those of WRF, particularly in autumn, winter, and annual. Nevertheless, it is worth pointing out that some specific sites also display larger RMSEs separating from the main group in the WRF simulations in summer, autumn, and winter, rendering also lower correlations for these sites and in some cases, even slightly lower than ERAIT. This is the case of ONT, for instance. Figure 9 shows more insights into the performance of RMSE and the relative added value of increasing resolutions. While Figure 4 assesses the added value in the representation of the site mean temperatures, Figure 9 focuses on the RMSEs of the anomalies, i.e. the simulated and observed variability. This is done by showing estimates of the BSS for the WRF co-located and masked points placed within the D4 to D1 domains with respect to the same points for ERAIT in Figure 9a, and the results for BSS of stepwise increments in the domains using D1, D2, D3, and D4 as tests in Equation (1) and ERAIT, D1, D2, and D3, respectively, as references (Figure 9b). Results for the regional averages (AVE) are included at the top of the figure and score estimates for local sites are below them in decreasing order of altitude. Figure 9a shows that, in general, there is an improvement from the RCMs with respect to the reanalysis in representing the temperature variability, particularly at higher altitudes (positive BSS values). This concerns, in particular, all seasonal cases between DHS and CBM, as well as some seasons in San Rafael (SRF , Table 1 ), SGV, and Colmenar Viejo (CLV). The improvement is also generally more evident for higher resolution RCMs (D3 and D4, 3 km and 1 km) than for the coarser ones (D1 and D2, 27 km and 9 km). Nonetheless, the RMSEs worsens irrespective of the chosen RCM resolution for ONT, ALM, and Talamanca del Jarama (TLJ), including also some cases in CLV (mostly summer) and Herrería (HRR, mostly spring and summer, but also other seasons for some RCM configurations).
When comparing the relative improvement of a higher resolution RCM with respect to its coarser mother domain (Figure 9b), the improvements of applying the dynamical downscaling method become more evident. The largest BSS value jumps correspond to D1 against ERAIT (yellow) and D2 against D1 (red), while the relative improvement of D3-D2 and D4-D3 is not always that clear, as they are around 0. It is noteworthy that the increase from 3 km to 1 km does not follow the more general trend of RCM with respect to ERAIT, resulting in a slight loss of added value for some of the sites and slight increases of accuracy for other sites. Figures 4 and 9 suggest that reaching a 1 km resolution is more relevant for reducing biases in the mean at high-altitude sites than for improving the simulated variability.

Temperature-Trend Analysis
The WRF model has shown the ability to realistically simulate the variability of observations. Therefore, it provides a suitable testing ground due to the availability of long and continuous series over the whole domain of study to explore the potential temperature trends in the region. Monthly temperature anomalies from January 2000 to July 2018 have been used for this analysis. Due to the limited length of this period, trend estimates relate therefore only to decadal variability. The maps in Figures 10a and 11 show the trends of WRF monthly temperature anomalies for the D4 domain and a large part of the D3 domain, as well as the trends of temperature anomalies of the observational data, both for the annual and seasonal cases. The trend significance for each grid point of the model and every station is shown highlighted with stippling. It should be noted that the stations at ONT, ALM, RPI, DHS, HYS, and HRR have been omitted due to the limited length of the series. Furthermore, some sites, like CTS or CBM, have been included, though they have shorter records, thus being representative for their experimental only period (Table 1). Simulated and observed trends for the period of study are, in general, positive for the domains D3 and D4, with larger values in the southern plateau. Nonetheless, this is not the case in spring, when negative temperature trends are widespread, except for some areas of the southwestern plateau where trends are again slightly positive. Cooling occurs also in winter, particularly in the north-east side of Sierra de Guadarrama. However, all seasonal and annual trends are not significant in general, except for autumn. Significance is only achieved at MRT and Madrid Barajas (MBJ, Table 1) observational sites. In spite of the lack of significance, the spatial structure of the simulated and observed patterns of trends are in overall agreement. Nevertheless, there are also some local discrepancies between simulated and observed trends. CTS and CBM should be considered for their periods of availability of data only, shorter than the time span of the simulation. ZBL is the only discrepant site in the summer season. Madrid Torrejón (MTJ) shows winter cooling, while nearby sites and model show relative warming. In that area, MRT and MBJ show some spring warming in a context of modelled cooling. At higher altitudes, it is noteworthy that during spring and winter, simulated and observed trends display a very patchy behaviour that seems to change with altitude and orientation. The results for autumn stand out (Figure 11c), where clearly positive trends of about 1 • C/decade can be observed, mainly centered over the region of Sierra de Guadarrama and the south-east side. In this season and this region, the observation can be extended to the rest of the southern sub-plateau, where trends are shown to be significant. Observations show more pronounced trends than those in WRF, although the sign of trends is in agreement between model and observations. In summary, trends show summer and autumn warming in the area of Sierra de Guadarrama and its surroundings during the last 20 years, specially pronounced in autumn and in the southern side of the mountains and more southwards.
In addition to the spatial distribution of trends, the temporal perspective can be broadened back by using the full-time span of the records back to 1893 (in the longest case). With the exception of spring and winter that show some variability in the spatial distribution of (non-significant) warming and cooling, annual, summer, and autumn show a very homogeneous behaviour in the sign of trends. In addition, the previous model-observations comparison suggests a very homogeneous behaviour in the temporal variability of temperatures across the whole region ( Figure 7). Thus, some similarity is to be expected in the long-term behaviour of temperatures across sites and seasons. The results of Figures 10b and 12 show the monthly temperature anomalies of the regional averages from the WRF domains D3 and D4 and the observational stations for their complete record (Section 2.1 and Table 1), for the annual (Figure 10b) and seasonal ( Figure 12) cases. A 2-year running mean has been applied to the monthly anomalies to make representation less noisy. Due to data gaps in their record, the CTS, ZBL, and CBM stations have been omitted in this case. Additionally to the complete record of each station, annual and seasonal trends have also been calculated for four different periods to evaluate how trends have changed through time: 1950-2018, 1970-2018, 1990-2018, and finally, the period of the present study 2000-2018 ( Figure 13 and Table 3). Timeseries have been organised by the length of their time span in this case. Trend estimates in Figures 10 and 12 and in column 3 of Table 3 refer to the period of full availability of data and, therefore, to different time scales. Thus, they should be interpreted with caution. In most instances, they approximately coincide with the trend evaluated for the specific period in Table 3 that most agrees with the time span of each specific site. It is noteworthy that almost all trends in the period of data availability are positive. Spring trends in the D3 and D4 domains indicate cooling, in consistency with Figure 11, as well as winter temperatures for SGV and SRF, none of them being significant. Nevertheless, winter cooling in SGV and SRF persists back to the 1970s and continues during the three periods of analysis (Table 3). Except for those cases, all timeseries in the region indicate warming, that progressively becomes more significant as longer periods back in time are considered. The large agreement in the multidecadal variability is also noteworthy across the different sites, highlighting broad periods of concurrent warming or cooling that are suggestive of an influence of large-scale dynamics and/or changes in the external forcing [45]. The reasons for this are beyond the scope of this work and will not be analysed here. The three time intervals shown in Table 3 allow for common frames of comparison for all sites. The last two decades show overall positive and relatively low-magnitude trends in comparison to the results when the period is broadened back in time. During this period, the correlations between WRF and observations are large (Figures 7 and 8) and interannual variability changes are very similar to those in WRF D3 and D4 domains and in any of the sites. Annual trends indicate warming for both the model regional averages (not significant) and all sites (significant). This is mostly the contribution of post 2015 warming ( Figure 10). Seasonal trends in the last two decades show larger warming in winter, summer, and autumn (Figure 12), being the largest and most significant in autumn, in consistency with Figure 11. The coolest periods occur in autumn in the early 2000s, in winter in the mid-2000s, and in summer in the late 2000s. If the perspective is broadened back to 1990, trends get progressively steeper, with the early 1990s delivering particularly cold autumns and winters at most sites ( Figure 12) that also show up in the annual values ( Figure 10). For the 1970-2018 period, most trend estimates are significant for all seasons. All annual changes are significant (Figure 10), in agreement with those for the broader Iberian Peninsula [20,21] and in the global and hemispherical trends [19,46]. The early 1970s were the coolest years since 1950. Seasonally, autumn, winter, and spring were the coolest of the decade, while summers were cooler in the late 1970s. The mid-1980s were particularly cold in spring and winter. In the 1950s, winters were the coolest ever since then, but spring, summer, and autumn are warmer than the 1970s. Trends in the range of 0.17 to 0.23 • C/decade are found for annual and seasonal values at MRT since 1900.
For an easier visualisation of trends, Figure 13 synthesises the results of Table 3 by representing trends by time period and altitude. It is remarkable that trends start to get significant if we consider the 1950-2018 period and they increase (accelerate) for the following decades. In autumn, the intensification of the warming is progressive so that trends for the 1970-  , and for the complete period of data availability at each site (grey); note that the latter changes from site to site. Significant trends (p < 0.05) are represented by a hollow circle. Filled circles mean that the significance was calculated taking into account the autocorrelation of the series. No significance is represented by crosses. Note that trends in the complete period case mix information from different time scales. Table 3. Seasonal and annual trends in the D3 and D4 WRF domains and at observational stations for different time periods: the complete time span of data availability at each site, 1950-2018, 1970-2018, 1990-2018, 2000-2018. Uncertainty confidence intervals are included for every trend. Significant trends (p < 0.05) are highlighted in italics and with * . A mark of * * implies that trends are significant according to autocorrelation. Note that the complete period column shows information that relates to different time scales (Table 1).

Conclusions
Temperature variability over the centre of the Iberian Peninsula including the complex terrain region in Sierra de Guadarrama was evaluated by comprehensively analysing and comparing 19 observational timeseries that lie within the area of study and a high-resolution simulation made with the WRF model over the area during the 2000-2018 time interval, the WRF simulation being driven by ERA Interim reanalysis. Three main targets were addressed. First of all, the spatial distribution of mean annual and seasonal temperatures as well as the annual cycles were compared in the model and observations. Second, the goodness of the model in reproducing the intra-and inter-annual variability, once the annual cycle was filtered out, was evaluated. In both cases, the added value of increasing resolution was assessed. Finally, decadal trends were described for annual and seasonal timescales in the model and observations for the period of interest. The analysis was also expanded to multi-decadal timescales for the whole period of availability of observational data.
The mean temperature distribution was closely related to orography, finding the coldest mean temperature values on the summits and the northern plateau being milder than the southern plateau, due to its higher elevation. Observational temperatures were slightly underestimated by the model at the higher-altitude sites and, thus, gradients were overestimated for all seasons. For instance, values of −6.79 • C/km and −5.81 • C/km for the annual altitude temperature gradient were obtained for WRF and OBS, respectively. The increasing resolution of the WRF domains introduced a progressive improvement in representing the climatology, with a clear added value at the highest-altitude sites. For the lowest sites, coarser resolution would have been sufficiently accurate to represent mean climate.
The analysis of the annual cycles of temperatures shows that both the simulated and reanalysis data provided a good performance when reproducing the observations at low altitudes. Nonetheless, WRF presented a slight cold bias with increasing altitude, while the reanalysis evidenced a large overestimation of the yearly range of temperatures for higher-altitude sites. Thus, overall, a higher-resolution simulation was needed to accurately represent the temperature range over a complex terrain. The configuration of the simulation included herein seemed appropriate enough to accurately represent the broad aspects of the climatology of the region, but it is possible that improvements to the model configuration could lead to a better performance using the simulation described herein as a reference.
After filtering out the variability represented by the regularity of the annual cycle, simulated and observed daily temperature anomalies were compared. The range of variability increased with altitude for the observations and was underestimated by the models, the underestimation being larger for ERA Interim. The regional averages for each dataset were nevertheless highly correlated, values exceeding 0.95, except for the reanalysis in winter, when correlations were a little lower. This means that the reanalysis simulation was able to capture the main features of interannual and decadal variability during the last decades. The WRF improved the range of variability at regional and local scales. The improvement was larger in autumn and winter and smaller in summer. Yet, the increase in resolution was not found to be that effective in reducing the RMSE of the observations for the highest-resolution domains. A 3-km resolution seemed to perform as well as the 1-km resolution. Therefore, reaching a resolution of 1 km seemed to be more relevant for the representation of the average climate and reducing model biases in the mean annual cycle than for the representation of the variability.
All these comparisons have been performed selecting the model gridpoints co-located to the observational sites and masked out to replicate their distribution of missing data. However, when the full-model domain was compared with the co-located and masked simulation, results were very similar. This means that, in spite of orographic complexity, the variability of temperatures in the area was spatially homogeneous enough so that a few sites were enough to be representative in sampling interannual and decadal changes for the whole region. This is actually consistent with results of analysing the long-term trends that showed a very similar interannual to multidecadal variability in the region of study.
The evaluation of monthly temperature anomalies trends over the area of study gave positive trends in the period 2000-2018, more pronounced in autumn, when significant (p < 0.05) values exceeded 1 • C/decade in vast areas of the southern plateau. Moreover, when this analysis was extended to the last decades, it showed that warming was widespread over all sites, although uneven through time. Trends were found to be significant at the only site that had records since the beginning of the 20th century (MRT). Trends were significant for all annual and seasonal series available since 1950 and intensified for all seasons except for winter when post 1970 and post 1990 records are considered. The results of the trend assessment, although with larger detail here, were consistent with those of previous trend analysis over the broader Iberian Peninsula [20,21]. Therefore, warming in Sierra de Guadarrama and the nearby lowlands was clear and more pronounced in autumn. There is, however, no clear evidence of an intensification of trends with altitude. In order to gain confidence in temperature-altitude responses, longer observational timeseries are needed in altitude.