Which Reanalysis Dataset Should We Use for Renewable Energy Analysis in Ireland?

Abstract: Attention should always be given to which reanalysis dataset to use when preparing analysis for a project. The accuracies of three reanalysis datasets, two global (ERA5 and MERRA-2) and one high-resolution regional reanalysis (MÉRA), are assessed by comparison with observations at seven weather observing stations around Ireland. Skill scores are calculated for the weather variables at these stations that are most relevant to the renewable energy sector: 10 m wind for wind power; surface shortwave radiation (SW) and 2 m temperature for photovoltaic power generation. The choice of which reanalysis dataset to use is important when future planning depends on this data. The newer ERA5 generally outperforms the other two reanalyses. However, this is not always true, and the best performing reanalysis dataset often depends on the variable of interest and location. As errors are significant for these reanalysis datasets, consideration should also be given to datasets specifically tailored to renewable energy resource modelling.


Introduction
This paper examines a range of skill scores for two global reanalysis datasets, ERA5 [1] and MERRA-2 [2], and one high-resolution regional reanalysis dataset, MÉRA [3], compared to weather observing stations in Ireland. Reanalyses are useful datasets for monitoring and comparing past and present climate conditions, testing the accuracy of past forecasts, driving numerical weather prediction (NWP) models, and identifying climate variations and change. They are used to an increasing extent in various commercial sectors including energy [4,5], agriculture [6], water resources [7,8], and insurance [9,10]. This paper will focus on the subject of renewable energy. Past data can be used for future planning, particularly in terms of selecting the most suitable site locations, for example, for wind farms. However, long records of historical wind speed data at turbine hub-height are rare, especially in exact locations where potential wind farms may be developed. Therefore, using gridded reanalyses, especially in regions where observations are sparse, is necessary to investigate long-term trends along with extreme events. In order to ensure that reanalyses adequately represent reality, their accuracy must be validated. In this paper, we evaluate these three datasets, which have not been analysed together before, specifically for the variables that are relevant for wind and solar electricity generation: wind speed, temperature, and shortwave radiation (SW).
Results from previous studies that have examined ERA5, MERRA-2, or MÉRA have found that these reanalyses tend to overestimate surface SW compared to observations [11-13].
The bias varies with cloudiness, resulting in overestimates of SW under cloudy conditions and slight underestimates of SW under clear skies [12,14]. Often, errors in SW in the reanalysis products can be attributed to the misrepresentation of cloud properties and/or aerosols [12-15].
The accuracy of 10 m wind speed in reanalyses has been shown to vary depending on location and dataset. MERRA-2 overestimates wind capacity across 23 countries in Europe at a national aggregate scale [16]. MERRA-2 tends to overestimate wind speeds at inland locations, especially in Europe. This can often be attributed to mismatches in model elevation or topography [17,18]. Although MÉRA has a tendency to overpredict at lower wind speeds and underpredict at higher wind speeds, the standard deviation of wind speeds has been shown to match well with observations, which can be attributed to increased resolution and improved model physics [3,19].
Temperature fields have been shown to generally be in good agreement with station observations. MÉRA outperforms ERA-Interim for 2 m temperature, often as a result of increased model spatial resolution [19]. Daily minimum temperatures in MERRA-2 tend to be too high, whereas daily maximum temperatures tend to be too low, and this is largely due to corresponding errors in radiation in the model [20]. ERA5 shows improvements over its predecessor, ERA-Interim [21]. For solar energy installations, photovoltaic (PV) output depends on a number of factors, including SW and ambient air temperature. PV panel efficiencies generally decrease with increasing temperature and with decreasing SW. Therefore, biases in 2 m temperature could feed into errors in calculated power output for PV electricity generation.
Due to some large errors in reanalysis datasets, some renewable energy developers do not directly use reanalyses and instead use alternatives such as the high-resolution New European Wind Atlas (https://map.neweuropeanwindatlas.eu/ (accessed on 29 April 2021)) and the Global Solar Atlas (https://globalsolaratlas.info/map (accessed on 29 April 2021)). These products provide outputs tailored to the needs of energy resources modellers. However, researchers who want to study the behaviour of wind and solar PV together still use reanalysis products to do this. While this paper focuses on validation in terms of wind and solar resources, the analysis of these individual variables can also be useful for many applications.
The aim of this paper is to compare a recent global reanalysis dataset (ERA5) with an older global reanalysis dataset (MERRA-2) and with a high-resolution regional dataset (MÉRA). The layout of the paper is as follows: Section 2 gives an overview of the data and methodology used in this study. Results are presented and discussed in Section 3 in terms of systematic errors and reanalysis skill scores. Finally, Section 4 provides some conclusions.

Materials and Methods
Data from the two global reanalysis datasets, ERA5 and MERRA-2, and a high-resolution regional reanalysis, MÉRA, are compared to observations from seven weather observing stations geographically dispersed around Ireland over a 26-year period of common data availability. Three weather variables are chosen: 2 metre (2 m) temperature, 10 metre (10 m) wind, and surface shortwave radiation (SW). SW and temperature are the primary variables influencing PV. Wind speed is the primary variable influencing wind power generation. Ideally, this wind speed would be measured at a typical turbine hub-height. However, there are no long-term records of hub-height wind speed available at different locations around Ireland; therefore, we focus this study on the 10 m wind speed records from synoptic stations.
It should be noted that all three reanalyses use conventional observations in the form of surface land observations (including the temperature and wind variables) from synoptic weather stations as a source of input to the model's data assimilation process [3,19,21,22]. Therefore, the validations performed on these datasets are not truly an independent comparison.

Ground Measurements
The Irish meteorological service, Met Éireann, runs a network of WMO standard weather observing stations around Ireland. In this study, we use seven stations that have a continuous record of hourly data for the three variables considered here for over 25 years, limited by the closure of some weather stations in 2008. These stations represent the longest available record of wind and SW covering different regions of Ireland (Figure 1). The locations include both coastal stations (Belmullet, Dublin Airport, Malin Head, and Valentia Observatory) and inland stations (Birr, Clones, and Kilkenny).

Reanalysis Datasets
Hourly output from a high-resolution regional reanalysis, MÉRA [3], is utilised in this study. Its performance is compared to two global reanalyses, MERRA-2 [2] and ERA5 [1]. Information about the resolution and coverage of the datasets is summarised in Table 1. Bilinear interpolation is used to produce a colocated grid point for comparison with station observations.
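The colocation step can be illustrated with a minimal sketch of bilinear interpolation within a single grid cell. The function name, argument order, and any input values are hypothetical, and the sketch treats longitude and latitude as a flat plane, which is an assumption; the exact implementation used in this study is not described here.

```python
def bilinear_interpolate(lon, lat, lon0, lon1, lat0, lat1, v00, v10, v01, v11):
    """Interpolate a gridded value to the point (lon, lat).

    The point must lie inside the cell [lon0, lon1] x [lat0, lat1].
    v00 is the value at (lon0, lat0), v10 at (lon1, lat0),
    v01 at (lon0, lat1) and v11 at (lon1, lat1).
    """
    if not (lon0 <= lon <= lon1 and lat0 <= lat <= lat1):
        raise ValueError("target point lies outside the grid cell")
    tx = (lon - lon0) / (lon1 - lon0)  # fractional position, west to east
    ty = (lat - lat0) / (lat1 - lat0)  # fractional position, south to north
    south = (1 - tx) * v00 + tx * v10  # interpolate along the southern edge
    north = (1 - tx) * v01 + tx * v11  # interpolate along the northern edge
    return (1 - ty) * south + ty * north  # blend the two edges in latitude
```

A point at the centre of the cell receives the mean of the four corner values, and a point on a corner receives that corner's value unchanged.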

Skill Scores
Standard skill scores are used to evaluate the skill of the reanalyses compared to the selected Met Éireann observation stations. The skill scores used are mean error (ME), root mean square error (RMSE), anomaly correlation coefficient (ACC), and Spearman's correlation (r). ME is defined as
$$\mathrm{ME} = \frac{1}{n}\sum_{i=1}^{n}\left(x_{r,i} - x_{o,i}\right),$$
where $n$ is the number of observations, $x_r$ is the reanalysis value of the parameter in question, and $x_o$ is the corresponding observed value. RMSE is defined as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(x_{r,i} - x_{o,i}\right)^{2}}.$$
The ACC is the correlation between anomalies from the reanalysis and observations relative to climatology [23]. Therefore, this score compares errors considering seasonal variability:
$$\mathrm{ACC} = \frac{\sum_{i=1}^{n}\left(x_{r,i} - x_{c,i}\right)\left(x_{o,i} - x_{c,i}\right)}{\sqrt{\sum_{i=1}^{n}\left(x_{r,i} - x_{c,i}\right)^{2}\,\sum_{i=1}^{n}\left(x_{o,i} - x_{c,i}\right)^{2}}},$$
where $x_c$ is the climatology of the parameter in question. Climatology is calculated by using a 30-day rolling mean and then averaging over the common validation period. ACC values greater than 0.6 imply that the data capture the observed large-scale weather patterns well.
Normalised Taylor diagrams [24] are used in this study to summarise how closely the model data matches observations. They are graphical representations of three skill scores: normalised standard deviation, correlation coefficients, and normalised centred root mean square error (CRMSE). In this paper, the results are normalised relative to the observations for ease of comparison. Therefore, model standard deviation and CRMSE are divided by the standard deviation of the observations. Normalised CRMSE is calculated as
$$\mathrm{CRMSE}_{\mathrm{norm}} = \frac{1}{\sigma_{X_o}}\sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[\left(x_{r,i} - \overline{x_r}\right) - \left(x_{o,i} - \overline{x_o}\right)\right]^{2}},$$
where the overall mean of a field is indicated by an overbar and $\sigma_{X_o}$ is the standard deviation of observations. Modelled estimations that agree well with observations will lie nearest to the point marked 1 on the x-axis.

Results
This section investigates systematic errors in the reanalyses and the general behaviour of the models. Each variable is analysed separately.

Temperature
Table 2 shows the ACC, Spearman correlation, ME, and RMSE of hourly temperature averaged over all stations for the entire period. All ACC results are above 0.85, indicating that these reanalyses capture the seasonal variability well. ME and RMSE are best for ERA5. MERRA-2 has the poorest results for all three of these skill scores.
Table 2. Skill scores of each reanalysis compared to observations: anomaly correlation coefficient (ACC), Spearman's correlation (r), mean error (ME), and root mean square error (RMSE) for hourly 2 m temperature averaged over all stations.

The ability of a reanalysis model to match observed temperature varies from location to location, as well as between reanalysis models. Generally, inland stations show different behaviours compared with coastal stations (the spatial distribution of temperature is shown in Figure S1 of the Supplementary Material). The global reanalysis MERRA-2 often overestimates temperature at coastal stations but underestimates temperature at inland stations (Figure 2). At inland stations, the elevation of grid-points in MERRA-2 is higher than actual station elevations. The global reanalyses' interpolated grid-points are between 14 m and 56 m higher than actual elevations at selected inland Met Éireann stations. As temperature decreases with height, this results in the MERRA-2 inland locations having colder temperatures than observed. However, ERA5 grid-point elevations are also higher than the weather observing stations, particularly for inland locations; therefore, due to the contrasting error signals at Clones and Kilkenny (Figure 2), elevation mismatch between reality and model alone cannot account for temperature error in ERA5 and similarly for MÉRA. The magnitude of temperature error is relatively small compared with other weather variables, suggesting that temperature is one of the variables better captured by the models.
The climate at coastal locations such as Belmullet is strongly influenced by the adjacent sea surface temperatures. As the sea has a higher specific heat capacity than the land, the sea heats and cools more slowly than the land. However, if the influence of the sea is too strong in the reanalysis model, the air temperature at a coastal station will vary less during the year than it should. This seems to be the case in ERA5 and MERRA-2 at this coastal station, as they have a positive bias in autumn (September, October, November: SON) and winter (December, January, February: DJF), but a negative bias in spring (March, April, May: MAM) and summer (June, July, August: JJA). This is particularly evident for all coastal stations in MERRA-2, where the cancellation of positive and negative seasonal errors leads to the small overall errors observed in Figure 3. The ME of MÉRA's temperatures exhibits smaller changes from season to season, showing that it is better at capturing these seasonal changes in temperature at both coastal and inland stations.
ME also changes less throughout the day for MÉRA compared with the other reanalyses, probably due to MÉRA's better representation of elevation and land-use. Errors also show diurnal variability, in which nighttime temperatures are overestimated and daytime temperatures are underestimated. A similar diurnal pattern is observed at all coastal stations, suggesting that land-sea interactions play a large role in the underdispersion of diurnal temperature variability, whereas inland stations generally display a reversed diurnal pattern, particularly in spring and summer.
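The cancellation of seasonal errors noted above can be made concrete by computing ME separately for each season or each hour of the day. The sketch below is illustrative only: the function names and the `key` argument are hypothetical, and timestamps are assumed to be `datetime` objects paired with the reanalysis and observed values.

```python
from collections import defaultdict
from datetime import datetime

# Meteorological seasons as defined in the text.
SEASON_OF_MONTH = {
    12: "DJF", 1: "DJF", 2: "DJF",
    3: "MAM", 4: "MAM", 5: "MAM",
    6: "JJA", 7: "JJA", 8: "JJA",
    9: "SON", 10: "SON", 11: "SON",
}


def season_of(timestamp):
    """Group key for seasonal errors."""
    return SEASON_OF_MONTH[timestamp.month]


def hour_of(timestamp):
    """Group key for diurnal errors."""
    return timestamp.hour


def mean_error_by(times, x_r, x_o, key):
    """Mean error (reanalysis minus observation) within each group."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for t, r, o in zip(times, x_r, x_o):
        group = key(t)
        sums[group] += r - o
        counts[group] += 1
    return {group: sums[group] / counts[group] for group in sums}
```

A series that is 1 K too warm in autumn and winter and 1 K too cold in spring and summer has an overall ME of zero, yet `mean_error_by(times, x_r, x_o, season_of)` exposes the opposing seasonal biases; passing `hour_of` instead gives the diurnal error cycle.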

A Taylor diagram is a useful tool to compare the skill of all three reanalyses (Figure 4). Correlation coefficients are high (≥0.94) for all models, showing that all three reanalyses perform well in capturing the variability of temperature. The variability of temperature is best captured by MÉRA, as seen by all stations clustering near the normalised standard deviation of 1.0, while ERA5 and MERRA-2 are often underdispersive, particularly for coastal locations. MERRA-2 generally has the poorest CRMSE skill while ERA5 generally has the best. Figure 4 also shows how the accuracy of temperature varies with location and model. Coastal and inland stations form distinct groups: each model performs better at inland locations, which generally have a better CRMSE and a normalised standard deviation that better matches that of observations.
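The three quantities plotted on a normalised Taylor diagram can be computed as in the following minimal sketch. The function name is hypothetical, and the Pearson correlation is assumed, as in the standard Taylor diagram construction; with that choice the three statistics are linked by the law-of-cosines relation E² = 1 + s² − 2sR, where s is the normalised standard deviation, R the correlation, and E the normalised CRMSE.

```python
import math


def taylor_statistics(x_r, x_o):
    """Return (normalised std dev, Pearson correlation, normalised CRMSE).

    Both the model standard deviation and the CRMSE are divided by the
    standard deviation of the observations.
    """
    n = len(x_o)
    mean_r = sum(x_r) / n
    mean_o = sum(x_o) / n
    dev_r = [r - mean_r for r in x_r]  # centred reanalysis values
    dev_o = [o - mean_o for o in x_o]  # centred observations
    sigma_r = math.sqrt(sum(d * d for d in dev_r) / n)
    sigma_o = math.sqrt(sum(d * d for d in dev_o) / n)
    corr = sum(a * b for a, b in zip(dev_r, dev_o)) / (n * sigma_r * sigma_o)
    crmse = math.sqrt(sum((a - b) ** 2 for a, b in zip(dev_r, dev_o)) / n)
    return sigma_r / sigma_o, corr, crmse / sigma_o
```

A normalised standard deviation below 1 corresponds to an underdispersive reanalysis and a value above 1 to an overdispersive one; a perfect match gives a standard deviation and correlation of 1 and a CRMSE of 0.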

Wind Speed
The average ACC values for 10 m wind speed are ≥0.76 for all reanalyses (Table 3), indicating that the seasonal variability of wind speed is captured reasonably well. ERA5 exhibits the lowest overall ME and RMSE. ERA5 may perform better at 10 m wind speeds because it has a vertical level located at approximately 10 m above the surface. MÉRA and MERRA-2 need to use model level data from approximately 12 m and 90 m, respectively, when calculating their 10 m winds.
Table 3. Skill scores of each reanalysis compared to observations: anomaly correlation coefficient (ACC), Spearman's correlation (r), mean error (ME), and root mean square error (RMSE) for hourly 10 m wind speed averaged over all stations.
Wind speed in MERRA-2 is higher than observations at all stations, except at the northern coastal station of Malin Head, possibly due to the low spatial resolution in MERRA-2, which fails to capture the land-sea differences at coastal stations. The other near-coastal stations (Belmullet, Dublin Airport, and Valentia Observatory) are windier in the global reanalyses than in observations. This is attributed largely to the proximity of the low-resolution grid-points to the sea and possible misrepresentation of surface friction, where the on-land point is treated as a smoother sea surface location, resulting in less friction to reduce the high offshore wind. Again, the elevation difference between the model and observations leads to a generally windier estimation by the reanalyses, especially at inland stations. At coastal stations, MÉRA underestimates wind speeds, while it overestimates wind speeds at inland stations in all seasons, probably due to the misrepresentation of local surface friction effects: near-coastal stations are classified as on-land grid-points by the model whereas, in reality, they have more offshore characteristics.

There is little variability in seasonal bias, except at Valentia Observatory, where MÉRA overestimates wind speeds in autumn and winter and underestimates them in spring and summer, possibly due to the misrepresentation of the grid-box as part of the nearby sea. Along with the general bias in the reanalyses, at each station, there is a diurnal pattern in the wind speed error. An overestimation in wind speed during the night often changes to an underestimation in wind speed during daytime hours. This may be due to the reanalyses not capturing the vertical mixing in the lower part of the atmosphere. At sunrise, observed wind speeds increase faster than in all reanalyses (Figure 5 and Figures S10-S15 in the Supplementary Material) due to turbulent mixing between different vertical levels, which brings faster-moving air from higher up in the planetary boundary layer down to the surface and increases the wind speed there. MÉRA captures this effect best, possibly due to the higher resolution in the model.
A Taylor diagram (Figure 6) shows that ERA5 is consistently underdispersive at all stations and MERRA-2 is overdispersive at all stations except Malin Head, while MÉRA changes depending on the station: inland stations have a tendency to be overdispersive and coastal stations underdispersive. Wind speed is less variable over the sea, where there is little change in the sea surface, while it is more variable at inland locations, where changes in surface heating drive changes in wind speed. CRMSE is better for ERA5 and MÉRA than for MERRA-2 at most stations. The average r values are ≥0.84 for all three reanalyses, indicating that day-to-day variability of wind speed is generally well captured by the reanalysis datasets. Importantly, Figure 6 indicates that there is no clear best model for overall accuracy.

Shortwave Radiation
In Ireland, typical winter values of hourly SW are less than 50 W/m² and summer values reach approximately 500 W/m² on average, but can reach values of up to 1000 W/m² within individual hours. As the focus of this paper is on renewable energy, only the summer period (JJA) is examined here, as this is most relevant to the production of photovoltaic (PV) power in Ireland. Although MÉRA has the smallest ME, ERA5 has the best overall RMSE score and is the only reanalysis to achieve an ACC value marginally greater than 0.6 (the implied measure of skill) (Table 4). MERRA-2 has the largest ME, with all stations having a large positive bias (Figure 7). The small ME for MÉRA is due to the strong coastal versus inland effect, which leads to errors cancelling (Figure 7).
Table 4. Skill scores of each reanalysis compared to observations: anomaly correlation coefficient (ACC), Spearman's correlation (r), mean error (ME), and root mean square error (RMSE) for hourly SW averaged over all stations for summer (June, July, and August).
Stations located on the Atlantic-facing coast (Malin Head, Belmullet, and Valentia Observatory) display a negative ME in MÉRA, whereas all other locations have a positive ME in MÉRA (Figure 7). This underestimation along the west coast of Ireland may be due to the presence of more thick clouds, which tend to have too much cloud water, causing an underestimation of SW, as seen in other studies [13]. MERRA-2 and ERA5 have a positive bias in SW at all stations, mainly due to a poor prediction of cloud patterns, as seen in other studies such as [11,12].

There is an obvious diurnal cycle in SW. However, this diurnal pattern is not accurately captured by the reanalyses. All reanalyses underestimate SW values in the morning (Figure 8 and Figures S16-S20 in the Supplementary Material), possibly due to a poor estimation of aerosols in the models. By contrast, in the afternoons, all reanalyses overestimate SW, possibly attributable to shortcomings in the model physics, which may fail to capture the common development of cumulus clouds in the afternoon. The contrast between inland and coastal stations is particularly pronounced for MÉRA in the morning (Figure 8), when the coastal stations have a stronger underestimation.
The Taylor diagram in Figure 9 suggests that ERA5 best captures the observed SW, although it is slightly underdispersive at five of the seven stations. The correlation coefficients are 0.79 to 0.87 for each reanalysis, suggesting that part of the variability in hourly SW is captured by the reanalyses. Overall, ERA5 appears to perform better for SW during JJA, as seen in Figure 9 and Table 4. This may reflect the inclusion of satellite data in the data assimilation process, which is not included in MÉRA but is part of MERRA-2. ERA5's performance may also be attributed to the higher spatial resolution compared with MERRA-2.

Conclusions
The accuracy of three reanalyses is assessed here in terms of hourly weather parameters relevant to the renewable energy sector in Ireland: 2 m temperature, 10 m wind speed, and incident shortwave radiation (SW). Two global datasets, MERRA-2 and ERA5, and one high-resolution regional dataset, MÉRA, are considered for a common validation period. Skill scores including ME, RMSE, ACC, and Spearman's correlation are calculated. The results presented here highlight the importance of carefully selecting the optimum reanalysis for specific applications, and comparing the reanalysis to site observations where possible. In terms of renewable energy, reanalysis data can be useful in the planning of viable locations for renewable energy generation, especially where observations are sparse. It is important to study as long a time period as possible as, for example, one or even a few years of data may not be representative of climatology. The results presented here highlight the limitations of reanalyses due to significant errors in all of the reanalyses studied in this paper. Therefore, alternative sources of data may be more suitable for use in standalone studies of wind and solar resources, such as the New European Wind Atlas [25] and the Global Solar Atlas [26]. These custom-made datasets with a reduced bias are tailored to renewable energy developers and specifically to the modelling of energy resources. However, as other research studies on renewable energy may require the use of reanalysis datasets, this paper aims to raise awareness of the importance of selecting the most suitable reanalysis dataset from those available.
All three reanalyses do well in replicating the observed 2 m temperature at the seven selected locations in Ireland, although ERA5 generally outperforms the others. SW is generally overestimated by the reanalyses and has a poorer skill than either wind speed or temperature. These results show that the highest resolution dataset may not necessarily be the most accurate. Results also highlight the variability in skill for different locations and variables in Ireland. Temperature and SW have a strong grouping of coastal versus inland stations; however, inland stations generally perform better for temperature, whereas coastal stations perform better for SW. There is generally a systematic error for all variables at all stations on a diurnal time scale. There is a connection between the diurnal error patterns at each station, such as the underestimation of SW in the morning, leading to lower temperature estimates during the day. There is also a systematic bias observed at most stations, often attributed to the mismatch in surface elevation and the land-sea component of grid-points or local land-use and surface friction. This suggests that a postprocessed dataset should be considered in the decision-making process, as this can reduce systematic errors for reanalysis datasets. Although ERA5 is never the worst, and is often the best, no reanalysis consistently outperforms the others for all weather parameters and locations. Therefore, the best dataset to use for renewable energy in Ireland will vary on a case-by-case basis, depending on factors such as location, timescale, and the meteorological variable of interest.
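One very simple form of postprocessing, shown here purely as an illustration and not as a method used in this study, is to remove the mean error for each hour of the day, estimated from a period where station observations are available. The function names are hypothetical, and the sketch assumes that the hourly bias is stationary, which would need to be checked for a given site and variable.

```python
from collections import defaultdict


def fit_hourly_bias(hours, x_r, x_o):
    """Estimate the mean error (reanalysis minus observation) per hour of day."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for h, r, o in zip(hours, x_r, x_o):
        sums[h] += r - o
        counts[h] += 1
    return {h: sums[h] / counts[h] for h in sums}


def apply_hourly_bias(hours, x_r, bias):
    """Subtract the fitted bias; hours without a fitted bias are left unchanged."""
    return [r - bias.get(h, 0.0) for h, r in zip(hours, x_r)]
```

The bias would typically be fitted on one part of the record and applied to another, so that the correction is evaluated on data it has not seen. A correction of this kind removes only the systematic part of the error and leaves random errors untouched.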
The good performance of ERA5 may be due to the newer method of data assimilation (4D-var) in the model compared with the other reanalyses, which still employ the older 3D-var version [12]. ERA5 also has a greater number of vertical levels, with the first vertical level closest to the observations at 10 m. MÉRA does not assimilate satellite data, which could be important for SW or cloud properties. The skill of regional reanalyses depends on the quality of the driving reanalysis; previous studies found that MÉRA performs better than its driving reanalysis ERA-Interim [19], and studies have also found that ERA5 performs better than ERA-Interim [5,12,27]. Therefore, the proposed development of a new MÉRA model with ERA5 as the driving global reanalysis should lead to an improved regional scale reanalysis for Ireland.
Data Availability Statement: MÉRA reanalysis data is available on request from the Irish Meteorological Service, Met Éireann. ERA5 reanalysis data can be found here: https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-single-levels?tab=overview (accessed on 5 May 2019). MERRA-2 reanalysis data can be found here: https://disc.gsfc.nasa.gov/datasets?project=MERRA-2 (accessed on 5 May 2019). The publicly available Met Éireann weather observation station wind speed and temperature data can be found here: https://www.met.ie/climate/available-data/historical-data (accessed on 5 May 2019), while the SW data is available on request from Met Éireann.
Acknowledgments: This publication has emanated from research supported (in part) by Science Foundation Ireland (SFI) under the SFI Strategic Partnership Programme Grant Number SFI/15/SPP/E3125. The opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the Science Foundation Ireland. In addition, the authors would also like to acknowledge Met Éireann for the provision of SW observational data and MÉRA data.

Conflicts of Interest:
The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: