Synoptic Analysis and Subseasonal Predictability of an Early Heatwave in the Eastern Mediterranean

: Greece and the surrounding areas experienced an early warm spell with characteristics of a typical summer Mediterranean heatwave in mid-May 2020. The maximum 2 m temperature at Kalamata (southern Greece) reached 40 ◦ C on 16 May and at Aydin (Turkey), it was 42.6 ◦ C on 17 May. There was a 10-standard deviation positive temperature anomaly (relative to the 1975–2005 climatology) at 850 hPa, with a southwesterly flow and warm advection over Greece and western Turkey from 11 to 20 May. At 500 hPa, a ridge was located over the Eastern Mediterranean, resulting in subsidence. The aims of this study were (a) to investigate the prevailing synoptic conditions during this event in order to document its occurrence and (b) to assess whether this out-of-season heatwave was predictable on subseasonal timescales. The subseasonal predictability is not a well-researched scientific topic in the Eastern Mediterranean Sea. The ensemble global forecasts from six international meteorological centres (European Centre for Medium-Range Weather Forecasts—ECMWF, United Kingdom Met Office—UKMO, China Meteorological Administration—CMA, Korea Meteorological Administration—KMA, National Centers for Environmental Prediction—NCEP and Hydromete-orological Centre of Russia—HMCR) and limited area forecasts using the Weather Research and Forecasting model with the Advanced Research dynamic solver (WRF) forced by Climate Fore-cast System version 2 (CFSv.2; NCEP) forecasts were evaluated for lead times ranging from two to six weeks using statistical scores. WRF was integrated using two telescoping nests covering Europe, the Mediterranean basin and large part of the Atlantic Ocean, with a grid spacing of 25 km, and Greece–western Turkey at 5 km. The results showed that there were some accurate forecasts initiated two weeks before the event’s onset. There was no systematic benefit from the increase of the WRF model’s resolution from 25 km to 5 km for forecasting the 850 hPa temperature, but regarding the prediction of maximum air temperature near the surface, the high resolution (5 km) nest of WRF produced a marginally better performance than the coarser resolution domain (25 km).


Introduction
Intense and prolonged heat is the leading cause of weather-related deaths around the world [1][2][3].Heatwaves form when very warm air masses are advected over a region and high pressures on the surface strengthen and remain over it for several days up to weeks.Due to global warming, a change in heatwaves in terms of severity, frequency and duration is expected [4,5], causing economic losses [6] and increased human mortality [7,8].Recent research has shown that there is rapidly increasing likelihood of temperatures over 50 • C in the Mediterranean and the Middle East due to climate change [9].The Mediterranean is recognized as one of regions that is the most sensitive and vulnerable to anthropogenic climate change on Earth [10] with climate models projecting a consistent Atmosphere 2024, 15, 442 2 of 41 warming over the 21st century for various emission scenarios [11][12][13].Therefore, it is necessary to extend the subseasonal heatwave predictability of existing model frameworks beyond two weeks to provide the state with sufficient time for preparations and mitigation strategies to be implemented.
In recent decades, there have been advances in weather prediction with the help of the continuous advances in computing power, weather modelling and a better understanding of the atmospheric processes [14].However, the predictability horizon seemed to reach an apparent plateau for extreme weather events such as heatwaves and cold spells [15].On the other hand, in the last decade, great progress has been made in the subseasonal forecasting of heatwaves [16].The discovery of certain sources of subseasonal predictability associated with atmospheric, oceanic and land processes [17] has led closer to a seamless prediction across timescales.One of these sources is the memory of the soil moisture which can last several weeks and influence the atmosphere through changes in evaporation and the surface energy budget, affecting subseasonal forecasts of air temperature [18].Another source of subseasonal predictability is the sea surface temperature (SST) anomalies which can lead to changes in air-sea heat fluxes [19,20].Also, further improvements in numerical weather prediction (NWP) such as a finer model resolution, improved data assimilation techniques and advancements in ensemble forecasting have increased the predictability horizon beyond the two weeks (lead time) [21,22].Finally, the demand from users across the world for subseasonal forecasts and the recent international effort in forecasting on this timescale (e.g., ECMWF) database-(https://apps.ecmwf.int/datasets/data/s2s-realtimeinstantaneous-accum-ecmf;accessed on 20 January 2024) have drawn more attention from scientists for further research on more accurate forecasts and eventually achieving a connection between weather forecasting and climate prediction, in other words, a seamless prediction.
Over the past few years, there has been increased research interest in the subseasonal prediction of heatwaves.These studies evaluated Subseasonal-to-Seasonal (S2S) models [16,17,23], dynamical downscaling [24] and the predictability and the role of different factors such as atmospheric blocking, soil moisture deficits and SST anomalies [25][26][27][28][29][30].Also, S2S predictability research has been conducted using deep learning (artificial intelligence) techniques to improve the predictability of heatwaves and droughts [31][32][33].Additionally, health-related research has been performed, investigating how human mortality and morbidity is affected by heatwaves and how well these events are predicted in S2S timescales [34].
However, most of the research on heatwave predictability in S2S timescales refers to global-scale studies and case studies in North America, East Asia, western Europe and Russia [27][28][29].The S2S research on the predictability of heatwaves for the Eastern Mediterranean region is very limited and needs improvement, as this highly populated region is affected by high temperatures during the summer months.During the period that extends from late spring to early autumn, the countries surrounding the eastern Mediterranean Sea attract many tourists who plan their vacations even months before.It is noted that the Mediterranean countries received one third of the global tourism which was 330 million tourists in 2016 [35].The Eastern Mediterranean includes a few low-income countries which highly rely their economy on tourism, and they are more vulnerable to climate change.Also, the Eastern Mediterranean countries are prone to wildfires during heatwaves with the notable cases of the recent wildfires in Attica (Greece) and Rhodes (Greece) in 2021 and 2023, respectively.Another example is the case of the 2007 extensive and lethal wildfires in Greece, which were thoroughly investigated for a connection between extreme temperatures and wildfire creation [36,37].Water shortages are also prevalent during heatwaves.According to the latest Intergovernmental Panel on Climate Change (IPCC) report, the southeastern Mediterranean countries will double their irrigation requirements by 2080-2090 [38].
For these reasons, it is necessary to evaluate the predictability of weather models in the subseasonal timescale (minimum two weeks ahead) to allow the state to take action to mitigate the outcomes of a heatwave.This becomes imperative due to the fact that in recent decades, the heatwave period is expanding into spring and autumn [39][40][41].Specifically for the Eastern Mediterranean, there is evidence [42] that many regions are experiencing earlier onsets of heatwaves.Additionally, the city of Athens (Greece) has already experienced some out-of-season heatwaves during June (e.g., in 2007) [42][43][44].The abnormally early appearance (11-20 May 2020) of the heatwave of this study that mainly affected the whole Greek region with maximum temperatures reaching 40 • C in Kalamata (southern Greece) and 42.6 • C in Aydin (southwest Turkey) shows that a subseasonal prediction of such an event is crucial and truly beneficial as the people and the state are not expecting a heatwave to occur earlier than summer and it is more difficult for the human body to adapt to the heat in mid-summer [45,46].
The aims of this study were (1) to examine the prevailing synoptic conditions during the heatwave in the central-eastern Mediterranean and more specifically in Greece, in order to document its occurrence and (2) to assess the predictability (with lead times of two to six weeks) of (a) global S2S forecasts from various leading meteorological centres and (b) limited area S2S forecasts with a higher resolution than the global models, through dynamical downscaling with the WRF numerical model.This paper includes a brief review of the definitions of a heatwave, followed by the data and methodology (Section 2).The observational and synoptic analyses of the event are presented in Sections 3.1 and 3.2, respectively.Section 3.3 outlines the assessment of the prediction performance of the S2S forecasts, while Section 4 provides the discussion and conclusions.

Data
To achieve the goals of this study, different datasets were required.Their main characteristics are summarized in Table 1.The WRF forecasts are derived from the analyses and forecasts of the CFSv.2 [47] (https://www.ncei.noaa.gov/products/weather-climatemodels/climate-forecast-system,accessed on 23 June 2023) from the National Centers for Environmental Prediction (NCEP).This dataset was selected because it provides all the required meteorological fields (i.e., air temperature, dew point temperature, geopotential heights, wind speeds, etc.) across multiple vertical levels to the stratopause (1 hPa) and more importantly, the forecast window of this dataset covers an adequate time range of lateral boundary forecast data for the subseasonal forecasts.Every WRF run was initialized with the control analyses of CFSv.2 from the forecast cycles of 00 and 12 UTC and uses the corresponding six-hourly CFSv.2 forecasts (at 00, 06, 12 and 18 UTC) as boundary conditions.The grid spacing of the CFSv.2 gridded fields (used by WRF) is 1 Furthermore, for the predictability estimation of the global forecasts, the S2S Prediction Project Database [30] was used (https://www.ecmwf.int/en/research/projects/s2sand s2s.cma.cn,accessed on 25 January 2024) which provides S2S forecasts and reforecasts (hindcasts) from various meteorological centres around the world (ECMWF, UKMO, CMA, KMA, NCEP, HMCR, etc.).The real-time data are available twice per week, once per week or daily (depending on the providing centre) (Table 1) with a grid spacing of 1.5 • × 1.5 • .This resolution is not the native resolution of each model, but for consistency and comparability reasons, every meteorological centre's model has been archived and disseminated with 1.5 • grid spacing in the S2S database.The main reason that these gridded outputs were not selected to provide the initial and boundary conditions of the WRF simulations is that the S2S database does not provide the full spectrum of data fields required for the initialization of the WRF model.Additionally, the reforecast data [47] are provided with the same resolution as the real-time global forecasts, but with different production frequencies from UKMO and KMA.UKMO and KMA provide reforecast data four times per month in contrast to their daily real-time forecasts.In this study, the UKMO and KMA reforecasts that were examined are from 9, 17 and 25 April.Reforecasts prior to 9 April were not utilized because their lead times do not cover the entire duration of the heatwave.Table A1 presents the weekly frequencies at which the six selected meteorological centres produced their ensemble forecasts.The selection of these six centres for this study was based on the length of their forecasts (lead time > 40 days) and a sufficiently high native resolution (Table 2).The analysis was based on datasets of reanalysis data and observations from weather stations in major Greek cities (i.e., Athens, Thessaloniki and Heraklion).The reanalysis data were obtained from the ERA5 [48] Copernicus Climate Change Service database (ECMWF-https://cds.climate.copernicus.eu;accessed on 25 January 2024).ERA5's grid spacing is 0.25 • × 0.25 • and its gridded data are provided for the examination of the temperature, geopotential heights, winds and other meteorological parameters at specific levels (mainly at 850 hPa and 500 hPa), as well as the near surface air temperature (at 2 m).In addition to ERA5, ERA5-Land [49] reanalysis data (0.1 • × 0.1 • ) were used for analysis of soil moisture.ERA5-Land data were not used for the investigation of the predictability of maximum temperatures of the WRF simulations because the Copernicus database does not provide maximum temperatures, only hourly data.Finally, weather station observations were utilized for the investigation of the predictability and the production of heat indices.The observations were obtained from the National Observatory of Athens (NOA) and the National Climatic Data Center of the National Oceanic and Atmospheric Administration (NOAA) (https://www.ncei.noaa.gov,accessed on 25 January 2024).In total, 25 weather stations, from numerous urban areas and airports in Greece and the surrounding countries, were taken into consideration for the study.

Definition of a Heatwave
Heatwaves are prolonged periods with extremely high temperatures that exert a considerable impact not only on human health and ecosystems, but also on infrastructure and social services.There is no global definition of a heatwave because local acclimatization and adaptation influence the impact of excessive heat.Even at the local level, there can be multiple heatwave definitions based on varying temperature levels or time periods.In the literature, there is a plethora of heatwave definitions which are related to numerous factors such as human morbidity [50,51], mortality [52][53][54] and even cattle milk production [55].Heatwaves are commonly characterized by temperature and duration thresholds [56], in addition to humidity and diurnal temperature cycle characteristics that are related to human morbidity and mortality [57].
There are some extensively used heatwave indices like the THI (Temperature-Humidity Index) [58,59] and EHF (Excess Heat Factor index) [60], which are based on simple, readily available meteorological variables such as the maximum, minimum and mean temperature and the dew point temperature.These variables are easily found in various datasets (e.g., in ERA5 and CFSv.2).THI and EHF have been successfully used in heatwave studies in Greece [61,62].Additionally, there are other more sophisticated heat indices which focus on specific aspects like human discomfort, but with additional requirements on data.One of these heat indices is the Universal Thermal Climate Index (UTCI) [63].
In this study, the EHF index was mainly used (in the analysis of the event) because it only requires temperature values, it is applicable to different climate regimes and it is supported by the recent literature [60,62,[64][65][66][67][68].Additionally, the THI, UTCI and the 95th percentile of maximum 2 m temperatures (T 2 max95) were also utilized for the assessment of the heatwave's onset, duration and intensity.

Excessive Heat Factor, Temperature-Humidity Index and Universal Thermal Climate Index
The EHF [60] is composed of two factors.The first factor (Excessive Heat Index of significance-EHIsig) is a measure of how warm a three-day period is with respect to a climatic threshold, such as the T 2 max95, at each location.In the literature, for the calculation of EHIsig, the minimum, the mean or the maximum 2 m air temperature is used.In this study, in line with [27], only the T 2 max was utilized as the combination of mean, minimum and maximum air temperature is optional.Additionally, the 90th or 95th percentile of the temperature was used in the literature.Thus, the T 2 max95 of May (reference period: 1975-2005) was selected to assess the characteristics of the heatwave more rigorously.If the first factor is above a chosen temperature threshold for three consecutive days, then this period gives a hint of the presence of a heatwave.The second factor (Excessive Heat Index of acclimatization-EHI accl ) is a measure of how warm the same three-day period is relative to the previous 30-day period.This concept suggests that individuals may adapt to the typical temperature variations in their local climate, taking into account changes across latitudes and seasons.However, they may not be prepared for a sudden rise in temperature exceeding that experienced in the preceding 30 days.The formulas for the components of EHF are The product of these two indices results in the EHF, with the specificity that the EHF must have the same sign as EHIsig.The EHF formula is The unit for the EHF is degrees Celsius squared ( • C 2 ) and a three consecutive day period of EHF > 0 • C 2 is indicative of the presence of a heatwave since day i (Equations ( 1) and ( 2)).
The Temperature-Humidity Index (THI) represents the stress to the human body from the combination of humidity and heat [69,70].Its formula is where T is the near-surface (2 m) temperature and Tw is the wet-bulb temperature in degrees Fahrenheit.The threshold used in this study is THI = 70 • F. Above this value, half of the population of a particular area feels uncomfortable.The selection of this threshold is conservative mainly because the heatwave (of this study) occurred earlier than the hot period of summer and secondly due to the relatively low resolution of ERA5.Also, the THI does not need a three consecutive days period above its threshold to indicate a heatwave [71].
The Universal Thermal Climate Index (UTCI) serves as an additional tool for identifying the onset, end and severity of the heatwave using a threshold of 32 • C. The UTCI incorporates a complex set of factors (i.e., radiant temperature and thermophysiological factors like net body heat loss) [72] that are not readily available, easily acquired or implemented in the WRF model.For that reason, UTCI was only calculated using the ERA5 dataset without any modification to implement it in the WRF forecasts.Above the UTCI threshold of 32 • C, people can experience varying degrees of discomfort, ranging from "strong heat stress" to "extreme heat stress" [63,73].

Limited-Area WRF Simulations
In order to assess the effect of dynamical downscaling on the prediction ability of the model, limited area numerical forecasts were performed using version 4.1.2 of Advanced Research WRF model [74].The outer domain (d01) employed a horizontal grid spacing of 25 × 25 km and covered Europe, northern Africa, western Asia and a large part of the northern Atlantic to represent the synoptic scale flow.The inner nested domain (d02) covered Greece and western Turkey with a finer grid spacing of 5 × 5 km in order to investigate the effect of fine resolution forecasts on the predictability of the heatwave.The NCEP/CFS analyses and forecast fields of the control runs were utilized as initial and boundary conditions (for d01, updated every six hours) because they provided all the necessary input fields for the WRF runs.In total, 42 simulations were performed with initialization dates from 9 April 2020 00 UTC to 29 April 12 UTC (two initializations starting at 00 and 12 UTC daily).The characteristics of the WRF forecasts are summarized in Table 3 and the telescoping domains are depicted in Figure 1.All the simulations had the same forecast horizon (until 31 May 2020).This forecast horizon was selected to identify any temporal shifts in the predictions of the heatwave as the heatwave occurred between 11 and 20 May.The Moderate Resolution Imaging Spectroradiometer 21-category dataset from the International Geosphere Biosphere Program (MODIS-IGBP) was used for the specification of the land-use categories and the United States Geological Survey (USGS) dataset was used for the topography.

Global S2S Reforecasts
The global reforecasts from the S2S database were utilized in order to take into account the systematic biases of each model.The reforecasts were compared with the realtime forecasts of temperature at 850 hPa at S2S timescales when the forecast coincides with the reforecast initialization day.This is justified because some meteorological centres (ECMWF, CMA, HMCR and NCEP) produce reforecasts along with forecasts, but other centres (UKMO and KMA) produce reforecasts only four days per month (on the 1st, 9th, 17th and 25th) in addition to their daily production of forecasts.Anomaly Correlation Coefficients (ACCs) were calculated for each global model using its reforecasts [81].Moreover, the reforecasts were compared with the ERA5 climatology data.

Global S2S Reforecasts
The global reforecasts from the S2S database were utilized in order to take into account the systematic biases of each model.The reforecasts were compared with the real-time forecasts of temperature at 850 hPa at S2S timescales when the forecast coincides with the reforecast initialization day.This is justified because some meteorological centres (ECMWF, CMA, HMCR and NCEP) produce reforecasts along with forecasts, but other centres (UKMO and KMA) produce reforecasts only four days per month (on the 1st, 9th, 17th and 25th) in addition to their daily production of forecasts.Anomaly Correlation Coefficients (ACCs) were calculated for each global model using its reforecasts [81].Moreover, the reforecasts were compared with the ERA5 climatology data.

Predictability Statistics and Scores
To produce the prediction statistics and performance scores, it was necessary to define common initialization periods between the limited-area S2S forecasts of WRF and the global ones of the S2S database.The initialization dates were separated into three sevenday periods (weeks): (1) 9-15 April, (2) 16-22 April and (3) 23-29 April.These three weeks were selected because all the models (global and limited-area models) have a certain number of ensemble members each week (Table A1).
The prediction performance scores were calculated for the temperature at 850 hPa (T850) and the maximum daily temperature at 2 m above the ground (T 2 max).The scores for T 2 max were only used in the WRF simulations (25 × 25 km and 5 × 5 km resolution) as the S2S global forecasts showed no prediction ability at the S2S timescale at the surface since they are archived at 1.5 • × 1.5 • and it was not possible to obtain their native resolution output.Additionally, the validation period was from 8 to 23 May 2020 which includes the heatwave (that occurred from 11 to 20 May, as it will be shown in Sections 3.1 and 3.2) and non-heatwave days.In this way, the contingency table includes hits, misses, false alarms and true negatives [82].If the validation period was the exact length of the heatwave, then false alarms and true negatives would be missing.
Before producing the statistics and the scores of the global and WRF S2S forecasts, all data (including reanalysis data) had to be brought to the same resolution.For the predictability of the 850 hPa temperature, a grid spacing of 1.5 • was selected because it was the lowest resolution.Therefore, the ERA5 reanalysis and WRF forecasts were regridded to 1.5 • × 1.5 • using the "conserve" interpolation method.However, for the T 2 max, only the WRF forecasts and ERA5 were compared, so the WRF forecasts (25 km and 5 km) were regridded to the grid spacing of ERA5 (0.25 • ) and its grid geometry.
In the study, firstly, boxplots of the weekly ensembles were examined, accompanied by quantile-quantile (Q-Q) plots which revealed scale differences and identified the extremes of two different temperature distributions (observations and forecasts) [83].Furthermore, receiver operating characteristic (ROC) curves [84] accompanied by reliability diagrams [85] were evaluated in terms of probabilistic forecasts using contingency tables with a yes/no decision threshold for the 95th percentile of the climatological temperature (T850 and T 2 max) at each grid point.ROC curves measure the ability of a forecast to discriminate between two alternative outcomes (resolution).Also, ROC curves are conditioned by the observations, answering the question "what was the corresponding forecast of a heatwave that has occurred" [86].On the other hand, the reliability diagram is conditioned by the forecasts and shows how well the predicted probabilities of a heatwave's occurrence correspond to their observed frequencies [87].In other words, the reliability diagram indicates the reliability of a probabilistic forecast and whether conditional biases are encountered.
Finally, another statistical score that utilizes the hit rate (HR-Equation ( 5)) and false alarm rate (FAR-Equation ( 6)) is the extremal dependence index (EDI-Equation ( 7)) [88].It is used to assess the association between the probabilistic forecasts and the observations.It stands out from the other scores, for example, the HR, because it is a non-degenerative measure of the predictability of rare binary events as the yes/no threshold for the heatwave for this study is the 95th percentile of air temperature.A perfect forecast has an EDI = 1 and an inaccurate forecast in S2S timescales has an EDI = 0.

HR =
Hits Hits + Misses

Observational Analysis
This section aims to document the occurrence and the duration of the heatwave via the analysis of station observations and reanalyses.According to the EHF (weather station data), the early heatwave started on different dates (from 10 to 14 May) in the three main cities of Greece (Athens, Thessaloniki and Heraklion), its end occurred between 19 and 21 May and the highest maximum air temperatures occurred mostly on 16 May in Greece and the surrounding countries (Figure 2 and Table 4).The chronological differences are mostly justified by the latitudinal differences, because Thessaloniki (Northern Greece) and Heraklion (Crete, Southern Greece) are five degrees of latitude apart and on 21 May, the mainland of Greece was covered by a thick cloud layer of stratocumulus and convective clouds.Additionally, the EHF was calculated using ERA5 and ERA5-Land reanalyses to show that all three datasets (weather observations, ERA5 and ERA5-Land) indicated the presence of a heatwave because the EHF is above 0 • C 2 for at least three consecutive days.ERA5-Land and ERA5 agreed on the maximum EHF values in Thessaloniki and Heraklion.

Observational Analysis
This section aims to document the occurrence and the duration of the heatwave via the analysis of station observations and reanalyses.According to the EHF (weather station data), the early heatwave started on different dates (from 10 to 14 May) in the three main cities of Greece (Athens, Thessaloniki and Heraklion), its end occurred between 19 and 21 May and the highest maximum air temperatures occurred mostly on 16 May in Greece and the surrounding countries (Figure 2 and Table 4).The chronological differences are mostly justified by the latitudinal differences, because Thessaloniki (Northern Greece) and Heraklion (Crete, Southern Greece) are five degrees of latitude apart and on 21 May, the mainland of Greece was covered by a thick cloud layer of stratocumulus and convective clouds.Additionally, the EHF was calculated using ERA5 and ERA5-Land reanalyses to show that all three datasets (weather observations, ERA5 and ERA5-Land) indicated the presence of a heatwave because the EHF is above 0 °C2 for at least three consecutive days.ERA5-Land and ERA5 agreed on the maximum EHF values in Thessaloniki and Heraklion.C and T 2 max > 95th percentile (T 2 max95) (highlighted in orange) in the different datasets (stations, ERA5 and ERA5-land) for the three stations of Athens, Thessaloniki and Heraklion.Green blocks indicate the days when the EHF, THI, UTCI and T 2 max95 thresholds were not exceeded.Red blocks indicate the date when the maximum value of each index occurred in each city, along with its corresponding maximum value.
Max. Value 34.5 31.8 The maximum THI and UTCI values occurred on 16 and 17 May, except for the THI in Thessaloniki, which was marginally higher on 19 May compared to the previous three days (16 to 18 May).Both the THI and UTCI values were marginally higher than their thresholds, which were 70 • F (mild discomfort) and 32 • C (strong heat stress), respectively.It should be noted that the THI and UTCI thresholds are not equal in terms of heat stress severity; however, the factors that contribute to each index are quite different.Additionally, UTCI is much more modern, well tested, and more sophisticated and complex than THI.Thus, UTCI's threshold is more appropriate for evaluating whether a heatwave has taken place.
As mentioned previously, the peak intensity of the heatwave was found on different dates, but the most robust results were provided by the EHF from the weather stations as the reanalysis data do not perfectly describe the state of the atmosphere [48,49].The peak intensity of the EHF occurred on the 16th and 17th of May.The majority of the maximum temperatures, which were measured by the weather stations of NOA and the Hellenic National Meteorological Service (HNMS) across Greece and the surrounding countries, were observed on 16 May 2020 (Figure 2).The highest temperature (40 • C) in Greece was recorded in Kalamata city (KAL in Figure 2).The KAL station is owned by HNMS and it does not have a complete temperature time series to derive its reference climatology 95th percentile maximum temperature (nan in KAL).However, it was included in the map because it exhibited the highest maximum air temperature recorded in Greece during the event.This was a record high (absolute maximum) 2 m temperature for Kalamata for the month of May.The same temperature (40 • C) was recorded again the next day (17 May).In agreement with [42], mesoscale features seemed to have affected the maximum temperature at the airport of Kalamata (where the maximum 2 m temperature of 40 • C was observed in Greece on 16 May 2020).Hot and dry light winds from the land, together with the insolation, appeared to have contributed to the synoptic scale warm advection at the lower troposphere, in order to reach a 2 m temperature of 40 • C at 12:20 p.m. local time (09:20 UTC) on 16 May.However, the onset of the sea breeze and the proximity to the coast (at a distance of 4 km) led to an abrupt decrease in the 2 m temperature by 6 • C in the next 30 min.Despite the clear sky and the insolation of the early afternoon hours, the 2 m temperature dropped further to 30 • C until 16:50 local time (13:50 UTC) under the influence of the sea breeze.The highest temperature (42.6 • C) in the whole region of interest appeared in Aydin, Turkey.This temperature was 11.7 • C higher than its T 2 max95 and it was also the highest maximum temperature recorded for the month of May in Aydin.Additionally, all the stations shown in Figure 2  According to the synoptic systems in the vicinity of Greece, at the beginning of the heatwave, there were stable conditions, with a sea level pressure between 1012 and 1017 hPa; there were no apparent fronts and the surface winds were weakly blowing from SSW and SW.At the end of the heatwave (20 May 2020), a cold front approached from the west, bringing in cold air masses and lowering the mean daily temperature across the Eastern Mediterranean.Finally, the end of the heatwave in Greece was indicated by the EHF and THI indices and T 2 max95 (station data and ERA5) for Athens and Thessaloniki.

Synoptic Analysis
Extreme temperatures were also prominent in the lower troposphere.In Figure 3 at 850 hPa, the reanalysis data indicated temperature increases of ten to eleven standard deviations or around 14 • C above the mean value for the Aegean Sea, eastern mainland of Greece and western coast of Turkey.These maximum anomalies were observed on 16 May.The reason for this severe warming is the strong horizontal thermal advection (maximum value: 3 • C/h-Western Greece and Albania) from the southwest during the event.Using the HYSPLIT air parcel trajectory tool [89], we found that the warm air masses originating in the Libyan region and the Gulf of Sidra moved anticyclonically north and northeast, affecting Southern Italy, Greece (Figure 4), and afterwards western Turkey.Figure 5 illustrates the maximum extent of this warm intrusion in the European region at the maximum intensity of the heatwave, with temperatures up to 25 • C at 850 hPa.Additionally, a broad mid-tropospheric ridge prevailed at the eastern Mediterranean Sea and a deep trough was located over the Iberian Peninsula and northwest Africa.The ridge was associated with subsidence leading to adiabatic warming through thermal compression of the air.Furthermore, in Figure 6, a strong positive anomaly in the geopotential heights was observed above Greece and western Turkey.Specifically, above the Aegean Sea and southwestern Turkey, the geopotential height anomalies at 500 hPa (Figure 6a) and 850 hPa (Figure 6b) were, respectively, at least ten and seven standard deviations higher than the climatological value of May.On the other hand, there was a negative anomaly in the geopotential heights at the trough above Morocco, which was intense, but not as extreme as the positive anomaly.This dipole of anomalies amplifies the poleward flow and the advection of warm air masses from northern Africa to the Mediterranean Sea and southern Europe [91,92].
(Figure 6b) were , respectively, at least ten and seven standard deviations higher than the climatological value of May.On the other hand, there was a negative anomaly in the geopotential heights at the trough above Morocco, which was intense, but not as extreme as the positive anomaly.This dipole of anomalies amplifies the poleward flow and the advection of warm air masses from northern Africa to the Mediterranean Sea and southern Europe [91,92].At the level of the jet stream (Figure 7a) at 300 hPa, there is a meridional (southwesterly) flow of the jet stream above Algeria and the central Mediterranean Sea which turns anticyclonically and becomes zonal above the Balkans, where the jet streak is located.According to [93], a pattern of this kind, where the jet stream is north-northwest of Greece and anticyclonically curved, causes strong subsidence (Figure 7b) of the air masses leading to adiabatic heating of the lower troposphere above Greece.This contribution further enhances the heating from the warm air advection in the lower troposphere, which seemed to be the main driver of the heatwave.At the level of the jet stream (Figure 7a) at 300 hPa, there is a meridional (southwesterly) flow of the jet stream above Algeria and the central Mediterranean Sea which turns anticyclonically and becomes zonal above the Balkans, where the jet streak is located.According to [93], a pattern of this kind, where the jet stream is north-northwest of Greece and anticyclonically curved, causes strong subsidence (Figure 7b) of the air masses leading to adiabatic heating of the lower troposphere above Greece.This contribution further enhances the heating from the warm air advection in the lower troposphere, which seemed to be the main driver of the heatwave.
The two factors that potentially affected the intensity of the heatwave were the soil moisture in the upper layer surface (0-7 cm) anomaly and the SST anomaly.The soil moisture anomaly (Figure 8a) showed a water deficit in the upper soil layer, which mostly interacts with the atmosphere.The maximum observed soil moisture deficit (negative anomaly) was −0.1 m 3 in northwestern Greece.A persistent low water content has the potential to contribute to heating through the prevalence of the sensible heat fluxes [94][95][96][97].Also, the SST anomaly (Figure 8b) showed a general warming of the surface waters at around two standard deviations above the climatological value.Specifically, around the island of Crete, the difference was four to five standard deviations higher with a maximum of six standard deviations at 23.6 • C. The warm SSTs were persistent throughout the duration of the heatwave, contributing to further heating of the atmosphere.The reasons for this strong heating might be the prolonged low soil moisture accompanied together with the prevailing ridge at 500 hPa.Finally, a wedge of warm air coming from the Sahara Desert raised the temperature of the lower troposphere and the position of the jet stream northwest of Greece resulted in a downward atmospheric motion that led to the amplification of the warming due to thermal compression of the air.These atmospheric and land factors have previously been studied for heatwaves in relation to soil moisture content [94,[98][99][100][101], blocking highs [102][103][104], atmospheric heat transfer [105], jet stream position [93] and SSTs [106].
anticyclonically and becomes zonal above the Balkans, where the jet streak is located.According to [93], a pattern of this kind, where the jet stream is north-northwest of Greece and anticyclonically curved, causes strong subsidence (Figure 7b) of the air masses leading to adiabatic heating of the lower troposphere above Greece.This contribution further enhances the heating from the warm air advection in the lower troposphere, which seemed to be the main driver of the heatwave.The two factors that potentially affected the intensity of the heatwave were the soil moisture in the upper layer surface (0-7 cm) anomaly and the SST anomaly.The soil moisture anomaly (Figure 8a) showed a water deficit in the upper soil layer, which mostly interacts with the atmosphere.The maximum observed soil moisture deficit (negative anomaly) was −0.1 m 3 /m 3 in northwestern Greece.A persistent low water content has the potential to contribute to heating through the prevalence of the sensible heat fluxes [94][95][96][97].Also, the SST anomaly (Figure 8b) showed a general warming of the surface waters at around two standard deviations above the climatological value.Specifically, around the island of Crete, the difference was four to five standard deviations higher with a maximum of six standard deviations at 23.6 °C.The warm SSTs were persistent throughout the duration of the heatwave, contributing to further heating of the atmosphere.The reasons for this strong heating might be the prolonged low soil moisture accompanied together with the prevailing ridge at 500 hPa.Finally, a wedge of warm air coming from the Sahara Desert raised the temperature of the lower troposphere and the position of the jet stream northwest of Greece resulted in a downward atmospheric motion that led to the amplification of the warming due to thermal compression of the air.These atmospheric and land factors have previously been studied for heatwaves in relation to soil moisture content [94,[98][99][100][101], blocking highs [102][103][104], atmospheric heat transfer [105], jet stream position [93] and SSTs [106].

S2S Prediction Ability
Initially, the boxplots (or whisker plots) of the temperature at 850 hPa using the full distribution of the forecasts (Figure 9) were generated for the region 34.5°Ν-42° N, 19.5° E-29.5°E, hereafter called the area of interest.The boxplots were based on the deterministic and the perturbed forecasts and they include the variation between the ensemble members.The ERA5 boxplot (leftmost boxplot) indicated a median (Q2) value of 17 °C, a 75th percentile (Q3) of 22.3 °C, a 25th percentile (Q1) of 11.1 °C, a maximum (Q4) of 27.1 °C

S2S Prediction Ability
Initially, the boxplots (or whisker plots) of the temperature at 850 hPa using the full distribution of the forecasts (Figure 9) were generated for the region 34.where the CMA model demonstrated the best resemblance to the reanalysis.Also, the CMA ensemble forecast extremes were higher than the extremes of ERA5.The WRF forecasts showed smaller compatibility with ERA5 temperatures in week 2. In the initialization period of 23-29 April (week 3), most of the models presented higher temperatures than in the previous weeks, except for HMCR and CMA.The WRF forecasts showed improvements with a higher resemblance to ERA5 presented by WRF 5 km.Its median was higher and closer to the median of reanalysis than WRF 25 km.and a minimum (Q0) of 0.7 °C in the period of 8-23 May 2020.For the initialization period of 9-15 April (hereafter referred to as week 1), the median of all the S2S forecasts was close to the Q1 of ERA5 and their Q3 was below the ERA5 median.On the other hand, the extreme values (Q0 and Q4) of the S2S forecasts (global and WRF) were closer to the extremes of ERA5 with the exception of the HMCR model.Similar results were obtained in the 16-22 April initialization period (week 2) where the CMA model demonstrated the best resemblance to the reanalysis.Also, the CMA ensemble forecast extremes were higher than the extremes of ERA5.The WRF forecasts showed smaller compatibility with ERA5 temperatures in week 2. In the initialization period of 23-29 April (week 3), most of the models presented higher temperatures than in the previous weeks, except for HMCR and CMA.The WRF forecasts showed improvements with a higher resemblance to ERA5 presented by WRF 5 km.Its median was higher and closer to the median of reanalysis than WRF 25 km.The boxplot diagrams of the WRF deterministic forecasts of the daily maximum 2 m temperature (T2max; Figure 10) had smaller temperature ranges than ERA5, spanning from 9.9 to 29.8 °C (WRF 5 km from week 3), while the range of the reanalysis was from 11.2 to 33 °C.Figure 10 shows that the WRF forecasts could not predict the extreme high temperatures in week 1 and week 2.Although WRF 5 km predicted higher temperatures in Q4 than WRF 25 km, its median was lower than that of WRF 25 km.Week 2 presented the same behaviour as week 1, but in week 3, both WRF 25 km and 5 km showed an improvement.More specifically, the Q3 and Q4 of WRF 25 km increased.The same occurred with WRF 5 km, which exhibited a larger increase in Q4 than WRF 25 km.Quantitatively, the difference between the high extremes was Q4-reanalysis − Q4-WRF-5km = 3.2 °C, while Q4-reanalysis − Q4-WRF-25km = 5.4 °C in the WRF forecasts from week 3.The boxplot diagrams of the WRF deterministic forecasts of the daily maximum 2 m temperature (T 2 max; Figure 10) had smaller temperature ranges than ERA5, spanning from 9.9 to 29.8 • C (WRF 5 km from week 3), while the range of the reanalysis was from 11.2 to 33 • C. Figure 10 shows that the WRF forecasts could not predict the extreme high temperatures in week 1 and week 2.Although WRF 5 km predicted higher temperatures in Q 4 than WRF 25 km, its median was lower than that of WRF 25 km.Week 2 presented the same behaviour as week 1, but in week 3, both WRF 25 km and 5 km showed an improvement.More specifically, the Q 3 and Q 4 of WRF 25 km increased.The same occurred with WRF 5 km, which exhibited a larger increase in Q 4 than WRF 25 km.Quantitatively, the difference between the high extremes was Q Looking into the quantile-quantile (Q-Q) distributions of the predicted temperatures using the full distribution of the global forecasts, and the deterministic WRF forecasts and the reanalysis at 850 hPa from week 3 (Figure 11a), it is obvious that the highest temperature quantiles were underestimated by all the models.Regarding the upper tail (>90%), the highest resemblance with the reanalysis was achieved by KMA and UKMO.Additionally, UKMO had the closest distribution fit with ERA5 as its mean squared error (MSE = 16.57) was the smallest of all the models.Generally, the S2S models and reanalysis agreed (i.e., they are closer to the diagonal) mostly at the quantiles between 15% and 50%.WRF 5 km had higher temperatures than 25 km across all the distribution percentages at 850 hPa.In Figure 11b, WRF 25 km and 5 km had similar maximum daily temperature distributions at 2 m and both nests presented similar MSE values.In the quantiles above 70%, WRF 5 km presented slightly higher temperatures and was closer to ERA5 than WRF 25 km.However, the maximum temperatures were underestimated in the 850 hPa Q-Q profile.For example, the 90% quantile was underestimated by WRF by 5.2 • C. In higher quantiles, the gap between WRF and ERA5 increased.Looking into the quantile-quantile (Q-Q) distributions of the predicted tempe using the full distribution of the global forecasts, and the deterministic WRF foreca the reanalysis at 850 hPa from week 3 (Figure 11a), it is obvious that the highest te ture quantiles were underestimated by all the models.Regarding the upper tail the highest resemblance with the reanalysis was achieved by KMA and UKMO.Ad ally, UKMO had the closest distribution fit with ERA5 as its mean squared error 16.57) was the smallest of all the models.Generally, the S2S models and reanalysis (i.e., they are closer to the diagonal) mostly at the quantiles between 15% and 50% 5 km had higher temperatures than 25 km across all the distribution percentage hPa.In Figure 11b, WRF 25 km and 5 km had similar maximum daily temperatur butions at 2 m and both nests presented similar MSE values.In the quantiles abo WRF 5 km presented slightly higher temperatures and was closer to ERA5 than km.However, the maximum temperatures were underestimated in the 850 hPa Q file.For example, the 90% quantile was underestimated by WRF by 5.2 °C.In highe tiles, the gap between WRF and ERA5 increased.Looking into the quantile-quantile (Q-Q) distributions of the predicted temperatures using the full distribution of the global forecasts, and the deterministic WRF forecasts and the reanalysis at 850 hPa from week 3 (Figure 11a), it is obvious that the highest temperature quantiles were underestimated by all the models.Regarding the upper tail (>90%), the highest resemblance with the reanalysis was achieved by KMA and UKMO.Additionally, UKMO had the closest distribution fit with ERA5 as its mean squared error (MSE = 16.57) was the smallest of all the models.Generally, the S2S models and reanalysis agreed (i.e., they are closer to the diagonal) mostly at the quantiles between 15% and 50%.WRF 5 km had higher temperatures than 25 km across all the distribution percentages at 850 hPa.In Figure 11b, WRF 25 km and 5 km had similar maximum daily temperature distributions at 2 m and both nests presented similar MSE values.In the quantiles above 70%, WRF 5 km presented slightly higher temperatures and was closer to ERA5 than WRF 25 km.However, the maximum temperatures were underestimated in the 850 hPa Q-Q profile.For example, the 90% quantile was underestimated by WRF by 5.2 °C.In higher quantiles, the gap between WRF and ERA5 increased.The uncentered Anomaly Correlation Coefficients (ACCs) of the global models were calculated according to each model's climatology using reforecast data.The ACC compares the predicted anomalies against the observed anomalies (reanalysis) and offers a way to quantify the prediction ability of each forecasting model and assess its prediction accuracy.The ACC is presented separately for the unperturbed (deterministic) forecasts and the forecasts with perturbed initial conditions for various initialization dates (9-29 April 2020 00 UTC-first row) (Table 5).The results of the perturbed forecasts are the ACCs averaged over all the corresponding ensemble members.Note that the ACC of each individual perturbed member was calculated before averaging.The blank cells indicate days when both real-time forecasts and reforecasts had not been produced as most of the centres (except NCEP) do not produce daily real-time forecasts and reforecasts.Also, although UKMO and KMA produce daily forecasts, their reforecasts are produced only four times per month (on the 1st, 9th, 17th and 25th).Red cells indicate the initialization dates with an ACC lower than 0 for the evaluation period of 8-23 May 2020.An ACC near zero indicates that the forecast is unreliable, while negative values show that the observed and the predicted anomalies are negatively correlated.According to Table 5, the highest and sustained (without large values fluctuations above the ACC = 0.6 threshold [107]-Figure A1) ACCs above 0.6 were found in the perturbed forecasts of ECMWF and NCEP after 20 and 25 April, respectively.The perturbed forecasts of ECMWF reached ACC values between 0.69 and 0.78 after 20 April 2020.Furthermore, the UKMO forecasts for 25 April exhibited high ACC values of 0.84 and 0.59 for the unperturbed and the perturbed members, respectively.The unperturbed forecasts of KMA from 25 April exhibited a high ACC of 0.72.The perturbed forecasts of NCEP from 25 April appeared to have the highest credibility because the daily ACC values were generally stable without a substantial decrease as seen in the NCEP unperturbed forecasts.If there were more frequent forecasts and reforecasts from ECMWF, UKMO and KMA, it would be possible to check the stability of their ACC values.ECMWF, UKMO and NCEP also demonstrated the lowest temperature bias (<0.3 • C) between ERA5 climatology and each model's climatology (reforecasts) (Figures A2-A7), while the rest of the models overestimated the temperature at 850 hPa by 1-3 • C.Only HMCR underestimated the temperature by 1 • C.
In Figure 12, the timeseries of the hit rate (HR) and false alarm rate (FAR) is presented for each initialization week using the WRF forecasts and the perturbed members of the global S2S forecasts.The 850 hPa temperature forecasts (of all the ensemble members at each grid point of the area of interest during the validation period) were compared to their 95th percentile in the ERA5 climatology of 1975-2005.It was obvious that the HR of all the forecasts, except for the ones from HMCR, increased from week 1 to week 3, indicating that the models detected the heating during the heatwave when they were initialized in week 3.The highest HR (0.71) was obtained by the KMA model when it was initialized in week 3.However, this increase was not always monotonic as the initialization period approached the actual heatwave.On the other hand, the FAR generally increased at almost the same rate as the HR.This means that although the models detected the high temperature signal, their forecasts exhibited large spatiotemporal errors.
Table 5. Uncentred Anomaly Correlation Coefficients (ACCs) for initialization dates from 9 April to 29 April 2020 (first row) from different global S2S forecasts (perturbed and unperturbed forecasts).The results of the perturbed forecasts are ACCs averaged over all the corresponding ensemble members.The ACC of each individual perturbed member was calculated before averaging.Evaluation period: 8-23 May 2020 at 00 UTC.ACC ≤ 0 (red), ACC > 0 (colour shading).All the global S2S forecasts, reforecasts and the ERA5 reanalysis were interpolated to the same grid (34.5 • N-42 • N, 19.5 • E-29.5 • E) with a grid spacing of 1.5   In Figure 12, the timeseries of the hit rate (HR) and false alarm rate (FAR) is presented for each initialization week using the WRF forecasts and the perturbed members of the global S2S forecasts.The 850 hPa temperature forecasts (of all the ensemble members at each grid point of the area of interest during the validation period) were compared to their 95th percentile in the ERA5 climatology of 1975-2005.It was obvious that the HR of all the forecasts, except for the ones from HMCR, increased from week 1 to week 3, indicating that the models detected the heating during the heatwave when they were initialized in week 3.The highest HR (0.71) was obtained by the KMA model when it was initialized in week 3.However, this increase was not always monotonic as the initialization period approached the actual heatwave.On the other hand, the FAR generally increased at almost the same rate as the HR.This means that although the models detected the high temperature signal, their forecasts exhibited large spatiotemporal errors.week 3.The highest HR (0.71) was obtained by the KMA model when it was initialize week 3.However, this increase was not always monotonic as the initialization period proached the actual heatwave.On the other hand, the FAR generally increased at alm the same rate as the HR.This means that although the models detected the high temp ture signal, their forecasts exhibited large spatiotemporal errors.In Figure 13a, a ROC diagram is presented for week 3 using the perturbed members of the global forecasts.The vertical axis represents the hit rate (HR) and the horizontal axis the false alarm rate (FAR) for the different probability thresholds (PTs) of the S2S forecasts.The PT is the percentage of forecast ensembles (for each model) which exceeded the climatological 95th percentile of the ERA5 temperature at 850 hPa on every day of the validation period at 00 UTC at each grid point of the area of interest (34.5 • N-42 • N, 19.5 • E-29.5 • E at a grid spacing of 1.5 • × 1.5 • ).The ERA5 climatology was used instead of each model's climatology because the temperature biases from the ERA5 climatology, as mentioned previously, are relatively small and there are no WRF reforecasts to verify its model climatology.Each point in the diagram of Figure 13a represents an approximately 17% probability increase starting from the first point to the top-right corner of the plot.For instance, if a model has 100 ensembles members (like ECMWF), a 17% probability means that 17 out of the 100 ensemble members surpassed the 95th percentile temperature threshold derived from the climatology (ERA5).The first (second) point represents a PT of approximately 17% (34%), etc.The 17% increment was chosen to ensure that even the model with the smallest ensemble size (CMA with six members per week) had at least one member for each PT.Also, in the ROC diagram, the Area Under the Curve (AUC) score of each model is presented.An AUC higher than 0.5 indicates a prediction ability higher than random guessing.Therefore, a poor prediction line was plotted (dashed black diagonal).Theoretically, a perfect forecast has an AUC equal to 1 [108].The ROC diagram in Figure 14a shows the performance of the deterministic forecasts of WRF 25 km (blue curve) and 5 km (red curve) for T2max from week 3. WRF 5 km showed a slightly better prediction performance (AUC = 0.65) compared to WRF 25 km (AUC = 0.64).In weeks 1 and 2, the models exhibited poor prediction abilities, as their AUC scores were 0.49 to 0.51 for both grid spacings (Figure 14b).The reliability diagrams (Figure 15) for each global (perturbed members) and limitedarea model (WRF; deterministic) indicate the reliability of the S2S forecasts, that is, how well the predicted probabilities (x-axis) of the validation period of the heatwave correspond to their observed frequencies (y-axis) [87,112].The correspondence is perfect when the plot is identical to the diagonal line (dashed orange line).The dashed red trendline shows the forecast probability range where the S2S forecast has the greatest sharpness (see histogram bars with frequencies above median).This trendline allows for a comparison In week 1 and week 2 many models demonstrated an S2S prediction ability because their AUC was above 0.5.The three highest AUC scores, in descending order, were achieved by WRF 5 km, KMA and WRF 25 km for week 1 and CMA, WRF 25 km and WRF 5 km for week 2 (Figure 13b).In week 3 (Figure 13a), the highest AUC scores were exhibited by ECMWF, KMA and WRF 25 km.The remarkably high AUC score of CMA in week 2 (Figure 13b) was related to three very important ensemble members, which constituted a large part (50%) of the influence of all members and raised the PT of the 50% HR to FAR ratio well above the diagonal.However, it is possible that these members' effects may have occurred due to random chance because the AUC of CMA dropped from 0.84 (week 2) to 0.5 (i.e., poor prediction in week 3), while the rest of the models showed an improvement in their forecasts or demonstrated similar AUC scores as the initialization time approaches the heatwave.The investigation of random successful S2S forecasts was not included in the aims of this study, but it is an interesting topic for further research.Also, a model with a sufficient amount of ensemble members (above ten) minimizes the effects of random hits on the prediction ability [109][110][111].The WRF S2S forecasts presented AUC scores above 0.5 for all three weeks.The two nests of WRF did not exhibit noticeable AUC differences.A substantial difference was found when comparing WRF forecasts with the global forecasts, especially for weeks 1 and 2. The only model that exhibited similar AUC values to WRF was the KMA model.In week 3 the AUCs of NCEP and ECMWF increased significantly by 0.15 and 0.22, respectively.Lastly, the WRF AUC scores for all the initialization periods were better than those of NCEP, which provided the initial and boundary conditions for the WRF forecasts.
The ROC diagram in Figure 14a shows the performance of the deterministic forecasts of WRF 25 km (blue curve) and 5 km (red curve) for T 2 max from week 3. WRF 5 km showed a slightly better prediction performance (AUC = 0.65) compared to WRF 25 km (AUC = 0.64).In weeks 1 and 2, the models exhibited poor prediction abilities, as their AUC scores were 0.49 to 0.51 for both grid spacings (Figure 14b).
The ROC diagram in Figure 14a shows the performance of the deterministic forecasts of WRF 25 km (blue curve) and 5 km (red curve) for T2max from week 3. WRF 5 km showed a slightly better prediction performance (AUC = 0.65) compared to WRF 25 km (AUC = 0.64).In weeks 1 and 2, the models exhibited poor prediction abilities, as their AUC scores were 0.49 to 0.51 for both grid spacings (Figure 14b).The reliability diagrams (Figure 15) for each global (perturbed members) and limitedarea model (WRF; deterministic) indicate the reliability of the S2S forecasts, that is, how well the predicted probabilities (x-axis) of the validation period of the heatwave correspond to their observed frequencies (y-axis) [87,112].The correspondence is perfect when the plot is identical to the diagonal line (dashed orange line).The dashed red trendline shows the forecast probability range where the S2S forecast has the greatest sharpness (see histogram bars with frequencies above median).This trendline allows for a comparison The reliability diagrams (Figure 15) for each global (perturbed members) and limitedarea model (WRF; deterministic) indicate the reliability of the S2S forecasts, that is, how well the predicted probabilities (x-axis) of the validation period of the heatwave correspond to their observed frequencies (y-axis) [87,112].The correspondence is perfect when the plot is identical to the diagonal line (dashed orange line).The dashed red trendline shows the forecast probability range where the S2S forecast has the greatest sharpness (see histogram bars with frequencies above median).This trendline allows for a comparison with the perfect reliability diagonal.If the reliability curve exhibits a positive slope, it signifies that as the predicted probability of the event rises, the confirmed likelihood of observing the event also increases.Hence, the forecasts demonstrate a degree of reliability.If the trendline has a flatter appearance, then the model has a low resolution, for example, in the UKMO forecast for week 3 (Figure 15, first row, second column).Resolution pertains to the ability of a forecast to distinguish between different categories of outcomes that can occur.When the slope is below the diagonal, it suggests an imperfect reliability.Lastly, the histogram bins were derived from the ensemble size of each model.ECMWF (Figure 15) showed a more robust reliability than most S2S models for all three weeks (only week 3 is shown).Its trendline lies above the diagonal, which is a sign that the forecast frequency underestimated the observed frequency of the 850 hPa temperature and exceeding its 95th percentile (underforecasting).ECMWF in week 3 displayed a good reliability with a 40 to 55% probability of a correct forecast because the sharpness (observed frequencies in histogram) was high and the diagram was closer and nearly parallel to the diagonal.Similarly, the forecasts of KMA and NCEP from week 3 (Figure 15) exhibited good reliability and resolution, with the trendline of the former model closer to the diagonal.On the other hand, the other three global S2S models demonstrated a low resolution and low reliability.
The limited-area forecasts (WRF 25 km and 5 km; Figure 15) showed good reliability and both had a better reliability than NCEP in week 3, which is the initialization period in which NCEP demonstrated some S2S prediction ability.More specifically, in week 3, the trendlines of WRF 25 km and 5 km were closer and more parallel to the diagonal even though they had fewer ensemble members (WRF: 14 members; NCEP: 105 members).Furthermore, WRF 5 km was slightly closer to the diagonal than WRF 25 km.Finally, both WRF forecasts showed a bias in underforecasting (trendline above the diagonal) the temperatures at 850 hPa.Regarding the maximum daily 2 m temperatures (Figure 16) predicted from week 3, both WRF forecasts presented a better reliability than in the previous initialization weeks, but the WRF 5 km (blue dashed line) trendline was more parallel to the diagonal than WRF 25 km for all the forecast probabilities.
Atmosphere 2024, 15, 442.https://doi.org/10.3390/atmos15040442www.mdpi.com/journal/atmosphere, on the x-axis).The maximum (minimum) value of the is equal to 1 (−1) when the HR (FAR) is equal to 1.In comparison with the previou mentioned scores (such as HR, FAR and CSI), EDI complements them by focusing on tail of the temperature distribution, which is crucial for rare or extreme events like heatwave of this study [88].In week 3 (Figure 17), KMA had the best profile (highest ED followed by ECMWF and NCEP at the lower PTs and WRF forecasts at the higher P KMA reached an EDI = 0 for the PT of 100%.UKMO reached the same point, but on lo probabilities, its EDI was around 0. ECMWF and KMA maintained an EDI = 1 (per score) for PTs up to 34%.The forecasts of week 3 tended to lose their accuracy in hig PTs (i.e., for larger numbers of ensemble members) than the ones of week 1 and week Figure 17 illustrates the EDI of the various global perturbed and WRF forecasts relative to the T95 of the ERA5 climatology of May 1975-2005 for the different probability thresholds (PTs: 17%, 34%, etc., on the x-axis).The maximum (minimum) value of the EDI is equal to 1 (−1) when the HR (FAR) is equal to 1.In comparison with the previously mentioned scores (such as HR, FAR and CSI), EDI complements them by focusing on the tail of the temperature distribution, which is crucial for rare or extreme events like the heatwave of this study [88].In week 3 (Figure 17), KMA had the best profile (highest EDIs), followed by ECMWF and NCEP at the lower PTs and WRF forecasts at the higher PTs.KMA reached an EDI = 0 for the PT of 100%.UKMO reached the same point, but on lower probabilities, its EDI was around 0. ECMWF and KMA maintained an EDI = 1 (perfect score) for PTs up to 34%.The forecasts of week 3 tended to lose their accuracy in higher PTs (i.e., for larger numbers of ensemble members) than the ones of week 1 and week 2.
Most of the models showed an increase in prediction ability in week 3 and the main contributors to the strong heating were the temperature advection at 850 hPa and the downward motion due to the presence of the high geopotential heights at 500 hPa.In Figure 18, the reanalysis showed positive geopotential height anomalies over the Eastern Mediterranean, which were associated with anomalous anticyclonic flow in the lower troposphere and subsidence over Greece and western Turkey for the period from 8 to 23 May 2020.Many models (Figure 19) identified the positive-negative dipoles of the anomalies at 850 hPa above the Eastern Mediterranean and northeastern Europe for week 3 (initialization period).These models were ECMWF, UKMO, KMA, NCEP and WRF 25 km.The NCEP and WRF anomalies had many similarities because the initial/boundary conditions of WRF were derived from the NCEP/CFSv.2model.However, a better-defined anomalous pattern was predicted by WRF compared to NCEP.At 500 hPa, the same models identified the positive anomalies above Greece and Turkey.On the other hand, HMCR and CMA did not detect the southwest anomalous flow over the central Mediterranean at 850 hPa.Also, the CMA forecasts from week 2, which exhibited very high AUC values (0.84) relative to the other models, did not detect the dipole anomaly, which aligns with the argument that 50% of the CMA ensemble members detected high temperatures above Greece due to randomness.Μost of the models showed an increase in prediction ability in week 3 and the mai contributors to the strong heating were the temperature advection at 850 hPa and th downward motion due to the presence of the high geopotential heights at 500 hPa.I Figure 18, the reanalysis showed positive geopotential height anomalies over the Easter Mediterranean, which were associated with anomalous anticyclonic flow in the lowe troposphere and subsidence over Greece and western Turkey for the period from 8 to 2 May 2020.Many models (Figure 19) identified the positive-negative dipoles of the anom alies at 850 hPa above the Eastern Mediterranean and northeastern Europe for week (initialization period).These models were ECMWF, UKMO, KMA, NCEP and WRF 2 km.The NCEP and WRF anomalies had many similarities because the initial/boundar conditions of WRF were derived from the NCEP/CFSv.2model.However, a better-define anomalous pattern was predicted by WRF compared to NCEP.At 500 hPa, the same mod els identified the positive anomalies above Greece and Turkey.On the other hand, HMC and CMA did not detect the southwest anomalous flow over the central Mediterranean a 850 hPa.Also, the CMA forecasts from week 2, which exhibited very high AUC value (0.84) relative to the other models, did not detect the dipole anomaly, which aligns wit the argument that 50% of the CMA ensemble members detected high temperatures abov Greece due to randomness.

Discussion and Conclusions
The heatwave of this case study occurred abnormally early before summer.According to the various heat indices, it started on 11 May and ended on 20 May 2020, affecting the whole Greek region and the west coast of Turkey.The intensity was also high given the fact that the heatwave took place in mid-May.The highest 2 m temperatures recorded on 16 May were 40 °C in southern Greece and 42.6 °C in southwestern Turkey on 17 May.Additionally, the tropospheric temperature (850 hPa) and geopotential height (850 and 500 hPa) anomalies were substantially high for this period.More specifically, the temperatures at 850 hPa were ten to eleven standard deviations warmer than the climatology above the Aegean Sea and the geopotential heights at 850 and 500 hPa were, respectively, six to eight and ten to eleven standard deviations higher than the climatology above the southeast Aegean Sea and southwest Turkey.The warm air masses that affected the area of interest originated from the central Sahara region and moved anticyclonically, reaching Greece and Turkey.The main causes of this strong heating were the thermal advection from the southwest and the subsidence (leading to thermal compression) caused by the upper air ridge and the position of the jet stream.
The S2S prediction abilities of six global models (ECMWF, UKMO, CMA, KMA, NCEP and HMCR) and the limited-area model WRF with nested forecasts at grid spacings of 25 km and 5 km were investigated using different statistical scores.The main conclusions of this analysis are as follows: 1. Most global forecasts demonstrated the ability to predict the temperature at 850 hPa, 14 to 21 days (S2S timescale) before the event, which is in agreement with [28].As also presented in a previous study [17], the best models were ECMWF, KMA and NCEP which showed a good prediction ability even three to four weeks before the heatwave.The lowest prediction performance for this event was shown by the HMCR model, which is likely due to the small number of vertical levels (28) employed by this model and the fact that it was not coupled to an ocean model.

Discussion and Conclusions
The heatwave of this case study occurred abnormally early before summer.According to the various heat indices, it started on 11 May and ended on 20 May 2020, affecting the whole Greek region and the west coast of Turkey.The intensity was also high given the fact that the heatwave took place in mid-May.The highest 2 m temperatures recorded on 16 May were 40 • C in southern Greece and 42.6 • C in southwestern Turkey on 17 May.Additionally, the tropospheric temperature (850 hPa) and geopotential height (850 and 500 hPa) anomalies were substantially high for this period.More specifically, the temperatures at 850 hPa were ten to eleven standard deviations warmer than the climatology above the Aegean Sea and the geopotential heights at 850 and 500 hPa were, respectively, six to eight and ten to eleven standard deviations higher than the climatology above the southeast Aegean Sea and southwest Turkey.The warm air masses that affected the area of interest originated from the central Sahara region and moved anticyclonically, reaching Greece and Turkey.The main causes of this strong heating were the thermal advection from the southwest and the subsidence (leading to thermal compression) caused by the upper air ridge and the position of the jet stream.
The S2S prediction abilities of six global models (ECMWF, UKMO, CMA, KMA, NCEP and HMCR) and the limited-area model WRF with nested forecasts at grid spacings of 25 km and 5 km were investigated using different statistical scores.The main conclusions of this analysis are as follows: 1.
Most global forecasts demonstrated the ability to predict the temperature at 850 hPa, 14 to 21 days (S2S timescale) before the event, which is in agreement with [28].As also presented in a previous study [17], the best models were ECMWF, KMA and NCEP which showed a good prediction ability even three to four weeks before the heatwave.The lowest prediction performance for this event was shown by the HMCR model, which is likely due to the small number of vertical levels (28) employed by this model and the fact that it was not coupled to an ocean model.

2.
The WRF model (25 km and 5 km) has approximately the same performance as ECMWF, KMA and NCEP, but WRF, especially WRF 5 km, was more reliable (its trendline was closer and parallel to the perfect reliability line).

3.
Similar to the study of [24], the WRF model, in terms of AUC and reliability, performs better than NCEP, which provides the boundary conditions to the downscaling forecasts of WRF. 4.
There was no substantial difference in the S2S predictability of temperature at 850 hPa when WRF 25 km and WRF 5 km were compared.Therefore, there is no systematic benefit from the dynamical downscaling of the 850 hPa temperature forecasts to a higher spatial resolution (5 km × 5 km).

5.
The S2S prediction ability for the maximum daily 2 m temperatures was slightly better for the WRF 5 km compared to WRF 25 km.This difference was only observed in week 3 (lead time of 15-20 days) in terms of reliability, AUC and higher (and closer to ERA5) maximum temperatures at 2 m that were forecasted by WRF 5 km.6.
All models underestimated the extreme high temperatures at 850 hPa and near the surface (as far as WRF is concerned), but the ACC scores for the global models showed indications of better prediction abilities for strong temperature anomalies two to three weeks before the heatwave.
Future work related to the predictability of heatwaves includes numerical weather prediction experiments to investigate the effects of soil moisture, SSTs and North Atlantic oscillations on extreme temperatures at 2 m and in the lower troposphere.
Research (NCAR).We acknowledge the use of the Climate Forecast System Version 2 (CFSv2) data provided by the National Centers for Environmental Prediction (NCEP) in the WRF forecasts.We thank the three anonymous reviewers for their constructive reviews.

Conflicts of Interest:
The authors declare no conflicts of interest.The funders had no role in the design of the study; in the collection, analyses, interpretation of the data; in the writing of the manuscript; and in the decision to publish the results.provided by the National Centers for Environmental Prediction (NCEP) in the WRF forecasts.We thank the three anonymous reviewers for their constructive reviews.

Conflicts of Interest:
The authors declare no conflicts of interest.The funders had no role in the design of the study; in the collection, analyses, interpretation of the data; in the writing of the manuscript; and in the decision to publish the results.
Appendix A

Figure 7 . 43 Figure 7 .
Figure 7. (a) Wind speed (m/s; colour shading) at 300 hPa.(b) Cross-section at 23 • E of omega (Pa/s-colour shading) and zonal wind speed (stronger than 10 m/s-contours) (ERA5).Green lines mark the minimum and maximum latitudes of the study region.
5 • N-42 • N, 19.5 • E-29.5 • E, hereafter called the area of interest.The boxplots were based on the deterministic and the perturbed forecasts and they include the variation between the ensemble members.The ERA5 boxplot (leftmost boxplot) indicated a median (Q 2 ) value of 17 • C, a 75th percentile (Q 3 ) of 22.3 • C, a 25th percentile (Q 1 ) of 11.1 • C, a maximum (Q 4 ) of 27.1 • C and a minimum (Q 0 ) of 0.7 • C in the period of 8-23 May 2020.For the initialization period of 9-15 April (hereafter referred to as week 1), the median of all the S2S forecasts was close to the Q 1 of ERA5 and their Q 3 was below the ERA5 median.On the other hand, the extreme values (Q 0 and Q 4 ) of the S2S forecasts (global and WRF) were closer to the extremes of ERA5 with the exception of the HMCR model.Similar results were obtained in the 16-22 April initialization period (week 2)

Figure 11 .Figure 11 .
Figure 11.Quantile-Quantile plots of the distribution of (a) predicted temperatures at 00 UTC from all the models at 850 hPa and (b) maximum daily temperatures at 2 m from WRF 25 km and 5 km against the distribution of reanalysis (ERA5) 850 hPa temperature and maximum daily 2 m temper-Figure 11.Quantile-Quantile plots of the distribution of (a) predicted temperatures at 00 UTC from all the models at 850 hPa and (b) maximum daily temperatures at 2 m from WRF 25 km and 5 km against the distribution of reanalysis (ERA5) 850 hPa temperature and maximum daily 2 m temperature, respectively, from week 3 (23-29 April).The mean squared error (MSE in • C 2 ) values are presented in the legend.All the forecasts and the ERA5 reanalyses were interpolated to the same grid (34.5 • N-42 • N, 19.5 • E-29.5 • E) with a grid spacing of (a) 1.5 • × 1.5 • and (b) 0.25 • × 0.25 • .

Figure 13 .
Figure 13.(a) ROC diagram of 850 hPa temperatures predicted by WRF (25 km and 5 km grid spacing) and the six meteorological centres for initialization dates of 23-29 April 2020.(b) Timeseries of the AUC scores across the three initialization periods (9-15, 16-22 and 23-29 April 2020).Threshold used: 95th percentile of the 850 hPa temperature of ERA5.The validation period was 8-23 May 2020 at 00 UTC of each day.All the forecasts and the ERA5 reanalysis were interpolated to the same grid (34.5°Ν-42° N, 19.5° E-29.5°E) with a grid spacing of 1.5° × 1.5°.

Figure 13 .
Figure 13.(a) ROC diagram of 850 hPa temperatures predicted by WRF (25 km and 5 km grid spacing) and the six meteorological centres for initialization dates of 23-29 April 2020.(b) Timeseries of the AUC scores across the three initialization periods (9-15, 16-22 and 23-29 April 2020).Threshold used: 95th percentile of the 850 hPa temperature of ERA5.The validation period was 8-23 May 2020 at 00 UTC of each day.All the forecasts and the ERA5 reanalysis were interpolated to the same grid (34.5 • N-42 • N, 19.5 • E-29.5 • E) with a grid spacing of 1.5 • × 1.5 • .

Figure 15 .
Figure 15.Reliability diagrams of predicted temperatures at 850 hPa of WRF (25 km and 5 km grid spacings) and the six meteorological centres for initialization dates of 23-29 April 2020.Threshold used: 95th percentile of the 850 hPa temperature of ERA5.The evaluation period was 8-23 May 2020 at 00 UTC of each day.All the forecasts and the ERA5 reanalysis were interpolated to the same grid (34.5°Ν-42° N, 19.5° E-29.5°E) with a grid spacing of 1.5° × 1.5°.

F F 5 F 5 Figure 15 .
Figure 15.Reliability diagrams of predicted temperatures at 850 hPa of WRF (25 km and 5 km grid spacings) and the six meteorological centres for initialization dates of 23-29 April 2020.Threshold used: 95th percentile of the 850 hPa temperature of ERA5.The evaluation period was 8-23 May 2020 at 00 UTC of each day.All the forecasts and the ERA5 reanalysis were interpolated to the same grid (34.5 • N-42 • N, 19.5 • E-29.5 • E) with a grid spacing of 1.5 • × 1.5 • .

Figure 16 .
Figure 16.Reliability diagram of the predicted maximum daily 2 m temperatures from WRF 25 and 5 km for initialization dates of 23-29 April 2020.Threshold used: 95th percentile of the max mum daily 2 m temperature of ERA5.The evaluation period was 8-23 May 2020.All the WRF forecasts and the ERA5 reanalysis were interpolated to the same grid (34.5°Ν-42° N, 19.5° E-2 E) with a grid spacing of 0.25° × 0.25°.

Figure 17
Figure 17  illustrates the EDI of the various global perturbed and WRF forecasts r tive to the T95 of the ERA5 climatology of May 1975-2005 for the different probabi thresholds (PTs: 17%, 34%, etc., on the x-axis).The maximum (minimum) value of the is equal to 1 (−1) when the HR (FAR) is equal to 1.In comparison with the previou mentioned scores (such as HR, FAR and CSI), EDI complements them by focusing on tail of the temperature distribution, which is crucial for rare or extreme events like heatwave of this study[88].In week 3 (Figure17), KMA had the best profile (highest ED followed by ECMWF and NCEP at the lower PTs and WRF forecasts at the higher P KMA reached an EDI = 0 for the PT of 100%.UKMO reached the same point, but on lo probabilities, its EDI was around 0. ECMWF and KMA maintained an EDI = 1 (per score) for PTs up to 34%.The forecasts of week 3 tended to lose their accuracy in hig PTs (i.e., for larger numbers of ensemble members) than the ones of week 1 and week

Figure 16 .
Figure 16.Reliability diagram of the predicted maximum daily 2 m temperatures from WRF 25 km and 5 km for initialization dates of 23-29 April 2020.Threshold used: 95th percentile of the maximum daily 2 m temperature of ERA5.The evaluation period was 8-23 May 2020.All the WRF forecasts and the ERA5 reanalysis were interpolated to the same grid (34.5 • N-42 • N, 19.5 • E-29.5 • E) with a grid spacing of 0.25 • × 0.25 • .

Atmosphere 2024, 15 , 442 24 of 4 Figure 17 .
Figure 17.Extreme dependence index (EDI) vs. forecast probability of temperature at 850 hPa o WRF (25 km and 5 km grid-spacing) and the six meteorological centres for initialization dates of 23 29 April 2020.Threshold used: 95th percentile of the 850 hPa temperature of ERA5.The evaluatio period was 8-23 May 2020 at 00 UTC of each day.A perfect score is 1 and no prediction ability is All the forecasts and the ERA5 reanalysis were interpolated to the same grid (34.5°Ν-42° N, 19.5 E-29.5°E) with a grid spacing of 1.5° × 1.5°.

Figure 17 .Figure 18 .
Figure 17.Extreme dependence index (EDI) vs. forecast probability of temperature at 850 hPa of WRF (25 km and 5 km grid-spacing) and the six meteorological centres for initialization dates of 23-29 April 2020.Threshold used: 95th percentile of the 850 hPa temperature of ERA5.The evaluation period was 8-23 May 2020 at 00 UTC of each day.A perfect score is 1 and no prediction ability is 0. All the forecasts and the ERA5 reanalysis were interpolated to the same grid (34.5 • N-42 • N, 19.5 • E-29.5 • E) with a grid spacing of 1.5 • × 1.5 • .442 25 of 43

Figure A1 .
Figure A1.Anomaly Correlation Coefficients for six global S2S ECMWF, UKMO, CMA, KMA, NCEP and HMCR forecasts across different initialization dates (9 to 29 April).Unperturbed forecast (red), mean of perturbed forecasts (blue) and ACC = 0.6 (black).The ACC value for a centre is only displayed when there is a forecast and reforecast for the same initialization date.

Figure A1 .
Figure A1.Anomaly Correlation Coefficients for six global S2S ECMWF, UKMO, CMA, KMA, NCEP and HMCR forecasts across different initialization dates (9 to 29 April).Unperturbed forecast (red), mean of perturbed forecasts (blue) and ACC = 0.6 (black).The ACC value for a centre is only displayed when there is a forecast and reforecast for the same initialization date.

Table 1 .
The datasets used in this study and their characteristics.The temporal resolution of the WRF forecasts and the existing S2S forecasts refers to the initialization of each forecast.The CMA reforecasts for 20 April were not available in the S2S database.

Table 2 .
Global S2S model characteristics of real-time forecasts and reforecasts used in this study.

Table 3 .
WRF configuration characteristics and its parameters.

Table 3 .
WRF configuration characteristics and its parameters.

Table 5 .
Uncentred Anomaly Correlation Coefficients (ACCs) for initialization dates from 9 April to 29 April 2020 (first row) from different global S2S forecasts (perturbed and unperturbed forecasts).The results of the perturbed forecasts are ACCs averaged over all the corresponding ensemble members.The ACC of each individual perturbed member was calculated before averaging.Evaluation period: 8-23 May 2020 at 00 UTC.ACC ≤ 0 (red), ACC > 0 (colour shading).All the global S2S forecasts, reforecasts and the ERA5 reanalysis were interpolated to the same grid (34.5°Ν-42° N, 19.5° E-29.5°E) with a grid spacing of 1.5° × 1.5°.

Table A1 .
S2S model forecast initialization frequency during each week.Each check mark indicates the initialization day.The last column (Total) indicates the total number of ensemble members of each model per week.The number in brackets next to the name of each centre shows the ensemble members produced on each day that has a tick mark.

Table A1 .
S2S model forecast initialization frequency during each week.Each check mark indicates the initialization day.The last column (Total) indicates the total number of ensemble members of each model per week.The number in brackets next to the name of each centre shows the ensemble members produced on each day that has a tick mark.