Spatio-Temporal Variation of Extreme Heat Events in Southeastern Europe

: Many studies in the last few years have been dedicated to the increasing temperatures and extreme heat in Europe since the second half of the 20th century because of their adverse effects on ecosystems resilience, human health, and quality of life. The present research aims to analyze the spatio-temporal variations of extreme heat events in Southeastern Europe using daily temperature data from 70 selected meteorological stations and applying methodology developed initially for the quantitative assessment of hot weather in Bulgaria. We demonstrate the suitability of indicators based on maximum temperature thresholds to assess the intensity (i.e., magnitude and duration) and the tendency of extreme heat events in the period 1961–2020 both by individual stations and the Köppen’s climate zones. The capability of the used intensity-duration hot spell model to evaluate the severity of extreme heat events has also been studied and compared with the Excess Heat Factor severity index on a yearly basis. The study provides strong evidence of the suitability of the applied combined approach in the investigation of the spatio-temporal evolution of the hot weather phenomena over the considered domain.


Introduction
Recent reports by the Intergovernmental Panel on Climate Change (IPCC) and the World Meteorological Organization (WMO) confirm that the global temperature has continued to increase in the past decade and has already halved the distance to the upper limit of 2°C, compared to the pre-industrial period, above which the risk of irreversible climate change increases rapidly [1][2][3]. Numerous studies suggest that rising temperatures and most extreme heat events in the recent decades would not have occurred without human-induced climate change (e.g., [4][5][6][7][8]). Although heatwaves represented around 2% of the weather-and climate-related disasters recorded worldwide from 1970 to 2019, they have been accountable for 8% of disasters' caused deaths [9]. In addition to higher rates of thermal stress and mortality [10][11][12], high temperatures and prolonged hot weather are related to reductions in productivity [13,14], substantial agricultural losses [15,16], wildfires, damage to transport infrastructure, power failures, and sharp jumps in water and energy consumption [17][18][19][20].
Europe has experienced several intense heatwaves since the beginning of the 21st century [8,[21][22][23][24], which have been associated with about 90% of deaths and 4% of economic losses from disasters after 1970 [9]. The severity of the heatwaves is expected to increase in the second half of the century due to the increasing magnitude and variability in summer temperatures [1,25]. The rise in extreme heat events under the intermediate and high GHG emissions scenarios could affect the countries of Mediterranean and Eastern Europe significantly, even by the middle of the century, where the heat-related mortality could increase by a factor of 1.8 and 2.6, respectively, compared to the 1971-2000 period [4]. Generally, the region around the Mediterranean basin, including Southeastern Europe, appears to be one of the most vulnerable to climate change, with an increasing intensity of extreme weather and heat events [1,21,26,27] and a further increase in the previously observed summer heatwaves' frequencies and durations [28][29][30]. In southern Europe, prolonged hot weather periods are a typical summertime phenomenon due to the warmer climate and larger-scale atmospheric dynamics that favor the thermodynamic processes and land-atmosphere feedback [26,31]. According to the updated Köppen-Geiger classification [32], the European part of the Mediterranean region mostly belongs to the mid-latitude temperate climate zone, comprising not only Mediterranean climate types with a dry hot/warm summer (Csa/Csb) and large areas with no dry hot/warm summer (Cfa/Cfb), but also areas with humid or dry continental climates with hot/warm summer (Dfa/Dfb or Dsa/Dsb, respectively) in the north [33].
Despite the rise in extreme heat events worldwide, there is still no universal definition of them, but the common agreement is that they are prolonged periods unusually hotter than normal. All definitions consider at least one air-temperature characteristic (daily minimum, maximum or average) and require a certain number of consecutive days where a particular threshold is exceeded, and most of these thresholds revolve around the upper tail of temperature distributions [34]. Three heat event characteristics-magnitude, duration, and frequency-are generally accepted, but there is no consensus about the thresholds or the event's spatial and time extent. Definitions based on fixed-temperature thresholds are of essential importance in assessing many socio-economic and environmental impacts. Percentile-based thresholds, corresponding to the local climatology, facilitate a comparison of different climates and regions, but percentile-based analyses are deprived of the physical intuition of actual temperatures [31]. The development of monitoring and early-warning systems has further expanded the set of heat event definitions [22,[35][36][37][38].
The anthropogenic influence on the climate, as well as the natural fluctuations of the Earth's climate system, are distinguished by substantial regional variations in the different time scales [1]. Therefore, it would be inappropriate to restrict the definitional diversity, as it reflects the various physical mechanisms proposed as the proximate and underlying drivers of heat events [31]. Even a small change in the definition could have a considerable impact on the results of climate analyses or risk assessments [39,40]. The heat event indices suitable for both public and research applications should be easily understood, useful as impact indicators, seamlessly interpretable by climate records, and predictable with reasonable accuracy [37].
The WMO Expert Team on Climate Change Detection and Indices (ETCCDI) has defined 27 internationally agreed indices of climate extremes, including heat-related extremes, to facilitate the regional climate change analyses (http://etccdi.pacificclimate.org/list_27 _indices.shtml) (accessed on 16 July 2022). Later, the WMO Expert Team on Sector-specific Climate Indices (ET-SCI) enlarged the list of indices designed for heat event analysis based on a new definition, accounting for excessive heat accumulation and short-term thermal acclimatization (https://climpact-sci.org/indices/) (accessed on 16 July 2022). The indices developed on health-based definitions receive increasing attention, especially for earlywarning purposes in large cities [41,42]. Numerous gridded datasets of climate extremes indices at global or regional scales, with different spatial resolutions, are already freely accessible (e.g., [43][44][45][46][47][48]). The ETCCDI/ET-SCI indices are defined and computed on annual or monthly time scales, which can lead to ambiguous results or the impossibility of being used in some climate applications requiring different time steps. Moreover, the recently developed indices of extreme heat events are not yet included in the gridded datasets. The thermal comfort indices usually refer to an "average" healthy person, thus excluding large groups of people with different or impaired thermoregulation, who are most vulnerable to extreme heat [49].
The deathly heatwaves during the last two decades changed the focus of research on heat extremes from pure climatology to impact assessment and event attribution. After the exceptionally hot summer of 2007 in Southeastern Europe, heatwaves have been an object of many regional studies that used different approaches and indicators. Founda and Giannakopoulos [50] placed summer 2007 in the climatology of the previous century and examined similarities with the Mediterranean summers of the future. Pecelj et al. [51] assessed the bioclimatic conditions in Serbia during the summer heatwave events defined by the daily thresholds of the Universal Thermal Climate Index (UTCI) and the maximum air temperature. Urban et al. [52] concluded that the cumulative Excess Heat Factor explains the magnitude of excess mortality during heatwaves in the Czech Republic better than other characteristics such as heatwave duration or daily mean temperature. Tolika [30] used the Excess Heat Factor (EHF) index to detect, identify, and describe heatwaves in Greece. Piticar et al. [53] found a substantial change in EHF-based indices and a statistically significant increase in the number, frequency, and duration of heatwaves in Romania. Using a city-specific heatwave hazard index, Morabito et al. [21] revealed a significant increase in heatwave hazards in the southeastern European capitals. Katavoutas and Founda [54] proposed a new indicator of urban heat stress intensity, with a focus on a vulnerable area of the eastern Mediterranean, defined as the difference between urban and non-urban thermal stress levels based on the UTCI and Humidex indices.
The present study further develops previous research [55,56] in the context of the suitability of developed criteria and indicators for the quantitative assessment of hot weather in Bulgaria for an overall climate analysis of heat events in Southeastern Europe. The ability of the proposed hot spell indicator to detect extreme heat events has also been studied and compared with the Excess Heat Factor (EHF) index.

Data Sources and Data Pre-Processing
We selected a domain over Southeastern Europe (SEE) with latitudinal and longitudinal boundaries of 35°N-50°N and 15°E-30°E, respectively ( Figure 1, red-line frame). This area almost entirely falls into the Mediterranean region, defined by the Mediterranean Experts on Climate and Environmental Change (MedECC) [4], which includes parts of central and eastern European countries with a continental climate.
Daily maximum and minimum air temperature data from 70 selected meteorological stations, which covered the study area relatively evenly, were used to investigate the spatio-temporal variations of extreme heat events (EHEs) in the period 1961-2020. Data from 12 Bulgarian meteorological stations, included in our previous research [56], were provided by the National Institute of Meteorology and Hydrology (NIMH)- Figure 1, represented by grey triangles.Data from stations outside Bulgaria were downloaded from three freely accessible sources: the European Climate Assessment and Dataset (ECA&D) project [57], the U.S. National Climatic Data Center (NCDC) Global Historical Climatology Network daily dataset (GHCNd) [58], and the U.S. National Centers for Environmental Information (NCEI) Global Surface Summary of the Day (GSOD) dataset [59]. These sources are used in many applications requiring daily data (e.g., [60][61][62]). The providers of the first two datasets apply strict procedures to ensure high-quality data that consist of the detection of outliers, data consistency, and other checks on the main meteorological elements [63,64]. The quality control procedures in GSOD are more limited, but daily summaries are supplemented by useful attributive information about the processed input data (https://www.ncei.noaa.gov/data/global-summary-of-the-day/doc/readme.txt) (accessed on 16 July 2022). ECA&D implements a two-step testing procedure for evaluating time series homogeneity [63]. GHCNd and GSOD do not support this type of information about time series homogeneity.
ECA&D is the primary source of observational data for the European region, but the number of stations in Southeastern Europe with freely accessible data and complete time series by 2020 is limited. Therefore, the GHCNd and GSOD datasets were used to complete a sufficiently dense set of stations for presenting the EHEs climatology.
The structure of the GHCNd dataset includes a source attribute, which indicates that hourly and daily data from different sources such as the Global Climate Observing System (GCOS), ECA&D and Global Summary of the Day (NCDC DSI-9618) are merged in the time series available for the SEE domain. The latter source's daily maximum and minimum temperatures are included in GHCNd, only when provided as a nominal 24 h climatological summary as indicated in the SYNOP messages [64].  [32], data source: http://koeppen-geiger.vu-wien.ac.at/ present.htm (accessed on 16 July 2022); the selected stations are marked as follow: grey triangles-12 stations from the NIMH network; blue and red dots-16 stations from GHCNd and one from GSOD datasets, respectively; asterisks-41 stations from ECA&D database. GSOD data have been available generally since 1929, but the time series are sufficiently filled from 1973 to the present. In addition, when readings of the maximum or minimum thermometer are missing, the daily maximum/minimum temperature in GSOD is determined by the dry-bulb temperature from synoptic reports. If we separate the data by forming new samples of measured maximum/minimum temperatures (GSOD-m) and substituted maximum/minimum temperatures (GSOD-s), the comparison by stations indicates that the GSOD-s maximum temperatures are cooler while the minimum temperatures are warmer than GSOD-m (Appendix A, Figure A1). Due to these biases, we excluded all minimum/maximum temperatures obtained from incomplete 24 h observations. Moreover, we restricted the use of GSOD data mainly to auxiliary time series due to the large data gaps before 1973.
The selection of stations was organized as follows. Firstly, we explored the data sources looking for stations with available maximum and minimum temperature data since 1961 that fall into the defined above SEE domain. Next, we selected stations with no more than 20% of their data missing in the whole observational period and altitudes below 800 m, thus focusing the study's scope on the non-mountainous areas. Then, we reduced the number of stations with overlapping observational periods to one within a radius of 50 km to preserve a sufficiently high density of stations. The exclusion criteria accepted were the availability of data by 2020 and the homogeneity of the time series.
We selected 17 stations from the ECA&D blended dataset with homogeneous time series and 24 more stations for additional testing, assuming that inhomogeneities could be triggered by local or regional climate variations [63].
Generally, the merging of time series was avoided in our study except for Istanbul and Tirana stations when long enough, partially overlapping observational periods, available in the used data sources, were adjusted by simple linear scaling. The only case of extrapolation of time series over a large historical period (from 1961 to 1973), in order to preserve the relatively evenly spread of stations in the domain, concerned the Skopje International Airport station accessible by the GSOD dataset. Data reconstruction was made under the missing values imputation process before calculating the EHEs indices, using the R-package 'missMDA' [65]. For most stations in the southern part of the domain, the "no more than 20% missing data" selection criterion was not met, and several time series with 21-37% of their values missing, especially for minimum temperature, were used because of the above-mentioned reason.
The missing values were filled in by an iterative principal component analysis (PCA) imputation technique, which takes into account both the internal structure of data and the links between time series (Appendix B). The PCA-based spatio-temporal reconstruction of time series in data-sparse periods using periods with good data coverage and regionalization of time series with similar climatological characteristics is successfully applied by Tveito et al. [66]. In our study, the Köppen-Geiger climate classification and correlation between station data were used as regionalization criteria. The enhanced global dataset for the land component of the fifth generation of the European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis ERA5-Land [67] was used as a reference dataset to improve the quality of the infilling of time series with medium to large data gaps.
ERA5-Land shows higher skill than other reanalysis products across geographicallydiverse regions, which suggests that not only improvement in data assimilation but higher spatial resolution is critical to accurately reproducing extreme events observed from surface weather stations [68]. Although reanalyses contain some uncertainties, they are effectively used in gaps-filling procedures, even on sub-daily timescales [69]. The 2 m air temperature raw data from ERA5-Land were pre-processed in order to obtain daily maximum and minimum temperatures, as described in [70]. Since the ERA5-Land data are stored in netCDF files, all netCDF operations were performed by means of the software 'climate data operators (CDO)' [71].
Four homogeneity tests included in the R-package 'trend', Bartels test (a rank version of von Neumann's ratio test), a Standard Normal Homogeinity Test (SNHT), a Buishand U-test, and Pettitt's test [72], were applied to the time series of annual maximum and minimum temperatures from the reconstructed time series and abovementioned stations with problematic or missing homogeneity testing. The accepted rejection criterion was three or four failed tests at a 1% significance level.
Finally, in addition to the 12 stations from the NIMH network, 41 stations from the ECA&D database (Figure 1, asterisks), 16 stations from GHCNd, and one from GSOD ( Figure 1, blue and red dots) were chosen to obtain a good enough coverage of the study area. The average distance between neighbour stations is 118 km. Most stations are located below 500 m a.s.l. Moreover, 17% of stations are located at airports, and 24%-are in urban areas (Appendix A, Table A1). The stations included in the study fall into three main groups of the Köppen-Geiger climate classification [32]-59% are characterized by temperate climates without a dry season (Cfa and Cfb), 21% have Mediterranean climates (Csa), and 20% of the stations have continental climates (Dfb).

Methodology
The shortcomings of threshold-based indices are well-known, but they are often more suitable when assessing the immediate impacts of temperature extremes on ecosystem resilience, human health, and quality of life. Depending on the research scope and aims, temperature thresholds can reflect either biophysical constraints or local conditions derived from local climatology. The latter ones are generally inappropriate for larger-scale analyses. Using a definition of EHEs that only identifies extremely hot days may underestimate the risks associated with extreme heat, and inversely a less stringent threshold may overestimate the risks [73]. Although the risk assessment of extreme heat was not among the aims of the study presented in [55], the proposed approach may be helpful in sector-specific climate assessments. The main idea was to improve the methodology of hot spell analysis in order to reveal EHEs' climatic features in the non-mountainous part of Bulgaria, given its climatic diversity. Later, we propose three hot weather indicators and demonstrate their applicability based on data from 115 NIMH stations and the E-OBS dataset [56]. The present study aims to test the suitability of the developed indicators for an overall analysis of hot weather and EHEs in Southeastern Europe, bearing in mind that the share of Köppen's subtypes (Cfa and Cfb) prevailing in Bulgaria is almost 50% in the whole domain. The ability of the proposed hot spell indicator to detect extreme heat events has also been studied and compared with already proven climate indices, such as EHF.

Climatologically Justified Threshold Indicators for Bulgaria
The threshold indicators proposed in [56] were developed on daily maximum air temperature data (tx) from 36 meteorological stations representative of the non-mountainous regions in Bulgaria using statistical modeling (Appendix C). In accordance with the obtained estimates, three climate indicators of hot weather were defined, as shown below.

1.
The annual number of hot days (nhd32)-i.e., the annual count of days when tx > 32°C.
The first indicator represents a general measure of unfavorable daytime thermal conditions; its long-term increase is related to the increasing frequency of prolonged hot spells. The second indicator delineates the upper bound of hot weather persistence. Finally, the third indicator allows a combined (intensity-duration) evaluation of EHEs in five categories. Although the hsd indicator is defined on a yearly basis, the technology of calculations allows the separation of events and their analysis. Thus, each category can be described by the EHE's intensity (low/moderate/elevated/high/extreme) and corresponding metric (hsd32/34/36/38/40).
The significance and magnitude of trends in the threshold indicators were assessed through the non-parametric Mann-Kendall's test and Sen's Slope Estimator (see refs. [74,75] for details) using the R-package 'trend'.

Excess Heat Factor (EHF)
As an operative heatwave index, the EHF was first used in Australia. The EHF definition was proposed in 2009 as an improvement of the previous absolute or relative heatwave indices on a national level. The EHF measures the heatwave intensity at each location with an additional component to account for adaptation (Appendix D). Later, the EHF was utilized to examine the frequency, intensity, and duration of heatwaves on a global scale [34,36]. Moreover, the EHF provides public health stakeholders with a simple but efficient method to estimate excess heat exposure levels compared to other extreme-heat metrics or bio-climatic indices (e.g., [52,76]). The EHF's severity (EHF sev ) metric permits the comparison of heatwave events and their impacts across the world and can be readily implemented within heatwave early warning systems [37,76]. For a given day i, the EHF severity (EHF sevi ) is defined as follow: where EHF p85 denotes the 85th percentile of all positive EHF values calculated for a 30-year historical period. Nairn et al. [37] defined three levels of EHEs severity as: − L1 (low intensity): when 0 < EHF sev < 1; − L2 (severe): when 1 ≤ EHF sev < 3; − L3 (extreme): when EHF sev ≥ 3.
The annual summaries of EHF sev characteristics, such as mean and maximum magnitude, were calculated for the SEE domain based on daily maximum and minimum temperatures from complete time series. The results are presented via the zones of the Köppen-Geiger climate classification.

Software Products Used in the Research
All calculations completed on observational data were made in the free software environment R, version 3.6.2 [77], and RStudio 1.2 [78]. The maps shown in figures were prepared using QGIS 3.4.9-Madeira [79].

Results and Discussion
Many studies in the last few years have been dedicated to the increasing temperatures in Europe since the second half of the 20th century. As stated in [80], the increase rate is almost twice as rapid from 1985 to 2020 as in the whole period after 1951 (0.027 and 0.051°C/year, respectively). The warming intensities depend on the season, with the largest summer warming of around 0.060°C/year recorded in Central and Southern Europe. These findings are consistent with our results about trends in annual mean temperatures in the considered SEE domain obtained from the complete data series for the periods 1961-2020 and 1985-2020 (0.032 and 0.055°C/year, respectively). As far as the focus of this research is on applying threshold indicators for EHEs analysis in a climatically diverse region, such as SEE, we additionally explored the maximum temperature features through zones of the Köppen-Geiger climate classification. The box plots in Figure 2 present the distribution of the calculated upper percentiles of the daily maximum temperatures at stations in the four climate zones. One can see that only the higher percentiles, usually used in EHEs definitions, fall into the range of the above-stated tx-thresholds.  Table 1 shows the magnitudes of the trends in annual mean (T a ), annual mean maximum (T mx ), and annual maximum (T x ) temperatures in the period 1961-2020 by climate zones. The largest average magnitudes are obtained for Cfb, followed by the Dfb, Cfa, and Csa climate zones.
Most stations with maximum trend magnitudes are located in the central part of the SEE domain, including one station (S10) in the hottest region of Bulgaria along the Struma River valley. Unlike T a and T mx , which have statistically significant trends for all stations, the percentage of stations with a significant T x trend varies between 40% (Csa) and 92% (Cfb). We demonstrated in [56] the suitability of estimated tx-thresholds (32, 34, 36, 38, and 40°C) and corresponding duration thresholds (6, 5, 4, 3, and 2 consecutive days) to classify EHEs in the period 1961-2019. Consequently, our expectation is to obtain an accurate enough picture of EHEs that changes both by individual stations and climate zones in the SEE domain by applying the threshold indicators. Figure 3 illustrates the long-term variation of the averaged-by-climate-zone indicators nhd32 and chd32 (calculated separately for each station), which reveals a clear upward, although not monotonic, tendency.  The average magnitudes of the chd32 trends varied between +0.3 and +1.3 days/10 years (in Dfb and Csa, respectively), and the maximum magnitude of +3.2 days/10 years was obtained in the Csa climate zone. The chd32 maxima by stations varied between 2 and 62 days, with lower values occurring again in coastal and northern stations. The distribution of maxima by years was more heterogeneous compared to the nhd32 (23% in 2012, 21% in 2015, 14% in 1994, 10% in 2007, 9% in 2010, 3% in 2002 and 2003, and few single cases), but EHEs in these years were confirmed by many regional studies [30,33,51,52].
The comparison of multiyear means of nhd32 and chd32 for periods 1961-1990 and 1991-2020 is shown in Figure 4.  The number of hot days increased by 12-13 during the second period in the warmer climate zones, reaching about 27-30 days. This result can be viewed in the context of the rapid increase in summer temperatures in Europe, which can lead to month-long spells or even entire seasons with abnormally high temperatures [80]. The largest change for chd32 in absolute terms (+4.3 days) was registered in the Csa climate zone. Regarding percentage changes, the number of hot days in the cooler climate zones shows the highest increase of about 1.5-2 times during the second period. Figure 5 presents the evolution of the hot spell duration (hsd) indicator by categories, averaged by climate zones. As a whole, the larger hsd32 is, the higher intensity hot spells can be expected to be, but short, very intense EHEs are also frequent. The highest values for hsd32 in all climate zones were reached in 2012, followed by 2015; for hsd34-in 2012, followed by 2015 (in Cfb and Dfb) and 2007 (in Cfa and Csa); for hsd36-2012 (in Cfa and Dfb) and 2007 (in Cfb and Csa); for hsd38/40-2007, followed by 2012, 2017, and 2000 (only for hsd40). Except for the Mediterranean climate zone Csa, all hot spells at higher categories hsd38/40, and almost 90% of the others, emerge after 1985. The average trend magnitudes calculated by the hsd categories ranged between +0.1 and +4.0 days/10 years (in Cfb and Csa, respectively), while the maximum magnitude of +8.7 days/10 years was obtained for hsd32 in the Cfa climate zone ( Table 2).
As seen in Table 2, the maximum-trend magnitudes for all indicators were recorded in one or two stations in each climate zone, associated with climate transitions and EHEs' "hot spots" in SEE [33,81]. These areas are distinguished by a significant increase in the duration and frequency of hot spells in recent decades. Figure 6 illustrates the detection of "hot spots" by comparing hsd32 values for the periods 1961-1990 and 1991-2020 by stations.   Table 1) of meteorological stations used in the study on the background of the Köppen-Geiger classification for SEE (see Figure 1). During the second period, a significant shift in hsd32 statistics is revealed, mainly along the coastal zone in Croatia, Albania, Northern and Central Greece, the Aegean Region in Turkey, and the central part of the Balkan Peninsula. The hottest place in Bulgaria is that between the Struma River Valley and the Kresna Gorge (presented here by S10, Sandanski), where the hsd indicator reached maxima for all categories-96 days for hsd32 and 76/46/19/11 days for hsd34/36/38/40. The duration of individual heat events with tx ≥ 40°C reached 6-8 consecutive days [56].

Comparison between EHF Severity and Categories of hsd Indicator
The tx-thresholds that underlie the proposed method for detecting and classifying EHEs by intensity--duration combinations coincide or are very close to some proven physiological thresholds, the crossing of which for a certain time could have significant adverse health and economic consequences. Many regional and worldwide studies have linked the range of daily maximum temperatures of 32-40°C with an increased risk of worsening and mortality in cardiovascular, respiratory and other illnesses (e.g., [11,12]); agricultural losses [15]; and disruptions in energy production and supply [18]. On the other hand, EHF severity has been shown to be useful as an exposure index that scales well against the human health impact [37], as well as an EHEs indicator in climate studies [82]. Therefore, we choose the EHF sev for the suitability assessment of the proposed threshold indicators. Figure 7 presents the summary estimate of the hot weather in SEE by the mean EHF sev , calculated for 1961-1990 and 1991-2020 on a yearly basis over all periods of at least 6 days with tx ≥ 32°C (thus being synchronized with the lowest intensity hot spell category hsd32). The obtained results show that, despite increasing in EHEs in recent decades, the overall warm-season severity remains relatively low in SEE. Percentage changes indicate that the Cfa and Cfb climate zones are more prone to regional warming. A detailed comparison between the categories of the hsd indicator and the levels of the yearly maximum EHF severity demonstrated good spatio-temporal conformity, as seen in Figure 8. For clarity, hot spell categories are denoted by C1 through to C5. Although the correspondence between the categories and the EHF sev levels is not straightforward, the percentage distribution of different categories by severity levels shows that the hsd indicator significantly underestimates the severity of low-intensity hot spells (C1 category) but tends to overestimate the severity of high-intensity events (C4 category). The me-dian and minimum value of the averaged hsd32 metric increase relative to the prevailing severity level.

Conclusions
In this study, daily maximum and minimum air temperature data from 70 selected meteorological stations, which cover the SEE domain relatively evenly, were used to investigate the spatio-temporal variations of extreme heat events in the period 1961-2020. Three threshold-based indicators, quantifying the unfavorable daytime thermal conditions (nhd32), the hot weather persistence chd32, and the intensity of EHEs in five categories (hsd32/34/36/38/40), were analyzed and summarized using the zones of the Köppen-Geiger climate classification. All three indicators show statistically significant linear trends (p < 0.05, Mann-Kendall method) in the period 1961-2020. The number of hot days increased by 12-13 during the period 1991-2020 in the Cfa and Csa climate zones, reaching about 27-30 days. Both indicators, nhd32 and chd32, show a stronger worsening of daytime thermal conditions in the areas with hot summers.
Many worldwide and regional studies have linked the range of the daily maximum temperature of 32-40°C with the increased risk of worsening and mortality in cardiovascular, respiratory and other illnesses; agricultural losses; and disruptions in energy production and supply. We demonstrated the suitability of the tx-thresholds (32, 34, 36, 38, and 40°C) and the corresponding duration thresholds (6,5,4,3, and 2 consecutive days) to classify EHEs. Except for the Mediterranean climate zone Csa, all hot spells at higher categories hsd38/40, and almost 90% of the others, emerge after 1985. During the period 1991-2020, a significant shift in hsd32 statistics is recorded in "hot spots", mainly along the coastal zone in Croatia, Albania, Northern and Central Greece, the Aegean Region in Turkey, and the central part of the Balkan Peninsula. A detailed comparison between the categories of the hsd indicator and the levels of the yearly maximum EHF severity demonstrated good spatio-temporal conformity.
As far as we know, such a survey on the various aspects of using threshold indicators for a quantitative assessment of EHEs in the considered region of SEE has not been conducted until now. This research also informs the direction of our next work on the possibilities of a combined application of independently defined metrics, such as the threshold indicators and the EHF index presented here, in climate analyses and the implementation of the obtained knowledge in early warning systems. Acknowledgments: Due to the fact that this study is entirely based on free available data and software, the authors would like to express their deep gratitude to all organizations and institutions (ECA&D, NCDC, NCEI, ECMWF, the R Core and QGIS teams, MPI-M, and others) which provides free-of-charge software and data. Not least, we thank the three anonymous reviewers for their valuable comments and suggestions which led to an overall improvement of the original manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.   Figure A1. Comparison between samples of measured maximum/minimum temperatures (GSOD-m) and substituted maximum/minimum temperatures (GSOD-s) for the warm half year (May-October) by stations with different climates, environments, and observational routines; tx-daily maximum temperature, tn-daily minimum temperature. Statistical modeling of hot-weather phenomena involved several steps: (1) the construction of empirical distributions of maximum temperatures in July and August from the selected 36 stations for 1931-1980 (the period is considered to be less affected by global warming [2,4,83]); the calculation of return levels corresponds to return periods 2, 5, 10, 20, and 50 years using the Generalized Extreme Value (GEV) distribution, (3) determination of appropriate thresholds for tx; (4) determination of hot spell duration thresholds for the period 1961-2019 at the different tx-thresholds; (5) validation of obtained thresholds for the period 1961-2019. The estimated tx-threshold values are 32, 34, 36, 38, and 40°C, and the relevant hot spells duration thresholds are 6, 5, 4, 3, and 2 consecutive days.

Appendix A. Data
In the period 1931-1980, the daily maximum temperatures (tx) of July and August in the selected stations fall most often into the range 26-31°C, and the absolute maxima are typically between 38°C and 41°C. During the second period (1981-2019), a significant shift of tx-distributions toward higher values is observed ( Figure A3, left panel). In order to provide a reliable analysis for the period 1961-2019, the temperature thresholds were determined using statistical modeling, not from the upper-tail percentiles of the empirical distributions for 1931-1980 ( Figure A3, middle panel). From the estimated 2-, 5-, 10-, 20-, and 50-year return levels of maximum July-August temperatures by the GEV model, thresholds have been chosen in such a way as to be achievable in 85-90% of stations for the shorter return periods and in 50-60% of stations-for the longer ones. The right panel of Figure A3 presents the percentiles of hot spell durations at different tx-thresholds in the period 1961-2019. Hot weather lasts 1-3 days most frequently, and the 90th percentile varies between 3 and 6 days.

Appendix D. Description of EHF Index Calculation
The EHF is a measure of heatwave intensity, incorporating two ingredients [76,84]. The first ingredient (E sig ) is a measure of how hot a three-day period is with respect to the 95th percentile of the daily mean temperature at each particular location. The percentile is computed for a reference period of 30 years (1961-1990 in our study) using the daily mean temperatures for all days in the year. The second ingredient (E accl ) is a measure of how hot the three-day period is with respect to the recent past (specifically the previous 30 days). This takes into account the idea that people acclimatize to their local climate, with respect to its temperature variation across latitudes and throughout the year, but may not be prepared for a sudden rise in temperature above that of the recent past. We used the retrospective version of equations for the three-day mean temperature (over days i to i − 2): EHI accli = (T i + T i−1 + T i−2 ) − (T i−3 + . . . + T i−32 )/30, and (A2) where T i denotes the daily mean temperature on day i, T 95 represents the long-term 95th percentile of the daily mean temperature, EHI sigi is defined as the exceeding of the previous three-day mean temperature (starting on day i) above T 95 , and EHI accli on day i is calculated as the difference between the three-day mean temperature and a mean of the prior 30 days. Consequently, the daily EHF was calculated using the following equation: