Projection of Future Heat Waves in the United States. Part I: Selecting a Climate Model Subset

: The widespread increase in global temperature is driving more frequent and more severe local heatwaves within the contiguous United States (CONUS). General circulation models (GCMs) show increasing, but spatially uneven trends in heatwave properties. However, the wide range of model outputs raises the question of the suitability of this method for indicating the future impacts of heatwaves on human health and well-being. This work examines the ﬁtness of 32 models from CMIP5 and their ensemble median to predict a set of heatwave descriptors across the CONUS, by analyzing their capabilities in the simulation of historical heatwaves during 1950–2005. Then, we use a multi-criteria decision-making tool and rank the overall performance of each model for 10 locations with di ﬀ erent climates. We found GCMs have di ﬀ erent capabilities in the simulation of historical heatwave characteristics. In addition, we observed similar performances for GCMs over the areas with a partially similar climate. The ensemble model showed better performance in simulation of historical heatwave intensity in some locations, while other individual GCMs represented heatwave time-related components more similar to observations. These results are a step towards the use of contemporary weather models to guide heatwave impact predictions.


Introduction
Observations of global warming from increased greenhouse gases (GHGs) are widespread and the recent four decades are among the warmest in the recorded history [1]. From the 1950s to 2000s, the global annual average minimum and maximum land surface air temperatures show decadal increases of 0.20 • C and 0.14 • C, respectively [2]. For the northern hemisphere, the intergovernmental panel on climate change (IPCC) found that the warmest consecutive 30 years of the last 1400 years were from 1983 to 2012 [3]. Kunkel et al. [4] reported that the average temperature increase across the contiguous United States (CONUS) from 1895 to 2011 was 0.11, 0.07, 0.05 and 0.05 • C per decade for winter, spring, summer and fall, respectively. Additionally, that study found a significant difference in the regional average temperature increase over the same period equivalent to 0.08, 0.07, 0.08, 0.1, 0.05, 0.05, 0.09 • C per decade for northeast, southeast, midwest, Great Plains north and south, northwest and southwest of the CONUS, respectively. General circulation models (GCMs) project various increases in average regional temperature, based on differences in the Representative Concentration Pathways (RCPs) for the CONUS. These projections indicate more frequent hot days, with significant spatial variability [5,6]. In the northeast, GCMs project warming of 3 to 10 • F (1.7 to 5.6 • C) by the 2080s with an increase in the number of days per year above 90 • F (32 • C) [7]. In the southeast region, the average temperature increase is projected to range between 4 and 8 • F (2.2 to 4.4 • C) by the year 2100, while the interior states of the southeast USA can expect temperature increases of 1 to 2 • F (0.55 to 1.1 • C), with more hot days and fewer freezing events [8]. These models project the midwest region may warm by 5 • F (2.8 • C) and experience at least 25 more days above 95 • F (35 • C) by the end of this century, compared to the 1971-2000 period. Similar to the southeast, the midwest will experience longer frost-free seasons [9]. According to Third National Climate Assessment report in the United States, the Great Plains region will experience extreme temperatures by end of this century with increased frequency of days above 95 • F toward the north and 100 • F toward the south (35 and 37.7 • C) and more nights above 80 • F (26.6 • C) across this region [10]. The temperature increase in the northwest and northeast USA is similar, with the projected average annual temperature expected to rise between 3.3 and 9.7 • F (1.8 to 5.4 • C) by 2070 to 2099, compared to the 1970 to 1999 period. Even larger changes are expected during the summer months [11]. In the southwest, some GCMs indicate increases of 2.5 to 5.5 • F (1.4 to 3.1 • C) by 2041-2070 and a cumulative increase of 5.5 to 9.5 • F (3.1 to 5.3 • C) by 2070-2099. These projections are based on scenarios with continued growth in global emissions followed by inevitable more frequent summer heat waves and less wintertime cold days [12].
A warmer future climate is expected to be attended by greater frequency of occurrence and intensity of Extreme Heat Events (EHE) across the CONUS. These events will likely be followed by more frequent and severe heat waves [13,14]. The rise in the future heat waves is expected to be attended by other meteorological extremes such as droughts, causing more frequent and intense concurrent extreme events for the CONUS [5,[15][16][17]. Even a significant reduction in GHG emissions will not stop this warming pattern in the near future, and it would be prudent to prepare for warmer future weather [18]. An important question is how to make an appropriate quantitative representation of the spatial distribution of temperature indices across the CONUS. Russo & Sterl [19] used the ECHAM5/MPI-OM climate model to calculate extreme weather indices proposed by Expert Team on Climate Change Detection, Monitoring and Indices (ETCCDMI) and described by Alexander et al. [20] in a global scale. They found that the number of warm days (defined as days with daytime temperature above the 90th percentile of the reference period (TX90p) in this study ) by 2100 will be up to 100 days more than 1951-1975. Sillmann et al. [21] analyzed 19 CMIP5 models to calculate ETCCDMI based on three RCP scenarios (RCP2.6, 4.5 and 8.5) globally in which the CONUS is divided into three regions including East north America (ENA), Central north America (CNA) and West north America (WNA). They found that over the period 2081-2100, the annual maximum of daily maximum temperature (TXx, see Alexander et al. [20] for details) in WNA, CNA and ENA will be between 0.5 and 10.5 • C further compared to reference period 1981-2000. In addition, this study revealed that days with frost will decrease significantly in WNA by the end of this century. Diffenbaugh & Ashfaq [22] studied 22 GCMs across the CONUS and found that by 2039 intense heat events will be more frequent than ever.
Regional studies also found that by the end of this century, heat wave intensity will be 3 to 8 • C greater, and the number of heat wave days may increase by up 30 to 60 days compared to historical observations over much of the western and southern CONUS [23]. Similarly, Zubler et al. [24] found that in 2070-2100 the summertime temperature and average heat wave intensity in the great plains will be 20% greater and 0.6 • C more than during the period of 1980-2010. Gershunov & Guirguis [25] used a 12 × 12 km 2 resolution observed dataset and four downscaled GCMs to study future heat waves in the state of California and found that urban heat island effect will be more frequent and intense by the end of this century. Targeted studies in New York City suggest that the number of days with a maximum temperature greater than 32.2 • C will more than double from the 1990s to the 2050s [26].
Proper climate change information is a prerequisite for analyses that project the range of future climatological extremes at an acceptable scale. Pierce et al. [27] proposed that a spatial scale on the order of 10 km is required to assess the numerous impacts of climate change on societies. However, most GCMs have a spatial resolution of 100 km. Downscaling methods address this issue by providing smaller spatial scale data for different purposes. However, the question of the reliability of future weather projections remains due to uncertainties in climate models and limits the availability of robust, actionable and reliable projections of the future climate [28,29]. These uncertainties in climate projections originate from three main sources, including internal variability of the climate system, model uncertainties and future scenarios uncertainties [30] and in most cases, it is not possible to remove these uncertainties [31].
Different methods are available to decrease the uncertainties identified above, particularly on a regional scale. Regional climate models reduce prediction biases by developing higher spatial resolution analyses and better represent the local physical processes [23]. However, national and global scale studies lack sufficient resolution for spatial comparison. One common approach to overcome this problem is to calculate an arithmetic multimodel mean of downscaled data, assuming the same weight for each model [21,23,[32][33][34]. This popular method is controversial because there are many arguments that due to different structures of the GCMs, ensemble modeling physically may be implausible [35]. In addition, there are known issues and uncertainties in a few of these models and accordingly, the use of an even large number of subsets could result in very poor representation of the range of projections [36,37]. At the same time, we know some models work better than others, particularly for some regions and purposes [38]. For example, there are many values in focusing on the "seamless models" which have both weather forecasting and climate change projection components [39]. On the other hand, if the impact of human interaction with the atmosphere is of interest, those models that incorporate human processes with earth's physical properties and climate projections have higher priorities to be used [40]. Accordingly, we can focus on the most reliable and accurate GCMs when we know the aim and scope of a specific study [41,42], while acknowledging the pitfalls of model averaging [35]. Furthermore, many studies that have used and compared different approaches, such as Multi-Model Mean (MMM), weighted approaches, probabilistic methods, random and ranked ensembling methods, mentioned the different performance of GCMs [43][44][45][46].
A practical reality for many impact studies is the selection of appropriate ancillary deterministic outputs from other methods to inform decision makers and prepare response plans. For example, in preparing Intensity-Duration-Frequency (IDF) curves for flood modeling, it is important to select particular downscaled GCMs based on regional characteristics. For example, four different GCMs (i.e., HadGEM2-ES, CNRM-CM5, CanESM2, and MICROC5) were selected for the state of California in the United States to provide IDF curves for preparing the state for increasing floods risk [47].
In this regard, several methods have been introduced which consider the performance of each individual GCM or downscaled GCM. Abramowitz & Bishop [38] used a "perfect model" approach to testing whether an ensemble dependence transformation can improve CMIP projections. Similarly, Herger et al. [48] analyzed a different subset of ensembles from complete 81 CMIP5 simulations and compared them with a simple multi-model mean (MMM) of all 81 simulations. There are many other sophisticated and complicated approaches that aim to provide a robust prediction of future temperature or precipitation [49][50][51]. However, these models are often complex and there is no general agreement on methods of assessment [31]. In this regard, many scholars have tried to prepare a simpler method to select a GCM for a particular purpose. Geil et al. [52] analyzed 21 downscaled GCMs and compared the result with observations over North America for 1979-2005, then ranked these models based on the correlation between models and observations. Knutti et al. [53] suggested using a weighted average over the GCMs. Following this method, Lorenz et al. [31] applied a weighting approach to investigate projections of summer maximum temperature across north America. Most of these methods rely on sophisticated approaches and require enormous computational effort, yet there is no universal agreement on the robustness of their outcomes. Furthermore, it is challenging to integrate a local setting in selecting the process of these models for impact studies.
In this study, we introduce a novel method to rank downscaled GCMs based on detection sensitivity for a series of heat wave properties. We analyze eight "time" and "intensity" related heat wave properties from 1950 to 2005 in 10 different locations across the CONUS (as described in Section 2.2.). Then, we use 32 downscaled GCMs and the ensemble median GCM (hereafter Median_GCM) for each observation station and analyze them for the same eight heat wave properties. In every location, we compare the observation and model results for each heat wave property using Nash-Sutcliffe efficiency (NSE) coefficient [54]. Then we use a multi-criteria decision-making (MCDM) tool to rank the models by location. The findings of this research introduce a simple ranking method for GCMs based on the agreement between analyses of local historical heat wave components. This approach is directed to help policymakers and managers to best prepare for future extreme events in their jurisdictions by focusing on more reliable projections of future heatwaves.

Study Area and Data
This study focuses on ten cities in the CONUS with different climates; Baltimore, MD, Bismarck, ND, Colorado Springs, CO, Dallas, TX, Des Moines, IA, Miami, FL, New York City, NY, Phoenix, AZ, Portland, OR and Syracuse, NY ( Figure 1; city names are followed by the abbreviation of State names in the USA). We obtain historical daily weather data (minimum and maximum temperatures) for each location from the U.S. National Centers for Environmental Information (NCEI) for 1950-2005. Then, we obtain downscaled historical simulations from 32 GCMs, for grid cells (hereafter pixel) corresponding to these ten weather stations which were conducted using the Localized Constructed Analogs (LOCA) method [27]. The LOCA downscaled products are available for historical (i.e., retrospective analysis for 1950-2005) and future (2006-2100) periods for intermediate (RCP4.5) and high (RCP8.5) radiative scenarios at a 1/16 degree resolution (~6 km × 6 km). LOCA method downscales GCMs using the Livneh data set and the latter data set is model-derived from observed data developed for the Variable Infiltration Capacity (VIC) simulations over north America [27]. Livneh dataset is constructed based on more than 20,000 National Oceanic and Atmospheric Administration (NOAA) Cooperative Observer Network (COOP) stations across the CONUS [55]. LOCA method downscales the daily GCMs maximum and minimum temperature outputs (i.e., retrospective analysis for 1950-2005) point-by-point from a single best match analog day, based on the observed data for the same date during 1950-2005 [27]. Accordingly, this downscaled dataset with a temporal correspondence between daily retrospective analysis and observations  is a suitable source for the study of extreme events [56]. For example, LOCA dataset was used widely in different climate extreme events' studies, such as heatwaves [57], droughts [58], extreme precipitation events [59] and hydrologic modeling [60]. In addition, we calculate the median daily maximum and minimum temperature values of 32 models in each location, Median_GCM, to compare the performance of that with the GCMs. Furthermore, we examine the capability of LOCA dataset and Median_GCM to reproduce historical  mean climate variables in each location:

1.
We compare the average long-term maximum and minimum daily temperatures obtained from the observation and models.

2.
We investigate differences between the observation and models for maximum and minimum daily temperatures with NSE coefficients.
The results show the LOCA dataset is capable of reproducing daily maximum and minimum temperature accurately (details are shown in Appendix A, Tables A1-A4). In addition, Table 1 shows these 32 GCMs' general information.

Heat Wave Definition and Components
There is no universal heat wave definition, and many debates exist on the parameters that should be included in the heat wave definition [62,63]. In this study, we define a heat wave as an event that has at least two consecutive days with minimum and maximum daily temperatures greater than 90th percentiles of the minimum and maximum daily temperatures in that location during 1950-2005 (thresholds), respectively. This definition has been widely used before in heat wave studies [64][65][66]. Following traditional heat wave components, including frequency, intensity, duration and timing [62] and in accordance with our previous research, we have defined the succeeding eight heat wave components that are computable in each pixel based on the available climatological data [66]. In addition, we use a developed R code to calculate these heat wave properties for each location for 1950-2005 [67]. Two examples: 1. A heat wave of two consecutive days with the minimum and maximum daily temperatures of 30 • C, 35 • C, 35 • C and 42 • C at one station with defined thresholds of 28 • C and 33 • C, respectively, has the daytime heat wave intensity and nighttime heat wave intensity of 11 • C and 9 • C, respectively. 2. If in a given year, there are three consecutive hot days in a month, 2 consecutive hot days two months thereafter, followed by 1 hot day four months later, then Days is 6 (i.e., 3 + 2 + 1 = 6), the waves is 2 (i.e., only first two events are heat waves), the total is 5 (i.e., only length of heatwaves, 3 + 2 = 5), and the longest is 3 (i.e., length of the first event).
We analyze the historical observed and 32 downscaled simulated GCM and Median_GCM data in each location during 1950-2005 to compare the capability of these models in the simulation of the heat wave components based on the NSE coefficient. For this purpose, we first calculate eight heatwave components time series based on observations in each location. Then, we repeat the same analysis with each downscaled GCM and the Median_GCM daily data for the same heatwave components in each location. Then, we use NSE coefficient to compare the results from observations and each of the models' outputs for eight different heatwave components and in ten different locations. The models show different coefficients for each of the eight heatwave components in each of 10 locations (Appendix A, Tables A5-A12). Accordingly, it is not possible to rank the performance of the downscaled GCMs based on simulations of eight different heatwave components in these locations. Therefore, we apply a multi-criteria decision-making (MCDM) approach to rank the downscaled GCMs based on their performance in each location, simultaneously and considering all eight heat wave properties.

Multi Criteria Decision Making
Multi-criteria decision-making (MCDM) methods solve complex problems with multiple competing criteria and no optimal solution, in order to meet the user preferences or a procedural goal [68]. MCDM problems have five main components, including goal, decision-maker priorities, available alternatives, decision criteria and outcomes or ranks [69,70]. The most popular MCDM methods include weighted sum model, weighted product model, ELECTRE, TOPSIS, MAUT, PROMETHEE, VIKOR and AHP [69]. Each model has strengths and limitations and one should carefully select a method based on the characteristics of the model and available data [71]. For this study, we select the TOPSIS method for the simple method of computation and no restrictions on the number of alternatives or criteria [72]. Similarly, the criteria can be weighted independently with no limitation of the range of values [73]. Therefore, the heat wave property information (i.e., heat wave properties) could be correlated without impacting the process of making a decision [74]. This method has been widely used in different scientific fields including, natural hazards, atmospheric sciences and environmental engineering [68,75,76]. We describe the TOPSIS method below and use it to find the five best and five least fit downscaled GCMs based on the NSE coefficient between heat wave properties obtained from each model and observed data [68,77,78]: Calculate of the decision matrix, D, including alternatives A i (for i = 1 to m, which is the number of models) and Criteria, C j , which is NSE between each downscaled GCM and observation for that particular heat wave property (for j = 1 to n, which is the number of heat wave properties): Normalize of the elements in the decision matrix for each criterion: 3.
Calculate of the weighted normalized decision matrix values: where W j is the weight of each criterion that highlights the importance of that criteria (i.e., each of heat wave components) and i=n i=1 W j = 1. In this study, we assume the same values of W j for the heat wave components importance in the selection of a GCM. 4.
Find the best and least fit (or ideal and negative-ideal) solutions for each criterion: where J is associated with benefit criteria and J is associated with negative criteria. In this study, a higher NSE coefficient is associated with benefit (or better choice).

5.
Calculate the distance from best and least fit ideal solutions for each alternative using Euclidean distance method: 6.
Compute the relative closeness to the ideal solution, which can be the best or least fit outcome, based on the goal: Rank each alternative (A i ) based on the calculated relative closeness to the ideal solution (C * i ).
This method sorts downscaled GCMs (i.e., alternatives) based on the NSE coefficient by assuming an equal weight of importance for heat wave components. However, the previously described method to distribute weights among eight heat wave components [66] may vary in future studies addressing the sensitivity of this method to weights of selected criteria.

Results
First, we analyze the NSE coefficient for eight heatwave components based on observational data and output from 32 downscaled GCMs and Median_GCM to identify the best-fit models (Appendix A, Tables A5-A12). Table 2 shows the best and least fit models for each heatwave property across the 10 study sites. EC-EARTH is the best model for Days in Baltimore, Portland and Syracuse. Interestingly, eight different downscaled GCMs are identified as the best models for waves and EC-EARTH is among them for the city of Syracuse. Then, EC-EARTH is the best-fit for total in Baltimore, Bismarck and Syracuse. Similar to waves, eight different downscaled GCMs are selected as the best models for the longest heat wave events across the 10 study sites. EC-EARTH is also the best model for Intensity for Baltimore, NYC and Syracuse, while it remains the best for Night only for Baltimore and NYC. This model also is the best for the timing component of heatwaves (First and Last) in Miami, Phoenix, Syracuse and NYC. An interesting result of this study is that in Syracuse, EC-EARTH is the best model for six heatwave components and in Colorado Springs, we observe CMCC-CM remains the best for three heatwave components. In addition, we found Median_GCM as the best model for the temperature based components of heatwaves, Intensity and/or Night, in Bismarck, Colorado Springs, Dallas and Des Moines.
A similar analysis for least fit downscaled GCMs identifies ACCESS1-0, BCC-CSM1-1, HADGEM2-CC, MIROC5, FGOALS-G2 and MICRO-ESM-CHEM among the poorest GCMs across the study sites in the simulation of historical heatwave components. Contrary to the results above, EC-EARTH is least fit for simulation of historical Days and First for Phoenix and Portland, respectively. Surprisingly, EC-EARTH acts as the best fit model for First and least fit for Days in Phoenix and as the best for Days and least fit for First in Portland. Although this analysis shows which downscaled GCM is preferred for any heatwave component across the study sites, we expand the analysis with the MCDM to rank the downscaled GCMs based on overall accuracy in simulations of historical heatwave events.
We apply the TOPSIS method to sort the models based on their overall accuracy in the simulation of historical heatwave properties, as discussed in Section 2.3 (C * i see Appendix A, Table A13). Accordingly, the EC-EARTH model was selected as the best model (ranked 1 among 33 models) for Baltimore, Bismarck, NYC and Syracuse weather stations. Considering the rank of this model for individual heatwave components across these cities, this result was predictable for Baltimore, NYC and Syracuse. In Colorado Springs, Dallas, Des Moines, Miami, Phoenix and Portland, we found the CMCC-CM, ACCESS1-0, GFDL-ESM2M M, GISS-E2-R, CCSM4 and MPI-ESM-LR as the best downscaled GCMs, respectively. Interestingly, Median_GCM is selected as the 5th best model for Colorado Springs based on its performances in simulation of daytime and nighttime heatwave intensity in this location. In addition, this model is among the top 10 preferred models for Bismarck, Dallas and Des Moines. Considering the overall downscaled GCMs rank, being among the top 5 models-and avoiding being among five least fit models-we suggest the EC-EARTH, MPI-ESM-LR, CANESM2, CMCC-CM and GFDL-ESM2M M as the five best downscaled GCMs for heat wave studies in the CONUS.
In contrast, we find the FGOALS-G2, HADGEM2-CC, ACCESS1-0, MIROC-ESM-CHEM, ACCESS1-3, CESM1-CAM5, MIROC5, GFDL-ESM2M M, HADGEM2-CC and CMCC-CM as the least fit GCMs (ranked 33 among 33 models) for heat wave studies in Baltimore, Bismarck, Colorado Springs, Dallas, Des Moines, Miami, NYC, Phoenix, Portland and Syracuse, respectively. Median_GCM, which was selected among the top 10 models for Bismarck, Colorado Springs, Dallas and Des Moines, is also among the models with rank 12 to 25 for the other locations. We observe a few models have disparate performance across sites. For example, ACCESS1-0 is selected as the best for Dallas and least fit for Colorado Springs. Similarly, we find the FGOALS-G2, GFDL-ESM2G, HADGEM2-CC, ACCESS1-0 and INMCM4 GCMs as the five models with the poorest performance in the simulation of historical heat wave properties of ten study sites. Table 3 summarizes the rank of each downscaled GCM for the study sites considering the model performances in simulation of historical heat wave properties and a multi-criteria decision-making approach. Hereafter we use the abbreviated names for the cities within the tables including BAL for Baltimore, BIS for Bismarck, COL for Colorado Springs, DAL for Dallas, DES for Des Moines, MIA for Miami, NYC for New York City, PHO for Phoenix, POR for Portland, and SYR for Syracuse.

Discussion
We demonstrate a wide range of coherence between GCM and key heat wave features across space and through time. Geil et al. [52] prioritized six models based on one criterion, GCM capability to represent the daily precipitation of the north American Monsoon, including CNRM-CM5, CSIRO Mk3.6.0, HADGEM2-CC, HADGEM2-ES, IPSL-CM5A-LR and IPSL-CM5A-MR. We expand that approach, with several criteria to prioritize downscaled GCMs in this study. Whereas, this new approach did not select the abovementioned GCMs as the best models for heat wave studies across the CONUS, they remain among the selected models for individual study sites. Navarro-Estupiñan et al. [79] used the same six models suggested by Geil et al. [52] for the projection of extreme heat events in Sonora, Mexico. Although we argue that their selection for heatwave study should be based on GCMs suitable for simulation of historical heat waves rather than precipitation, we support their approach, which is more robust than GCM averaging. A strength of these six models is the ability to simulate precipitation, which is an important climate variable in the region. Phoenix is the closest site to Sonora and interestingly, we proposed two common models for heat wave studies, including HADGEM2-CC and CSIRO-MK3-6-0. Accordingly, it seems local climate settings are important components for performance of GCMs.
We find that the performance of the EC-EARTH model is the best fit for historical heat wave properties across the four locations, including Syracuse, NYC, Baltimore and Bismarck. These sites have similar Köppen climate classifications. NYC and Baltimore share Cfa (humid subtropical climate) and Syracuse and Baltimore share a similar Dfb (humid continental mild summer, wet all year) climate type [80]. Although the investigation of the structure and data analysis of these 32 downscaled GCMs is not in the scope of this study, we briefly describe the EC-EARTH model. This model is an Earth System Model (ESM) built based on European Center for medium-range weather forecasts [81]. This model benefits both weather forecasting and climate change projection components simultaneously and is among "seamless prediction" models [39]. In addition, these models incorporate biogeochemical and human processes with the earth's physical properties and climate projections [40]. The performance of this model has been tested in different settings and for various purposes with satisfying results for temperature prediction [82], tropical upper tropospheric water concentration [40], Arctic energy budget [83] and Arctic climate change [84]. Although the reliability of a climate model depends on the selection of model components, EC-EARTH benefits a robust ocean modeling component and it is a strong choice for extreme climate studies [82,[84][85][86][87].
Our analysis indicates that model assembly changes output characteristics relative to a single model, as suggested by Knutti et al. [35], challenging the plausibility of ensemble modeling for climate projections. Median_GCM is ranked under 10 of 33 models for the sites with a continental climate, including Bismarck, Colorado Springs, Dallas and Des Moines. This ensemble shows less reliability for coastal cities, where other individual downscaled GCMs provide more accurate simulations. This observation highlights the importance of selecting those models with a better performance in their ocean modeling components for the study of extreme events in coastal areas [84,85,88].
There are several limitations to the current research that arise from heat wave definition method and the validity of downscaled climate data. The MCDM tool ranks 32 downscaled GCMs and one ensemble median GCM based on the heat wave properties, all connected to heat wave definition. As mentioned before, there are many definitions for heat waves and accordingly, repeating this ranking process using another definition (e.g., a longer period of consecutive days or higher thresholds) could potentially change the rank of downscaled GCMs for any of the locations. Examining the sensitivity of this method to different heat wave definitions is not in the scope of this research. A caveat to the use of downscaled climate data are awareness of uncertainty propagation during downscaling. The use of dynamical downscaling methods (i.e., Regional climate models) could address this problem; however, limited availability of RCMs or dynamically downscaled data across many regions challenges the substitution of LOCA data set with them. Despite this limitation, the comparison of downscaled retrospective LOCA data sets with various land-based observation data are a common method for evaluating the performance of GCMs for many purposes [57][58][59][60]89].
Although we base this research on the downscaled GCMs instead of GCMs, we highlight the following benefits of this work for future improvements in GCMs:

1.
We demonstrated how downscaled GCMs represent a better or least fit performance across the cities with similar climates. We observe that this variation in the performance of models has a direct link to the structure of the model(s).

2.
We demonstrated a few models with previously known robust ocean modeling components simulated historical heatwaves in coastal cities better than other downscaled GCMs. This observation encourages their approach and also sheds light on the possible improvement opportunities for the other models.
Eventually, this ranking is a function of allocated weights for heatwave components, for which we assume equal values.

Conclusions
We examine 32 CMIP5 coupled general circulation models to determine how well these models represent the heat wave properties at 10 measurement stations across CONUS for 1950-2005. In this analysis, we compare the observed data and downscaled historical LOCA meteorological products that cover the same weather stations. The results show that the performance of each model varies significantly across the CONUS. We then use a Multi-criteria decision-making approach to conclude the performance of each downscaled GCM, considering eight predefined heat wave components to show the five best and least fit models for each location. We also select the five best and five least fit models based on their overall performance across the CONUS. Similar to Parker [90], we conclude that the findings of this study simply show "adequacy-for-purpose" of "predicting those farther-further values of" temperature data and heat wave occurrences. Nevertheless, the active discussion still remains on the topics of parameter selection, metrics and assessment methods [53].
If we assume that the best model predictions presented here are robust, reliable and locally available, they may inform the assessment of actions to ameliorate the impact of future heat waves [28,29]. The method developed here provides a pathway to define other approaches for prioritizing global climate models based on the local needs, with the goal of providing more accurate and on-time local adaption and mitigation plans.

Acknowledgments:
The authors would like to thank three anonymous reviewers and the editor for their comprehensive review and productive comments.

Conflicts of Interest:
The authors declare no conflict of interest.  Table A3. NSE coefficient of maximum temperature between observed and simulated data .  Table A4. NSE coefficient of minimum temperature between observed and simulated data .  Table A5. NSE coefficient of Days between observed and simulated data .    Table A6. NSE coefficient of waves between observed and simulated data .