Understanding Urban Heat Vulnerability Assessment Methods: A PRISMA Review

Increasingly, people, especially those living in urban areas affected by the urban heat island effect, are being exposed to extreme heat due to ongoing global warming. A number of methods have been developed to assess urban heat vulnerability in different locations across the world, with a focus on diverse aspects of these methods. While the literature is growing, thorough review studies that compare, contrast, and help understand the prospects and constraints of urban heat vulnerability assessment methods are scarce. This paper aims to bridge this gap in the literature. A systematic literature review following the preferred reporting items for systematic reviews and meta-analyses (PRISMA) approach is used as the methodological approach. PRISMA is an evidence-based minimum set of items for reporting in systematic reviews and meta-analyses. The results are analyzed in three aspects, i.e., indicators and data, modelling approaches, and validation approaches. The main findings reveal that: (a) three types of indicators are commonly used, i.e., demographic properties and socioeconomic status, health conditions and medical resources, and natural and built environmental factors; (b) heat vulnerability indexing models, the equal weighting method, and principal component analysis are commonly used in modelling and weighting approaches; (c) statistical regressions and correlation coefficients between heat vulnerability results and adverse health outcomes are commonly used in validation approaches, but the performance varies across studies. This study informs urban policy and generates directions for prospective research and for the development of more accurate vulnerability assessment methods.


Introduction
In recent years, with the growing impact of climate change, urban heat has become a major issue for many cities [1][2][3]. Extreme heat events are becoming more frequent and intense due to climate change, which has directly caused a substantial increase in heat-related morbidity and mortality [4][5][6][7]. This inevitably places an extra burden on medical systems and national finances [8][9][10]. Meanwhile, the urban heat island (UHI) effect has been exacerbating the consequences of increased extreme heat in metropolitan areas [11,12]. Hence, it is urgent for local governments to locate areas of harmful heat and identify the characteristics of vulnerable populations.
An increasing number of methods have been developed to assess urban heat vulnerability in different locations across the world, focusing on diverse aspects [13][14][15], with the aim of helping local urban planning departments develop scientific and effective plans and policies for mitigating the impacts of extreme heat [11,16,17]. Before 2010, only a few studies [18,19] modelled urban heat vulnerability, and only in limited locations. In 2009, Reid et al. [20] proposed a pioneering heat vulnerability index (HVI), with which they mapped heat vulnerability at the census-tract level and identified potential areas where mitigation interventions were urgently needed. Since then, a substantial body of related research has been conducted, especially during the last five years or so [21].
Various heat vulnerability models have been built by combining different types of heat-related indicators to identify vulnerable populations and areas exposed to increased extreme heat [22,23] and to assess their spatiotemporal distributions in order to provide directions for policymakers [24,25]. Boumans et al. [13] developed a modelling and support platform for climate change and applied it to examine how heat stress influenced heat-related illness and death. El-Zein & Tonmoy [26] employed 22 heat-related indicators in a multicriteria outranking model to assess vulnerability to increased extreme heat in 15 government areas of Sydney. Estoque et al. [27] used seven environmental and social-ecological indicators to assess heat health risks in 139 Philippine cities during hot, dry seasons.
Nonetheless, there are no established criteria for indicator selection in terms of type or number, because indicator selection relies critically on local contexts [28,29]. Local characteristics of population, infrastructure, and ecosystems play an essential role in heat vulnerability assessment. Additionally, the indicators used for model development and validation have been constrained by data availability and by researchers' subjective judgment [30,31]. In the absence of generic references, modelling and validation methods have also differed across heat vulnerability assessments. These facts hinder comparative studies and the application of heat vulnerability models [32,33]. To date, there is no widely acknowledged standardized system for heat vulnerability assessment. Therefore, a systematic review is urgently needed to identify currently available heat vulnerability assessment methods and their capabilities.
As underlined above, although there is growing literature on heat vulnerability assessment, thorough review studies which compare, contrast, and help understand the prospects and constraints of urban heat vulnerability assessment methods are scarce. Against this brief backdrop, the study at hand aims to fill this gap through a systematic review. The rest of the paper is structured as follows: Section 2 introduces the methodological approach, Section 3 presents the results, Section 4 discusses the findings, and Section 5 concludes the paper.

Materials and Methods
This study aimed to address the research question 'what are the methods to assess urban heat vulnerability?' by undertaking a systematic review of the literature with the preferred reporting items for systematic reviews and meta-analyses (PRISMA) approach. PRISMA "is an evidence-based minimum set of items for reporting in systematic reviews and meta-analyses" [34]. PRISMA is primarily used in reporting reviews that evaluate the effects of interventions. It is also broadly applied as a basic framework for reporting systematic reviews with other objectives (e.g., evaluating etiology, prevalence, diagnosis, or prognosis). Following the lead of Regona et al.'s PRISMA review study [35], a three-phase approach, including planning, review, and reporting, was selected as the methodological approach for the review.
At the planning phase, the research objective, research question, keywords, and inclusion and exclusion criteria were developed (Table 1). The research aim was framed to produce useful findings that form a deeper understanding of the prospects and constraints of urban heat vulnerability assessment methods. The inclusion criteria covered peer-reviewed English-language journal articles available online that were relevant to the research aim. An academic search engine provided by the library of Queensland University of Technology, which offers access to all major databases, was used to conduct the online search. The search was carried out in April 2022 using the following query string: ((title OR abstract OR keyword) contains "urban heat" AND vulnerab* AND (measur* OR assess* OR model* OR survey* OR map* OR evaluat* OR indicat* OR estimat* OR analy* OR method* OR frame*)) to search the titles, keywords, and abstracts of relevant publications, i.e., journal articles. The publication date range was intentionally left open to include as much literature as possible in the review. If the abstract of the article
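The logic of the Boolean query above can be illustrated with a small local filter. The Python sketch below is an assumption about how such a query behaves, not a description of the QUT search engine: it treats `*` as a truncation wildcard matching any word ending, treats quoted text as an exact phrase, and approximates the `(title OR abstract OR keyword)` scope by searching the three fields together. The names `term_matches` and `matches_query` are hypothetical.

```python
import re
from typing import Sequence

# Truncated method terms from the query string; "*" matches any word ending.
METHOD_TERMS = ["measur*", "assess*", "model*", "survey*", "map*", "evaluat*",
                "indicat*", "estimat*", "analy*", "method*", "frame*"]


def term_matches(term: str, text: str) -> bool:
    """Case-insensitive match of an exact phrase or a truncated (wildcard) term."""
    if term.endswith("*"):
        # e.g. "vulnerab*" -> a word starting with "vulnerab"
        pattern = r"\b" + re.escape(term[:-1]) + r"\w*"
    else:
        # exact word or phrase, bounded on both sides
        pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matches_query(title: str, abstract: str, keywords: Sequence[str]) -> bool:
    """Approximates: "urban heat" AND vulnerab* AND (any method term)."""
    # The " | " separator stops a phrase from matching across two fields.
    text = " | ".join([title, abstract, "; ".join(keywords)])
    return (term_matches("urban heat", text)
            and term_matches("vulnerab*", text)
            and any(term_matches(t, text) for t in METHOD_TERMS))


print(matches_query("Mapping urban heat vulnerability", "", []))
print(matches_query("Urban heat island intensity", "Vulnerable groups", []))
```

The second record mentions urban heat and vulnerable groups but none of the method terms, so it would be excluded under this reading of the query.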

Categorization Criteria
1. Determine the literature associated with the research aim using the eyeballing technique.
2. Identify the potential literature focusing on methods to assess urban heat vulnerability after full-text reading.
3. Group the identified types, application areas, and effectiveness with similarities into broad categories.
4. Narrow down the primary categories and crosscheck their reliability against other published literature.
5. Conduct a final review of the selected literature and update the shortlisted categories if necessary.
6. Confirm and finalize the categories selected for the classification of literature.
7. Group the literature selected for the review under the selected categories.

The final reporting phase concerned the reporting of the results and findings identified by the systematic review of the final 76 journal articles. A list of these papers is provided in Appendix A. The prospects and constraints of urban heat vulnerability assessment methods are then discussed.

General Observations
The selected articles were categorized by publication year (Figure 2) to analyze the trend of related publications before 2022, which disclosed how interest in heat vulnerability assessment had changed during the last two decades. This review also classified the selected articles by research region and climate (Figures 3 and 4) because geographic location and climate directly influence research results. Most studies were conducted in North America (27 articles, 36%) and Asia (23 articles, 30%), followed by Europe (14 articles, 18%). Oceania and South America were the focus of four and five research articles, respectively, while only two studies were conducted in Africa. The most frequently studied countries were the United States (22 articles, 29%), China (11 articles, 14%), and Canada and Australia (5 articles, 7% each). Publications were grouped using the Köppen climate classification [36,37], one of the most widely used climate classifications. As shown in Figure 4, over 60% of the reviewed articles were conducted in humid subtropical climates (Cfa, 30 articles, 39%) and temperate oceanic climates (Cfb, 17 articles, 22%). Another 16 climate types were represented, each in one to eight articles.
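As a rough illustration of where the Cfa/Cfb boundary lies, the sketch below applies the temperature thresholds of the Köppen scheme as they are commonly stated (temperate when the coldest month is between 0 and 18 °C and the warmest month exceeds 10 °C; "a" when the warmest month reaches 22 °C; "b" when at least four months reach 10 °C). It is a simplified assumption: the arid (B) test and the precipitation-based second letter (f/s/w) are omitted, and the function name and monthly values are hypothetical.

```python
from typing import Optional, Sequence


def temperate_koppen_subtype(monthly_mean_temp_c: Sequence[float]) -> Optional[str]:
    """Return the third Köppen letter ('a', 'b' or 'c') for a temperate climate.

    Only temperature thresholds are checked; the arid (B) test and the
    precipitation-based second letter are deliberately left out.
    """
    temps = list(monthly_mean_temp_c)
    if len(temps) != 12:
        raise ValueError("expected 12 monthly mean temperatures")
    t_hot, t_cold = max(temps), min(temps)
    # Temperate (C): coldest month between 0 and 18 °C, warmest month above 10 °C
    if not (0 < t_cold < 18 and t_hot > 10):
        return None
    if t_hot >= 22:
        return "a"  # hot summer, as in Cfa
    months_at_least_10 = sum(t >= 10 for t in temps)
    return "b" if months_at_least_10 >= 4 else "c"  # warm (Cfb) or cold (Cfc) summer


# Hypothetical monthly means (°C) for a humid subtropical and an oceanic city
subtropical = [5, 6, 10, 15, 20, 24, 28, 28, 24, 18, 12, 7]
oceanic = [5, 5, 7, 9, 13, 16, 18, 18, 15, 12, 8, 5]
print(temperate_koppen_subtype(subtropical), temperate_koppen_subtype(oceanic))
```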

Indicators and Data
Heat-related indicators reflecting human and environmental characteristics are essential for the construction of heat vulnerability models. The first step of model construction is to select indicators associated with heat vulnerability and to collect available data that can quantitatively measure levels of vulnerability to increased extreme heat. This review collected the indicators reported in the selected articles and grouped them into three categories: demographic and socioeconomic characteristics, health conditions, and environmental factors. Environmental factors were further divided into natural and built subcategories. Table 3 summarizes the common indicators and data sources. Demographic and socioeconomic characteristics reflect the sensitivity and adaptive capacity of populations who have suffered or may suffer heat hazards, for example, through their physiological condition, access to adequate resources, and living environment. A total of 17 demographic and socioeconomic indicators were used in the reviewed heat vulnerability studies (Table 3).
The five most frequently used indicators in the reviewed studies were age, economic status, social isolation, education, and population density. Age was the most widely used demographic indicator, considered by 61 articles (an 80% usage rate). It was often represented by the percentage of the population aged over 65 or under five years. These groups are more vulnerable during extreme heat events because of their declining or immature thermoregulatory ability [38,39]. Heat-related morbidity, mortality, and hospital admissions are higher among young and elderly people on days with high temperatures [40,41]. Moreover, chronic diseases among the elderly and the immature immune systems of children are likely to further increase their sensitivity to extreme thermal environments [42].
The second most considered indicator was economic status, used in 56 articles (a 74% usage rate). This indicator reflects the capacity of individuals and governments to mitigate the influence of extreme heat and covers both personal economic status and local financial status. Personal economic status was usually depicted by income level, while a few studies employed the percentage of the population receiving adult disability benefits and pensions. People with incomes below the poverty line are more likely to experience high levels of heat stress and heat-related disease [43] because they may not be able to afford cooling and medical services. A total of 39 studies (51%) used education as an indicator for heat vulnerability detection. It has been argued that people with lower educational attainment experience higher heat-related mortality because they are more likely to have low incomes and to work in environments without adequate thermal protection [44].
Social isolation (42 articles, 55%) has been identified as one of the most significant indicators of heat vulnerability and is often delineated by the percentage of the population living alone (sometimes combined with age and gender). People living alone, especially the elderly, are more vulnerable during extreme heat events as they may have a limited ability to deal with emergencies and may not receive timely support [42,45]. Population density (32 articles, 42%) is a typical indicator of heat exposure and is often combined with temperature data to measure the number and spatial distribution of people vulnerable to heat. Race (29 articles, 38%) is included in heat vulnerability models because racial disparities contribute to heat vulnerability. Minority groups with different cultural backgrounds often lack access to resources and are more likely to be socioeconomically and politically marginalized [46,47].
Other indicators used in more than 10 studies included employment, housing condition, air conditioning, and language ability. Less frequently reported indicators included vehicle availability, internet availability, household facilities, frequency/duration of outdoor activities, and water/electricity supply. In recent years, some studies have begun to model heat risk with composite indexes built from age, economic status, and other demographic and socioeconomic indicators, such as the Human Development Index [48] and the Human Settlement Index [49,50].
In most studies, demographic and socioeconomic data were derived from national census statistics, such as China's Sixth (2010) National Census Dataset [51] and the American Community Survey [29,52], and from local official socioeconomic datasets or public thematic datasets, such as the Beijing Statistical Yearbook [53] and WorldPop Global Population Data [54].

Health Conditions
Health condition indicators can be classified into two categories, i.e., personal health conditions and the availability of medical and healthcare resources. Personal physical and mental conditions, which include personal health status, disability, natality, and mortality, can significantly influence an individual's sensitivity to heat during extreme heat events. Personal illness status was the most employed health condition indicator, with 22 related articles (a 29% usage rate). It is usually represented by the percentage of the population with a pre-existing illness, because illness, especially chronic disease (e.g., diabetes, asthma, hypertension, obesity, and heart disease), can increase individuals' heat vulnerability [55,56].
People with chronic diseases often have a limited capacity to respond to a changing thermal environment. People with disabilities face a similar situation, so disability (11 articles, 14%), captured by the percentage of the population with a disability, is also a frequently used indicator. Five studies (7%) used birth or death rates to reflect the effect of heat on human health. Specifically, one of them employed the infant mortality rate [57] as a health-related heat vulnerability indicator, and another selected the birth rate [58].
The availability of medical and healthcare resources, which consists of medical infrastructure, healthcare services, and health insurance, can greatly affect an individual's adaptive capacity to heat. Medical infrastructure (18 articles, 24%) is the second most frequently considered health condition indicator, typically represented by the number of medical workers, facilities, or institutions. Some studies calculated distances and travel times [23,24,58] to the nearest medical institution as proxies for the availability of public health resources. The less frequently used indicators are healthcare services (5 articles, 7%) and health insurance (3 articles, 4%), both of which reflect the external support and assurance [59,60] available to people dealing with extreme heat.
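The distance-based accessibility proxy described above can be sketched as the great-circle distance from each spatial unit's centroid to its closest medical facility. This is a simplifying assumption: the reviewed studies may have used road-network distances or travel times from online map platforms, which this sketch does not reproduce, and the facility coordinates (e.g., taken from point-of-interest data) and function names are hypothetical.

```python
import math
from typing import Iterable, Tuple

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; the Earth is treated as a sphere


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_facility_km(centroid: Tuple[float, float],
                        facilities: Iterable[Tuple[float, float]]) -> float:
    """Straight-line distance from a unit's centroid to the closest facility."""
    lat, lon = centroid
    return min(haversine_km(lat, lon, f_lat, f_lon) for f_lat, f_lon in facilities)


# Hypothetical census-unit centroid and two hospital locations
print(round(nearest_facility_km((0.0, 0.0), [(0.0, 1.0), (0.0, 0.5)]), 2))
```

Larger values indicate poorer access and would therefore lower the adaptive-capacity score of that unit.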
Health condition data were mainly collected from datasets released by national and local health departments and medical institutions, such as datasets published by the Ministry of Health Malaysia [61] and Toronto Public Health (2009) [62]. Data on distances to medical resources were derived from point-of-interest data and online map platforms [54,63].

Environmental Factors
Environmental factors are directly associated with harmful heat and related heat outcomes [21]. They not only reveal heat intensity directly, e.g., through land surface temperature (LST), air temperature, and heat duration (such as the number of days or frequency of heat events), but also reflect the capacity of the environment to aggravate or reduce heat impacts, e.g., through vegetation cover and accessibility to cooling spaces. Environmental indicators can be summarized into two categories: natural environment and urban environment.
The three most common natural environment indicators were LST (38 articles, 50%), vegetation cover (23 articles, 30%), and air temperature and heat duration (20 articles, 26%). Landsat and MODIS satellite images with resolutions of 30 m, 60 m, and 1 km were employed to derive daytime and night-time LST during the study period. Daytime and night-time mean, maximum, and minimum air temperatures were collected from meteorological observation stations [62,64]. Combining surface and air temperatures characterizes the spatial distribution of heat near the ground more comprehensively than either measure alone.
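One common way to derive LST from a Landsat 8 thermal band is a single-channel calculation: convert top-of-atmosphere radiance to brightness temperature with the band's K1/K2 thermal constants, then correct for surface emissivity. The sketch below is an assumption about one such workflow, not the method of any particular reviewed study. It skips atmospheric correction, uses the Band 10 constants commonly listed in scene MTL metadata (verify against your own scene), and takes emissivity as an input that in practice is often estimated from vegetation indices. The pixel values are hypothetical.

```python
import math

# Band 10 thermal constants as commonly listed in Landsat 8 MTL metadata;
# check them against the metadata of the scene actually used.
K1 = 774.8853   # W m^-2 sr^-1 µm^-1
K2 = 1321.0789  # K
WAVELENGTH_UM = 10.895  # assumed effective wavelength of Band 10 (µm)
RHO_UM_K = 14388.0      # h * c / k_B expressed in µm·K


def brightness_temperature_k(radiance: float) -> float:
    """At-sensor brightness temperature (K) from top-of-atmosphere spectral radiance."""
    return K2 / math.log(K1 / radiance + 1.0)


def land_surface_temperature_k(radiance: float, emissivity: float) -> float:
    """Emissivity-corrected LST = BT / (1 + (λ·BT/ρ)·ln ε); no atmospheric correction."""
    bt = brightness_temperature_k(radiance)
    return bt / (1.0 + (WAVELENGTH_UM * bt / RHO_UM_K) * math.log(emissivity))


# Hypothetical pixel: radiance 10.0 W m^-2 sr^-1 µm^-1, emissivity 0.97
lst_celsius = land_surface_temperature_k(10.0, 0.97) - 273.15
print(round(lst_celsius, 1))
```

Because ln ε is negative for ε < 1, the correction always raises the surface temperature above the brightness temperature.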
Vegetation cover plays an important role in assessing a population's heat vulnerability because vegetation cools the surrounding environment through evapotranspiration and shading. The abundance of vegetation determines, to a certain extent, the local capacity to adjust to extreme heat [65,66]. MODIS satellite products were used to capture vegetation cover, for example through the Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), and Fractional Vegetation Cover (FVC) [67][68][69]. Less frequently used natural environment indicators are humidity, heat events, thermal radiation and heat flux, air condition, and weather.
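NDVI and FVC are simple per-pixel calculations. The sketch below uses the standard NDVI definition and a linear scaled-NDVI estimate of FVC; the bare-soil and full-vegetation NDVI thresholds are hypothetical placeholders that are normally chosen per scene, and some studies use a squared scaled NDVI instead of the linear form shown here.

```python
import numpy as np


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red); pixels with a zero denominator are set to 0."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    return np.divide(nir - red, denom, out=np.zeros_like(denom), where=denom != 0)


def fractional_vegetation_cover(ndvi_values, ndvi_soil=0.2, ndvi_veg=0.86):
    """Linear scaled-NDVI estimate of FVC, clipped to [0, 1].

    ndvi_soil and ndvi_veg are scene-specific thresholds; the defaults are placeholders.
    """
    scaled = (np.asarray(ndvi_values, dtype=float) - ndvi_soil) / (ndvi_veg - ndvi_soil)
    return np.clip(scaled, 0.0, 1.0)


# Hypothetical surface reflectances for three pixels
v = ndvi([0.5, 0.3, 0.0], [0.1, 0.3, 0.0])
print(v, fractional_vegetation_cover(v))
```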
Only two studies employed a Digital Elevation Model (DEM) [49,68] as an elevation indicator in heat vulnerability measurement, because topographic relief can affect the duration and intensity of a population's heat exposure. Heat-related composite indexes built from temperature, humidity, and other weather variables, such as the Heat Index (HI), Humidex, and Wet Bulb Globe Temperature (WBGT), have been shown to correlate strongly with heat-related hospitalizations and deaths [70][71][72] and have been used in the development of heat vulnerability frameworks.
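As an example of such a composite index, Humidex combines air temperature with vapour pressure derived from the dew point. The sketch below follows the widely used Environment Canada formulation; HI and WBGT need different inputs (relative humidity, radiation, wind) and are not shown. The input values are hypothetical.

```python
import math


def humidex(air_temp_c: float, dew_point_c: float) -> float:
    """Humidex from air temperature and dew point, both in °C."""
    # Vapour pressure (hPa) derived from the dew point
    vapour_pressure = 6.11 * math.exp(
        5417.7530 * (1.0 / 273.16 - 1.0 / (273.15 + dew_point_c)))
    return air_temp_c + 0.5555 * (vapour_pressure - 10.0)


# Hypothetical afternoon: 30 °C air temperature with a 15 °C dew point
print(round(humidex(30.0, 15.0), 1))
```

A more humid afternoon at the same air temperature (higher dew point) yields a higher Humidex, which is why such indexes capture heat stress better than temperature alone.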
The three most frequently reported urban environment factors were accessibility to cooling spaces (20 articles, 26%), land cover/use (19 articles, 25%), and building information (14 articles, 18%). Green spaces, water bodies, and other cooling places can provide people with free cooling and relief from the thermal environment [73,74]. Impervious area extracted from land cover data has become one of the most typical land cover/use indicators, as it is considered to exacerbate urban heat island (UHI) effects [75].
Building information, including building density, height, and type, is another contributor to UHI and was frequently used to characterize urban heat vulnerability because of its positive association with increased temperature [76][77][78]. Other studies also took transportation, infrastructure, and urbanization into consideration [24,54,72].
Natural environmental indicators, such as LST, vegetation cover, and meteorological data, were commonly derived from Landsat and MODIS satellite images and meteorological observation datasets [56,79].Urban environmental indicators are commonly calculated from Landsat and MODIS satellite products or collected from public databases from national and local planning departments [80,81].Moreover, other satellite products from various sensors (e.g., Sentinel-2, DMSP/OLS, and NPP/VIIRS) were also utilized in some studies [15,31,49,82].

Modelling Approaches
In the modelling process, after deciding on the heat-related indicators and collecting corresponding data, the next step is to build an efficient model to measure extreme heat vulnerability, which plays a fundamental role in the whole study process.Table 4 displays a summary of information on the modelling and weighting methods and outputs.

Modelling Methods
The most frequently used modelling method was the indexing method (61 articles, 80%), which covered three popular types of heat vulnerability models. Many heat vulnerability indexes (19 articles, 25%) and heat risk indexes (15 articles, 20%) were developed through two popular conceptual frameworks, i.e., the population vulnerability framework [83] and the risk triangle framework [84]. Each framework consists of three separate components, which were represented by different types of heat-related indicators.
In the population vulnerability framework, heat vulnerability is the summation of exposure, sensitivity, and adaptive capacity. Exposure refers to the direct or indirect impacts of the thermal environment on populations. Direct heat exposure mainly comes from the natural environment and can be represented by meteorological and climatic indicators, such as daytime and night-time air temperatures and land surface temperatures [61,85]. Indirect heat exposure is mainly caused by the built environment and includes accessibility to cooling spaces, building information, and so on [59,77].
Sensitivity represents the extent to which populations are susceptible to increased extreme heat and can be reflected by demographic and socioeconomic indicators, such as age, social isolation, and economic status [86,87]. Adaptive capacity to extreme heat exposure was usually depicted by the availability of resources that reduce the risk of heat exposure, such as access to air conditioning, personal vehicles, internet services, water supply, and medical services [54,68].
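A minimal sketch of an index built on the population vulnerability framework is shown below, assuming min-max normalization of each indicator, equal weights within each component, and a simple sum of the three components. Because higher adaptive capacity should lower vulnerability, this sketch enters it as (1 − capacity); that sign convention is an assumption (many studies equivalently subtract adaptive capacity), and the indicator values are hypothetical.

```python
import numpy as np


def min_max(values):
    """Rescale an indicator to [0, 1] across spatial units."""
    x = np.asarray(values, dtype=float)
    span = x.max() - x.min()
    return np.zeros_like(x) if span == 0 else (x - x.min()) / span


def component_score(*indicators):
    """Equal-weighted mean of the normalised indicators belonging to one component."""
    return np.mean([min_max(i) for i in indicators], axis=0)


def population_vulnerability(exposure, sensitivity, adaptive_capacity):
    """Sum of the three components; capacity is inverted so higher = more vulnerable."""
    return exposure + sensitivity + (1.0 - adaptive_capacity)


# Hypothetical indicators for three spatial units
exposure = component_score([30.0, 35.0, 40.0])        # mean summer LST (°C)
sensitivity = component_score([0.10, 0.20, 0.30],     # share of population aged 65+
                              [0.05, 0.10, 0.15])     # share living alone
capacity = component_score([0.90, 0.50, 0.10])        # share of households with air conditioning
hvi = population_vulnerability(exposure, sensitivity, capacity)
print(hvi)
```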
In the risk triangle framework, heat risk is calculated by summing the components of hazard, exposure, and vulnerability. Hazard is the spatiotemporal distribution of heatwaves or extreme heat events, often depicted by mean daily maximum temperatures, the duration of heatwaves or heat events, and the number of days with extremely high temperatures [81,82].
Exposure was typically represented by land cover/use and population density, especially the density of young and elderly populations [88,89]. Vulnerability incorporates sensitivity and adaptive capacity to extreme heat exposure, similar to the corresponding components of the population vulnerability framework. Indexes built with the two frameworks were often adjusted by switching the mathematical operation used to combine the components, for example to multiplication or division [90].
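The choice of operation matters. The hypothetical two-unit example below, with all components already scaled to [0, 1], shows how summation and multiplication can rank the same units differently: under multiplication, a unit with no exposed population has zero risk however severe its hazard, whereas summation still ranks it highest.

```python
import numpy as np


def risk_additive(hazard, exposure, vulnerability):
    """Risk triangle combined by summation."""
    return np.asarray(hazard) + np.asarray(exposure) + np.asarray(vulnerability)


def risk_multiplicative(hazard, exposure, vulnerability):
    """Risk triangle combined by multiplication: any zero component gives zero risk."""
    return np.asarray(hazard) * np.asarray(exposure) * np.asarray(vulnerability)


# Two hypothetical units; unit 2 has a severe hazard but no exposed population
hazard = [0.2, 0.8]
exposure = [0.5, 0.0]
vulnerability = [0.9, 0.9]
print(risk_additive(hazard, exposure, vulnerability))        # unit 2 ranks higher
print(risk_multiplicative(hazard, exposure, vulnerability))  # unit 2 drops to zero
```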
Many studies (29 articles, 38%) developed heat vulnerability indexes that were not based on the above frameworks. These studies usually collected between two and 24 heat-related indicators and fused them into a new index. The indicators were usually categorized by their properties, for example into social, economic, and environmental categories [91]. Several studies (6 articles, 8%) used GIS techniques to map and visualize the spatial distributions of heat vulnerability.
Macintyre et al. [76] utilized GIS techniques to calculate and estimate the ambient temperatures corresponding to potential heat risk indicators, such as age, housing condition, and economic status.The other methods included composite methodologies (e.g., the multicriteria outranking framework [26] and the decision support system [92]), statistical analysis models (e.g., Poisson regression models [29]), and questionnaire/survey analytics [53].

Weighting Methods
Once the indicators and modelling methods are established, the next step is to determine the weight assigned to each indicator or category, which is essential to determine their importance or contribution to the heat vulnerability.Weight allocation follows the principle that weights correspond to the degrees of impact on heat vulnerability.Of the collected studies, the top-2 weighting methods were the equal weighting (EW) method (37 articles, 49%) and the principal component analysis (PCA) (27 articles, 36%).
EW is widely used and rests on the assumption that each indicator or category has the same influence on the subject matter [93]. Under EW, every category receives the same weight, and every indicator within a category receives the same weight. PCA is often employed in studies with numerous indicators to group them into fewer components and reduce the dimensionality of the analysis. It is worth noting that principal components calculated statistically by PCA are less readable and interpretable than the original indicators.
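The two weighting strategies can be contrasted in a few lines of NumPy. In this sketch the indicators are z-scored and assumed to be coded so that higher values mean more vulnerability. EW averages them; the PCA variant keeps components whose eigenvalue exceeds 1 (the Kaiser criterion, an assumed retention rule) and sums their scores with equal weights, similar in spirit to the PCA-then-equal-weights approach mentioned below for Song et al. Other studies instead weight components by explained variance. The data are simulated and the function names hypothetical.

```python
import numpy as np


def standardize(X):
    """Z-score each indicator column (higher values assumed to mean more vulnerability)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


def equal_weight_index(X):
    """EW: every indicator contributes the same weight."""
    return standardize(X).mean(axis=1)


def pca_index(X, min_eigenvalue=1.0):
    """PCA: keep components with eigenvalue > min_eigenvalue, then sum their scores."""
    Z = standardize(X)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]           # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > min_eigenvalue
    keep[0] = True                              # always retain the first component
    loadings = eigvecs[:, keep]
    # Eigenvector signs are arbitrary; orient each so its loadings sum to a positive value.
    loadings = loadings * np.where(loadings.sum(axis=0) < 0, -1.0, 1.0)
    return (Z @ loadings).sum(axis=1)           # equal weights on retained components


# Simulated indicators for 200 units sharing one underlying vulnerability factor
rng = np.random.default_rng(0)
factor = rng.normal(size=200)
X = np.column_stack([factor + 0.5 * rng.normal(size=200) for _ in range(4)])
print(np.corrcoef(equal_weight_index(X), pca_index(X))[0, 1])
```

When indicators share a single dominant factor, as here, the two indexes nearly coincide; they diverge when indicators load on several weakly related components, which is one reason reported comparisons differ across study areas.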
Liu et al. [94] built HVIs via EW and PCA and compared their performance in Hangzhou, China. The results revealed that the equal-weighted HVI performed better than the PCA-based HVI in terms of correlation with heat-related deaths. Nevertheless, Tate [30] demonstrated that PCA performed better than other methods, with higher precision and sensitivity to indicators and analysis scales. This suggests that the performance of weighting methods depends on context and that weighting methods should be chosen with caution.
Another method applied in weight assignment is the analytic hierarchy process (AHP; 5 articles, 7%), which derives weights from the experience and judgment of a panel of experts. The other weighting methods (7 articles, 9%) include expert weighting, slope-weighted, and factor-weighted methods. Sometimes, multiple methods are combined to obtain reasonable weights. For example, Song et al. [78] first used PCA to obtain the principal components and then allocated equal weights to them to build a composite HVI in Hong Kong, China.
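In AHP, experts compare indicators pairwise on Saaty's 1 to 9 scale, and the weights are usually taken from the principal eigenvector of the resulting reciprocal matrix, with a consistency ratio (CR) below about 0.1 treated as acceptable. The sketch below follows that standard eigenvector method using Saaty's random consistency indices; the pairwise judgments are hypothetical.

```python
import numpy as np

# Saaty's random consistency index by matrix size
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}


def ahp_weights(pairwise):
    """Return (weights, consistency ratio) from a reciprocal pairwise comparison matrix."""
    A = np.asarray(pairwise, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)                 # principal eigenvalue
    lam_max = eigvals[k].real
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                             # normalise weights to sum to 1
    ci = (lam_max - n) / (n - 1) if n > 2 else 0.0
    cr = ci / RANDOM_INDEX[n] if RANDOM_INDEX[n] > 0 else 0.0
    return w, cr


# Hypothetical expert judgments: exposure vs sensitivity vs adaptive capacity
judgments = [[1, 3, 9],
             [1 / 3, 1, 3],
             [1 / 9, 1 / 3, 1]]
weights, cr = ahp_weights(judgments)
print(weights.round(3), round(cr, 3))
```

These particular judgments are perfectly consistent (3 × 3 = 9), so the consistency ratio is zero; real panel judgments rarely are.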

Modelling Outputs
Heat vulnerability maps reflecting the spatial distributions of heat exposure and hazard were the most common outputs in the selected articles. When such maps were created, the indexes were normally rescaled into 3 to 10 vulnerability levels using equal intervals, most often five or seven levels (e.g., very low, low, moderate, high, very high), to reflect the degree of localized heat exposure [95,96]. In other cases, the indexes were presented as continuous numerical values, with high values representing high heat vulnerability and vice versa.
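Equal-interval classification splits the range of the index into bins of equal width. The sketch below assigns each spatial unit a level from 1 to `n_levels`; the labels and index values are hypothetical, and other classification schemes (quantiles, natural breaks) would give different level boundaries.

```python
import numpy as np

LABELS_5 = ["very low", "low", "moderate", "high", "very high"]


def equal_interval_levels(index_values, n_levels=5):
    """Assign each unit a level 1..n_levels using equal-width intervals over the index range."""
    v = np.asarray(index_values, dtype=float)
    if v.max() == v.min():
        return np.ones(v.shape, dtype=int)       # no variation: everything in level 1
    edges = np.linspace(v.min(), v.max(), n_levels + 1)
    # Interior edges only, so the maximum value falls into the top level
    return np.digitize(v, edges[1:-1]) + 1


hvi = [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]
levels = equal_interval_levels(hvi)
print([LABELS_5[k - 1] for k in levels])
```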
The model outputs varied across studies in spatial units and temporal scales because of the spatiotemporal resolutions of the proxy data. This review classified the selected articles by spatial and temporal scale to evaluate the dimensions of the outputs. The three most common spatial scales were census units (26 articles, 34%), administrative areas (25 articles, 33%), and grids (17 articles, 22%). Census units included census blocks [82], census block groups [88], and census tracts [97]. Administrative areas ranged from states/provinces to communities/counties [21,98]. These scales are popular because heat-related indicators, such as demographic, socioeconomic, and health-related indicators, can be easily obtained at these scales from national and local demographic datasets. The application of satellite products, such as LST, NDVI, and land cover/use products from Landsat and MODIS datasets, contributed to the increased use of grid scales. The spatial resolutions of these studies were consistent with those of the satellite products used, ranging from 100 m to 1 km. Less frequently used spatial units were postal areas and local climate zones [79,99].
In general, heat vulnerability studies did not have an explicit study year because they commonly combined data from different years owing to limited data availability. Hence, model outputs reflected heat vulnerability over a period rather than in a specific year. Model outputs were concentrated in two periods, 2000-2009 (32 articles, 42%) and 2010-2019 (34 articles, 45%), because extreme heat events became more frequent after 2000 and because of data availability. Only 3 studies [50,67,87] focused on mapping long time series of heat vulnerability. Weber et al. [67] identified localized trends of increased urban extreme heat and the spatiotemporal distribution of heat exposure from 1980 to 2013 with ground- and satellite-based data. Wilson et al. [87] proposed a novel methodology to locate the most vulnerable populations and analyzed how these locations changed from 1990 to 2010 at 10-year intervals. Rao et al. [50] conducted a gridded analysis based on MODIS LST images to compute heatwave indicators and explore the temporal and geographical variation of heat risk and vulnerability in the Indo-Gangetic Plains in India from 2003 to 2019.
It is worth noting that only three articles focused on predicting future heat vulnerability. Oh et al. [98] built a climate change vulnerability assessment tool and applied it to assess province-level heat vulnerability in Korea for the 2040s. Prosdocimi et al. [24] developed a spatially explicit index for measuring heat stress risk and used it to assess heat risk in Dublin, Ireland, from the 2020s to the 2050s. Loughnan et al. [63] mapped population vulnerability to extreme heat events in Australia's capital cities with an HVI and projected future population changes during 2020-2030. The two results were combined to map future changes in population vulnerability. Mapping the future spatial distribution of heat vulnerability is also important because it helps anticipate changes and prepare countermeasures in advance. More attention needs to be paid to the predictive assessment of heat vulnerability.

Validation Approaches
It is essential for newly developed heat vulnerability assessment models to be validated to demonstrate their capability to evaluate potential heat vulnerability and risk during extreme heat events. Nevertheless, only 20 of the 76 selected articles (26%) included validation. Some studies [54,98,99,100] adapted previously validated heat vulnerability models and applied them to new study areas, so they omitted validation. Many studies [67,77] acknowledged the lack of validation as a limitation and described future validation plans, citing the scarcity of validation data as the main reason; they indicated that further validation analyses would be undertaken if sufficient validation data became available. Table 5 summarizes the validation indicators, methods, and study performance. The most frequently used validation indicator in the reviewed literature was mortality, used in 15 articles (about 20% of all reviewed articles). These researchers argued that mortality directly depicts the impacts of heat exposure on humans and the association between population health and extreme heat with less bias. Mortality indicators can be classified into two categories, i.e., heat-related mortality [101] and all-cause mortality [102] during summer or heat events.
On the one hand, some studies used deaths caused by diseases sensitive to extreme thermal environments as the mortality indicator. Kwon et al. [101] selected mortality data related to respiratory disease, ischemic heart disease, and cerebrovascular disease as thermal disease-related mortality. Hu et al. [49] and Liu et al. [94] also counted deaths caused by heat stroke, dehydration, and hyperpyrexia as heat-related mortality. Moreover, Kim et al. [103] and Song et al. [78] took into account deaths caused by exposure to excessive natural or artificial heat, such as sunlight and artificial light. To evaluate the effectiveness of their proposed extreme heat vulnerability index (EHVI), Johnson et al. [104] predicted heat-related mortality from the total population of each block group and the overall death rate in Chicago during the 1995 extreme heat event.
On the other hand, some studies employed all-cause deaths during hot seasons as the mortality indicator. Conlon et al. [52] estimated the proportion of all-cause deaths occurring on extremely hot days and explored its relationship with the developed HVI. Owing to data availability, Estoque et al. [27] selected province-level all-cause mortality data during hot dry seasons to validate their heat vulnerability outcomes. Krstic et al. [71] used deaths on six extremely hot days from 1984 to 2014 to assess the index and evaluate its performance. Maier et al. [102] and Wang et al. [51] also collected daily mortality monitoring data to verify whether heat vulnerability levels reflected the health outcomes of extreme heat exposure. Mallen et al. [105] estimated mortality from observed temperatures and all-cause death rates with a distributed lag non-linear model.
The second most frequently used validation indicator in the reviewed studies was heat-related morbidity, including hospital and emergency department visits and admissions [106][107][108].
Chuang & Gober [65] estimated diabetes hospitalizations from census-tract population data and previous hospitalizations for diabetes, and used these estimates as a validation indicator. Zhang et al. [68] obtained county-level summer hospital visit data from the local statistical department for validation. Other studies [107,108] used emergency service demand data, such as ambulance calls and hospital and/or emergency department visits and admissions, to evaluate model performance. Beyond mortality and hospital visits and admissions, Loughnan et al. [63] combined emergency ambulance callouts, emergency department presentations with their triage categories, and emergency hospital admissions into a morbidity measure and used it for validation.
Studies commonly combined multiple validation indicators. Morbidity and mortality were often used together to evaluate model efficiency [106,107]. Accessibility to medical institutions was also considered: Prosdocimi et al. [24] used the distance to the nearest hospital and clinic, together with health record data, as instrumental variables to evaluate the performance of their HVI model.

Validation Methods
As mentioned above, many studies tested whether their models could predict heat vulnerability by examining the correlation between heat-related health outcomes and heat vulnerability results. A strong positive correlation was taken to indicate high model efficiency and potential usefulness as an urban planning tool. A variety of mathematical and statistical models, especially regression models (14 articles, 18%), were applied to evaluate the performance of the proposed heat vulnerability assessment methods. Poisson regression (5 articles, 7%) and linear regression (11 articles, 14%) were commonly used in model evaluations, while less-used regression models included logistic regression [65].
Maier et al. [102] used multivariate Poisson regression to explore the interaction between HVI and oppressive heat in their effect on mortality in Georgia. Reid et al. [106] used Poisson regression analysis to examine how increases in HVI and deviant days influenced hospitalizations and mortality in five American states from 2000 to 2007. Wolf et al. [107] used Poisson regression to predict the associated heat risk, with mortality and ambulance calls as the response variables and the HVI as the predictor variable. Hu et al. [49] and other studies [27,68,78,94,103,109] employed linear regression models to assess the correlation between HVI values and heat-related mortality, and used scatter plots and the coefficient of determination to evaluate the extent to which higher HVI values were associated with more heat-related deaths. In addition, comparisons of the spatial distributions of HVI and heat-related outcomes were often used as a qualitative check of model efficiency [65,103].
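The Poisson setup described for Wolf et al. [107], with heat-related counts as the response and the HVI as the predictor, can be sketched as a log-linear model fitted by maximum likelihood. This is a minimal illustration under stated assumptions (Newton-Raphson, a single predictor, no population offset, no overdispersion correction, no covariates); it is not the reviewed authors' code, and the HVI scores and death counts are hypothetical.

```python
import math


def poisson_regression(x, y, max_iter=50, tol=1e-10):
    """Fit log(E[y]) = b0 + b1 * x by maximum likelihood (Newton-Raphson).

    Minimal sketch: one predictor, no offset, no overdispersion handling.
    """
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector (gradient of the log-likelihood).
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for yi, mi, xi in zip(y, mu, x))
        # Fisher information (negative Hessian), a symmetric 2 x 2 matrix.
        h00 = sum(mu)
        h01 = sum(mi * xi for mi, xi in zip(mu, x))
        h11 = sum(mi * xi * xi for mi, xi in zip(mu, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 system H * delta = g.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) < tol and abs(d1) < tol:
            break
    return b0, b1


# Hypothetical HVI scores and heat-related death counts for five areas.
hvi = [0.0, 1.0, 2.0, 3.0, 4.0]
deaths = [3, 4, 6, 7, 9]
b0, b1 = poisson_regression(hvi, deaths)
rate_ratio = math.exp(b1)  # multiplicative change in expected deaths per HVI unit
```

A rate ratio above 1 means expected deaths rise with the HVI. In practice, a population offset (the log of population) and adjustment for confounders would normally be added before interpreting the coefficient.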
Some studies used multiple methods to validate the effectiveness and efficiency of the developed models. Wang et al. [51] first employed quasi-Poisson regression and distributed lag non-linear models to estimate county-level relationships between temperature and mortality, then calculated the heat-attributable mortality fraction, and finally used a meta-regression model to explore the correlation between this fraction and the HVI values. Prosdocimi et al. [24] combined linear regression, zero-inflated Poisson regression, and zero-inflated negative binomial regression, and then conducted a difference-in-differences analysis to validate the HVI against heatwave-related excess mortality.
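Two of the quantities named here reduce to simple arithmetic once the underlying models have been fitted. The sketch assumes a common formulation of the heat-attributable fraction, in which each day's deaths d contribute d × (1 − 1/RR) given a relative risk RR read from a fitted exposure-response model, and a basic two-group, two-period difference-in-differences. Neither function is claimed to reproduce the exact specifications of Wang et al. [51] or Prosdocimi et al. [24], and all numbers are hypothetical.

```python
def heat_attributable_fraction(daily_deaths, relative_risks):
    """Share of deaths attributable to heat: sum(d * (1 - 1/RR)) / sum(d).

    Days without excess risk should carry RR = 1, so they contribute nothing.
    """
    if len(daily_deaths) != len(relative_risks):
        raise ValueError("need one relative risk per day")
    if any(rr <= 0 for rr in relative_risks):
        raise ValueError("relative risks must be positive")
    attributable = sum(d * (1 - 1 / rr) for d, rr in zip(daily_deaths, relative_risks))
    return attributable / sum(daily_deaths)


def difference_in_differences(treated_before, treated_after,
                              control_before, control_after):
    """Change in the treated group minus change in the comparison group."""
    return (treated_after - treated_before) - (control_after - control_before)


# Hypothetical daily deaths in one county and RRs from a fitted model.
deaths = [10, 12, 20, 18]
rr = [1.0, 1.0, 1.25, 1.5]
af = heat_attributable_fraction(deaths, rr)  # (0 + 0 + 4 + 6) / 60

# Hypothetical mean daily deaths in high-HVI (treated) and low-HVI (control)
# areas before and during a heatwave.
excess = difference_in_differences(5.0, 8.0, 4.0, 5.0)  # (8 - 5) - (5 - 4) = 2.0
```

In a validation setting, a larger attributable fraction or difference-in-differences estimate in high-HVI areas would be read as support for the index.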
Apart from statistical regression models, Pearson correlation and Spearman's rank-order correlation were often calculated to test whether there was a significant correlation between heat vulnerability outcomes and heat-related health outcomes [49,63,68]. The critical success index (CSI), statistical significance tests, case-crossover analyses, threshold or curve analyses, and the root mean square error (RMSE) [71,101,104,109] were also employed to validate the HVI models.
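The correlation and skill measures listed here are short formulas. The sketch below computes Pearson's r, Spearman's rho (as Pearson's r on average ranks, so ties are handled), the CSI (hits divided by hits plus misses plus false alarms, where each spatial unit is flagged "high" or not by the HVI and by the observed outcome), and RMSE. How a study defines "high" for the CSI is an assumption left to the caller, and the data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def average_ranks(values):
    """Ranks 1..n; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def critical_success_index(predicted_high, observed_high):
    """CSI = hits / (hits + misses + false alarms) for boolean unit flags."""
    pairs = list(zip(predicted_high, observed_high))
    hits = sum(p and o for p, o in pairs)
    misses = sum((not p) and o for p, o in pairs)
    false_alarms = sum(p and (not o) for p, o in pairs)
    denom = hits + misses + false_alarms
    return hits / denom if denom else float("nan")


def rmse(predicted, observed):
    """Root mean square error between predicted and observed values."""
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed))


# Hypothetical HVI values and summer hospital visits for five units.
hvi = [0.2, 0.4, 0.5, 0.7, 0.9]
visits = [11, 15, 14, 22, 30]
rho = spearman(hvi, visits)  # 0.9

predicted_high = [False, False, True, True, True]
observed_high = [False, True, False, True, True]
csi = critical_success_index(predicted_high, observed_high)  # 2 / (2 + 1 + 1) = 0.5
```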

Study Performance
As mentioned in Section 3.3.3, most heat vulnerability studies agreed that a higher HVI value means an increased heat risk. Although the selected articles claimed that their heat vulnerability models could be used for urban planning and decision-making, validation performance varied considerably across studies. Some studies (6 articles, 8%) achieved satisfactory validation outcomes, while others (10 articles, 13%) did not find a significant relationship between HVI values and heat-related outcomes.
Based on the heat risk triangle framework, Zhang et al. [68] developed an HVI for elderly people at a raster scale in Chongqing, China. The Pearson correlation coefficient between the HVI values and summer hospital visits was 0.924, indicating that the heat vulnerability model performed satisfactorily. Mallen et al. [105] developed a PCA-based HVI at the census tract level in Dallas, US, and then used bivariate and multivariate regression models to compare estimated mortality with the HVI results. However, neither regression model produced satisfactory results. In particular, the bivariate regression between total deaths and HVI scores yielded an R² of only 0.03, suggesting that the HVI model could not reliably locate and predict heat risk.
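An R² such as the 0.03 reported for the bivariate model of Mallen et al. [105] is the share of variance in the outcome explained by a least-squares line on the HVI. A minimal sketch of that calculation, with hypothetical numbers chosen so that the fit is weak:

```python
def simple_ols(x, y):
    """Least-squares fit y = a + b * x; returns (a, b, r_squared)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    # R^2 = 1 - residual sum of squares / total sum of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical tract-level HVI scores and total deaths.
hvi = [1.0, 2.0, 3.0, 4.0]
deaths = [2.0, 1.0, 2.0, 1.0]
a, b, r2 = simple_ols(hvi, deaths)  # r2 is about 0.2: HVI explains ~20% of the variance
```

A value close to 0, like the 0.03 reported, means that tract HVI scores carry almost no linear information about observed deaths.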
Heat vulnerability models also performed differently at different scales, in different spatial units, and with different methodologies. For example, Jänicke et al. [109] quantitatively assessed heat-stress impacts with heat hazard, vulnerability, and risk models at the district scale in Seoul, Korea, and evaluated the assessment outcomes using linear correlations. Heat-related mortality correlated significantly with HVI values at the city level, while the opposite result was obtained at the district scale. In Detroit, Michigan, USA, the supervised HVI performed better than the unsupervised HVI, showing a strong positive association with increased mortality on extremely hot days [52]. In Hangzhou, China, the equal-weighted HVI correlated better with heat-related deaths than the HVI calculated by PCA [94].
To sum up, the performance of a generic HVI depends on scale, measurement, and context, which implies that heat vulnerability models should be developed with great caution. Furthermore, when validation relies solely on statistical analyses of heat-related morbidity and mortality, it is difficult to confirm that the validation results demonstrate the effectiveness of the proposed heat vulnerability methodologies. More qualitative and quantitative methods should be introduced for validation, such as questionnaire surveys of vulnerable populations and field measurements in vulnerable areas.

Findings and Discussion
This review set out to answer the research question 'what are the methods to assess urban heat vulnerability' from three aspects, i.e., indicators and data, modelling approaches, and validation approaches, through a systematic literature review. Highlights of the reviewed literature are summarized in Appendix A. The main findings and other critical issues are discussed further in this section.
Indicator selection is an essential part of heat vulnerability assessment because all results and analyses depend on the selected indicators. Including as many relevant indicators as possible in assessment models helps capture urban heat vulnerability accurately and comprehensively [33,110]. The reviewed literature shows that three types of indicators are commonly used, i.e., demographic properties and socioeconomic status, health conditions and medical resources, and natural and built environmental factors. These indicators directly or indirectly influence the sensitivity and adaptive capacity of urban residents to increasing extreme heat and reflect the frequency, duration, intensity, and distribution of heat exposure. The five most frequently used indicators in the reviewed studies were age, economic status, social isolation, education, and land surface temperature.
In terms of indicator categories, four of these five are demographic or socioeconomic indicators and only one is environmental. In terms of impact on urban heat vulnerability, only land surface temperature is a direct indicator, capturing the intensity, magnitude, and spatiotemporal distribution of extreme heat. The popularity of these indicators stems mainly from two factors: their verified significant relationship with urban heat vulnerability [19,29,86] and the ready availability of the corresponding data [21,33]. This popularity indicates that current vulnerability studies rely excessively on indirect sociodemographic indicators. More attention should be paid to personal health conditions, public medical services, natural environmental hazards, and urban planning and governance to build a comprehensive conceptual framework.
Although measuring urban heat vulnerability with heat-related indicators is an established consensus, there were no universal criteria or specifications for selecting input metrics in terms of quantity, variety, theoretical basis, and data proxies. In the reviewed literature, the number of indicators ranged from 2 to 30, and the number of categories from 2 to 7. For example, Prosdocimi & Klima [24] collected 24 indicators from the Brazilian national census database on the hypothesis that these indicators affect heat vulnerability, while Yin et al. [111] selected only 2 indicators, i.e., ambient air temperature and noontime foot traffic time, to build an index capturing urban heat exposure patterns. Yet both studies covered only 2 categories: Yin et al. [111] focused solely on exposure and hazard, while Prosdocimi & Klima [24] concentrated solely on socioeconomic factors and urban form. Thus, the number of indicators does not necessarily reflect the number of assessment dimensions, and including many indicators does not by itself ensure diverse and comprehensive analysis dimensions.
On the one hand, studies with too few indicators inevitably miss some characteristics of heat vulnerability. Kershaw et al. [62] considered apparent temperature intensity, exposure duration, and humidity, focusing only on the natural environment without any demographic or socioeconomic properties. Barron et al. [16] employed 5 commonly used indicators, i.e., age, race, economic status, education level, and social isolation, covering demographic and socioeconomic characteristics but no environmental factors. On the other hand, the same problem also occurred in studies with many indicators. For example, several studies [29,87,104] included 16-30 indicators covering population and environmental characteristics, but none capturing the impacts of public medical facilities and institutions.
In these studies, individual health conditions and available public medical resources are the most easily neglected elements. Moreover, including numerous metrics in an assessment inevitably introduces less relevant indicators. For instance, in Harlan's study [19], 8 of 14 indicators were not significantly correlated with the human thermal comfort index (HTCI); in particular, the distance from the city center and the mean roof reflectivity showed no statistical correlation with HTCI. Wolf et al. [107] found that univariate and multivariate HVIs performed similarly in predicting health outcomes, which suggests that parsimony may be an important consideration in study design.
The lack of generic standards also led to mismatches between indicators and categories. The two most popular theoretical frameworks, for example, interpret the selected indicators differently, so that the same indicator can carry different meanings and represent different components of heat vulnerability. Specifically, LST was treated as an exposure element in the population vulnerability framework [54,80,87], whereas the risk triangle framework treated it as a hazard element [11,81,88]. Kwon et al. [101] classified population density as a proxy for the sensitivity element under the population vulnerability framework, whereas Dong et al. [82] interpreted it as representing the exposure element under the risk triangle framework.
Even within the same conceptual framework, the same indicators were interpreted and classified differently. For instance, individuals' economic status and education levels were placed in the sensitivity category by Zhang et al. [14], Wilson et al. [87], and Wu et al. [54], but treated as adaptive capacity indicators by El-Zein & Tonmoy [26], Hulley et al. [77], and Mallen et al. [105], although all of them used the population vulnerability framework. Under the risk triangle framework, Lapola et al. [48] and Zhang et al. [68] categorized the elderly population as an exposure indicator, whereas other studies [31,49,78] used it as a proxy for the vulnerability component.
The large differences in the number and categories of indicators stem mainly from the lack of universal selection theories and criteria. Current indicator selection depends heavily on local context [28,29]. On the one hand, local characteristics of the population, infrastructure, and ecosystem strongly affect urban heat vulnerability. For example, Azhar et al. [95] included scheduled castes and scheduled tribes when mapping heat vulnerability in India because of its social class system. In countries with a high proportion of immigrants, such as the United States, ethnicity was commonly considered when measuring heat vulnerability, e.g., Latino immigrants, Hispanics, and races other than white [19,102,108].
On the other hand, local data availability also plays an essential role in heat vulnerability assessment. Current studies rely heavily on census datasets because they are easily available [110]; census datasets freely provide abundant and comprehensive sociodemographic metrics for capturing the sensitivity and adaptive capacity of vulnerable people. Limited data availability can alter indicator selection substantially. For instance, when the targeted indicators could not be found for the study period, some studies substituted data from other years [17,76,77]. If substitute data were unavailable as well, the number of indicators was reduced accordingly.
The choice of indicators was constrained by data availability or by the researchers' subjective judgment [30,31]. Without universal selection criteria, most studies selected indicators based on the conclusions of previous studies, data availability, and personal views of the theoretical relationship between potential indicators and the local context [14,21]. Such subjective discretion inevitably weakens the credibility and validity of the selected indicators. In most heat vulnerability research, the representativeness and relevance of indicators have not been demonstrated with rigorous quantitative or qualitative methods [29,69,91]. Although some studies conducted sensitivity analyses to explore the relationship between selected metrics and heat vulnerability outcomes, their results showed that not all indicators had a significant statistical relationship with measured heat vulnerability [47,86]. Whether the selected indicators are genuinely relevant to heat vulnerability requires further verification, which raises doubts about the credibility of the selected metrics and the results obtained.
Besides ensuring that enough relevant indicators are included in the assessment, researchers need to select data proxies that accurately represent the corresponding indicators. However, in the reviewed literature the same indicators commonly had different data proxies with varied resolutions [21], and the accuracy of the selected proxies was not verified before developing assessment models. Taking LST, the most frequently used environmental indicator, as an example, it has been measured with satellite products and with data from meteorological observation sites [46,61,63]. Satellite products included LST images from the Landsat and MODIS sensors, with spatial resolutions of 120 m and 1 km, respectively.
Higher resolution captures more spatial detail of the temperature distribution, so studies [70,75,97] using Landsat images were conducted at a finer scale and generated more detailed information. This advantage is not absolute, however, because cloud cover in satellite images directly reduces the accuracy with which surface temperature distributions are captured [104,110]. Therefore, high-resolution satellite images with limited cloud contamination are imperative when using remote sensing to depict extreme heat events. Moreover, studies often used several satellite images, each covering 8-16 days, to represent the LST spatial distribution over a whole year. It is unclear whether this practice reduces the accuracy of the data proxy or even undermines the credibility of the final heat vulnerability results.
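Combining several scenes into one LST layer, and the cloud problem raised above, can be made concrete with a per-pixel composite that skips cloud-masked pixels. This is a minimal sketch assuming the scenes are already co-registered on one grid and that masked pixels are marked `None`; the temperatures are hypothetical, and real workflows would use each product's own quality flags.

```python
def composite_lst(scenes):
    """Per-pixel mean LST across scenes, ignoring cloud-masked pixels (None).

    Returns (mean_grid, clear_count_grid); pixels never observed stay None.
    """
    rows, cols = len(scenes[0]), len(scenes[0][0])
    mean_grid = [[None] * cols for _ in range(rows)]
    count_grid = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            clear = [scene[r][c] for scene in scenes if scene[r][c] is not None]
            count_grid[r][c] = len(clear)
            if clear:
                mean_grid[r][c] = sum(clear) / len(clear)
    return mean_grid, count_grid


# Three hypothetical 2 x 2 scenes in degrees Celsius; None = cloud-masked.
scene_1 = [[30.0, 32.0], [None, 35.0]]
scene_2 = [[34.0, None], [None, 37.0]]
scene_3 = [[32.0, 36.0], [None, 36.0]]
mean_lst, clear_counts = composite_lst([scene_1, scene_2, scene_3])
# mean_lst == [[32.0, 34.0], [None, 36.0]]; clear_counts == [[3, 2], [0, 3]]
```

The clear-observation count shows where the composite rests on only a few scenes; a pixel with no clear view yields no value at all, which is one concrete way cloud cover erodes the accuracy of the proxy.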
The reviewed literature shows that heat vulnerability indexing models were the most popular modelling methods. They included indexing models based on the population vulnerability framework, the risk triangle framework, and other heat-related indexing models. Despite the steady growth in the number of HVIs, the conceptualization of the index remains incomplete and requires further development. As Table 4 shows, the number of studies using other heat-related indexing models is close to the total number of studies based on the two popular frameworks. Depending on their focus in characterizing vulnerability, these models produced two types of HVI, i.e., the biophysical vulnerability index and the social vulnerability index. The biophysical vulnerability index was proposed to characterize vulnerability to biophysical exposure precisely and encompasses environmental factors such as temperature distribution and urban form.
The social vulnerability index aims to capture human health and well-being in social, economic, political, and cultural terms, estimating to what extent people are susceptible to heat and whether they have adequate capacity to cope with it. Because their indicator systems are incomplete, these indexes cannot reflect heat vulnerability in its entirety, as each omits essential features. Further, Johnson et al. [104] and Macnee & Tokai [59] argued that socioeconomic vulnerability and biophysical exposure are complementary components of heat risk, which cannot be fully grasped if either component is absent.
To fill the gap in integrative approaches, composite HVIs combining biophysical and sociodemographic indicators have been developed and broadly applied. As mentioned above, composite HVIs developed on the two popular frameworks all comprised 3 components, which were commonly aggregated additively. However, it is unclear whether there is an additive effect among the assessment dimensions, or among the indicators within each dimension. Some studies opted for other arithmetic operations to calculate HVIs [80,82,98], i.e., subtraction, multiplication, and division, but they also failed to demonstrate adequately the underlying logical relationships between the constituent elements. The logic and adequacy of the commonly used equations have not previously been examined in heat vulnerability assessments. Theoretical constructs that support HVI development, rather than subjective assumptions, are urgently needed.
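Whether the components combine additively matters because different aggregation rules can reorder areas. The sketch below contrasts an additive form (exposure + sensitivity − adaptive capacity) with a multiplicative one (exposure × sensitivity ÷ adaptive capacity). Both forms and the normalized component scores are illustrative assumptions, not equations taken from any reviewed study.

```python
def additive_hvi(exposure, sensitivity, adaptive_capacity):
    """Assumed additive form: exposure + sensitivity - adaptive capacity."""
    return exposure + sensitivity - adaptive_capacity


def multiplicative_hvi(exposure, sensitivity, adaptive_capacity):
    """Assumed ratio form: exposure * sensitivity / adaptive capacity."""
    if adaptive_capacity <= 0:
        raise ValueError("adaptive capacity must be positive for the ratio form")
    return exposure * sensitivity / adaptive_capacity


def rank_units(scores):
    """Unit names ordered from most to least vulnerable."""
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical normalized (exposure, sensitivity, adaptive capacity) scores.
components = {
    "tract_A": (1.0, 0.1, 0.1),  # very exposed, barely sensitive
    "tract_B": (0.5, 0.5, 0.2),  # moderately exposed and sensitive
}
additive = {k: additive_hvi(*v) for k, v in components.items()}
multiplicative = {k: multiplicative_hvi(*v) for k, v in components.items()}
print(rank_units(additive))        # ['tract_A', 'tract_B']
print(rank_units(multiplicative))  # ['tract_B', 'tract_A']
```

The same inputs produce opposite priorities, which is why the aggregation rule needs a theoretical justification rather than a default choice.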
The same controversy applies to several studies [11,76,92] that used GIS techniques, i.e., overlay analysis, to obtain heat vulnerability maps, because they were also grounded on the assumption of additive effects. Indexing models, GIS techniques, and other statistical methods are all quantitative approaches. They fail to accommodate qualitative analysis, which is complementary to quantitative methods but whose components are usually non-measurable. However, integrating quantitative and qualitative perspectives is necessary for formulating policies to mitigate heat vulnerability. It is challenging to capture heat vulnerability qualitatively and to integrate qualitative and quantitative elements rationally. Cheng et al. [21] applied questionnaire surveys to capture people's heat risk perception qualitatively and integrated it with quantitative spatial analysis to identify and assess heat vulnerability in Beijing, China. More qualitative components and approaches to heat vulnerability should be explored and tested in assessments to guide scientific decision-making.
Once the indicator sets and theoretical framework are determined, assigning weights to each indicator or component is the next important step, which requires an objective and effective weighting tool. This review found that EW and PCA were the most frequently used weighting methods in heat vulnerability studies. Many studies [16,85,99] stated that they assigned equal weights to all indicators or components on the basis of literature reviews of previous heat-related research. They assumed that each element is independent and represents a separate dimension of the vulnerability of targets to heat exposure. However, no theoretical basis supports this assumption. On the contrary, studies have found that the effects of vulnerability indicators may change with geographical scale and distance to metropolitan centers [20,108].
Therefore, assigning equal weights to the selected indicators without further examination or scientific justification is unconvincing. Another question is whether the hierarchical scheme, which allocates equal weights to indicators within the same subcategory, is reasonable and valid. Ho et al. [55] and Mushore et al. [75] assigned equal weights to the indicators within the dimensions of biophysical exposure and social vulnerability. However, indicators in different dimensions ended up with unequal weights, because of differences in the weights assigned to the dimensions and in the number of indicators per dimension. It is unclear whether such a scheme, adopted without any theoretical construct, affects the accuracy and precision of vulnerability assessment results. Consequently, the choice of the EW method rests entirely on the authors' subjective understanding of the relationships between elements.
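Both weighting schemes discussed here can be sketched in a few lines. The first function applies flat equal weights after min-max normalization (with indicators that lower vulnerability, such as income, reversed); the second shows how a hierarchical scheme that splits equal dimension weights evenly among indicators gives indicators in different dimensions unequal effective weights. Min-max scaling, the indicator names, and all values are hypothetical assumptions, not those of Ho et al. [55] or Mushore et al. [75].

```python
def min_max(values, raises_vulnerability=True):
    """Scale to [0, 1]; reverse indicators that lower vulnerability."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return scaled if raises_vulnerability else [1 - s for s in scaled]


def equal_weight_hvi(indicators):
    """indicators: {name: (values_per_unit, raises_vulnerability)}."""
    normalized = [min_max(vals, direction) for vals, direction in indicators.values()]
    weight = 1 / len(normalized)  # every indicator gets the same weight
    n_units = len(normalized[0])
    return [weight * sum(col[i] for col in normalized) for i in range(n_units)]


def effective_weights(dimension_weights, indicators_by_dimension):
    """Hierarchical EW: split each dimension's weight evenly among its indicators."""
    weights = {}
    for dim, dim_weight in dimension_weights.items():
        members = indicators_by_dimension[dim]
        for name in members:
            weights[name] = dim_weight / len(members)
    return weights


# Hypothetical values for three spatial units.
hvi = equal_weight_hvi({
    "pct_aged_65_plus": ([10.0, 20.0, 30.0], True),
    "median_income_k": ([60.0, 40.0, 20.0], False),
    "mean_lst_c": ([30.0, 34.0, 32.0], True),
})
# hvi is about [0.0, 0.667, 0.833]

weights = effective_weights(
    {"biophysical_exposure": 0.5, "social_vulnerability": 0.5},
    {"biophysical_exposure": ["lst", "ndvi"],
     "social_vulnerability": ["aged_65_plus", "income", "education", "living_alone"]},
)
# lst and ndvi get 0.25 each; each social indicator gets only 0.125
```

Although every dimension is weighted equally, each exposure indicator ends up counting twice as much as each social indicator, purely because of how many indicators each dimension happens to contain.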
The controversy over equal weights also arose when applying PCA. Although PCA assigns weights to the variables within components according to the explained variance [61], studies applying PCA commonly gave equal weights to the separate components. This equal-weighting mechanism also lacks adequate evaluation and supporting theory. PCA was commonly applied to reduce the number of indicators and identify principal components based on the statistical relationships among the screened indicators. For instance, Cutter et al. [112] [108] named each component according to the commonality of the indicators it contained. However, the socioeconomic component contained an indicator of personal health condition, i.e., the population with a disability, and the race indicators for Hispanic and Black populations were classified into separate components. Such mismatches between components and indicators undermine the logical coherence of the analysis. Consequently, it remains doubtful whether components derived from statistical relationships have a rational practical meaning. Moreover, PCA imposes a strict requirement on proxy data: the data must be complete, with no missing values. This restriction further limits the use of available datasets. Despite these limitations, PCA remains the most widely acknowledged and applied statistical weighting method in heat vulnerability assessment.
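The PCA steps referred to here (standardize the indicators, form their correlation matrix, extract components, and read off the explained variance) can be sketched without external libraries for the leading component. This is a minimal sketch under stated assumptions: power iteration stands in for a full eigendecomposition, only the first component is extracted, no rotation is applied, and missing values raise an error to mirror PCA's complete-data requirement. Real studies typically use a statistics package and retain several components; the indicator values are hypothetical.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of indicator columns; refuses missing data."""
    if any(v is None for col in columns for v in col):
        raise ValueError("PCA needs complete data: impute or drop missing values first")
    n = len(columns[0])
    z = []  # standardized columns
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        z.append([(v - mean) / sd for v in col])
    k = len(columns)
    return [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(k)]
            for i in range(k)]


def leading_component(matrix, iterations=500):
    """Largest eigenvalue and its unit eigenvector via power iteration."""
    k = len(matrix)
    # Unequal start values avoid beginning orthogonal to the leading eigenvector
    # in common symmetric cases.
    vec = [1.0 + 0.1 * i for i in range(k)]
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient of the unit vector gives the eigenvalue.
    eigenvalue = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(k))
                     for i in range(k))
    return eigenvalue, vec


# Hypothetical indicator columns (e.g., % elderly, % living alone) for 4 tracts.
corr = correlation_matrix([[1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]])
eigenvalue, loadings = leading_component(corr)
explained = eigenvalue / len(corr)  # trace of a correlation matrix = number of indicators
```

Here the two indicators correlate at 0.6, so the first component explains 80% of the variance and loads on both indicators equally; any label attached to such a component is an interpretation, not something the algorithm supplies.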
The review found that nearly 70% of the included vulnerability studies used census tracts or administrative areas as the spatial units for mapping urban heat vulnerability. These studies relied heavily on readily available census datasets and statistical data from official departments. However, heat vulnerability indicators interact across the boundaries of geographical units, which are consolidated, revised, and split as urbanization proceeds. Such choices of spatial scale may lead to unsound conclusions because they underestimate the influence of interactions between administrative units on heat vulnerability and the accompanying modifiable areal unit problem (MAUP). MAUP is a statistical bias caused by aggregating indicators with different spatial features at a specific spatial scale [113]. After all, simple mean values at a coarse spatial scale can hardly capture geographical features adequately. Additionally, the coarse granularity of district-level units cannot meet the needs of contemporary urban planning and policymaking.
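Both the scale and the zoning effects of MAUP can be shown with a toy example: the same six hypothetical grid cells, grouped into districts in two different ways, give three different HVI-mortality correlations (about 0.66 at the cell level, and about 0.94 and 0.79 for the two zonings). The values are invented for illustration and say nothing about any reviewed study.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def aggregate(values, zones):
    """Mean of cell values in each zone; zones are lists of cell indices."""
    return [sum(values[i] for i in zone) / len(zone) for zone in zones]


# Hypothetical values for six fine grid cells.
hvi_cells = [1, 2, 3, 4, 5, 6]
deaths_cells = [3, 1, 2, 6, 4, 5]

# Two equally plausible ways of drawing three two-cell districts.
zoning_a = [[0, 1], [2, 3], [4, 5]]
zoning_b = [[0, 2], [1, 5], [3, 4]]

r_cells = pearson(hvi_cells, deaths_cells)  # about 0.66
r_a = pearson(aggregate(hvi_cells, zoning_a), aggregate(deaths_cells, zoning_a))  # about 0.94
r_b = pearson(aggregate(hvi_cells, zoning_b), aggregate(deaths_cells, zoning_b))  # about 0.79
```

Nothing about the underlying cells changes between the three calculations; only the choice of spatial units does, which is exactly why results tied to census tracts or administrative areas can be unstable.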
To avoid MAUP and improve the accuracy of vulnerability assessment, many studies [27,50,71] harmonized the spatial characteristics of human sensitivity and environmental exposure on a grid scale using resampling approaches, benefiting from advances in satellite techniques and available products. Remote sensing greatly improves the spatial resolution of heat-related research, making it possible to capture differences between communities. Nevertheless, the quality of satellite images is affected by atmospheric conditions during imaging, particularly cloud cover. Without guaranteed image quality, a finer spatial scale does not yield more precise and accurate vulnerability results. With the broad application of heat-related satellite products, assessing urban heat vulnerability on a finer grid scale with remote sensing data is becoming a trend in urban planning. Moreover, census datasets contain only static, household-based data, which cannot reflect the dynamic properties of individuals. Current studies rarely address how to assess urban heat vulnerability while accounting for dynamic changes in subjects' locations due to daily activities [111].
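Harmonizing layers on a common grid often means aggregating a finer raster to a coarser cell size. The sketch below uses block averaging as one simple resampling choice; it assumes the two grids are aligned and that the coarse cell size is an integer multiple of the fine one. The reviewed studies' exact resampling methods are not specified here, nearest-neighbour or bilinear resampling are common alternatives, and the values are hypothetical.

```python
def block_average(grid, factor):
    """Aggregate a fine grid to a coarser one by averaging factor x factor blocks."""
    rows, cols = len(grid), len(grid[0])
    if rows % factor or cols % factor:
        raise ValueError("grid dimensions must be divisible by the factor")
    coarse = []
    for r in range(0, rows, factor):
        coarse_row = []
        for c in range(0, cols, factor):
            block = [grid[i][j]
                     for i in range(r, r + factor)
                     for j in range(c, c + factor)]
            coarse_row.append(sum(block) / len(block))
        coarse.append(coarse_row)
    return coarse


# Hypothetical 4 x 4 fine-resolution LST grid (degrees Celsius).
fine_lst = [
    [30, 31, 33, 35],
    [29, 30, 34, 34],
    [28, 28, 31, 32],
    [27, 29, 30, 31],
]
coarse_lst = block_average(fine_lst, 2)  # [[30.0, 34.0], [28.0, 31.0]]
```

Aggregating this way discards within-block variation (the 27-29 °C spread in the lower-left block collapses to 28.0), which is the trade-off involved whenever layers of different resolution are harmonized.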
From a temporal perspective, most studies measured and analyzed historical heat vulnerability over an implicitly short term. These studies investigated how and why vulnerability patterns arose and were distributed at a specific time after the 1990s. However, knowing only the spatial distributions and drivers of increasing heat vulnerability is insufficient. Such cross-sectional studies fail to capture the long-term evolution of urban heat vulnerability and lack projections of potential heat risk. Thus, the time dimension needs to be incorporated into the assessment system. Furthermore, the lack of longitudinal and predictive research may impede the development and application of a more broadly applicable and comprehensive theoretical framework. On the one hand, longitudinal studies can adequately reflect the heterogeneity and dynamic evolution of heat vulnerability in the spatiotemporal dimension [50,67,87].
Information on this evolution is essential for urban planners to design mitigation strategies and examine the validity of previous measures. On the other hand, predictive studies help anticipate future trends in heat vulnerability and guide policymakers in formulating effective and efficient precautionary measures [24,63,98]. Interventions and policies based on historical knowledge inevitably lag behind, because they are formulated and applied after extreme heat vulnerability has already materialized. To prevent potential heat risk, predicting heat vulnerability characteristics and developing corresponding preventive strategies from diverse scientific perspectives is particularly important. To mitigate extreme heat vulnerability scientifically, we call for more work on longitudinal and predictive research.
Validation is an indispensable process in heat vulnerability measurement and assessment [106,107]. However, according to the results on validation approaches, only 20 studies validated their results, accounting for about a quarter (26%) of the selected articles.
The lack of validation procedures may undermine the credibility of results and conclusions. Some studies [54,98,100] argued that the vulnerability model they applied had been validated by previous studies. However, its applicability and validity were evaluated only in specific study regions, which is an insufficient theoretical basis for adapting and applying it in new study contexts.
Validation is still imperative for broadening the applicability and enhancing the conceptual comprehensiveness of the current popular theoretical frameworks [102]. Moreover, some studies [67,77] acknowledged the lack of validation as a limitation and described future validation plans, citing the lack of corresponding validation data as the main reason and stating that further validation analyses would be undertaken once enough validation indicators became available. However, only a few studies [106,107] conducted performance assessments that provide solid support for their conclusions. Therefore, future heat vulnerability assessments should consider and check the feasibility of validation experiments when designing their studies.
All the studies with validation selected adverse heat-related health outcomes as validation variables, based on the empirical assumption that higher HVI values correspond to worse physical health consequences. Since Reid et al. [106] used hospitalizations and mortality counts to evaluate their proposed HVI, subsequent studies have adopted heat-related morbidity and mortality to assess the performance of heat vulnerability results. The underlying premise is that geographical areas with high HVI values may face a higher risk of morbidity and mortality [106]. However, although more frequent and intense extreme heat affects individual health and contributes to increased morbidity and mortality [114,115], the stability and strength of the relationship between HVI values and these outcomes have not been stated or examined in any of the reviewed articles.
It is unclear whether the heat-related morbidity and mortality of subjects are suitable validation indicators and to what degree they reflect the population's heat vulnerability. For instance, Conlon et al. [52] obtained extremely low R² values in both census tracts and blocks in regressions between calculated HVIs and mortality during heat extremes, which potentially indicates that heat-related outcomes inadequately reflect heat vulnerability. Furthermore, multiple factors may influence heat-related morbidity and mortality. Zafeiratou et al. [115] found that cold weather contributed more to cardiovascular mortality than hot temperatures did. Although Sun et al. [116] found that increased extreme heat raised the susceptibility of respiratory patients and the incidence of respiratory outcomes, Zafeiratou et al. [115] argued that the effect of air conditions on respiratory outcomes could not be ignored. It is therefore essential to exclude interference from other factors when using morbidity and mortality as validation variables.
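To make the R² criterion concrete, the sketch below fits an ordinary least-squares line between area-level HVI values and a heat-period mortality rate and reports R², the share of outcome variance that the HVI explains. The function name `simple_ols` and all data values are hypothetical illustrations, not figures from Conlon et al. [52] or any other reviewed study; they are chosen only to show how a weak relationship produces an R² near zero.

```python
def simple_ols(x, y):
    """Fit y = a + b * x by ordinary least squares; return (a, b, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("HVI values have zero variance")
    b = sxy / sxx
    a = mean_y - b * mean_x
    # R^2 = 1 - SS_res / SS_tot: share of outcome variance explained by HVI
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot > 0 else 0.0
    return a, b, r_squared

# Hypothetical area-level data (not taken from any reviewed study)
hvi = [1.2, 2.5, 3.1, 4.0, 2.2, 3.8]
deaths_per_100k = [4.0, 3.1, 5.2, 3.9, 4.8, 4.1]
a, b, r2 = simple_ols(hvi, deaths_per_100k)
print(f"intercept={a:.2f}, slope={b:.3f}, R^2={r2:.3f}")
```

With these invented values R² is close to zero, the kind of weak result that makes it doubtful whether mortality adequately reflects the vulnerability an HVI is meant to capture.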
When morbidity and mortality are used as validation variables, the validation process is often constrained by local data availability, which has led to spatiotemporal mismatches between assessment results and validation data, and to the substitution of all-cause health outcomes for heat-related morbidity and mortality. Regarding spatiotemporal mismatches, because appropriate data were lacking, Estoque et al. [27] used province-level mortality during hot seasons from 2009 to 2011 in place of city-level validation data for 2015. Validating heat vulnerability results against data from similar years at a coarser scale inevitably fails to capture sufficiently detailed information and reduces the accuracy of the validation results. Several studies [51,52,71,102] used all-cause morbidity and mortality instead of heat-related outcomes because of data availability. Besides extreme heat, numerous factors contribute to adverse health outcomes, such as individual habits, other meteorological factors, and mental health status. This raises doubts about whether all-cause morbidity and mortality are suitable validation data, and their use is likely to affect the accuracy of validation results.
Another point worth noting is that extreme heat not only poses a salient risk to physical health but also threatens mental well-being [117,118]. However, the selected studies focused solely on physical health conditions, so we call for more indicators of mental and perceptual responses to increased extreme heat.
Finally, current validation indicators are statistical and do not include in situ measurements. Although statistical datasets are convenient and cost-effective, they carry inherent errors and time lags arising from the way the data are measured and compiled.
For this reason, it is recommended to survey the feelings and perceptions of people in the vulnerable regions identified by assessment outputs during periods of increased extreme heat, and to measure the thermal-environment parameters relevant to their vulnerability. In situ measurements of personal perceptions and the ambient thermal environment during extremely hot days are appropriate complements to medical records [119]. Heat vulnerability does not necessarily result in disease or death, but knowledge of individual feelings and perceptions during extreme heat provides a direct reflection of biophysical exposure and social sensitivity. Moreover, field observations of the thermal environment, especially temperatures, cooling infrastructure, and green spaces, complement this human perspective. Such observations help evaluate and examine heat vulnerability assessment outcomes from the angle of environmental exposure and resilience.
Multinomial regression models and Poisson regression models are the most frequently used validation models. Most reviewed studies selected simple linear regression models, grounded in the assumption that a linear relationship exists between HVI and adverse health outcomes [49,103,109]. These studies assessed performance by how accurately HVI predicted heat-related health outcomes, so the validity and stability of the assumed linear relationship directly affect the validation results. Poisson regression models were introduced because heat-related morbidity and mortality during extreme heat periods are counts that fit a Poisson distribution. Poisson regression models are often used to examine contributing factors of diseases in the medical field [120]. Their results show to what extent HVI explains the variation in heat-related outcomes.
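A minimal sketch of a Poisson validation model follows, assuming one HVI value, one heat-period death count, and one population total per area. The population enters as a log offset so that the model describes rates rather than raw counts; this is an assumption for illustration, since the reviewed studies may specify their models differently. Coefficients are estimated by Newton-Raphson on the Poisson log-likelihood with a log link. The function `poisson_regression` and the sample data are hypothetical.

```python
import math

def poisson_regression(hvi, counts, population, iters=50, tol=1e-10):
    """Fit log(E[count]) = log(population) + b0 + b1 * hvi by Newton-Raphson.

    Returns (b0, b1); exp(b1) is the rate ratio per one-unit rise in HVI.
    """
    b0, b1 = math.log(sum(counts) / sum(population)), 0.0  # start at crude rate
    for _ in range(iters):
        # Score vector (gradient of the log-likelihood) and Fisher information
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, pop in zip(hvi, counts, population):
            mu = pop * math.exp(b0 + b1 * x)  # expected count for this area
            g0 += y - mu
            g1 += (y - mu) * x
            h00 += mu
            h01 += mu * x
            h11 += mu * x * x
        det = h00 * h11 - h01 * h01
        if det == 0:
            raise ValueError("singular information matrix")
        # Solve the 2x2 system [[h00, h01], [h01, h11]] * step = [g0, g1]
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < tol:
            break
    return b0, b1

# Hypothetical data: HVI, heat-period deaths, and residents per area
hvi = [0.5, 1.0, 1.5, 2.0, 2.5]
deaths = [2, 3, 3, 6, 8]
population = [10000] * 5
b0, b1 = poisson_regression(hvi, deaths, population)
print(f"rate ratio per unit HVI = {math.exp(b1):.2f}")
```

Here exp(b1) reads as the multiplicative change in the expected heat-period death rate per one-unit increase in HVI. A statistics package with a Poisson GLM would also report standard errors and goodness-of-fit measures, which this sketch omits.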
Moreover, Pearson's and Spearman's correlations were introduced in validation sections to measure the linear or monotonic relationships between HVI values and health outcomes. This review found that these validation methods were limited to statistical and quantitative aspects, without any field investigations or qualitative methods, which is likely to undermine the accuracy and credibility of the validation. Field observation of temperatures in the thermal environment and of the adequacy of cooling resources helps evaluate and validate the risk of biophysical heat exposure. Socioeconomic vulnerability can be qualitatively analyzed and verified through questionnaires and interviews with populations in the vulnerable areas identified by heat vulnerability outputs [21]. This method helps capture people's perceptions and opinions of extreme heat and their behavioral and habitual responses to mitigate heat risk. It is an effective way to qualitatively analyze and validate whether the targeted populations are vulnerable to, or worried about, increased extreme heat.
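The difference between the two correlation measures can be illustrated with a short sketch: Pearson's r measures linear association, whereas Spearman's rho is Pearson's r computed on ranks (tied values receive average ranks), so it captures any monotonic relationship. The function names and the HVI and rate values below are hypothetical; the values rise together monotonically but not linearly.

```python
import math

def pearson(x, y):
    """Pearson's r: strength of the linear relationship between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # mean of positions i+1..j+1
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: monotonic but non-linear relationship
hvi = [0.2, 0.5, 0.9, 1.4, 2.0]
rate = [1.0, 1.1, 1.3, 2.5, 9.0]
print(f"Pearson r = {pearson(hvi, rate):.2f}, Spearman rho = {spearman(hvi, rate):.2f}")
```

With these values Spearman's rho equals 1 because the orderings agree perfectly, while Pearson's r is lower (about 0.87) because the relationship is not linear.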
As discussed above, the selection of indicators, scales, and methods in the modelling and validation processes all affect the performance of heat vulnerability measurement studies. It is essential for future research to select elements suited to local contexts. According to the validation results, only 50% of the studies with validation achieved relatively good evaluation performance. This rate indicates that studies without validation urgently need to evaluate and validate their results to support the credibility of their conclusions and the applicability of their methods. None of the remaining 50% of studies obtained satisfactory results indicating a significant and strong correlation between heat vulnerability values and adverse health outcomes. Despite this lack of solid validation support, some studies [27,105] asserted that their results were important and useful for heat risk profiling and policymaking.
It is therefore questionable whether such results and conclusions are credible and meaningful without the support of good validation performance. Moreover, the differences in correlations across scales indicate the limited applicability of the developed methods. The overall validation performance reveals that HVI values alone may not suffice for measuring urban heat vulnerability [27,101]. Urban planners and policymakers should integrate multiple measurement and assessment approaches in the planning, conception, and design of mitigation and intervention strategies. It is worth noting that validation approaches also affect performance outcomes: the differences between study performances may not precisely reflect differences in method accuracy or heat vulnerability, because the validation methods themselves are incomplete. In situ measurements and qualitative analyses are necessary to examine and verify the effectiveness and authenticity of validation results. In this way, heat vulnerability results can be evaluated objectively and accurately, which can substantially advance the development of methods for measuring urban heat vulnerability.

Conclusions
As anthropogenic climate change continues to intensify global warming, more and more people are inevitably becoming exposed to extreme heat, and the lack of effective ecological planning, or its highly limited extent, is not helping [121][122][123][124][125]. The UHI effect exacerbates the situation, making urban residents more vulnerable to extreme heat than people in the surrounding areas. An increasing number of methods are therefore being developed to measure urban heat vulnerability in various regions worldwide to guide urban planning and policymaking. Nevertheless, thorough review studies that compare, contrast, and help in understanding the prospects and constraints of urban heat vulnerability assessment methods are scarce. This review aimed to bridge this gap by using the PRISMA approach. The results were analyzed and discussed in three aspects (indicators and data, modelling approaches, and validation approaches) to assess the validity and limitations of these methods.
The results disclose that demographic properties and socioeconomic status, health conditions and medical resources, and natural and built environmental factors are the commonly used indicator types. However, there appear to be no universal criteria or specifications for selecting input metrics in terms of quantity, variety, theoretical foundations, and data proxies. It is essential to incorporate as many key factors as possible, guided by supporting theories and local contexts, to avoid subjective discretion. Meanwhile, controlling the number of indicators by eliminating less relevant ones and keeping indicators consistent with their categories are also important. Finally, researchers should select data proxies of high quality and adequate representativeness.
Another finding is that heat vulnerability indexing models, the equal weighting (EW) method, and principal component analysis (PCA) are the commonly used modelling and weighting methods. Biophysical exposure and socioeconomic vulnerability are the two main aspects captured by heat-related indexes. The heat vulnerability framework and the risk triangle framework integrate these aspects to make assessments comprehensive. However, further explanation of their underlying logic is lacking, which is likely to undermine the reliability of the frameworks. Moreover, more attention should be paid to qualitative methods, although assessment from a qualitative perspective is difficult. EW and PCA are frequently used to assign weights to indicators, but both have inherent limitations: the assumption of equal weights seems untenable in some situations, and PCA makes indicators less readable and interpretable. Model outputs are mostly produced at administrative scales and from historical, short-term, cross-sectional perspectives. Although this helps capture the spatial variance of heat vulnerability, the evolution and trends of heat risk cannot be measured. To mitigate and intervene in extreme heat vulnerability scientifically, we call for more attempts and achievements in longitudinal and predictive research at finer scales.
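The contrast between the two weighting schemes can be sketched as follows, under stated assumptions: for EW, each indicator is min-max scaled to [0, 1] and averaged with equal weights, with "protective" indicators (here a hypothetical green-space share) inverted so that higher always means more vulnerable; PCA is represented only by its first component, found by power iteration on the indicator correlation matrix. All names (`equal_weight_hvi`, `first_principal_component`, the indicator keys) and data are hypothetical, and real studies often retain and rotate several components.

```python
import math

def min_max(col):
    """Scale a list of values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in col]

def equal_weight_hvi(indicators, protective=()):
    """Equal-weighted HVI per area: mean of min-max scaled indicators.

    indicators: dict mapping indicator name -> list of values, one per area.
    protective: names for which higher values mean LOWER vulnerability;
    their scaled values are inverted (1 - v) before averaging.
    """
    scaled = []
    for name, col in indicators.items():
        s = min_max(col)
        scaled.append([1.0 - v for v in s] if name in protective else s)
    n_areas = len(scaled[0])
    return [sum(col[i] for col in scaled) / len(scaled) for i in range(n_areas)]

def first_principal_component(indicators, iters=500):
    """Loadings of the first principal component and its variance share.

    Uses power iteration on the indicators' correlation matrix,
    assuming its largest eigenvalue is unique.
    """
    cols = list(indicators.values())
    n, p = len(cols[0]), len(cols)
    z = []
    for col in cols:  # standardize each indicator to z-scores
        m = sum(col) / n
        sd = math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        if sd == 0:
            raise ValueError("indicator with zero variance")
        z.append([(v - m) / sd for v in col])
    corr = [[sum(a * b for a, b in zip(z[i], z[j])) / (n - 1) for j in range(p)]
            for i in range(p)]
    vec = [1.0 / (k + 1) for k in range(p)]  # arbitrary non-uniform start
    for _ in range(iters):
        nxt = [sum(corr[i][j] * vec[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        vec = [v / norm for v in nxt]
    # Rayleigh quotient gives the eigenvalue; a p x p correlation matrix
    # has trace p, so eigval / p is the share of total variance explained
    eigval = sum(vec[i] * corr[i][j] * vec[j] for i in range(p) for j in range(p))
    return vec, eigval / p

areas = {
    "pct_elderly": [10, 15, 20, 25, 30, 12],
    "pct_low_income": [8, 12, 18, 22, 30, 10],
    "green_share": [40, 35, 20, 15, 5, 38],
}
print(equal_weight_hvi(areas, protective={"green_share"}))
loadings, share = first_principal_component(areas)
print(loadings, share)
```

The loadings show how much each indicator contributes to the component, but the component itself carries no label, which is the interpretability problem noted above; the sign of an eigenvector is also arbitrary, so all loadings may flip sign without changing the result.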
Regarding validation, slightly more than a quarter of all studies involved model validation. The results revealed that statistical regressions and correlation coefficients between heat vulnerability results and adverse health outcomes are the commonly used validation approaches. Nevertheless, the relationship between HVI values and health consequences was not stable or strong enough in some studies. Both the substitution of all-cause health outcomes for heat-related morbidity and mortality and the spatiotemporal mismatches between assessment results and validation data affected the accuracy and reliability of validation results. Moreover, validation methods are limited to statistical and quantitative perspectives, without any field investigations or qualitative methods. Thus, more indicators, such as people's perceptions, mental health, and in situ data on the thermal environment, and multiple qualitative methods, such as questionnaires and interviews, should be involved in model validation. The overall validation performance implies that HVI values alone may not suffice, and urban planners and policymakers should integrate multiple measurement and assessment approaches when formulating mitigation and intervention strategies.
In conclusion, there is no comprehensive and systematic framework, adjustable to local contexts, for developing and validating heat vulnerability assessment methods. The findings and discussions of this review can help establish an acknowledged framework in the future. This study informs urban policy and generates directions for prospective research and more accurate vulnerability assessment method development.
The earliest selected article was published in 2006, with only three research articles being published from 2006 to 2011. There was an increase to five articles in 2012, and the number of publications fluctuated between two and seven until 2017. The number almost doubled in 2018 and stayed at a stable level of 10-12 during 2018-2021. This review also classified the selected articles by research region and climate (Figures 3 and 4) because geographic locations and climate status directly influence the re-

Figure 2. Publication distribution by year.

Figure 3. Publication distribution by research region.

Table 3. Common indicators and data sources.
Composite indicators: indicators built from temperature, humidity, wind, etc., such as the Heat Index, Humidex, and Wet Bulb Globe Temperature.

Table 4. Modelling and weighting methods and outputs.

Table 5. Validation indicators, methods, and study performance.
reduced 42 original indicators to 11 eligible indicators by PCA. Johnson et al. [104] obtained 19 variables from 25 well-known indicators by iteratively removing 6 variables that exhibited complex structures in PCA. The resulting unnamed components are less interpretable and comprehensible. It is unclear whether each component, made up of a set of statistically related indicators, explicitly reflects a distinct vulnerability dimension. For example, Johnson et al. [104] obtained 4 components by PCA, one of which combines indicators of the black population and land surface temperature. Although this component explains approximately 7% of the variance, it is unclear what vulnerability characteristic such a combination of demographic properties and natural environment metrics represents. Nayak et al.