An Innovative Index for Evaluating Urban Vulnerability on Pandemic Using LambdaMART Algorithm

Abstract: The COVID-19 pandemic has significantly changed urban life, and increased attention has been paid to the pandemic in discussions of urban vulnerability. There is a lack of methods to incorporate dynamic indicators such as urban vitality into evaluations of urban pandemic vulnerability. In this research, we use machine learning to establish an urban Pandemic Vulnerability Index (PVI) that measures the city's vulnerability to the pandemic and takes dynamic indicators as an important aspect of this. The proposed PVI is constructed using 140 statistic variables and 10 dynamic variables, using data from 47 prefectures of Japan. Factor Analysis is used to extract factors from variables that may affect city vulnerability, and the LambdaMART algorithm is used to aggregate factors and predict vulnerability. The results show that the proposed PVI can predict the relative seriousness of the COVID-19 pandemic in two weeks with a precision of more than 0.71, which is meaningful for taking controlling measures in advance and shaping the society's response. Further analysis revealed the key factors affecting urban pandemic vulnerability, including city size, transit station vitality, and medical facilities, emphasizing precautions for public transport systems and new planning concepts such as the compact city. This research explores the application of machine learning techniques in the indicator establishment and incorporates dynamic factors into vulnerability assessments, which contribute to improvements in urban vulnerability assessments and the planning of sustainable cities while facing the challenges of the COVID-19 pandemic.


Introduction
Against the background of the COVID-19 pandemic, the concept of resistance and vulnerability has drawn more and more attention [1]. Urban vulnerability represents a state of being likely to be influenced by natural disasters, including earthquakes, typhoons, and floods. When natural disasters strike, high-vulnerability cities and high-vulnerability areas in the city will be more likely to be harmed, inflicting suffering on vulnerable groups and exacerbating existing inequalities [2]. Recent studies show that pandemics such as COVID-19 should also be regarded as natural disasters and be included in the discussion of urban vulnerability [3,4].
In urban planning, many urban factors might affect urban vulnerability, such as excessive population density, low-quality housing, inadequate infrastructure, and environmental degradation [2,5]. The differences between urban factors bring different levels of urban vulnerability, leading to different performances in response to natural disasters. For example, income inequality will affect a region's vulnerability to flooding [6]. The COVID-19 pandemic also shows regional differences in its spread, bringing more damage to vulnerable countries and regions [7,8]. UN-Habitat pointed out that the Asia-Pacific region could be the most susceptible due to its fast urbanization rate, with one-third of urban dwellers in slums or slum-like conditions [9]. Given the still-raging pandemic, assessing urban vulnerability to the pandemic is an urgent task.

Literature Review
Unlike natural disasters such as earthquakes and typhoons, which can suddenly wreak havoc on urban facilities, pandemics such as COVID-19 threaten cities by harming residents' health, stopping facilities from functioning, and disrupting daily urban life [14]. On the one hand, the pressure brought on by the pandemic will be transmitted to all aspects of urban life through various paths [15], making it more challenging to identify the possible vulnerable link. On the other hand, the pandemic gradually causes damage as it spreads, which is dynamic and occurs over a relatively long period [16]. This means that the risk of exposure and damage also dynamically changes along with the pandemic spread and the city's reaction.
Since the COVID-19 outbreak in 2020, there has been some research on pandemic vulnerabilities. Mishra, Gayen and Haque [3] examined four major cities of India, devised a COVID Vulnerability Index with carefully selected indicators, and analyzed why social distancing and lockdowns failed in vulnerable slums. Prieto, Malagón, Gomez and León [12] proposed an Urban Vulnerability Assessment methodology to investigate the various vulnerability factors related to pandemics and aggregate them into a vulnerability index using the data from Bogotá, Colombia. Shi, Liao, Li and Su [13] employed the crisp-set qualitative comparative analysis method to explore possible causal condition combination paths that affect community resilience to the pandemic in Wuhan, China, showing three condition configurations that were vulnerable to the pandemic, including communities populated by disadvantaged populations. These pieces of research provide an essential analytical framework for identifying possible vulnerability factors in cities, and validate the feasibility of extracting vulnerability factors from qualitative or quantitative data using methods such as Factor Analysis.
However, the dynamic pressure of the pandemic has created new difficulties for researchers. Taking Japan as an example, in the third wave of the pandemic in January 2021, the most densely populated Tokyo metropolitan area had the highest number of new infections [17]. However, in the fourth wave in June 2021, Okinawa, which performed relatively better in the last wave, had the highest number of infections and was short of medical resources [18]. This situation indicates that it is not enough to only focus on the inherent factors of cities, which led to the examination of dynamic urban indicators [16]. Practical methods that can investigate dynamic variables such as urban vitality and prevention measures such as social distancing are needed when conducting an urban vulnerability assessment. Machine learning techniques, which have gained popularity in recent years, provide some proven approaches: Zawbaa et al. [19] used the Multi-Layer Perceptron to model the spread of COVID-19 and verified the impact of social distancing; and Pan et al. [20] used Random Forest to capture pandemic dynamics and make time-series predictions, and further offered optimal solutions to minimize the growth of confirmed cases and deaths through NSGA-II. These pieces of research illustrate that machine learning technology can capture the dynamic spread of the pandemic and use empirical data to verify the results.
Therefore, this research aims to establish a composite index to evaluate urban vulnerability to the pandemic. The proposed composite index is innovative in considering the impact of dynamic factors on urban vulnerability, which can make up for the insufficient advances in the urban dynamics research line of urban vulnerability assessments [10]. The Factor Analysis (FA) will extract essential factors from the available indicators, and a machine learning LambdaMART algorithm is used to combine these factors into a Pandemic Vulnerability Index (PVI). PVI's prediction ability will be verified on empirical data, and the critical characteristics that influence urban pandemic vulnerability will be examined through feature importance and dependence analysis. The proposed PVI is expected to dynamically identify vulnerable regions and remind decision-makers in the corresponding region to take preventive measures such as social distancing or expanding healthcare capacity in advance. The corresponding analysis is expected to reveal the key influencing factors, such as urban sprawl, that should be carefully considered in future urban planning. We believe that a pandemic vulnerability index that includes dynamic factors can refine the framework for urban vulnerability assessments and contribute to flexible and accurate city planning and policies in the post-COVID-19 era. Machine learning techniques, including FA or LambdaMART, are a promising method.

Situation of Japan
Japan has a democratic government that emphasizes local autonomy, and its prefectures differ significantly from one another. From the urban foundation perspective, the cities of the three major metropolitan areas around Tokyo, Osaka, and Nagoya have formed a distinctive urban form with high population density and a relatively complete infrastructure, differing from other prefectures. Prefectures have also adopted different policies in response to the pandemic because of their different pandemic situations and economic considerations. For example, Hokkaido's local government declared measures for the pandemic as early as 28 February 2020, while the national state of emergency was declared on 7 April 2020. The differences in urban infrastructure, residents' actions, and government policy will affect the cities' ability to counter the pandemic and become the basis for establishing and verifying the cities' vulnerability index.

Data Source and Software
The Japanese government has released various databases to fight the COVID-19 pandemic and support related research. In addition, many Internet service providers, such as Google or NTT, have also released a series of data related to human activities during the pandemic. The data used in this research are from the public database, including: These databases cover necessary information for each prefecture, such as population, GDP, medical facilities, and the dynamic changes in the pandemic situation such as the number of infections, providing the possibility of describing the differences between cities' vulnerability levels. The data were from 1 March 2020, to 1 March 2022, and accessed on 8 March 2022.

Pandemic Vulnerability Index
This research used FA and LambdaMART to establish PVI through a typical machine learning workflow. Sections 3.3.1-3.3.4 describe variable pre-processing, COVID-19 damage representation, LambdaMART details, and training and validation settings, respectively. Figure 1 shows the overall framework of this research. This includes four steps:
1. Extracting influential factors related to urban pandemic vulnerability through Factor Analysis and calculating a Damage of COVID-19 Pandemic (DOP) score for the pandemic;
2. Using the urban factors and DOP score as data and labels, respectively, to supervise the training of a LambdaMART model;
3. Using the trained LambdaMART model to establish the PVI, and evaluating the PVI's performance on the validation dataset;
4. Analyzing the PVI to reveal critical factors regarding urban pandemic vulnerability.

Influential Variables on Vulnerability
Many variables may impact urban vulnerability, including population density, GDP, medical facilities, etc. [26][27][28][29][30]. Specific to pandemic vulnerability, these variables can roughly be divided into two groups: statistic variables that describe the static conditions of a city over a relatively long period, such as population, industrial structure, medical facilities; and dynamic variables that describe the dynamic status of a city during the pandemic, such as the interim policies, urban vitality, and disease prevalence [12,13,16,19].
Dynamic variables represent the efforts made to fight the pandemic, which will change over time. Although statistical variables will also change with the development of the pandemic, it is difficult to obtain the latest information due to the statistical process. Considering that changes in statistical variables are usually relatively slow, and PVI is more concerned with relative differences, statistics variables before the outbreak were used.

Since the variables may be strongly correlated, extracting interpretable factors can effectively reduce the number of variables and facilitate subsequent calculations. A widely used method is Factor Analysis (FA), which assumes that all observed correlated variables are determined by orthogonal unobserved factors [31]. Researchers can locate a set of factors that reveal a simple hidden structure without losing the information contained in the original variables. The FA used in this research explains a set of m variables in each of n cities with a set of k factors. There should be fewer factors than variables, so k < m, and these factors are related to the variables via a factor-loading matrix $L \in \mathbb{R}^{m \times k}$. The model can be written as follows:
$$X - M = LF + \varepsilon$$
where $X \in \mathbb{R}^{m \times n}$ is the observation matrix, $F \in \mathbb{R}^{k \times n}$ is the factor matrix, $\varepsilon \in \mathbb{R}^{m \times n}$ is the error term matrix, and $M \in \mathbb{R}^{m \times n}$ is the mean matrix. By choosing appropriate constraints, the observation matrix X can be transformed into the factor matrix F without losing too much information. The resulting factors were used together with the dynamic variables as data for subsequent model training and PVI establishment.
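The factor model relating observations, loadings, and factors has a well-known consequence for the covariance structure, sketched here under assumptions that are not stated in the text: for a single city with observation vector $x$, mean $\mu$, factor vector $f$ and error $e$, the factors are taken to be uncorrelated with unit variance, and the errors are taken to be uncorrelated with the factors and with each other, with diagonal covariance $\Psi$ (a symbol introduced here for illustration).

```latex
\begin{aligned}
x - \mu &= L f + e, \qquad \operatorname{Cov}(f) = I_k, \quad \operatorname{Cov}(e) = \Psi, \quad \operatorname{Cov}(f, e) = 0 \\
\operatorname{Cov}(x) &= L \operatorname{Cov}(f) L^{\top} + \operatorname{Cov}(e) = L L^{\top} + \Psi
\end{aligned}
```

Under these assumptions, the correlations among the m observed variables are carried entirely by the m × k loading matrix, which is why a small number of factors can stand in for a much larger set of correlated variables.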

Damage of COVID-19 Pandemic
Cities' vulnerability can be represented by the damage caused by COVID-19. The greater the actual damage, the more vulnerable the city is to a pandemic. Here, we use the total score for infection status and pressures on the medical care system to present the Damage of COVID-19 Pandemic (DOP). According to the Ministry of Health, Labor, and Welfare of Japan, six indicators were officially used to characterize the COVID-19 status (see Table 1) [23]. These indicators are all critical descriptions of the COVID-19 pandemic and represent its speed of spread, severity, and the stress on the healthcare system. A comprehensive single metric is needed to capture the damage to cities caused by the pandemic. Due to the method's simplicity and limited compensability, the geometric mean after the min-max normalization is used to aggregate these indicators [32]. A city's DOP score is based on the geometric mean of the normalized indicators $NI$, rescaled to between 0 and 10:
$$DOP_{d} = 10 \times \left( \prod_{s=1}^{6} NI_{s,d} \right)^{1/6}$$
Subscript s denotes the six different indicators, and subscript d denotes the date. The DOP score varies over time, representing the change in the damage to cities as the pandemic spreads. Note that indicators are normalized according to maximum and minimum values among cities, and a city will only receive a maximum score of 10 when all its indicators are at a maximum among cities. This normalization means that the DOP score reflects more relative disparities between cities, rather than the absolute severity of the pandemic. As the pandemic spreads exponentially, unnormalized data will show exponential shifts, obscuring the data characteristics at the beginning of the outbreak. Using normalized data will allow for a focus on relative comparisons between cities, which is in line with PVI's attempt to characterize the relative ability of cities to counter the damage caused by the pandemic.
Here, the DOP score only measures the pressure COVID-19 places on public health and does not cover subsequent damages such as economic losses or mental harm. Since research has shown that the more serious the damage to the public health system, the more serious the economic and social damage that follows [33], the DOP score can serve as a simple, direct, and comprehensive measurement of the relative damage caused by the pandemic.
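The aggregation described in this section (min-max normalization across cities, geometric mean, rescaling to 0-10) can be illustrated with a minimal Python sketch. The indicator names and values below are hypothetical and are not taken from Table 1; the handling of an indicator that is identical in all cities is also an assumption.

```python
from math import prod


def min_max(values):
    """Min-max normalize one indicator across cities to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: if all cities are identical, there is no relative difference.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def dop_scores(indicators):
    """Compute one DOP score per city for a single date.

    indicators: dict mapping an indicator name to a list of raw values,
    one per city, all lists in the same city order.
    """
    normalized = [min_max(vals) for vals in indicators.values()]
    n_indicators = len(normalized)
    n_cities = len(normalized[0])
    scores = []
    for c in range(n_cities):
        product = prod(column[c] for column in normalized)
        # Geometric mean of the normalized indicators, rescaled to 0-10.
        scores.append(10.0 * product ** (1.0 / n_indicators))
    return scores


# Hypothetical values for three cities and three indicators (illustration only).
example = {
    "new_cases_per_100k": [30.0, 10.0, 5.0],
    "bed_occupancy_pct": [60.0, 40.0, 20.0],
    "positive_rate_pct": [9.0, 6.0, 3.0],
}
scores = dop_scores(example)
```

In this sketch, the first city is at the maximum of every indicator and therefore receives the full score of 10, while a city at the minimum of any single indicator receives 0. This reflects the limited compensability of the geometric mean: a low value on one indicator cannot be fully offset by high values on the others.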

LambdaMART Model
A city's pandemic vulnerability depends on the impact factors F mentioned in Section 3.3.1. The factors may have different weights and influence paths, represented by a set of parameters β. The pandemic vulnerability index PVI can be written as a function of the factor vector F and the set of parameters β:
$$PVI = f(F, \beta)$$
Traditionally, the vulnerability index is a linear function with β assigned by experts. Such a method is limited in its expressive ability and relies too much on prior knowledge regarding data differences and regional differences [34]. Here, a supervised machine learning algorithm is used to infer f and β automatically. The machine learning algorithm first assumes the function form f and provides an initial guess of the parameters β. Then, the algorithm compares the resulting PVI with the actual damage caused by the COVID-19 pandemic (DOP score) and updates f and β based on the gradient of this difference. Through iterations, the machine learning algorithm can infer a suitable f and β that best fit the empirical DOP data, which means the resulting PVI can reflect the damage and describe the urban vulnerability to the pandemic.
In this research, we chose LambdaMART due to the specificity of the PVI. Urban pandemic vulnerability is a relative concept based on inter-city comparisons. Hence, the constructed PVI should be a relative indicator in which the relative ranking among cities is more important than the absolute score. Fortunately, the Learning to Rank (LTR) technique was designed to develop an optimal ordering of items and provide a ranking, which is suitable for PVI. The LambdaMART algorithm was chosen from the LTR methods due to its powerful expression ability and robustness [35].
The LambdaMART algorithm belongs to the family of decision tree algorithms, assuming the basic functional form is a decision tree [36]. For a typical decision tree, all observations x are partitioned into P different regions $R_p$, and the average label $\bar{y}_p$ in each region is used as the predicted value for that region:
$$T(x) = \sum_{p=1}^{P} \bar{y}_p \, \mathbb{1}(x \in R_p)$$
Usually, a single decision tree will not produce a good prediction result. The Multiple Additive Regression Tree (MART) iteratively calculates the loss between the observed DOP score and the predicted PVI, fits new decision trees along the gradient of the previous prediction loss, and takes the sum of all decision trees as the final result. However, here the DOP score and PVI represent a relative ranking, which makes it challenging to compute a differentiable loss. Therefore, LambdaMART uses a pairwise method to transform the DOP score into a partial order of pairwise comparisons. For city i and city j, the actual probability of city i being more vulnerable than city j is denoted as $P_{ij}$:
$$P_{ij} = \begin{cases} 1, & DOP_i > DOP_j \\ 0.5, & DOP_i = DOP_j \\ 0, & DOP_i < DOP_j \end{cases}$$
while the probability given by the LambdaMART model is $\hat{P}_{ij}$:
$$\hat{P}_{ij} = \frac{1}{1 + e^{-(PVI_i - PVI_j)}}$$
Therefore, the loss between the observed and predicted values can take the differentiable cross-entropy form:
$$C = -P_{ij} \log \hat{P}_{ij} - (1 - P_{ij}) \log (1 - \hat{P}_{ij})$$
It should be noted that the loss function here treats all cities equally. However, we are more concerned with those cities that are ranked higher and are more vulnerable. LambdaMART introduces the Normalized Discounted Cumulative Gain (NDCG), which emphasizes samples with high rankings. Therefore, the gradient λ can be defined from the partial derivative of the loss C and the NDCG measurement:
$$\lambda_{ij} = \frac{\partial C}{\partial PVI_i} \, |\Delta NDCG|$$
where $|\Delta NDCG|$ represents the difference in NDCG after exchanging the positions of i and j. A new decision tree $T_{l+1}$ can now be fit on the gradient $\lambda_l$ from the latest decision tree $T_l$. After L iterations, the PVI given by the LambdaMART algorithm will be as follows:
$$PVI = \sum_{l=1}^{L} T_l(F)$$
In short, the LambdaMART model will repeat the cycle of "fitting a decision tree, obtaining PVI, measuring the difference between PVI and DOP scores, calculating the gradient, fitting a new decision tree" until the difference between the observed DOP score and predicted PVI is small enough in terms of the partial order of pairwise comparisons.
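The gradient step of this cycle can be illustrated with a minimal Python sketch for a single date. It is not the LightGBM implementation. It assumes a logistic model with unit scale for the pairwise probability, the common exponential gain $2^{label} - 1$ with a logarithmic position discount for NDCG, and small integer vulnerability grades as labels; none of these choices are stated in the text.

```python
import math


def dcg_gain(label, rank):
    """Discounted gain of an item with relevance `label` at 0-based `rank`."""
    return (2 ** label - 1) / math.log2(rank + 2)


def ideal_dcg(labels):
    """DCG of the best possible ordering (labels sorted in descending order)."""
    ordered = sorted(labels, reverse=True)
    return sum(dcg_gain(label, rank) for rank, label in enumerate(ordered))


def lambdas(scores, labels):
    """Return one lambda (pseudo-gradient of the loss) per city.

    scores: current model outputs (the PVI so far), one per city.
    labels: observed vulnerability grades (derived from DOP), one per city.
    A negative lambda means the loss falls if that city's score is raised.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    rank = {city: r for r, city in enumerate(order)}
    idcg = ideal_dcg(labels)
    lam = [0.0] * n
    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # keep only pairs where city i is truly more vulnerable
            # |delta NDCG| obtained by swapping the ranks of cities i and j.
            before = dcg_gain(labels[i], rank[i]) + dcg_gain(labels[j], rank[j])
            after = dcg_gain(labels[i], rank[j]) + dcg_gain(labels[j], rank[i])
            delta = abs(after - before) / idcg
            # Derivative of the pairwise cross-entropy w.r.t. score i when P_ij = 1.
            grad = -1.0 / (1.0 + math.exp(scores[i] - scores[j]))
            lam[i] += grad * delta
            lam[j] -= grad * delta
    return lam


# Three hypothetical cities with identical initial scores.
example_lambdas = lambdas(scores=[0.0, 0.0, 0.0], labels=[3, 1, 0])
```

In this example, the most vulnerable city receives a negative lambda and the least vulnerable city a positive one, so a tree fitted to the negative gradient raises the score of the former and lowers the score of the latter. Pairs whose swap would change NDCG the most contribute the largest updates, which is how the emphasis on top-ranked cities enters the training.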

Training and Validation
For machine learning, overfitting is a critical problem, which means that the LambdaMART model pays too much attention to the existing data and loses the ability to work on unobserved data. In the context of this research, overfitting means that the established PVI is consistent with the observed DOP score but cannot make a valid prediction for the future.
A general solution is the train-test splitting technique. The dataset is divided into two parts, the training dataset and the test dataset, and the model is trained using only the training dataset. When the model achieves an excellent performance and the established PVI is consistent with the observed DOP score, the model is then validated on the test dataset to see if the resulting PVI reflects the "unobserved" DOP score.
This train-test splitting technique can help us evaluate how accurately the established PVI measures urban pandemic vulnerability. In this research, the two-year dataset was evenly split into training and test datasets according to time. The data from 1 March 2020 to 1 March 2021 formed the training set used to train the LambdaMART model. The data from 1 March 2021 to 1 March 2022 formed the test dataset, used to verify the model's performance. The data from the Diamond Princess cruise ship and imported cases were omitted to focus on vulnerability in the urban area.
Another time-related problem is the lag in pandemic damage. At a given moment, the pandemic vulnerability will not be immediately reflected in the DOP score at that exact moment, but instead will be delayed for a while. Considering that the incubation period of COVID-19 can extend up to 24 days, the PVI of cities at a specific moment should be able to predict the DOP score of the next period. According to Lauer, et al. [37], 99% of patients will develop symptoms in 14 days. Therefore, the time lag for the DOP score is set to 14 days, which means that PVI at day d is used to characterize the pandemic damage at day d + 14.
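The time-based split and the 14-day lag can be combined in a short Python sketch. The data layout (dictionaries keyed by city and date) and the function name `build_samples` are hypothetical, and assigning a sample to the training or test set by the date of its features rather than the date of its label is an assumption; the text does not specify how samples near the boundary were handled.

```python
from datetime import date, timedelta

LAG = timedelta(days=14)   # PVI at day d is matched with DOP at day d + 14
SPLIT = date(2021, 3, 1)   # boundary between training and test periods


def build_samples(features, dop):
    """Pair each city's features at day d with its DOP score at day d + 14.

    features: dict mapping (city, day) to a feature vector.
    dop: dict mapping (city, day) to a DOP score.
    Returns (train, test) lists of (city, day, x, y) tuples.
    """
    train, test = [], []
    for (city, day), x in sorted(features.items()):
        y = dop.get((city, day + LAG))
        if y is None:
            continue  # no DOP score observed 14 days later
        sample = (city, day, x, y)
        # Assumption: the split is decided by the feature date.
        (train if day < SPLIT else test).append(sample)
    return train, test


# Hypothetical records for one city (illustration only).
features = {
    ("Tokyo", date(2021, 2, 1)): [0.1],
    ("Tokyo", date(2021, 3, 10)): [0.2],
    ("Tokyo", date(2022, 2, 25)): [0.3],
}
dop = {
    ("Tokyo", date(2021, 2, 15)): 5.0,
    ("Tokyo", date(2021, 3, 24)): 7.0,
}
train, test = build_samples(features, dop)
```

The last record is dropped because its label would fall on 11 March 2022, after the end of the study period. Any real pipeline would need the same treatment for the final 14 days of data.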

Variables Selection
Aiming to explore the possible relevant variables that affect urban pandemic vulnerability, this research referred to the variables included in previous research [12,13,16,19]. This research used 140 variables from the Digital National Land Database, 6 variables from the Google Community Mobility Report, and 10 variables from the Ministry of Health, Labor, and Welfare. The complete variable list is given in Table S1 in the Supplementary Materials.
These variables include statistic variables that describe the static conditions over a period and dynamic variables that change over time. The statistic variables involved in this research can be divided into the following five aspects:
• Demographic Variables. Intuitively, the scale of a city is closely related to the spread of infectious diseases, and overpopulated cities are more vulnerable to a pandemic. Variables such as urban built-up area population density are included.
• Economic Variables. Active economic activity means that more urban resources can be mobilized to counter pandemics, but also that diseases spread more easily. Fiscal expenditures closely related to economic activities contribute to improved medical and public facilities.
The included dynamic variables can be divided into the following three aspects:
• Vitality Variables. The number of people active in different urban areas is compared with the baseline value of February 2020, which can help characterize the urban vitality changes that reflect residents' reactions to the pandemic. Urban function areas are classified into six types: retail and recreation, grocery and pharmacy, parks, transit stations, workplaces, and residential areas.
It can be seen from Figure 3 that urban life gradually returned to a new balance after the initial shock of the pandemic in early 2020, with the apparent fluctuations all being holiday-related. The urban vitality in the residential area increased by about 15% compared to before the pandemic, which may be related to the work-from-home trend. The urban vitality of workplaces, transit, and retail significantly decreased, with Tokyo down by about 30%, Osaka by 20%, and Ishikawa by 10%, showing that the impact of the pandemic varied depending on city size. The urban vitality of the transit area was comparatively the most affected, followed by the retail area. On the other hand, the pandemic waves appear to be linked to holidays and the associated urban vitality changes. In all three areas, the declaration of an emergency status and measures for spread prevention seem to be helpful to control the pandemic.

Factor Analysis
The Factor Analysis method mentioned in Section 3.3.1 was used to extract factors due to the strong correlation between the statistical variables.
The oblimin rotation method was adopted, with the aim of retaining at least 85% of the variance. Finally, nine factors were selected to characterize these 140 indicators, retaining 86.8% of the variation. Figure 4 shows the correlation matrix after the oblimin rotation, with correlations between factors lower than 0.32, which means that there is less than a 10% overlap in variance among factors [38]. The complete loading matrix is shown in Figure S1 in the Supplementary Materials.
For each of these nine factors, the variable with the largest loading was extracted, and the factor was named according to the direction of its main loading concentration (see Table 2). In order of their eigenvalues, these factors were named city size, medical facilities, age structure, unemployment, cultural facilities, precipitation, industry, decentralization, and commerce. For example, the decentralization factor is characterized by a negative load of the population ratio in densely populated areas, and the commerce factor (factor 8) by a negative load of the commercial land ratio. These factors and dynamic variables constitute the data in the subsequent LambdaMART model.
Figure 5 reveals the difference in several factors in Japan. Large cities are concentrated around the Tokyo, Osaka, and Nagoya metropolitan areas, while the medical facility factor is highest in South Tohoku and South Kyushu. There is a relatively large aging population in Hokkaido and the Tohoku region, and cultural facilities are concentrated in the Kinki region. These differences represent the differences in regions and may affect the urban pandemic vulnerability.

Model Performance
The LambdaMART model described in Section 3.3.4 was implemented in Python with the LightGBM package, and its hyperparameters were set as shown in Table 3.

Model Performance
The LambdaMART model described in Section 3.3.4 was implemented in Python with the LightGBM package, and its hyperparameters were set as shown in Table 3. After training, the PVI established by the LambdaMART model can be matched with the DOP score in the training dataset. Figure 6 shows two example results in the training set. The red bars show the actual DOP score after 14 days, while the blue bars show the PVI that was learned in the training set. In the first pandemic wave on 16 May 2020, the three most vulnerable regions were Fukuoka, Ishikawa, and Hokkaido, with the highest DOP score obtained after 14 days. Similarly, on 6 February 2021, in the third pandemic wave, the three areas with the highest PVI became Tokyo, Chiba, and Kanagawa as the situation changed, and the DOP score after 14 days also changed. Noticed that the accuracy is lower for regions with a lower PVI, which is related to the NDCG metric used in the LambdaMART model. The NDCG metric assigns a higher weight to the top-ranked predictions limits model performance in less vulnerable area predictions.
Overall, the PVI and the DOP score fit perfectly, with an average NDCG@10 = 0.9411 and Pecision@10 = 0.7591 in the entire training dataset. The PVI ranking is validated in the test dataset based on the learned model. Figure 7 shows the two example results in the test set on 15 May 2021, and 8 January 2022. The DOP score in the test dataset is "unseen" for the LambdaMART model, so the result represents the model's predictive ability. On 15 May 2021, the three regions with the highest PVI reported by the model were Okinawa, Osaka, and Hokkaido. The actual DOP score obtained 14 days later shows Okinawa, Hokkaido, and Osaka, the same regions, with slightly different rankings. The results for 8 January 2022 also show correct predictions but slightly different rankings, with only one wrong prediction in the top-10 PVI areas. Overall, the model is accurate, reporting an average NDCG = 0.9149 and Pecision@10 = 0.7189 in the whole test dataset. Despite the slight drop, the model can still effectively reflect the severity of the pandemic. Figure 8 takes three typical regions, namely Tokyo, Osaka, and Ishikawa, to represent the results in both datasets. The red line shows the actual DOP scores with a 14 day lag. The solid blue line shows the PVI in the training phase, and the blue dashed line shows the prediction of PVI. In Tokyo, the PVI effectively reflects the damage caused by COVID-19 predictions, except from October 2021 to December 2021. The calculated PVI appears to overestimate the vulnerability of Tokyo during this period. Osaka's PVI performance is generally excellent, with occasional deviations in the test dataset. In Ishikawa, the predicted PVI appears to underestimate vulnerability between October and December 2021. score after 14 days, while the blue bars show the PVI that was learned in the training set. 
Since the PVI is a relative indicator, the lull in the pandemic in metropolitan areas between October and December 2021 makes the model simultaneously "overestimate" the risk in the metropolitan area and "underestimate" the risk in the non-metropolitan area. However, the rapid spread of the Omicron variant in Tokyo in January 2022 (see Figure 3) shows that such a deviation is only temporary, and the metropolitan area is still vulnerable to a pandemic.
Generally, these results show that the LambdaMART model generalizes well and indicate that the PVI can effectively predict the damage caused by the COVID-19 pandemic in a city, with an overall top-10 accuracy of 0.7198. Since the PVI can forecast vulnerable regions two weeks ahead given real-time data, it opens up the possibility of taking preventive measures in advance. Such measures include, but are not limited to, social distancing, supporting healthcare needs, expanding healthcare facilities, and framing strategies to mitigate the infection. Society can also reorganize smoothly without sudden changes by managing inventory, facilitating working from home, and preparing supplies [39]. Therefore, the proposed PVI is meaningful for controlling the pandemic and shaping the response in advance.
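As a rough illustration of the top-k figures reported above, the sketch below assumes that Precision@k is the share of the k highest-PVI regions that are also among the k regions with the highest DOP score 14 days later. The region names echo the 15 May 2021 example, but all numeric values are invented for illustration.

```python
def top_k(scores, k):
    """Names of the k regions with the highest score."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def precision_at_k(predicted, actual, k):
    """Share of the predicted top-k regions that are in the actual top-k."""
    return len(top_k(predicted, k) & top_k(actual, k)) / k

# Hypothetical PVI values and DOP scores (illustrative numbers only).
pvi = {"Okinawa": 0.90, "Osaka": 0.80, "Hokkaido": 0.70,
       "Tokyo": 0.40, "Ishikawa": 0.20}
dop = {"Okinawa": 0.95, "Hokkaido": 0.85, "Osaka": 0.75,
       "Ishikawa": 0.30, "Tokyo": 0.10}

p_at_2 = precision_at_k(pvi, dop, 2)
p_at_3 = precision_at_k(pvi, dop, 3)
p_at_4 = precision_at_k(pvi, dop, 4)
```

Under this definition the order within the top k does not matter: the three leading regions are the same in both lists, so Precision@3 is 1.0 even though Osaka and Hokkaido trade places.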

Feature Importance and Dependence
It is also helpful to examine the features that influence the PVI, which can guide subsequent policy formulation and urban planning. In this research, Permutation Importance Analysis and Partial Dependence Analysis were carried out to further examine the obtained model.
Permutation importance analysis evaluates each feature's importance by randomly shuffling the values of a single feature and measuring the resulting drop in model performance [40]. Figure 9 shows the permutation feature importance of the PVI, where city size and the vitality of transit stations have the highest permutation importance, about 0.31 and 0.22, respectively. A metropolitan area's scale and population density create favorable conditions for a pandemic to spread. At the same time, the vitality of transit stations partly reflects how rapidly the population moves and is therefore also essential for assessing a region's urban pandemic vulnerability.
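The shuffling procedure can be sketched in a few lines. This is a minimal illustration rather than the analysis pipeline used here: the toy model, the synthetic data, and the use of R² as the performance score are all assumptions made for the example.

```python
import random

def r2_score(y_true, y_pred):
    """Coefficient of determination, used here as the performance score."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean drop in score when one feature column is shuffled at a time."""
    rng = random.Random(seed)
    baseline = r2_score(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = r2_score(y, [predict(row) for row in X_perm])
            drops.append(baseline - permuted)
        importances.append(sum(drops) / n_repeats)
    return importances

def toy_model(row):
    """Hypothetical fitted model: depends on feature 0 and ignores feature 1."""
    return 2.0 * row[0]

data_rng = random.Random(1)
X = [[data_rng.random(), data_rng.random()] for _ in range(50)]
y = [toy_model(row) for row in X]

importances = permutation_importance(toy_model, X, y)
```

Shuffling the feature the toy model ignores leaves the score unchanged, so its importance is zero, while shuffling the feature it relies on degrades the score. For a ranking model, a ranking metric such as NDCG would take the place of R².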
It can be seen that most dynamic variables have relatively high levels of importance, which confirms the view that it is difficult to assess urban pandemic vulnerability by relying only on static statistics. In addition to city size, the critical static factors include cultural facilities, weather, and medical facilities. Note that neither emergency status nor key measures of spread prevention seem to be important, possibly because these tend to be remedial measures, while the PVI emphasizes vulnerability before the pandemic hits.
Figure 10 further reveals the partial dependence of the critical features of the PVI. The partial dependence, shown as the thick blue line, is the expected response as a function of the input feature of interest, averaged over the values of the other features. The light blue lines show the individual conditional expectation, with one line per sample. The features show different patterns. An increase in city scale and transit station vitality leads to an increase in PVI, showing that cities with large populations and high mobility have high pandemic vulnerability. An increase in parks' vitality and medical facilities leads to a decrease in pandemic vulnerability.
As Figures 9 and 10 indicate, city scale and transit station vitality are the two most important factors. City expansion is related to a city's ability to resist the risk of infectious diseases and tends to increase its vulnerability. For Japan, the three major metropolitan areas are at the core of social and economic development, meaning that metropolitan pandemic risk will be an essential issue for future planning. The vitality of transit stations is both a factor affecting urban vulnerability and a target of pandemic impact, making causality more challenging to analyze. In any case, the public transport system will be a weak link for cities facing an epidemic. In addition, the PVI drops as the medical facilities feature increases, suggesting that investment in medical infrastructure might provide advantages in fighting COVID-19.
Changes in urban vitality also have significant impacts. A rise in urban vitality around transit leads to an increase in PVI, which supports the necessity of the social distancing policy. On the other hand, higher park vitality is associated with lower urban vulnerability, suggesting that public open spaces such as parks should attract more attention from urban planners in the post-COVID era.
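The relationship between individual conditional expectation (ICE) curves and partial dependence can be made concrete with a small sketch. The linear toy model and the three samples below are hypothetical stand-ins for the fitted model and the factor scores.

```python
def ice_curves(predict, X, feature, grid):
    """One curve per sample: predictions as `feature` sweeps over `grid`
    while the sample's other feature values are held fixed."""
    curves = []
    for row in X:
        curve = []
        for value in grid:
            modified = list(row)
            modified[feature] = value
            curve.append(predict(modified))
        curves.append(curve)
    return curves

def partial_dependence(predict, X, feature, grid):
    """Partial dependence: the average of the ICE curves at each grid value."""
    curves = ice_curves(predict, X, feature, grid)
    return [sum(c[i] for c in curves) / len(curves) for i in range(len(grid))]

def toy_model(row):
    """Hypothetical response: rises with feature 0, falls with feature 1."""
    return 3.0 * row[0] - 1.0 * row[1]

X = [[0.1, 0.2], [0.5, 0.4], [0.9, 0.6]]
grid = [0.0, 0.5, 1.0]

pd_feature0 = partial_dependence(toy_model, X, 0, grid)  # increasing curve
pd_feature1 = partial_dependence(toy_model, X, 1, grid)  # decreasing curve
```

An upward-sloping partial dependence curve, as for feature 0 here, corresponds to the pattern described for city scale and transit station vitality, and a downward-sloping one to the pattern described for park vitality and medical facilities.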

Figure 10. The partial dependence of the top six features in the PVI.

Conclusions and Discussion
COVID-19 has completely changed urban life and brought new problems and challenges to urban vulnerability research. In response to these challenges, this research proposed the concept of urban pandemic vulnerability as a first step toward supplementing urban vulnerability research and provided a Pandemic Vulnerability Index, using Japan as an example.
In this research, we took a series of statistical and dynamic variables of the city as a base, used Factor Analysis to reduce the dimension, calculated the Damage of COVID-19 Pandemic (DOP) Score to evaluate the damage caused by the pandemic, and used the LambdaMART algorithm to establish a Pandemic Vulnerability Index that targets the critical characteristics of pandemic vulnerability. The results indicate that the proposed PVI can effectively predict the damage caused by the COVID-19 pandemic, and further analysis revealed the key features that should be focused on to reduce pandemic vulnerability. The method can be applied flexibly to other data and regions.
The main contributions of this research are:
• This research established a Pandemic Vulnerability Index that can indicate relative urban vulnerability and incorporate dynamic factors into indicator construction.
• LambdaMART is efficient in constructing a relative ranking index for urban vulnerability and can predict infection development with high precision. Accurate short-term forecasts help to take advance measures and help with preparation.
• Feature importance and dependence analysis emphasize city-scale and transit station vitality when evaluating urban pandemic vulnerability.
Compared with related studies, this research has made significant improvements. The Urban Vulnerability Assessment proposed by Prieto, Malagón, Gomez and León [12] combines information on demographic factors, work styles, and transportation through Borda Counting. However, the variables used in [12] come from surveys taken before the pandemic, and the method is designed to respond to static geographic data; therefore, it cannot reflect urban dynamics during the development of the pandemic and struggles to guide timely action through analysis. Our research has included dynamic factors such as urban vitality in addition to static data, demonstrated the importance of dynamic factors in evaluating urban pandemic vulnerability, and provided a reference for preventive measures by forecasting two weeks in advance. Jardim, Castro Neto, Alpalhão and Calçada [16] presented an Urban Dynamic Indicator through time series decomposition and factor analysis. However, the proposed indicator aims to provide an alternative reference for urban vitality and cannot be directly applied to vulnerability assessments of the COVID-19 pandemic. The PVI proposed in our research is also a dynamic indicator and can reflect the damage to the city two weeks ahead, which goes beyond simple descriptions of urban vitality. Overall, this research responds to the limited advances in urban dynamics in urban vulnerability assessments [10] and improves the capacity to capture the dynamic nature of urban vulnerability.
There are still some limitations to this research. Due to a lack of data, some important indicators, such as vaccination status, are not covered. Although the method does not depend on specific indicators, the lack of essential indicators may impact the model's performance and interpretation. Although PCA can retain critical information from original variables, the PVI's robustness to variable selection still needs further study. Additionally, this research is based on COVID-19 target data, which puts forward higher requirements for the data collection process in developing countries. When discussing inequality issues in small regions, such as a city block, such data requirements may pose certain obstacles, while looking into the vulnerable regions in the city is critical to deepening the understanding of urban vulnerability. Although a DOP Score is proposed as a reference, the damage to cities caused by the COVID-19 pandemic is complex, comprehensive, and has not been fully assessed to date. In the absence of a rational justification for assigning weights, which needs to be developed in future research, the proposed DOP scores assumed equal importance for all involved indicators. The relative importance of variables in building such a composite indicator calls for further in-depth analysis [12,41].
The analysis shows that the public transportation system is the weak link in the city's response to the pandemic, so we recommend paying attention to the density of the public transportation system when facing a pandemic and encouraging response measures such as wearing masks. Excessive city size can also make cities more vulnerable to pandemics, so we suggest that new urban planning ideas such as compact cities should be examined carefully in future urban planning. COVID-19 will permanently change our world, but a healthy, livable city will remain a constant pursuit. New technology will continue to advance our ability to address urban vulnerability and to propose sound, fact-based urban planning.