Correlation between COVID-19 Morbidity and Mortality Rates in Japan and Local Population Density, Temperature, and Absolute Humidity

This study analyzed the morbidity and mortality rates of the coronavirus disease (COVID-19) pandemic in different prefectures of Japan. Under the constraint that daily maximum confirmed deaths and daily maximum cases should exceed 4 and 10, respectively, 14 prefectures were included, and cofactors affecting the morbidity and mortality rates were evaluated. In particular, the number of confirmed deaths was assessed, excluding cases of nosocomial infections and nursing home patients. The correlations between the morbidity and mortality rates and population density were statistically significant (p-value < 0.05). In addition, the percentage of elderly population was also found to be non-negligible. Among weather parameters, the maximum temperature and absolute humidity averaged over the duration were found to be in modest correlation with the morbidity and mortality rates. Lower morbidity and mortality rates were observed for higher temperature and absolute humidity. Multivariate linear regression considering these factors showed that the adjusted determination coefficient for the confirmed cases was 0.693 in terms of population density, elderly percentage, and maximum absolute humidity (p-value < 0.01). These findings could be useful for intervention planning during future pandemics, including a potential second COVID-19 outbreak.


Introduction
The COVID-19 outbreak was first reported in China in 2019 [1,2] and spread worldwide in early 2020. Japan declared a state of emergency in seven (of 47) prefectures on 7 April 2020 and extended it to all prefectures on 13 April 2020. The state of emergency was withdrawn on 25 May 2020. During this state of emergency, unlike many other countries where city lockdowns were enforced, in Japan, citizens self-isolated. The mortality rate (per population) in Japan is relatively low compared to the global rate; the total number of confirmed deaths in Japan is 846 (25 May 2020), corresponding to 6.72 per million people [3]. Although a straightforward comparison is infeasible, this number is smaller than that of many other countries with the same order of magnitude of population: 541, 504, 435, and 295 in Italy, the United Kingdom, France, and the United States, respectively, but larger than 5.18 and 4.0 in Indonesia and Australia, respectively (25 July 2020).
An additional difficulty in understanding the morbidity rate is the unreliability of the diagnosis of COVID-19. The number of polymerase chain reaction (PCR) tests, a simple and cost-effective method, is limited in Japan, partly because of its reliability. Therefore, chest CT is used for a fast-track, highly rates in different cities? Unlike the aforementioned studies, a major feature of Japan is the relative homogeneity of the health insurance and care system without medical collapse during this pandemic. In addition, the difference in household wealth is relatively small in Japan [28]. The average annual salary per population is USD 34,400 to 39,900 (USD 1 = JPY 107). The standard deviation of household consumption in each prefecture is 10% or less [28]. With all these demographic factors, the data sample discussed here provides a convenient case study with less bias. In a recent study [29], we examined the time course of the morbidity rate of different prefectures in Japan and found that the durations of the spread and decay stages can be characterized by population density, temperature, and absolute humidity. An additional factor would be the ratio of the elderly to the entire population; in Japan, this ratio reached 28.4% [30], which is ranked the highest globally.
This study aimed to evaluate the effect of ambient temperature and humidity on mortality and morbidity rates in different prefectures in Japan. Additionally, it considered the influence of population density and composition. To the best of the authors' knowledge, this is the first study to highlight the effect of environmental factors during COVID-19 in Japan. Japan provides an interesting case study for different factors, as the medical services and social reaction are almost uniform nationwide and high-quality data were recorded properly. If the correlation of the pandemic with population density and ambient conditions is significant, the findings will be useful for setting the level and duration of a strict lockdown period for each city considering the environmental factors, and for planning future pandemic measures.
The organization of this study is as follows. In Section 2, the data sources of COVID-19 in Japan and the weather data are described, and the statistical method for data processing is explained briefly. In Section 3, the effects of population density, elderly population, and ambient conditions on the morbidity/mortality rates are evaluated statistically. Based on this evaluation, multivariate linear regression is conducted to estimate the morbidity/mortality rates from these parameters. Section 4 provides a discussion of the results, including the limitations. The conclusion is given in Section 5.

Data Source
In this study, we used three datasets. The first comprised the confirmed daily positive cases and deaths in each prefecture [31]. This dataset is based on the report by the Ministry of Health, Labour and Welfare [32]. We used time-integrated data until 25 May 2020, when the state of emergency was terminated. According to the dataset, 16 prefectures had confirmed total deaths and daily positive counts higher than 4 and 10, respectively. These prefectures were defined as infected. The remaining prefectures were excluded because of their lower number of infected cases, which is attributable to self-isolation, including the discouragement from moving to other prefectures after the declaration. In Japan, to avoid nosocomial infections and medical resource shortages, it was suggested that people with symptoms (e.g., fever >37.5 °C for no more than four consecutive days) stay home and not seek immediate medical attention unless they had been in close contact with infected people or had recently visited a foreign country. Some patients have been reported to be asymptomatic [5], making the statistical study of COVID-19 more complex. The positive rate of the test varied from 2.2% to 34.8% among prefectures. Unlike other diseases, confirmed cases and deaths are counted even when patients are found to be infected after their death. Because of this data collection practice, we use the mortality rate rather than the case fatality rate as a metric in this study.
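The inclusion rule described above can be sketched in a few lines. This is a minimal illustration, not the authors' processing code: the `PrefectureSeries` container, the helper names, and the sample numbers are hypothetical, and reading "daily positive counts higher than 10" as the peak daily count is an assumption.

```python
from dataclasses import dataclass


@dataclass
class PrefectureSeries:
    """Daily counts for one prefecture (hypothetical container)."""
    name: str
    population: int
    daily_cases: list[int]
    daily_deaths: list[int]


def is_included(p: PrefectureSeries,
                min_total_deaths: int = 4,
                min_peak_daily_cases: int = 10) -> bool:
    """True if total deaths exceed 4 and the peak daily case count exceeds 10.

    Interpreting the case threshold as a peak daily count is an assumption.
    """
    return (sum(p.daily_deaths) > min_total_deaths
            and max(p.daily_cases, default=0) > min_peak_daily_cases)


def per_million(count: int, population: int) -> float:
    """Time-integrated count normalized per million inhabitants."""
    return count * 1_000_000 / population


# Made-up series for illustration only; these are not surveillance data.
sample = [
    PrefectureSeries("A", 2_000_000, [2, 15, 30, 12], [0, 1, 3, 2]),
    PrefectureSeries("B", 1_000_000, [1, 3, 2, 0], [0, 0, 1, 0]),
    PrefectureSeries("C", 1_500_000, [5, 11, 9, 4], [1, 1, 1, 1]),
]
included = [p.name for p in sample if is_included(p)]
mortality_a = per_million(sum(sample[0].daily_deaths), sample[0].population)
```

Prefecture "C" is dropped because its total of four deaths does not exceed the threshold, which shows that both conditions are strict inequalities.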
For comparison, two sets of data were prepared: (1) the number of confirmed deaths excluding and including nosocomial/nursing facility infection, and (2) the total confirmed positive cases. The exclusion is intended to remove cluster infections among high-risk groups, which carry a higher possibility of death.
Note that among some of the 16 selected prefectures, as shown in Figure 1, the number of victims due to nosocomial/nursing facility infection was not always reported. Thus, such prefectures were excluded from the comparisons. Among others, the area of Hokkaido is one to two orders of magnitude larger than that of the other prefectures. Thus, several peaks in the number of cases are observed in different cities separated by larger distances than those between other adjacent prefectures. In Gunma prefecture, three confirmed deaths occurred outside Isezaki City, where substantial nosocomial infections were reported (15 victims). Thus, we used data from Gunma prefecture, excluding Isezaki City, for accurate comparison of mortality rates. In addition, Saitama was excluded since it did not report humidity data (see below for the third dataset). Two prefectures with unclear nosocomial/nursing facility infections were also excluded.
The second dataset comprises the population and the area of the prefectures. Based on the evidence that more than 90% of the victims are older than 60 years, and because the retirement age in Japan is 65, which may influence morbidity rates, we set the threshold for the elderly population at 65 years. The first and second datasets for a total of 14 prefectures and one city are listed in Table 1. The rationale for choosing 25 May as the reference date is that it marks the end of the state of emergency; on that date, the daily confirmed death count over Japan was 20 (128 million population), and the daily confirmed death count remained smaller than 100 for one month afterward.
The third dataset comprises weather data for each prefecture, extracted from the weather reports generated by the Japan Meteorological Agency [33]. In our previous study, we studied the correlation between environmental conditions and the duration of the pandemic from its spread to decay periods [29]. We extended this investigation to the prefectures defined in Table 1. We then estimated the start and end dates of the spread and decay stages, as defined in Table 2. To validate the effect of ambient factors in different phases of the pandemic, we computed ambient features for three time frames: during the spreading stage D_S (from T_SS to T_SE), during the decaying stage D_D (from T_DS to T_DE), and during both stages (from T_SS to T_DE).
Because the mortality rate is affected by many factors, the metrics are averaged over the duration of the spread stage, the decay stage, and the entire period. The duration-averaged values of temperature, absolute humidity, wind velocity, and daylight hours were calculated from the data available from the internet site mentioned above, as listed in Table 3. The latitude of the areas of Japan considered here ranges from N 33°36′ (Fukuoka) to N 36°35′ (Ishikawa), except for Okinawa at N 26°12′, and thus, total solar radiation may be marginally influenced with this measure.

Table 3. Duration-averaged temperature (T), absolute humidity (H), wind velocity (V_air), and daylight hours (DL) in each prefecture. D_S and D_D represent time frames during the spread and decay stages of the pandemic, respectively, as listed in Table 2. T_ave, T_max, and T_min represent the daily average, maximum, and minimum temperatures, respectively. H_ave, H_max, and H_min represent the daily average, maximum, and minimum absolute humidity values, respectively. V_air represents the daily averaged wind velocity.
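The duration averaging over the three time frames can be illustrated as follows. This is a sketch under stated assumptions: the function name `window_mean`, the stage boundary dates, and the temperature series are all made up for illustration, and the real boundaries are those listed in Table 2.

```python
from datetime import date, timedelta
from statistics import mean


def window_mean(daily: dict[date, float], start: date, end: date) -> float:
    """Mean of the daily values over the inclusive window [start, end]."""
    values = [v for d, v in daily.items() if start <= d <= end]
    if not values:
        raise ValueError("no observations inside the requested window")
    return mean(values)


# Hypothetical stage boundaries (not taken from Table 2).
t_ss, t_se = date(2020, 3, 20), date(2020, 3, 24)  # spread stage D_S
t_ds, t_de = date(2020, 3, 25), date(2020, 3, 29)  # decay stage D_D

# Made-up daily maximum temperatures in degrees Celsius: 10, 11, ..., 19.
t_max = {t_ss + timedelta(days=i): 10.0 + i for i in range(10)}

avg_spread = window_mean(t_max, t_ss, t_se)  # averaged over D_S
avg_decay = window_mean(t_max, t_ds, t_de)   # averaged over D_D
avg_total = window_mean(t_max, t_ss, t_de)   # averaged over both stages
```

The same helper applies unchanged to absolute humidity, wind velocity, or daylight hours, since each is a daily series averaged over the same windows.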

Statistical Analysis
A statistical study was conducted to analyze the correlation of different factors with both mortality and morbidity rates. The software JMP (SAS Institute, Cary, NC, USA) was used in this study. The p-value was used to identify the dominant factors influencing the rates. We determined the pairwise correlations by calculating Spearman's rank correlation between the number of confirmed positive cases, confirmed death cases, and different environmental and demographic parameters. The correlation matrix, partial correlation probabilities, and confidence intervals (CIs) of the correlations were calculated. After that, with the same software, multivariate analysis [34] was conducted in terms of these factors. We considered linear regression with least-squares fitting after checking for multicollinearity. Statistical significance was accepted at p < 0.05.
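For readers unfamiliar with Spearman's rank correlation, the sketch below shows the standard definition: the Pearson correlation of the ranks, with tied values sharing their mean rank. The study itself used JMP; this pure-Python version and its sample numbers are illustrative assumptions only.

```python
from math import sqrt


def average_ranks(values: list[float]) -> list[float]:
    """Return 1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x: list[float], y: list[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values, e.g. population density versus cases per million.
density = [100.0, 200.0, 300.0, 400.0, 500.0]
cases = [10.0, 30.0, 20.0, 50.0, 40.0]
rho = spearman(density, cases)
```

Because only ranks enter the calculation, the coefficient is insensitive to the skewed scale of quantities such as population density.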

Effect of Population Density and Elderly Population
Figure 2 shows the relationship between population density and the confirmed positive cases and confirmed deaths, including and excluding nosocomial infections and nursing home patients. A modest correlation was observed between positive cases per million and population density (R² = 0.394), whereas slight and mild correlations were observed for confirmed deaths including (R² = 0.097) and excluding (R² = 0.259) nosocomial infection, respectively. This result suggests that population density should be considered as a factor that implicitly represents social distancing, similar to our previous study that discussed the pandemic's duration [29].
When the cases and deaths for the elderly population were considered, the same tendency was observed: R² = 0.363 for cases, and R² = 0.078 and R² = 0.210 for deaths with and without nosocomial infections, respectively (not shown to avoid repetition). In contrast, as shown in Figure 3, the morbidity and mortality rates normalized by population density are modestly correlated with the percentage of the elderly, especially for confirmed deaths excluding nosocomial infections (R² = 0.482). This factor is thus considered in the multivariate analysis presented later.
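The coefficients of determination quoted above are of the kind produced by a single-predictor least-squares fit. The following sketch shows that calculation; the function name `linear_fit_r2` and the sample values are hypothetical and do not reproduce the data in Figures 2 and 3.

```python
def linear_fit_r2(x: list[float], y: list[float]) -> tuple[float, float, float]:
    """Least-squares line y = a + b*x; returns (a, b, R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    # R^2 = 1 - (residual sum of squares) / (total sum of squares).
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return a, b, 1.0 - ss_res / ss_tot


# Made-up predictor (e.g. density) and response (e.g. cases per million).
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 2.0, 4.0]
intercept, slope, r2 = linear_fit_r2(x, y)
```

For a single predictor, R² equals the square of the Pearson correlation coefficient, so a value such as 0.394 corresponds to a correlation of roughly 0.63 in magnitude.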

Effect of Ambient Conditions
Several ambient factors potentially influence morbidity and mortality rates. Our study considered temperature and absolute humidity. Most previous studies reported the maximum, average, or difference (diurnal variation range) of ambient temperature (e.g., see [8] and [35]). Our study also considered the minimum temperature. Recent reports on influenza suggest the importance of absolute humidity rather than its relative value [22,23]; we therefore considered the maximum, average, minimum, and difference values of absolute humidity as parameters. The daily average wind velocity and daylight hours were also considered. Regression analysis was conducted for all metrics averaged over the duration of the spread and decay stages and the total duration. Table 4 lists the coefficients of determination for different metrics. For most parameters, the values averaged over the total duration provided a higher correlation than those averaged over the other two durations.
As an example, Figure 4 shows the correlation of the number of confirmed positive cases and fatalities, normalized by the population density, with the daily maximum temperature and the diurnal absolute humidity difference. A moderate correlation was observed among the daily maximum temperature, diurnal absolute humidity difference, and cases per population density. Table 5 lists the Spearman's rank correlation for different parameters. The ambient factor was normalized by population density, as mentioned above. A moderate correlation was also observed with the daily maximum temperature, the daily maximum and diurnal difference of absolute humidity, and the percentage of elderly population. Correlation was weak with wind velocity and daylight hours.

Multivariate Linear Regression
In this subsection, the morbidity/mortality rates are estimated in terms of different factors. In Section 3.1, population density and percentage of the elderly were found to be modest, or at least non-negligible, factors for multivariate analysis [34]. In Section 3.2, the maximum temperature and the absolute humidity difference were found to be relatively important. No consistency was observed between mortality and morbidity rates. The data for Ishikawa and Toyama prefectures were considered outliers based on hierarchical clustering (see also [29]).
The difference in absolute humidity is derived from the maximum and minimum absolute humidity; thus, at least two parameters are needed. In addition, the maximum temperature is related to the maximum absolute humidity. Multicollinearity was therefore evaluated in terms of variance inflation factors (VIFs). The threshold value separating small from large VIFs is generally taken as 10 [36]. From this analysis, the set of population density, elderly percentage, and absolute humidity provided an estimation without multicollinearity: VIF < 3.78 for the spread duration, VIF < 3.23 for the decay duration, and VIF < 3.68 for the total duration. Note that the maximum ambient temperature was excluded because of its strong correlation with absolute humidity. Figure 5 shows the multivariate linear regression of cases and deaths per million. Table 6 shows the determination coefficients for the three durations. As shown in Figure 5, the predicted and actual data are in good correlation for the values averaged over the three stages. Population density had the highest contribution rate in the multivariate analysis (74.4%, 80.0%, and 84.5% for the cases per million, deaths per million including nosocomial infection, and deaths per million excluding nosocomial infection, respectively).
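The multicollinearity check and the adjusted determination coefficient can be sketched with the textbook definitions: the VIF of predictor j is 1/(1 - R_j²), where R_j² comes from regressing predictor j on the remaining predictors, and the adjusted R² is 1 - (1 - R²)(n - 1)/(n - p - 1) for n samples and p predictors. The study used JMP; the pure-Python helpers, their names, and the numbers below are assumptions for illustration.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes the system is non-singular (no perfectly collinear columns).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols_r2(predictors: list[list[float]], y: list[float]) -> float:
    """R^2 of a least-squares fit of y on an intercept plus the predictors."""
    n = len(y)
    cols = [[1.0] * n] + predictors
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(bk * col[i] for bk, col in zip(beta, cols)) for i in range(n)]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vifs(predictors: list[list[float]]) -> list[float]:
    """Variance inflation factor of each predictor against all the others."""
    out = []
    for j, target in enumerate(predictors):
        others = predictors[:j] + predictors[j + 1:]
        out.append(1.0 / (1.0 - ols_r2(others, target)))
    return out


def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for n samples and p predictors (intercept not counted)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Made-up predictor columns; they are not the values behind Table 6.
example_vifs = vifs([[0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 2.0, 4.0]])
example_adj = adjusted_r2(0.75, 14, 3)  # n and p chosen only for illustration
```

With few samples the adjustment is substantial: in this illustrative call an R² of 0.75 drops to 0.675, which is why adjusted values are the appropriate figures to report for a sample of about a dozen prefectures.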

Discussion
In this study, we analyzed the morbidity and mortality rates in different prefectures in Japan, where the number of confirmed deaths and daily confirmed positive counts were higher than 4 and 10, respectively. A major feature of Japan was the relative homogeneity of the health insurance and care system without medical collapse during this pandemic, in addition to household wealth. The Japanese strategy included identifying infection clusters at an early stage, to the best possible extent. However, the criteria for conducting tests (diagnosis) on potential patients may not be uniform in different prefectures; some patients may exhibit weak symptoms. Thus, after retracting the state of emergency on 25 May 2020, we processed the data for morbidity and mortality rates in 14 prefectures.
The morbidity/mortality rates were then shown to increase with the population density. In previous studies, this factor was not considered [12], nor was the correlation between different cities [17]. After excluding the number of confirmed deaths in cluster infections related to hospital and care services, we observed a modest correlation among different cities in terms of population density. It is worth noting that no strict closure was applied in Japan. We also found a good correlation between population density and the spread of COVID-19; population density thus implicitly represents social distancing. In Tokyo and Osaka, which are among the cities with the highest population densities worldwide, infection is potentially more likely to occur than in less dense regions. However, this may not hold in other countries where a strict lockdown was implemented. In Wuhan (China), the duration of the decaying stage was only 10 days, with almost no contact during the period. However, such a strict lockdown may not be feasible in most countries because of the severe social and economic damage it would cause. Therefore, this study demonstrates that population density should be considered for avoiding potential spread in future pandemics. Moreover, this finding may be useful for improving simulation models of epidemic transmission [37,38].
The maximum temperature and the absolute humidity difference were the dominant ambient factors characterizing morbidity and mortality rates. As shown in Figure 4, cases and deaths in Ishikawa and Toyama prefectures follow a different tendency from those in other prefectures, as COVID-19 occurred in a very limited area in these prefectures. In general, the morbidity and mortality rates decreased with higher temperature and absolute humidity. For example, the population density of Hyogo (650.4 people/km²) is nearly equal to that of Okinawa (637.5 people/km²). However, the total cases in Hyogo were 8.6 times those in Okinawa, and the daily maximum temperature in Hyogo was 7 °C lower than in Okinawa. This relationship can be observed in other prefectures, but not in all, because the correlation with weather conditions is only mild. The reason for the higher correlation with the absolute humidity difference is unclear; one potential reason is the relatively small variation over a limited period (from mid-March to mid-May). Further study of key factors is needed. The ambient conditions in Okinawa prefecture differ the most from those of other prefectures in Japan. If the data of Okinawa are excluded, the correlation of confirmed cases and deaths improves. In particular, the total cases and deaths normalized by population have a mild correlation with the maximum temperature and absolute humidity averaged over the spread duration (from 0.13 < R² < 0.18 to 0.37 < R² < 0.55).
The effect of ambient conditions on the morbidity and mortality rates was shown to be modest over multiple prefecture studies. As mentioned in the introduction, this was a controversial COVID-19 issue. Our study hypothesized that this may be caused by population density, which was not considered in previous studies, as well as the uniformity of the policy, health insurance system, household wealth, etc.
The morbidity and mortality rates were roughly estimated via multivariate analysis. Note that the ambient parameters are cross-correlated with each other, and thus further research and investigation are needed. The adjusted R² was 0.69 (p < 0.01) for positive cases, and those for confirmed deaths including and excluding nosocomial infection were 0.53 (p < 0.05) and 0.15 (p = 0.25), respectively. This statistical finding may be improved by modeling studies. The correlation with the mortality rate excluding nosocomial infection was relatively low, suggesting that nosocomial infection is a part of COVID-19 transmission, at least in Japan.
Unlike previous studies that discussed the correlation with ambient conditions in each city (e.g., [17]), our study explores common factors over 14 prefectures, resulting in a lower p-value compared to such studies. In such cases, the uncertainty of the measured ambient conditions would be another factor influencing the correlation. For example, no correlation with ambient conditions was observed in the analysis of 122 cities in China [12].
Note that according to the record of the Ministry of Health in Japan, no pandemic has been reported in the last 50 years [39]. Thus, a comparison with other epidemics is infeasible. Common influenza has been recorded, but only at fixed points (hospitals), making proper comparison difficult [39]. However, the finding of this study that presents the effect of population density and ambient conditions may be useful when considering measures for potential future pandemics.

Conclusions
Mortality and morbidity rates were found to be mildly correlated with the population density and the percentage of the elderly population, in addition to the maximum absolute humidity averaged over the spread stage, under Japanese policy. The multivariate linear regression provided adjusted coefficients of determination of 0.69 and 0.53 (p < 0.05) for positive cases and confirmed deaths, respectively. Our results suggest that the number of cases and deaths can be estimated from population and weather data, at least in Japanese cities. Although the date and duration of the pandemic differed even within Japan, our estimation showed a mild correlation, providing useful information for the planning of policy and medical resources. With our findings, more customized guidelines can be developed, specific to where and when different measures can be applied to restrict the adverse effects caused by a potential future pandemic, including a second wave of COVID-19. The limitation of this study is that the weather data in different prefectures are similar to each other because of the limited period (March to May 2020), and thus further data are needed for a general conclusion. The controversy in previous studies may be caused by the population density and elderly population percentage, as those were not considered in most studies. Thus, these factors should be included for proper comparisons with the tendencies of international cities.