Correlation between Population Density and COVID-19 Cases during the Third Wave in Malaysia: Effect of the Delta Variant

In this study, we describe the incidence and distribution of COVID-19 cases in Malaysia at district level and determine their correlation with absolute population and population density, before and during the period that the Delta variant was dominant in Malaysia. Methods: Data on the number of locally transmitted COVID-19 cases in each of the 145 districts in Malaysia, between 20 September 2020 and 19 September 2021, were manually extracted from official reports. The cumulative number of COVID-19 cases, population and population density of each district were described using choropleth maps. The correlation between population and population density with the cumulative number of COVID-19 cases in each district in the pre-Delta dominant period (20 September 2020–29 June 2021) and during the Delta dominant period (30 June 2021–19 September 2021) were determined using Pearson’s correlation. Results: COVID-19 cases were strongly correlated with both absolute population and population density (Pearson’s correlation coefficient (r) = 0.87 and r = 0.78, respectively). A majority of the districts had higher numbers of COVID-19 cases during the Delta dominant period compared to the pre-Delta period. The correlation coefficient in the pre-Delta dominant period was r = 0.79 vs. r = 0.86 during the Delta dominant period, whereas the pre-Delta dominant population density was r = 0.72, and in the Delta dominant period, r = 0.76. Conclusion: More populous and densely populated districts have a higher risk of transmission of COVID-19, especially with the Delta variant as the dominant circulating strain. Therefore, extra and more stringent control measures should be instituted in highly populated areas to control the spread of COVID-19.


Introduction
The World Health Organization (WHO) identified a new type of coronavirus, SARS-CoV-2, early in 2020, which causes the disease COVID-19. This novel coronavirus was first discovered in Wuhan, China, after its outbreak in December 2019. Following which, COVID-19 spread rapidly across the world in a short period of time, resulting in a Public Health Emergency of International Concern (PHEIC) [1]. As a result of this, COVID-19 was declared a pandemic by the WHO on 11 March 2020 [2]. The spread of the COVID-19 virus resulted in unprecedented outbreaks worldwide, characterized by exponential rises in new infections. As the pandemic progressed, many countries initially instituted several Public Health Social Measures (PHSM), followed by the more recent COVID-19 vaccination strategies to curb the COVID-19 outbreak. However, despite these measures, many countries are currently experiencing a resurgence in COVID-19 infections [3][4][5].
As of 28 February 2022, about 435 million COVID-19 infections and 5.95 million deaths due to COVID-19 have been reported globally, and more alarmingly, these estimates keep 2 of 17 rising. The USA, India and Brazil are among those countries that have reported the highest numbers of COVID-19 infections globally. Moreover, in the South East Asia region, as of 28 February 2022, the incidence rates of COVID-19 infections were highest in Brunei Darussalam at 13,588.538 per 100,000 population, followed by Singapore (12,151.085 cases per 100,000 population) and Malaysia (10,565.52 cases per 100,000 population) [6]. In Malaysia, the first case of COVID-19 infection was reported on 22 January 2020, and this marked the beginning of the first wave with a total of 22 infections, which lasted until 26 February 2020. Following this, a much larger second wave began from 27 February 2020 to 19 September 2020, which resulted in 3375 infections. Currently, Malaysia is facing its third wave, which began on 20 September 2020 [7].
Since the beginning of the third wave in Malaysia, the distribution of COVID-19 infections across the country has varied, wherein several states, namely Selangor (30.2%), Johor (8.7%) and the Federal Territory of Kuala Lumpur (8.3%), have reported much larger numbers of COVID-19 infections compared to other states [8]. This observation could be attributed to the high population numbers and densities observed in these states, which could have increased the disease transmission. These findings were supported by evidence in the literature, which reports a higher distribution of COVID-19 cases in areas with larger densities [9], and significant correlations between population numbers and densities with COVID-19 cases in countries such as England, the USA and Turkey [10][11][12][13]. In addition to the above observations, a large number of COVID-19 infections were also observed in states with lesser densities, such as in Sarawak (9.3%) and Sabah (8.8%), during the third wave in Malaysia [8]. This is because despite these states having larger areas, which reduces the overall population density at the state level, they consist of multiple districts which are highly populous with large population densities. As a result of this, determining the effect of population and population density on COVID-19 cases at higher levels (i.e., state) may be inaccurate and misleading. Therefore, analysing this effect at lower levels (i.e., district level) would provide more meaningful and accurate findings.
To date, in Malaysia, there have been limited studies which examine the distribution of COVID-19 cases by districts and the relationship between absolute population and population density with COVID-19 cases during the third wave. This paper focuses specifically on the third wave of COVID-19, as there are several unique factors that affected the transmission dynamics of COVID-19 during the third wave, which differentiate it from the previous waves in Malaysia. These factors include, first, the presence of new and more virulent variants of the COVID-19 virus from 18 December 2020 onwards [14]. The emergence of these new variants posed an increased risk to the spread of the COVID-19 pandemic. Therefore, more measures were taken in the characterization of specific Variants of Interest (VOIs) and Variants of Concern (VOCs) to improve outbreak surveillance and control measures. For example, the Delta variant (B.1.617.2) designated as a VOC on 11 May 2021 was highly infectious and had affected many countries. This variant was found to be more transmissible and resulted in more severe forms of COVID-19 illness [14,15]. Malaysia recorded its first case of the Delta variant on 2 May 2021, which was detected in an Indian national screened at the Kuala Lumpur International Airport [16]. Subsequently, the first locally transmitted case of the Delta variant was detected on 17 May 2021 [17]. The presence of these VOCs, especially the Delta variant, intensified the outbreak during the third wave; therefore, when examining the relationship between absolute population and population density with COVID-19 cases during the third wave, it is important to account for the effects of the Delta variant.
The second factor unique to the third wave is the implementation of COVID-19 vaccination. Numerous studies have reported that vaccination had a major effect in decreasing COVID-19 infections [14,18]. The Malaysian vaccination program began on 24 February 2021 and was rolled out in phases. The first phase from February to April 2021, focused on frontliners, followed by the second phase involving senior citizens, high-risk groups and people with disabilities from April to August 2021, and from May 2021 onwards, for others aged 18 years and above. As of 19 September 2021, the last day of our study period, a total of 69.1% and 58% of the population had received one and two doses of the COVID-19 vaccine, respectively. The percentages of the total vaccine doses administered with the various vaccines are as follows, CoronaVac (inactivated SARS-CoV-2 vaccine by Sinovac) (46.5%) and Comirnaty (mRNA vaccine by Pfizer-BioNTech) (45.4%), Oxford As-traZeneca's SARS-CoV-2 mRNA vaccine (7.8%) and less than 1% other vaccines. Malaysia's vaccination program for those aged below 18 years was started on 20 September 2021, which was after our study period.
Due to the presence of these unique factors (i.e., VOCs and vaccination) during the third wave of COVID-19, it is important to determine the relationship between population and density with COVID-19 cases by accounting for demographic characteristics, vaccination status and the COVID-19 Delta variant, as it would provide a better understanding of the true unbiased relationship between population and density with COVID-19 cases. Therefore, the initial aims of this study are to describe the incidence and distribution of COVID-19 cases during the third wave in Malaysia. Subsequent to this, we determined the correlation between absolute population and population density with COVID-19 cases during the pre-Delta period, during Delta and in the overall period of the third wave. We believe the findings from this study would assist in the prioritization of instituting outbreak control measures based on disease distribution, to better control and manage the COVID-19 pandemic in Malaysia. Imported cases were excluded in this study because the source of infection was outside of Malaysia and therefore did not contribute to local disease transmission. The population numbers and population density in each district were obtained from the Department of Statistics Malaysia (DOSM). The estimated population data were obtained from the DOSM, which is the authority that provides the official population statistics data for Malaysia. These yearly population estimates are generated by the DOSM using the cohort-component method, which is based on census data as well as rates of birth, death, and internal and international migration. In this study, the total populations used were projected 2020 mid-year populations, based on 2010 population census data. In addition, population density was defined as the district's mid-year population for the year 2020 divided by its total land area (km 2 ) [19]. Geospatial shape files were provided by the Department of Survey and Mapping Malaysia (JUPEM) in the year 2019.

Data Analysis
Data were analyzed using the Statistical Package for the Social Sciences (SPSS) version 26.0 release 2019 by International Business Machines, IBM Corp., Armonk, NY, USA [20]. Data were checked for missing data and abnormal values before performing any statistical analysis. There were no missing values. For the correlation analysis, the COVID-19 case data at district levels were categorized into three time periods, which were based on the detection of Delta variants in Malaysia. First was the pre-Delta variant period, which was from 20 September 2020 to 29 June 2021 (283 days). Second was the during-Delta variant period, which was from 30 June 2021 (the date the Delta variant became the predominantly circulating variant (more than 50%) among the samples tested by the Institute for Medical Research, Malaysia) to 19 September 2021 (82 days). The third time period was the one-year duration of the third wave, from 20 September 2020 to 19 September 2021 (365 days; the end date of 19 September 2021 was selected as it represented the downward trajectory of the third wave). The incidence of COVID-19 cases per 1000 populations by districts was estimated by dividing the total number of cases with the absolute population for each district. Quantum Geographic Information System (QGIS) version 3.10 was used to plot the incidence and distribution of COVID-19 cases by districts across total population and population densities.
Prior to the correlation analysis, the normality of absolute population, population density, and COVID-19 cases pre-Delta, during Delta and overall, were examined using the Shapiro-Wilk test and normal probability plots. The results of the Shapiro-Wilk test for all five variables were significant suggesting the data were not normally distributed and normal probability plots showed their deviations from the normal distribution (Appendix A). As the data were not normally distributed, log transformation was performed, and Pearson's correlation coefficient (r) was used to determine the strength and direction of the correlation between absolute population and population density with COVID-19 cases. The classification of the strength of the relationship was determined based on the value of r which ranges from 0 to 1 where r = 0 indicates no association and r = −1 or +1 indicates perfect association with a p-value less than 0.05 indicating significant correlations. The magnitude of change for two variables is either in the same or in the opposite direction, indicated by a positive or negative value of the correlation coefficient [21]. Correlation analysis was conducted for all the three time periods to determine the effects of the Delta variant on the correlation between absolute population and population density with COVID-19 cases.
In addition, prior to the correlation analysis, multivariable linear regression analysis was performed with SPSS software to control for the confounding effects of sociodemographic factors (i.e., median household income and the percentage of the population aged 15 years old and above) and the percentage of the population fully vaccinated on the correlation between population density and COVID-19 cases [14,15]. All data were at district level, except for vaccination data which were available at state level only, and therefore, were used to represent each district's vaccination percentage. Data were analyzed using a stepwise linear regression method. The cutoff probability for adding and removing variable in the stepwise method was 0.05 and 0.10 respectively. The final model was checked to ensure the assumptions of the analysis were sufficiently met [22].

Characteristics of COVID-19 Cases in the Third Wave
The most populous and densely populated districts were Petaling, Selangor, (2,223,300 persons) and Kuala Lumpur (7863 people per square kilometer), respectively, in Malaysia (Appendix B).
For the overall time period, the highest number of cases were distributed in Petaling (n = 197,082 cases), followed by Kuala Lumpur (n = 178,406 cases) and Klang (n = 126,579 cases), as shown in Figure 1. The highest COVID-19 incidence rate was reported in Sepang (133.8 per 1000 population), followed by Klang (119.8 per 1000 population) and Kuala Langat (115.2 per 1000 population), as shown in Figure 2.
During the pre-Delta period, the highest number of cases was distributed in Kuala Lumpur (n = 73,041 cases), followed by Petaling (n = 72,839 cases) and Klang (n = 50,417 cases). In addition, the highest COVID-19 incidence rate was reported in Labuan (73.7 per 1000 population), followed by Sepang (62.8 per 1000 population) and Kapit (52.3 per 1000 population). During the Delta period, the highest number of cases was distributed in Petaling (n = 124,243 cases), followed by Kuala Lumpur (n = 105,365 cases) and Klang (n = 76,162 cases). The highest COVID-19 incidence rate was reported in Serian (87.5 per 1000 population), followed by Bau (79.0 per 1000 population) and Klang (72.1 per 1000 population) (Appendix C).   From the total of 145 districts, 70% (n = 127) of the districts reported an increase in both COVID-19 cases and the incidence rate during the Delta variant period compared to the pre-Delta period. The percentage increase in COVID-19 cases and the incidence rate ranged from 0.3% to 2500% and the mean increase was 242.5%. The COVID-19 cases and incidence rate per 1000 population for all the districts (n = 145) in the pre-Delta and during Delta periods in Malaysia are shown in Appendix C.

Association between Sociodemographic Factors and Vaccination with COVID-19 Cases
In multivariable regression analysis, population density and median household income were found to be independently associated with COVID-19 cases, after controlling for sociodemographic and vaccination factors. Following this analysis, population density alone accounted for 60% of the variation in COVID-19 cases. Moreover, 64% of the variation of the COVID-19 cases was explained by both population density and household income. This was a minimal increase of 4% contributed by the median household income variable, therefore suggesting household income does not largely affect COVID-19 cases (Table 1).

Correlation between Population and Population Density with COVID-19 Cases
Correlation analysis showed both absolute population and population densities were significantly correlated (p-value < 0.001) with COVID-19 cases for the overall time period (r = 0.871), the pre-Delta variant period (r = 0.785) and the Delta variant period (r = 0.864), respectively, as shown in Figure 3. This corresponds to an increase in correlation of 15.9% from the pre-Delta period to the Delta period. The correlation between population density and COVID-19 cases was for the overall time period (r = 0.778), pre-Delta variant period (r = 0.723) and Delta variant period (r = 0.764) respectively as shown in Figure 4. This corresponds to an increase in correlation by 21.5% from the pre-Delta period to the Delta period.         Overall, an increase in the correlations between absolute population and population density with COVID-19 cases was observed during the Delta variant period compared to the pre-Delta period ( Table 2). In addition, the correlation coefficient was higher for the correlations between absolute population and COVID-19 cases compared to the correlations between population density and COVID-19 cases, across all the time periods.

Discussion
In this study, we described the incidence and distribution of COVID-19 cases by districts as well as determining the correlations between absolute population and population density with COVID-19 cases during the third wave in Malaysia. In addition, the correlation findings of this study were analyzed and presented based on the pre-Delta variant period (20 September 2020 to 29 June 2021) and the Delta variant period (30 June 2021 to 19 September 2021) during the third wave.
The highest number of COVID-19 cases and incidence rate were observed in districts in Selangor state (i.e., Petaling and Klang) and the Federal Territory of Kuala Lumpur, during the third wave. This finding is observed primarily because these areas are highly urbanized, densely populated and populous. Similar findings have been reported in studies conducted in England and Malaysia [9,10]. In addition, this study also found high COVID-19 incidence in districts with lesser densities (i.e., Sepang). This finding could be attributed to the relatively small population numbers in these districts (i.e., Sepang = 265,600) and the higher number of COVID-19 infections (i.e., Sepang n = 35,539 cases), therefore ultimately resulting in higher incidence rates.
The findings of this study showed an increase in COVID-19 cases and the incidence rate between the pre-Delta and Delta variant periods, ranging from 0.3% to 2500%. Previous studies conducted in the countries which were affected by the Delta variant, such as England and United States, also reported a similar increase in COVID-19 cases [15,23,24]. The increment in COVID-19 cases and the incidence rate observed in this study is due to the effects of the Delta variant, and highly and densely populated areas which would increase disease transmission [15,23,24]. Furthermore, this study reports a positive significant correlation between absolute population and population densities with COVID-19 cases throughout the third wave. Our findings support the existing evidence that suggests COVID-19 cases tend to increase in areas that are highly and densely populated. Our findings were consistent with previous studies conducted in England, the United States, Turkey and Malaysia, which report higher COVID-19 cases and incidence in more densely populated areas [10][11][12]. Several reasons can be attributed to these findings. First, communities with high population numbers and population density have a higher probability of coming into contact with one another, therefore, increasing the risk of disease transmission [25][26][27][28]. In addition, individuals residing in densely populated areas tend to live in close proximities, which would result in prolonged, sustained and continuous exposure to possibly infected individuals, therefore, increasing disease transmission.
This study also reports an increase in the correlation between absolute population (15.9% correlation increase) and population densities (21.5% correlation increase) with COVID-19 cases, during the Delta period (16 May 2021 to 19 September 2021) compared to the pre-Delta period (20 September 2020 to 15 May 2021). The mild increase in the magnitude of the correlation across these two periods could be attributed to the fact that population density itself fuels COVID-19 disease transmission (resulting in high pre-Delta correlation estimates). In addition, the modest increase in the correlation supports the low transmissibility of the Delta variant. The relevance of this finding suggests that in highly and densely populated areas, the existence of a variant of concern with low disease transmissibility would contribute to an increase in the number of infections [15,18,29].
While many countries are still working on finding curative treatments and increasing COVID-19 immunization rates, non-pharmaceutical interventions (NPI) are still important measures to control and manage this pandemic. With limited resources and the need for timely institutions of NPI measures, many countries have adopted targeted outbreak control measures [10,27,30,31]. The findings from this study highlight the importance of implementing NPI in areas that are highly and densely populated as a priority, in order to control and manage the COVID-19 outbreak effectively. Moreover, the evidence generated from this study could be used to guide decision makers in making sound decisions regarding instituting targeted outbreak control measures. In addition, this study also provides evidence on the effects of a variant of concern (i.e., the Delta variant) on the correlation between absolute population and population density with COVID-19 cases, wherein such VOCs could be the driving factor in increasing disease transmissibility, especially in areas that are highly and densely populous.
To the best of our knowledge, this is the first study that has analyzed the correlation between absolute population and population density with COVID-19 cases, at different time frames in relation to the Delta variant during the third wave, to describe the changes in incidence rates and correlations caused by the Delta variant. This study has several strengths, which include first using districts as the smallest point for correlation analysis. By doing so, we were able to examine this correlation in more focused smaller areas, which would improve the precision and accuracy of the correlation instead of using larger areas such as states. Second, a longer study period (365 days) was used to examine the correlation, therefore improving the analysis and suggesting that the current data used are sufficient to show the impact of absolute population and population density on COVID-19 cases. Third, this study focused on local cases during the third wave of COVID-19, which contributed more than 90% of the total COVID-19 cases in Malaysia. Finally, this study ruled out the presence of potential confounders (i.e., household income, population aged 15 years and above and vaccination coverage) prior to examining the correlation between population density and COVID-19 cases.
The limitations of this study include the distribution of COVID-19 cases which may depend on a variety of other factors, which include geographical characteristics, economic growth, health infrastructure, regulatory policy and the number of tests. In addition, it would also be important to examine the correlation and relationship of other sociodemographic and socioeconomic factors with COVID-19 mortality in Malaysia. Therefore, further research may be needed to account for the aforementioned issues.

Conclusions
In conclusion, the present study reports that a higher incidence of COVID-19 infections was found among highly and densely populated districts, especially during the Delta variant period in the third wave in Malaysia. In addition, absolute population and population density significantly contribute to the increase in COVID-19 infections, as evident from the positive correlations reported in this study. Therefore, prioritizing the implementation of outbreak control measures in highly and densely populated areas and with the presence of VOCs could be key to containing this highly infectious disease and eventually controlling the COVID-19 pandemic.

Institutional Review Board Statement:
The study was registered with the National Medical Research Register (NMRR-21-1085-60190 and approved on 13 August 2021). Ethical approval was not required as the data that were used in this study are freely available and open-source.

Informed Consent Statement: Not applicable.
Data Availability Statement: The datasets used and analyzed during the current study are available from the Ministry of Health Malaysia website and provided by the Department of Statistics Malaysia.

Acknowledgments:
We would like to thank the Director General of Health Malaysia for his permission to publish this paper and the Director of the Institute for Medical Research for his support.

Conflicts of Interest:
The authors declare no conflict of interest.   Appendix C