Statistical Forecast of Pollution Episodes in Macao during National Holiday and COVID-19

Statistical methods such as multiple linear regression (MLR) and classification and regression tree (CART) analysis were used to build prediction models for pollutant concentration levels in Macao, using historical meteorological and air quality data for three periods: (i) from 2013 to 2016, (ii) from 2015 to 2018, and (iii) from 2013 to 2018. The variables retained by the models were identical for nitrogen dioxide (NO2), particulate matter (PM10), and PM2.5, but not for ozone (O3). Air pollution data from 2019 was used for validation purposes. The model for the 2013 to 2018 period performed best in predicting next-day concentration levels in 2019, with a high coefficient of determination (R2) between predicted and observed daily average concentrations (between 0.78 and 0.89 for all pollutants) and low root mean square error (RMSE), mean absolute error (MAE), and bias (BIAS). To assess whether the prediction model was robust to extreme variations in pollutant concentrations, it was tested on a high pollution episode for PM2.5 and O3 during 2019 and on a low pollution episode during the period of preventive measures for the COVID-19 pandemic. For the high pollution episode, the period of the Chinese National Holiday of 2019 was selected, in which high concentration levels were identified for PM2.5 and O3, with peaks of daily concentration exceeding 55 μg/m3 and 400 μg/m3, respectively. The 2013 to 2018 model successfully predicted this high pollution episode with high coefficients of determination (0.92 for PM2.5 and 0.82 for O3). The low pollution episode for PM2.5 and O3 was identified during the 2020 COVID-19 pandemic period, with record-low daily concentrations of 2 μg/m3 for PM2.5 and 50 μg/m3 for O3.
The 2013 to 2018 model successfully predicted the low pollution episode for PM2.5 and O3 with a high coefficient of determination (0.86 and 0.84, respectively). Overall, the results demonstrate that the statistical forecast model is robust and able to correctly reproduce extreme air pollution events of both high and low concentration levels.


Introduction
The development of air quality forecast models is essential for cities with high population density, including Macao, one of the most densely populated cities in the world. It is extremely important to predict pollution episodes so that the authorities can warn the local community in advance to avoid exposure to adverse air quality, which may lead to severe health consequences. In order to predict next-day concentrations of nitrogen dioxide (NO2), particulate matter (PM10 and PM2.5), and the maximum hourly concentration of ozone (O3 MAX) for roadside, ambient, and residential stations in Macao, a forecast model was developed based on statistical methods using multiple linear regression (MLR) and classification and regression tree (CART) analysis.
There are three forms of total suspended particles (TSPs): coarse, fine, and ultrafine particles. Coarse particles, also known as PM10, are derived from the suspension of dust, soil, sea salts, pollen, mold, and other crustal materials. Fine particles, also known as PM2.5, are derived from emissions from combustion processes, including vehicles powered by petrol and diesel, wood burning, coal burning, and other industrial processes. Ultrafine particles are derived from combustion-related sources such as vehicle exhausts and from atmospheric photochemical reactions [1].
O3 is the most important index substance for photochemical smog, one of the major air pollutants [2]. The formation of ground-level O3 depends heavily on the concentration levels of volatile organic compounds (VOCs) and nitrogen oxides (NOx) and on meteorological factors such as wind speed, insolation, and temperature. PM2.5 and O3 are known to cause the most damage to the human respiratory and cardiovascular systems. A study for Terengganu State, Malaysia, showed that high levels of O3 occurred under dry and warm conditions during the southwest monsoon, were higher in industrial areas, and were positively correlated with the maximum daily temperature [3].
NOx is primarily emitted from transportation and combustion processes, while VOCs are primarily emitted from road traffic and the use of products containing organic solvents [4,5].
Emissions of NOx and VOCs are responsible for O3 formation, with rural areas generally being NOx-sensitive and urban areas VOC-sensitive. Nevertheless, greater NOx emission reductions have contributed to a widespread shift in the O3 production regime from NOx-saturated (high-NOx) to NOx-sensitive (low-NOx) in some urban areas, while O3 production in rural areas is even more sensitive to NOx.
TSPs are primary contributors to premature death worldwide, with over four million premature deaths recorded due to exposure to high levels of ambient PM2.5 [6,7,8]. PM2.5 can penetrate deep into the lungs when inhaled, which leads to both acute and chronic health issues [1,6]. NO2 and TSPs are responsible for 412,000 and 71,000 premature deaths per year, respectively, in the European Union [9,10]. Moreover, previous studies show a strong correlation between short-term exposure to NO2 and both the number of hospital outpatients with eye and adnexa diseases (EADs) [11] and the number of hospital admissions due to cardiovascular diseases (CVD) [12]. The Chinese National Ambient Air Quality Standard (NAAQS) sets the thresholds for PM10, PM2.5, and O3 MAX concentrations at 150 μg/m3, 75 μg/m3, and 160 μg/m3, respectively, while the WHO Air Quality Guideline sets the same thresholds at 50 μg/m3, 25 μg/m3, and 100 μg/m3, respectively. Compliance with the WHO threshold for PM2.5 could improve life expectancy in China by 0.14 years [13]. Ambient air pollution has caused at least 3.7 million deaths, with more than 25% of them in Southeast Asia [14,15].
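As an illustration of how the two sets of thresholds quoted above differ in practice, the following minimal sketch checks a daily value against both. The threshold values are those given in the text; the function name, the dictionary layout, and the sample values are hypothetical and are not measured data.

```python
# Daily thresholds in ug/m3, as quoted in the text above.
THRESHOLDS = {
    "NAAQS": {"PM10": 150.0, "PM2.5": 75.0, "O3_MAX": 160.0},
    "WHO": {"PM10": 50.0, "PM2.5": 25.0, "O3_MAX": 100.0},
}


def exceedances(pollutant, value):
    """Return the names of the standards whose threshold `value` exceeds."""
    return [name for name, limits in THRESHOLDS.items()
            if value > limits[pollutant]]


# Illustrative values only (hypothetical, not observations).
print(exceedances("PM2.5", 56.0))
print(exceedances("O3_MAX", 401.0))
```

A PM2.5 daily value just above 55 μg/m3 would exceed the WHO guideline but not the NAAQS threshold, whereas an O3 MAX value above 400 μg/m3 would exceed both.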
Air pollution forecasting models can provide important information for populations to adopt mitigation measures during high pollution days. To be useful, these models should be robust enough to handle extreme variations in pollution levels, in particular during high-pollution peak days. Factors leading to extreme variation in pollution levels are diverse and include both human activities and meteorological factors.
In a study for Beijing, China, the reduction of traffic flow and vehicle emissions in downtown areas during the Chinese National Holiday reduced air pollution while, in contrast, fireworks during the Chinese New Year Holiday had the opposite effect [16]. When highway tolls were waived for passenger vehicles across China during the Chinese National Holiday, air pollution increased by 20% and visibility decreased by 1 km, causing economic losses due to negative health impacts estimated at RMB 0.95 billion [17]. The Chinese National Holiday is also known as a golden week of tourism, in which Chinese tourists flock to tourist destinations around the world to celebrate the national holiday. Owing to its vibrant casino and entertainment industry and its close proximity to mainland China, Macao is one of the favorite destinations for Chinese tourists, so the influx of tourists during the Chinese National Holiday may lead to an increase in emissions in Macao.
Likewise, the recent COVID-19 crisis has had an extreme impact on air pollution levels. The Wuhan Health Commission first reported cases of pneumonia linked to the Wuhan wet market in Hubei Province, China, in December 2019 [18]. Preventive measures that abruptly reduced industrial activities and transportation were implemented soon after. Nevertheless, the levels of air pollutants, in particular PM2.5, remained severe in northern China through the end of January 2020 due to adverse meteorological conditions that overwhelmed the benefits of emission reductions in the transportation and industrial sectors [19].
Previous work showed an increase in O3 concentrations and a decrease in NO2, PM10, and PM2.5 concentrations during the COVID-19 pandemic lockdown period in several cities of China, due to the significant reduction of transportation and industrial activities [4,5,20,21].
In this context, it is relevant to develop a reliable methodology to forecast the concentration of air pollutants. This work presents such a methodology and tests it for a high pollution episode (associated with the Chinese National Holiday) and a low pollution episode (during the COVID-19 preventive measures).

Materials and Methods
The air quality and meteorological variables considered to build all of the air quality statistical models were obtained from the Macao Meteorological and Geophysical Bureau (SMG). The air quality data was gathered from the air quality monitoring network, namely the Macao Roadside, Macao Residential, Taipa Ambient, Taipa Residential, and Coloane Ambient stations, which have a suitable historical dataset of surface air quality measurements for the levels of NO2, PM10, PM2.5, and O3 concentrations. The background stations (residential and ambient) can capture the regional contribution of PM10 and PM2.5. Population and traffic density are higher at Macao Roadside and Macao Residential, which are located on the main peninsula, than at the Taipa Ambient, Taipa Residential, and Coloane Ambient stations, which are located on the outlying islands.
Meteorological data was obtained from surface observations at SMG's Taipa Grande Meteorological Station and hourly observations from automatic weather stations, such as temperature, relative humidity, precipitation, average wind speed, and dew point temperature, as well as upper-air observations (from the King's Park location in Hong Kong) such as geopotential heights, thickness, stability, temperature, relative humidity, and dew point temperature at various altitudes. In the present work, statistical models such as multiple linear regression (MLR) and classification and regression tree (CART) are developed based on historical measurements of meteorological and air quality variables. Table 1 presents all the variables considered as predictors in the MLR and CART forecast models, as shown in previous work [22]. The air quality variables considered included the levels of NO2, PM10, PM2.5, and O3 MAX concentration from 00:00 to 23:00 of the previous day, two days ago, and three days ago, and from 16:00 of the previous day to 15:00 of the current day. The meteorological variables considered included the upper-air observations from the King's Park location of the Hong Kong Observatory, surface observations, and other variables from the monitoring network of SMG. In this study, meteorological and air quality variables for 2013 to 2016, 2015 to 2018, and 2013 to 2018 were used to build three separate forecasting models. The 2013 to 2016 model was constructed as an initial evaluation of the application of the statistical model to forecast air quality in Macao, while the 2015 to 2018 and 2013 to 2018 models are a follow-up to determine whether any improvement could be made with two additional years of data. Extended data ranges of 5 to 6 years are considered adequate to test whether there is any significant difference between the time series.
At the same time, it would not be ideal to trace back too far with the time series, because regional emissions are constantly changing, and therefore the levels of pollutant concentrations may also be changing. The dataset from 2019 was the most recent dataset and was used to validate all the models. This study is an empirical and region-specific approach, which may also be chemical-regime dependent.
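The lagged air quality predictors described above can be sketched as follows. This is a minimal illustration, not the authors' processing chain: it assumes hourly concentrations stored in a dictionary keyed by timestamp, takes the lags relative to the day the forecast is issued, and uses hypothetical function and feature names. The hourly series is synthetic.

```python
from datetime import date, datetime, timedelta
from statistics import mean


def window_mean(hourly, start, end):
    """Mean of hourly values with start <= timestamp <= end (None if empty)."""
    values = [v for t, v in hourly.items() if start <= t <= end]
    return mean(values) if values else None


def lagged_predictors(hourly, issue_day):
    """Build lagged daily means for the day on which the forecast is issued."""
    midnight = datetime(issue_day.year, issue_day.month, issue_day.day)
    features = {}
    # 00:00 to 23:00 means of the previous day, two days ago, three days ago.
    for lag in (1, 2, 3):
        start = midnight - timedelta(days=lag)
        features[f"mean_lag{lag}d"] = window_mean(
            hourly, start, start + timedelta(hours=23))
    # 16:00 of the previous day to 15:00 of the issue day.
    features["mean_16h_to_15h"] = window_mean(
        hourly, midnight - timedelta(hours=8), midnight + timedelta(hours=15))
    return features


# Synthetic hourly series (hypothetical values): four days, value = hour index.
first_hour = datetime(2019, 9, 27)
hourly_pm25 = {first_hour + timedelta(hours=i): float(i) for i in range(96)}
print(lagged_predictors(hourly_pm25, date(2019, 9, 30)))
```

Each forecast day then contributes one row of such features, alongside the meteorological predictors, to the regression dataset.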
The variables finally selected to predict the levels of PM2.5 and O3 concentration are common to the different Macao air quality monitoring stations. Some variables initially selected were rejected from the forecast models due to collinearity. The final objective is to obtain prediction models with the lowest number of variables but with the maximum explained variance, as expressed by the coefficient of determination (R2).
After the best model was selected, it was applied to forecast pollution levels during an extremely high pollution episode and a low pollution period. The selected high and low pollution episodes were, respectively: (i) the Chinese National Holiday period, comprising the week before the holiday (September 23rd to 30th, 2019) and the holiday week itself (October 1st to 7th, 2019), and (ii) the COVID-19 preventive measures period, from February 5th to 20th, 2020.
The statistical model was built using IBM SPSS Statistics version 26 with MLR (stepwise) and CART methods [26,36]. SPSS is a statistical software package used to solve research problems through hypothesis testing and predictive analysis.
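The idea behind stepwise variable selection in MLR can be illustrated with a short sketch. This is not the SPSS procedure: SPSS stepwise regression enters and removes predictors based on significance tests, whereas the simplified version below only adds predictors greedily while the gain in R2 exceeds a threshold. The function names, the `min_gain` parameter, and the data are all hypothetical.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients [intercept, b1, ...] via normal equations."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(rows, y, coef):
    """R2 = 1 - SS_res / SS_tot for a fitted linear model."""
    pred = [coef[0] + sum(c * v for c, v in zip(coef[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y)
    return 1.0 - ss_res / ss_tot


def forward_select(columns, y, min_gain=0.01):
    """Greedily add the predictor that raises R2 most, until the gain is small."""
    chosen, best_r2 = [], 0.0
    remaining = list(columns)
    while remaining:
        scores = []
        for name in remaining:
            names = chosen + [name]
            rows = list(zip(*(columns[n] for n in names)))
            scores.append((r_squared(rows, y, fit_ols(rows, y)), name))
        r2, name = max(scores)
        if r2 - best_r2 < min_gain:
            break
        chosen.append(name)
        remaining.remove(name)
        best_r2 = r2
    return chosen, best_r2


# Synthetic data (hypothetical): the target depends on two of three candidates.
columns = {
    "pm25_lag1": [10.0, 20.0, 15.0, 30.0, 25.0, 40.0],
    "wind": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
    "other": [1.0, 2.0, 1.0, 2.0, 1.0, 2.0],
}
y = [5.0 + 0.8 * p - 2.0 * w for p, w in zip(columns["pm25_lag1"], columns["wind"])]
print(forward_select(columns, y))
```

The sketch assumes the candidate predictors are not exactly collinear; a singular system would raise a division error, which is one reason collinear variables need to be screened out beforehand.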
Model performance indicators were calculated, namely the coefficient of determination (R2), root mean square error (RMSE), mean absolute error (MAE), and systematic error (BIAS). The results showed that the model for the 2013 to 2018 period performed best in predicting next-day concentration levels in 2019, with high R2 between predicted and observed daily average concentrations (between 0.78 and 0.89 for all pollutants) and low RMSE, MAE, and BIAS. The additional two years of data helped to improve the air quality forecasting model. Nevertheless, the two other models (2013-2016 and 2015-2018) also obtained a significant R2 (between 0.78 and 0.89 for all pollutants), but this translated into a less reliable air quality forecast.
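The four performance indicators named above can be computed as in the following minimal sketch. Two conventions are assumptions here, since the text does not state them: R2 is taken as the squared Pearson correlation between observed and predicted values, and BIAS as the mean of predicted minus observed. The function name and sample values are hypothetical.

```python
from math import sqrt


def forecast_metrics(observed, predicted):
    """Return R2, RMSE, MAE and BIAS for paired observed/predicted values."""
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    bias = sum(errors) / n                        # mean error (assumed sign)
    mae = sum(abs(e) for e in errors) / n         # mean absolute error
    rmse = sqrt(sum(e * e for e in errors) / n)   # root mean square error
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    r2 = cov * cov / (var_o * var_p)              # squared Pearson correlation
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "BIAS": bias}


# Hypothetical daily values in ug/m3, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 41.0]
print(forecast_metrics(observed, predicted))
```

Reporting all four together is useful because a forecast can track the observed trend closely (high R2) while still being systematically too high or too low (non-zero BIAS).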

Air Quality Forecast Models
Regarding the model performance indicators obtained per pollutant and station, the majority of models show good agreement and a similar range of R2 values (from 0.81 to 0.89), except for O3 MAX, which is more difficult to predict. MLR was used for all pollutants, while CART analysis was used in almost all the O3 MAX models (Tables 2 and 3). The complementary CART analysis was intended to improve the results, mainly through a better prediction of high pollutant levels.
Table 4.
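To illustrate why a regression tree can help with high pollutant levels, the sketch below implements the basic operation of a CART regression tree: finding the single threshold on one predictor that minimises the summed squared error of the two resulting groups. It is a simplified illustration, not the SPSS tree implementation, and the function names and data are hypothetical.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, left_mean, right_mean) minimising total squared error."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical predictor values
        left = [target for _, target in pairs[:i]]
        right = [target for _, target in pairs[i:]]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (cost, threshold,
                    sum(left) / len(left), sum(right) / len(right))
    if best is None:
        raise ValueError("need at least two distinct predictor values")
    return best[1:]


# Hypothetical data: daily maximum temperature (deg C) and O3 MAX (ug/m3).
temperature = [22.0, 24.0, 26.0, 29.0, 30.0, 31.0]
o3_max = [60.0, 70.0, 65.0, 150.0, 160.0, 170.0]
print(best_split(temperature, o3_max))
```

A full tree applies this split recursively within each group. Because each leaf predicts the mean of its own group, a tree can represent an abrupt jump to high concentrations that a single linear relationship tends to smooth out.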

Air Quality During the High Pollution Episode
Taipa Ambient is the representative background location for Macao, and was chosen to assess the background levels of PM2.5 and O3 during the extreme pollution episode.
The influx of tourists coming to Macao for the Chinese National Holiday contributed to a high pollution episode that occurred during late September and early October 2019, with peak daily levels of PM2.5 concentration exceeding 55 μg/m3 and O3 MAX levels exceeding 400 μg/m3, well above the threshold levels recommended by the WHO.
The levels of PM2.5 and O3 MAX concentrations for Taipa Ambient around the Chinese National Holiday in 2019 (from September to November) are presented in Figures 1 and 2, which compare the daily average PM2.5 and O3 MAX concentrations during 2018 and 2019, from the month before the holiday (September) to the month after it (November). The pollution episode of 2019 began just before, and extended well into, the Chinese National Holiday period (1 to 7 October). As shown in Figures 1 and 2, the levels of PM2.5 and O3 MAX concentration peaked immediately before, and during, the Chinese National Holiday in late September and early October 2019. The monthly mean concentrations of PM2.5 from September to November 2019 were 19 μg/m3, 24 μg/m3, and 28 μg/m3, respectively. In addition, the monthly mean concentrations of O3 MAX from September to November 2019 were 181 μg/m3, 163 μg/m3, and 172 μg/m3, respectively.
The levels of O3 MAX concentrations reached their peak during late September and early October due to meteorological factors, including predominant winds from the north and east, that is, from Guangdong Province and Hong Kong, respectively. Temperatures were high in conjunction with low wind speed. During the ozone peak episode, which took place in the two weeks before the Chinese National Holiday (October 1st), the average daily temperature was 28 °C, the maximum daily average was 31 °C, and the average wind speed was 2.5 m/s. Due to the shutdown of nearby industrial sectors during the Chinese National Holiday, emissions of nitrogen oxides were lower, associated with the decreased load on the coal power plants in the northern region that usually support the operation of the factories. This caused a decrease in NOx, a precursor of O3. However, the increase in emissions of VOCs and NOx by vehicles, together with chemical reactions in the presence of sunlight, may have caused the peak levels of ozone concentrations under these favorable high-temperature conditions.

Air Quality During the Low Pollution Episode
In contrast, the COVID-19 pandemic led to the Macao government's decision to temporarily suspend the operation of the casinos and entertainment industry and to highly restrict cross-border movements, as a preventive measure to reduce population mobility within Macao. As a result, a low pollution episode occurred during late January and early February 2020, with daily PM2.5 concentrations reaching a record low of 2 μg/m3 and O3 MAX levels of 50 μg/m3. The reduction of population mobility and, consequently, of traffic emissions in Macao and the nearby Guangdong Province led to these lowest PM2.5 concentration levels.
As shown in Figure 3, the levels of PM2.5 concentrations remained low during the initial outbreak of the COVID-19 pandemic in Macao (from January to February 2020), slowly recovering to pre-COVID-19 values in March 2020. As shown in Figure 4, the levels of O3 MAX concentration remained high during the initial outbreak of the COVID-19 pandemic in Macao (from January to February 2020), and the high levels continued into March 2020. The higher levels of O3 MAX concentration were associated with lower NOx emissions, which led to weakened O3 titration by NO during the COVID-19 pandemic lockdown in the nearby Guangdong Province [4].
Although industrial emissions were a major contributor to PM2.5 pollution in China prior to the COVID-19 pandemic lockdown period, residential emissions contributed 39% of total PM2.5 emissions in China, so the PM2.5 emissions during the lockdown period may have originated from residential areas [5].
The comparison of PM2.5 and O3 MAX concentrations for Taipa Ambient between 2019 and the COVID-19 pandemic period in 2020 (January to March) is presented in Figures 3 and 4. As shown in Figure 5, the differences between the monthly mean PM2.5 concentrations in 2019 and 2020 (from January to March) were 16 μg/m3, 2 μg/m3, and 1 μg/m3, respectively. As shown in Figure 6, the differences between the monthly mean O3 MAX concentrations in 2019 and 2020 (from January to March) were 12 μg/m3, 21 μg/m3, and 9 μg/m3, respectively.
The monthly mean PM2.5 and O3 MAX concentrations for Taipa Ambient during 2019 and the COVID-19 pandemic in 2020 (January to March) are presented in Figures 5 and 6. Overall, the preventive measures for the COVID-19 pandemic may not have caused a significant difference in the levels of PM2.5 and O3 concentration in Macao, as the levels from February to March 2020 were similar to those of the previous year, 2019.
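The year-on-year comparison of monthly means can be reproduced with a short calculation such as the one sketched below. It assumes daily values stored in a dictionary keyed by date and takes the difference as the earlier year minus the later year, a sign convention the text does not state. The function names and sample values are hypothetical and are not the station data.

```python
from collections import defaultdict
from datetime import date


def monthly_means(daily):
    """Average the daily values within each (year, month)."""
    buckets = defaultdict(list)
    for day, value in daily.items():
        buckets[(day.year, day.month)].append(value)
    return {key: sum(v) / len(v) for key, v in buckets.items()}


def year_on_year_difference(daily, base_year, months):
    """Monthly mean of base_year minus monthly mean of the following year."""
    means = monthly_means(daily)
    return {m: means[(base_year, m)] - means[(base_year + 1, m)]
            for m in months}


# Hypothetical daily PM2.5 values in ug/m3, two days per month for brevity.
daily_pm25 = {
    date(2019, 1, 1): 30.0, date(2019, 1, 2): 34.0,
    date(2020, 1, 1): 15.0, date(2020, 1, 2): 17.0,
}
print(year_on_year_difference(daily_pm25, 2019, [1]))
```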

Air Quality Pollution Episodes Discussion
The air quality of Macao, a territory of only 32.8 km2, is heavily influenced by external factors, in particular by human activities that occur in the much larger neighboring Guangdong Province. Our study shows the extent to which an increase in mobility, associated with the Chinese National Holiday, or a decrease in mobility, associated with the COVID-19 preventive measures period, impacts air quality in Macao.
The levels of PM2.5 concentrations fell significantly after the first confirmed case of COVID-19 in Macao on January 22nd, 2020, which caused panic and anxiety in the local population and was followed by the announcement of casino closures by the Macao government as part of the preventive measures for COVID-19 from February 5th to 20th, 2020. As some of the preventive measures, in particular the 15-day mandatory casino closure, were lifted, the fear and tension of local residents eased, which promoted population mobility. Although PM2.5 concentrations in Macao improved significantly during late January and early February 2020, they gradually returned to normal in March 2020 after some of the preventive measures began to be lifted in Macao and the nearby Guangdong Province.

Air Quality Pollution Episodes Forecast
Regarding the model behavior in predicting PM2.5 and O3 MAX during the high pollution episode (Chinese National Holiday), observed and predicted PM2.5 and O3 MAX concentrations are presented in Figures 7 and 8.
As shown in Figures 7 and 8, the levels of PM2.5 and O3 MAX concentration peaked during late September and early October 2019. The predicted PM2.5 levels followed the primary trend of the measured concentrations and reproduced the concentration peak shown in Figure 7. The model for O3 MAX also followed the primary trend, but had more difficulty representing the concentration peak. The forecast model for PM2.5 has a higher R2 than the model for O3 MAX because the maximum hourly concentration of O3 is more challenging to predict than the 24 h average of PM2.5: O3 formation is influenced by regional precursor sources and by complex chemistry involving solar radiation, which leads to a higher degree of variability.
The 2013 to 2018 model successfully predicted both the high and low pollution episodes for PM2.5 and O3 MAX, obtaining a significant R2 of 0.88 and 0.83, respectively, for the high pollution period (from September to November 2019), and an R2 of 0.82 and 0.75, respectively, for the low pollution period (from January to March 2020). The R2 obtained for the entire year of 2019 was 0.86 for both PM2.5 and O3 MAX. The statistical forecast model has been shown to be capable of predicting the next 24 h with a high coefficient of determination.

Conclusions
As expected, the 2013 to 2018 model performed best, with the highest R2 and lowest RMSE, MAE, and BIAS, compared with the 2013 to 2016 model and the 2015 to 2018 model. The additional two years of data helped to improve the accuracy and stability of the forecast of the 2013-2018 model.
The 2013-2018 model was able to successfully predict the high pollution episode during the Chinese National Holiday in late September and early October 2019 and the low pollution episode during the COVID-19 preventive measures period in late January and early February 2020. This shows that the model can be reliably applied to forecast next-day pollutant concentrations across different magnitudes of air pollution, making it a useful tool for mitigating air pollution impacts.
In addition, this shows that an improvement of overall air quality in the territory is possible, but that it is tightly linked to the implementation of air pollution control measures in the industry and mobility sectors in Macao and, in particular, in Guangdong Province. As previously studied, the air pollution problem associated with PM2.5 and O3 MAX is a regional problem that is not limited to Macao but also affects the nearby regions of Hong Kong and Guangdong Province.