Modeling Wildﬁre Smoke Pollution by Integrating Land Use Regression and Remote Sensing Data: Regional Multi-Temporal Estimates for Public Health and Exposure Models

: To understand the health effects of wildﬁre smoke, it is important to accurately assess smoke exposure over space and time. Particulate matter (PM) is a predominant pollutant in wildﬁre smoke. In this study, we develop land-use regression (LUR) models to investigate the impact that a cluster of wildﬁres in the northwest USA had on the level of PM in southern Alberta (Canada), in the summer of 2015. Univariate aerosol optical depth (AOD) and multivariate AOD-LUR models were used to estimate the level of PM 2.5 in urban and rural areas. For epidemiological studies, it is also important to distinguish between wildﬁre-related PM 2.5 and PM 2.5 originating from other sources. We therefore subdivided the study period into three sub-periods: (1) Pre-ﬁre, (2) during-ﬁre, and (3) post-ﬁre. We then developed separate models for each sub-period. With this approach, we were able to identify different predictors signiﬁcantly associated with smoke-related PM 2.5 verses PM 2.5 of different origin. Leave-one-out cross-validation (LOOCV) was used to evaluate the models’ performance. Our results indicate that model predictors and model performance are highly related to the level of PM 2.5 , and the pollution source. The predictive ability of both uni- and multi-variate models were higher in the during-ﬁre period than in the pre- and post-ﬁre periods. S.B.;


Introduction
Fire smoke is a complex mixture of gases and particles, including carbon dioxide (CO 2 ), methane (CH 4 ) and nitrogen dioxide (NO 2 ), that are known as greenhouse gases, as well as high levels of particulate matter (PM), toxins, carbon monoxide (CO), ozone (O 3 ), and benzene [1]. Thus, fire smoke is a significant contributor to air pollution. It impacts climate as well, through primary emission of greenhouse gases and aerosol (direct impact), and through secondary effects on atmospheric chemistry (indirect impact) [2]. Wildfire smoke is associated with adverse health effects (both long-term and short-term) on exposed human population, and it contributes to annual global premature mortality [3][4][5]. Individuals are affected by fire smoke differently: At-risk groups include people suffering from respiratory disease such as asthma or lung cancer, people with an existing heart condition, children, people over 65, and pregnant women [6].
To understand health-related effects, it is important to accurately assess wildfire smoke exposure. Particulate matter (PM) is a predominant air pollutant in wildfire smoke [7], and it poses the most Remote sensing observations, including AOD images, provide extensive spatial coverage and reliable repeated measurements. However, there are some limitations with AOD images. For example, cloudy days are a major problem with remotely sensed methods of PM 2.5 estimation [24], as they limit the number of days available for analysis.
A variety of satellite sensors provide AOD image including Moderate Resolution Imaging Spectrometer (MODIS), the Ozone Monitoring Instrument (OMI), the Multi-Angle Imaging Spectrometer (MISR), the Geostationary Operational Environment Satellite (GEOS), Polarization of Earth's Reflectance and Directionality (POLDER), the Sea-viewing Wide Field-of-view Sensor (SeaWiFS), and the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP). AOD is affected by ambient conditions, such as humidity and vertical profile, as well as chemical properties of the atmosphere [45].
Land use regression (LUR) methods estimate pollution concentration at a given location using surrounding attributes, such as land use type and traffic characteristics within a surrounding area, as predictors [33]. LUR is generally used to estimate air pollution at fine spatial scales, and can be used to assign household-level exposure in community health studies [20]. Based on the type of the pollutant and the source of the air pollution different land use variables can be used in LUR. For example, road types, parks, residential, commercial, and industrial land uses have been used by Bertazzon et al. (2015), for modeling urban air pollution [20]. Yang et al. (2017) developed LUR to estimate PM 2.5 and NO 2 using different land use types including cultivated land, forest, grassland, shrub and water land, and traffic and urbanization [46]. According to a report by Environment and Climate Change Canada (2017), the main sources of PM 2.5 emission in Canada include dust and fire, agriculture, home firewood burning, mineral and oil and gas industry, transportation, and building utilities [47]. Different specifications of LUR (ordinary least squares (OLS) regression, geographically weighted regression (GWR), and spatial autoregression (SAR)) have been employed to predict air pollution in different studies [20,38,48].
Different factors affect the identification of the most reliable models in different wildfire events and areas. As each model has strengths and weaknesses, no single model can be considered the most accurate to quantify human exposure to wildfire smoke [44]. For example, ground-based measurements provide accurate information for each station; however, the spatial density of station distribution is not sufficient to estimate the spatial distribution of wildfire smoke. GWR is suitable on a reginal scale and needs small amounts of data; however, its performance depends on ground-based measurements. Therefore, in areas lacking air quality (AQ) stations, GWR is not reliable [44]. CTM can predict pollutant concentration without ground based measurements [49]. However, collecting the necessary chemical and physical information is time consuming and requires financial resources [50]. Although MLR has relatively weak predictability, it can be improved by adding some covariates [44]. For example, for estimating PM 2.5 using AOD aerosol type, methodological and land use variables may be included in the regression to improve the model performance [28,51,52]. MEM performance is usually strong, and the R 2 value could reach up to 0.80 [27,53]. The main disadvantage of this method is that, if ground-based stations are not available in some areas, the model assumptions are not met, potentially affecting its performance [54]. In addition, there are differences between models trained for smoke-based PM 2.5 (i.e., when the wildfire is the source of PM 2.5 ), and models developed to estimate PM 2.5 from other sources, mainly of industrial and transportation origin.
In the present study, the model choice was affected by many factors, including the number of ground-based AQ monitors, the characteristics of the study area, data availability, data uncertainty, and the significance of spatial autocorrelation in the data. Experiments with a range of predictors showed that the predictors associated with PM 2.5 could change even over a short period (two months), depending on the source of PM 2.5 . The predictors considered include spatial and temporal variables, such as wind speed and land use variables, and satellite data, including AOD and (normalized difference vegetation index). Roadway and industrial variables are established predictors in LUR models for estimating air pollution [17,19,55], whereas satellite observations are relatively uncommon. In the present study, we investigated the role of vegetation on the level of PM 2.5 concentration during a wildfire event. Plants have the ability to reduce some of air pollution particles as plant leaves and stems increase surface areas to which airborne particles can be absorbed [56]. NDVI is an index of greenness, vegetation, and agriculture land use that has been rarely used as a predictor in LUR models [57][58][59][60]. Wu et al., [57] investigated the role of surrounding greenness on intra urban PM 2.5 variability, showing that there was a strong negative correlation (ranging between −0.71 and −0.77) between NDVI and PM 2.5 .
To examine the temporal variability in predictors, we divided the study period into the following three sub-periods. "Pre-fire period", that is, when the level of PM 2.5 in the study area was normal (the average level of the 24-h PM 2.5 was lower than the standard level: 30 µg/m 3 [60]); "during-fire" period, that is, when the concentration of PM 2.5 was dramatically increased in the majority of AQ stations located in the study area; "post-fire" period, that is, when the PM 2.5 concentration returned to background levels. The subdivision of the study period into three separate sub-periods allowed us to test different models for the same study area, yet with very different PM 2.5 concentration levels and pollutant sources over two months. The study area covers most of southern Alberta, including urban and rural areas. Rural areas usually have insufficient AQ monitors and most epidemiological studies are limited to urban areas. We propose an extension of the models to rural areas to estimate levels of smoke-based PM 2.5 concentration in areas where AQ monitors are not enough or available.
In the present study, we estimate two types of predictive models: univariate models based on satellite AOD; and integrated AOD-LUR models. Both models yield estimates of PM 2.5 for each of the sub-periods in August and September 2015 over the whole urban and rural southern Alberta.

Study Area and Ground-Level PM 2.5 Measurements (Dependent Variable)
The study area covers the southern part of the province of Alberta (Canada). The Clean Air Strategic Alliance (CASA) was established to manage air quality in Alberta. Ten Airshed zones were formed between 1996 and 2017; the study area includes six of these zones: Alberta Capital  predictor in LUR models [57][58][59][60]. Wu et al., [57] investigated the role of surrounding greenness on intra urban PM2.5 variability, showing that there was a strong negative correlation (ranging between −0.71 and −0.77) between NDVI and PM2.5.
To examine the temporal variability in predictors, we divided the study period into the following three sub-periods. "Pre-fire period", that is, when the level of PM2.5 in the study area was normal (the average level of the 24-h PM2.5 was lower than the standard level: 30 µg/m 3 [60]); "duringfire" period, that is, when the concentration of PM2.5 was dramatically increased in the majority of AQ stations located in the study area; "post-fire" period, that is, when the PM2.5 concentration returned to background levels. The subdivision of the study period into three separate sub-periods allowed us to test different models for the same study area, yet with very different PM2.5 concentration levels and pollutant sources over two months. The study area covers most of southern Alberta, including urban and rural areas. Rural areas usually have insufficient AQ monitors and most epidemiological studies are limited to urban areas. We propose an extension of the models to rural areas to estimate levels of smoke-based PM2.5 concentration in areas where AQ monitors are not enough or available.
In the present study, we estimate two types of predictive models: univariate models based on satellite AOD; and integrated AOD-LUR models. Both models yield estimates of PM2.5 for each of the sub-periods in August and September 2015 over the whole urban and rural southern Alberta.

Study Area and Ground-Level PM2.5 Measurements (Dependent Variable)
The study area covers the southern part of the province of Alberta (Canada). The Clean Air Strategic Alliance (CASA) was established to manage air quality in Alberta. Ten Airshed zones were formed between 1996 and 2017; the study area includes six of these zones: Alberta Capital Airshed   Twenty-four-hour PM 2.5 concentrations were collected at 24 Alberta Airshed Zones continuous AQ stations (Figure 1b) for the months of August and September 2015 (http://airdata.alberta.ca/ RelatedLinks.aspx). The daily PM 2.5 concentrations were averaged for each period in each station.

Temporal (Meteorological) and Spatial Predictors
Wind speed and direction were collected at each monitoring station. A windrose was created using four quadrants, defined by radii traced at 0, 90, 180, and 270 degrees, so that each quadrant is centered on the northeast, southeast, southwest, and northwest directions, respectively. As the fire source is located southwest of the study area, based on the recorded wind direction, hourly wind speed, the southwest quadrant of each station was averaged over each of the three sub-periods. Average humidity and temperature over each period were obtained from continuous weather stations as well.
Industrial and road length predictors were generated by drawing circular buffers around each monitoring station. Following the previous studies [20], the buffers for industrial variables ranged from 5000 to 10,000 m and for road length from 500 to 1000 m. The number of industrial emission points and length of road in each buffer were calculated. The Alberta road network was acquired from the National Road Network (NRN) [61], and industrial points were obtained from the National Pollutant Release Inventory (NPRI) [62].
Elevation data were acquired through an Alberta DEM (digital elevation model) from DMTI Spatial [63]: Contour lines, stored in a shapefile, were converted to a raster file of elevation values, using ESRI ArcMap 10.3.1.
As the 2015 PNW wildfire originated in Washington State, USA, the centroid of the fire was digitized in ArcGIS as a source of the smoke over Alberta. The Euclidean distance between each AQ station in the study area and the source of the fire was calculated.

Satellite AOD Images
One of the most common satellite products used in the field of air quality is aerosol optical depth (AOD), a quantity that indicates the amount of aerosol particles in the atmosphere. In our study, AOD data were derived from both MODIS and ozone monitoring instrument (OMI).
Averaged MOD 08 product of MODIS (at 1 degree resolution) and OMI/Aura AOD data (at 0.25 degree resolution) were collected from the NASA Giovanni website [64] for the three sub-periods (pre-, during, and after-fire) of the study.

NDVI
Normalized Difference Vegetation Index (NDVI) has been used as greenness predictor in previous air pollution studies [57][58][59][60]. NDVI is an image-based greenness index for measuring and monitoring density of plant, vegetation and biomass production on the earth. The NDVI values range is from −1 to +1. Increasing positive NDVI values indicate increasing amount of healthy vegetation and correspond to dense vegetation. Areas covered by rock, sand, snow, and non-vegetative features show very low NDVI values (close to -1).
Global NDVI is one of the vegetation products of MODIS that contains atmospherically corrected bi-directional surface reflectance. Global NDVI data are provided every 16 or 30 days. In the present study, MOD13C2 data (monthly NDVI image) were downloaded from the NASA Giovanni website [64]. Considering that the NDVI is not expected to change significantly over a month we collected one image for August 2015 and one for September 2015. Those monthly NDVI images were projected on a 0.05 degrees geographic Climate Modeling Grid (CMG).

PM 2.5 Predictive Models
In the present study, a univariate model was first tested, using AOD as the single predictor, to assess the amount of variability of PM 2.5 concentration that can be modeled by AOD for different pollution levels in each sub-period. Subsequently, this simple model was integrated into a LUR model by including traditional land use variables (Equation (1)) [20], as well as meteorological variables. The purpose of this integration was to improve the model's spatial and temporal fit, as the relationship between AOD and PM 2.5 is thought to be affected by other variables. Furthermore, the latter model tests the association of PM 2.5 with relevant predictors. An AOD-LUR multivariate model was independently estimated for each sub-period. Model selection led to the identification of different sets of significant predictors, yielding respective coefficients for each sub-period.
Traditional LUR models are defined as: where the response variable, y, (e.g., PM 2.5 ) indicates pollution concentration at location i, (where each i is a sample AQ station), and is expressed as a function of k attributes, or independent variables (x), such as land use variables and traffic characteristics [33]. Coefficients are estimated using the ordinary least squares (OLS) estimation method. Variable selection was conducted independently for each sub-period model; over twenty variables were tested in the prediction models, including spatial predictors (land use type, industrial, and transportation indicators), temporal predictors (meteorology) and remotely sensed data (AOD and NDVI). Model selection was based on theoretical relevance and statistical significance. First, the most relevant variables were chosen for initial list in each period. Then, forward stepwise multiple linear regression and subset regression methods were used to identify the most significant predictors in each model. Global Moran's I [65] was used to identify spatial clustering and autocorrelation at the spatial variables, i.e., PM 2.5 , land use variables, and satellite observations recorded at each AQ station. Residual Moran's I and Lagrange multiplier (LM) test [66] were used to assess the spatial autocorrelation in the residuals of each sub-period model.
After estimating the best prediction model for each sub-period, the regression coefficients were applied to calculate PM 2.5 in the centroid of each 10 by 10 km grid-cell overlaying the study area. Then inverse distance weighted (IDW) interpolation technique was applied to map the calculated PM 2.5 level between grid centers. IDW works on the assumption that near things are more alike than things that are farther apart. Using the estimated value at each grid center, IDW predicts a value for any unknown location, giving greater weights to points closer to the known location. The weights are determined by distance [67]. A summary of variables, their names, and additional details are shown in Table 1. Performance of the models in the PM 2.5 concentration estimation is evaluated by using different statistical indices including coefficient of determination (R 2 ), adjusted coefficient of determination (adj R 2 ), cross-validated R 2 (CV-R 2 ), RMSE, CV-RMSE, and Akaike information criterion (AIC).

Validation
The validity of the AOD-based and AOD_LUR models used for estimation of PM 2.5 concentration was evaluated by a cross-validation procedure. As the total number of AQ stations (PM 2.5 samples) was too small to divide the data set to test and train separately, we applied leave-one-out cross validation (LOOCV) procedure. With this method, each PM 2.5 measurements are estimated based on a model calibrated over the other PM 2.5 measurements. Therefore, in this study, for each period, we developed almost 24 individual models (after removing outliers we had 22 data for pre-fire period, 24 data for during-fire period, and 23 data for post-fire period). Then the cross-validated root mean square error (CV-RMSE) and CV-R 2 were calculated for validating the models. The structure of the model (the variables in the model) was the same as the one used in the full training dataset (the model trained by 24 PM 2.5 measurements) but the coefficients of the model changed. Table 2 shows the descriptive statistics of PM 2.5 concentration over the three sub-periods. The overall mean concentration of PM 2.5 over the during-fire period is significantly higher than the pre-and post-fire periods (i.e., 19.5 µg/m 3 compared to 6.2 µg/m 3 and 4.2 µg/m 3 , respectively). Global Moran's I tests indicate that there was no significant spatial autocorrelation in the measured PM 2.5 at AQ stations in any of the sub-periods over the study area. The temporal daily variability of PM 2.5 level at each of the 24 AQ stations during the whole period (1 August to 30 September) is shown in Figure 2. The recorded concentrations of PM 2.5 increased greatly in late August. Calgary monitors reported the level of PM 2.5 as about 170 µg/m 3 on 25 August, which is, more than three times their typical peak in non-smoke days. It can be seen in this figure that the daily averaged PM 2.5 level for some stations was more than 80 µg/m 3 on 25 August. By 1 September, the PM 2.5 concentration returned to the background level. The spatial variability of PM2.5 concentration was estimated using AOD and AOD-LUR models during the three sub-periods. That is, two models were developed for each period. The univariate AOD models are summarized in Table 3, and the multivariate AOD-LUR models in Table 4. The final equation of the multivariate model for each sub-period is as follows:

Results
Pre-fire period model: During-fire period model: Post-fire period model: log ( ) = 1.55 + (−9.61 × 10 × ) + (1.87 × 10 × Road ) + (2.9 × 10 × LU_ind_ ) As it is shown in the above equations, there are substantial differences between the intercept values across multivariate models. The highest intercept value was obtained for the during-fire period, due to the higher level of PM2.5 concentration in that period. The lowest intercept was obtained for the post-fire period, as we applied a logarithmic transformation to PM2.5 to normalize the PM2.5 distribution and improve the results. The spatial variability of PM 2.5 concentration was estimated using AOD and AOD-LUR models during the three sub-periods. That is, two models were developed for each period. The univariate AOD models are summarized in Table 3, and the multivariate AOD-LUR models in Table 4. The final equation of the multivariate model for each sub-period is as follows: Pre-fire period model: During-fire period model: Post-fire period model: log(y i ) = 1.55 + −9.61 × 10 −4 × Elevation + 1.87 × 10 −5 × Road 1km + 2.9 × 10 −2 × LU_ind _ 5km (4) As it is shown in the above equations, there are substantial differences between the intercept values across multivariate models. The highest intercept value was obtained for the during-fire period, due to the higher level of PM 2.5 concentration in that period. The lowest intercept was obtained for the post-fire period, as we applied a logarithmic transformation to PM 2.5 to normalize the PM 2.5 distribution and improve the results.  The model results shown in Table 3 for all three sub-periods indicate that the goodness-of-fit of the univariate AOD model was limited, especially for the pre-and post-fire periods, that is, when the PM 2.5 level was low. The goodness-of-fit for all three sub-periods improves substantially by adding temporal and spatial variables to each regression ( Table 4). The LUR was developed for each period using different image-based, temporal, and spatial variables. The only significant variable, beside AOD, in the pre-fire model is "road length within a 1000-m buffer". With the addition of this variable, the model R 2 increased from 0.29 to 0.50. For the during-fire model, the significant variables, besides AOD, were "NDVI", and "distance to fire. The integration of AOD with distance to the fire period and NDVI improved model performance substantially, as shown by the higher R 2 value of the AOD_LUR model compared to AOD-based model (AOD-LUR's R 2 = 0.77 verses AOD-based model's R 2 = 0.50), along with lower AIC and RMSE values for the AOD-LUR model compared to AOD-based model (AOD_LUR's AIC and RMSE = 162 and 6, verses AOD-based model's AIC and RMSE = 179 and 9). For the post-fire model, significant variables were "elevation", "road length within a 1000-m buffer", and "industrial land use within a 5000-m buffer". As shown in Tables 3 and 4, AOD did not perform well in the post-fire period. For most periods, the CV-R 2 was less than 10% lower than the model adjusted R 2 . For example, in the during-fire period the multi-variate model R 2 and CV-R 2 were 0.74 and 0.71 respectively, less than 5% decrease in CV-R 2 ( Table 4). This model showed the best performance based on validation methods.
Residual Moran's I and Lagrange multiplier (LMerr) tests for all three periods indicated that there is no significant spatial autocorrelation in any of the model residuals. Therefore, geographically weighted or spatially autoregressive models were not deemed necessary.

PM 2.5 Predictive PM 2.5 Maps
To extend the estimation of PM 2.5 to the entire study area, a 10 by 10 km grid was overlaid to the study area. First, the value of the model's variables at the center of each grid cell was extracted, and then, by using the developed models for each sub-period (shown in Table 4 and Equations (2)-(4)), the PM 2.5 levels were calculated at each grid center. An IDW interpolation technique was applied to interpolate the PM 2.5 level between grid centers. The predictive maps are shown in Figure 3. Table 3 for all three sub-periods indicate that the goodness-of-fit of the univariate AOD model was limited, especially for the pre-and post-fire periods, that is, when the PM2.5 level was low. The goodness-of-fit for all three sub-periods improves substantially by adding temporal and spatial variables to each regression ( Table 4). The LUR was developed for each period using different image-based, temporal, and spatial variables. The only significant variable, beside AOD, in the pre-fire model is "road length within a 1000-m buffer". With the addition of this variable, the model R 2 increased from 0.29 to 0.50. For the during-fire model, the significant variables, besides AOD, were "NDVI", and "distance to fire. The integration of AOD with distance to the fire period and NDVI improved model performance substantially, as shown by the higher R 2 value of the AOD_LUR model compared to AOD-based model (AOD-LUR's R 2 = 0.77 verses AOD-based model's R 2 = 0.50), along with lower AIC and RMSE values for the AOD-LUR model compared to AOD-based model (AOD_LUR's AIC and RMSE = 162 and 6, verses AOD-based model's AIC and RMSE = 179 and 9). For the post-fire model, significant variables were "elevation", "road length within a 1000-m buffer", and "industrial land use within a 5000-m buffer". As shown in Tables 3 and 4, AOD did not perform well in the post-fire period. For most periods, the CV-R 2 was less than 10% lower than the model adjusted R 2 . For example, in the during-fire period the multi-variate model R 2 and CV-R 2 were 0.74 and 0.71 respectively, less than 5% decrease in CV-R 2 ( Table 4). This model showed the best performance based on validation methods.

The model results shown in
Residual Moran's I and Lagrange multiplier (LMerr) tests for all three periods indicated that there is no significant spatial autocorrelation in any of the model residuals. Therefore, geographically weighted or spatially autoregressive models were not deemed necessary.

PM2.5 Predictive PM2.5 Maps
To extend the estimation of PM2.5 to the entire study area, a 10 by 10 km grid was overlaid to the study area. First, the value of the model's variables at the center of each grid cell was extracted, and then, by using the developed models for each sub-period (shown in Table 4 and Equations (2)-(4)), the PM2.5 levels were calculated at each grid center. An IDW interpolation technique was applied to interpolate the PM2.5 level between grid centers. The predictive maps are shown in Figure 3.

Relation between AOD and PM2.5 Based on Different Smoke Levels
The results of both AOD and AOD-LUR, for all periods, indicate that with lower level of PM2.5 concentration, the predictive ability of AOD in the models decreased. Studies have shown that satellite AOD is well suited for detecting large changes in aerosol content [36], which can explain the higher performance of the during-fire models compared to pre-and post-fire models. The increased level of PM2.5 concentration increases the correlation between AOD and PM2.5, leading to a better model performance. The graph in Figure 2 demonstrates the difference in PM2.5 concentration in the during-fire period compared to the other two periods, and it can be seen from the graph that there is also a small peak in the pre-fire period between 11 and 15 August. It is clear from both uni-and multivariate models that the AOD models improved when the smoke from the wildfire drifted over the study area, leading to a high level of PM2.5. However, both for the pre-and post-fire periods, when the levels of PM2.5 were fairly low, the model performance decreased due to the lower correlation between AOD and PM2.5 (mean PM2.5 = 4.2 µg/m 3 compared to 6.2 µg/m 3 and 19.5 µg/m 3 for pre-and during-fire periods).
The main difference between the post-fire model verses pre-and during-fire models was that the AOD showed little correlation with PM2.5 level and it was not significant either in the univariate model or in the integrated AOD-LUR model. The relatively weak results obtained by the AOD-model in the post-fire period, compared to the pre-and during-fire period, may be due to uncertainties in MODIS AOD data, such as cloudy days. In southern Alberta, the month of September experiences gradually increasing cloud cover. For example, the percentage of time that the sky is mostly cloudy increases from 42% to 47% in Calgary, 37% to 44% in Lethbridge, 44% to 48% in Red Deer between 1 September and 30 September [68] . The increasing number of cloudy days in September (the post-fire period) have affected the performance of the model using AOD image due to the uncertainty in the AOD data.

Selected Predictor Variables
Although all three sub-periods used the same study area with almost the same AQ monitoring stations for developing the multivariate models, each final model included a different set of variables. The variable selection of the integrated AOD-LUR models highlights one key finding of this study: The predictors of PM2.5 vary even over a short period of time (two months), as a function of the

Relation between AOD and PM 2.5 Based on Different Smoke Levels
The results of both AOD and AOD-LUR, for all periods, indicate that with lower level of PM 2.5 concentration, the predictive ability of AOD in the models decreased. Studies have shown that satellite AOD is well suited for detecting large changes in aerosol content [36], which can explain the higher performance of the during-fire models compared to pre-and post-fire models. The increased level of PM 2.5 concentration increases the correlation between AOD and PM 2.5, leading to a better model performance. The graph in Figure 2 demonstrates the difference in PM 2.5 concentration in the during-fire period compared to the other two periods, and it can be seen from the graph that there is also a small peak in the pre-fire period between 11 and 15 August. It is clear from both uni-and multivariate models that the AOD models improved when the smoke from the wildfire drifted over the study area, leading to a high level of PM 2.5 . However, both for the pre-and post-fire periods, when the levels of PM 2.5 were fairly low, the model performance decreased due to the lower correlation between AOD and PM 2.5 (mean PM 2.5 = 4.2 µg/m 3 compared to 6.2 µg/m 3 and 19.5 µg/m 3 for preand during-fire periods).
The main difference between the post-fire model verses pre-and during-fire models was that the AOD showed little correlation with PM 2.5 level and it was not significant either in the univariate model or in the integrated AOD-LUR model. The relatively weak results obtained by the AOD-model in the post-fire period, compared to the pre-and during-fire period, may be due to uncertainties in MODIS AOD data, such as cloudy days. In southern Alberta, the month of September experiences gradually increasing cloud cover. For example, the percentage of time that the sky is mostly cloudy increases from 42% to 47% in Calgary, 37% to 44% in Lethbridge, 44% to 48% in Red Deer between 1 September and 30 September [68] . The increasing number of cloudy days in September (the post-fire period) have affected the performance of the model using AOD image due to the uncertainty in the AOD data.

Selected Predictor Variables
Although all three sub-periods used the same study area with almost the same AQ monitoring stations for developing the multivariate models, each final model included a different set of variables. The variable selection of the integrated AOD-LUR models highlights one key finding of this study: The predictors of PM 2.5 vary even over a short period of time (two months), as a function of the presence/absence of wildfire smoke. For example, for the pre-fire period, the variable selection methods identified two variables: AOD, and road length on a 1000-m buffer. By adding the road length variable to the AOD-based regression, the model's R 2 value increased from 0.29 to 0.50, emphasizing the major impact of traffic emissions, especially in urban areas. The predictive ability of the during-fire model using both AOD and AOD-LUR model was greater than pre-and post-fire models.

Relation between PM 2.5 and NDVI
The negative correlation of −0.60% between NDVI and PM 2.5 in the during-fire model (Table 4) indicates that with an increased amount of vegetation, that is, plants and trees around the ground monitoring stations, the level of PM 2.5 decreases. Trees have the ability to reduce significant amounts of ambient air pollutants, by primary absorbing pollution components via leaf stomata and plant surface, and also by intercepting airborne particles [56]. Most of the intercepted particles are retained on the plant surface [56]. The significantly higher coefficient of NDVI in the AOD-LUR model (in absolute value) than the coefficients of the other variables may also indicate the essential role of plants and vegetation in controlling wildfire-related PM 2.5 level. Few studies have applied the greenness index as a predictor in PM 2.5 modeling. However, the role of vegetation and plants in reducing air pollution during the fire events has not been assessed yet.

Relation between PM 2.5 and Prevailing Wind Speed
The maps in Figure 3 show the spatial pattern of PM 2.5 levels in the three sub-period and displays a temporal pattern over the study period. This pattern appears to be associated with the source of fire (south-west of Alberta) and prevailing wind direction that in the summer is from west to east in most of Alberta. In the first period, when most of the study area's monitoring stations showed standard levels of PM 2.5 , the predicted map shows an elevated level of PM 2.5 in a small portion of southern Alberta. In the rest of the study area, the road length variable is associated with the PM 2.5 concentration, and, as it can be seen in the predictive map, the level of PM 2.5 inside the cities is higher than the level of PM 2.5 in areas further out, where the road network is sparse.
During the fire period, the movement of the fire smoke to the interior portion of the study area is clear: The north and northeast portions of the study area, that are farther from the source of fire are still clean compared to its south and south-west portions.
The PM 2.5 pattern in September, just after the fire, shows higher PM 2.5 concentration in east and northeast areas and lower in west and southwest areas. Even though we used coarse resolution AOD images, the correlation of AOD and PM 2.5 in two of the three sub-periods exhibited the highest value among all predictors, and it was a one of the most significant predictors of PM 2.5 .
Wind speed was not significant in any of the models, but it is interesting to note that the correlation between PM 2.5 concentration and wind speed during the fire had opposite signs, in comparison to pre-and post-fire period (correlation between wind speed (at during-fire period) and PM 2.5 = 0.60 and correlation between wind speed-pre/post-fire and PM 2.5 = −0.15). The strong positive correlation between wind speed and PM 2.5 concentrations in the during-fire period is due to the prevailing wind direction in southern Alberta. As the prevailing wind direction during the summer is from west to east in most of Alberta, and the fire was located south-west of Alberta, the higher wind during the fire would result in higher PM 2.5 concentration. Conversely, in the pre-and post-fire periods the wind helps to scatter the pollution and there was a negative correlation between wind speed and PM 2.5 , and low winds allow PM 2.5 levels to rise [69].

Spatial Autocorrelation between AQ Measurements
PM 2.5 over AQ stations did not exhibit significant spatial autocorrelation, nor did any of the model residuals. This result is consistent with the known characteristics of PM 2.5 : A regional pollutant, which tends to exhibit little spatial variation. However, as spatial autocorrelation is an indication of self-similarity over short distances, it would be a reasonable expectation during a wildfire event. Non-significant spatial autocorrelation may be related to a smooth pollution pattern, driven by wind and elevation, but it could also be a function of the relatively large distance between the AQ stations. Owing to non-significant spatial autocorrelation in model residuals, reliable result estimates can be obtained from simple linear (OLS) models.

Validating the Models by Leve-One-Out Cross Validation Procedure
As we mentioned before, the limitation of the present study was the limited number of AQ stations. Therefore, it was not possible to have a separate dataset to validate the models. So, the models' performance was evaluated using a leave-one-out cross validation method. Comparing R 2 and adjusted R 2 of the full training dataset model with the CV-R 2 shows small decrease in most of the models (between 5% and 10%), which may indicate the reliability of the performance of the models and the generalization ability of the models for a new data set.
Further support for the usefulness of most of the developed models, especially multivariate models, is that both RMSE and CV-RMSE were low compared to the range in measured PM 2.5 concentrations (Tables 2 and 4).
We can then say that the validation results indicate that the calibrated proposed models, especially during-fire models could be used to estimate PM 2.5 for other fire events in Southern Alberta.

Application of Integrated AOD-LUR Models in Rural Area
Most studies focus on urban areas due to the lack of AQ monitors in rural area as well as to higher air pollution in urban areas resulting from human activities. The present study estimated wildfire-related PM 2.5 , as it relates to source of fire, as opposed to fossil fuel emissions in both urban and rural areas. The results demonstrate that the use of satellite imagery and other land-use (LU) variables can produce a model that might be useful in areas that suffer from the scarcity of AQ monitors. Our study area, southern Alberta, despite including eight airshed zones, only has 24 AQ monitors that measured PM 2.5 , most of them located in urban areas. By incorporating MODIS AOD and LU variables, we developed three models for three periods that estimated PM 2.5 in both urban and rural area. Even though the sparseness of AQ stations remains problematic, the proposed models yield preliminary estimates of air pollution at relatively fine spatial resolution in rural areas. Hence, epidemiological studies could use our model results to assess PM 2.5 effect on human health in both rural and urban areas. Further, these predictive models are applicable to other wildfire events, and could help health authorities estimate the spatial and temporal pattern of smoke, with relative accuracy, in major wildfire events.

Conclusions
In summary, we proposed a method applicable globally and regionally, which integrates the use of MODIS AOD data into LUR models to identify smoke (particulate matter) concentration. Univariate AOD and multivariate AOD-LUR models were developed to estimate the level of PM 2.5 concentration in three different sub-periods: Pre-, during-, and post-fire. The AOD model performed substantially better when combined with other covariates during the fire period, compared to the univariate AOD model. The predictors in each period varied as a function of the sources of PM 2.5 . The three different sub-periods exhibited very large differences in PM 2.5 concentration levels, as well as in the source of PM 2.5 . For this reason, separate models were developed for each sub-period. The three univariate models differ from each other in the coefficients, goodness of fit, and other regression diagnostics. The integrated multivariate AOD-LUR models further differ from each other in that each model contains its own set of predictors, with their own associated coefficients.
In addition, our model estimated PM 2.5 concentration at finer spatial scale compared to ground-based measurements available from Alberta airshed zones, which exhibit a sparse distribution of AQ monitors, far from sufficient for public health and epidemiological studies. The spatial and temporal estimates yielded by these models are particularly useful in rural areas, where such estimates are rarely available. Furthermore, the model estimates can aid health authorities predict wildfire smoke exposure at fine spatial and temporal scale, allowing them to better plan for wildfire events, which are likely to increase in frequency and intensity as climate changes.
The result of this study could also be used in epidemiological studies to assess health effects of smoke-related particulate matter during wildfire events, in contrast with those of particulate matter pollution originating from other sources. The results improved the spatial accuracy and reliability of the estimation of wildfire-related PM 2.5 , enabling epidemiological studies to assess the association between PM 2.5 and smoke-related health effects during wildfire episodes. To this end, it is essential for epidemiological studies to distinguish between wildfire-related PM 2.5 and PM 2.5 originating from other sources [15]. An important contribution of this paper is the comparison of PM 2.5 level during-fire events, in contrast to pre-fire and post-fire periods. Distance from the wildfire is a significant predictor of PM 2.5 during the fire event, in contrast with the pre-and post-fire period, when the main predictors are transportation and industrial activities. Therefore, by estimated pre and post PM 2.5 maps, the PM 2.5 related health effects before and after fire periods can be evaluated and compared with smoke-related disease.
Further directions of enquiry include the calibration of the proposed models to estimate PM 2.5 for other fire events, in southern Alberta and other regions. We plan to experiment with the use of different satellite AOD data, to obtain even greater spatial accuracy, through the use of finer resolution AOD, which will allow for filling the current MODIS AOD gaps.