Predicting Fine Particulate Matter (PM2.5) in the Greater London Area: An Ensemble Approach using Machine Learning Methods

Abstract: Estimating air pollution exposure has long been a challenge for environmental health researchers. Technological advances and novel machine learning methods have allowed us to increase the geographic range and accuracy of exposure models, making them a valuable tool for conducting health studies and identifying pollution hotspots. Here, we created a prediction model for daily PM2.5 levels in the Greater London area from 1st January 2005 to 31st December 2013 using an ensemble machine learning approach incorporating satellite aerosol optical depth (AOD), land use, and meteorological data. The predictions were made on a 1 km × 1 km scale over 3960 grid cells. The ensemble included predictions from three different machine learners: a random forest (RF), a gradient boosting machine (GBM), and a k-nearest neighbor (KNN) approach. Our ensemble model performed very well, with a ten-fold cross-validated R² of 0.828. Of the three machine learners, the random forest outperformed the GBM and KNN. Our model was particularly adept at predicting day-to-day changes in PM2.5 levels, with an out-of-sample temporal R² of 0.882. However, its ability to predict spatial variability was weaker, with an R² of 0.396. We believe this is due to the relatively small spatial variation in pollutant levels in this area.


Introduction
Environmental research has long dealt with issues in exposure assessment, particularly in studies involving air pollutants. Direct individual measurements using personal monitors are costly, difficult to implement, and inconvenient for participants. This effectively limits the number of individuals who can be recruited and the length of time over which exposures are measured. Moreover, some monitors have

Materials and Methods
We created a prediction model for daily PM2.5 in the Greater London area from 1st January 2005 to 31st December 2013. The predictions were made on a 1 km × 1 km scale, and the full map consisted of 3960 grid cells. To optimize the predictions, we used four different machine learning methods: a gradient boosting machine (GBM), a random forest (RF), a deep neural network (NN), and a k-nearest neighbor (KNN) learner. The predictions from each model were then entered into a final generalized additive model (GAM) with a smoothed spatial term to obtain the final average daily PM2.5 levels in each grid cell. We expected the ensemble approach to yield better predictions than any single model [40].

Machine Learning Algorithms
How these machine learning algorithms work has been described in detail elsewhere [41][42][43][44]. The GBM and RF are both decision tree algorithms. The GBM is a boosting method. It builds a sequence of weak learners, each fitted to the residuals of the models before it. In this way earlier models effectively "boost" later ones, and the combined predictions improve step by step [41]. In a random forest, a large number of decision trees are constructed and their individual predictions are averaged to obtain the output [42]. The neural network, on the other hand, passes the input variables through layers of interconnected nodes and combines them with learned weights to generate predictions [43]. This is loosely analogous to the way a neuron responds to stimuli. The k-nearest neighbor algorithm assumes that observations close to one another have similar values. It generates each prediction as a weighted average of the values of the k closest neighbors [44]. Each machine learner was run separately on the data.
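The following is a minimal numpy sketch that makes the boosting and nearest-neighbor descriptions concrete. It is not the H2O or caret implementations used in the study. It shows least-squares gradient boosting with one-split trees ("stumps") on a single feature, and an inverse-distance-weighted KNN. The function names, the stump depth, the inverse-distance weighting, and the defaults (`n_trees=100`, `learning_rate=0.1`, `k=5`) are illustrative assumptions.

```python
import numpy as np

def fit_stump(x, r):
    """Find the single split on feature x that best fits residuals r (squared error)."""
    order = np.argsort(x)
    xs, rs = x[order], r[order]
    best_sse, best = np.inf, (None, r.mean(), r.mean())
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # no valid threshold between tied values
        left, right = rs[:i], rs[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            best_sse = sse
            best = ((xs[i - 1] + xs[i]) / 2.0, left.mean(), right.mean())
    return best

def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    if threshold is None:
        return np.full(len(x), left_value, dtype=float)
    return np.where(x <= threshold, left_value, right_value)

def gbm_fit(x, y, n_trees=100, learning_rate=0.1):
    """Least-squares boosting: each new stump is fitted to the current residuals."""
    base = float(y.mean())
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_trees):
        stump = fit_stump(x, y - pred)  # residuals = negative gradient of squared error
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}

def gbm_predict(model, x):
    pred = np.full(len(x), model["base"])
    for stump in model["stumps"]:
        pred = pred + model["learning_rate"] * predict_stump(stump, x)
    return pred

def knn_predict(X_train, y_train, X_new, k=5, eps=1e-12):
    """Inverse-distance-weighted average of the k nearest training outcomes."""
    d = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    w = 1.0 / (np.take_along_axis(d, idx, axis=1) + eps)
    return (w * y_train[idx]).sum(axis=1) / w.sum(axis=1)

# Synthetic example: boosting recovers a step function from many small steps.
x = np.linspace(0.0, 1.0, 50)
y = np.where(x > 0.5, 10.0, 0.0)
model = gbm_fit(x, y)
```

With squared-error loss the residuals are the negative gradient, so fitting each stump to them is a gradient step. The learning rate shrinks every step, so that many small trees, rather than a few large ones, build up the fit.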
The individual machine learners were ensemble-averaged using a GAM. This GAM included a smoothed function of the predictions from each individual learner, plus a smoothed function of latitude and longitude. The predictions from this GAM are the ensemble-averaged predictions. Each learner enters through a smooth function rather than a fixed coefficient, so the weight given to each learner can vary with the pollution level. This helps if one learner performs better in a specific range of PM2.5.
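The ensemble step can be sketched as a stacked additive model. The sketch below is a hypothetical numpy stand-in for the GAM (the study used R). It uses truncated-power cubic splines, with a ridge penalty on their non-linear coefficients. As a simplification, it also uses separate smooths of latitude and longitude instead of a joint bivariate smooth. The class name `AdditiveStacker`, the knot count, the penalty value, and the synthetic data are all illustrative.

```python
import numpy as np

class AdditiveStacker:
    """Hypothetical GAM-like stacker: y ~ b0 + sum_j s_j(z_j), one penalized spline per column."""

    def __init__(self, n_knots=8, penalty=1.0):
        self.n_knots = n_knots
        self.penalty = penalty

    def _basis(self, Z):
        blocks = [np.ones((Z.shape[0], 1))]                       # intercept
        for j in range(Z.shape[1]):
            x = (Z[:, j] - self.mu[j]) / self.sd[j]              # standardize for stability
            blocks.append(x[:, None])                             # unpenalized linear part of s_j
            blocks.append(np.clip(x[:, None] - self.knots[j][None, :], 0.0, None) ** 3)
        return np.hstack(blocks)

    def fit(self, Z, y):
        Z, y = np.asarray(Z, float), np.asarray(y, float)
        self.mu, self.sd = Z.mean(axis=0), Z.std(axis=0) + 1e-12
        probs = np.linspace(0.0, 1.0, self.n_knots + 2)[1:-1]    # interior quantiles as knots
        self.knots = [np.quantile((Z[:, j] - self.mu[j]) / self.sd[j], probs)
                      for j in range(Z.shape[1])]
        B = self._basis(Z)
        # Ridge penalty only on the truncated-cubic ("wiggly") coefficients.
        per_column = np.r_[0.0, np.ones(self.n_knots)]
        diag = np.r_[0.0, np.tile(per_column, Z.shape[1])] * self.penalty
        self.coef = np.linalg.solve(B.T @ B + np.diag(diag), B.T @ y)
        return self

    def predict(self, Z):
        return self._basis(np.asarray(Z, float)) @ self.coef

def r2_score(y, p):
    return 1.0 - np.sum((y - p) ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic data: three noisy "learner" predictions of the same truth, plus coordinates.
rng = np.random.default_rng(0)
n = 500
truth = rng.gamma(4.0, 4.0, n)
preds = np.column_stack([truth + rng.normal(0.0, s, n) for s in (2.0, 3.0, 4.0)])
lat, lon = rng.uniform(51.3, 51.7, n), rng.uniform(-0.5, 0.3, n)
Z = np.column_stack([preds, lat, lon])
model = AdditiveStacker().fit(Z, truth)
stacked_r2 = r2_score(truth, model.predict(Z))
best_single_r2 = max(r2_score(truth, preds[:, k]) for k in range(preds.shape[1]))
```

Each learner enters through its own smooth, so the effective weight on a learner can change along its range. A joint tensor-product or thin-plate smooth of (lon, lat) would be closer to the spatial term described in the text.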

Input Variables
The covariates used in training the models were:
- population density (persons/km²)
- cloudiness (okta)
- barometric pressure (mbar/hPa)
- wind direction (° from north)
- wind speed (m/s)
- dew point temperature (°C)
- temperature (°C)
- aerosol optical depth (AOD)
- land use type
- distance to water (km)
- distance to Heathrow airport (m)
- inverse of the height of the planetary boundary layer (m⁻¹)
- normalized difference vegetation index (NDVI)
- traffic counts
- sine of day of the year
- cosine of day of the year
- day of week
- number of days from the time origin (1st January 2005)
- average daily PM2.5 across the Greater London area (µg/m³)
- year
- light at night
- elevation (m)
- distance to nearest major road (km)
- length of major road in the grid cell (km)
- number of bus stops in the grid cell
- distance to nearest bus stop (km)
- average building height (m)
- number of buildings in the grid cell

We selected these variables primarily based on expert knowledge and data availability. Meteorological variables are known to affect the dispersion and transport of fine particulate matter. Land use variables represent potential sources of PM2.5 and areas of higher concern. The time variables account for the seasonal variation in PM2.5 levels and the trend over several years. As previously mentioned, AOD is a key predictor of PM2.5, with higher AOD indicating higher PM2.5 levels [18][19][20][21][22]. To decide whether to include additional variables, we also checked whether they improved the cross-validated R² of the GBM.
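The calendar covariates in the list can be derived from the date alone. Below is a minimal sketch using the Python standard library. It assumes a 365.25-day period for the harmonic terms and ISO day-of-week numbering (1 = Monday). The helper names `time_features` and `inverse_pbl` are hypothetical.

```python
import math
from datetime import date

ORIGIN = date(2005, 1, 1)  # time origin stated in the text

def time_features(d: date) -> dict:
    """Calendar covariates for one day (hypothetical helper)."""
    doy = d.timetuple().tm_yday              # day of year: 1..365 (366 in leap years)
    angle = 2.0 * math.pi * doy / 365.25     # assumed annual period of 365.25 days
    return {
        "sin_doy": math.sin(angle),
        "cos_doy": math.cos(angle),
        "day_of_week": d.isoweekday(),       # ISO numbering: 1 = Monday ... 7 = Sunday
        "days_since_origin": (d - ORIGIN).days,
        "year": d.year,
    }

def inverse_pbl(pbl_height_m: float) -> float:
    """Inverse planetary boundary layer height in m^-1."""
    return 1.0 / pbl_height_m

features = time_features(date(2013, 12, 31))   # last day of the study period
```

The sine and cosine pair places 31 December and 1 January next to each other. A raw day-of-year number would instead put them at opposite ends of its range.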

Data Sources
The meteorological variables were obtained from the UK Meteorological Office. Traffic counts were obtained from the Department for Transport in the United Kingdom. The land use type was derived from the 2007 Land Cover Map of Great Britain. The average building height, number of buildings, distance from the grid cell centroid to the nearest major road, and length of major road in the grid cell were calculated using data provided by the Ordnance Survey. Elevation data were obtained from the CGIAR Consortium for Spatial Information, which used Shuttle Radar Topography Mission (SRTM) data from the United States Geological Survey (USGS) and NASA. Data on bus stops were obtained from Transport for London via the London Datastore website.
Remote Sens. 2020, 12, 914 4 of 18
We used AOD from the Moderate Resolution Imaging Spectroradiometer (MODIS) instrument on the Aqua and Terra satellites, as provided by the MAIAC algorithm at 1 km × 1 km resolution [45]. Some AOD values were missing because of cloud cover and snow reflectance. We imputed these missing values using a random forest with land use and meteorological predictors. The random forest had an internal cross-validated R² of 0.913 with a root mean square error (RMSE) of 0.037 for AOD-Terra, and an R² of 0.914 with an RMSE of 0.036 for AOD-Aqua. This indicates very good predictive ability for both variables. NDVI was also obtained from MODIS satellite measurements, which are available every 16 days, and daily values were imputed spatially and temporally.
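The temporal part of the NDVI imputation can be illustrated with linear interpolation between 16-day composites for a single grid cell. This method is an assumed simplification. The text states only that daily values were imputed spatially and temporally, and the spatial step is not shown here. `ndvi_daily` and the example values are hypothetical.

```python
import numpy as np

def ndvi_daily(composite_days, composite_values, n_days):
    """Linearly interpolate 16-day NDVI composites to daily values for one grid cell.

    composite_days: 1-based day index assigned to each composite.
    Missing composites (NaN) are dropped first; days before the first or after the
    last valid composite take the nearest valid value (np.interp clamps at the ends).
    """
    t = np.asarray(composite_days, dtype=float)
    v = np.asarray(composite_values, dtype=float)
    ok = ~np.isnan(v)
    days = np.arange(1, n_days + 1)
    return np.interp(days, t[ok], v[ok])

daily = ndvi_daily([1, 17, 33], [0.20, 0.36, np.nan], n_days=40)
```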
Mean annual light at night for 2015 was obtained from the Visible Infrared Imaging Radiometer Suite (VIIRS) instrument at a horizontal resolution of 750 m.
Population density was obtained on a 1 km × 1 km scale from Columbia University's Center for International Earth Science Information Network (CIESIN) [46].

PM 2.5 Data
Predicted PM2.5 was compared with measured PM2.5 to train the models and assess the accuracy of the various methods used. Measured [47] and the UK Automatic Urban and Rural Network (uk-air.defra.gov.uk) [48]. Some fixed sites did not measure PM2.5 but did measure PM10 and NOx. To predict PM2.5 at these sites, we used two methods: (1) a regression model and (2) a random forest approach. The predictions from the two methods were used as independent variables in a GAM to improve the fit. This provided a substantially larger database of estimated PM2.5 concentrations. These values were then treated as our measured PM2.5, which we used for training and cross-validation. All PM measurements were gravimetric equivalent [49]. Overall, measurements from 124 sites were used. The location of these monitors can be seen in Figure 1.
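The regression component used to estimate PM2.5 at sites that measured only PM10 and NOx might look like the ordinary least squares sketch below. It assumes a linear model with an intercept, fitted at co-located sites. The random forest component and the GAM that combines the two are not shown. The function names and the noise-free synthetic data are illustrative only.

```python
import numpy as np

def fit_linear(pm10, nox, pm25):
    """OLS fit of PM2.5 on PM10 and NOx at co-located sites; returns (b0, b_pm10, b_nox)."""
    X = np.column_stack([np.ones(len(pm10)), pm10, nox])
    coef, *_ = np.linalg.lstsq(X, np.asarray(pm25, dtype=float), rcond=None)
    return coef

def predict_linear(coef, pm10, nox):
    """Apply the fitted coefficients at sites that lack PM2.5 measurements."""
    return coef[0] + coef[1] * np.asarray(pm10, dtype=float) + coef[2] * np.asarray(nox, dtype=float)

rng = np.random.default_rng(1)
pm10 = rng.uniform(10.0, 40.0, 200)
nox = rng.uniform(20.0, 150.0, 200)
pm25 = 1.0 + 0.6 * pm10 + 0.05 * nox   # synthetic, noise-free relationship
coef = fit_linear(pm10, nox, pm25)
```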

Hyper-Parameter Tuning
Although machine learning methods are non-parametric and do not require distributional assumptions, they do require the specification of hyper-parameters, that is, parameters that control the learning process. To optimize the hyper-parameters for each algorithm, we used a grid search and compared the mean square error (MSE) and cross-validated R² values. For the gradient boosting machine, we tuned the number of trees, maximum tree depth, column sample rate, and learning rate. For the random forest, we also tuned the number of trees and maximum tree depth. For the neural network, we used two hidden layers and tuned four settings: the number of neurons, the number of epochs (passes of the data through the network), the adaptive learning rate, and two shrinkage parameters. For the k-nearest neighbor learner, we found the optimal value of k using cross-validation.
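A grid search over hyper-parameters with k-fold cross-validation can be sketched as below. Here it chooses k for a plain (unweighted) nearest-neighbor learner by cross-validated MSE. This is an illustrative sketch, not the H2O or caret tuning interface. `grid_search`, `knn`, the candidate values, and the synthetic data are assumptions. The folds here are random rows, whereas the model evaluation described later holds out whole monitors.

```python
import itertools
import numpy as np

def kfold_indices(n, n_folds=10, seed=0):
    """Randomly assign n rows to n_folds balanced folds."""
    rng = np.random.default_rng(seed)
    return rng.permutation(n) % n_folds

def grid_search(fit_predict, grid, X, y, n_folds=10, seed=0):
    """Return the hyper-parameter combination with the lowest cross-validated MSE."""
    folds = kfold_indices(len(y), n_folds, seed)
    names = list(grid)
    scores = {}
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        sq_err = np.empty(len(y))
        for f in range(n_folds):
            test = folds == f
            sq_err[test] = (fit_predict(X[~test], y[~test], X[test], **params) - y[test]) ** 2
        scores[values] = sq_err.mean()
    best = min(scores, key=scores.get)
    return dict(zip(names, best)), scores

def knn(X_train, y_train, X_new, k):
    """Unweighted k-nearest-neighbor mean (a simple stand-in learner)."""
    d = np.linalg.norm(X_new[:, None, :] - X_train[None, :, :], axis=2)
    idx = np.argsort(d, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)

rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, (300, 2))
y = np.sin(X[:, 0]) + 0.1 * X[:, 1] + rng.normal(0.0, 0.2, 300)
best, scores = grid_search(knn, {"k": [1, 3, 5, 10, 20]}, X, y)
```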

Predictions
After hyper-parameter tuning, we used each of the four methods to estimate average daily PM2.5 from 2005 to 2013 for the 3960 grid cells covering the Greater London area. We then entered the predictions from the four models, in different combinations, into a GAM with a spline term for longitude and latitude to allow the predictions to vary spatially. Any negative final predictions were set to zero. This affected fewer than 0.00014% of predictions.
We used ten-fold cross-validation to check the robustness of our model. We divided the monitoring stations into ten groups. Each model was trained on data from 90% of the monitors and used to predict at the held-out 10%. We repeated this process ten times, so that every measurement received a prediction from a model that had not been trained on its monitor. We then assessed the correlation of the predicted PM2.5 with the measured PM2.5. To assess the model's ability to capture spatial variation, we compared the annual average predicted PM2.5 with the measured annual average PM2.5 at monitoring sites:
$$\overline{\mathrm{PM}}^{\,\mathrm{meas}}_{ij} = \beta_0 + \beta_1\,\overline{\mathrm{PM}}^{\,\mathrm{pred}}_{ij} + \varepsilon_{ij}$$
where i is the monitoring site and j is the year. To assess temporal accuracy, we compared the deviations of daily measured and predicted PM2.5 from their respective annual averages:
$$\mathrm{PM}^{\mathrm{meas}}_{ijt} - \overline{\mathrm{PM}}^{\,\mathrm{meas}}_{ij} = \beta_0 + \beta_1\left(\mathrm{PM}^{\mathrm{pred}}_{ijt} - \overline{\mathrm{PM}}^{\,\mathrm{pred}}_{ij}\right) + \varepsilon_{ijt}$$
where t is the day. We chose our final prediction model based on the overall, spatial, and temporal adjusted R² values.
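The site-level fold assignment and the overall, spatial, and temporal R² described above can be sketched as follows. The sketch rests on three assumptions:
- site and year are integer codes;
- R² is the squared Pearson correlation, which equals the R² of a simple linear regression;
- the temporal component uses deviations of daily values from their site-year means.

Function names and the synthetic data are hypothetical.

```python
import numpy as np

def site_folds(site_ids, n_folds=10, seed=0):
    """Assign each monitoring site (not each row) to a fold, so held-out sites are unseen."""
    sites = np.unique(site_ids)
    rng = np.random.default_rng(seed)
    fold_of_site = {s: i % n_folds for i, s in enumerate(rng.permutation(sites))}
    return np.array([fold_of_site[s] for s in site_ids])

def r2(obs, pred):
    """Squared Pearson correlation, i.e. the R^2 of a simple linear regression."""
    return float(np.corrcoef(obs, pred)[0, 1] ** 2)

def spatial_temporal_r2(site, year, obs, pred):
    obs, pred = np.asarray(obs, float), np.asarray(pred, float)
    _, g = np.unique(np.column_stack([site, year]), axis=0, return_inverse=True)
    g = np.asarray(g).ravel()                      # site-year group index per row
    n = np.bincount(g)
    obs_bar = np.bincount(g, weights=obs) / n      # annual means per site-year
    pred_bar = np.bincount(g, weights=pred) / n
    return {
        "overall": r2(obs, pred),
        "spatial": r2(obs_bar, pred_bar),
        "temporal": r2(obs - obs_bar[g], pred - pred_bar[g]),
    }

rng = np.random.default_rng(0)
site = np.repeat(np.arange(10), 100)               # 10 hypothetical monitors
year = np.tile(np.repeat([2005, 2006], 50), 10)    # 50 days per site-year
obs = rng.gamma(4.0, 4.0, site.size)               # synthetic daily PM2.5
offset = rng.normal(0.0, 3.0, 20)[site * 2 + (year - 2005)]  # constant within site-year
res = spatial_temporal_r2(site, year, obs, obs + offset)
folds = site_folds(site)
```

Adding a constant offset to every prediction within a site-year leaves the temporal R² at 1 while lowering the overall R². This shows how the two components separate.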
To assess the linearity of the relationship between predicted and measured PM2.5, we regressed the measurements on the final predictions for both the spatial and temporal components. We used a penalized spline, which chooses the degree of non-linearity by restricted maximum likelihood. The spatial component was modeled using the following equation:
$$\overline{\mathrm{PM}}^{\,\mathrm{meas}}_{ij} = \beta_0 + s\left(\overline{\mathrm{PM}}^{\,\mathrm{pred}}_{ij}\right) + \varepsilon_{ij}$$
In this equation, i is the monitoring site, j is the year, and s is a smoothing function. We modeled the temporal component using the equation below:
$$\mathrm{PM}^{\mathrm{meas}}_{ijt} - \overline{\mathrm{PM}}^{\,\mathrm{meas}}_{ij} = \beta_0 + s\left(\mathrm{PM}^{\mathrm{pred}}_{ijt} - \overline{\mathrm{PM}}^{\,\mathrm{pred}}_{ij}\right) + \varepsilon_{ijt}$$
In this equation, i is the monitoring site, j is the year, t is the day, and s is a smoothing function. All data cleaning and processing were performed in R Statistical Software version 3.6.1. The machine learning algorithms and predictions, in particular, were run using the "H2O" and "caret" packages [50,51].

Results
The results of the ten-fold cross-validation can be seen in Table 1. The NN showed fairly weak results and was subsequently dropped. The RF performed better than the other machine learning methods, both overall and temporally, whereas the GBM had the strongest spatial predictive ability. Incorporating both therefore benefited the overall model across these attributes. The final ensemble model was:
$$\mathrm{PM}_{ij} = \beta_0 + s\left(\mathrm{RF}_{ij}\right) + s\left(\mathrm{GBM}_{ij}\right) + s\left(\mathrm{KNN}_{ij}\right) + s\left(\mathrm{lon}_i, \mathrm{lat}_i\right) + \varepsilon_{ij}$$
In this equation, i refers to the grid cell, j refers to the day, and s is a smoothing function.
The slope and intercept were obtained by regressing the predicted against the measured values in the ten held-out cross-validation samples, which together recreated the full measured dataset. This both tests for bias in the estimates and represents a form of regression calibration. The final ensemble model had virtually no additive bias (intercept of 0.058) and little multiplicative bias (slope of 0.979 vs. 1.0). A biased slope could induce measurement error bias in epidemiological studies. The model performed well across seasons, with cross-validated R² values of 0.851 for winter, 0.809 for spring, 0.743 for summer, and 0.837 for fall.
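The bias and seasonal checks reduce to a straight-line fit and a grouped R². The minimal sketch below assumes meteorological seasons (December to February as winter, and so on). Following the wording above literally, it takes the predicted values as the response. Swapping the arguments of `np.polyfit` gives the regression in the other direction. Names and data are illustrative.

```python
import numpy as np

# Assumption: meteorological seasons (DJF, MAM, JJA, SON).
SEASON_OF_MONTH = {12: "winter", 1: "winter", 2: "winter",
                   3: "spring", 4: "spring", 5: "spring",
                   6: "summer", 7: "summer", 8: "summer",
                   9: "fall", 10: "fall", 11: "fall"}

def calibration(measured, predicted):
    """Intercept and slope of predicted regressed on measured (ideal: 0 and 1)."""
    slope, intercept = np.polyfit(measured, predicted, 1)
    return float(intercept), float(slope)

def seasonal_r2(months, measured, predicted):
    """Squared correlation between measured and predicted within each season."""
    seasons = np.array([SEASON_OF_MONTH[int(m)] for m in months])
    out = {}
    for name in ("winter", "spring", "summer", "fall"):
        mask = seasons == name
        out[name] = float(np.corrcoef(measured[mask], predicted[mask])[0, 1] ** 2)
    return out

rng = np.random.default_rng(0)
months = rng.integers(1, 13, 1000)          # months 1..12
measured = rng.gamma(4.0, 4.0, 1000)
predicted = 0.5 + 0.9 * measured            # synthetic, perfectly linear
intercept, slope = calibration(measured, predicted)
```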
The spatial R² was not strong for any of the individual or ensemble-averaged models. We examined the spatial residuals from various years (Figure 2) to see whether there was a pattern that our learners had failed to capture. The residuals showed no discernible pattern for us to address. We further tested this by calculating Moran's I to check for spatial autocorrelation among the residuals. The results in Table 2 suggest that the dispersion of the residuals is random and not guided by a spatial trend. We also attempted to model the spatial variability separately from the temporal variability using the GBM. This gave very poor spatial model performance (R² = 0.09-0.12), indicating that our ensemble approach is indeed preferable.
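Global Moran's I compares each residual with its neighbors' residuals through a spatial weight matrix W: I = (n / ΣW) · (zᵀWz) / (zᵀz), where z are the mean-centered residuals. Under no spatial autocorrelation its expected value is −1/(n − 1). Values near that indicate randomness, while clearly positive values indicate clustering. The sketch below assumes inverse-distance weights, which the text does not specify. It also omits the permutation or normal-approximation test that a p-value would require.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I: (n / sum(W)) * (z' W z) / (z' z), z = mean-centered values."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    w = np.asarray(weights, dtype=float)
    return float(z.size / w.sum() * (z @ w @ z) / (z @ z))

def inverse_distance_weights(coords):
    """Assumed weighting: w_ij = 1 / d_ij for distinct locations, 0 on the diagonal."""
    c = np.asarray(coords, dtype=float)
    d = np.linalg.norm(c[:, None, :] - c[None, :, :], axis=2)
    w = np.zeros_like(d)
    off_diagonal = d > 0                     # also leaves coincident sites unweighted
    w[off_diagonal] = 1.0 / d[off_diagonal]
    return w

# Toy example: four sites in a row, each linked only to its immediate neighbors.
path = np.array([[0, 1, 0, 0],
                 [1, 0, 1, 0],
                 [0, 1, 0, 1],
                 [0, 0, 1, 0]], dtype=float)
clustered = morans_i([1.0, 2.0, 3.0, 4.0], path)      # smooth gradient: positive I
alternating = morans_i([1.0, -1.0, 1.0, -1.0], path)  # checkerboard: negative I
```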
We further modeled the population-weighted standard deviation of the annual predicted PM2.5 across the Greater London area. The aim was to see whether annual PM2.5 changed in the same way in different parts of London across the years. Figure 3 shows no obvious pattern across the years. However, annual PM2.5 concentrations were slightly more homogeneous at the end of the study period in 2013 than at the beginning in 2005. This suggests that, after weighting by population, annual PM2.5 levels became somewhat more uniform across London in later years.
The final model was used to generate daily predictions from 1st January 2005 to 31st December 2013 for 3960 1 km × 1 km grid cells covering the Greater London area. The mean daily measured PM2.5 at monitoring stations was 16.1 µg/m³, with a standard deviation of 9.2 µg/m³. Across all 3960 grid cells, the mean daily predicted PM2.5 level was 14.9 µg/m³, with a standard deviation of 9.0 µg/m³. This indicates that the monitors were, on average, located in more polluted areas. Annually, the measured PM2.5 was 16.1 µg/m³ with a standard deviation of 0.6 µg/m³. The corresponding predicted level (across all grid cells) was 14.9 µg/m³, also with a standard deviation of 0.6 µg/m³. The measured and predicted values at monitoring sites closely resembled one another (Table 3). Annual predictions for the Greater London area can be seen in Figure 4. The center of London consistently had the highest levels of fine particulate matter, while the southern and western regions appear to have the lowest. Over the years, a general decrease in PM2.5 levels can be seen. By 2013, the average concentration in London had fallen to 16.1 µg/m³. This compares with a peak of 17.1 µg/m³ in 2011 and an initial concentration of 16.9 µg/m³ in 2005. The decrease is most likely attributable to regulatory policies and technological improvements.
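The population-weighted standard deviation used for Figure 3 can be computed per year as below. This sketch assumes that each grid cell's annual mean PM2.5 is weighted by its population. The weighting formula and the helper names are also assumptions: it takes a weighted mean, then a weighted mean squared deviation, with no small-sample correction. The example values are synthetic.

```python
import numpy as np

def weighted_sd(values, weights):
    """Population-weighted standard deviation of grid-cell values."""
    v = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    mean = np.sum(w * v) / np.sum(w)
    return float(np.sqrt(np.sum(w * (v - mean) ** 2) / np.sum(w)))

def yearly_weighted_sd(annual_pm, population):
    """annual_pm: {year: annual-mean PM2.5 per grid cell}; population: people per grid cell."""
    return {year: weighted_sd(pm, population) for year, pm in annual_pm.items()}

population = np.array([100.0, 5000.0, 20000.0])          # three hypothetical grid cells
sds = yearly_weighted_sd({2005: np.array([15.0, 17.0, 18.0]),
                          2013: np.array([14.0, 15.0, 15.5])}, population)
```

A smaller value in a later year means that the population-weighted annual levels are more similar across grid cells.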
Six of the ten most important predictors were the same for the RF and the GBM; the KNN did not provide measures of variable importance. These six were average daily city-wide PM2.5, height of the planetary boundary layer, average wind speed, wind direction, distance to Heathrow airport, and light at night (Table 4). We also examined SHAP (SHapley Additive exPlanations) values. SHAP values measure the contribution of each predictor to the prediction for each observation. They can be aggregated to obtain a global measure of variable importance, as seen in Figure 5 [52]. Of the variables we considered, the average daily city-wide PM2.5 level was by far the most informative predictor. This would be expected given the correlation values in Figure 6: the average daily city-wide PM2.5 level had a correlation coefficient of 0.9 with measured PM2.5.
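SHAP values are Shapley values from cooperative game theory. Each predictor's contribution is its weighted average marginal effect over all coalitions of the other predictors. The exact enumeration below illustrates the definition for a small model. It assumes that "absent" predictors are replaced by a single baseline vector. The cost is exponential in the number of predictors, and this is not the tree-specific algorithm used by SHAP software. All names are hypothetical.

```python
import itertools
import math
import numpy as np

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; 'absent' features are set to baseline values."""
    x = np.asarray(x, dtype=float)
    baseline = np.asarray(baseline, dtype=float)
    n = x.size
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):                      # coalition sizes 0 .. n-1
            weight = math.factorial(size) * math.factorial(n - size - 1) / math.factorial(n)
            for S in itertools.combinations(others, size):
                z = baseline.copy()
                z[list(S)] = x[list(S)]            # features in S take their observed values
                without_i = f(z)
                z[i] = x[i]                        # add feature i to the coalition
                phi[i] += weight * (f(z) - without_i)
    return phi

def global_importance(f, X, baseline):
    """Mean absolute Shapley value per feature across observations."""
    return np.mean([np.abs(shapley_values(f, row, baseline)) for row in X], axis=0)

linear = lambda z: 2.0 * z[0] - 3.0 * z[1] + 0.5 * z[2]
phi = shapley_values(linear, [1.0, 1.0, 1.0], [0.0, 0.0, 0.0])
importance = global_importance(linear, np.ones((4, 3)), np.zeros(3))
```

For a linear model with a zero baseline, the values reduce to coefficient × value. In general they sum to f(x) − f(baseline). Averaging their absolute values over observations gives a global ranking of the kind described in the text.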
When modeling the spatial and temporal predicted vs. measured PM2.5 using a smoothing term, we found a virtually perfect linear relationship for both components (Figure 7).

Discussion
Our final model incorporated the GBM, RF, and KNN in a GAM, with a smoothing term for longitude and latitude. This model performed very strongly in terms of the cross-validated overall R² and the cross-validated temporal R². The spatial R² was not as robust. However, the regression of measured PM2.5 on predicted PM2.5 showed very little bias in the spatial model (i.e., the intercept was approximately zero and the slope approximately one). The distribution of the spatial residuals did not show any particular pattern (Figure 2), a conclusion supported by Moran's I values (Table 2). Modeling the spatial variability separately did not improve the spatial R² either. We hypothesize that the weaker spatial R² arises because the Greater London area has little spatial variation at this scale to begin with. In other words, most of the change in PM2.5 levels is driven by temporal factors. The standard deviations of the measured PM2.5 illustrate this. The standard deviation of the daily measured PM2.5 is 9.2 µg/m³, while that of the annual averages across the monitoring sites is 0.6 µg/m³. Most of the variation in the area can therefore be attributed to day-to-day variation in pollution levels rather than spatial differences. If the predictions were made on a finer scale, the spatial R² would likely be higher because of greater variability at smaller scales.
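As a rough back-of-the-envelope check on this argument (not a formal variance decomposition), square the ratio of the two reported standard deviations. This compares the between-site variance of annual means with the day-to-day variance:

```latex
\frac{\sigma^2_{\text{annual}}}{\sigma^2_{\text{daily}}}
  \approx \left(\frac{0.6\ \mu\text{g/m}^3}{9.2\ \mu\text{g/m}^3}\right)^2
  \approx 0.0043
```

The spatial component is thus well under 1% of the day-to-day variance. Even a modest absolute spatial error therefore translates into a low spatial R².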
To check for bias across the range of particulate matter levels, we modeled the spatial and temporal components of the measured PM2.5 against the corresponding predicted components using a smoothing term. This would have revealed any non-linearity between the predicted and measured values at any point in the range. As can be seen in Figure 7, there is a virtually perfect linear relationship between the predicted and measured PM2.5 levels at the monitoring sites for both the spatial and temporal components of our predictions. This suggests that our model performs well across the range of pollution levels.
Our model joins a number of other PM2.5 models that have been created for London, though it covers a more comprehensive time frame of nine years. A 2016 study by researchers at King's College London modeled hourly PM2.5 on a 20 m × 20 m scale in 2011. This model, called the Community Multiscale Air Quality (CMAQ)-urban model, was built by combining dispersion, chemistry, and meteorological models with a very fine emissions inventory. It had an R² of 0.59 [26]. A newer hybrid daily model incorporated CMAQ-urban, a LUR model, and machine learning methods for 2009-2013 at the Lower Layer Super Output Area (LSOA) level. It had a spatial R² of 0.22 and a temporal R² of 0.93 for background monitors, and a spatial R² of 0.29 and a temporal R² of 0.95 for roadside monitors [53]. A 2014 study by Singh et al. used a dispersion model to predict hourly PM2.5 for the year 2008 on an adjustable 10 m to 100 m scale. It had an overall R² of 0.64 when comparing the predicted and measured time series. Furthermore, this model had a spatial R² of 0.27, indicating limited spatial predictability [54]. A 2012 land use regression model was created for the European Study of Cohorts for Air Pollution Effects (ESCAPE). It used pollution data measured in three 14-day periods in different seasons and averaged to obtain an annual estimate. It found a spatial leave-one-out cross-validated R² of 0.77 in the London/Oxford area [55].
The ensemble model we created compares well with other PM2.5 prediction models that use machine learning methods. An ensemble model predicted daily PM2.5 in China from 2013 to 2016 using clustering and an ensemble of an extreme gradient boosting machine, RF, and GAM. It found an overall R² of 0.79 [56]. Also in China, a geographically weighted GBM approach for daily predictions across the country in 2014 had a cross-validated R² of 0.76 [57]. Another study in China modeled daily PM2.5 from 2005 to 2016 with a random forest approach and found a cross-validated R² of 0.83. This machine learning approach outperformed two non-linear distributed lag models [58]. A study in British Columbia predicted monthly PM2.5 from remote sensing data, in particular AOD, using eight different approaches. It found that the random forest was the strongest machine learner, compared with other algorithms such as extreme gradient boosting and Bayesian regularized neural networks. All machine learners in this study outperformed multiple linear regression [59].
Other studies did not find the same patterns as we did. A study modeling hourly PM2.5 in Beijing and Shanghai with multiple machine learning methods found that a variant of the artificial neural network had the highest R², at 0.92. This outperformed the random forest, at 0.88 [60]. Our study, in contrast, found the neural network to be one of the weakest learners and the random forest to be the strongest. A study predicting average daily PM2.5 in the contiguous United States on a 1 km² scale with an ensemble of machine learning methods also found the neural network to be the strongest learner [31]. These discrepancies might reflect differences in the input variables. They might also reflect the structure of the algorithms and how they process input parameters from different regions. The substantial difference in spatial scale (the contiguous US vs. London) may also favor different algorithms.
Our study had several limitations. First, we had a limited number of observations and of fixed PM2.5 monitoring stations. We expanded this number in two ways. We used a model to predict PM2.5 from PM10 measurements, and we corrected tapered element oscillating microbalance (TEOM) monitor measurements to gravimetric equivalents for consistency. We then used this dataset to calibrate our predictions. These monitors were located at a range of site types: curbside, roadside, and urban background. Because our spatial predictions were at 1 km² resolution, validation against monitors with strong local sources would be expected to show under-prediction. Second, as with any machine learning method, our predictions may be subject to overfitting. We addressed this using ten-fold cross-validation with held-out monitoring sites. Third, data availability limited the number of input variables, and including an overlooked variable might have improved the spatial and overall predictability. Finally, as with all analyses, our results rely on the quality of our predictors. This is particularly true of measurements made using remote sensing technology, such as AOD. Many PM2.5 prediction models rely on AOD as the main predictor of PM2.5 concentrations, yet satellite measurements of AOD also carry error [61,62]. However, given that we are predicting a continuous outcome, this measurement error should not cause bias, and it is accounted for in the residuals of the prediction model.
Despite these limitations, our prediction model also has several advantages. It combined three different machine learning algorithms to generate predictions, so one method may have captured some of the variation in daily PM2.5 that the other two missed. Moreover, the overall ten-fold cross-validated R² of 0.828 suggests a strong ability to predict fine particulate matter. The temporal R² suggests that we capture day-to-day trends very well. Furthermore, even though the spatial R² is not very strong, the model comparing predicted and measured PM2.5 had an intercept close to zero and a slope close to one. This suggests minimal bias in capturing the spatial variation. Finally, our model predicted particulate matter in 1 km² grid cells. This allows fine-scale exposure assignment in other studies that may use the model.

Conclusions
Our PM2.5 model, which incorporated several different machine learning algorithms, proved to be a robust and accurate estimator of pollution levels in the Greater London area. It had very strong performance metrics, both overall and temporally. The spatial R² was fairly weak; however, the model showed very little bias. Future models of Greater London may need to operate at a finer spatial scale or include additional predictors in order to capture greater variability. These exposure models can be used to study the effect of PM2.5 on health outcomes in epidemiological studies, to identify locations with peak concentrations, and to conduct risk assessments.