Predicting Fine Particulate Matter (PM2.5) in the Greater London Area: An Ensemble Approach using Machine Learning Methods

1 Department of Environmental Health, Harvard TH Chan School of Public Health, Boston, MA 02115, USA
2 Foundation Medicine, Cambridge, MA 02141, USA
3 Department of Hygiene, Epidemiology and Medical Statistics, Medical School, National and Kapodistrian University of Athens, 115 27 Athens, Greece
4 MRC Centre for Environment and Health, King’s College London, London SE1 9NH, UK
5 Department of Global Environmental Health, Imperial College, London SW7 2AZ, UK
6 Environmental Epidemiology Group, Section of Environmental Health, Department of Public Health, Faculty of Health and Medical Sciences, University of Copenhagen, 2200 Copenhagen, Denmark
7 Climate and Radiation Laboratory, Goddard Space Flight Center, NASA, Greenbelt, MD 20771, USA
8 Department of Epidemiology, Harvard TH Chan School of Public Health, Boston, MA 02115, USA
* Author to whom correspondence should be addressed.
Remote Sens. 2020, 12(6), 914; https://doi.org/10.3390/rs12060914
Received: 15 January 2020 / Revised: 9 March 2020 / Accepted: 10 March 2020 / Published: 12 March 2020
Estimating air pollution exposure has long been a challenge for environmental health researchers. Technological advances and novel machine learning methods have increased the geographic range and accuracy of exposure models, making them valuable tools for conducting health studies and identifying pollution hotspots. Here, we developed a prediction model for daily PM2.5 levels in the Greater London area from 1 January 2005 to 31 December 2013 using an ensemble machine learning approach that incorporates satellite aerosol optical depth (AOD), land use, and meteorological data. Predictions were made at a 1 km × 1 km resolution over 3960 grid cells. The ensemble combined predictions from three machine learners: a random forest (RF), a gradient boosting machine (GBM), and a k-nearest neighbor (KNN) approach. The ensemble model performed very well, with a ten-fold cross-validated R² of 0.828. Of the three machine learners, the random forest outperformed the GBM and KNN. The model was particularly adept at predicting day-to-day changes in PM2.5 levels, with an out-of-sample temporal R² of 0.882. However, its ability to predict spatial variability was weaker, with an R² of 0.396. We believe this is due to the relatively small spatial variation in pollutant levels in this area.
Keywords: air pollution; particulate matter; machine learning; exposure modeling
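
The abstract reports three separate R² values (overall, temporal, and spatial) without stating how they are defined. The sketch below illustrates one common convention, which may differ from the authors' procedure: the spatial R² compares each monitor's long-term mean of observed and predicted values, and the temporal R² compares the daily deviations from those monitor means. R² is taken here as the squared Pearson correlation, which is also an assumption. The function names, the record layout, and the toy values are hypothetical and are not data from the study.

```python
from collections import defaultdict
from statistics import mean


def r_squared(observed, predicted):
    """Squared Pearson correlation between two equal-length sequences."""
    mo, mp = mean(observed), mean(predicted)
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def decompose_r2(records):
    """Split held-out performance into overall, spatial, and temporal R².

    records: list of (site_id, observed, predicted) triples, where each
    prediction is assumed to come from a model that did not see that row.
    """
    by_site = defaultdict(list)
    for site, obs, pred in records:
        by_site[site].append((obs, pred))

    site_obs_mean, site_pred_mean = [], []  # one value per monitor
    obs_anom, pred_anom = [], []            # daily deviations from monitor mean
    for pairs in by_site.values():
        mo = mean(o for o, _ in pairs)
        mp = mean(p for _, p in pairs)
        site_obs_mean.append(mo)
        site_pred_mean.append(mp)
        obs_anom.extend(o - mo for o, _ in pairs)
        pred_anom.extend(p - mp for _, p in pairs)

    return {
        "overall": r_squared([o for _, o, _ in records],
                             [p for _, _, p in records]),
        "spatial": r_squared(site_obs_mean, site_pred_mean),
        "temporal": r_squared(obs_anom, pred_anom),
    }


# Toy, made-up values for three hypothetical monitors (not study data).
records = [
    ("A", 10.0, 11.0), ("A", 14.0, 13.0), ("A", 12.0, 12.5),
    ("B", 15.0, 14.0), ("B", 19.0, 18.5), ("B", 17.0, 16.0),
    ("C", 12.0, 13.5), ("C", 16.0, 16.5), ("C", 14.0, 14.5),
]
print(decompose_r2(records))
```

Under this convention, a model can track day-to-day changes closely (high temporal R²) while still ranking monitors poorly when the monitor means span a narrow range, which is consistent with the explanation offered in the abstract.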
MDPI and ACS Style

Danesh Yazdi, M.; Kuang, Z.; Dimakopoulou, K.; Barratt, B.; Suel, E.; Amini, H.; Lyapustin, A.; Katsouyanni, K.; Schwartz, J. Predicting Fine Particulate Matter (PM2.5) in the Greater London Area: An Ensemble Approach using Machine Learning Methods. Remote Sens. 2020, 12, 914.