Article

An Estimation of Daily PM2.5 Concentration in Thailand Using Satellite Data at 1-Kilometer Resolution

1 School of Information, Computer and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani 12120, Thailand
2 School of Knowledge Science, Japan Advanced Institute of Science and Technology, Nomi 923-1211, Japan
3 National Electronics and Computer Technology Center, National Science and Technology Development Agency, Pathum Thani 12120, Thailand
* Authors to whom correspondence should be addressed.
Sustainability 2023, 15(13), 10024; https://doi.org/10.3390/su151310024
Submission received: 23 May 2023 / Revised: 19 June 2023 / Accepted: 21 June 2023 / Published: 25 June 2023
(This article belongs to the Special Issue Application of Remote Sensing for Sustainable Development)

Abstract

This study addresses the limited coverage of regulatory monitoring for particulate matter 2.5 microns or less in diameter (PM2.5) in Thailand due to the lack of ground station data by developing a model to estimate daily PM2.5 concentrations in small regions of Thailand using satellite data at a 1-km resolution. The study employs multiple linear regression and three machine learning models and finds that the random forest model performs the best for PM2.5 estimation over the period of 2011–2020. The model incorporates several factors such as Aerosol Optical Depth (AOD), Land Surface Temperature (LST), Normalized Difference Vegetation Index (NDVI), Elevation (EV), Week of the year (WOY), and year and applies them to the entire region of Thailand without relying on monitoring station data. Model performance is evaluated using the coefficient of determination (R2) and root mean square error (RMSE), and the results indicate high accuracy for training (R2: 0.95, RMSE: 5.58 μg/m3), validation (R2: 0.78, RMSE: 11.18 μg/m3), and testing (R2: 0.71, RMSE: 8.79 μg/m3) data. These PM2.5 data can be used to analyze the short- and long-term effects of PM2.5 on population health and inform government policy decisions and effective mitigation strategies.

1. Introduction

According to the World Health Organization (WHO), ambient air pollution causes approximately 6.7 million premature deaths globally, with particulate matter, ozone, nitrogen dioxide, and sulfur dioxide among the leading pollutants [1]. The most dangerous among them is particulate matter with an aerodynamic diameter of less than 2.5 µm (PM2.5). These particles can easily enter the lungs and become trapped in the lung parenchyma, leading to inflammation and oxidative stress [2]. This can cause severe cardiovascular and respiratory diseases and even lung cancer. PM2.5 therefore plays a critical role in air pollution and environmental health, and its impact on human health is of great concern.
PM2.5 has been associated with increased mortality and morbidity in several studies [3,4,5]. However, the coverage of ground-level PM2.5 monitoring sites is limited, which makes it challenging to capture the spatial variability of PM2.5 for exposure and epidemiological research. To address this challenge, researchers have increasingly used satellite-derived aerosol optical depth (AOD) as a proxy for ground-level PM2.5 [6,7,8,9,10]. AOD measures the aerosol in the atmosphere and can therefore serve as such a proxy [11]. Additionally, other variables, including meteorological factors, land use and cover, and time variables, are often included to improve modeling accuracy [12]. These variables can explain seasonal variations and long-term trends in PM2.5 levels and indicate potential PM2.5 sources and areas of concern [13]. However, the importance of these factors varies among studies, and some analyses have found that satellite-derived AOD does not improve model performance [14]. In contrast, a study in the Pearl River Delta (PRD) region on the southern coast of China demonstrated the usefulness of AOD-derived spatiotemporal concentrations in health calculations [15]. Therefore, the association between satellite data and PM2.5 must be examined for each location.
Previous studies on the estimation of PM2.5 using satellite data have employed a variety of models, but most have chosen only one [16]. Six studies compared model performance comprehensively: the Random Forest (RF) model showed the highest coefficient of determination (R2) in four studies, and the eXtreme Gradient Boosting (XGBoost) model showed the highest R2 in two [13,14,16,17,18,19]. However, it should be noted that the RF model performed similarly to the XGBoost model. Among the models compared, Multiple Linear Regression (MLR) had the lowest accuracy. Despite this, MLR is still widely used for its simplicity and practicality. Estimating PM2.5 concentrations is challenging because numerous variables can affect them. Machine Learning (ML) has become popular for solving such complex problems because it can find and use multiple independent factors that affect the predicted variable [20].
Earlier research on estimating PM2.5 levels in Thailand using satellite data has been limited due to a scarcity of data from both ground stations and satellites. Two previous studies, conducted in Chiangmai and in central Thailand, estimated PM2.5 using MLR models with AOD at a 10-kilometer (km) resolution, resulting in R2 values of 0.77 and 0.49 when monitoring station meteorological parameters were included and 0.22 and 0.11 when they were not [21,22]. However, these meteorological parameters are not available for small areas such as 1 km, 3 km, and 10 km, which limits the accuracy of PM2.5 estimation. A review article on predicting ground PM2.5 concentration using satellite AOD found that MLR had the lowest R2 compared to other models [16]. The low R2 values suggest further examination of covariates such as meteorological factors, land use and cover, and season variables in MLR models [23].
In this study, we aim to develop a method for estimating PM2.5 concentrations throughout Thailand using satellite data with a 1 km pixel resolution. Our approach seeks to overcome the limited coverage of ground-level PM2.5 monitoring by not relying on monitoring station variables. Instead, we begin with AOD as a base factor and then add other variables to improve the accuracy of the PM2.5 estimates. Specifically, we have selected Land Surface Temperature (LST), Normalized Difference Vegetation Index (NDVI), and Elevation (EV) data to represent land use and cover, as well as year and week of the year (WOY) as time factors. All factor variables are applied at a 1 km pixel resolution throughout Thailand without the need for monitoring station data, which can be costly to collect and does not cover all areas of the country. We will use MLR as the standard regression model and compare its performance with that of ML models, namely RF, XGBoost, and Support Vector Machines (SVM). The model with the highest accuracy will be selected to estimate PM2.5 levels in Thailand.
Our study will serve as a reference for future satellite-based PM2.5 estimation studies and will aid in exposure assessment in health studies of the Thai population. Using satellite data to estimate PM2.5 concentrations at a high spatial resolution, our study can provide a more comprehensive understanding of the distribution of PM2.5 in Thailand, which can help inform policy and public health efforts to reduce exposure to harmful air pollutants.

2. Materials and Methods

2.1. PM2.5 Data and Area of Study

Thailand is a Southeast Asian country that borders the Andaman Sea and the Gulf of Thailand, with an approximate population of 70 million people and an area of 513,120 square kilometers. The Pollution Control Department (PCD) is a legally recognized government agency in Thailand that collects data on air pollution parameters from meteorological stations throughout the country. Bangkok’s Air Quality and Noise Management Division (BAQ) also operates ground stations for monitoring PM2.5 in Bangkok. The PCD and BAQ measure PM2.5 using the same standard, the beta-ray attenuation method, which follows the United States Environmental Protection Agency (USEPA) reference method. Figure 1 presents the PM2.5 data and the number of PCD and BAQ stations from 2011 to 2020.

2.2. Satellite Data

This study employed remote sensing data obtained from MODIS satellite products, specifically AOD, LST, NDVI, and EV, all of which were retrieved from the National Aeronautics and Space Administration (NASA) Earth Observing System Data and Information System (EOSDIS) through the Distributed Active Archive Center (DAAC). AOD data were taken from the MCD19A2 product of both the Terra and Aqua satellites, which includes “Aerosol Optical Depth at 045 Microns” [24]. The daily AOD data had a spatial resolution of 1 km per pixel and were collected at 10:30 a.m. and 1:30 p.m. local standard time. LST data were collected from Terra’s MOD11A1 product and Aqua’s MYD11A1 product [25], and the two sets of measurements were combined to increase the sample size. Daily average LST values were calculated as the arithmetic mean of the two satellite measurements, or from a single satellite’s measurement when only one was available. The study used NDVI data from MOD13A1, with a temporal resolution of 16 days and a spatial resolution of 500 m; such data are useful for monitoring vegetation conditions, depicting land cover changes, and providing insights for modeling global biogeochemical and hydrologic processes and regional climates [26]. Additionally, EV data from “Land Digital Elevation Model (MODDEM1KM)—Land/sea mask and digital elevation model” with a spatial resolution of 1 km were used.
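The daily LST rule described above (the mean of the two retrievals, or the single retrieval that is available) can be sketched as follows. This is a minimal Python illustration rather than the authors' R code; the function name `daily_lst` and the use of `None` to mark a missing retrieval are assumptions made for the example.

```python
from statistics import mean
from typing import Optional


def daily_lst(terra: Optional[float], aqua: Optional[float]) -> Optional[float]:
    """Combine the Terra and Aqua LST retrievals for one pixel and one day.

    None marks a missing retrieval (for example, because of cloud cover).
    """
    available = [value for value in (terra, aqua) if value is not None]
    if not available:
        return None  # neither satellite produced a valid value
    # Mean of both retrievals, or the single valid retrieval on its own.
    return mean(available)
```

Returning `None` when both retrievals are missing keeps the gap explicit, so a later step can decide whether to drop or impute that day.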

2.3. Data Analysis

For this study, we found that satellite data and PM2.5 readings were consistent when the sky was clear. At 1 km resolution, AOD and LST showed more than 50% missing values, whereas averaging over a 5 km radius reduced the proportion of missing values to less than 50%. Therefore, to match the daily PM2.5 concentrations for each station from 2011 to 2020, we used the average satellite data within a 5 km radius of the station. Using the daily average PM2.5 data, we modeled the relationship between PM2.5 and the factors AOD, LST, NDVI, EV, WOY, and year. Four models were developed to predict daily PM2.5: MLR, RF, XGBoost, and SVM. We evaluated each model’s accuracy using R2 and the root mean square error (RMSE). A higher R2 and a lower RMSE indicate better estimation performance.
When extending this model estimation to other geographical areas, including regions, provinces, districts, and sub-districts, we can utilize the average satellite data within the boundaries of each specific area. Furthermore, data imputation techniques, such as nearest date and pixel, can be employed. The data handling and analysis procedures were implemented using the R programming language.
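The 5 km averaging step can be illustrated with a short Python sketch (the study itself was implemented in R). It assumes, hypothetically, that each pixel is given as a `(latitude, longitude, value)` tuple with `None` for a missing retrieval, and it uses the haversine great-circle distance with a mean Earth radius of 6371 km; the source does not state how distances were computed.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_within_radius(station, pixels, radius_km=5.0):
    """Average the valid pixel values that lie within radius_km of a station.

    station: (lat, lon); pixels: iterable of (lat, lon, value or None).
    Returns None when no valid pixel falls inside the radius.
    """
    values = [
        value
        for lat, lon, value in pixels
        if value is not None
        and haversine_km(station[0], station[1], lat, lon) <= radius_km
    ]
    return sum(values) / len(values) if values else None
```

Because missing pixels are skipped rather than counted, a station receives a value whenever at least one pixel in its neighborhood is valid, which is why the averaged series has fewer gaps than a single 1 km pixel.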

2.3.1. Multiple Linear Regression (MLR)

The MLR statistical model is a commonly used method for identifying the relationship between a continuous response variable and one or more predictor variables, which can be continuous or categorical. MLR is a parametric model that assumes normally distributed errors, constant variance, and a linear relationship between the response and predictor variables. Because the PM2.5 data have a skewed distribution, this study uses a log-linear regression model, which can be represented as:
log(PM2.5) = β0 + β1AOD + β2LST + β3NDVI + β4EV + β5WOY + β6Year
where β0 is the intercept and β1–β6 are the regression coefficients of the six predictors.
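Because the response is log(PM2.5), a fitted model of this form predicts on the log scale, and the prediction must be exponentiated to return to μg/m3. The Python sketch below illustrates that back-transformation only; the function name and the dictionary layout are hypothetical, and no coefficient values from the study are implied.

```python
import math

PREDICTORS = ("AOD", "LST", "NDVI", "EV", "WOY", "Year")


def predict_pm25(intercept, coefficients, row):
    """Evaluate the log-linear model and back-transform to the PM2.5 scale.

    coefficients and row are dicts keyed by the predictor names above.
    """
    log_pm25 = intercept + sum(coefficients[name] * row[name] for name in PREDICTORS)
    return math.exp(log_pm25)  # undo the log transform
```

One consequence of this form is that each coefficient acts multiplicatively: raising AOD by one unit, with the other predictors fixed, multiplies the predicted PM2.5 by exp(β1).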

2.3.2. Random Forest (RF)

RF is a method for creating an ensemble of decision trees. The RF algorithm builds each tree using a bootstrap sample of the data, and each tree node is split based on the best of a subset of randomly selected predictors [27]. The predictions of the individual trees are then combined to produce an ensemble prediction of the target variable. The model also calculates the “importance” of each predictor by measuring how much the prediction error increases when the data for that variable are permuted while the data for the other variables remain unchanged [28]. This study uses the R package “randomForest” [29].
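The permutation idea behind this importance measure can be sketched in a few lines of Python. This is a simplified illustration, not the “randomForest” implementation: it works on any prediction function and a single data set, whereas the R package evaluates the permutation on the out-of-bag samples of each tree. The names `permutation_importance`, `predict`, and `seed` are assumptions of the example.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between observed and predicted values."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, rows, y, column, seed=0):
    """Increase in MSE after shuffling one predictor column.

    predict: callable that maps a row (dict of predictors) to a prediction.
    rows: list of dicts; y: observed values; column: predictor to permute.
    """
    baseline = mse(y, [predict(row) for row in rows])
    shuffled = [row[column] for row in rows]
    random.Random(seed).shuffle(shuffled)  # break the link between column and y
    permuted_rows = [{**row, column: value} for row, value in zip(rows, shuffled)]
    permuted = mse(y, [predict(row) for row in permuted_rows])
    return permuted - baseline
```

A predictor the model never uses leaves the predictions unchanged when shuffled, so its importance is zero; a predictor the model depends on produces a positive increase in error.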

2.3.3. eXtreme Gradient Boosting (XGBoost)

XGBoost is a tree-based ensemble ML algorithm that implements gradient boosting with improved performance and speed [30]. Gradient boosting is a method in which the loss function is minimized by sequentially adding weak learners through gradient descent optimization. The gradient boosting approach has three key components: a loss function, a weak learner, and an additive model. The loss function measures how well the model predicts the data. A weak learner may not predict accurately on its own, but it is still better than guessing randomly. The additive model adds decision trees iteratively, one at a time. This study uses the R package “xgboost” [31].
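The three components named above (loss function, weak learner, additive model) can be seen in a deliberately small Python sketch. It is plain gradient boosting with squared-error loss and one-split trees (stumps) on a single predictor, not XGBoost itself, which adds regularization and second-order information; the function names, the number of rounds, and the learning rate are assumptions of the example.

```python
def fit_stump(x, residuals):
    """Weak learner: the single threshold on x that minimizes squared error."""
    best = None
    for threshold in sorted(set(x))[:-1]:  # skip the maximum so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        raise ValueError("x needs at least two distinct values")
    return best[1:]


def boost(x, y, n_rounds=50, learning_rate=0.1):
    """Additive model: start from the mean and add one stump per round."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared-error loss the negative gradient is the residual.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        threshold, left_mean, right_mean = fit_stump(x, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if xi <= threshold else right_mean)
            for xi, p in zip(x, predictions)
        ]
    return base, learning_rate, stumps


def predict(model, xi):
    """Sum the shrunken contributions of all stumps for one input value."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(
        left_mean if xi <= threshold else right_mean
        for threshold, left_mean, right_mean in stumps
    )
```

Each round fits the stump to what the current model still gets wrong, and the learning rate shrinks every step so that no single tree dominates the ensemble.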

2.3.4. Support Vector Machines (SVM)

SVM is a supervised learning model that can be used for regression problems in ML [32]. SVM builds a set of hyperplanes in a high-dimensional space using a nonlinear transformation based on the following function [33]:
f(x) = wx + b
where x is the vector of input predictors (6 variables), w is the weight vector of x, and b is the bias term, which defines the hyperplane’s offset from the origin. SVM is fitted by reducing the gap between the expected and actual output values, which reduces prediction errors. This study uses the R package “e1071” [34].
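As a rough Python illustration of the function above, the sketch below evaluates f(x) = wx + b for a linear model and scores a prediction with the epsilon-insensitive loss commonly used in support vector regression. The source does not state which SVM variant or kernel was used, so the loss, the default `epsilon=0.1`, and the function names are assumptions; a kernel SVM applies the same form after a nonlinear transformation of x.

```python
def linear_svr_predict(w, b, x):
    """Evaluate f(x) = w.x + b for weight vector w, bias b, and predictors x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """Errors smaller than epsilon cost nothing; larger errors cost the excess."""
    return max(0.0, abs(y_true - y_pred) - epsilon)
```

Under this loss, fitting amounts to finding w and b that keep as many predictions as possible within epsilon of the observations while keeping w small.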

2.3.5. Model Assessment

The rows of the PCD dataset were randomly shuffled and divided into a training dataset (80%) and a validation dataset (20%). A consistent random state was used so that model performance could be compared across models on the same split. Table 1 presents the structure of the PCD and BAQ data. The distributions of the training and validation datasets were similar; however, the testing dataset was different, as it only included BAQ data collected in Bangkok.
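A shuffled 80/20 split with a fixed random state can be sketched in Python as follows (the study used R). The seed value 42 and the function name are hypothetical; the point is only that reusing the same seed gives every model the same training and validation rows.

```python
import random


def train_validation_split(rows, train_fraction=0.8, seed=42):
    """Shuffle rows with a fixed seed and split them into two lists."""
    shuffled = list(rows)  # copy so the caller's data is left untouched
    random.Random(seed).shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]
```

Calling the function twice with the same seed returns identical partitions, so differences in R2 or RMSE between models cannot be caused by a different draw of the data.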
After training the model, the model’s performance was evaluated by indicators such as R2 and RMSE, shown in the following formulas:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}}$$
where $y_i$ is the observed PM2.5, $\hat{y}_i$ is the predicted value, $\bar{y}$ is the mean of the observed PM2.5, and $n$ is the total sample count.
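The two indicators defined above can be computed directly from paired observations and predictions. The following Python sketch mirrors the definitions term by term; the function names are chosen for the example, and the study's own computation was done in R.

```python
import math


def r_squared(y, y_hat):
    """1 minus the ratio of the residual sum of squares to the total sum of squares."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def rmse(y, y_hat):
    """Square root of the mean squared difference, in the units of y."""
    return math.sqrt(sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat)) / len(y))
```

R2 is unitless and relative to the spread of the observations, while RMSE is expressed in μg/m3, so the two indicators can rank datasets differently.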

3. Results

3.1. Data Descriptive Statistics

Figure 2 presents a scatterplot matrix of the variables; the histogram in the first row and column shows that the PM2.5 distribution is positively skewed. The matrix includes the correlation coefficient (R) values, with the top row showing the relationship between each predictor variable and PM2.5 and the first column displaying the corresponding R values. Positive correlations between PM2.5 and AOD, LST, and EV (R = 0.51, 0.20, and 0.13, respectively) indicate that PM2.5 increases with these variables, while negative correlations with WOY (R = −0.27), NDVI (R = −0.19), and year (R = −0.05) suggest that PM2.5 decreases as these variables increase. AOD has the strongest positive association. Lower PM2.5 levels are observed during WOY 20–40, Thailand’s rainy season, which is reflected in the negative correlation with WOY. Dry seasons with increased LST show higher PM2.5 levels, while higher NDVI is associated with lower PM2.5. Finally, EV and year have the weakest correlations with PM2.5.

3.2. Modeling Results

Table 2 presents the estimated performance of each model for the three datasets. The results indicate that the RF model, which includes AOD, LST, NDVI, EV, WOY, and year, is the most effective in predicting PM2.5 across all datasets. The R2 values for the training, validation, and testing datasets were 0.95, 0.78, and 0.71, respectively, with RMSE values of 5.58 μg/m3, 11.18 μg/m3, and 8.79 μg/m3, respectively. In terms of model performance, XGBoost and SVM were similar. However, the MLR model had the worst performance.
Although the final RF model has a higher R2 in the validation dataset than in the testing dataset, the testing dataset has a lower RMSE than the validation dataset. In other words, the RF model explains a larger share of the variation in PM2.5 in the validation dataset, whereas the absolute difference between the actual and estimated PM2.5 is smaller in the testing dataset. This discrepancy could be attributed to the fact that the testing dataset covers only Bangkok and thus has more data from this area, whereas the validation dataset covers all areas of Thailand.
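A lower R2 together with a lower RMSE is arithmetically possible because R2 is measured relative to the spread of the observations. The Python sketch below shows this with two small made-up series; the numbers are purely illustrative, are not taken from the study, and are not offered as the explanation for the study's result.

```python
import math


def scores(y, y_hat):
    """Return (R2, RMSE) for paired observations and predictions."""
    y_bar = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, y_hat))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return 1 - ss_res / ss_tot, math.sqrt(ss_res / len(y))


# Hypothetical series with a wide spread of observations and errors of 10.
wide_r2, wide_rmse = scores([10, 40, 70, 100], [20, 30, 80, 90])

# Hypothetical series with a narrow spread of observations and errors of 4.
narrow_r2, narrow_rmse = scores([20, 25, 30, 35], [24, 21, 34, 31])
```

The narrow series has the smaller RMSE, yet its R2 is also smaller, because the same errors are being compared with a much smaller total variance.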
The RF model that included AOD, LST, NDVI, EV, WOY, and year had the best performance in estimating daily PM2.5 concentrations in Thailand. Figure 3 shows two alternative measurements of each predictor variable’s relative contribution in the RF results. The %IncMSE is the percentage increase in mean square error and is equivalent to accuracy-based importance. The IncNodePurity, calculated similarly to Gini-based importance, is based on the reduction in the sum of squared errors whenever a variable is used for a split. For WOY, AOD, EV, year, LST, and NDVI, the %IncMSE was 72.4%, 59.3%, 50.7%, 43.2%, 32.4%, and 31.5%, respectively. Ranked by IncNodePurity, the most important variables were WOY, AOD, EV, NDVI, LST, and year, in that order. The two measurements are calculated using different methods, but both reflect the strength of each variable’s association with ground-level PM2.5. All the factors were needed to estimate PM2.5 levels in Thailand, and WOY, AOD, and EV were the three most important variables in both measurements.

3.3. Estimation of Daily PM2.5

Figure 4 presents time series plots of the observed and estimated PM2.5 for the training, validation, and testing data. The three plots exhibit a consistent pattern in the observed and estimated PM2.5 concentrations, with the highest concentrations observed during weeks 45 to 53 (November to December) and 1 to 10 (January to March). In the testing dataset, the measured and estimated PM2.5 concentrations differed slightly in 2015 and 2016 but were consistent in 2017 and 2020.
Figure 5 presents the estimated PM2.5 concentrations from 2011 to 2020 at a 1 km resolution using the RF model. The PM2.5 values measured at stations and the estimated PM2.5 values are comparable. The average percentage of correct PM2.5 estimation ranges from 68.9% to 75.2%, with higher accuracy when PM2.5 is less than 15 μg/m3 or higher than 50 μg/m3. Northern Thailand exhibited the highest PM2.5 concentrations, while Southern Thailand showed the lowest levels. Except in the southern part of Thailand, PM2.5 levels in most regions exceeded the WHO 24-h standard of 15 μg/m3 but remained below Thailand’s national standard limit of 50 μg/m3 overall.

4. Discussion

We proposed using satellite data with a 1 km resolution to predict daily PM2.5 concentrations in Thailand and identified the best model to achieve this. The results of this model estimation can be utilized as standards for simulating PM2.5 in other areas with a similar mix of pollution sources and a need for more monitoring to understand the particle’s spatiotemporal distribution. Investigating the spatiotemporal variations of PM2.5 at small scales was made possible by estimating PM2.5 in 1 km grid cells. These PM2.5 values are intended to aid epidemiological research and assist individuals in making informed decisions about air pollution.
In our trials, RF outperformed the MLR, XGBoost, and SVM models, with an R2 of 0.95 (RMSE of 5.58 μg/m3) for training data, 0.78 (RMSE of 11.18 μg/m3) for validation data, and 0.71 (RMSE of 8.79 μg/m3) for testing data. These findings align with previous PM2.5 estimation studies from other countries. For example, one study predicted PM2.5 in Greater London using RF, Gradient Boosting Machine (GBM), and K-Nearest Neighbor (KNN), with RF providing the best estimation with an R2 of 0.83 and RMSE of 4.28 μg/m3 [35]. In another study, eight approaches using remote sensing data and AOD were applied to estimate monthly PM2.5 in British Columbia, and RF was found to be the most reliable ML method, with an R2 of 0.49 (RMSE of 2.67 μg/m3) [18]. A study in Italy predicted daily PM2.5 on a 1 km grid for 2013–2015 using RF, with an R2 of 0.80 (RMSE = 7.05 μg/m3) [36]. A study in China computed 1 km-resolution PM2.5 concentrations using RF, with an R2 of 0.98 (RMSE = 6.40 μg/m3) for model fitting and an R2 of 0.81 (RMSE = 17.91 μg/m3) for model validation [20]. Another Chinese study used RF to predict daily PM2.5 from 2005 to 2016, with an R2 of 0.77 (RMSE of 22 μg/m3) [17]. These studies indicate that estimating PM2.5 from satellite data using the RF model, with an R2 of 0.49–0.83 (RMSE = 2.67–22 μg/m3) in the validation data, is acceptable. On the other hand, the MLR model performed poorly in this study. This may be due to the positively skewed and non-normally distributed nature of PM2.5 data, which may not be well suited to MLR models [37,38,39].
The study found that the RF model, utilizing AOD, LST, NDVI, EV, WOY, and year as predictors, produced the best results for estimating daily PM2.5 concentrations in Thailand. The strength of the RF model lies in its ability to avoid overfitting data by utilizing the strength of individual trees in the forest and their correlation. However, the results of our study differ from those of other studies, where other models, such as XGBoost, have been found to outperform RF [17]. This may be due to how these decision tree-based models take in and process training data. Our findings suggest that decision tree-based models are recommended for estimating PM2.5 using satellite data.
The results indicate that WOY, AOD, and EV are significant factors in determining PM2.5 concentrations, as shown by the two importance measurements of the RF model. This is consistent with previous studies, which found that AOD and EV contribute significantly to PM2.5 modeling [18]. Daily PM2.5 concentrations often exhibit a positively skewed distribution similar to that of AOD. As in the research conducted in China, the bivariate correlation analysis revealed that independent variables such as AOD are strongly associated with PM2.5 [20]. Our results also show that the estimated PM2.5 concentrations align well with the observed values at monitoring stations, with similar patterns in the time-series plots for observed and estimated PM2.5. However, there was some discrepancy between observed and estimated PM2.5 concentrations in 2015–2016. This may be due to the less varied geographical distribution of pollutants in the PM2.5 sample taken before 2017, as suggested by research from the United Kingdom [35].
The PM2.5 assessment indicates that northern Thailand experiences higher levels of PM2.5 than other regions, particularly during the dry seasons of WOY 1–10 (January–March) and WOY 45–53 (November–December). This is attributed to extensive agricultural fields and open-air biomass burning in northern Thailand and neighboring countries [22]. These activities contribute to the elevated PM2.5 levels and also have a significant impact on climate change. Except for the southern region, most areas in Thailand surpass the WHO’s 24-h standard of 15 μg/m3 for PM2.5 levels, although they remain within the national limit of 50 μg/m3. The high PM2.5 levels can negatively impact population health, including through respiratory and cardiovascular diseases. Our model’s PM2.5 data can be used to identify links between PM2.5 levels and specific geographic areas, such as provinces, districts, and sub-districts.
Although satellite data can provide higher spatial coverage than ground monitoring stations for PM2.5 data, they often have lower temporal coverage due to poor observation conditions such as clouds and fog. We used average satellite data within a 5 km radius of the stations to decrease missing values. In our analysis, we used 42,009 (or 33.6%) data points out of 124,846 valid data points. According to an evaluation of MODIS Collection 6 AOD retrievals against ground sunphotometer observations over East Asia, cloud cover or high surface reflectance can cause an average of 40% to 70% of satellite retrievals to go unrecovered [40]. Furthermore, Thailand’s overcast or foggy weather can invalidate the satellite retrieval technique, reducing the sampling frequency of accessible satellite data. This issue has also been identified in a study conducted in China [8]. As a result, new monitoring methods with wider spatial coverage and fewer weather limitations should be developed. The strengths of the present approach can be used as benchmarks when estimating ground-level PM2.5 or other air pollution metrics in Thailand or other countries using remote sensing.

5. Conclusions

This study proposed an efficient method for estimating daily PM2.5 concentrations in Thailand using satellite data with a pixel resolution of 1 km. The RF model was the most effective compared to MLR, XGBoost, and SVM models. The use of AOD, LST, NDVI, EV, WOY, and year as predictor variables improved the model’s performance, resulting in R2 values of 0.95 (RMSE of 5.58 μg/m3) for the training dataset, 0.78 (RMSE of 11.18 μg/m3) for the validation dataset, and 0.71 (RMSE of 8.79 μg/m3) for the testing dataset. The results from 2011 to 2020 were consistent with PM2.5 values obtained from monitoring stations. Using satellite data in this study allowed for examining air quality at various regional and temporal scales. The developed models and projections can aid regulatory operations and future epidemiological research in Thailand.

Author Contributions

S.B., Conceptualization, Formal analysis, Writing—original draft. S.U., Supervision, Writing—review & editing. H.G., Writing—review & editing. J.K., Writing—review & editing. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Sirindhorn International Institute of Technology (SIIT), Thammasat University Research Fund and Japan Advanced Institute of Science and Technology (JAIST), and the research fund of Thailand’s National Electronics and Computer Technology Centre (NECTEC).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

PM2.5 data are available from PCD (http://air4thai.pcd.go.th/webV2/history/, accessed on 18 May 2023) and BAQ (https://bangkokairquality.com/bma/report?lang=en, accessed on 18 May 2023). The satellite data can be accessed at https://ladsweb.modaps.eosdis.nasa.gov/search/ (accessed on 18 May 2023).

Acknowledgments

The Pollution Control Department and Bangkok’s Air Quality and Noise Management Division provided the PM2.5 data, which the authors are thankful for. We appreciate Professor Don McNeil’s wise counsel. We are also grateful to SIIT, Thammasat University and JAIST for thesis support for our research.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. WHO. Ambient (Outdoor) Air Pollution. Available online: https://www.who.int/news-room/fact-sheets/detail/ambient-(outdoor)-air-quality-and-health (accessed on 26 May 2023).
  2. Dockery, D.W. Health Effects of Particulate Air Pollution. Ann. Epidemiol. 2009, 19, 257–263. [Google Scholar] [CrossRef] [Green Version]
  3. Bae, S.; Kwon, H.J. Current state of research on the risk of morbidity and mortality associated with air pollution in korea. Yonsei Med. J. 2019, 60, 243–256. [Google Scholar] [CrossRef] [PubMed]
  4. Chung, Y.; Dominici, F.; Wang, Y.; Coull, B.A.; Bell, M.L. Associations between long-term exposure to chemical constituents of fine particulate matter (PM2.5) and mortality in Medicare enrollees in the eastern United States. Environ. Health Perspect. 2015, 123, 467–474. [Google Scholar] [CrossRef] [Green Version]
  5. Lu, F.; Xu, D.; Cheng, Y.; Dong, S.; Guo, C.; Jiang, X.; Zheng, X. Systematic review and meta-analysis of the adverse health effects of ambient PM2.5 and PM10 pollution in the Chinese population. Environ. Res. 2015, 136, 196–204. [Google Scholar] [CrossRef] [PubMed]
  6. Carmona, J.M.; Gupta, P.; Lozano-García, D.F.; Vanoye, A.Y.; Hernández-Paniagua, I.Y.; Mendoza, A. Evaluation of modis aerosol optical depth and surface data using an ensemble modeling approach to assess pm2.5 temporal and spatial distributions. Remote Sens. 2021, 13, 3102. [Google Scholar] [CrossRef]
  7. Maheshwarkar, P.; Sunder Raman, R. Population exposure across central India to PM2.5 derived using remotely sensed products in a three-stage statistical model. Sci. Rep. 2021, 11, 544. [Google Scholar] [CrossRef]
  8. Xu, X.; Zhang, C. Estimation of ground-level PM2.5concentration using MODIS AOD and corrected regression model over Beijing, China. PLoS ONE 2020, 15, e0240430. [Google Scholar] [CrossRef]
  9. Yang, Q.; Yuan, Q.; Yue, L.; Li, T.; Shen, H.; Zhang, L. The relationships between PM2.5 and aerosol optical depth (AOD) in mainland China: About and behind the spatio-temporal variations. Environ. Pollut. 2019, 248, 526–535. [Google Scholar] [CrossRef] [PubMed]
  10. Zeydan, Ö.; Wang, Y. Using MODIS derived aerosol optical depth to estimate ground-level PM2.5 concentrations over Turkey. Atmos. Pollut. Res. 2019, 10, 1565–1576.
  11. Pavolonis, M.; Sieglaff, J. GOES-R Advanced Baseline Imager (ABI) Algorithm Theoretical Basis Document for Volcanic Ash (Detection and Height); University of Wisconsin: Madison, WI, USA, 2010.
  12. Unik, M.; Sitanggang, I.S.; Syaufina, L.; Jaya, I.N.S. PM2.5 Estimation using Machine Learning Models and Satellite Data: A Literature Review. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 538.
  13. Zhang, X.; Chu, Y.; Wang, Y.; Zhang, K. Predicting daily PM2.5 concentrations in Texas using high-resolution satellite aerosol optical depth. Sci. Total Environ. 2018, 631–632, 904–911.
  14. Joharestani, M.Z.; Cao, C.; Ni, X.; Bashir, B.; Talebiesfandarani, S. PM2.5 prediction based on random forest, XGBoost, and deep learning using multisource remote sensing data. Atmosphere 2019, 10, 373.
  15. Lin, C.; Li, Y.; Lau, A.K.H.; Deng, X.; Tse, T.K.T.; Fung, J.C.H.; Li, C.; Li, Z.; Lu, X.; Zhang, X.; et al. Estimation of long-term population exposure to PM2.5 for dense urban areas using 1-km MODIS data. Remote Sens. Environ. 2016, 179, 13–22.
  16. Chu, Y.; Liu, Y.; Li, X.; Liu, Z.; Lu, H.; Lu, Y.; Mao, Z.; Chen, X.; Li, N.; Ren, M.; et al. A review on predicting ground PM2.5 concentration using satellite aerosol optical depth. Atmosphere 2016, 7, 129.
  17. Xiao, Q.; Chang, H.H.; Geng, G.; Liu, Y. An Ensemble Machine-Learning Model to Predict Historical PM2.5 Concentrations in China from Satellite Data. Environ. Sci. Technol. 2018, 52, 13260–13269.
  18. Xu, Y.; Ho, H.C.; Wong, M.S.; Deng, C.; Shi, Y.; Chan, T.C.; Knudby, A. Evaluation of machine learning techniques with multiple remote sensing datasets in estimating monthly concentrations of ground-level PM2.5. Environ. Pollut. 2018, 242, 29.
  19. Jin, X.; Ding, J.; Ge, X.; Liu, J.; Xie, B.; Zhao, S.; Zhao, Q. Machine learning driven by environmental covariates to estimate high-resolution PM2.5 in data-poor regions. PeerJ 2022, 10, e13203.
  20. Wei, J.; Huang, W.; Li, Z.; Xue, W.; Peng, Y.; Sun, L.; Cribb, M. Estimating 1-km-resolution PM2.5 concentrations across China using the space-time random forest approach. Remote Sens. Environ. 2019, 231, 111221.
  21. Kanabkaew, T. Prediction of hourly particulate matter concentrations in Chiangmai, Thailand using MODIS aerosol optical depth and ground-based meteorological data. EnvironmentAsia 2013, 6, 65–70.
  22. Phuengsamran, P.; Lalitaporn, P. Estimating Particulate Matter Concentrations in Central Thailand Using Satellite Data. Thai Environ. Eng. J. 2021, 35, 1–11.
  23. Kloog, I.; Koutrakis, P.; Coull, B.A.; Lee, H.J.; Schwartz, J. Assessing temporally and spatially resolved PM2.5 exposures for epidemiological studies using satellite aerosol optical depth measurements. Atmos. Environ. 2011, 45, 6267–6275.
  24. Lyapustin, A.; Wang, Y. MCD19A2 MODIS/Terra+Aqua Land Aerosol Optical Depth Daily L2G Global 1 km SIN Grid V006 [Data Set]; NASA EOSDIS Land Processes DAAC: Sioux Falls, SD, USA, 2018.
  25. Wan, Z.; Hook, S.; Hulley, G. MOD11A1 MODIS/Terra Land Surface Temperature/Emissivity Daily L3 Global 1 km SIN Grid V006; NASA EOSDIS Land Processes DAAC: Sioux Falls, SD, USA, 2015.
  26. Didan, K. MOD13Q1 MODIS/Terra Vegetation Indices 16-Day L3 Global 250 m SIN Grid V006; NASA EOSDIS Land Processes DAAC: Sioux Falls, SD, USA, 2015.
  27. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
  28. Liaw, A.; Wiener, M. Classification and Regression by randomForest. R News 2002, 2, 18–22.
  29. Breiman, L.; Cutler, A.; Liaw, A.; Wiener, M. Package ‘randomForest’: Breiman and Cutler’s Random Forests for Classification and Regression; CRAN Repository: Lincoln, NE, USA, 2018.
  30. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794.
  31. Chen, T.; He, T.; Benesty, M.; Khotilovich, V. Package ‘xgboost’. R Version 2019, 90, 1–66.
  32. Sain, S.R.; Vapnik, V.N. The Nature of Statistical Learning Theory. Technometrics 1996, 38, 1271324.
  33. Zhao, D.; Qi, L. Prediction of Maximum Power of PV System based on SVR Algorithm. J. Jilin Inst. Chem. Technol. 2015, 32, 89–94.
  34. Meyer, D. Support Vector Machines: The Interface to Libsvm in Package e1071; Springer: New York, NY, USA, 2014; Volume 1.
  35. Danesh Yazdi, M.; Kuang, Z.; Dimakopoulou, K.; Barratt, B.; Suel, E.; Amini, H.; Lyapustin, A.; Katsouyanni, K.; Schwartz, J. Predicting fine particulate matter (PM2.5) in the Greater London Area: An ensemble approach using machine learning methods. Remote Sens. 2020, 12, 914.
  36. Stafoggia, M.; Bellander, T.; Bucci, S.; Davoli, M.; de Hoogh, K.; de’ Donato, F.; Gariazzo, C.; Lyapustin, A.; Michelozzi, P.; Renzi, M.; et al. Estimation of daily PM10 and PM2.5 concentrations in Italy, 2013–2015, using a spatiotemporal land-use random-forest model. Environ. Int. 2019, 124, 170–179.
  37. Boulesteix, A.L.; Schmid, M. Machine learning versus statistical modeling. Biom. J. 2014, 56, 588–593.
  38. Bzdok, D.; Altman, N.; Krzywinski, M. Points of Significance: Statistics versus machine learning. Nat. Methods 2018, 15, 233–234.
  39. Kourou, K.; Exarchos, T.P.; Exarchos, K.P.; Karamouzis, M.V.; Fotiadis, D.I. Machine learning applications in cancer prognosis and prediction. Comput. Struct. Biotechnol. J. 2015, 13, 8–17.
  40. Xiao, Q.; Zhang, H.; Choi, M.; Li, S.; Kondragunta, S.; Kim, J.; Holben, B.; Levy, R.C.; Liu, Y. Evaluation of VIIRS, GOCI, and MODIS Collection 6 AOD retrievals against ground sunphotometer observations over East Asia. Atmos. Chem. Phys. 2016, 16, 1255–1269.
Figure 1. The map of PM2.5 stations and the number of stations.
Figure 2. The scatterplot matrix of variables.
Figure 3. Variable importance for the estimation of PM2.5.
Figure 4. Time series of observed and estimated PM2.5.
Figure 5. Estimated PM2.5 in Thailand, 2011–2020, at a 1 km pixel resolution.
Table 1. The data structure of datasets.
| Variables | Types | PCD training (n = 27,798) | PCD validation (n = 6950) | BAQ testing (n = 7339) |
| --- | --- | --- | --- | --- |
| Stations | Nominal | 68 stations | 68 stations | 49 stations |
| Date | Date | 2778 days | 1865 days | 734 days |
| Month | Nominal | 12 months | 12 months | 12 months |
| Year | Discrete | 10 years | 10 years | 6 years |
| WOY | Nominal | 53 weeks | 53 weeks | 53 weeks |
| PM2.5 (μg/m3) | Continuous | µ: 32.2, s: 23.7, IQR: 26 | µ: 32.4, s: 23.8, IQR: 26 | µ: 30.1, s: 16.2, IQR: 21 |
| AOD | Continuous | µ: 0.5, s: 0.3, IQR: 0.4 | µ: 0.5, s: 0.3, IQR: 0.4 | µ: 0.5, s: 0.3, IQR: 0.4 |
| LST (°C) | Continuous | µ: 33.3, s: 4.5, IQR: 6 | µ: 33.4, s: 4.5, IQR: 6 | µ: 36.1, s: 3.8, IQR: 4.3 |
| NDVI | Continuous | µ: 0.1, s: 0.2, IQR: 0.3 | µ: 0.1, s: 0.2, IQR: 0.3 | µ: −0.1, s: 0.1, IQR: 0.2 |
| EV (m) | Continuous | µ: 144.6, s: 198.9, IQR: 265.3 | µ: 142.4, s: 197.3, IQR: 265.3 | µ: 6.8, s: 1.6, IQR: 2.9 |
PCD: n = 34,748 in total (training plus validation). n: Rows; µ: Mean; s: Standard deviation; IQR: Interquartile range; m: Meter.
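The summary statistics reported in Table 1 (µ, s, IQR) can be reproduced for any variable along the following lines. This is a minimal sketch, not the authors' code: it assumes that s is the sample standard deviation and that quartiles use linear interpolation (the "inclusive" method), neither of which is stated in the table. The function name `summarize` and the input values are hypothetical.

```python
import statistics

def summarize(values):
    """Return (mean, sample standard deviation, interquartile range)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1 denominator)
    # assumption: quartiles by linear interpolation over the data range
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return mean, sd, q3 - q1

# Hypothetical PM2.5 readings (ug/m3), for illustration only
pm25 = [10.0, 20.0, 30.0, 40.0, 50.0]
mean, sd, iqr = summarize(pm25)
print(f"mu: {mean:.1f}, s: {sd:.1f}, IQR: {iqr:.1f}")
```

Applying the same function to each column of the training, validation, and testing subsets would give one table row per variable.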
Table 2. The performance of models for estimation of PM2.5.
Values are R2 (RMSE (μg/m3)).
| Model | Predictors | Training | Validation | Testing |
| --- | --- | --- | --- | --- |
| MLR | AOD | 0.18 (21.48) | 0.19 (21.26) | 0.04 (16.79) |
| MLR | AOD + LST | 0.21 (21.25) | 0.22 (21.04) | 0.01 (17.15) |
| MLR | AOD + LST + NDVI | 0.22 (21.26) | 0.22 (21.19) | 0.01 (17.27) |
| MLR | AOD + LST + NDVI + EV | 0.25 (20.49) | 0.25 (20.38) | 0.01 (17.35) |
| MLR | AOD + LST + NDVI + EV + WOY | 0.51 (18.42) | 0.51 (17.94) | 0.35 (14.07) |
| MLR | AOD + LST + NDVI + EV + WOY + Year | 0.51 (18.28) | 0.52 (17.83) | 0.35 (13.78) |
| RF | AOD | 0.79 (11.39) | 0.16 (23.08) | 0.02 (20.52) |
| RF | AOD + LST | 0.86 (10.12) | 0.25 (20.88) | 0.04 (18.59) |
| RF | AOD + LST + NDVI | 0.90 (8.82) | 0.44 (17.87) | 0.10 (16.03) |
| RF | AOD + LST + NDVI + EV | 0.89 (8.82) | 0.60 (15.17) | 0.15 (15.05) |
| RF | AOD + LST + NDVI + EV + WOY | 0.92 (7.23) | 0.74 (12.35) | 0.60 (10.47) |
| RF | AOD + LST + NDVI + EV + WOY + Year | 0.95 (5.58) | 0.78 (11.18) | 0.71 (8.79) |
| XGBoost | AOD | 0.31 (19.77) | 0.27 (20.27) | 0.04 (17.45) |
| XGBoost | AOD + LST | 0.34 (19.34) | 0.30 (19.85) | 0.05 (17.63) |
| XGBoost | AOD + LST + NDVI | 0.40 (18.39) | 0.38 (18.71) | 0.08 (15.90) |
| XGBoost | AOD + LST + NDVI + EV | 0.49 (16.94) | 0.47 (17.34) | 0.12 (15.23) |
| XGBoost | AOD + LST + NDVI + EV + WOY | 0.61 (14.93) | 0.60 (15.14) | 0.43 (12.40) |
| XGBoost | AOD + LST + NDVI + EV + WOY + Year | 0.62 (14.74) | 0.60 (15.00) | 0.45 (12.12) |
| SVM | AOD | 0.28 (20.59) | 0.28 (20.66) | 0.04 (17.15) |
| SVM | AOD + LST | 0.31 (20.08) | 0.31 (20.16) | 0.05 (16.91) |
| SVM | AOD + LST + NDVI | 0.39 (18.83) | 0.38 (18.93) | 0.09 (15.68) |
| SVM | AOD + LST + NDVI + EV | 0.47 (17.60) | 0.46 (17.79) | 0.14 (15.65) |
| SVM | AOD + LST + NDVI + EV + WOY | 0.59 (15.64) | 0.60 (15.44) | 0.51 (11.51) |
| SVM | AOD + LST + NDVI + EV + WOY + Year | 0.61 (15.32) | 0.62 (15.17) | 0.52 (11.63) |
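Each cell of Table 2 pairs an R2 value with an RMSE. The sketch below shows one way such a pair could be computed from observed and estimated concentrations. It assumes R2 is defined as 1 − SS_res/SS_tot; the table does not say whether this definition or the squared correlation was used. The function names and sample values are hypothetical and are not taken from the study.

```python
import math

def rmse(observed, estimated):
    """Root mean square error, in the units of the inputs."""
    n = len(observed)
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, estimated)) / n)

def r_squared(observed, estimated):
    """Coefficient of determination, assumed here to be 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot

# Hypothetical daily PM2.5 values (ug/m3), for illustration only
observed = [10.0, 20.0, 30.0, 40.0]
estimated = [12.0, 18.0, 33.0, 37.0]
print(f"{r_squared(observed, estimated):.2f} ({rmse(observed, estimated):.2f})")
```

Under this definition R2 can be negative on held-out data when the estimates fit worse than the observed mean, which is one reason to report RMSE alongside it.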

Buya, S.; Usanavasin, S.; Gokon, H.; Karnjana, J. An Estimation of Daily PM2.5 Concentration in Thailand Using Satellite Data at 1-Kilometer Resolution. Sustainability 2023, 15, 10024. https://doi.org/10.3390/su151310024
