Hourly Ground-Level PM2.5 Estimation Using Geostationary Satellite and Reanalysis Data via Deep Learning

This study proposes an improved approach for monitoring the spatial distribution of hourly concentrations of particulate matter less than 2.5 μm in diameter (PM2.5) via a deep neural network (DNN) using geostationary ocean color imager (GOCI) images and unified model (UM) reanalysis data over the Korean Peninsula. The DNN was optimized by determining the appropriate training model structure through hyperparameter tuning, with regularization, early stopping, and input and output variable normalization applied to prevent overfitting to the training dataset. Near-surface atmospheric information from the UM was also used as an input variable to spatially generalize the DNN model. The PM2.5 retrieved from the DNN was compared with estimates from random forest, multiple linear regression, and the Community Multiscale Air Quality model. The DNN demonstrated the highest accuracy among these methods for both the hold-out validation (root mean square error (RMSE) = 7.042 μg/m3, mean bias error (MBE) = −0.340 μg/m3, and coefficient of determination (R2) = 0.698) and the cross-validation (RMSE = 9.166 μg/m3, MBE = 0.293 μg/m3, and R2 = 0.49). Although the R2 was low due to underestimated high PM2.5 concentration patterns, the RMSE and MBE demonstrated reliable accuracy values (<10 μg/m3 and <1 μg/m3, respectively) for both the hold-out validation and the cross-validation.


Introduction
Airborne particulate matter (PM) consists of solid particles, liquid droplets, or a mixture of both suspended in the air. PM with aerodynamic diameters of less than 2.5 µm (PM2.5) and 10 µm (PM10) are two of the most widespread health threats, causing respiratory disease due to their penetration into the skin, lungs, and bronchi [1–3]. In addition to its effect on health, PM diminishes visibility and affects the climate both directly and indirectly by influencing the global radiation budget [4,5]. Therefore, monitoring PM10 and PM2.5 exposure is necessary to accurately diagnose air quality, address public health risks, and understand the climate effects of ground-level aerosols. One challenge for the aforementioned studies is a lack of accurate spatial and temporal distributions of ground-level PM2.5 [6,7].
To support the assessment of PM exposure, stationary ground observations have been conducted to monitor ground-level air quality. Although ground-based monitoring provides reliable and accurate measurements with high temporal resolution, it has a major limitation in capturing spatially continuous variations in PM, even where observation sites are densely distributed. It is difficult to ensure the homogeneity of the observed locations, and even instruments of the same model may have various mechanical errors, making them unsuitable for obtaining spatially continuous data. Recently, satellites have become a promising tool for studying the dynamics of aerosol optical properties due to their broad coverage and multispectral bands [8]. In particular, satellite-based aerosol optical depth (AOD) has been widely used to estimate the spatial and temporal distributions of PM2.5 at ground level, and it has demonstrated effective performance in regions where ground measurements are limited [9,10].
From the spatial retrieval methodology perspective, according to Chu et al. [14], most traditional approaches to estimating or predicting ground-level PM2.5 from satellite-based AOD fall roughly into four categories: multiple linear regression (MLR [11]), mixed-effect model (MEM [12]), geographically weighted regression (GWR [13]), and chemical-transport model (CTM). The MLR, MEM, and GWR statistical models are not only dependent on the distribution of ground stations but also have difficulty incorporating the growing number of related factors (e.g., meteorological conditions, land-use type, population, and road networks) as input parameter dimensions [14,15]. This implies that statistical models are likely to oversimplify the complicated relationships between PM2.5 and the input predictors. The CTM suffers from inaccuracies related to natural sources and anthropogenic emission inventories, has substantial computational costs, and requires additional expertise to understand complex physical and chemical processes [16,17].
As an alternative way to solve these issues, nonlinear and nonparametric machine learning methods such as the artificial neural network, support vector regression, and random forest (RF) have been used to estimate ground-level concentrations of PM with satellite data, demonstrating more reliable accuracy than that of conventional numerical models and statistical approaches [18–20] due to their nonlinear computation [15,21]. Additionally, deep learning, which is considered the second generation of machine learning, has been suggested [22], and it has great potential to solve issues in geophysical research for analyzing complicated natural phenomena [15,21–24].
However, deep learning has seldom been applied to estimate the spatial distribution of ground-level PM2.5, and only a few attempts have been made. Ong et al. [25] used a deep recurrent neural network to predict PM2.5, resulting in environmental monitoring with improved accuracy compared with that of numerical models; however, their method only performed well over ground monitoring sites. Li et al. [15] estimated ground-level PM2.5 by fusing satellite and station observations with deep learning, but they also used meteorological data from ground sites as the deep-learning input parameters when modeling daily PM2.5 using a polar orbit moderate resolution imaging spectroradiometer (MODIS/Terra) sensor. Sun et al. [26] adopted deeper and wider network model structures than those of Li et al. [15] to learn the complex spatiotemporal relationships of PM2.5 from large-scale observation data. The cross-validation values based on each ground site were statistically accurate in their research. However, the results were limited in interpolated areas because they still used meteorological variables interpolated from ground stations with inverse distance weighting as the input parameters for deep learning. Therefore, further applicability studies of deep learning are required to determine the optimal approach for more reliable and accurate ground-level PM2.5 spatial concentration data.
Consequently, the objective of our study was to spatially estimate ground-level PM2.5, primarily using high-spatiotemporal-resolution geostationary ocean color imager (GOCI) images and reanalysis data via the deep neural network (DNN) approach. Compared with previous studies that have used deep-learning methods [15,25,26], major differences and improvements were made in this study as follows.
Firstly, to estimate high-temporal-resolution (hourly) ground-level PM2.5 data, this study used GOCI satellite data, which can be used to monitor the diurnal variation in PM2.5 [27] and long-range transported air pollutants over Northeast Asia.
Secondly, this study directly used multispectral images of GOCI top of atmosphere (TOA) reflectance instead of GOCI-retrieved AOD products as the input parameters of the DNN model. Most previous studies have used satellite-based AOD to estimate ground-level PM2.5 [15], demonstrating reasonable results [14,19]. Although most research studies have sufficiently estimated the spatiotemporal distribution of AOD from the multispectral TOA reflectance of optical satellites, they still exhibit AOD product retrieval errors [28]. This means that the accumulated error of the satellite AOD retrieval is propagated into the PM retrieval process if AOD is used as an input parameter. In addition, because AOD estimation can be challenging due to the bright and nonlinear scattering characteristics of land surfaces [29,30], AOD-based estimation of PM2.5 has limitations over bright land-surface areas, similar to those of AOD retrieval [28].
Lastly, to enhance the performance of the DNN model from the spatial information perspective, near-surface atmospheric information from the unified model (UM) was used to improve the spatial accuracy of the PM concentration [31]. This means that the meteorological parameters of the ground stations were not used as input variables when simulating ground-level PM2.5, unlike in previous studies.

Area of Interest and Ground Measurements
This study focused on the Korean Peninsula, as illustrated in Figure 1, to estimate hourly ground-level PM2.5; the study area was limited primarily because of the computational constraints that the high-spatiotemporal-resolution satellite and reanalysis data impose on the DNN model. The Korean Peninsula is located at mid-latitude and in the westerly wind zone, and it exhibits a monsoon climate that gives way to a cold continental climate in winter and a marine climate in summer [32]. As a result, the retrieval accuracy of ground-level PM2.5 is lowered by the high cloudiness of the rainy summer season. To evaluate the performance of the newly applied deep-learning network, PM2.5 data simulated with the Community Multiscale Air Quality (CMAQ) model version 5.1 [33] driven by meteorological inputs from the Weather Research and Forecasting (WRF) model version 3.8.1 [34] were also compared with ground measurements of PM2.5. Initial and boundary conditions for the WRF model were set by applying National Centers for Environmental Prediction (NCEP) Final (FNL) Operational Global Analysis data on 1° × 1° grids. The WRF and CMAQ models had a horizontal resolution of 15 km × 15 km and 27 vertical sigma levels from the surface to 50 hPa. More details, such as the chemical and physical configurations of the WRF and CMAQ models, are presented in a previous study [35].

Specifications of GOCI Satellite Data
In this study, we used TOA reflectance observed by GOCI onboard the Communication, Ocean, and Meteorological Satellite (COMS) to estimate hourly ground-level PM2.5. The reflectance data contain eight bands consisting of six visible and two near-infrared bands with a spatial resolution of 500 m. Although the GOCI sensor specifications are predominantly designed for ocean observation [36], GOCI is useful for estimating aerosol optical properties, especially those of land areas, because the nonlinear contribution of the bright surface reflectance decreases in the shortwave visible spectral region [29]. This means that an increase or decrease in the PM concentration can be observed by GOCI, primarily due to the lower error contribution of land-surface reflectance in the shortwave blue channels. Detailed information on GOCI is provided in a previous study [27].
Additionally, a cloud mask of the GOCI image was applied to select clear sky areas for the retrieval of PM2.5 [27,37,38].

Meteorological Variables from UM Regional Data Assimilation and Prediction System (RDAPS) and Ancillary Data
In this study, we also used meteorological variables from the UM RDAPS [39], not only to improve the spatiotemporal performance of the DNN model but also to supplement the oversimplified relationship between the GOCI TOA reflectance and ground-level PM2.5. The following meteorological variables from the UM RDAPS model of the Korea Meteorological Administration (KMA) were used as additional input dimensions: wind direction and speed, surface pressure, planetary boundary layer height (PBLH), 2 m air temperature, dew point temperature, visibility, and relative humidity (accessed on 27 May 2021 from https://data.kma.go.kr/).
In addition to these variables, we used solar and satellite geometric conditions, together with the normalized difference vegetation index (NDVI), a global 30 arc second digital elevation model (DEM), and land cover (LC), as input variables (accessed on 27 May 2021 from https://lpdaac.usgs.gov/tools/data-pool/). The NDVI and LC of MODIS were applied to consider the vitality of the vegetation and the state of the land surface.

Pre-Processing of Input Parameters for Training DNN
GOCI TOA, UM RDAPS reanalysis data, and other ancillary data (DEM, NDVI, and LC) have different spatial projections. Therefore, we converted all input data into orthographic map projections similar to those of the GOCI projection with a 4 km spatial resolution. Ancillary data were converted using the nearest-neighbor interpolation method. UM RDAPS estimates each meteorological variable at a 12 km spatial resolution and 6 h intervals (00:00, 06:00, 12:00, and 18:00 coordinated universal time (UTC)). Differences in the spatiotemporal resolution between the reanalysis and GOCI satellite data were corrected by performing spatial interpolation with the Kriging method and temporal interpolation with a spline function. The Kriging method has been used in the assimilation process of weather data in many studies and has demonstrated reliable spatial interpolation performance [30,40,41]. The spline function was chosen for temporal interpolation because most of the weather variables in previous studies showed curved cycles [42,43]; the 6 h data were therefore interpolated to hourly data with the spline function.
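As an illustration of the temporal interpolation step, the following minimal sketch fits a natural cubic spline through 6 h values and evaluates it at hourly steps. The natural boundary condition, the function name, and the sample temperatures are assumptions made for this example; the text above specifies only that a spline function was used.

```python
import bisect


def natural_cubic_spline(xs, ys):
    """Return a function that evaluates the natural cubic spline through (xs, ys).

    xs must be strictly increasing. The natural boundary condition (zero second
    derivative at both ends) is an assumption of this sketch.
    """
    n = len(xs) - 1
    h = [xs[i + 1] - xs[i] for i in range(n)]

    # Right-hand side of the tridiagonal system for the second-derivative terms.
    alpha = [0.0] * (n + 1)
    for i in range(1, n):
        alpha[i] = 3.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])

    # Forward elimination.
    diag = [1.0] * (n + 1)
    mu = [0.0] * (n + 1)
    z = [0.0] * (n + 1)
    for i in range(1, n):
        diag[i] = 2.0 * (xs[i + 1] - xs[i - 1]) - h[i - 1] * mu[i - 1]
        mu[i] = h[i] / diag[i]
        z[i] = (alpha[i] - h[i - 1] * z[i - 1]) / diag[i]

    # Back substitution for the polynomial coefficients of each interval.
    b = [0.0] * n
    c = [0.0] * (n + 1)
    d = [0.0] * n
    for j in range(n - 1, -1, -1):
        c[j] = z[j] - mu[j] * c[j + 1]
        b[j] = (ys[j + 1] - ys[j]) / h[j] - h[j] * (c[j + 1] + 2.0 * c[j]) / 3.0
        d[j] = (c[j + 1] - c[j]) / (3.0 * h[j])

    def evaluate(x):
        # Locate the interval containing x, clamped to the valid range.
        j = min(max(bisect.bisect_right(xs, x) - 1, 0), n - 1)
        dx = x - xs[j]
        return ys[j] + b[j] * dx + c[j] * dx ** 2 + d[j] * dx ** 3

    return evaluate


# Hypothetical 2 m air temperatures (K) at the 6 h UM RDAPS times (UTC hours).
hours_6h = [0, 6, 12, 18]
temperature_6h = [271.0, 270.2, 276.5, 274.1]

spline = natural_cubic_spline(hours_6h, temperature_6h)
temperature_hourly = [spline(hour) for hour in range(0, 19)]
```

The spline reproduces the original 6 h values at the knots, so only the intermediate hours are newly estimated.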
We validated the accuracy of the interpolated UM RDAPS variables (excluding PBLH) by calculating the correlation coefficient (R), root mean square error (RMSE), and mean bias error (MBE) against automatic synoptic observation station (ASOS) in situ meteorological variables provided by the KMA (Figure 2). Figure 2 shows that the interpolated UM RDAPS variables closely matched the patterns of the ASOS in situ measurements; however, the surface pressure showed a systematic deviation at each point. This is because the low spatial resolution of the UM RDAPS does not reflect the actual altitude characteristics of each in situ observation site. The wind speed exhibited a linear variation over time, but the wind direction uncertainty was high because observation data are provided in 20° direction intervals and are excluded from the input data. For PBLH, actual measurement data could not be obtained, so Figure 2f shows the monthly (January, March, May, July, September, and November) time average of the data before interpolation (diamond-shaped symbols) and after interpolation (solid lines). The temporal trend was similar to that of previous studies [42,44]. Finally, the interpolated spatiotemporal meteorological variables from UM RDAPS were used as additional input data for the DNN model.
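The three statistics used for this comparison can be computed as in the minimal sketch below. The function name and the sign convention for the MBE (estimate minus observation) are assumptions of this example, and the sample values are hypothetical.

```python
import math


def evaluation_metrics(estimated, observed):
    """Return the correlation coefficient (R), RMSE, and MBE of two series.

    The MBE is taken as estimated minus observed, so a negative value
    indicates underestimation (an assumed convention).
    """
    n = len(observed)
    errors = [e - o for e, o in zip(estimated, observed)]
    rmse = math.sqrt(sum(err * err for err in errors) / n)
    mbe = sum(errors) / n

    mean_est = sum(estimated) / n
    mean_obs = sum(observed) / n
    covariance = sum((e - mean_est) * (o - mean_obs) for e, o in zip(estimated, observed))
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    var_obs = sum((o - mean_obs) ** 2 for o in observed)
    r = covariance / math.sqrt(var_est * var_obs)

    return {"R": r, "RMSE": rmse, "MBE": mbe}


# Hypothetical interpolated model values and matching station observations.
model_values = [1.0, 2.0, 3.0, 4.0]
station_values = [2.0, 3.0, 4.0, 5.0]
metrics = evaluation_metrics(model_values, station_values)
```

A constant offset such as the one in this sample leaves R at 1 while producing a nonzero RMSE and MBE, which is the same pattern as the systematic surface pressure deviation described above.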

DNN Approach
In this study, we adopted the Keras deep-learning library for Python to estimate ground-level PM2.5. A DNN is a supervised learning method with a feed-forward network structure that uses error backpropagation to determine the weight and bias of each hidden node. Thus, it requires true values, such as in situ measurements [17]. The DNN is composed of one input layer, multiple hidden layers, and one output layer. The number of hidden layers is typically greater than three, and the DNN consists of n time-hidden input nodes [17,45]. The structure of the DNN influences the performance of the estimation model; thus, we examined several training model structures with various parameter combinations, as presented in Table 1. Keras Tuner was used to determine the optimal hyperparameters for the deep-learning model; the selected structure had four hidden layers with 512, 512, 1024, and 1024 hidden nodes, respectively, with L1 regularization of 0.001, batch normalization, and the rectified linear unit (ReLU) activation function. In addition, Adam optimization with a learning rate of 0.05 and a dropout rate of 0.3 were determined as the optimal hyperparameters. For reference, the random forest (RF) approach was also applied as a comparison for the newly applied DNN ground-level PM2.5 estimation, because it demonstrates high predictive performance by combining the results of several decision trees using the ensemble technique [19,20,46,47]. For the RF, 72 model structures were tested based on independent variables such as the number of decision trees, maximum tree depth, and the percentage of data per column used for training to predict the concentration of PM2.5. The results of the analysis showed that the final RF used 70% of the data per column for 40 decision trees, 12 input nodes per decision tree, and 80% of the total data for each decision tree.
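To give a sense of the size of the selected network, the following sketch counts the weights and biases of a fully connected network with the four hidden layers reported above and a single output node. The number of input variables (20) is a hypothetical value chosen only for illustration, and batch normalization parameters are not counted.

```python
def dense_parameter_count(n_inputs, hidden_nodes, n_outputs=1):
    """Count the weights and biases of a fully connected feed-forward network."""
    sizes = [n_inputs] + list(hidden_nodes) + [n_outputs]
    # Each layer has fan_in * fan_out weights plus one bias per output node.
    return sum(fan_in * fan_out + fan_out for fan_in, fan_out in zip(sizes, sizes[1:]))


HIDDEN_NODES = (512, 512, 1024, 1024)  # hidden layer widths reported in the text
N_INPUTS = 20  # hypothetical number of input variables

total_parameters = dense_parameter_count(N_INPUTS, HIDDEN_NODES)
print(total_parameters)
```

With these assumptions the network has roughly 1.8 million weights and biases, which helps explain why L1 regularization, dropout, and early stopping were applied against overfitting.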
In this study, two validation approaches were applied to evaluate the temporal and spatial performances of the deep-learning model: hold-out validation and k-fold cross-validation. For the hold-out validation, we separated the total matchup datasets into three parts in chronological order: training data (60%), validation data (20%), and test data (20%). Fivefold cross-validation was performed by dividing the PM ground observation stations into five groups, as illustrated in Figure 1. For each cross-validation dataset, the 206 observation points were randomly divided into 114 points for training (approximately 55%), 54 points for validation during the training process (approximately 26%), and the remaining 38 points (approximately 18%) for the final model accuracy test. For both the hold-out validation and the cross-validation approaches, the training data were used to optimize the deep-learning model, and the validation data were used to reduce overfitting through the early stopping approach applied in the training process. The remaining test data were used to evaluate how well the deep-learning model reflected the spatiotemporal characteristics of PM2.5. As a reference, all matchup datasets were normalized from their physical values to 0-1 float values using the minimum and maximum values over 2 years.
[...], but RF has a greater underestimation, which means that its uncertainty is greater for high concentrations. Regarding MLR, the estimates are well clustered, but they are concentrated in a narrower range than that of the actual values, and high-concentration cases are not reflected because the maximum output value remains within 40 µg/m3. The predicted values from the CMAQ simulations were distributed over a wide range regardless of the actual measurements, and the RMSE and MBE were higher than those of the other methods.
However, unlike the satellite-based methods, the CMAQ model can make predictions 24 h a day, and it contains more data for the same period because there are no missing areas due to cloud cover. In Figure 3d, the time range that is not observed by GOCI (08:00 to 23:00 UTC) is excluded.
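The min-max normalization and the chronological 60/20/20 hold-out split described in this section can be sketched as follows. The function names, the tuple layout of a matchup sample, and the example value range are assumptions for illustration only.

```python
def min_max_normalize(values, lower, upper):
    """Scale physical values to the 0-1 range using fixed minimum and maximum values."""
    span = upper - lower
    return [(value - lower) / span for value in values]


def min_max_denormalize(scaled, lower, upper):
    """Convert normalized model outputs back to physical units."""
    span = upper - lower
    return [value * span + lower for value in scaled]


def chronological_split(samples, train_fraction=0.6, validation_fraction=0.2):
    """Split (timestamp, features, target) samples in time order.

    The earliest samples are used for training, the next block for validation
    (early stopping), and the most recent block for the final test.
    """
    ordered = sorted(samples, key=lambda sample: sample[0])
    n_train = round(len(ordered) * train_fraction)
    n_validation = round(len(ordered) * validation_fraction)
    train = ordered[:n_train]
    validation = ordered[n_train:n_train + n_validation]
    test = ordered[n_train + n_validation:]
    return train, validation, test


# Hypothetical matchup samples: (hour index, feature list, PM2.5 in ug/m3).
matchups = [(hour, [0.1 * hour], 10.0 + hour) for hour in range(10)]
train_set, validation_set, test_set = chronological_split(matchups)

# Hypothetical PM2.5 range used as the fixed minimum and maximum.
scaled_targets = min_max_normalize([0.0, 50.0, 100.0], lower=0.0, upper=100.0)
```

Keeping the minimum and maximum fixed over the whole period means the same scaling can be applied to new data and inverted to recover concentrations in physical units.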

Results
To evaluate the spatial generalization, fivefold cross-validation was performed for the DNN, RF, and MLR by dividing the dataset into five station groups, as illustrated in Figure 1. For CMAQ, which is based on a physical model, we did not perform cross-validation because it generalizes better than the data-driven DNN, RF, and MLR models, which were trained using the ground measurements [21,35]. The results are presented in Table 2. The total cross-validation RMSE, MBE, and R2 for the DNN are 9.17 µg/m3, 0.293 µg/m3, and 0.49, respectively; the fivefold cross-validation results are thus less accurate than those of the hold-out validation of the deep-learning model. This means that the spatial variation exhibits a complicated spatial pattern and requires additional parameters to reflect the characteristics of spatial PM2.5. Although the R2 is low because of the underestimated pattern of high PM2.5, the RMSE and MBE displayed reliable and accurate values of less than 10 µg/m3 and 1 µg/m3, respectively, when compared with previous results [15,19,20,25,26,48]. When considering the RF model as a comparative analysis, the statistical values may vary according to the data type, pre-processing approach, adopted model, and validation method. The proposed DNN model appears to produce reliable estimates of spatiotemporal PM2.5 when compared to those of the RF model.
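The station-wise partitioning behind this cross-validation can be sketched as below, using the reported counts of 206 stations split into 114 training, 54 validation, and 38 test stations. How the five groups were actually drawn is not detailed in the text, so the independent random draw per fold, the seeds, and the station identifiers are assumptions of this example.

```python
import random


def split_stations(station_ids, n_train, n_validation, seed):
    """Randomly assign whole stations to training, validation, and test sets.

    Splitting by station rather than by sample keeps every test station
    unseen during training, which is what makes the evaluation spatial.
    """
    shuffled = list(station_ids)
    random.Random(seed).shuffle(shuffled)
    train = set(shuffled[:n_train])
    validation = set(shuffled[n_train:n_train + n_validation])
    test = set(shuffled[n_train + n_validation:])
    return train, validation, test


# Hypothetical identifiers for the 206 PM2.5 observation stations.
stations = [f"station_{index:03d}" for index in range(206)]

# One split per fold; the seeds 0-4 are arbitrary.
folds = [split_stations(stations, n_train=114, n_validation=54, seed=seed) for seed in range(5)]
```

Each matchup sample is then routed to the set that contains its station, so no station contributes to more than one set within a fold.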

Discussion
Finally, we confirmed the spatial calculation ability of the DNN model for a high-concentration case in 2017. Figure 4 displays the hourly spatial maps of ground-level PM2.5 on 19 January 2017 (from 01:00 (10:00) to 06:00 (15:00) UTC (KST)), when PM2.5 concentrations were high. Except for areas that are not observable due to clouds, high concentration areas estimated using the DNN model were well matched with the ground-truth PM2.5. Similar to the accuracy analysis results, the spatial variation in the DNN is consistent; however, it tends to underestimate concentrations, especially for high PM2.5 areas. Nevertheless, it was determined that the utility of the DNN approach using satellite and reanalysis data is high because it can observe diurnal and spatial changes of PM2.5 with reliable accuracy. As this study is based on the GOCI sensor, there are several limitations. First, the GOCI is a sensor whose main purpose is ocean color observation, and it cannot observe at night because it is equipped only with visible channels. Second, residual cloud effects may introduce errors in high concentration cases due to the absence of an IR channel. These problems can be solved by fusion with meteorological data from satellites that observe a wide range of wavelengths, such as the Geo-KOMPSAT-2A/Advanced Meteorological Imager (GK-2A/AMI).

Conclusions
In this study, we estimated ground-level PM2.5 via a deep-learning approach using TOA reflectance observed with the GOCI satellite and meteorological variables from reanalysis data, demonstrating that the proposed DNN model can effectively reflect the spatial characteristics of PM2.5 over the Korean Peninsula compared with conventional RF, MLR, and CMAQ methods. Overall, data-driven models, such as the DNN and RF models, showed more reliable PM2.5 retrieval than that of conventional MLR and CMAQ. In addition, the DNN method exhibited higher accuracy than the RF method for both validation approaches due to its deeper and more complicated network structure. Conventional MLR tended to converge to a certain value, with a low error rate but also a lower matching rate.
Although the hold-out validation results showed that the DNN sufficiently captured the temporal variations of PM2.5, the cross-validation results indicate that the estimation of spatial characteristics remains limited, despite applying the GOCI satellite and reanalysis data, due to the complexity of ground-level PM2.5. This implies that additional spatial variables (population, road networks, etc.) should be considered to reflect the substantially large spatial variability of PM2.5. Furthermore, compared to CMAQ, the DNN is limited in estimating PM for cloud areas and daily changes, including nighttime. Nevertheless, the suggested method enables the observation of the spatial variation in actual ground PM2.5, unlike previous studies, because the data from meteorological stations were not used as input data for the DNN model. Compared with other AOD-based PM2.5 estimations, the model has several advantages: (1) it is independent of errors from the AOD calculation process and (2) the time and computing resources required by that process are saved.