An Air Pollutants Prediction Method Integrating Numerical Models and Artificial Intelligence Models Targeting the Area around Busan Port in Korea

Hong, Hyunsu; Choi, IlHwan; Jeon, Hyungjin; Kim, Yumi; Lee, Jae-Bum; Park, Cheong Hee; Kim, Hyeon Soo

doi:10.3390/atmos13091462

by Hyunsu Hong ¹, IlHwan Choi ², Hyungjin Jeon ³, Yumi Kim ³, Jae-Bum Lee ⁴, Cheong Hee Park ¹ and Hyeon Soo Kim ¹,*

¹ Department of Computer Science & Engineering, Chungnam National University, Daejeon 34134, Korea

² Department of Civil and Environmental Engineering, Daejeon University, Daejeon 34520, Korea

³ Korea Environment Institute, Sejong 30147, Korea

⁴ National Institute of Environmental Research, Incheon 22689, Korea

* Author to whom correspondence should be addressed.

Atmosphere 2022, 13(9), 1462; https://doi.org/10.3390/atmos13091462

Submission received: 18 August 2022 / Revised: 3 September 2022 / Accepted: 6 September 2022 / Published: 9 September 2022

(This article belongs to the Special Issue High-Resolution Weather and Climate Modeling with Industrial Applications)

Abstract: Exposure to air pollutants such as PM_2.5 and ozone has serious adverse effects on health and is associated with more than 4 million deaths, including premature deaths. Air pollution in ports is caused by exhaust gases from various sources, including ships, and the International Maritime Organization (IMO) is working to reduce it by regulating the sulfur content of ship fuel. Nevertheless, measures to identify and minimize the effects of air pollution are lacking. The Community Multiscale Air Quality (CMAQ) model is the most widely used model for understanding the effects of air pollution. In this paper, we propose a hybrid model combining the CMAQ model with RNN-LSTM, an artificial neural network model. Because the RNN-LSTM model has very good predictive performance, combining the two models can improve spatial distribution prediction over a large area at a relatively low cost. With the hybrid model, IOA improved by 0.235~0.317 and RMSE decreased by 4.82~8.50 μg/m³ compared with using CMAQ alone. This means that predicting PM_2.5 with the hybrid model can improve the accuracy of its spatial distribution. In the future, real-time prediction with the hybrid model could increase the accuracy of exposure calculations for air pollutants, which can help evaluate health impacts. Ultimately, accurate predictions are expected to help reduce the damage caused by air pollution.

Keywords:

air quality; CMAQ; PM_2.5; LSTM; hybrid model; seaport

1. Introduction

Air pollutants, such as PM_2.5 and ozone, have serious adverse effects on health, and as a result, more than 4 million deaths have been reported as of 2015 [1,2]. In the United States, PM₁₀ emission is 0.6 ton/day in the power generation sector, but it is 1.8 ton/day only in the Port of Los Angeles. Therefore, it is known that the emission of air pollutants from ports is more serious [3]. The PM₁₀ emission from ports is similar to or greater than the PM₁₀ emission from 500,000 vehicles [3,4,5]. The major sources of air pollution in ports include ships, cargo-handling equipment, trucks, and railway locomotives [6,7,8]. Of these sources of pollution, ships and trucks account for 43% and 31% of PM₁₀ emissions [3]. Accordingly, the International Maritime Organization (IMO) is making efforts to reduce air pollution around ports by regulating the sulfur content of ship fuels [9]. In order to manage air pollution, Busan Port is also making efforts, such as requesting Alternative Maritime Power (AMP) when ships are anchored and regulating the sulfur content of the fuel, but there are insufficient measures to manage and prevent air pollution using mathematical models, such as air pollution level modeling [4].

Methods for predicting air quality can be divided into numerical analysis methods and statistical methods. As a numerical analysis method, the Community Multiscale Air Quality (CMAQ) model is mainly used. The CMAQ model predicts air quality by analyzing the movement of pollutants and chemical reactions based on data such as meteorological data and emissions [10,11,12]. As a statistical method, prediction methods using artificial intelligence technologies, such as machine learning, are widely used [13,14]. In particular, machine learning algorithms have the ability to explain nonlinear relationships with large amounts of data very well compared to conventional statistical methods, which can effectively reduce estimation errors [15]. Among machine learning algorithms, LSTM has a very good ability to model long-term dependencies [16] and has proven to be one of the best neural networks suitable for time series problems [17].

The CMAQ model is designed to simulate the dynamics of emitted pollutants when they form aerosols. The model is designed to consider PM_2.5 and PM₁₀ and to consider the emissions of organic carbon, dust, and other types of primary pollutants, as well as secondary pollutants, such as sulfate, nitrate, and ammonium, produced by chemical reactions [18,19].

The CMAQ model has shown various abilities to simulate concentrations of PM_2.5 and PM₁₀ [19]. A study was conducted using the CMAQ model to improve the performance of the air quality model in a Japanese metropolitan area caused by fine dust (PM_2.5) [20]. In this study, several analyses of CMAQ composition, emission, boundary concentration, and meteorological field were performed to improve the prediction performance for PM_2.5. According to the analysis results, PM_2.5 nitrate concentration is very sensitive to ammonia (NH₃) emission and dry deposition of nitric acid (HNO₃) and ammonia, and PM_2.5 OA concentration is very sensitive to condensable organic compound (COC) emission. The CMAQ composition, chemical inputs (emissions and boundary concentrations), and the variability of the meteorological field were reported to be 6.1–6.5%, 9.7–10.9%, and 10.3–12.3%, respectively [20].

A considerable number of studies have been conducted to predict air quality using artificial neural networks. Xayasouk et al. developed a model for predicting PM concentration by applying Long Short-Term Memory (LSTM) and Deep Auto Encoder (DAE), and the accuracy of the model was verified by evaluating the prediction and actual measurement results using the Root Mean Square Error (RMSE) metric [21]. A study to predict PM_2.5 at air quality monitoring points in Beijing was also conducted [22]. In this study, Artificial Neural Network (ANN), LSTM, and Long Short-Term Memory-Fully Connected (LSTM-FC) methods were compared using air quality monitoring data as input data. As a result of the performance evaluation, the LSTM and LSTM-FC methods showed excellent predictive performance. Ma et al. proposed a fine-grained spatiotemporal PM_2.5 retrieval method by applying a machine learning algorithm based on satellite images and ground monitoring station data [23]. Gao et al. provided a short-distance healthy travel route planning method using PM_2.5 retrieval techniques with high spatiotemporal resolution and a dynamic Dijkstra algorithm [24]. Chen et al. performed a study to predict PM₁₀ per hour using top-of-the-atmosphere reflectance (TOAR) data from FY-4A, China’s next-generation geostationary satellite, and long-range transport dust (LRTD) data [25]. Song et al. suggested a study to estimate PM_2.5 per hour using TOAR data from FY-4A, meteorological factors, and geographic information [26]. In Korea, a study comparing the prediction performance of a 3D Chemical Transport Model (CTM) simulation and LSTM based on air quality monitoring data and meteorological data at two points was conducted to predict PM₁₀ and PM_2.5 [27]. An Index of Agreement (IOA) metric was used to evaluate performance. 
The 3D CTM simulation showed an IOA value of 0.36 to 0.78, but the LSTM-based model showed an IOA value of 0.62 to 0.79, indicating that the LSTM model improved air quality prediction. Lee et al. developed a DNN model that predicts PM_2.5 concentrations at 6-h intervals for 3 days—from the prediction day (D+0) to 2 days after the prediction day (D+2)—and conducted a performance evaluation through comparison with a CMAQ modeling system [28]. A study was also conducted to predict the air quality of Busan Port using ship activity data [29]. In this study, PM_2.5 of Busan North Port and Busan New Port was predicted using a Recurrent Neural Network-Long Short Term Memory (RNN-LSTM)-based model. As a result of the model verification, numerical values of IOA 0.975 and RMSE 4.88 for Busan North Port and IOA 0.970 and RMSE 5.87 for Busan New Port were reported, indicating that the RNN-LSTM-based model has excellent performance in predicting air quality. In particular, since LSTM has a very good ability to model long-term dependencies and has been proven to be one of the most suitable neural networks for time series problems, many recent studies have been conducted on various LSTM structures to predict air quality, including PM_2.5. The proposed structures include a Long Short-Time Memory Fully Connected (LSTM-FC) neural network [22], a Graph Convolutional networks-Long Short-Term Memory networks (GC-LSTM) [30], a Convolutional Neural Networks-Long Short Time Memory (CNN-LSTM) model [31], a LSTM-3D-VAR method [32], a long short-term memory neural network extended (LSTME) model [33], a Bayesian optimization-based Long Short-Time Method [34], etc.

Many studies have sought better prediction methods using a single approach, either a numerical model or an artificial neural network model. Others have attempted to improve predictive ability by combining models. Isakov and his colleagues found that, when an atmospheric diffusion model is used to understand the effects of exposure to air pollutants, the location of roads and industrial facilities has a great influence on differences in pollutant concentration [35]. To address this, they presented the CMAQ-AERMOD hybrid model, which combines the CMAQ model, which can predict at the wide-area scale, with the AMS/EPA Regulatory Model (AERMOD), which is suited to reflecting the characteristics of small areas such as roads and industrial clusters. Using this model, they predicted the spatial distribution of air pollutants and evaluated the impact of air pollution exposure. Similarly, another study applied a high-resolution grid model by combining CMAQ with the California Puff dispersion (CALPUFF) model to evaluate the health effects of NO₂ concentration [36].

The CMAQ model is helpful in predicting air quality for a relatively large area of space, but there are problems in that the model operation requires a lot of resources, and the prediction accuracy is low. In addition, when trying to use artificial neural networks, past observation data are required, but there is a limit to installing observation points in all spaces, and there is difficulty in securing sufficient observation data necessary for learning. Therefore, in this study, to overcome the limitations of these models, we propose a new hybrid model that combines the CMAQ model and an artificial neural network. To this end, first, the results of predicting air quality in a large area are obtained using the CMAQ model. Next, air quality is predicted using the RNN-LSTM model for a specific point that has sufficient historical observation data required for learning. Finally, within the range predicted by the CMAQ model, air quality at any point where the artificial intelligence model is not applied is predicted by combining the results of the two methods. Improved air quality prediction will ultimately contribute to minimizing health and property damage from air pollution.

2. Materials and Methods

2.1. Study Design

The basic concept of this study is shown in Figure 1. Table 1 shows the domain settings for operating the CMAQ model. Using the nested grid technique, the modeling domain was narrowed successively from Northeast Asia to Korea and then to the research target area, as shown in Figure 2.

Figure 3 shows the prediction area of the CMAQ model, the prediction point using the artificial neural network, the verification point for performance evaluation of the hybrid model, and the weather measurement point. Centering on Busan Port, the prediction point of the artificial neural network is represented by red circles and red diamonds, the weather measurement point by blue stars, and the performance evaluation point of the hybrid model by orange squares.

2.2. Input Data

The operating period of the CMAQ model is from 27 January to 3 March 2020, including the ramp-up period, and the period used for prediction is from 1 February to 29 February, predicting PM_2.5 every hour. For the WRF weather model, NCEP GFS Forecasts (0.25 degree grid) data were used. The data used to predict PM_2.5 using artificial neural networks are air quality observation data (AQMS), meteorological data (KMA), and ship activity data. In this case, the data on the activity of the ship were the port-MIS data of Busan Port. The hybrid model predicts the spatial distribution of PM_2.5 using PM_2.5 prediction data obtained from the artificial neural network and the results of CMAQ as input data. Table 2 summarizes the input data used by each model. Since the data used in this study were collected every hour, the temporal resolution is 1 h, and since the CMAQ data was obtained based on a grid of 1 km intervals, the spatial resolution is 1 km. The spatial resolution information can be found in Table 1, and the temporal resolution information of the input data is described in Table 3.

2.3. Method

2.3.1. CMAQ

The CMAQ modeling system is largely divided into meteorological modeling, emissions modeling, and air quality modeling [37]. Meteorological modeling uses WRF v3.6.1 (Weather Research and Forecasting), a three-dimensional mesoscale weather model. To convert the WRF modeling results into input data for air quality modeling, they are processed using MCIP v3.6 (Meteorology-Chemistry Interface Processor). Emission modeling uses SMOKE v3.1 (Sparse Matrix Operator Kernel Emissions), which processes three-dimensional emission data into the input format required for air quality modeling. Finally, air quality modeling is performed using CMAQ v4.7.1, with the meteorological and emission data as input.

The operating procedure of the CMAQ model is shown in Figure 4. The modeling target area applies the nesting technique with a resolution of 27 km, 9 km, 3 km, and 1 km. The 27 km grid domain covers all of South Korea, Japan, and parts of China, while the 9 km grid domain covers all of South Korea, the West Sea, North Korea, and parts of Japan. The 3 km grid domain covers the southern region of South Korea, and the 1 km grid domain covers an area of 50 km in width and 50 km in length centered on Busan Port.

2.3.2. RNN-LSTM

The RNN can process long sequence data; however, its performance decreases as the sequence length increases. This phenomenon is called the long-term dependency problem. One of the variant RNN models used to overcome this problem is LSTM [38]. To select a machine learning model to be applied in this study, experiments were performed using models such as RNN, RNN-LSTM, RNN-GRU, and Random Forest and the data used in the study. As a result of the experiment, the RNN-LSTM model showed the trend of the measured data best compared to the other models, so it was selected as a machine learning model to be applied to the hybrid model in this study.

The LSTM model is set up in four layers with a recursive structure. The core of the LSTM is the cell state, which is carried continuously from one time step to the next and is regulated by gates. The cell state is often likened to a conveyor belt: information placed on it is passed along unchanged unless a gate modifies it. LSTM can add or delete information through the input gate, forget gate, and output gate. Thus, the gates selectively pass information, and the model continues to learn while discarding previous information that is no longer needed. The LSTM gate vectors and states are computed as follows:

f_{t} = \sigma (W_{if} x_{t} + W_{hf} h_{t-1} + b_{f})    (1)

i_{t} = \sigma (W_{ii} x_{t} + W_{hi} h_{t-1} + b_{i})    (2)

g_{t} = \tanh (W_{ig} x_{t} + W_{hg} h_{t-1} + b_{g})    (3)

o_{t} = \sigma (W_{io} x_{t} + W_{ho} h_{t-1} + b_{o})    (4)

W_{i} = \begin{bmatrix} W_{ii} \\ W_{if} \\ W_{ig} \\ W_{io} \end{bmatrix}, W_{h} = \begin{bmatrix} W_{hi} \\ W_{hf} \\ W_{hg} \\ W_{ho} \end{bmatrix}, b = \begin{bmatrix} b_{i} \\ b_{f} \\ b_{g} \\ b_{o} \end{bmatrix}    (5)

C_{t} = f_{t} \times C_{t-1} + i_{t} \times g_{t}    (6)

h_{t} = o_{t} \times \tanh (C_{t})    (7)

where f_{t} is the forget gate vector, which serves as a weight for retaining the previous cell state; i_{t} is the input gate vector, which serves as a weight for acquiring new information; o_{t} is the output gate vector, which selects the output candidate; g_{t} is the cell input vector; x_{t} is the input vector; h_{t} is the hidden state vector, also known as the output vector of the LSTM unit; C_{t} is the cell state vector; W is a weight matrix; and b is a bias vector. f_{t}, i_{t}, and o_{t} are the gate vectors. LSTM uses two types of activation functions: σ is the sigmoid function and tanh is the hyperbolic tangent function. The dimension of W_{i} is [t × m], the dimension of W_{h} is [t × t], the dimension of the input x_{t} is [m × 1], the dimension of the hidden node is [t × 1], and the dimension of b is [t × 1]. W is initialized by the Xavier method. The bias vector b is initialized to 1 for the forget gate, while all other biases are initialized to zero [29,38].

The operation procedure of RNN-LSTM is shown in Figure 5. Air quality prediction at Busan North Port and Busan New Port using LSTM was performed using 13 input parameters composed of the air quality monitoring dataset, meteorological dataset, and hourly ship anchorage dataset. At sites except for the two points, prediction was performed using 12 input parameters that were configured with the air quality monitoring dataset and the meteorological dataset. Detailed information about the input parameters is summarized in Table 3. As a data processing method for learning and prediction, a smooth curve fitting method [39] was used, as the missing rate was less than 10%.

Many experiments were conducted with a combination of different condition values to set hyperparameters. Table 4 shows the condition values of each hyperparameter used in the experiment. In the end, the final values of the hyperparameters were determined by the condition values that gave the best results. Table 5 presents the partition situation of the input dataset.

To select a prediction algorithm, after learning using hidden nodes, hidden layers, and epochs, a model showing optimal prediction performance was selected [40]. Table 6 summarizes the configuration of the applied dataset for each measurement point to predict PM_2.5 using LSTM and the setting values of the hyperparameter to select the parameters of the optimal model. To determine the parameters of the optimal model, the Hidden node was set to 30, 60, and 120, the Hidden Layer was set to 1, 2, and 3, and the Epochs were set to 10, 15, and 20, respectively. After learning by each setting value, the prediction was performed with hyperparameters showing the optimal prediction performance.
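The search over hidden nodes, hidden layers, and epochs described above amounts to a small grid search. The sketch below only enumerates the candidate settings and picks the best one given a scoring function. The names `candidate_configs`, `select_best`, and `score_fn` are hypothetical, and using the lowest validation error as the selection criterion is an assumption, since the text says only that the model with optimal prediction performance was chosen.

```python
from itertools import product

# Candidate values stated in the text above.
HIDDEN_NODES = (30, 60, 120)
HIDDEN_LAYERS = (1, 2, 3)
EPOCHS = (10, 15, 20)

def candidate_configs():
    """Enumerate every combination of the three hyperparameters."""
    return [
        {"hidden_nodes": n, "hidden_layers": l, "epochs": e}
        for n, l, e in product(HIDDEN_NODES, HIDDEN_LAYERS, EPOCHS)
    ]

def select_best(configs, score_fn):
    """Return the configuration with the lowest score.

    score_fn is a hypothetical callable that trains a model with the
    given configuration and returns an error such as validation RMSE.
    """
    return min(configs, key=score_fn)

configs = candidate_configs()  # 3 x 3 x 3 = 27 combinations per point
```

Each measurement point would run its own search, which is why the selected settings can differ from point to point.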

2.3.3. Hybrid (CMAQ and RNN-LSTM) Model

To construct the hybrid model, the data assimilation method is applied based on the data predicted by the RNN-LSTM method and the result data of the CMAQ model. In this study, Pun’s Interpolation method, one of the successive correction methods, is used as a data assimilation method [41,42].

As shown in Figure 6, when there are machine learning prediction values (black square dots) that are irregularly distributed on the forecast model grid, Pun’s interpolation method is applied to correct the prediction value of the numerical model at point n using the prediction value of machine learning that exists within the influence radius (circle) surrounding one point n. By using this method, the prediction performance of the entire forecast model can be improved.

The data assimilation method works as follows. First, when the interpolation point and the machine learning prediction point are the same, the interpolation concentration is the same as the machine learning prediction concentration. Second, when the prediction point of machine learning is located outside the influence radius of the interpolation point, the interpolation concentration uses the results of the numerical model. Third, when there is more than one machine learning prediction point around the interpolation point, the interpolation of the corresponding point is determined by the prediction concentration of the surrounding machine learning and the prediction concentration of the numerical model. The interpolation value of a specific point is determined by Equation (9).

The concentration error E_{k} at a point k that has a machine learning predicted value is defined as the difference between the concentrations predicted by the machine learning model and the numerical model, as in Equation (8).

E_{k} = C_{m_{k}} - C_{q_{i-2, j-1}}    (8)

Here, C_{m_{k}} is the concentration predicted by machine learning at point k, and C_{q_{i-2, j-1}} is the concentration predicted by the numerical model at grid (i − 2, j − 1), which includes the point k. The interpolation concentration at point n is calculated by applying the error of the grid (i, j) to the concentration of the numerical model, as shown in Equation (9).

C_{t_{ij}} = E_{ij} + C_{q_{ij}}    (9)

Here, C_{t_{ij}} is the prediction value interpolated in the grid (i, j), which is used as the hybrid model prediction value. E_{ij} is the concentration error in the grid (i, j). It is a weighted sum of the concentration errors of the machine learning prediction points located within the influence radius of the center point n of the grid (i, j), and is defined as Equation (10).

E_{ij} = \sum_{k} W_{k} E_{k}    (10)

Here, W_{k} is the weight applied to each error term. It is inversely proportional to the square of the distance r_{k} between the point n and the machine learning prediction point k, and is calculated as in Equation (11).

W_{k} = \frac{1 / r_{k}^{2}}{\sum_{k} 1 / r_{k}^{2}}    (11)

The approach of Equation (11) is suitable when the model grid center n and multiple machine learning prediction points are close to each other. When a machine learning prediction point is located at point n, E_{k} = E_{ij} and C_{t_{ij}} = C_{m_{ij}}. When the machine learning prediction point is located outside the influence radius of the model grid center n, W_{k} = 0. Thus, E_{ij} = 0 and C_{t_{ij}} = C_{q_{ij}}.
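A small worked example may help make Equations (9)–(11) concrete. All numbers here are hypothetical and chosen only for easy arithmetic: two machine learning prediction points lie 1 km and 2 km from the grid center n, with errors of 5 and −3 μg/m³, and the numerical model predicts 16 μg/m³ at the grid.

```latex
\begin{aligned}
r_1 &= 1\ \text{km}, \quad r_2 = 2\ \text{km}, \quad
E_1 = 5, \quad E_2 = -3, \quad C_{q_{ij}} = 16 \\
W_1 &= \frac{1/1^2}{1/1^2 + 1/2^2} = \frac{1}{1.25} = 0.8, \qquad
W_2 = \frac{1/2^2}{1/1^2 + 1/2^2} = \frac{0.25}{1.25} = 0.2 \\
E_{ij} &= W_1 E_1 + W_2 E_2 = 0.8 \times 5 + 0.2 \times (-3) = 3.4 \\
C_{t_{ij}} &= E_{ij} + C_{q_{ij}} = 3.4 + 16 = 19.4\ \mu\text{g/m}^3
\end{aligned}
```

The nearer point dominates the correction because the weights fall off with the square of the distance.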

In regions where machine learning prediction points are sparse, the influence of individual prediction points can cause sharp changes in concentration in the interpolated field. It is desirable to avoid this and to have the influence of a machine learning prediction point fall off gradually with distance. To this end, within the framework of Equations (8)–(11), one or more virtual points can be added at the radius of influence for each grid cell. These virtual points have zero error, so introducing them reduces the weights in Equation (10). With n_{v} virtual points at the influence radius R, the modified weight is expressed as Equation (12). In Equation (12), as the number of virtual points increases, the distance over which the error at each machine learning prediction point has influence decreases. In this study, the number of virtual points was set to 4, following Pun’s research results, and the radius of influence was 50 km.

W_{k} = \frac{1 / r_{k}^{2}}{n_{v} / R^{2} + \sum_{k} 1 / r_{k}^{2}}    (12)
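The three cases of the data assimilation step and the weights of Equations (9)–(12) can be sketched for a single grid cell as follows. This is an illustrative reading of the equations, not the authors' code. The function name `hybrid_concentration`, the planar kilometre coordinates, and the argument layout are assumptions. The defaults of 4 virtual points and a 50 km radius are the values stated in the text.

```python
import numpy as np

def hybrid_concentration(cell_xy, c_q_cell, ml_xy, ml_error,
                         radius_km=50.0, n_virtual=4):
    """Interpolated (hybrid) concentration C_t at one grid-cell center n.

    cell_xy  : (x, y) of the grid center n, in km (assumed planar coordinates)
    c_q_cell : numerical-model concentration C_q at this grid cell
    ml_xy    : (K, 2) coordinates of the machine learning prediction points
    ml_error : (K,) errors E_k from Equation (8)
    """
    ml_xy = np.asarray(ml_xy, dtype=float)
    ml_error = np.asarray(ml_error, dtype=float)
    r = np.linalg.norm(ml_xy - np.asarray(cell_xy, dtype=float), axis=1)

    # Case 1: a prediction point coincides with n, so C_t equals C_m there.
    coincide = r == 0.0
    if coincide.any():
        return float(c_q_cell + ml_error[coincide][0])

    # Case 2: no prediction point inside the influence radius, so C_t = C_q.
    inside = r <= radius_km
    if not inside.any():
        return float(c_q_cell)

    # Case 3: inverse-square weights with virtual zero-error points, Eq. (12).
    inv_r2 = 1.0 / r[inside] ** 2
    weights = inv_r2 / (n_virtual / radius_km ** 2 + inv_r2.sum())
    e_ij = float(weights @ ml_error[inside])   # Eq. (10)
    return float(c_q_cell + e_ij)              # Eq. (9)
```

Setting `n_virtual=0` recovers the plain weights of Equation (11). Applying the function to every cell of the 1 km grid would give the full interpolated field.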

2.3.4. Evaluation Method of Model Performance

To evaluate model performance, we employ four statistical metrics: IOA, RMSE, Normalized Mean Bias (NMB), and Mean Normalized Gross Error (MNGE). IOA measures the agreement between the observed and predicted values and ranges between 0 and 1. A value of 0.5 or greater is considered appropriate, and a value of 1 indicates perfect agreement. RMSE measures the spread of the errors between the observed and predicted values; the closer it is to 0, the more accurate the predictions. NMB measures the bias of the model values relative to the observed values across the observed spatial and temporal patterns. NMB values close to 0 indicate that the model appropriately reflects the observed values. MNGE is a relative error that expresses the absolute error as a percentage of the actual value. These metrics are used to determine the accuracy of the model [43,44,45].

IOA = 1 - \frac{\sum_{i=1}^{n} (P_{i} - O_{i})^{2}}{\sum_{i=1}^{n} (| P_{i} - \bar{O} | + | O_{i} - \bar{O} |)^{2}}    (13)

RMSE (μg/m^{3}) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (P_{i} - O_{i})^{2}}    (14)

NMB (%) = \frac{1}{n} \sum_{i=1}^{n} \frac{P_{i} - O_{i}}{\bar{O}} \times 100    (15)

MNGE (%) = \frac{1}{n} \sum_{i=1}^{n} \frac{| P_{i} - O_{i} |}{\bar{O}} \times 100    (16)

Here, O_{i} denotes the observed value, P_{i} is the predicted value of the model, \bar{O} is the mean of the observed values, and n is the number of observations.
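The four metrics can be computed in a few lines of NumPy. The sketch below is an illustration, not the evaluation code used in the study. The function name `evaluate` is hypothetical. IOA is written in its usual squared-error form, and NMB and MNGE are normalized by the mean observed value, following Equations (15) and (16).

```python
import numpy as np

def evaluate(pred, obs):
    """Return IOA, RMSE, NMB (%), and MNGE (%) for paired predictions and observations."""
    p = np.asarray(pred, dtype=float)
    o = np.asarray(obs, dtype=float)
    o_mean = o.mean()
    diff = p - o
    # Index of agreement: 1 minus squared error over "potential" error.
    ioa = 1.0 - np.sum(diff ** 2) / np.sum((np.abs(p - o_mean) + np.abs(o - o_mean)) ** 2)
    rmse = np.sqrt(np.mean(diff ** 2))
    # Both normalized by the mean observation, as in Equations (15) and (16).
    nmb = np.mean(diff / o_mean) * 100.0
    mnge = np.mean(np.abs(diff) / o_mean) * 100.0
    return {"IOA": float(ioa), "RMSE": float(rmse), "NMB": float(nmb), "MNGE": float(mnge)}

# Hypothetical usage with made-up hourly values in μg/m³.
scores = evaluate(pred=[12.0, 22.0, 32.0], obs=[10.0, 20.0, 30.0])
```

A negative NMB indicates that the model underpredicts on average. The IOA is undefined when all observations and predictions equal the observed mean, since the denominator is then zero.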

3. Results and Discussion

3.1. Result

3.1.1. Prediction Results Using the CMAQ Model

From 1 February 2020 to 29 February 2020, the air quality of the study area was predicted over time using the CMAQ model. For the evaluation of predictive performance, statistical methods are used for the actual measurement data and model results. Table 7 summarizes the verification results of the model predictions and observations over time for PM_2.5. As a result of evaluating the model prediction performance for the vicinity of Busan Port, IOA was 0.600~0.684 and RMSE was 12.26~16.10 μg/m³. The average value (MEAN) for each point using the CMAQ model was 16.0~19.0 μg/m³, which was relatively lower than the actual measurement value (OBS Mean). As a result, it is difficult to say that the prediction using the CMAQ model has high accuracy.

3.1.2. Prediction Results Using the RNN-LSTM Model

Using RNN-LSTM, air quality at Busan North Port and Busan New Port was predicted using not only air quality data and meteorological data but also activity data on pollutant emission sources, while predictions at the other points used only air quality data and meteorological data. To select hyperparameters, 27 configurations were trained for each point, combining 1, 2, and 3 hidden layers; 30, 60, and 120 hidden nodes; and 10, 15, and 20 epochs, and the configuration showing the best performance was selected. Table 8 presents the predictive performance of the best configuration for each point. The time series of the results predicted by the RNN-LSTM model are presented in Figure 7. Over the prediction period from 1 February to 29 February 2020, the predicted values follow a trend similar to the measured values. Figure 8 shows a scatter plot of the predicted values of the RNN-LSTM model and the measured values.

The best performance was obtained with one hidden layer at all points, and with 120 hidden nodes at Busan North Port, Busan New Port, and Myeongseo-dong, 60 at Yongsu-ri and Jwa-dong, and 30 at Sambang-dong. Except for Busan North Port, the best performance was obtained with 20 epochs. The IOA, which represents the degree of agreement between the predicted and measured values, is 0.961 to 0.975, and the RMSE is 4.12 to 5.87 μg/m³, indicating that the RNN-LSTM predictions follow a trend very similar to the measured values.

3.1.3. Prediction Results Using the Hybrid Model

A hybrid model was constructed by combining the CMAQ model and the RNN-LSTM model. The advantage of the hybrid model is that it has a higher spatial distribution prediction performance by combining the spatial distribution prediction ability of CMAQ and the high prediction performance of RNN-LSTM. When using RNN-LSTM in the hybrid model, the point at which the prediction was performed using the RNN-LSTM model should be selected. In this study, 2 points, 4 points, and 6 points were selected, and the prediction results of the RNN-LSTM model at each point were used. In the case of the two points, North Port and New Port of Busan Port were selected. For the four points, Yongsu-ri and Myeongseo-dong were added to the existing two points, and for the six points, Jwa-dong and Sambang-dong were added to the existing four points. Table 9 lists six verification points around Busan Port to evaluate the performance of the hybrid model.

Table 10 summarizes the performance evaluation results for the observed values of atmospheric measurement network points around Busan Port and the predicted values of the CMAQ model and the hybrid model. The index of agreement of the CMAQ model results at the points around Busan North Port (Choryang-dong, Gwangbok-dong, and Gwangan-dong) is 0.633 to 0.634, but that of the hybrid model results is 0.940 to 0.970, indicating that the performance is improved. The index of agreement of the CMAQ model results at the points around Busan New Port (Noksan-dong, Jangnim-dong, and Gyeonghwa-dong) is 0.590 to 0.680, but that of the hybrid model results is 0.888 to 0.928, indicating that the performance is improved.

The RMSE value of the hybrid model at the point around Busan North Port was 4.88 to 6.81 μg/m³, indicating that the error was significantly improved compared to the CMAQ model. In the case of NMB items, the CMAQ model in the vicinity of Busan North Port was −13.6 to 0.9%, which means that the overall CMAQ model result is lower than the observed value, and the hybrid model was −10.3 to 13.9%, showing different characteristics depending on the point. In the case of Busan New Port, the NMB item of the CMAQ model was −39.1 to −9.1%, indicating that the predicted concentration was lower than the actual measured concentration. The hybrid model also showed −26.3 to 3.8%, indicating that the predicted concentration was lower than the observed concentration, except for two cases in Gyeonghwa-dong (Hybrid sites 4 and 6 were applied). In particular, the reason why the value of the NMB item of Busan New Port is significantly lower than that of Busan North Port is that in the case of the CMAQ model, the uncertainty in the emission model, such as unknown emission sources, is large, and in the case of the hybrid model, the expansion of Busan New Port is still in progress, so the number of ships anchored in the port, a major source of pollutants, fluctuates greatly.

Figure 9 shows the time series data between the CMAQ, hybrid model, and observation values for the verification point around Busan Port. Figure 10 shows a scatter plot of the hybrid model. The prediction results of the hybrid model are more similar to the trend of observations than the CMAQ model. The index of agreement also shows that the hybrid model results are similar to the observed values.

Figure 11 shows the spatial distribution of the average air pollution level in February 2020 in the vicinity of Busan Port by reflecting the results of the CMAQ model and the hybrid model. Most of the CMAQ model predicted an average of 16 μg/m³ or less, but in the case of the hybrid model, most areas were predicted to be 16 μg/m³ or more, so the prediction result of the hybrid model showed a spatial distribution closer to the actual measured value.

3.2. Discussion

The CMAQ model is commonly used to predict air quality, and artificial neural networks have also been widely used in recent years. However, CMAQ predictions inherit the uncertainties of the meteorological model and the emission model, which degrades prediction performance. In the vicinity of a port, for example, accuracy is lowered because it is difficult to reflect the meteorological characteristics of the coast accurately in the meteorological model and to estimate emissions accurately when many different sources emit air pollutants. To overcome these limitations of the CMAQ model, this work proposes a hybrid model that combines the RNN-LSTM model and the CMAQ model. To reduce the uncertainty of the meteorological model and the emission model, the RNN-LSTM model was trained on the activity data of the emission source and on meteorological data. To apply the hybrid model, the prediction points of the artificial neural network (ANN) model must lie within the prediction range of the CMAQ model. If they do, air quality at any point within the prediction range of the CMAQ model can be predicted by combining the results of the CMAQ model and the results of the ANN model. The accuracy of the hybrid model is affected by the influence radius R and by the surrounding environment of the ANN model prediction points selected for data assimilation. As R increases, the accuracy of the hybrid model may decrease because distant points as well as nearby points are included in the data assimilation process. Conversely, as R decreases, the hybrid model can become more accurate because only nearby points are included. As for the surrounding environment, if the data assimilation process includes an ANN model prediction point close to an area with heavy pollutant emissions, the predicted concentration of the hybrid model is pushed upward. Conversely, if it includes an ANN model prediction point close to an area with low pollutant emissions, the predicted concentration of the hybrid model is pushed downward.
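The effect of the influence radius R can be illustrated with a small sketch. The weighting function below is a Cressman-type distance weight chosen purely for illustration; it is an assumption and not necessarily the data assimilation scheme of Section 2.3.3. The function name `assimilate`, the coordinates, and all concentration values are hypothetical.

```python
import math

def assimilate(cmaq_value, target_xy, ann_points, radius_km):
    """Correct one CMAQ value using ANN prediction points within radius_km.

    ann_points: list of (x_km, y_km, ann_prediction, cmaq_at_ann_point).
    Weights follow an assumed Cressman-type function of distance.
    """
    num = 0.0
    den = 0.0
    for x, y, ann_pred, cmaq_at_ann in ann_points:
        d = math.hypot(x - target_xy[0], y - target_xy[1])
        if d >= radius_km:
            continue  # outside the influence radius: ignored
        w = (radius_km ** 2 - d ** 2) / (radius_km ** 2 + d ** 2)
        num += w * (ann_pred - cmaq_at_ann)  # ANN-minus-CMAQ increment
        den += w
    if den == 0.0:
        return cmaq_value  # no ANN point in range: keep the CMAQ value
    return cmaq_value + num / den

# Hypothetical setup: one nearby ANN point in a high-emission area and
# one distant ANN point in a low-emission area.
points = [
    (2.0, 0.0, 22.0, 18.0),   # near point, increment +4.0
    (30.0, 0.0, 16.0, 17.5),  # far point, increment -1.5
]
cmaq = 17.0
small_r = assimilate(cmaq, (0.0, 0.0), points, radius_km=10.0)
large_r = assimilate(cmaq, (0.0, 0.0), points, radius_km=50.0)
no_r = assimilate(cmaq, (0.0, 0.0), points, radius_km=1.0)
print(small_r, large_r, no_r)
```

In this toy setup, the small radius uses only the nearby point, while the large radius also admits the distant low-emission point and pulls the corrected value down, which mirrors the qualitative behavior described above.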

The predicted values at 2, 4, or 6 of the six points predicted with the RNN-LSTM model were applied to the data assimilation process of the hybrid model to evaluate its ability to predict the spatial distribution of pollutants in the vicinity of Busan Port. The closer a verification point is to an RNN-LSTM prediction point, the higher the index of agreement and the lower the RMSE of the hybrid model tend to be. As the number of RNN-LSTM prediction points applied to the hybrid model increased, prediction performance increased at the Gwangbok-dong and Gyeonghwa-dong points, but at the other points the best performance was obtained with 4 prediction points. When 6 prediction points were applied, the predicted value of Sambang-dong, which had a fairly low pollution level, was included in the data assimilation process. This value has a large influence and lowers the predicted concentration of the hybrid model at every verification point except Gwangbok-dong and Gyeonghwa-dong, which degraded the prediction performance of the hybrid model. Because Gwangbok-dong and Gyeonghwa-dong have lower pollution levels than the other verification points, the influence of the low predicted values of Sambang-dong was small there. In this paper, the hybrid model shows the best prediction performance with four RNN-LSTM prediction points, but we judge that two points, Busan North Port and Busan New Port, which use meteorological data and the activity data of emission sources, are sufficient to predict air quality in the vicinity of Busan Port. Although performance changes with the number of RNN-LSTM prediction points, the difference is not very large. In every case, the hybrid model is considerably more accurate than CMAQ alone, regardless of how many prediction points are included.
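The comparison across 2, 4, and 6 prediction points can be checked directly against the hybrid model RMSE values in Table 10. The short sketch below only re-reads those published numbers; the variable names are illustrative.

```python
# Hybrid model RMSE (ug/m3) from Table 10, keyed by number of
# RNN-LSTM prediction points used in data assimilation.
hybrid_rmse = {
    "Choryang-dong":  {2: 5.31, 4: 5.18, 6: 6.08},
    "Gwangbok-dong":  {2: 6.81, 4: 6.66, 6: 5.99},
    "Gwangan-dong":   {2: 5.29, 4: 4.88, 6: 5.11},
    "Noksan-dong":    {2: 10.87, 4: 10.78, 6: 11.00},
    "Jangnim-dong":   {2: 9.35, 4: 9.31, 6: 9.81},
    "Gyeonghwa-dong": {2: 7.74, 4: 7.33, 6: 7.17},
}

# Number of prediction points giving the lowest RMSE at each site.
best_count = {site: min(v, key=v.get) for site, v in hybrid_rmse.items()}

# Largest RMSE difference between configurations at each site.
spread = {site: round(max(v.values()) - min(v.values()), 2)
          for site, v in hybrid_rmse.items()}

for site in hybrid_rmse:
    print(f"{site}: best with {best_count[site]} sites, spread {spread[site]}")
```

By this measure, 6 points give the lowest RMSE at Gwangbok-dong and Gyeonghwa-dong and 4 points at the other four sites, and the RMSE differs by less than 1 μg/m³ between configurations at every site.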

4. Conclusions

Various studies have predicted PM_2.5 using either the CMAQ model or an artificial neural network model. In this study, we sought to improve the prediction of the spatial distribution of PM_2.5 through a hybrid model that combines the CMAQ model with an artificial neural network model. Six points around Busan Port were selected and predicted using the CMAQ model. Compared with the observed values, these predictions had an IOA of 0.600 to 0.680 and an RMSE of 12.26 to 15.04 μg/m³. For the same six points, predictions made with RNN-LSTM, an artificial neural network model, had an IOA of 0.961 to 0.975 and an RMSE of 4.12 to 5.87 μg/m³. That is, the artificial neural network outperformed CMAQ. The predicted values at 2, 4, or 6 of the six RNN-LSTM prediction points were then applied to the data assimilation process of the hybrid model to evaluate its ability to predict the spatial distribution of pollutants in the vicinity of Busan Port. With the hybrid model, the IOA improved by 0.235 to 0.317 and the RMSE decreased by 4.82 to 8.50 μg/m³ compared with using CMAQ alone. The experiment thus showed that the hybrid model can improve the prediction of the spatial distribution of PM_2.5.
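The improvement ranges quoted above can be reproduced from the RMSE and IOA columns of Table 10. The sketch below is a simple arithmetic check on those published values; the data structure and variable names are illustrative.

```python
# (RMSE, IOA) from Table 10: CMAQ first, then hybrid with 2, 4, 6 sites.
table10 = {
    "Choryang-dong":  ((13.56, 0.684), [(5.31, 0.967), (5.18, 0.968), (6.08, 0.947)]),
    "Gwangbok-dong":  ((12.27, 0.683), [(6.81, 0.940), (6.66, 0.942), (5.99, 0.949)]),
    "Gwangan-dong":   ((13.30, 0.683), [(5.29, 0.966), (4.88, 0.970), (5.11, 0.965)]),
    "Noksan-dong":    ((19.28, 0.590), [(10.87, 0.895), (10.78, 0.895), (11.00, 0.888)]),
    "Jangnim-dong":   ((17.79, 0.600), [(9.35, 0.917), (9.31, 0.917), (9.81, 0.905)]),
    "Gyeonghwa-dong": ((12.56, 0.680), [(7.74, 0.915), (7.33, 0.925), (7.17, 0.928)]),
}

# RMSE reduction and IOA gain of every hybrid configuration over CMAQ.
rmse_cut = [round(cmaq[0] - hyb[0], 2)
            for cmaq, hybrids in table10.values() for hyb in hybrids]
ioa_gain = [round(hyb[1] - cmaq[1], 3)
            for cmaq, hybrids in table10.values() for hyb in hybrids]

rmse_range = (min(rmse_cut), max(rmse_cut))
ioa_range = (min(ioa_gain), max(ioa_gain))
print("RMSE reduction (ug/m3):", rmse_range)
print("IOA improvement:", ioa_range)
```

The smallest gains occur at Gyeonghwa-dong with 2 sites, and the largest RMSE reduction occurs at Noksan-dong with 4 sites.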

In the future, we plan to study the optimal location and number of artificial neural network prediction points so that the prediction performance of the hybrid model can be improved with minimal resources and effort. In addition, the model will need to be extended to account for pollutant emission sources in the city center, such as vehicles, so that air quality can be predicted there as well as around ports. If air quality is predicted in real time using the hybrid model, the spatial distribution of air pollutants can be calculated more accurately, which will help reduce the various kinds of damage caused by air pollution.

Author Contributions

H.H.: Conceptualization, Methodology, Software, Writing—Original draft preparation. I.C.: Visualization, Investigation, Writing—Original draft preparation. H.J.: Data curation, Software. Y.K.: Methodology, Validation. J.-B.L.: Methodology, Software. C.H.P.: Methodology, Writing—Reviewing and Editing. H.S.K.: Conceptualization, Writing—Reviewing and Editing, Supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the research fund of Chungnam National University.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

This research is based on the findings of the research project “Development of Integrated Decision Support Model for Environmental Impact Assessment Project” (2022-003), which was conducted by the Korea Environment Institute (KEI) and supported by the “ICT-based Decision Support System Development Project for Environmental Impact Assessment” (20200029990007, RE202101139) of the Korea Environmental Industry & Technology Institute (KEITI), funded by the Korea Ministry of Environment (MOE).

Conflicts of Interest

The authors declare no conflict of interest.


Figure 1. Conceptual diagram of the hybrid model system.

Figure 2. Domain area for PM_2.5 prediction. (Red box: Research area).

Figure 3. Locations of Air Quality Monitoring System (AQMS) sites and the Korea Meteorological Administration (KMA) Automated Synoptic Observing System (ASOS) sites in the research area.

Figure 4. Operating procedure of CMAQ model.

Figure 5. Operating procedure of RNN-LSTM.

Figure 6. Structure of data assimilation.

Figure 7. PM_2.5 time series graph for prediction period by site. (a) North Port, (b) New Port, (c) Yongsu-ri, (d) Myeongseo-dong, (e) Jwa-dong, (f) Sambang-dong (time period: 1 February 2020 to 29 February 2020).

Figure 8. Scatter plot by site (optimal prediction result by RNN-LSTM). (a) North Port, (b) New Port, (c) Yongsu-ri, (d) Myeongseo-dong, (e) Jwa-dong, (f) Sambang-dong.

Figure 9. Time series graph for the prediction period by verification points. (a) Choryang-dong, (b) Gwangbok-dong, (c) Gwangan-dong, (d) Noksan-dong, (e) Jangnim-dong, (f) Gyeonghwa-dong (time period: 1 February 2020 to 29 February 2020).

Figure 10. Scatter plot by verification points (prediction result by hybrid model). (a) Choryang-dong, (b) Gwangbok-dong, (c) Gwangan-dong, (d) Noksan-dong, (e) Jangnim-dong, (f) Gyeonghwa-dong.

Figure 11. Monthly average spatial distribution diagram of air pollutants by (a) CMAQ model and (b) Hybrid model with 2 sites, (c) with 4 sites, (d) with 6 sites.

Table 1. CMAQ modeling area and GRID configuration.

	Description
Projection origin	126° E, 38° N
Projection	Lambert conformal conic
Two standard parallels of latitude of projection origin	30°, 60°
Domain name	BPA_27_01	BPA_09_01	BPA_03_01	BPA_01_01
Horizontal resolution (size)	27 km	9 km	3 km	1 km
Horizontal resolution (count)	121 × 128	70 × 82	82 × 76	58 × 58
X-origin	−1,633,500 m	−166,500 m	106,500 m	233,500 m
Y-origin	−1,728,000 m	−585,000 m	−402,000 m	−341,000 m
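The grid configuration in Table 1 can be sanity-checked by computing the extent of each domain and confirming that each finer domain lies inside the coarser one. This sketch assumes that the X- and Y-origins are the lower-left corners of the domains in projected meters and that the grid counts are given as columns × rows; neither assumption is stated in the table.

```python
# (name, cell size in m, columns, rows, x-origin in m, y-origin in m)
domains = [
    ("BPA_27_01", 27_000, 121, 128, -1_633_500, -1_728_000),
    ("BPA_09_01",  9_000,  70,  82,   -166_500,   -585_000),
    ("BPA_03_01",  3_000,  82,  76,    106_500,   -402_000),
    ("BPA_01_01",  1_000,  58,  58,    233_500,   -341_000),
]

def extent(cell, ncols, nrows, x0, y0):
    """Return (xmin, ymin, xmax, ymax) of a domain in projected meters."""
    return (x0, y0, x0 + cell * ncols, y0 + cell * nrows)

def contains(outer, inner):
    """True if the inner extent lies entirely within the outer extent."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])

extents = {name: extent(*rest) for name, *rest in domains}
names = [name for name, *_ in domains]

# Each domain should be nested inside the next coarser one.
nested = all(contains(extents[a], extents[b])
             for a, b in zip(names, names[1:]))

for name in names:
    print(name, extents[name])
print("properly nested:", nested)
```

Under these assumptions, the 27 km domain is symmetric about the projection origin, and the 9 km, 3 km, and 1 km domains are each contained in their parent domain.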

Table 2. Data types and variables of input dataset of CMAQ, LSTM, and Hybrid model.

Model	Input Data Type	Input Data	Output Data
CMAQ	Meteorological	NCEP GFS 0.25 Degree Global Forecast data	Gridded Concentration (PM_2.5, PM₁₀, O₃, etc.)
CMAQ	Emission	PM_2.5, PM₁₀, O₃, NO₂, SO₂, CO	Gridded Concentration (PM_2.5, PM₁₀, O₃, etc.)
RNN-LSTM	Surface Meteorological	Temperature, Dew point, Pressure, Wind speed, Wind direction, Rainfall	PM_2.5
RNN-LSTM	Air quality	PM_2.5, PM₁₀, O₃, NO₂, SO₂, CO	PM_2.5
RNN-LSTM	Emission activity	Anchored ships	PM_2.5
Hybrid Model	CMAQ result	Gridded Concentration (PM_2.5, PM₁₀, O₃, etc.)	Gridded Concentration (PM_2.5)
Hybrid Model	RNN-LSTM result	PM_2.5	Gridded Concentration (PM_2.5)

Table 3. Detailed information about input parameters in the input dataset of RNN-LSTM.

Type	Input Parameters	Timing	Unit
Air quality data	PM_2.5, PM₁₀	every 1 h	µg/m³
Air quality data	SO₂, O₃, NO₂, CO	every 1 h	ppm
Meteorological data	Temperature	every 1 h	°C
Meteorological data	Dew point	every 1 h	°C
Meteorological data	Pressure	every 1 h	hPa
Meteorological data	Wind speed	every 1 h	m/s
Meteorological data	Wind direction	every 1 h	Degree
Meteorological data	Rainfall	every 1 h	mm
Shipping activity data	Anchored ships	every 1 h	ea

Table 4. Candidate and applied values for the hyperparameters of RNN-LSTM.

Type	Condition Values	Applied Values
Optimizer	Adam, Adamax, RMSprop	Adam
Batch size	50, 100, 200	100
Learning rate	0.001, 0.01, 0.1	0.001
Dropout	0.2, 0.3, 0.4	0.2
Loss function	Loss, L2 Loss, L1 Loss	L2 Loss

Table 5. Partitions of the input dataset of RNN-LSTM.

Type	Configuration	Settings
Data partition	Training set	8760
	Validation set	744
	Test set	696
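The partition sizes in Table 5 match the number of hours in three consecutive calendar periods. The sketch below assumes hourly records and infers the period boundaries from the dates given for the learning, verification, and prediction periods; the exact split used by the authors is not stated in the table.

```python
from datetime import datetime

def hours_between(start, end_exclusive):
    """Number of whole hours from start up to, but not including, the end."""
    return int((end_exclusive - start).total_seconds() // 3600)

# Assumed period boundaries (inferred, not stated in Table 5).
training = hours_between(datetime(2019, 1, 1), datetime(2020, 1, 1))
validation = hours_between(datetime(2020, 1, 1), datetime(2020, 2, 1))
test = hours_between(datetime(2020, 2, 1), datetime(2020, 3, 1))

print(training, validation, test)
```

Under this assumption, the training set covers all of 2019, the validation set covers January 2020, and the test set covers February 2020, a leap-year month with 29 days.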

Table 6. Air quality prediction sites (learning and verification period: 1 January 2019 to 31 January 2020; prediction period: 1 February 2020 to 29 February 2020).

Site	Input Data	Hyperparameter
North Port	AQMS ¹ + ASOS ² (Busan) + Anchored Ships Information	Hidden nodes: 30, 60, 120; Hidden layers: 1, 2, 3; Epochs: 10, 15, 20
New Port	AQMS ¹ + ASOS ² (Busan) + Anchored Ships Information
Yongsu-ri	AQMS + ASOS (Yangsansi)
Myeongseo-dong	AQMS + ASOS (Bukchangwon)
Jwa-dong	AQMS + ASOS (Busan)
Sambang-dong	AQMS + ASOS (Gimhaesi)

¹ AQMS: Air Quality Monitoring Station; ² ASOS: Automated Synoptic Observing System.
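The hyperparameter candidates listed in Table 6 define a small search space per site. The sketch below enumerates it with `itertools.product`; whether every combination was actually trained is an assumption, since the table lists only the candidate values. The dictionary keys are illustrative names.

```python
from itertools import product

# Candidate values from Table 6.
hidden_nodes = [30, 60, 120]
hidden_layers = [1, 2, 3]
epochs = [10, 15, 20]

# Every combination of the three candidate lists (assumed full grid).
grid = [
    {"hidden_nodes": n, "hidden_layers": l, "epochs": e}
    for n, l, e in product(hidden_nodes, hidden_layers, epochs)
]

# The optimal setting reported for North Port in Table 8.
north_port_best = {"hidden_nodes": 120, "hidden_layers": 1, "epochs": 15}

print(len(grid), north_port_best in grid)
```

A full grid over these candidates has 27 combinations per site, and the optimal settings reported in Table 8 all fall within it.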

Table 7. Statistical comparison of CMAQ-modeled and AQMS-observed PM_2.5.

Site	OBS Mean (μg/m³)	Model Mean (μg/m³)	NMB (%)	MNGE (%)	RMSE (μg/m³)	IOA
North Port	21.8	19.0	−13.1	56.68	13.63	0.684
New Port	23.1	17.5	−24.1	52.97	16.10	0.612
Yongsu-ri	22.7	17.5	−22.9	35.41	12.26	0.663
Myeongseo-dong	22.2	16.0	−28.0	62.23	15.04	0.616
Jwa-dong	23.1	18.8	−18.7	41.33	14.62	0.600
Sambang-dong	17.5	17.3	−1.4	98.3	11.86	0.646

Table 8. Statistical comparison of RNN-LSTM-modeled and AQMS-observed PM_2.5 (North Port and New Port: AQMS + ASOS + anchored ships; other sites: AQMS + ASOS).

Site Name	Optimal Hidden Nodes	Optimal Hidden Layers	Optimal Epochs	NMB (%)	MNGE (%)	RMSE (μg/m³)	IOA
(a) North Port	120	1	15	2.8	24.28	4.88	0.975
(b) New Port	120	1	20	−3.5	23.79	5.87	0.969
(c) Yongsu-ri	60	1	20	0.9	17.38	4.12	0.974
(d) Myeongseo-dong	120	1	20	1.1	26.39	5.39	0.971
(e) Jwa-dong	60	1	20	−2.1	16.69	4.26	0.974
(f) Sambang-dong	30	1	20	−0.7	28.01	5.07	0.961

Table 9. Verification point information and distance information between port and measurement point.

Port	Verification Site	Latitude	Longitude	Distance from Busan Port (km)
North Port	Choryang-dong	35.12714	129.0467	0.90
	Gwangbok-dong	35.09985	129.0303	3.35
	Gwangan-dong	35.15231	129.1081	5.71
New Port	Noksan-dong	35.08663	128.8639	2.95
	Jangnim-dong	35.08298	128.9668	11.90
	Gyeonghwa-dong	35.15497	128.6896	15.61

Table 10. Statistics of Hybrid-modeled and AQMS-observed PM_2.5 (OBS: Observations, Model: CMAQ or Hybrid model).

Site	Type	OBS Mean (μg/m³)	Model Mean (μg/m³)	NMB (%)	MNGE (%)	RMSE (μg/m³)	IOA
(a) Choryang-dong	CMAQ	21.1	19.3	−8.4	55.89	13.56	0.684
	Hybrid (2 sites)		21.8	3.7	27.03	5.31	0.967
	Hybrid (4 sites)		21.9	4.2	27.08	5.18	0.968
	Hybrid (6 sites)		18.8	−10.3	25.2	6.08	0.947
(b) Gwangbok-dong	CMAQ	19.4	19.4	0.9	83.16	12.27	0.683
	Hybrid (2 sites)		21.7	13.3	44.16	6.81	0.940
	Hybrid (4 sites)		21.8	13.9	44.82	6.66	0.942
	Hybrid (6 sites)		20.7	8.1	41.06	5.99	0.949
(c) Gwangan-dong	CMAQ	22.5	19.5	−13.6	43.64	13.3	0.683
	Hybrid (2 sites)		21.2	−5.7	24.00	5.29	0.966
	Hybrid (4 sites)		21.6	−4.1	22.19	4.88	0.970
	Hybrid (6 sites)		20.6	−8.4	21.04	5.11	0.965
(d) Noksan-dong	CMAQ	29.4	17.8	−39.1	40.19	19.28	0.590
	Hybrid (2 sites)		21.5	−25.6	28.84	10.87	0.895
	Hybrid (4 sites)		21.6	−25.3	28.104	10.78	0.895
	Hybrid (6 sites)		21.7	−26.3	28.33	11.00	0.888
(e) Jangnim-dong	CMAQ	28.7	19.9	−30.4	41.80	17.79	0.600
	Hybrid (2 sites)		21.9	−23.2	28.13	9.35	0.917
	Hybrid (4 sites)		22.0	−23.0	27.56	9.31	0.917
	Hybrid (6 sites)		21.3	−25.4	28.72	9.81	0.905
(f) Gyeonghwa-dong	CMAQ	18.2	16.4	−9.7	111.86	12.56	0.680
	Hybrid (2 sites)		17.7	−2.4	64.23	7.74	0.915
	Hybrid (4 sites)		18.4	1.5	66.12	7.33	0.925
	Hybrid (6 sites)		18.8	3.8	68.27	7.17	0.928

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

