An Efficient Method for Capturing the High Peak Concentrations of PM 2.5 Using Gaussian-Filtered Deep Learning

This paper proposes a deep learning model that integrates a convolutional neural network (CNN) with a gated recurrent unit (GRU) to capture patterns of high-peak PM2.5 concentrations. The purpose is to accurately predict high-peak PM2.5 concentrations, which general deep learning models fail to learn. To train the proposed model, we used all available weather and air quality data for the three years from 2015 to 2017 from 25 stations of the National Institute of Environmental Research (NIER) and the Korea Meteorological Administration (KMA) observatory in Seoul, South Korea. The model was trained on these three years of data and predicted high-peak PM2.5 concentrations for 2018. In addition, we propose a Gaussian filter algorithm as a preprocessing method for capturing high concentrations of PM2.5 in the Seoul area and predicting them more accurately. This model overcomes the limitations of conventional deep learning approaches, which are unable to predict high-peak PM2.5 concentrations. Comparing model predictions with measurements at each of the 25 monitoring sites in 2018, we found that the deep learning model with a Gaussian filter achieved an index of agreement of 0.73–0.89 and a proportion of correctness (POC) of 0.89–0.96. Compared to the conventional deep learning method (average POC = 0.85), the Gaussian filter algorithm (average POC = 0.94) improved the accuracy of high-concentration PM2.5 prediction by an average of about 9%. Applied in the preprocessing stage, this algorithm could be extended to predict the risk of high PM2.5 concentrations in real time.
filter to preprocess the PM2.5 concentration data, we were able to overcome the limitations of existing deep learning models, which consider high-peak PM2.5 concentration data as outliers and exclude them from training.
To train the data, we applied a CNN/GRU deep learning model with a Gaussian filter, which produced slightly more accurate results than a CNN model without a Gaussian filter or a CNN/GRU model without a Gaussian filter. After we applied the CNN/GRU to train the high-peak PM 2.5 concentration data preprocessed with a Gaussian filter, improvement in the accuracy of predictions approached 0.93. We used CNN to extract features of the data and GRUs to


Introduction
The World Health Organization (WHO) guidelines for ensuring sufficient air quality recommend that concentrations of particulate matter with diameters of 2.5 µm and smaller (PM2.5) and PM10 should not exceed 24-h average concentration thresholds of 15 µg/m³ and 45 µg/m³, respectively [1]. A close, quantitative relationship exists between exposure to high concentrations of small particulates (PM10 and PM2.5), both daily and over time, and increased mortality or morbidity [1]. Actual recorded hourly peak concentrations frequently exceed the thresholds recommended by the WHO. An average hourly PM concentration at or above the WHO threshold can contribute to adverse health effects [2][3][4][5].
To assess the acute health effects of ambient air pollution, most studies have used average daily concentrations as exposure variables. The average daily exposure to PM2.5, however, does not reflect how exposure varies throughout the day, nor does it account for the difference between high and low concentration levels. Therefore, examining the daily peak concentrations of air pollution, rather than the average daily concentrations, may better reflect the biological mechanisms triggered by high-level exposure to pollutants [6]. Although hourly concentrations, which reveal daily peaks, may be a more accurate indicator of exposure than daily averages, few studies have examined the health effects of exposure to hourly peak concentrations of air pollution. Existing studies, as well as this one, indicate the importance of determining peak hourly concentrations of air pollution [7,8].
The geographical characteristics of East Asia, which contribute to an inflow of external air pollutants, significantly affect the overall concentrations of air pollutants in South
weather conditions that affect concentrations of PM2.5. We used time-averaged wind speed (m/s), wind direction (0–360°), temperature (°C), relative humidity (%), dew point temperature (°C), surface pressure (hPa), and precipitation (mm) obtained from the Korea Meteorological Administration (KMA), together with measurement data from 2015 to 2017, to train a model that predicts hourly PM2.5 concentrations in 2018. Where measurement data were missing, we applied the K-nearest neighbor (KNN) imputation method [27], which selects the k spatially closest samples in the given data and imputes each missing datum with the mean of those k neighbors.
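As a minimal sketch of the KNN imputation described above, the following example fills a missing reading at one station with the mean of the k nearest stations that reported a value for the same hour. The function name `knn_impute`, the planar station coordinates, and the sample values are assumptions made for illustration; they are not taken from the study.

```python
import math

def knn_impute(values, coords, k=3):
    """Fill None entries with the mean of the k nearest stations that have data.

    values: readings for a single hour (float or None), one per station.
    coords: (x, y) station positions in the same order (hypothetical units).
    """
    filled = list(values)
    for i, v in enumerate(values):
        if v is not None:
            continue
        # Rank the stations that do have a reading by distance to station i.
        neighbours = sorted(
            (math.dist(coords[i], coords[j]), values[j])
            for j, w in enumerate(values)
            if w is not None
        )
        nearest = [val for _, val in neighbours[:k]]
        if nearest:  # leave the gap untouched if no station reported a value
            filled[i] = sum(nearest) / len(nearest)
    return filled

coords = [(0, 0), (1, 0), (0, 1), (5, 5)]
hourly_pm25 = [20.0, None, 30.0, 80.0]
print(knn_impute(hourly_pm25, coords, k=2))  # [20.0, 25.0, 30.0, 80.0]
```

The distant station at (5, 5) is ignored for k = 2, so its high reading does not distort the imputed value.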
We defined the hourly peak concentration of PM2.5 as the maximum concentration of PM2.5 in 24 h on a given day. We also used data from these stations to calculate the daily average and daily maximum concentrations of PM2.5, NO2, SO2, and O3.
We obtained the daily average and maximum concentrations of PM 2.5 for 25 atmospheric monitoring stations in Seoul, shown in Figure 1, each separated from the next by a distance between 1 km and 5 km. We collected air samples from remote stations far from high-traffic areas, industrial resources, buildings or residential sources of coal, waste, or petroleum combustion, which we considered representative of the air in the surrounding area in which the stations are located. As PM2.5 is small in quantity but constitutes a large specific surface area, it easily absorbs various heavy metals and harmful air pollutants. Therefore, when PM2.5 penetrates deep into the respiratory tract of humans, it attaches to the lung tissue and causes respiratory diseases. As heavy metals contained in PM2.5 are absorbed into blood vessels and cause strokes or cardiovascular disease, it is also associated with an increase in mortality [28]. Therefore, we required a more accurate method of predicting the maximum concentrations of PM2.5, which could contribute to mitigating the adverse effects on human health.

Correlation between Maximum Daily Concentrations of PM2.5 and Traffic Volume, the Factory Area, and Population Density
We began this study by investigating the causes of high peaks of PM2.5. Most studies have identified the sources of high-concentration fine dust in Korea as long-range pollutants transported from abroad and domestic emissions. Concentration levels also depend on climatic factors [13,14]. For example, fine dust and ultrafine dust originating from outside of Korea mainly flow into the country during the spring and the winter, when heating for cold weather and northwesterly winds are frequent [29,30]. Although the contribution of foreign fine dust to domestic concentrations varies depending on the season or air quality modeling conditions, it has been reported to be approximately 30% or more [10,30].
As shown in Figure 1, the stations with the highest daily maximum concentrations of PM2.5 in Seoul were #221, #231, #241, and #251.
According to several studies, the effects of fine dust from domestic emissions in South Korea vary by sources, such as thermal power plants, industrial combustion, and road/non-road pollution sources [31][32][33]. This paper investigates the correlation between the traffic volume, the number of factories, and the population density in the Seoul area as factors affecting daily maximum concentrations of PM 2.5 in the Seoul metropolitan area.
As shown in Figure 2a, the area with the highest population density is #273 in southeastern Seoul. Not surprisingly, the daily maximum concentrations of PM2.5 in #273 are higher than those in other regions; this area, however, does not have the highest daily maximum concentrations. The area is a representative suburban residential community in Seoul, and although it is administratively home to a large number of residents, most of them commute to other areas. They travel to other areas between 9 a.m. and 7 p.m. and typically return after 7 p.m. In addition, the population density in areas #221, #231, #241, and #251, where the maximum daily concentrations of PM2.5 are high, is relatively low compared to that in the other areas. As shown in Figure 2b, the area with the largest daily average traffic volume is southeastern Seoul, and the station with the highest traffic volume is #261 (as shown in Figure 1), where the daily maximum concentrations of PM2.5 are higher than in other areas but not the highest. Figure 2c shows the distribution of factories in Seoul. Areas with a large number of factories are #221, #231, and #281, and as shown in Figure 1, these areas have relatively high daily maximum concentrations of PM2.5 compared to others in Seoul. They are located in the center of Seoul, home to a concentration of industries. Figure 2c, however, displays only factory locations within the Seoul metropolitan area, not those outside of Seoul. As shown in Figure 2d, a major factory area outside of Seoul is concentrated adjacent to the sea, southwest of Seoul, thus calling for an investigation of air pollutants in factory areas outside of Seoul.
Since this study used only air quality data for the Seoul metropolitan area instead of analyzing direct air quality data from outside of Seoul, we investigated data on wind directions at five stations (from among the 25) in the areas that registered the highest concentrations of daily PM 2.5 and the five that recorded the lowest.
In our predictions of the daily maximum concentrations of PM2.5 at the 25 stations in Seoul, we included wind direction information at each station (Figure 3). Figure 3a shows the five stations with the lowest daily maximum concentrations of PM2.5 in Seoul, where most winds are easterly or westerly. Figure 3b, however, shows that most winds at the five stations with the highest daily maximum concentrations of PM2.5 in Seoul are southerly and southwesterly. These observations show that air pollutants from factory areas, concentrated in the southwestern part of Seoul, influence the maximum daily concentration of PM2.5 in Seoul.

Materials and Methods
In general, when predicting future data using deep learning approaches such as DNN, CNN, and LSTM, researchers remove outliers, such as maximum and minimum values, during the learning process because a feature of deep learning enables it to learn the general trend of data, not specific data such as outliers [19]. In this paper, we used a deep learning method to predict the maximum concentrations of PM 2.5 , but after extracting only outliers, not general data, we identified the outlier values and predicted the maximum concentrations of PM 2.5 from the predicted outlier values.

Outlier Extraction of PM 2.5 Data Using a Gaussian Filter
The Gaussian filter, a widely used tool in image processing, produces a smooth image by removing noise from the original image, softening abrupt changes in pixel color [34]. The filter gives neighboring pixels greater weight the closer they are to the current pixel and less weight the farther away they are. Because the Gaussian filter smooths an image spatially, nearby pixels end up with similar values. Since a noise value has relatively little correlation with the values of neighboring pixels, it can be mitigated by a weighted average of neighboring pixel values. Using this characteristic of the Gaussian filter, it is possible to capture the pattern of daily maximum concentrations of PM2.5.
The idea of the Gaussian filter algorithm applied in this study was to use the distribution of PM2.5 data as a point-spread function, applied by convolution [26]. Before performing the convolution, however, we needed a discrete approximation to the Gaussian function. In theory, the Gaussian distribution is nonzero everywhere, which would require an infinitely large convolution kernel; in practice, it is effectively zero beyond three standard deviations from the mean, so we could truncate the kernel at this point. Since the PM2.5 data used in this study are one-dimensional, the Gaussian distribution was as follows:

$$G(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^{2}}{2\sigma^{2}}} \qquad (1)$$

where σ is the standard deviation of the distribution. Figure 4 shows the convolution kernel of the Gaussian approximation, with σ set to a value appropriate for the PM2.5 data. Applying the Gaussian filter shown in Figure 4b to the PM2.5 data in Figure 4a yielded the result shown in Figure 4c. By subtracting the filtered data from the original PM2.5 data, we obtained the pattern for the high peaks of PM2.5 shown in Figure 4d, which compares the original PM2.5 data with the outlier high-peak PM2.5 data obtained through the Gaussian filter.
Figure 5 compares the high-peak PM2.5 data with the original PM2.5 data after applying a Gaussian filter to the daily PM2.5 data in Seoul. It shows high accuracy for high-peak data of 76 µg/m³ or more, the threshold associated with the WHO guideline in this study. Using the Gaussian filter, we preprocessed data that capture the high-peak pattern of daily maximum concentrations of PM2.5 for deep learning.
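The preprocessing described above (truncate the Gaussian kernel at three standard deviations, convolve, then subtract the smoothed series from the original) can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the function names, the value σ = 1, the edge handling (repeating boundary values), and the sample series are all assumptions.

```python
import math

def gaussian_kernel(sigma):
    """Discrete 1-D Gaussian truncated at 3 standard deviations, normalised to sum to 1."""
    radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def smooth(series, sigma):
    """Convolve the series with the kernel, repeating edge values at the boundaries."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(series)
    out = []
    for i in range(n):
        acc = 0.0
        for offset, w in enumerate(kernel, start=-radius):
            j = min(max(i + offset, 0), n - 1)  # clamp index (edge padding is an assumption)
            acc += w * series[j]
        out.append(acc)
    return out

def peak_residual(series, sigma):
    """Original minus smoothed series: large positive values mark sharp peaks."""
    return [x - s for x, s in zip(series, smooth(series, sigma))]

daily_max = [30.0, 32.0, 31.0, 95.0, 33.0, 29.0, 30.0]  # made-up values in µg/m³
residual = peak_residual(daily_max, sigma=1.0)
print(max(range(len(residual)), key=residual.__getitem__))  # index of the sharpest peak: 3
```

Normalising the truncated kernel keeps a flat series unchanged by the filter, so the residual is close to zero everywhere except around abrupt peaks.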

Deep Learning Architecture
This study proposes an architecture that extracts the characteristics of the input data using a CNN deep learning model and then learns the time-sequence characteristics of those features using a GRU deep learning model. To capture high-peak PM2.5 data, we fed the Gaussian-filtered PM2.5 data to the neural network along with weather and air pollutant data.
In the CNN model used in this paper, each layer extracted features of the input received from the previous layer and output them as the input of the next layer. Thus, neurons in each layer could extract features from inputs and then combine them with features in subsequent layers. The combined output became the convolution results, to which the activation function was applied. The convolution results had the same features as other instances of the time series, and multiple features could be extracted from each instance [35]. This study, however, did not use a pooling layer that uses local averaging to reduce the output sensitivity.
PM2.5 concentration prediction using a conventional CNN or CNN/GRU removes information about peak concentrations, which is essential for predicting PM2.5 concentrations [7]. The PM2.5 data used in this study, however, were not the typical hourly PM2.5 data gathered 24 h a day but the result of preprocessing the maximum concentration data of PM2.5 using a Gaussian filter. This approach therefore solves the problem of the existing CNN or CNN/GRU algorithms, from which the maximum concentration data were removed. As the kernel size (convolution window) that we used in the CNN model was 2 × 1, the first layer convolved inputs from two consecutive hours. We then applied the activation function, which in this paper was ReLU, to the results of the convolution operation. The proposed CNN/GRU architecture connected the final output of the CNN model to the input of the GRU model, so that the GRU model could use the features learned by the CNN. Because a CNN processes all inputs and outputs independently, it may not generate sufficiently accurate predictions when previous information, as in a time series, is required. Combining RNN models with CNN models therefore allowed us to use previous information as an input, resulting in more accurate predictions [7,25,36]. An RNN, however, cannot reset internal state information that affects subsequent observations. To overcome this shortcoming of the RNN, we used a GRU [7,23].
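To illustrate how a GRU can reset internal state, the following sketch implements a single scalar GRU update. It is a generic textbook formulation, not the Keras layer used in the study; the parameter names in the dictionary are hypothetical, and references differ on which side of the final blend the update gate multiplies.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gru_step(x, h_prev, p):
    """One scalar GRU update; p holds hypothetical gate weights and biases."""
    z = sigmoid(p["wz"] * x + p["uz"] * h_prev + p["bz"])  # update gate
    r = sigmoid(p["wr"] * x + p["ur"] * h_prev + p["br"])  # reset gate
    # The reset gate scales how much of the old state enters the candidate.
    h_cand = math.tanh(p["wh"] * x + p["uh"] * (r * h_prev) + p["bh"])
    # The update gate blends the old state with the candidate state.
    return z * h_prev + (1.0 - z) * h_cand

# Strongly negative gate biases close both gates, so the old state is discarded.
forget = {"wz": 0.0, "uz": 0.0, "bz": -50.0, "wr": 0.0, "ur": 0.0, "br": -50.0,
          "wh": 1.0, "uh": 1.0, "bh": 0.0}
print(gru_step(0.5, 0.9, forget), gru_step(0.5, -0.9, forget))
```

With both gates closed, the two calls return nearly identical values even though the previous states differ, which is the reset behavior a plain RNN lacks.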
The last layer of the proposed CNN/GRU deep learning architecture was the output layer, and the predicted values of high-peak PM2.5 concentrations became the final output. Figure 6 illustrates the structure of a neuron with n inputs (expressed as $x_1, x_2, \ldots, x_n$). $W_i$ denotes the weight of the ith input connected to the next layer. Each input $x_i$ was combined with its weight $W_i$, as shown in Figure 6, and when the computed value reached a threshold, it was converted to a specific value using an activation function, denoted f(x) in Figure 6. A neural network must have an activation function to maintain nonlinearity with respect to the computed results of input features and weights. In our proposed CNN/GRU model, we used ReLU, defined by Equation (2):

$$f(x) = \max(0, x) \qquad (2)$$
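The following sketch shows one convolution filter with a window of two consecutive hours followed by ReLU, the combination described for the first CNN layer. The weights, the absence of padding, and the sample series are assumptions chosen for illustration; in the trained model the weights are learned.

```python
def relu(x):
    """ReLU activation: negative values are clipped to zero."""
    return max(0.0, x)

def conv1d_relu(series, weights, bias=0.0):
    """Slide a window of len(weights) over the series, take the weighted sum, apply ReLU."""
    k = len(weights)
    return [
        relu(sum(w * v for w, v in zip(weights, series[i:i + k])) + bias)
        for i in range(len(series) - k + 1)
    ]

hourly = [10.0, 12.0, 40.0, 35.0, 11.0]  # made-up hourly values
print(conv1d_relu(hourly, weights=[-1.0, 1.0]))  # [2.0, 28.0, 0.0, 0.0]
```

With these hand-picked weights the filter responds to hour-to-hour increases, so the sharp rise from 12 to 40 produces the largest activation while decreases are clipped to zero.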
We used the results of applying the convolution operation and activation function of the first CNN layer as input to the second layer. In the same way, we used the results of the second layer as input to the third layer. As a result, our proposed CNN/GRU model had a four-layer CNN, each layer having 32 filters (activated by ReLU). Limiting the number of layers to four avoided both longer training time and overfitting. Conversely, using fewer than four layers would reduce accuracy because it limits feature extraction. Therefore, we found four CNN layers to be optimal. We used the feature map obtained via the four CNN layers as input to three GRU layers with 64 filters, as shown in Figure 7. Once the operation of the last GRU layer ended, a high-peak PM2.5 concentration prediction value was obtained through a fully connected hidden layer with 256 nodes. We used the Keras library with the TensorFlow backend to implement the model [37].

Model Training and Prediction
The CNN/GRU model proposed here needed to convert both air pollutant and meteorological data obtained from the KMA station into a format that the CNN/GRU model could understand. Table 1 lists air pollutant and meteorological data from 00:00 on 1 January 2015 to 23:00 on 31 December 2017. As observed data often do not exist for various reasons (e.g., a power outage or system error), the next step was to impute these missing values. If values were left without replacing them, properly training the neural network model would have been virtually impossible. Thus, after replacing nonexistent values using the KNN imputation method, we normalized each input value.
We sorted the data by input and output features, as shown in Figure 8. Each row contained a day and each column an hour of the day. For example, the first 24 columns held 24 h of wind speed, and the next 24 columns held 24 h of wind direction. The current 24-h observation data constituted the input features. In addition, the high-peak PM2.5 data preprocessed with the Gaussian filter were included. The output feature for each day was the maximum PM2.5 concentration on the next day. Once the model was trained, the maximum of PM2.5 on the next day (2 January 2018) was the predicted high-peak value of PM2.5.
To predict high-peak PM2.5 concentrations on a particular day, we needed a training set of input and output features from the previous day. For example, to predict the high-peak PM2.5 concentration on the nth day, we used the input data of the (n−1)th day. In addition, to predict the high-peak PM2.5 concentration for the entire year of 2018, we used SO2, CO, O3, NO2, and preprocessed PM2.5 for the three years from 2015 to 2017; and to model the time series characteristics of high-peak PM2.5 concentrations, we used hourly observations of SO2, CO, O3, NO2, and PM2.5 (recorded by NIER). We also used several meteorological variables that affect air pollutant concentrations, such as wind speed, humidity, and wind direction.
In this study, after training on three years of data from 2015 to 2017, high-peak PM2.5 concentrations for the next 24 h were predicted for each date in 2018.
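The row layout described in this section (24 columns per variable for day n−1, paired with the maximum PM2.5 of day n) can be sketched as follows. The function name, the dictionary-based input format, and the alphabetical column order are assumptions for illustration; the Gaussian-filtered peak series could be supplied as one more entry in the dictionary.

```python
def build_samples(hourly_features, hourly_pm25):
    """Pair each day's 24-h inputs with the next day's maximum PM2.5.

    hourly_features: dict mapping a variable name to a flat list of hourly
                     values (length 24 * n_days).
    hourly_pm25:     flat list of hourly PM2.5 values of the same length.
    """
    n_days = len(hourly_pm25) // 24
    names = sorted(hourly_features)  # fixed column order across rows
    inputs, targets = [], []
    for day in range(n_days - 1):  # the last day has no "next day" target
        start, end = day * 24, (day + 1) * 24
        row = []
        for name in names:
            row.extend(hourly_features[name][start:end])  # 24 columns per variable
        inputs.append(row)
        targets.append(max(hourly_pm25[end:end + 24]))  # next day's peak
    return inputs, targets

# Three days of made-up hourly data for two variables.
features = {"wind_speed": [1.0] * 72, "wind_direction": [180.0] * 72}
pm25 = [float(h % 24 + 100 * (h // 24)) for h in range(72)]
X, y = build_samples(features, pm25)
print(len(X), len(X[0]), y)  # 2 48 [123.0, 223.0]
```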

Results and Discussion
We evaluated the performance of the model that uses a Gaussian filter to predict the maximum concentrations of PM2.5, training it on data from 2015 to 2017 for 25 stations located in the Seoul area. Existing studies predicting PM2.5 concentrations have used PM2.5 concentration data as one of the inputs [16,17,38]. In contrast, this study preprocessed PM2.5 with a Gaussian filter to capture the daily maximum concentrations and used the captured high-peak data as one of the inputs. The model then generated predictions of maximum PM2.5 concentrations for the entire year of 2018, which we compared to field measurements. To evaluate the hourly performance of the model, we used the index of agreement (IOA), and to evaluate the daily performance based on the daily maximum PM2.5, we used categorical statistics [39,40]. Figure 9 shows the IOA for the hourly prediction of PM2.5 concentrations generated by the three deep learning models we tested. The blue line represents the CNN model applied to the PM2.5 data from the 25 stations in Seoul; in most cases, the average IOA of each station did not exceed 0.8. The orange dotted line shows the results of the CNN/GRU model, which can learn the time-dependent information of PM2.5. The average IOA of each station was higher than that of the CNN but, in most cases, still did not exceed 0.8. The green line shows the results obtained when the maximum daily PM2.5 concentration was preprocessed with a Gaussian filter and then used to train the CNN/GRU; it reflects the performance of the model when it is trained on the captured high peaks of PM2.5. Thus, to accurately predict high-peak PM2.5 concentrations, we recommend using the GRU, which learns temporal information, together with the CNN. In addition, high peaks of PM2.5 could not be accurately predicted with the daily PM2.5 data alone. Therefore, preprocessing the maximum concentration pattern with a Gaussian filter is an effective approach to learning the data.
To compare the prediction results of high-peak PM2.5 concentrations with and without a Gaussian filter in more detail, we selected the four stations with the largest high-peak PM2.5 concentrations (#221, #231, #241, #251) from the 25 Seoul observation stations. Figure 10 displays the high-peak PM2.5 concentration values predicted for 2018 by the CNN model without the Gaussian filter at the four selected stations. We used the CNN because it is the most common model used to predict PM2.5 concentrations. When PM2.5 concentrations increased rapidly, however, the prediction accuracy for the high-peak concentrations was low.
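The index of agreement used for the hourly evaluation can be computed as sketched below. The sketch assumes the commonly used Willmott form, IOA = 1 − Σ(P − O)² / Σ(|P − Ō| + |O − Ō|)², where O are observations, P are predictions, and Ō is the observed mean; the text does not state which variant was applied, so this form is an assumption.

```python
from typing import Sequence

def index_of_agreement(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Willmott's index of agreement (assumed variant); 1.0 is a perfect match."""
    if len(obs) != len(pred) or len(obs) == 0:
        raise ValueError("obs and pred must be non-empty and of equal length")
    mean_obs = sum(obs) / len(obs)
    # Numerator: sum of squared prediction errors.
    squared_error = sum((p - o) ** 2 for o, p in zip(obs, pred))
    # Denominator: "potential error" around the observed mean.
    potential_error = sum(
        (abs(p - mean_obs) + abs(o - mean_obs)) ** 2 for o, p in zip(obs, pred)
    )
    if potential_error == 0.0:
        # Every value equals the observed mean, so the error is zero too.
        return 1.0
    return 1.0 - squared_error / potential_error
```

A prediction that always returns the observed mean scores 0 under this definition, which is why a model that smooths away the peaks is penalized.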
Figure 11 shows the predictions of the CNN/GRU model without Gaussian filtering for high-peak PM2.5 concentration data exceeding the threshold of 76 µg/m³, recommended by the WHO, at the four selected stations in the Seoul area. Although these results were slightly more accurate than those presented in Figure 10, their accuracy was not high. For example, for station #231, the result of predicting high-peak PM2.5 concentrations was 0.67 using the CNN model without a Gaussian filter and 0.71 using the CNN/GRU model without a Gaussian filter. In contrast, the result of the CNN/GRU model using a Gaussian filter was 0.82. Without a Gaussian filter, both the CNN and CNN/GRU models produced poor predictions of high-peak PM2.5 concentrations above the threshold of 76 µg/m³.
In this paper, we proposed a method of using outliers as input data to overcome the problem of generalization, a characteristic of deep learning. To use outliers as input data, we preprocessed the data using a Gaussian filter to extract only the outliers, that is, the maximum concentration values per day, from the general PM2.5 data. Using the preprocessed high-peak PM2.5 data as input, we trained a CNN/GRU deep learning model, which produced slightly more accurate predictions than the CNN model. Figure 12 displays the values predicted by the CNN/GRU after training on high-peak PM2.5 concentration data preprocessed with a Gaussian filter.
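One plausible reading of this preprocessing step is sketched below: smooth the hourly series with a normalized Gaussian kernel, then keep each day's maximum. This section does not give the filter parameters or the exact extraction rule, so the kernel width `sigma`, the 3-sigma window, the edge handling, and the function names are all assumptions rather than the authors' algorithm.

```python
import math
from typing import List, Sequence

def gaussian_kernel(sigma: float, radius: int) -> List[float]:
    """Discrete Gaussian weights on [-radius, radius], normalized to sum to 1."""
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma))
               for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_smooth(series: Sequence[float], sigma: float = 2.0) -> List[float]:
    """Convolve the series with a Gaussian kernel (sigma is an assumed value)."""
    radius = max(1, math.ceil(3 * sigma))  # assumed 3-sigma window
    kernel = gaussian_kernel(sigma, radius)
    n = len(series)
    smoothed = []
    for i in range(n):
        acc = 0.0
        for j, weight in enumerate(kernel):
            # Clamp indices at the edges so the first/last value is repeated.
            idx = min(max(i + j - radius, 0), n - 1)
            acc += weight * series[idx]
        smoothed.append(acc)
    return smoothed

def daily_peaks(series: Sequence[float], hours_per_day: int = 24) -> List[float]:
    """Maximum value of each complete day in an hourly series."""
    return [max(series[start:start + hours_per_day])
            for start in range(0, len(series) - hours_per_day + 1, hours_per_day)]
```

Calling `daily_peaks(gaussian_smooth(hourly_pm25))` on an hourly list then yields one high-peak value per day, which is the kind of series the text describes feeding into the CNN/GRU.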
The results in Figure 12 exhibit a high prediction accuracy for high-peak PM2.5 concentration data above the threshold of 76 µg/m³, recommended by the WHO. Even though the same CNN/GRU method was used, the predictions of high-peak PM2.5 concentrations using a Gaussian filter were more accurate than those of the CNN (Figure 10) and the CNN/GRU (Figure 11) without the filter. Table 2 shows the IOA, the r-value, and the mean absolute error (MAE) of the high-peak PM2.5 concentration predictions for each of the 25 stations in Seoul generated by the CNN, the CNN/GRU, and the CNN/GRU with a Gaussian filter.
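For reference, the two other statistics reported alongside the IOA can be computed as in the following sketch. It assumes that the r-value is the Pearson correlation coefficient between observations and predictions, which the text does not state explicitly; the function names are illustrative.

```python
import math
from typing import Sequence

def mean_absolute_error(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Average absolute difference between predictions and observations."""
    if len(obs) != len(pred) or len(obs) == 0:
        raise ValueError("obs and pred must be non-empty and of equal length")
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)

def pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient (assumed meaning of the r-value)."""
    if len(obs) != len(pred) or len(obs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_o = sum(obs) / len(obs)
    mean_p = sum(pred) / len(pred)
    covariance = sum((o - mean_o) * (p - mean_p) for o, p in zip(obs, pred))
    var_o = sum((o - mean_o) ** 2 for o in obs)
    var_p = sum((p - mean_p) ** 2 for p in pred)
    if var_o == 0.0 or var_p == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / math.sqrt(var_o * var_p)
```

The r-value measures only whether predictions rise and fall with the observations, whereas the MAE reports the error in concentration units, so the two are best read together.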
We also evaluated the performance of the model by daily maximum values. To do so, we used categorical statistics [39,40] and divided pairs of observations and predictions as follows:

1. N_a, the number of days when the observation was below the threshold and the prediction was above it.
2. N_b, the number of days when both the observation and the prediction were above the threshold.
3. N_c, the number of days when both the observation and the prediction were below the threshold.
4. N_d, the number of days when the observation was above the threshold and the prediction was below it.
After categorizing the observations and predictions, we defined the following metrics: the hit rate (HIT), which represented the capability of the model to correctly forecast an extreme event (i.e., an event that exceeded the threshold of 76 µg/m³); the false alarm rate (FAR), which represented the times when the model falsely forecasted an extreme event; the equitable threat score (ETS), which defined the skill of the model on a scale of −1 to 1, with 1 indicating that the model was skillful; and the proportion of correctness (POC), which defined the times the model correctly predicted whether an event occurred (both exceedances and non-exceedances).
High-peak PM2.5 concentrations were predicted by the deep learning model based on the CNN/GRU architecture, with the high-peak PM2.5 concentrations preprocessed using a Gaussian filter as input data. Figure 13 presents the POC, the proportion of correct predictions of whether an event exceeded the threshold of 76 µg/m³, for each of the 25 stations in Seoul. In most cases, the POC values exceeded 0.93, and stations #221, #231, #241, and #251, which had low IOAs (predictive accuracy) for high-peak PM2.5 concentrations, also showed POC values close to 0.95.
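The four categorical metrics can be computed from the day counts as in the sketch below. The formulas are the standard contingency-table definitions and are assumed here, since the equations are not reproduced in the text: HIT = N_b/(N_b + N_d), FAR = N_a/(N_a + N_b), POC = (N_b + N_c)/N, and ETS = (N_b − N_r)/(N_a + N_b + N_d − N_r) with N_r = (N_a + N_b)(N_b + N_d)/N and N the total number of days. The fourth count is written N_d, and a value counts as an exceedance only when it is strictly above the threshold, which is also an assumption.

```python
from typing import Dict, Sequence

THRESHOLD = 76.0  # µg/m³, the exceedance threshold quoted in the text

def contingency_counts(obs: Sequence[float], pred: Sequence[float],
                       threshold: float = THRESHOLD) -> Dict[str, int]:
    """Count days in each observation/prediction category."""
    counts = {"n_a": 0, "n_b": 0, "n_c": 0, "n_d": 0}
    for o, p in zip(obs, pred):
        obs_exceeds, pred_exceeds = o > threshold, p > threshold
        if pred_exceeds and not obs_exceeds:
            counts["n_a"] += 1  # false alarm
        elif pred_exceeds and obs_exceeds:
            counts["n_b"] += 1  # correctly predicted exceedance
        elif not pred_exceeds and not obs_exceeds:
            counts["n_c"] += 1  # correctly predicted non-exceedance
        else:
            counts["n_d"] += 1  # missed exceedance
    return counts

def _ratio(numerator: float, denominator: float) -> float:
    """Return NaN instead of raising when a score is undefined."""
    return numerator / denominator if denominator != 0 else float("nan")

def categorical_scores(counts: Dict[str, int]) -> Dict[str, float]:
    """HIT, FAR, ETS, and POC from the four day counts (assumed formulas)."""
    a, b, c, d = counts["n_a"], counts["n_b"], counts["n_c"], counts["n_d"]
    total = a + b + c + d
    # Number of hits expected by chance, used by the ETS.
    random_hits = _ratio((a + b) * (b + d), total)
    return {
        "HIT": _ratio(b, b + d),
        "FAR": _ratio(a, a + b),
        "ETS": _ratio(b - random_hits, a + b + d - random_hits),
        "POC": _ratio(b + c, total),
    }
```

Because exceedance days are rare, the POC is dominated by correctly predicted non-exceedances (N_c), so it is worth reading it together with HIT and FAR.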

Conclusions
To improve the prediction accuracy of high-peak PM2.5 concentration data, this paper proposed a method of preprocessing PM2.5 concentration data using a Gaussian filter and using the resulting high-peak concentrations as input data for a deep learning model. By preprocessing the PM2.5 concentration data with a Gaussian filter, we were able to overcome the limitations of existing deep learning models, which treat high-peak PM2.5 concentration data as outliers and exclude them from training. We applied a CNN/GRU deep learning model with a Gaussian filter, which produced slightly more accurate results than a CNN model without a Gaussian filter or a CNN/GRU model without a Gaussian filter. After we trained the CNN/GRU on the high-peak PM2.5 concentration data preprocessed with a Gaussian filter, the accuracy of the predictions approached 0.93. We used the CNN to extract features of the data and GRUs to track changes in predicted values resulting from temporal changes. To predict high-peak PM2.5 concentrations in Seoul, South Korea, we used three years of weather observations and chemical variables from 2015 to 2017 to train a model to predict maximum PM2.5 concentrations for the next 24 h. After evaluating the model over the full year of 2018, we found that the deep learning method predicted maximum concentrations with sufficient accuracy (IOA = 0.73–0.89, POC = 0.89–0.96) by modeling the relationship between local weather and species concentrations. Neither the CNN nor the CNN/GRU-based prediction system predicted high-peak PM2.5 concentrations as accurately as the prediction system using a Gaussian filter. Thus, the latter could also be used as an effective implementation tool in field measurement locations. The ability to predict high concentrations of PM2.5 is essential to Seoul and other areas with frequent high concentrations of PM2.5. Therefore, if PM2.5 concentration data preprocessed with a Gaussian filter are integrated as input into a PM2.5 concentration prediction model, the function of the prediction system could be extended to achieve accurate prediction of high-concentration PM2.5 cases.