Air Pollution Prediction Using Long Short-Term Memory (LSTM) and Deep Autoencoder (DAE) Models

Abstract: Many countries worldwide have poor air quality due to the emission of particulate matter (i.e., PM 10 and PM 2.5 ), which has led to concerns about human health impacts in urban areas. In this study, we developed models to predict fine PM concentrations using long short-term memory (LSTM) and deep autoencoder (DAE) methods, and compared the model results in terms of root mean square error (RMSE). We applied the models to hourly air quality data from 25 stations in Seoul, South Korea, for the period from 1 January 2015 to 31 December 2018. Fine PM concentrations were predicted for the 10 days following this period at an optimal learning rate of 0.01 for 100 epochs; the LSTM model performed best with a batch size of 32, and the DAE model with a batch size of 64. The proposed models effectively predicted fine PM concentrations, with the LSTM model showing slightly better performance. With our forecasting models, it is possible to provide reliable fine dust predictions for the area where a user is located.


Introduction
As industry and population expand rapidly in South Korea, air pollution is increasingly becoming problematic for human health in the country. In 2017, South Korea ranked 173rd among the 180 countries with the greatest air pollution impact [1]. Air pollution in urban areas consists of carbon dioxide (CO 2 ), carbon monoxide (CO), nitrogen dioxide (NO 2 ), nitrogen monoxide (NO), ozone (O 3 ), and fine particulate matter (PM), the last of which is of greatest concern in South Korea. Fine PM is classified into PM 10 and PM 2.5 based on particle diameter, where PM 10 and PM 2.5 are particles with diameters <10 and <2.5 µm, respectively (Figure 1). PM includes dust, pollen, soot, smoke, and liquid droplets that harm the respiratory system [2,3], causing health effects including irregular heart rate, coughing, airway irritation, abnormal lung function, breathing difficulty, heart attack, stroke-associated diseases, and asthma. Despite increasing air pollutant concentrations in South Korea, the Korean government has reported difficulty in obtaining accurate air pollution data because there are too few measurement stations for reasonable nationwide coverage. Thus, many studies have been conducted to determine and analyze air quality. The recent development of machine learning techniques, especially deep learning, has provided new opportunities to improve air quality research. Deep learning is an artificial intelligence (AI) approach that can learn from unstructured or unlabeled data in an unsupervised manner, using methods such as deep neural networks [4]. Deep learning requires three essential elements: a graphics processing unit (GPU), which determines operation processing speed; vast quantities of data for experiments; and signal information processing. Deep learning has been widely adopted in academic and practical applications such as translation, speech recognition, language processing, and image classification [5,6] (Figure 2). Several studies of air quality prediction have also adopted AI and deep learning techniques [7-14]; many of these studies have used deep neural networks to obtain short-term air quality forecasts. Using these approaches, current fine PM concentrations have been found to be strongly correlated with pollution emissions from power plants, factory chimneys, and various other sources.
In the current study, we used hourly PM 10 and PM 2.5 measurement data collected in Seoul, South Korea, during 2015-2018, as well as data on meteorological features such as humidity, rain, wind speed and direction, temperature, and atmospheric conditions. These air pollution data attributes were learned by long short-term memory (LSTM) and deep autoencoder (DAE) models. The models were then used to predict fine PM concentrations in Seoul, and the performance of the two models was compared in terms of root mean square error (RMSE).

Related Research
Kalapanidas et al. [15] reported detailed air pollution effects using ordinal air pollution data (low, medium, high, and alarm levels) with a case-based reasoning (CBR) system and the lazy learning method. Similarly, Athanasiadis et al. [16] predicted air pollution based on O 3 concentrations classified as low, medium, and high levels of pollutants including SO 2 , NO, and NO 2 using a σ-fuzzy lattice neurocomputing (FLN) model. Land-use regression (LUR) has also been applied to estimate NO x and NO 2 concentrations [17], as well as O 3 concentrations [18]. Hoek et al. [19] concluded that LUR methods are able to model annual mean PM 2.5 concentrations. The LUR model is considered suitable for PM 2.5 prediction because of the linear relationship between PM 2.5 and the explanatory variables, although artificial neural network (ANN)-based models designed to handle nonlinearity may perform better in general [20]. Kunwar et al. [21] applied an ensemble learning method and a principal components analysis (PCA) algorithm to integrate air quality data to forecast air quality index (AQI) values. However, these approaches involving regression of categorical variables can produce ambiguous results because some data are ignored.
Various studies have predicted air pollutant concentrations under different circumstances. Corani [22] forecasted hourly O 3 and PM 10 concentrations from previous-day air pollution data using a neural network algorithm to train pruned neural network (PNN) and feed-forward neural network (FFNN) models. Fu et al. [23] also applied an FFNN model with an undulating scheme and the gray method. Jiang et al. [24] predicted air pollution using traditional chemical and physical models in combination with regression and multilayer perceptron models. Ni et al. [25] found that a linear regression model performed better than several other models for predicting fine PM concentrations in Beijing, China.
Detailed air pollution predictions have been obtained by combining various model designs based on LSTM and convolutional neural network (CNN) approaches. One such study proposed an experimental model to forecast fine PM concentrations [25]; another used LSTM and recurrent neural network (RNN) models as a framework to obtain long-term PM 2.5 trends from time-series data for use in government policy making and resource allocation [26]. Fully connected LSTM (LSTM-FC) has been applied with a neural network to forecast and visualize PM concentrations at urban meteorological stations [27]. LSTM and RNN have also been used as a framework for large-scale, long-term time-series PM forecasting [28]. Another study proposed an LSTM-based model to predict hourly fine PM concentrations at 25 target locations in Seoul [29].
A deep spatiotemporal learning method for air quality forecasting has been applied to capture spatial and temporal correlations in PM concentrations, based on a stacked autoencoder (SAE) model trained on air pollution data with the greedy layer-wise technique [30]. These techniques have also been used to predict local traffic flow [31]. Another study applied multitask learning (MTL) approaches involving homogeneous and deep belief network (DBN) methods, using unsupervised learning for predictive models [32].
A back-propagation (BP) neural network was combined with an integrated development environment (IDE) model to predict fine PM concentrations using meteorological and fine PM data for Chengdu, China [33]; model results were improved in the IDE-BPNN combination model. Another study applied a support vector machine (SVM) method using fine PM data, meteorological elements, and geographical information to predict air quality at pollution measurement stations by incorporating nonlinear PM characteristics [34].

Study Areas
Seoul, South Korea, contains 25 air pollution measurement stations (one station per district), separated from one another by approximately 5 km along the transverse Mercator (TM) link system (Figure 3). The stations are mainly situated far from major roadways and at the tops of public buildings. These stations automatically collect hourly air quality data 24 h per day; the data are then uploaded to a website that is open to the general public. Seoul also contains several special monitoring stations, including the Namsan Mountain high-altitude station; the Gwanak Mountain station, which measures levels of air pollution that has traveled long distances; and the Bukhan Mountain station, which is located in a clean zone. There are also 14 roadside measurement stations and 12 measurement stations located on highway bus line medians.

PM Data
PM concentration (µg/m 3 ) data used in this study were derived from hourly measurements at the 25 monitoring stations in Seoul, South Korea, from 1 January 2015 to 31 December 2018 [35]. Trends in the PM 10 and PM 2.5 concentrations over the study period are shown in Figures 4 and 5.

Meteorological Data
Meteorological data for the study period were obtained from the Korea Meteorological Agency website [36]. The dataset contained preprocessed hourly values of wind speed, wind direction, temperature, sky condition, and rainfall ( Figure 6).
Korean government agencies use an air quality index (AQI) to quantify air quality concentration effects for communication with the general public. This AQI has five categories (Table 1), which indicate relative health risks due to air pollution.
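The AQI assignment in Table 1 amounts to binning a measured concentration against fixed breakpoints. A minimal Python sketch of that lookup follows; the category labels and thresholds below are placeholders for illustration, not the official values from Table 1.

```python
import numpy as np

# Assumed names for the five AQI categories; the official labels and
# breakpoints are those of Table 1, which is not reproduced here.
LABELS = ["Good", "Moderate", "Unhealthy", "Very Unhealthy", "Hazardous"]

def aqi_category(pm_value, breakpoints):
    """Map a PM concentration (ug/m3) to an AQI category by binning.
    `breakpoints` is the ascending list of four thresholds from Table 1."""
    return LABELS[int(np.digitize(pm_value, breakpoints))]

# Usage with illustrative (not official) thresholds:
print(aqi_category(42.0, [15, 35, 75, 150]))  # -> "Unhealthy"
```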

The LSTM Model
The RNN architecture is unrolled or unfolded to show the entire network as a complete sequence, with one layer per time step (Figure 7). The recursive RNN formulas are as follows:

h_t = f_W(h_{t−1}, x_t) (1)
h_t = tanh(W_h h_{t−1} + W_x x_t) (2)
y_t = W_y h_t (3)

where x_t is the input vector, h_t is the hidden layer, y_t is the output vector, and W_h, W_x, and W_y are weight matrices. The RNN is applied to LSTM to create an environment for the computation process, obtain input, and create output [43]. During this process, long-term memory is created from short-term memory. The LSTM system consists of an input gate, a forget gate, and an output gate. LSTM calculates the hidden state as follows:

i_t = σ(W_i [h_{t−1}, x_t] + b_i) (4)
f_t = σ(W_f [h_{t−1}, x_t] + b_f) (5)
o_t = σ(W_o [h_{t−1}, x_t] + b_o) (6)
c̃_t = tanh(W_c [h_{t−1}, x_t] + b_c) (7)
c_t = f_t ⊙ c_{t−1} + i_t ⊙ c̃_t (8)
h_t = o_t ⊙ tanh(c_t) (9)

where σ is the logistic sigmoid function; i, f, and o are the input, forget, and output gates, respectively; h is a hidden vector that is the same size in each layer; W is a weight matrix for the transformation of information from cell to gate vectors; and ⊙ denotes elementwise multiplication. In Equation (7), c̃_t is a candidate hidden element computed from the current input layer; in Equation (8), c_t is the internal memory computed in this unit; and in Equation (9), h_t is the output of the hidden state, derived through memory multiplication (Figure 8).

The forget gate (Figure 9a) is responsible for removing information from the cell state; it receives two inputs: the hidden state output from the previous time step (h_{t−1}) and the input for the current time step (x_t). These inputs are multiplied by weight matrices, and a bias is added. A sigmoid function is then applied to obtain an output vector with values ranging from 0 to 1, which is used to decide which values to keep and which to discard.

Next, the input gate transfers information to the cell state in a two-step method (Figure 9b). As in the forget gate, a sigmoid function is first applied as a filter for h_{t−1} and x_t; a tanh layer then builds a vector of candidate values for the cell state ranging from −1 to 1. This vector provides the values that can be added to the cell state.

The output gate (Figure 9c) decides which information to output from the cell state. In LSTM, the output gate function is performed in three steps. First, a vector is built by applying the hyperbolic tangent function tanh to the cell state to scale its values from −1 to 1. The sigmoid function is then applied to the previous hidden state to create a filter for the values of h_{t−1} and x_t. Finally, the filtered values are multiplied by the vector created in the first step to produce the LSTM output information.

The LSTM algorithm used in our prediction system is described in detail in Table 2.
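To make Equations (4)-(9) concrete, the following NumPy sketch performs a single LSTM cell step. The weight shapes, initialization, and dimensions are illustrative assumptions, not the configuration used in this study.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step following Equations (4)-(9).
    W and b map gate names to weights over the concatenation [h_{t-1}, x_t]."""
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    i = sigmoid(W["i"] @ z + b["i"])        # input gate, Eq. (4)
    f = sigmoid(W["f"] @ z + b["f"])        # forget gate, Eq. (5)
    o = sigmoid(W["o"] @ z + b["o"])        # output gate, Eq. (6)
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate memory, Eq. (7)
    c_t = f * c_prev + i * c_tilde          # internal memory, Eq. (8)
    h_t = o * np.tanh(c_t)                  # hidden state, Eq. (9)
    return h_t, c_t

# Toy usage: 7 input features (e.g., PM plus weather), 16 hidden units.
rng = np.random.default_rng(0)
n_in, n_hid = 7, 16
W = {k: rng.normal(scale=0.1, size=(n_hid, n_hid + n_in)) for k in "ifoc"}
b = {k: np.zeros(n_hid) for k in "ifoc"}
h_t, c_t = lstm_step(rng.normal(size=n_in), np.zeros(n_hid), np.zeros(n_hid), W, b)
```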

Autoencoder
An autoencoder is a type of neural network that encodes input data for reconstruction as output data [44]. To begin this process, the autoencoder must learn to capture the significant features of the input. An example of an autoencoder with a single input layer, single hidden layer, and single output layer is shown in Figure 10. For a training set {x(1), x(2), ..., x(n)} such that x(i) ∈ R^d, the first step of the autoencoder model is to encode the single input x(i) to the hidden layer y(x(i)) according to Equation (10); this layer is then decoded as the output layer z(x(i)) according to Equation (11), as follows:

y(x) = f(W_1 x + b) (10)
z(x) = g(W_2 y(x) + c) (11)

where W_1 is a weight matrix for the optimization process, b is an encoding bias vector, W_2 is a decoding matrix of the output layer, and c is a decoding bias vector. In this study, we applied the logistic sigmoid function 1/(1 + exp(−x)) for both f(x) and g(x).

The autoencoder model uses a vector input layer (x) and an encoding function (f) to approximate another vector (y); during reconstruction, a decoder function (g) is applied to vector y to recreate vector x, and the resulting output layer is vector z (Figure 10). Reconstruction error is measured with the loss function L_H(x, z), which is minimized over the training set to obtain optimal parameter values as follows:

L(X, Z) = Σ_{i=1}^{n} L_H(x(i), z(x(i))) (12)

One problem in the application of autoencoder models is the size of the hidden layer, which is set equal to or larger than the output layer; this is generally addressed through the design of the model functions. In the present study, we used a nonlinear autoencoder with a hidden layer one unit larger than the input layer by applying the sparsity constraint method, such that the autoencoder model was transformed into a sparse autoencoder. To obtain a sparse representation, we imposed a sparsity constraint on the reconstruction error as follows:

L_sparse(X, Z) = L(X, Z) + γ Σ_{j=1}^{H_D} KL(ρ ‖ ρ̂_j) (13)

where γ is the weight of the sparsity term, H_D is the number of hidden units, ρ is the sparsity parameter, and ρ̂_j is the average value of the activation function for hidden unit j over the training set. In Equation (14), the Kullback–Leibler (KL) divergence KL(ρ ‖ ρ̂_j) is calculated as follows:

KL(ρ ‖ ρ̂_j) = ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j)) (14)

The KL divergence satisfies KL(ρ ‖ ρ̂_j) = 0 if ρ = ρ̂_j. The sparsity constraint on the input process and the back-propagation (BP) method are applied to solve this minimization problem.
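As a check on Equations (10)-(14), the sparse objective can be written out directly. A minimal NumPy sketch follows, using a squared-error stand-in for L_H and assumed parameter shapes and hyperparameter values.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sparse_autoencoder_loss(X, W1, b, W2, c, rho=0.05, gamma=0.1):
    """Forward pass and sparse objective per Equations (10)-(14).
    X is an (n_samples, d) batch; rho and gamma are assumed values."""
    Y = sigmoid(X @ W1.T + b)        # encode, Eq. (10)
    Z = sigmoid(Y @ W2.T + c)        # decode, Eq. (11)
    recon = np.sum((X - Z) ** 2)     # squared-error stand-in for L_H, Eq. (12)
    rho_hat = Y.mean(axis=0)         # average activation of each hidden unit
    kl = np.sum(rho * np.log(rho / rho_hat)
                + (1 - rho) * np.log((1 - rho) / (1 - rho_hat)))  # Eq. (14)
    return recon + gamma * kl        # sparsity-constrained loss, Eq. (13)
```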

The DAE Model
Deep or stacked autoencoder models are among the most powerful types of neural network architecture [45]. The DAE model begins by pre-training a single input layer, followed by hidden layers, such that the output of the kth hidden layer is used as input for the (k + 1)th hidden layer. Thus, hidden layers are stacked hierarchically within the DAE, so the final hidden layer is a higher-level representation of all layers of input and may be used in forecasting.
In this study, we applied a DAE model for fine PM forecasting by adding a standard forecaster at the top of the model layer. Layer-wise training of the resulting DAE is shown in Figure 11. Figure 12 shows the structure of a DAE, including stacked autoencoder nodes.
We applied a DAE model to represent fine PM features; the prediction was then applied to a logistic regression model. In the proposed method, the DAE model was combined with a dropout process to handle multiple faults. The workflow of the DAE model is shown in Figure 13, and the algorithm is described in detail in Table 3.
Table 3. Training the deep autoencoder (DAE) algorithm.
Step 1: Preprocessing of all fine particulate matter and meteorological data.
Step 2: Preparation of the DAE framework.
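The layer-wise procedure of Table 3 and Figure 11 can be sketched as follows, assuming a TensorFlow/Keras implementation (the paper does not name its framework) and hypothetical layer widths: each autoencoder is trained to reconstruct the previous layer's output, and the pretrained encoders are then stacked under a regression head.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def pretrain_stack(X, layer_sizes=(64, 32, 16), epochs=20, batch_size=64):
    """Greedy layer-wise pretraining: each autoencoder learns to
    reconstruct the previous layer's output (hypothetical sizes)."""
    encoders, current = [], X
    for units in layer_sizes:
        inp = tf.keras.Input(shape=(current.shape[1],))
        code = layers.Dense(units, activation="sigmoid")(inp)
        out = layers.Dense(current.shape[1], activation="sigmoid")(code)
        ae = models.Model(inp, out)
        ae.compile(optimizer="adam", loss="mse")
        ae.fit(current, current, epochs=epochs, batch_size=batch_size, verbose=0)
        encoder = models.Model(inp, code)
        encoders.append(encoder)
        current = encoder.predict(current, verbose=0)
    return encoders

def build_forecaster(encoders, n_features):
    """Stack the pretrained encoders and add a regression head on top."""
    model = models.Sequential([tf.keras.Input(shape=(n_features,))])
    for enc in encoders:
        model.add(enc.layers[-1])  # reuse each pretrained Dense encoding layer
    model.add(layers.Dense(1))     # forecaster on the final hidden representation
    model.compile(optimizer=tf.keras.optimizers.Adam(0.01), loss="mse")
    return model
```

A dropout layer between the stacked encoders would mirror the dropout step described above; note that the paper feeds the learned representation to a logistic regression model, so the plain Dense head here is a simplification.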

Model Performance Evaluation
We evaluated the performance of the proposed models in terms of the root mean square error (RMSE) between measured and predicted air pollution values. RMSE was calculated as follows:

RMSE = √( (1/N) Σ_{i=1}^{N} (P_m,i − P_r,i)² ) (15)

where P_m and P_r are the measured and predicted PM concentrations, respectively, and N is the number of measured values.
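Equivalently, in code (a trivial sketch; the variable names are ours):

```python
import numpy as np

def rmse(p_measured, p_predicted):
    """Root mean square error between measured and predicted PM values."""
    p_measured = np.asarray(p_measured, dtype=float)
    p_predicted = np.asarray(p_predicted, dtype=float)
    return np.sqrt(np.mean((p_measured - p_predicted) ** 2))

print(rmse([30, 45, 80], [28, 50, 75]))  # -> ~4.24
```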

Fine PM Prediction
In this study, we obtained PM 10 and PM 2.5 concentration data and meteorological data consisting of rainfall, wind speed and direction, temperature, humidity, and sky condition for use as input nodes. The output variable was predicted PM 10 or PM 2.5 concentration. All data were partitioned into two sets, with 85% used for training and 15% for testing.
We combined all raw data obtained from the open data website and performed preprocessing to check for missing values and categorical values within the dataset. We then split the data into training and test datasets and applied the LSTM and DAE models to predict PM 10 and PM 2.5 concentrations for the 10 days following the study period. Figure 14 shows the workflow for predicting PM concentrations using the LSTM and DAE models.
Finally, we evaluated the accuracy of the proposed method using the RMSE between observed and predicted values. We adjusted the learning rate, number of epochs, and batch size of each model to obtain optimal results.
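A sketch of the preprocessing and split just described, assuming pandas and entirely hypothetical file and column names:

```python
import numpy as np
import pandas as pd

# Hypothetical file and column names for the hourly PM and weather data.
pm = pd.read_csv("seoul_pm_hourly.csv", parse_dates=["datetime"])
met = pd.read_csv("seoul_weather_hourly.csv", parse_dates=["datetime"])
df = pm.merge(met, on=["datetime", "station"], how="inner").sort_values("datetime")

num_cols = df.select_dtypes("number").columns
df[num_cols] = df[num_cols].interpolate(limit_direction="both")  # fill missing values
df = pd.get_dummies(df, columns=["sky_condition"])               # encode categoricals

split = int(len(df) * 0.85)                  # chronological 85/15 split
train, test = df.iloc[:split], df.iloc[split:]

def make_windows(values, lookback=24):
    """Turn an (n_hours, n_features) array into supervised windows;
    column 0 is assumed to hold the target PM concentration."""
    X, y = [], []
    for i in range(lookback, len(values)):
        X.append(values[i - lookback:i])
        y.append(values[i, 0])
    return np.array(X), np.array(y)

features = train.drop(columns=["datetime", "station"]).to_numpy(dtype=float)
X_train, y_train = make_windows(features)
```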

LSTM Model Performance
The optimal settings for the LSTM model for both PM 10 and PM 2.5 prediction were a learning rate of 0.01, 100 epochs, and a batch size of 32. With a batch size of 32, the RMSE values were 11.113 for PM 10 and 12.174 for PM 2.5 , with a processing time of 11 min 18 s (Figure 15, Table 4).
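For reference, the reported optimal configuration (learning rate 0.01, 100 epochs, batch size 32) might be expressed in Keras as follows; the layer width and lookback window are assumptions, since the exact architecture is described in Table 2 rather than reproduced here.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_lstm(lookback=24, n_features=8, units=64):
    """Hypothetical LSTM regressor; units and lookback are assumptions."""
    model = models.Sequential([
        tf.keras.Input(shape=(lookback, n_features)),
        layers.LSTM(units),
        layers.Dense(1),  # predicted PM concentration
    ])
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.01),
                  loss="mse")
    return model

model = build_lstm()
# Reported optimal settings: 100 epochs, batch size 32.
# model.fit(X_train, y_train, epochs=100, batch_size=32)
```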

DAE Model Performance
The optimal settings for the DAE model for both PM 10 and PM 2.5 prediction were a learning rate of 0.01, 100 epochs, and a batch size of 64. With a batch size of 64, the RMSE values were 15.038 for PM 10 and 15.437 for PM 2.5 , with a processing time of 15 min 40 s (Figure 16, Table 5).

We used total average RMSE values to compare the results obtained using the LSTM and DAE models. Although both proposed algorithms effectively predicted PM 10 (Figure 17) and PM 2.5 (Figure 18) concentrations, the LSTM model showed slightly better performance.

Conclusions
Recent advances in the development of deep learning models have led to a rapid increase in their application in academic and industrial settings. In South Korea, the greatest environmental concern is air pollution in the form of fine PM, which consists of liquid and solid particle compounds that are dangerous to human health. Despite increasing levels of air pollutants in South Korea, the number of measurement stations remains insufficient to obtain accurate PM levels throughout the country. In this study, we proposed predictive models of fine PM concentration using LSTM and DAE approaches, and compared their RMSE values for 10-day PM 10 and PM 2.5 concentration predictions for Seoul. The principal contributions of this study are as follows: (1) we developed and applied LSTM and DAE models to predict hourly fine PM concentrations using air quality and meteorological data from 25 stations in Seoul; and (2) we compared the total average RMSE of the PM 10 and PM 2.5 predictions, finding the LSTM model to be more accurate than the DAE model. The comparison showed that both proposed models achieve acceptable prediction accuracy, with the LSTM model preferable. In the future, we will design alternative deep learning models to obtain more accurate results with larger data sets. We will also improve our models' performance by considering GIS-based spatial data.
Author Contributions: T.X. and H.L. conceived and designed the experiments, analyzed the data and wrote the paper. G.L. supervised the work and helped with designing the conceptual framework, and edited the manuscript. All authors have read and agreed to the published version of the manuscript.
