Atmospheric PM2.5 Prediction Using DeepAR Optimized by Sparrow Search Algorithm with Opposition-Based and Fitness-Based Learning

Abstract: Precise prediction of atmospheric PM2.5 concentration is of great significance for human health. However, owing to its complexity and contingency, atmospheric concentration prediction is a challenging topic. In this paper, we propose a novel hybrid learning method to make point and interval predictions of PM2.5 concentration simultaneously. Firstly, we improve the Sparrow Search Algorithm (SSA) with opposition-based learning, fitness-based learning, and Lévy flight; experiments show that the improved algorithm (FOSSA) outperforms other SSA-based algorithms. FOSSA is then employed to optimize the initial weights of the probabilistic forecasting model with autoregressive recurrent networks (DeepAR). The FOSSA-DeepAR learning method is utilized to produce point and interval predictions of PM2.5 concentration in Beijing, China, and its performance is compared with other hybrid models and a single DeepAR model. Furthermore, hourly PM2.5 and O3 concentration data from Taian, China, and O3 concentration data from Beijing, China are used to verify the effectiveness and robustness of the proposed FOSSA-DeepAR learning method. The empirical results illustrate that the proposed FOSSA-DeepAR model achieves more efficient and accurate predictions in both interval and point prediction.


Introduction
As one of the pollutants under public concern, PM2.5 concentration has a vital impact on air quality and human health. High concentration of PM2.5 will result in poor air quality and cause harm to human health, such as respiratory diseases, and even death in severe cases. Furthermore, the fluctuation of PM2.5 concentration has negative impacts on the travelling and working of residents. Thus, accurate prediction of PM2.5 concentration is significant for alerting residents before the high concentrations of PM2.5 occur. It is helpful for residents to arrange their outdoor activities flexibly, and reduce the health damage caused by bad air quality.
Over the past few years, most models established for PM2.5 concentration prediction have been based on time series analysis [1]. However, the nonstationarity of atmospheric concentration poses a great challenge for accurate prediction. In recent years, methods such as distribution-based approaches, machine learning, and optimization algorithms have been proposed to predict atmospheric concentration. For instance, Cavieres et al. [2] proposed a method based on bivariate control charts with heavy-tailed asymmetric distributions to monitor environmental quality. Puentes et al. [3] used bivariate regression and Birnbaum-Saunders distributions to predict PM2.5 and PM10 concentrations. Jiang et al. [4] (Atmosphere 2021, 12, 894)

Sparrow Search Algorithm
SSA [10] is a novel swarm optimization approach inspired by the search behavior of sparrow populations. A sparrow population consists of producers and scroungers, and the two kinds of sparrows follow different rules to update their positions. To find food, sparrows flexibly switch roles between producer and scrounger and then perform different search behaviors. Some sparrows send out warning messages and immediately move to a safe area when they detect predators; the sparrows that detect the danger are called alarmers in this paper.
There are two ways for producers to update their positions. The first is used when the producer does not receive the warning signal issued by the alarmer (R_2 < ST). In this case, the updated location of producers is shown in Equation (1):

P_{i,j}^{t+1} = P_{i,j}^{t} · exp(−i / (α · Epoch)), (1)

where t indicates the current iteration, Epoch denotes the maximum number of iterations, and α is a random number between 0 and 1. P_{i,j} represents the value of the j-th dimension of the i-th individual, and d is the dimension of the problem to be solved. ST (ST ∈ [0.5, 1]) and R_2 (R_2 ∈ [0, 1]) represent the threshold value and the warning value, respectively. The other way is used when the warning signal is received by the producer (R_2 ≥ ST): the producer knows that a predator is approaching and guides all sparrows to a safe area. The location is updated as Equation (2).
P_{i,j}^{t+1} = P_{i,j}^{t} + r · D, (2)

where r is a random number that obeys the normal distribution, and D is a 1 × d matrix in which all elements are 1. Some scroungers with better fitness will move towards the best producer; their positions are updated as Equation (3).
P_{i,j}^{t+1} = P_{best}^{t+1} + |P_{i,j}^{t} − P_{best}^{t+1}| · A⁺ · D, (3)

where P_best represents the optimal position of the entire population, and A is a 1 × d matrix in which each element is randomly set to 1 or −1, with A⁺ = Aᵀ(A Aᵀ)⁻¹. The remaining scroungers, with poor fitness, continue to search for food near their previous locations; this kind of scrounger updates its position as Equation (4).
If the producer at the optimal position perceives danger, it becomes an alarmer, and the entire group must be led to a safe area. In this situation, the location of the alarmer is updated as Equation (5).
where K is a random number between −1 and 1, and f_i, f_g, and f_w represent the current fitness value of the sparrow, the current optimal fitness value, and the worst fitness value of the entire group, respectively. Here, ε > 0 is a small constant that ensures the denominator is not 0. If the alarmer is not located at the best position, it moves towards the optimal position to reduce the probability of being preyed upon; its location is updated as Equation (6).
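Putting these rules together, one iteration of the basic SSA can be sketched in Python. The branch structure follows the text above; the equation bodies follow the original SSA paper [10], and the default producer ratio and the use of the worst individual in the poor-scrounger update are assumptions for illustration.

```python
import numpy as np

def ssa_step(P, fit, epoch, n_producers=20, ST=0.8):
    """One iteration of the basic SSA position update (a sketch).

    P is the (n, d) population and fit the (n,) objective values
    (lower is better).
    """
    n, d = P.shape
    P = P[np.argsort(fit)].copy()            # sort best-first
    R2 = np.random.rand()                    # warning value
    alpha = np.random.rand() + 1e-12         # random alpha in (0, 1]
    for i in range(n_producers):             # producers: Eq. (1) or (2)
        if R2 < ST:                          # no warning signal
            P[i] = P[i] * np.exp(-(i + 1) / (alpha * epoch))
        else:                                # predator detected
            P[i] = P[i] + np.random.randn() * np.ones(d)
    best_producer = P[0].copy()
    for i in range(n_producers, n):          # scroungers: Eq. (3) or (4)
        if i > n // 2:                       # poor fitness: forage elsewhere
            P[i] = np.random.randn() * np.exp((P[-1] - P[i]) / (i + 1) ** 2)
        else:                                # move towards the best producer
            A = np.random.choice([-1.0, 1.0], size=d)
            A_plus = A / (A @ A)             # A^T (A A^T)^(-1) for a 1 x d A
            step = np.abs(P[i] - best_producer) @ A_plus
            P[i] = best_producer + step * np.ones(d)
    return P
```

In FOSSA, the alarmer updates of Equations (5) and (6) and the Lévy-flight variants described below replace parts of this loop.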

Sparrow Search Algorithm with Fitness-Based and Opposition-Based Learning
In this paper, the opposition-based learning, Lévy flight, and fitness-based learning are used to improve the performance of SSA. The improvement of FOSSA mainly comes from three aspects. First of all, we use opposition-based learning to increase the diversity and quality of the initial population. Secondly, the Lévy flight is utilized to update the positions of producers and alarmers. Finally, the fitness-based learning method is employed to enhance the search ability of the sparrow population. The improvement details of FOSSA are summarized as follows.

Opposition-Based Learning
In the ordinary SSA [10], the positions of the initial population are initialized randomly, so the diversity and quality of the initial population are not guaranteed, which decreases the accuracy of the SSA solution. The opposition-based learning method can greatly improve the quality of the initial population.
The steps of initializing the population via opposition-based learning are described as follows:
Step 1. Generate N individuals by random initialization, and calculate their center of gravity as Equation (7): G_j = (1/n) Σ_{i=1}^{n} X_{i,j}, where G_j represents the center of gravity of the j-th dimension, n is the number of sparrows, and d is the dimension;
Step 2. Calculate the opposite position of each individual as Equation (8): X'_{i,j} = 2 · G_j − X_{i,j};
Step 3. Ensure that the position of the anti-center-of-gravity individual lies within the search scope, as Equation (9), where min_j = min_i X_{i,j} and max_j = max_i X_{i,j};
Step 4. Calculate the fitness values of the 2N individuals, and select the N individuals with the best fitness as the initial population.
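The four steps above can be sketched as follows; the clipping rule in Step 3 is an assumed implementation of Equation (9).

```python
import numpy as np

def opposition_init(n, d, lb, ub, objective):
    """Opposition-based population initialization (Steps 1-4, a sketch).

    objective maps a d-vector to a scalar to be minimized.
    """
    X = lb + (ub - lb) * np.random.rand(n, d)   # Step 1: random individuals
    G = X.mean(axis=0)                          # Eq. (7): center of gravity
    X_opp = 2.0 * G - X                         # Step 2 / Eq. (8): opposites
    lo, hi = X.min(axis=0), X.max(axis=0)
    X_opp = np.clip(X_opp, lo, hi)              # Step 3 / Eq. (9): keep in range
    pool = np.vstack([X, X_opp])                # Step 4: pick the N fittest
    fit = np.apply_along_axis(objective, 1, pool)
    return pool[np.argsort(fit)[:n]]
```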

Lévy Flight
In BSSA, the Lévy flight is used to update only a single parameter in Equation (6), so sparrows cannot escape local optima effectively. In this paper, the Lévy flight is applied to update the entire positions of producers and alarmers; Equations (2) and (6) are changed into Equations (10)-(12).
where Γ(·) represents the gamma function, and τ is a hyperparameter whose value is set to 1 in this paper. ∂ and v are random variables that obey the normal distributions N(0, σ²) and N(0, 1), respectively. Moreover, m is a random number, and s is the step size, whose value is 0.001. P_{best,j} represents the value of the global optimal position in the j-th dimension at the previous iteration.
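A common way to generate such Lévy steps is Mantegna's algorithm, which uses exactly the quantities described above (Γ, τ, ∂, v, and the step size s); the precise correspondence to Equations (10)-(12) is an assumption here.

```python
import math
import numpy as np

def levy_step(d, tau=1.0, s=0.001):
    """Lévy flight step via Mantegna's algorithm (a sketch).

    tau is the stability hyperparameter (1 in the text) and s = 0.001
    is the step size.
    """
    sigma = (math.gamma(1 + tau) * math.sin(math.pi * tau / 2)
             / (math.gamma((1 + tau) / 2) * tau * 2 ** ((tau - 1) / 2))) ** (1 / tau)
    u = np.random.normal(0.0, sigma, size=d)   # the variable called "∂" above
    v = np.random.normal(0.0, 1.0, size=d)
    return s * u / np.abs(v) ** (1 / tau)
```

The positions of producers and alarmers are then perturbed elementwise with such a step, e.g. `P[i] = P[i] + levy_step(d) * (P[i] - P_best)`.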

Fitness-Based Learning
On the one hand, some individuals fall into local optima during the search process, and their positions do not change over several consecutive iterations. These sparrows, which can be regarded as lacking search ability, should be updated in the subsequent search process to increase convergence speed and accuracy.
On the other hand, the producers in SSA are responsible for determining the search direction of the entire population and for searching a wider space; accordingly, the producers determine whether the population can move towards the optimal result. The scroungers follow the producers and search for food near them, so the scroungers improve the accuracy of the solution. In the original SSA, the numbers of producers and scroungers are fixed. With too many producers in the early stage of the search, the population cannot move quickly towards the correct search direction, while in the later stage, when the search gradually concentrates on a small range, too many scroungers hinder the exploration of highly valuable areas.
Therefore, this paper utilizes fitness-based learning to endow the population with a powerful global search capability. First, inspired by the transformation of employed bees into onlookers in the artificial bee colony (ABC) algorithm [22], fitness-based learning is introduced into FOSSA: if an individual does not update its position during 5 consecutive iterations, it is assigned a new, randomly generated position. In addition, we adjust the numbers of producers and scroungers based on fitness values during the search process. The fitness value of each individual is calculated as Equation (13), where f(·) is the objective function of a minimization problem. Individuals with fitness greater than 0.9 perform the tasks of producers. When the fitness value of a sparrow is greater than 0.7 but no more than 0.9, the sparrow becomes a scrounger that immediately leaves its current position and approaches the optimal producer. If the fitness value is no more than 0.7, the sparrow becomes a scrounger but does not move towards the best producer. The flowchart of FOSSA is shown in Figure 1.
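The role-assignment rule can be sketched as follows. The exact form of Equation (13) is not reproduced above, so a min-max normalization of the objective values (better objective giving fitness near 1) is assumed for illustration; the 0.9 and 0.7 thresholds and the 5-iteration stall rule follow the text.

```python
import numpy as np

def assign_roles(obj_values, stagnation, max_stall=5):
    """Fitness-based learning (a sketch).

    obj_values: objective f(.) for a minimization problem (lower is better).
    stagnation: iterations since each individual last improved.
    Returns per-individual roles and a mask of individuals to re-initialize.
    """
    f = np.asarray(obj_values, dtype=float)
    fit = (f.max() - f) / (f.max() - f.min() + 1e-12)   # Eq. (13), assumed form
    roles = np.where(fit > 0.9, "producer",
             np.where(fit > 0.7, "following_scrounger", "scrounger"))
    reinit = np.asarray(stagnation) >= max_stall        # stuck for 5 iterations
    return roles, reinit
```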


DeepAR
Probabilistic forecasting with autoregressive recurrent networks (DeepAR), proposed by Salinas et al., is a novel forecasting method that can achieve accurate probabilistic forecasts. By combining several related time series, this method not only learns a global model from analogous time series, but also provides the flexibility to produce point predictions, interval predictions, or both. In addition, the prediction time step is a selectable hyperparameter of the method.
The goal of DeepAR is to model the conditional distribution presented as Equation (14):

P(z_{i,t0:T} | z_{i,1:t0−1}, x_{i,1:T}), (14)

where z_{i,t} is the value of time series i at time t. Given the past series z_{i,1}, z_{i,2}, ..., z_{i,t0−1}, the model can be employed to predict the future series z_{i,t0}, z_{i,t0+1}, ..., z_{i,T}, where t0 is the time point from which z_{i,t} needs to be predicted. [1 : t0 − 1] and [t0 : T] represent the conditioning range and prediction range, respectively. The DeepAR model predicts the values of the prediction range based on the values of the conditioning range. If a covariate time series x_i is introduced into the model, the values of x_i from time 1 to time T (x_{i,1:T}) are also used for forecasting; however, the covariate values must be available during the entire time period. How the model works in the conditioning range is shown in Figure 2. DeepAR assumes that P(z_{i,t0:T} | z_{i,1:t0−1}, x_{i,1:T}) consists of likelihood factors, defined as Equations (15) and (16):

Q_Θ(z_{i,t0:T} | z_{i,1:t0−1}, x_{i,1:T}) = ∏_{t=t0}^{T} ℓ(z_{i,t} | θ(h_{i,t}, Θ)), (15)

h_{i,t} = h(h_{i,t−1}, z_{i,t−1}, x_{i,t}, Θ). (16)
Figure 2. Process of DeepAR in the conditioning range. At each time point t, the inputs of the DeepAR model are x_{i,t}, z_{i,t−1}, and h_{i,t−1}, where h_{i,t−1} is the previous output of the neural network, x_{i,t} is the value of the covariates at time t, and z_{i,t−1} is the target value at time t−1. The value of z_{i,t−1} is available during the conditioning range and is used to train the model. During the prediction range, z_{i,t−1} is unknown and is replaced by the sample z̃_{i,t−1} when DeepAR is utilized to produce multi-step forecasts.
where h_{i,t} is the output of a multi-layer recurrent neural network constructed from LSTM cells and parametrized by Θ. Given a time series as the conditioning range, we can obtain h_{i,t0−1} by Equation (16) as the initial state. For the prediction range, we can sample z̃_{i,t} from ℓ(· | θ(h_{i,t}, Θ)), where h_{i,t} = h(h_{i,t−1}, z̃_{i,t−1}, x_{i,t}, Θ). The samples obtained in this way can be used to compute statistics of interest, such as the mean and quantiles, over a future period.
In this paper, we assume that PM2.5 concentration obeys a normal distribution, so the likelihood ℓ is Gaussian, as in Equation (17), with µ and σ given by Equations (18) and (19). After training on the conditioning range, the trained model predicts the mean µ and standard deviation σ at each time point. We can then obtain joint samples from N(µ, σ²) and use them to compute the statistics of interest.
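Under the Gaussian assumption, the output layer described by Equations (17)-(19) can be sketched as follows, following the affine-plus-softplus parametrization of the DeepAR paper; the variable names are illustrative.

```python
import numpy as np

def gaussian_params(h, w_mu, b_mu, w_sigma, b_sigma):
    """Map the LSTM output h to the Gaussian parameters of Eq. (17).

    mu is an affine projection (Eq. (18)); sigma uses a softplus so
    that it stays positive (Eq. (19)).
    """
    mu = w_mu @ h + b_mu                              # Eq. (18)
    sigma = np.log1p(np.exp(w_sigma @ h + b_sigma))   # Eq. (19): softplus
    return mu, sigma

def gaussian_loglik(z, mu, sigma):
    """Log of the Gaussian likelihood in Eq. (17)."""
    return -0.5 * np.log(2 * np.pi * sigma ** 2) - (z - mu) ** 2 / (2 * sigma ** 2)
```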

Framework of FOSSA-DeepAR
In order to provide accurate point and interval predictions of PM2.5 concentration, a novel hybrid model is established based on FOSSA and DeepAR, where FOSSA is utilized to optimize the initial weights of DeepAR. The structure of the proposed approach is summarized as follows:
Step 1. Data preprocessing. To avoid the vanishing- and exploding-gradient problems, the data are standardized before training the model;
Step 2. Establish the DeepAR model. In this study, the recurrent neural network used in DeepAR is the long short-term memory (LSTM) model, and PM2.5 concentration is assumed to follow a Gaussian distribution;
Step 3. Optimize the initial weights of DeepAR via FOSSA. It is inefficient and unnecessary to use all samples for weight initialization, owing to the similarity between some samples; therefore, the first thousand samples are used to train the initial weights of the DeepAR network. The objective function that FOSSA optimizes is the sum of squared errors over these samples;
Step 4. Train the FOSSA-DeepAR model. The samples obtained from the conditioning range are utilized to train the FOSSA-DeepAR model, with the number of iterations set to 30;
Step 5. Forecast PM2.5 concentration. The FOSSA-DeepAR model produces forecasts over the prediction range, from which we obtain the point and interval prediction results; these are then compared with the true values.
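The five steps above can be summarized in a high-level sketch. Here `build_deepar` and `fossa_optimize` are hypothetical placeholders for the model constructor and the FOSSA optimizer, not real library calls; only the standardization helper is concrete.

```python
import numpy as np

def standardize(x):
    """Step 1: standardize the series to ease gradient vanishing/exploding."""
    mu, sd = x.mean(), x.std()
    return (x - mu) / (sd + 1e-12), mu, sd

def fossa_deepar_pipeline(series, build_deepar, fossa_optimize):
    """High-level sketch of Steps 1-5 (placeholders, not a real API)."""
    z, mu, sd = standardize(np.asarray(series, dtype=float))   # Step 1
    model = build_deepar(likelihood="gaussian")                # Step 2
    def sse(weights):                                          # Step 3 objective:
        preds = model.forecast(z[:1000], weights)              # first 1000 samples,
        return float(np.sum((preds - z[1:1001]) ** 2))         # sum of squared errors
    model.init_weights = fossa_optimize(sse, n_iter=30)        # Step 3
    model.fit(z, epochs=30)                                    # Step 4: 30 iterations
    forecast = model.predict()                                 # Step 5
    return forecast * sd + mu                                  # back to original scale
```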

Comparison of SSA-Based Algorithms
In this subsection, using the six benchmark functions shown in Table 1, we compare the performance of FOSSA with other existing improved variants of SSA. Unimodal functions have only a global optimum and can thus be used to verify the basic search ability and convergence speed of an algorithm. Conversely, multimodal functions have local optima in addition to the global optimum, so they can be used to test global search ability. Following [10,23], the dimension of the benchmark functions is set to 30.
The experiments are implemented with Spyder (Anaconda) running on a PC with an Intel(R) Core(TM) processor. The packages used are NumPy and pandas.
The solutions of the optimization algorithms are random; thus, each algorithm is run 50 times independently in this study. The maximum number of iterations is set to 30, and the population size is set to 100. In all optimization algorithms, the alarm threshold is 0.8, and 10% of the individuals in the population detect danger signals and issue alarms. Except for FOSSA, 20% of the individuals in the other algorithms are producers, and scroungers account for 80% of the total population. Table 1. Benchmark functions: f1-f3 are unimodal functions and f4-f6 are multimodal functions.

Tables 2 and 3 show the results of the optimization algorithms on the unimodal and multimodal functions, respectively. The four statistics reported in Tables 2 and 3 are the minimum, maximum, mean, and standard deviation of the optimization results. For the unimodal functions, FOSSA can find the global optimal value of f1, f2, and f3, while the other seven algorithms cannot, and FOSSA achieves better solutions in fewer iterations. Based on the statistics, the standard deviation, maximum, minimum, and mean of FOSSA are the smallest. Hence, the proposed FOSSA improves the accuracy and stability of SSA. ISSAs1, ISSA1, and SSA have the poorest performance on f1, f2, and f3, respectively. The algorithm labeled ISSA1 in Table 2 is the improved sparrow search algorithm proposed in [21].
In the experiments on multimodal functions, FOSSA gives the best solutions. For all multimodal functions, the four statistics of FOSSA and ISSA are the smallest, indicating that these two algorithms achieve the best performance. In general, ISSA1 has the worst performance among the eight algorithms.
As shown in Figure 3, the fitness value of the initial population of FOSSA is the smallest, indicating that the quality of its initial population is the best among all algorithms. Opposition-based learning improves the quality of the initial population significantly more than random initialization and chaotic-mapping initialization. In addition, the convergence speed of FOSSA is the fastest, illustrating that opposition-based learning, Lévy flight, and fitness-based learning improve search speed and help avoid local optima. These advantages of FOSSA raise the accuracy of the final result.
In summary, the quality of the initial population generated by FOSSA is significantly higher than that of the other algorithms, and the convergence speed and search ability of FOSSA are superior.

Empirical Results and Analysis
In this section, we use the FOSSA-DeepAR model to make point and interval predictions of PM2.5 concentration. Hourly PM2.5 concentration data observed at the Huairou monitoring station in Beijing are used. The experiments are implemented with Spyder (Anaconda) and TensorFlow running on a PC with an Intel(R) Core(TM) processor.

Data Set and Evaluation Criteria
In this paper, we use the PM2.5 concentration data of the past twenty-four hours to predict the PM2.5 concentration one hour into the future. The hourly PM2.5 concentration time series is collected from 1 January 2020 to 10 April 2021 and split into two subsets: a training dataset from 1 January 2020 to 3 p.m. on 1 April 2021 for establishing the FOSSA-DeepAR model, and a test dataset from 4 p.m. on 1 April 2021 to 11 p.m. on 10 April 2021 for verifying its predictive performance. The time series of PM2.5 concentration is shown in Figure 4. The recurrent neural network (RNN) used in DeepAR is the long short-term memory (LSTM) model. After training, the number of samples generated by the model is set to 300; the point prediction at time t is the mean of the 300 samples, and their variance is taken as the variance at time t.
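Given the 300 sampled trajectories, the point and interval predictions can be computed as follows. The text fixes only the sample size and the use of the sample mean and variance, so the quantile-based interval bounds here are an assumption.

```python
import numpy as np

def point_and_interval(samples, level=0.95):
    """Turn sampled trajectories into point and interval predictions.

    samples: (n_samples, horizon) array of model draws. The point
    forecast is the per-step sample mean; the interval bounds are
    symmetric sample quantiles at the given coverage level.
    """
    point = samples.mean(axis=0)
    lo = np.quantile(samples, (1 - level) / 2, axis=0)
    hi = np.quantile(samples, 1 - (1 - level) / 2, axis=0)
    return point, lo, hi
```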
where T is the number of test samples, Ẑ(t) and Z(t) are the predicted and true values at time t, respectively, Z̄ is the mean value of the time series, and U_t and L_t are the upper and lower bounds of the prediction interval, respectively. Smaller values of TIC, RMSE, MAE, MAPE, and IFNAW mean lower prediction bias and better prediction performance, while higher values of IFCP and R² mean superior prediction performance.
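These evaluation criteria can be implemented in their standard forms as follows; the Theil-style TIC denominator and the range normalization of IFNAW are the usual conventions and are assumptions here.

```python
import numpy as np

def point_metrics(z, z_hat):
    """RMSE, MAE, MAPE, TIC, and R^2 for point predictions."""
    z, z_hat = np.asarray(z, float), np.asarray(z_hat, float)
    rmse = np.sqrt(np.mean((z - z_hat) ** 2))
    mae = np.mean(np.abs(z - z_hat))
    mape = np.mean(np.abs((z - z_hat) / z))
    tic = rmse / (np.sqrt(np.mean(z ** 2)) + np.sqrt(np.mean(z_hat ** 2)))
    r2 = 1 - np.sum((z - z_hat) ** 2) / np.sum((z - z.mean()) ** 2)
    return rmse, mae, mape, tic, r2

def interval_metrics(z, lower, upper):
    """IFCP (interval coverage) and IFNAW (average width normalized
    by the target range, an assumed but common normalization)."""
    z = np.asarray(z, float)
    ifcp = np.mean((z >= lower) & (z <= upper))
    ifnaw = np.mean(upper - lower) / (z.max() - z.min())
    return ifcp, ifnaw
```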

Results and Analysis
To show the effectiveness of the FOSSA-DeepAR model, we use eight benchmark models: DeepAR, SSA-DeepAR, BSSA-DeepAR, CASSA-DeepAR, CSSA-DeepAR, HSSA-DeepAR, ISSA-DeepAR, and ISSA1-DeepAR. Table 4 shows the evaluation indices of the different models for point prediction of PM2.5 concentration. From Table 4, we see that the forecasting accuracies of the nine models differ. TIC, RMSE, MAE, and MAPE of the FOSSA-DeepAR model are lower than those of the other hybrid models and the single DeepAR model, while its R² is higher. Accordingly, FOSSA-DeepAR outperforms the other hybrid models and the single model for point prediction of PM2.5 concentration. Compared with the single DeepAR model, FOSSA-DeepAR reduces the RMSE of point prediction by 5.76%, the MAE by 10.39%, and the MAPE by 9.76%, on average. Compared with the other hybrid models, FOSSA-DeepAR reduces the RMSE, MAE, and MAPE by an average of 2.834%, 6.243%, and 22.55%, respectively.
In addition, the R² of FOSSA-DeepAR is 0.93, higher than that of the other benchmark models, so the FOSSA-DeepAR model has better fitting capability and more accurate point prediction. For interval prediction, the IFCP of FOSSA-DeepAR is 14.79% higher than that of DeepAR and 16.63% higher than that of the other hybrid models, on average. Thus, FOSSA performs well in optimizing the initial weights of the DeepAR model and can significantly improve its prediction accuracy. Figure 7 shows the interval prediction results for PM2.5 concentration; the blue shaded area is the prediction interval. A higher IFCP means a stronger interval prediction ability, and a lower IFNAW indicates higher interval prediction accuracy. The minimum IFCP is 0.7175, which is 21.87% lower than that of FOSSA-DeepAR; ISSA1-DeepAR has the poorest interval prediction performance. The proposed FOSSA-DeepAR thus improves the interval prediction performance for PM2.5 concentration.
To further demonstrate the forecasting ability of the proposed FOSSA-DeepAR model, the Diebold-Mariano hypothesis test [24] is utilized to verify that the prediction results of FOSSA-DeepAR are significantly different from those of the other benchmark models; a significant difference here represents a difference in prediction performance. According to the results shown in Table 5, all SSA-based algorithms reinforce the prediction performance of DeepAR. Moreover, compared with the other hybrid models, the results of FOSSA-DeepAR are significantly different, which means that the FOSSA-DeepAR model has a more effective forecasting ability. FOSSA-DeepAR outperforms the other hybrid models and the single model because the initial weights of DeepAR are optimized effectively by FOSSA.

Robustness Analysis of FOSSA-DeepAR
To further show the stability of FOSSA-DeepAR, the proposed model is applied to predict the PM2.5 and O3 concentrations observed by the Taian monitoring station in Shandong, and the O3 concentration observed by the Huairou monitoring station in Beijing. For each time series, the last 200 time points are used as the prediction range, and the remaining observations are used to train the model. Table 6 gives the evaluation indices of these experiments. Figures 8-10 show the interval predictions of O3 concentration in Beijing, PM2.5 concentration in Taian, and O3 concentration in Taian, respectively. Most of the true atmospheric concentration values fall within the prediction intervals, and the intervals are narrow, which indicates good interval prediction accuracy. Accordingly, the proposed FOSSA-DeepAR model can be utilized for different atmospheric pollutants and different regions.

Conclusions and Future Research
In this paper, a novel hybrid learning approach based on FOSSA and DeepAR is proposed to simultaneously obtain point and interval predictions of PM2.5 concentration. Firstly, inspired by the ABC algorithm, we use fitness-based learning, opposition-based learning, and Lévy flight to improve SSA; the resulting FOSSA outperforms other existing SSA-based algorithms. We then introduce the DeepAR model into the field of air quality prediction and use the proposed FOSSA to optimize DeepAR, establishing the FOSSA-DeepAR learning method. Moreover, we use the FOSSA-DeepAR hybrid learning method to predict PM2.5 and O3 concentrations in Beijing and Taian. The empirical results show the powerful optimization capability of FOSSA and the outstanding prediction performance of FOSSA-DeepAR.
PM2.5 concentration is affected by many factors, such as air humidity, air pressure, and landforms; these factors are not considered in this paper, which is a limitation. In future work, we will pay close attention to covariate time series in the FOSSA-DeepAR model. We will also examine the prediction capability of the proposed model in other areas and countries, such as Brazil and Mexico. Other probability distributions can also be utilized in the FOSSA-DeepAR model.
