Predicting of Daily PM2.5 Concentration Employing Wavelet Artificial Neural Networks Based on Meteorological Elements in Shanghai, China

Anthropogenic sources of fine particulate matter (PM2.5) threaten ecosystem security, human health and sustainable development. Accurate prediction of daily PM2.5 concentration can give people important information for reducing their exposure. In this study, artificial neural networks (ANNs) and wavelet-ANNs (WANNs) are used to predict daily PM2.5 concentration in Shanghai. From 2014 to 2020, the PM2.5 concentration in Shanghai decreased by 39.3%. The severe COVID-19 epidemic had an unprecedented effect on PM2.5 concentration in Shanghai: PM2.5 during the 2020 lockdown was significantly lower than in the period before the lockdown. First, correlation analysis is used to identify the associations between PM2.5 and meteorological elements in Shanghai. Second, twelve training algorithms and twenty-one network structures were evaluated for these models; the results show that the optimal inputs for the daily PM2.5 prediction models are the PM2.5 concentrations of the 3 previous days and fourteen meteorological elements. Finally, the tansig-purelin activation functions outperform the other combinations for both ANNs and WANNs in Shanghai in the training, validation and forecasting stages. In terms of the correlation coefficients (R) between next-day PM2.5 and the input factors, PM2.5 is most closely related to the 1-day-lag PM2.5, and is also closely related to minimum atmospheric temperature, maximum atmospheric pressure, maximum atmospheric temperature, and the 2-day-lag PM2.5. When trained with Bayesian regularization (trainbr), the ANN and WANN models accurately simulated daily PM2.5 concentration in Shanghai in the training, calibration and predicting stages, and the WANN1 model achieved the best prediction performance in terms of R (0.9316).
These results indicate that WANNs are well suited to daily PM2.5 concentration prediction because they can identify the relationships between input and output factors. Our research can therefore offer a theoretical basis for air pollution control.


Introduction
Air pollution affects global climate change, ecosystems and human health [1][2][3][4][5]. It also leads to huge losses in human capital, productivity and social welfare [6], and it is responsible for millions of deaths worldwide [7]: exposure to air pollution resulted in 7 million premature deaths globally in 2019 [8]. In total, 1.42 million deaths in China were ascribed to outdoor air pollution in […]
Previous researchers have put forward different machine learning (ML) algorithms for data modeling. ANN has been shown to learn efficiently and is extensively used to forecast groundwater levels [41], the COVID-19 epidemic [42], air pollution [43,44], and so on. PM2.5 concentrations predicted by an ANN agreed well with the measured values during training [45].
Although deep neural networks (DNNs) are powerful, they still have several shortcomings. First, a DNN has a very large number of parameters, and its learning performance depends heavily on careful parameter tuning. Second, training a DNN requires a large amount of data, which makes DNNs laborious to apply to tasks with only small training sets [46]. In addition, deep learning (DL) faces more general challenges, such as a weak theoretical foundation, limited model interpretability, and the need for large computing resources [47].
There are various types of mother wavelet functions. Given the diurnal variation of air pollutant concentrations, each wavelet has its own strengths and weaknesses when used to decompose pollutant concentration series [48]. Decomposing highly variable air pollutant concentrations into several subsequences with lower variability has distinct advantages, and for most models the wavelet transform is an effective technique for increasing forecasting accuracy [49]. The basic hybrid prediction model first uses the wavelet transform to decompose the air pollutant concentrations and then uses an artificial neural network to predict them.
This paper proposes a hybrid model combining the wavelet transform and ANN (WANN) to predict daily PM2.5 concentration. To avoid overfitting, improved training algorithms such as trainbr and trainlm are used for modeling. The hybrid model provides a novel alternative for forecasting daily PM2.5 concentration.

Study Location and Data Sources
Shanghai is the largest city in China (Figure 1a). It is located in East China at the estuary of the Yangtze River and covers an area of 6340 km². Its average altitude is 2.19 m, and in 2021 its permanent resident population and GDP were about 24.8943 million and CNY 4321.485 billion, respectively. This paper uses the air pollution and meteorological data sets of Shanghai from 1 January 2014 to 31 December 2020. The daily PM2.5 concentrations are the mean values of twenty monitoring sites (stations) in Shanghai and were obtained from the website of the China Environmental Monitoring Station (http://www.cnemc.cn/) (accessed on 21 January 2022) and the platform http://www.aqistudy.cn/ (accessed on 22 January 2022) (Figure 1b). Table 1 lists the monitoring stations used in this study. The meteorological data (including temperature, precipitation, humidity, wind, atmospheric pressure, etc.) are the average values from the observation station of the China Meteorological Administration. The data are divided into three stages: the training stage (about 80%), the verification stage (about 10%) and the prediction stage (about 10%). The training stage runs from 1 January 2014 to 30 June 2019, the verification stage from 1 July 2019 to 31 March 2020, and the prediction stage from 1 April 2020 to 31 December 2020.
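The stage boundaries above can be checked with a short sketch that assigns every calendar day of the study period to a stage and counts the days. With these date ranges the stages contain 2007, 275 and 275 days (about 78.5%, 10.8% and 10.8%), close to the nominal 80/10/10 split. The `STAGES` mapping and the helper names are illustrative assumptions, not code from the study.

```python
from datetime import date, timedelta

# Stage boundaries as stated in the text (both ends inclusive).
STAGES = {
    "training": (date(2014, 1, 1), date(2019, 6, 30)),
    "verification": (date(2019, 7, 1), date(2020, 3, 31)),
    "prediction": (date(2020, 4, 1), date(2020, 12, 31)),
}

def assign_stage(day):
    """Return the stage name for a calendar day, or None outside the study period."""
    for name, (start, end) in STAGES.items():
        if start <= day <= end:
            return name
    return None

def stage_counts(start=date(2014, 1, 1), end=date(2020, 12, 31)):
    """Count how many days of the study period fall into each stage."""
    counts = {name: 0 for name in STAGES}
    day = start
    while day <= end:
        stage = assign_stage(day)
        if stage is not None:
            counts[stage] += 1
        day += timedelta(days=1)
    return counts

counts = stage_counts()
total = sum(counts.values())
for name, n in counts.items():
    print(f"{name:>12}: {n:4d} days ({100 * n / total:.1f}%)")
```

Splitting by date rather than at random keeps the prediction stage strictly later in time than the data used for training.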

Wavelet Transformation (WT)
Wavelet transformation (WT) is a waveform analysis method for time-varying signals. In WT, the wavelet coefficients are obtained by convolving the original time-domain signal with scaled and translated versions of the mother wavelet function. The discrete wavelet transform (DWT) is computationally cheaper than the continuous wavelet transform (CWT), and the Daubechies (db) wavelet is the most commonly used mother wavelet function. The DWT, computed with the Mallat pyramidal algorithm, is therefore used to decompose the daily PM2.5 concentration data and the meteorological data [50]. The DWT of a time series f(q) is defined by Equation (1), in which ψ(h) denotes the mother wavelet of effective length h, c is the scale (dilation) factor, and d is the translation in time. For a discrete signal y, the DWT is defined through multi-resolution decomposition, which can be computed with the Mallat decomposition algorithm and the Mallat pyramidal reconstruction algorithm [41]. For an m-level decomposition and reconstruction, the original signal y can be expressed as

$$y = CA_m + \sum_{i=1}^{m} CD_i \quad (2)$$

where $CA_m$ is the approximation series representing the low-frequency component, which contains the trend information, and $CD_i$ is the detail series at level i representing the high-frequency component, which contains periodic information. In effect, as m increases, the low-frequency sequence is repeatedly split into a coarser low-frequency subsequence and a relatively high-frequency subsequence (Figure 2). The 2-level decomposition of the original PM2.5 time series with the bior1.1 wavelet was performed with the MATLAB Wavelet Toolbox. The main purpose of the DWT is to reduce the complexity of the input signal and the correlated information among the decomposed components (details CD2 and CD1 and approximation CA2).
The DWT can thus provide low-dimensional approximation components together with detail components for multidimensional analysis.
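The additive form of Equation (2) with m = 2 can be checked with a short pure-Python sketch. It assumes that bior1.1 can be treated as the Haar wavelet (its decomposition and reconstruction filters coincide with Haar's), it is not the MATLAB Wavelet Toolbox used in the study, it skips boundary handling by requiring a length divisible by 4, and the input values are hypothetical. It also illustrates why the bior1.1 components can be mutually uncorrelated: for this orthogonal wavelet the reconstructed components are orthogonal and the detail components have zero mean.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_step(x):
    """One analysis level of the Haar filter bank (assumed equivalent to bior1.1)."""
    half = len(x) // 2
    approx = [(x[2 * k] + x[2 * k + 1]) / SQRT2 for k in range(half)]
    detail = [(x[2 * k] - x[2 * k + 1]) / SQRT2 for k in range(half)]
    return approx, detail

def haar_inverse(approx, detail):
    """Synthesis step: rebuild the finer sequence from one approximation/detail pair."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return out

def decompose_level2(y):
    """Full-length components CA2, CD2, CD1 with y = CA2 + CD2 + CD1 (Eq. 2, m = 2)."""
    if len(y) % 4:
        raise ValueError("this sketch needs a length divisible by 4 (no boundary extension)")
    a1, d1 = haar_step(y)
    a2, d2 = haar_step(a1)
    z1, z2 = [0.0] * len(d1), [0.0] * len(d2)
    # Reconstruct each component separately by zeroing the other coefficient sets.
    ca2 = haar_inverse(haar_inverse(a2, z2), z1)
    cd2 = haar_inverse(haar_inverse(z2, d2), z1)
    cd1 = haar_inverse(z1, d1)
    return ca2, cd2, cd1

def pearson(u, v):
    """Pearson correlation coefficient R of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))

# Hypothetical daily PM2.5 values (ug/m3), not data from this study.
series = [52.0, 48.0, 61.0, 70.0, 35.0, 30.0, 42.0, 39.0]
ca2, cd2, cd1 = decompose_level2(series)
print([round(c, 2) for c in ca2])
print(round(pearson(ca2, cd2), 6), round(pearson(ca2, cd1), 6), round(pearson(cd2, cd1), 6))
```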

Artificial Neural Network (ANN)
An artificial neural network (ANN) is a branch of artificial intelligence (AI). It mimics the prediction and recognition functions of the biological brain and is used to solve complex problems in various application fields. A typical ANN consists of three layers (the input, hidden and output layers), each composed of several artificial neurons and an activation function. Neurons are connected through weights, and each neuron processes the information it receives from the neurons connected to it. Owing to its strong nonlinear processing capability, an ANN can represent the nonlinear relationships underlying many complex scientific problems. The proposed ANN model for predicting daily PM2.5 concentration is shown in Figure 3. Its input layer has seventeen neurons corresponding to the key input parameters: precipitation (P), extreme wind velocity (EWV), mean atmospheric pressure (MAP), mean wind velocity (MWV), mean atmospheric temperature (MAT), mean water vapor pressure (MWP), mean relative humidity (MRH), sunshine hours (SH), minimum atmospheric pressure (MINAP), minimum atmospheric temperature (MINAT), maximum atmospheric pressure (MAXAP), maximum atmospheric temperature (MAXAT), maximum wind velocity (MAXWV), minimum relative humidity (MINRH), PM2.5(t), PM2.5(t − 1), and PM2.5(t − 2).
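The seventeen inputs listed above pair the 14 meteorological elements of day t with PM2.5 on days t, t − 1 and t − 2, and the target is PM2.5 on day t + 1. A minimal sketch of how such training samples could be assembled follows; the keys reuse the abbreviations above, while the function name, the data layout (one dict of meteorological values per day) and the dummy values are assumptions for illustration.

```python
# The 14 meteorological inputs, using the abbreviations defined in the text.
MET_NAMES = ["P", "EWV", "MAP", "MWV", "MAT", "MWP", "MRH", "SH",
             "MINAP", "MINAT", "MAXAP", "MAXAT", "MAXWV", "MINRH"]

def build_samples(met_rows, pm25):
    """Pair a 17-element input vector for day t with the target PM2.5(t + 1).

    met_rows[t] is a dict holding the 14 meteorological elements of day t
    (hypothetical layout); pm25[t] is the observed daily PM2.5 of day t.
    """
    if len(met_rows) != len(pm25):
        raise ValueError("met_rows and pm25 must cover the same days")
    samples = []
    # Start at t = 2 so PM2.5(t - 2) exists; stop one day early so PM2.5(t + 1) exists.
    for t in range(2, len(pm25) - 1):
        features = [met_rows[t][name] for name in MET_NAMES]
        features += [pm25[t], pm25[t - 1], pm25[t - 2]]
        samples.append((features, pm25[t + 1]))
    return samples

# Dummy data for 10 days.
met_rows = [{name: float(day) for name in MET_NAMES} for day in range(10)]
pm25 = [30.0 + day for day in range(10)]
samples = build_samples(met_rows, pm25)
print(len(samples), len(samples[0][0]), samples[0][1])
```

Each day loses three samples' worth of history at the edges of the record, so n days of data yield n − 3 input–target pairs.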
Back propagation (BP) is the most commonly used and effective method for training artificial neural networks (ANNs). Model development involves two phases: forward propagation and error back propagation. The hidden-layer neurons compute the weighted sum of the information received from the input layer using Equations (3) and (4) and pass the result to the next layer through the activation (transfer) function. The network output is then compared with the measured value, the error is propagated back toward the input layer, and the connection weights and thresholds (biases) are adjusted until the error falls to the target level [51].
$$k = \sum_{i} w_{ij} O_i + p \quad (3)$$
$$Q = f(k) \quad (4)$$
where k is the weighted sum, $w_{ij}$ is the connection weight, j denotes the neuron in the output layer, $O_i$ is the input data, and p is the bias (threshold) value, used to shift the input of the activation function; Q is the output data and f is the activation function. After the signals have propagated forward, the global error is computed. If the global error is still above the preset error goal ($10^{-5}$), it is propagated backward to change the weights and thresholds; otherwise training ends. The global error function used in back propagation is given by Equation (5), where E is the error of the current output, $T_j$ is the target output, $Q_j$ is the predicted output, and l is the total number of outputs. After the network model has been adjusted and trained, the learned information (the weights and thresholds, or biases) is stored for modelling.
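Equations (3) and (4) and the back-propagation update can be illustrated with a small pure-Python sketch of a single-hidden-layer network with tansig in the hidden layer and purelin at the output. The sketch assumes a squared-error loss E = ½(T − Q)² for one output and plain gradient descent with a fixed learning rate; MATLAB's training functions such as trainbr and trainlm use more elaborate update rules, and the random weights, input vector and target are dummy values.

```python
import math
import random

def tansig(r):
    # MATLAB defines tansig(r) = 2 / (1 + exp(-2r)) - 1, which equals tanh(r)
    return math.tanh(r)

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Eqs. (3)-(4): weighted sum plus bias, then activation (tansig hidden, purelin output)."""
    hidden = [tansig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    q = sum(w * h for w, h in zip(w_out, hidden)) + b_out  # purelin(r) = r
    return hidden, q

def squared_error(x, target, params):
    _, q = forward(x, *params)
    return 0.5 * (target - q) ** 2  # assumed loss for a single output

def backprop_step(x, target, params, lr=0.05):
    """One gradient-descent update of all weights and biases for the assumed loss."""
    w_hidden, b_hidden, w_out, b_out = params
    hidden, q = forward(x, w_hidden, b_hidden, w_out, b_out)
    delta_out = q - target  # dE/dQ; the purelin derivative is 1
    # Propagate the error back through tansig: d tanh(k)/dk = 1 - tanh(k)**2
    delta_hidden = [delta_out * w * (1.0 - h * h) for w, h in zip(w_out, hidden)]
    new_w_out = [w - lr * delta_out * h for w, h in zip(w_out, hidden)]
    new_b_out = b_out - lr * delta_out
    new_w_hidden = [[w - lr * d * xi for w, xi in zip(row, x)]
                    for row, d in zip(w_hidden, delta_hidden)]
    new_b_hidden = [b - lr * d for b, d in zip(b_hidden, delta_hidden)]
    return new_w_hidden, new_b_hidden, new_w_out, new_b_out

random.seed(0)
n_in, n_hidden = 17, 15  # the 17-15-1 ANN structure selected for Shanghai
params = (
    [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)],
    [0.0] * n_hidden,
    [random.uniform(-0.5, 0.5) for _ in range(n_hidden)],
    0.0,
)
x = [random.random() for _ in range(n_in)]  # one normalized input vector (dummy)
target = 0.4                                # normalized PM2.5(t + 1) (dummy)
before = squared_error(x, target, params)
params = backprop_step(x, target, params)
after = squared_error(x, target, params)
print(f"error before: {before:.6f}, after one step: {after:.6f}")
```

The hidden-layer error terms are computed with the output weights from before the update, which is what the chain rule requires.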
Four activation functions are commonly used in BPANNs: sigmoid (logsig), tanh (tansig), purelin and ReLU (poslin), which are the log-sigmoid, hyperbolic tangent sigmoid, linear, and positive linear transfer functions, respectively. They are defined as follows:
$$\mathrm{logsig}(r) = \frac{1}{1 + e^{-r}} \quad (6)$$
$$\mathrm{tansig}(r) = \frac{2}{1 + e^{-2r}} - 1 \quad (7)$$
$$\mathrm{purelin}(r) = r \quad (8)$$
$$\mathrm{poslin}(r) = \max(0, r) \quad (9)$$
where r is the corresponding input. An ANN may perform well on the training data but poorly on the forecasting data, i.e., its performance deteriorates as the new data differ more from the training data or as errors grow. This failure to generalize is called "overfitting". It can be mitigated with the Bayesian regularization algorithm (BR, or trainbr), the Levenberg-Marquardt algorithm (LM, or trainlm) or other training algorithms [52]. Trainbr updates the weight and bias (threshold) values according to LM optimization: it minimizes a combination of squared errors and weights and then determines the correct combination so as to produce a network that generalizes well. The LM algorithm (trainlm) is a variant of Newton's method designed to minimize functions that are sums of squares of other nonlinear functions. When the performance function has the form of a sum of squares, the Hessian matrix can be approximated as $H \approx J^{\mathsf{T}} J$ and the gradient computed as $g = J^{\mathsf{T}} e$, where J is the Jacobian matrix of the network errors e with respect to the weights and biases; computing J is much less complex than computing the Hessian matrix directly.
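The Levenberg-Marquardt idea described above, solving $(J^{\mathsf{T}} J + \mu I)\,\delta = -J^{\mathsf{T}} e$ with a damping factor μ adapted after every step, can be sketched on a two-parameter problem: fitting a single tansig neuron a·tanh(b·x) to noise-free synthetic data. This is an illustration under stated assumptions, not MATLAB's trainlm; the starting μ = 0.001, the 0.1/10 adjustment factors and μ_max = 1e10 are assumed values (they match commonly cited trainlm defaults), and trainbr's Bayesian weight regularization is not included.

```python
import math

def residuals(a, b, xs, ys):
    """e_i = a * tansig(b * x_i) - y_i for a one-neuron model (tansig = tanh)."""
    return [a * math.tanh(b * x) - y for x, y in zip(xs, ys)]

def sse(a, b, xs, ys):
    """Sum of squared errors, the performance function LM minimizes."""
    return sum(e * e for e in residuals(a, b, xs, ys))

def lm_fit(xs, ys, a=1.0, b=1.0, mu=1e-3, mu_dec=0.1, mu_inc=10.0, mu_max=1e10, iters=200):
    """Levenberg-Marquardt on two parameters: solve (J^T J + mu I) delta = -J^T e."""
    for _ in range(iters):
        e = residuals(a, b, xs, ys)
        # Jacobian rows [de_i/da, de_i/db]
        jac = []
        for x in xs:
            t = math.tanh(b * x)
            jac.append((t, a * x * (1.0 - t * t)))
        # Gauss-Newton Hessian approximation H = J^T J and gradient g = J^T e
        h11 = sum(ja * ja for ja, _ in jac)
        h12 = sum(ja * jb for ja, jb in jac)
        h22 = sum(jb * jb for _, jb in jac)
        g1 = sum(ja * ei for (ja, _), ei in zip(jac, e))
        g2 = sum(jb * ei for (_, jb), ei in zip(jac, e))
        # Solve the damped 2x2 system by Cramer's rule
        m11, m22 = h11 + mu, h22 + mu
        det = m11 * m22 - h12 * h12
        da = (-g1 * m22 + h12 * g2) / det
        db = (h12 * g1 - m11 * g2) / det
        if sse(a + da, b + db, xs, ys) < sse(a, b, xs, ys):
            a, b = a + da, b + db
            mu *= mu_dec  # success: behave more like Gauss-Newton
        else:
            mu *= mu_inc  # failure: behave more like gradient descent
            if mu > mu_max:
                break     # no further progress is possible
    return a, b

# Noise-free synthetic data generated with a = 2, b = 0.5
xs = [i / 4 for i in range(-8, 9)]
ys = [2.0 * math.tanh(0.5 * x) for x in xs]
a_hat, b_hat = lm_fit(xs, ys)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}")
```

Small μ makes the step close to a Gauss-Newton step using the $J^{\mathsf{T}} J$ approximation; large μ shrinks it toward a short gradient-descent step, which is why rejected steps increase μ.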
To speed up convergence, the raw data were normalized and made dimensionless as follows:
$$S = \frac{s - s_{\min}}{s_{\max} - s_{\min}}$$
where S is the normalized value of the original variable, $s_{\min}$ is the minimum of the raw data, $s_{\max}$ is the maximum of the raw data, and s denotes the original data.
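A minimal sketch of this min-max scaling and of its inverse, which is needed to convert normalized predictions back to µg/m³, follows. Computing s_min and s_max on the training stage and reusing them for the later stages is an assumption (a common precaution against information leakage); the text does not state which data were used, and the sample values are hypothetical.

```python
def fit_range(train_values):
    """s_min and s_max taken from the training stage (assumed choice)."""
    return min(train_values), max(train_values)

def normalize(values, s_min, s_max):
    """S = (s - s_min) / (s_max - s_min)."""
    span = s_max - s_min
    if span == 0:
        raise ValueError("a constant series cannot be min-max normalized")
    return [(s - s_min) / span for s in values]

def denormalize(scaled, s_min, s_max):
    """Inverse mapping, used to report normalized predictions in the original units."""
    return [value * (s_max - s_min) + s_min for value in scaled]

train_pm25 = [20.0, 35.0, 80.0, 50.0]  # hypothetical training-stage values (ug/m3)
s_min, s_max = fit_range(train_pm25)
scaled = normalize(train_pm25, s_min, s_max)
print(scaled)
print(denormalize(scaled, s_min, s_max))
```

Values in later stages that fall outside the training range map slightly below 0 or above 1, which the linear purelin output layer can still represent.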

Wavelet Artificial Neural Network
In the WANN model, the raw data Dn(t) are decomposed into three sets of components, CD2, CD1 and CA2, which are then used by the ANN as input factors. In Figure 4, Dn(t) denotes the input factors on day t, and PM2.5(t + 1) is the PM2.5 concentration predicted for day t + 1.
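Because each of the 17 input factors is split into CD2, CD1 and CA2, the WANN receives 17 × 3 = 51 inputs, which is consistent with the 51-20-1 WANN structure reported for Shanghai. The sketch below builds those 51 input series with a closed-form 2-level Haar decomposition, assuming that bior1.1 (whose filters coincide with the Haar wavelet's) can be represented this way and that the series length is divisible by 4; the factor names and dummy series are hypothetical.

```python
def haar_components(x):
    """Additive 2-level Haar components CA2, CD2, CD1 of x, with x = CA2 + CD2 + CD1."""
    if len(x) % 4:
        raise ValueError("series length must be a multiple of 4")
    ca2, cd2, cd1 = [], [], []
    for i in range(0, len(x), 4):
        x0, x1, x2, x3 = x[i:i + 4]
        mean4 = (x0 + x1 + x2 + x3) / 4          # level-2 approximation: block mean
        half = ((x0 + x1) - (x2 + x3)) / 4       # level-2 detail
        ca2 += [mean4] * 4
        cd2 += [half, half, -half, -half]
        cd1 += [(x0 - x1) / 2, -(x0 - x1) / 2,   # level-1 details
                (x2 - x3) / 2, -(x2 - x3) / 2]
    return ca2, cd2, cd1

def wann_inputs(series_by_factor):
    """Turn each input series into three WANN input series (CD2, CD1, CA2)."""
    inputs = {}
    for name, series in series_by_factor.items():
        ca2, cd2, cd1 = haar_components(series)
        inputs[f"{name}_CD2"] = cd2
        inputs[f"{name}_CD1"] = cd1
        inputs[f"{name}_CA2"] = ca2
    return inputs

# Hypothetical 17 input factors, each a dummy 8-day series.
factors = {f"X{j}": [float(j + t) for t in range(8)] for j in range(17)}
inputs = wann_inputs(factors)
print(len(inputs))
```

Because each block of four days is transformed jointly in this sketch, a component value on day t also depends on the later days of the same block.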

Performance Criteria
Three statistical indicators were used to evaluate the performance of the ANN and WANN models: mean absolute error (MAE), root mean square error (RMSE), and correlation coefficient (R), defined as follows:
$$\mathrm{MAE} = \frac{1}{U}\sum_{k=1}^{U}\left|A_k - C_k\right|$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{U}\sum_{k=1}^{U}\left(A_k - C_k\right)^2}$$
$$R = \frac{\sum_{k=1}^{U}\left(A_k - \bar{A}\right)\left(C_k - \bar{C}\right)}{\sqrt{\sum_{k=1}^{U}\left(A_k - \bar{A}\right)^2 \sum_{k=1}^{U}\left(C_k - \bar{C}\right)^2}}$$
where $A_k$ is the kth observed PM2.5 concentration, $C_k$ is the kth predicted PM2.5 concentration, $\bar{A}$ is the mean of the observed PM2.5 concentrations, $\bar{C}$ is the mean of the predicted PM2.5 concentrations, and U is the number of observed PM2.5 concentrations.
Although the air quality in Shanghai has improved considerably, PM2.5 still exceeds the WHO's new Global Air Quality Guidelines (AQGs), whose annual PM2.5 guideline level is 5 µg/m³. Over the 7 years, the monthly mean PM2.5 follows a U-shaped pattern, with the maximum in January and the minimum in August. The seasonal mean PM2.5 also varies markedly: it is highest in winter, followed by spring and autumn, and lowest in summer.
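The MAE, RMSE and R criteria defined under Performance Criteria can be computed directly from paired observed and predicted series. In the sketch below, `obs` plays the role of $A_k$ and `pred` of $C_k$; the short lists are hypothetical values, not results from this study.

```python
import math

def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(a - c) for a, c in zip(obs, pred)) / len(obs)

def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((a - c) ** 2 for a, c in zip(obs, pred)) / len(obs))

def corr_r(obs, pred):
    """Pearson correlation coefficient R between observed and predicted values."""
    a_bar = sum(obs) / len(obs)
    c_bar = sum(pred) / len(pred)
    num = sum((a - a_bar) * (c - c_bar) for a, c in zip(obs, pred))
    den = math.sqrt(sum((a - a_bar) ** 2 for a in obs) * sum((c - c_bar) ** 2 for c in pred))
    return num / den

obs = [10.0, 20.0, 30.0, 40.0]   # hypothetical observed PM2.5 (ug/m3)
pred = [12.0, 18.0, 33.0, 39.0]  # hypothetical predicted PM2.5 (ug/m3)
print(mae(obs, pred), rmse(obs, pred), corr_r(obs, pred))
```

MAE and RMSE are in the units of the data (µg/m³ here), while R is dimensionless and insensitive to a constant bias in the predictions, which is why the three are reported together.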

The daily PM2.5 data for 2019 and 2020 are divided into three periods: period I (1 January to 26 January), period II (27 January to 30 April) and period III (1 May to 31 July). Period II is the lockdown period. The PM2.5 values in periods I, II and III were 52.62, 32.41 and 31.99 µg/m³ in 2020, compared with 53.12, 43.11 and 29.58 µg/m³ in the same periods of 2019. Relative to 2019, PM2.5 in 2020 therefore decreased by 0.50, 10.70 and −2.41 µg/m³, i.e., it was 0.9%, 24.8% and −8.2% lower (a negative value denoting an increase). The air quality during the 2020 lockdown was thus clearly better than in the same period of 2019.

Relevance between Daily PM2.5 Concentration and Meteorological Factors in Shanghai
Correlation analysis can identify the linear relationships between PM2.5 concentration and meteorological elements. Determining the input variables is one of the most important steps in designing the ANN and WANN models. The correlation coefficients computed for the input factors are shown in Figure 6 (significant at the 0.01 level, two-tailed). The correlation between each factor and PM2.5(t + 1) was assessed by its R. PM2.5(t) was most strongly related to PM2.5(t + 1) in Shanghai. In addition, MINAT, MAXAP, MAXAT, PM2.5(t) and PM2.5(t − 1) were more strongly correlated with PM2.5(t + 1) than the other factors, so these five were identified as the most important factors. Various combinations of factors, selected according to their relationship with PM2.5, were therefore used as inputs for simulating daily PM2.5(t + 1), as listed in Table 2. For example, the network structure 17:5:1 in Table 2 indicates 17 neurons in the input layer, 5 neurons in the hidden layer and 1 neuron in the output layer; the other network structures are read in the same way.
Table 2. Sets of input factors tested with the ANN and WANN models for predicting next-day PM2.5 concentrations in Shanghai.
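The screening step, correlating each candidate factor on day t with PM2.5 on day t + 1 and ranking the factors by |R|, can be sketched as follows. The function name and factor values are hypothetical, lagged PM2.5 inputs are assumed to be passed in as already-shifted series, and the significance test at the 0.01 level is not reproduced.

```python
import math

def pearson(u, v):
    """Pearson correlation coefficient R of two equal-length sequences."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return cov / math.sqrt(sum((a - mu) ** 2 for a in u) * sum((b - mv) ** 2 for b in v))

def rank_inputs(factors, pm25):
    """Correlate each factor on day t with PM2.5 on day t + 1 and sort by |R|."""
    target = pm25[1:]  # PM2.5(t + 1)
    scores = {name: pearson(series[:-1], target) for name, series in factors.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical 8-day series, not data from this study.
pm25 = [40.0, 55.0, 50.0, 70.0, 65.0, 80.0, 60.0, 45.0]
factors = {
    "PM2.5(t)": pm25,
    "MAXAP": [1021.0, 1018.0, 1020.0, 1015.0, 1016.0, 1012.0, 1019.0, 1022.0],
}
for name, r in rank_inputs(factors, pm25):
    print(f"{name:>9}: R = {r:+.3f}")
```

Ranking by |R| rather than R keeps strongly negative associations, such as a pressure or temperature effect that lowers PM2.5, among the candidate inputs.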

Determination of Model Structure and Parameters
It should be stressed that selecting the most suitable network structure is one of the key tasks of the model designer. The important information contained in the meteorological elements is extracted by the discrete wavelet transform (DWT). A two-level wavelet decomposition provides details of the input factors at different scales: after two-level decomposition and reconstruction, each input factor is split into three parts. The approximation component CA2 represents the low-frequency information of the raw factor, while the details CD2 and CD1 represent its high-frequency information. The variation characteristics of a time series are the key factors in wavelet selection [53]. To optimize the decomposition of the input factors, the mother wavelet is chosen by considering the correlation between CD1, CD2 and CA2: the wavelet giving the smallest R best serves the goal of analyzing the distinct variation characteristics of the components. In total, 21 wavelet functions were evaluated for the wavelet transform, drawn from the Daubechies (db), symlet (sym), coiflet (coif) and biorthogonal (bior) families [54]. Coiflets are a family of compactly supported orthogonal wavelets. Figure 7 shows that bior1.1, a biorthogonal wavelet [54], is the best wavelet function in the current research because it yields the smallest R: the correlation coefficients between CD1, CD2 and CA2 of the input factors decomposed with bior1.1 are all 0, i.e., the components are mutually uncorrelated.
Figure 7. Correlation between CA2, CD1 and CD2 for distinct wavelet functions (mother wavelets) in Shanghai during the testing phase; (a) wavelet Daubechies (db) and symlets (sym); (b) wavelet coiflets (coif) and biorthogonal wavelets (bior). Bior6.8 is a biorthogonal wavelet with an even symmetric high-pass decomposition filter.
The optimal model parameters were obtained by trial and error. Figure 8 demonstrates that, over repeated tests, the network structures 17-15-1 for the ANN and 51-20-1 for the WANN are superior to the other network topologies. In the models, the number of hidden-layer neurons was increased from 1 to 21. It can be seen from Figure 6 that the RMSE decreases slightly as the number of hidden-layer neurons increases. Consequently, the optimal model structures for Shanghai are 17-15-1 (ANN) and 51-20-1 (WANN), respectively. Figure 8 also shows the performance of the training algorithms, indicating that the trainbr algorithm performs best in forecasting PM2.5(t + 1) in Shanghai; trainbr automatically determines the optimal values of the parameters of its objective function. Figure 9 shows that the tansig-purelin activation functions perform better than the others for the ANN in Shanghai during the training, calibration and predicting stages. Likewise, the tansig-purelin transfer functions are also the best for the WANN in Shanghai (Figure 9).

Comparative Analysis of the Different PM2.5 Predicting Models
All results of the ANNs and WANNs in the training, validation and predicting stages are shown in Table 3. The ten-fold cross-validation method was used to verify the models. In the training stage, the root mean square errors (RMSEs) of ANN1 and WANN1 in Shanghai were 20.7841 and 9.8824, the mean absolute errors (MAEs) were 15.0825 and 7.1153, and the correlation coefficients (Rs) were 0.7061 and 0.9416, respectively. ANN2, ANN3, ANN4, WANN2, WANN3 and WANN4 show similar results in RMSE, MAE and R, and in the training stage the WANNs outperformed the ANNs. In the verification stage, the RMSEs of ANN1 and WANN1 in Shanghai were 17.0006 and 9.7850, the MAEs were 13.1262 and 6.8827, and the Rs were 0.6830 and 0.8969, respectively. In the predicting stage, the RMSEs of ANN1 and WANN1 were 24.2407 and 10.6580, the MAEs were 17.7867 and 7.6918, and the Rs were 0.5618 and 0.9316, respectively. Thus, the WANNs outperformed the ANNs in all three stages. The WANN1 model, based on all 17 input variables, is the best model for predicting PM2.5 concentration, and the WANN2 model, based on five input variables, is the second best. Notably, WANN2 performs similarly to WANN1, and both models meet the requirements for PM2.5 concentration prediction.
Figure 10 shows the PM2.5 forecasts and scatter plots of the ANN models in the testing stage in Shanghai. The ANNs reproduced the average daily PM2.5 concentration but were limited in capturing the minimum and maximum peaks, and the predicted and observed values are relatively scattered. Figure 11 shows the forecast lines and scatter plots of the WANNs in the testing stage. The WANNs predicted daily PM2.5 concentration in Shanghai with acceptable precision.
Additionally, the WANNs were clearly superior to the ANNs and reproduced the observed PM2.5(t + 1) concentrations with good consistency. WANN1, which includes the 14 meteorological elements, also performed better than WANN4, which uses the 1-day-lag PM2.5 concentration; in other words, using the 14 meteorological elements together with the PM2.5 of the 3 previous days as input factors gives more precise results. The WANN2 model also shows very good agreement between the observed and predicted PM2.5(t + 1) concentrations in Shanghai. The main meteorological inputs of the WANN2 model are MINAT, MAXAP and MAXAT, possibly because their relationships with PM2.5 are stronger than those of the other meteorological elements.
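To express the ANN1-versus-WANN1 comparison in relative terms, the reductions in RMSE and MAE can be computed from the values reported above for each stage. This is a worked calculation on the reported numbers; the dictionary names are illustrative.

```python
# (ANN1, WANN1) error values for Shanghai as reported in the text (Table 3).
RMSE = {
    "training": (20.7841, 9.8824),
    "verification": (17.0006, 9.7850),
    "prediction": (24.2407, 10.6580),
}
MAE = {
    "training": (15.0825, 7.1153),
    "verification": (13.1262, 6.8827),
    "prediction": (17.7867, 7.6918),
}

def relative_reduction(pairs):
    """Percent by which WANN1 lowers the error of ANN1 in each stage."""
    return {stage: 100.0 * (ann - wann) / ann for stage, (ann, wann) in pairs.items()}

for label, table in (("RMSE", RMSE), ("MAE", MAE)):
    for stage, pct in relative_reduction(table).items():
        print(f"{label} {stage:>12}: {pct:.1f}% lower with WANN1")
```

Relative reductions make the stages comparable even though their absolute error levels differ.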
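The three evaluation criteria reported in Table 3 (RMSE, MAE, and R) and a ten-fold split can be computed as in the following minimal Python sketch. The helper names are hypothetical, and `kfold_indices` assumes contiguous, unshuffled folds, which is only one possible way to implement the ten-fold cross-validation mentioned above.

```python
import math
from typing import Iterator, List, Sequence, Tuple


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient R between observations and predictions."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def kfold_indices(n: int, k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, validation) index lists for k contiguous, unshuffled folds."""
    start = 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)  # spread the remainder over the first folds
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size


if __name__ == "__main__":
    obs, pred = [1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 5.0]
    print(rmse(obs, pred), mae(obs, pred))  # 0.5 0.25
    print(round(pearson_r(obs, pred), 4))
```

Lower RMSE and MAE and an R closer to 1 indicate a better model, which is how the WANN and ANN results above are compared.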

Comparison with Other Existing PM2.5 Prediction Models
Many machine learning (ML) methods have been used for PM2.5 prediction. Table 4 shows the R², relative errors (REs), RMSE, and MAE of the different methods. An R² of 0.74 was obtained when the ANN was trained with 90% of the base data [45]. An ANN was used to predict the PM2.5 concentration one day ahead in Delhi, India, with a correlation coefficient of 0.65 [55]. An ANN trained with the Levenberg-Marquardt algorithm (trainlm) accurately forecasted PM2.5 from vehicle exhaust emissions in Addis Ababa, Ethiopia, with an R² of 0.94 [56]. Support vector regression (SVR) and multiple linear regression (MLR) models provided more accurate and reliable predictions than the other evaluated models; among the best-performing ML models, the execution speed of SVR was about five times that of MLR, and the lowest MAE for hourly prediction was 1.294 µg/m³ at t0 and 3.752 µg/m³ at t + 12 [57]. The XGBoost model accurately predicted daily PM2.5 (R² = 0.80, RMSE = 14.75 µg/m³) [58]. The forecasting performance of the RNN model has been shown to depend chiefly on the input information. The MAE of the RNN model for PM2.5 prediction is 8.4 [59]. The optimized LSTM model performs well, with R² = 0.94, RMSE = 13.06 µg/m³, and MAE = 8.61 µg/m³ [60]. A CNN for PM2.5 prediction in Beijing achieved an R of 0.85, an RMSE of 40.83 µg/m³, and an MAE of 25.32 µg/m³ [61]. Hybrid models are also widely used in PM2.5 prediction. The R², RMSE, and MAE of the gated recurrent unit neural network based on empirical mode decomposition (EMD-GRU) are 0.9852, 11.372 µg/m³, and 6.532 µg/m³, respectively. These values are better than those of the decision tree regressor (DTR), support vector machine (SVM), random forest (RF), recurrent neural networks (RNNs), gradient boosted decision trees (GBDTs), long short-term memory (LSTM), and gated recurrent unit (GRU) models.
These results show that the EMD-GRU model achieves better simulation results and higher precision than ordinary ML or deep learning (DL) models [62]. CNN and LSTM have been combined to forecast PM2.5 concentration; the R², RMSE, and MAE of CNN-LSTM are 0.92157312, 24.22874 µg/m³, and 14.63446 µg/m³, respectively [63]. A 3D CNN-GRU model was used to predict PM2.5 levels; compared with LSTM, ANN, SVR, GRU, and autoregressive integrated moving average (ARIMA) models, it obtained promising results, explaining 78% (R² = 0.78) of the variation in next-day PM2.5 concentration [64]. Compared with other related DL or single models, the hybrid MCD-ESN-PSO model achieved better prediction accuracy for PM2.5 concentration in four Chinese cities [65]. A hybrid model combining a CNN with the gradient boosting machine (GBM) method was established to estimate PM2.5 concentration in Shanghai; the resulting CNN-GBM model has good estimation accuracy, with an RMSE of 10.02 [66]. The iDeepAir model achieved better simulation and forecasting performance than the Seq2Seq, gradient boosting regression tree (GBRT), dual-stage attention-based recurrent neural network (DA-RNN), LSTM, and other DL models; specifically, compared with ARIMA, iDeepAir reduced the MAE from 25.36 µg/m³ to 12.28 µg/m³ [67]. Compared with the traditional land use regression (LUR) and SVM models, the combined genetic algorithm and support vector machine (GA-SVM) method significantly improved the prediction accuracy for PM2.5 concentration, with a validation coefficient of determination (R²) of 0.84 and lower RMSE and MAE values of 12.1 µg/m³ and 10.07 µg/m³, respectively [68].
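Table 4 mixes R and R² values, which are not interchangeable. The minimal Python sketch below (helper names are illustrative) shows why: the coefficient of determination, R² = 1 - SS_res/SS_tot, penalizes systematic bias, whereas the Pearson R does not. The two coincide (R² equals the squared Pearson R) only for a least-squares linear fit with an intercept evaluated on its own fitting data. Squaring the WANN1 predicting-stage R of 0.9316 gives about 0.868, which is the squared correlation rather than necessarily the coefficient of determination.

```python
import math
from typing import Sequence


def pearson_r(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Pearson correlation coefficient R between observations and predictions."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def coefficient_of_determination(obs: Sequence[float], pred: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot; drops below R**2 (even negative) for biased predictions."""
    mo = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mo) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    obs = [1.0, 2.0, 3.0, 4.0]
    biased = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated with obs but shifted by +1
    print(round(pearson_r(obs, biased), 6))  # 1.0
    print(round(coefficient_of_determination(obs, biased), 6))  # 0.2
    print(round(0.9316 ** 2, 4))  # 0.8679
```

The biased example reaches a perfect R while its coefficient of determination is only 0.2, which is why R and R² figures from different studies should be compared with care.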
Compared with the other PM2.5 prediction models, our WANN model ranks in the upper middle. Because each model has advantages and disadvantages, and different regions may require different models, it is necessary to develop more generally applicable artificial intelligence models. However, the decision-making processes of artificial intelligence models can be very difficult to explain with current knowledge. In addition, using the R value from correlation analysis to select variables retains the information most important for prediction and shortens the model running time. MINAT(t), MINAP(t), and MAXAT(t) are input parameters for prediction at most levels. The low-level detail series are mainly influenced by precursors, while the approximation series is affected by meteorological conditions and accumulated PM2.5. The wavelet transform (WT) method significantly improves the predictive performance of the ANN.
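The correlation-based variable selection described above can be sketched as follows. This is a minimal illustration under stated assumptions: candidate inputs are ranked by the absolute value of R against the next-day PM2.5 and the top k are kept. The ranking rule, the cut-off k, and the helper names are hypothetical; the study may instead have applied a threshold on R.

```python
import math
from typing import Dict, List, Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 for a constant series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def select_by_correlation(
    candidates: Dict[str, Sequence[float]], target: Sequence[float], k: int
) -> List[str]:
    """Rank candidate inputs by |R| with the target and keep the k strongest (assumed rule)."""
    scores = {name: abs(pearson_r(series, target)) for name, series in candidates.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    pm25_next_day = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy target series
    inputs = {
        "a": [2.0, 4.0, 6.0, 8.0, 10.0],  # R = 1
        "b": [5.0, 4.0, 3.0, 2.0, 1.0],   # R = -1, so |R| = 1
        "c": [1.0, 3.0, 2.0, 5.0, 4.0],   # R = 0.8
    }
    print(select_by_correlation(inputs, pm25_next_day, 2))
```

Ranking by the absolute value keeps strongly negative correlations as well as positive ones, since both carry predictive information.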

Conclusions
In this study, we applied ML modeling to small datasets. The results show that the wavelet decomposition improved the performance of the regression models: in general, the WANNs performed better than the ANNs in this work. The Bayesian regularization training algorithm (trainbr) helps avoid overfitting, so a more robust model could be established. Because these models have very different numbers of inputs (e.g., 17 versus 5), their predictions differ; models are directly comparable only when their input variables are the same (e.g., WANN1 versus ANN1).
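For reference, Bayesian regularization as implemented by MATLAB's trainbr minimizes a weighted sum of the squared errors and the squared network weights rather than the squared errors alone. The notation below (N training samples, M network weights, hyperparameters α and β) is introduced here for illustration:

```latex
F(\mathbf{w}) = \beta E_D + \alpha E_W,
\qquad
E_D = \sum_{i=1}^{N} \bigl( y_i - \hat{y}_i(\mathbf{w}) \bigr)^2,
\qquad
E_W = \sum_{j=1}^{M} w_j^2
```

The penalty term αE_W discourages large weights, which is why this training algorithm tends to resist overfitting; α and β are re-estimated from the data during training rather than fixed by hand.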
The prediction methods used here for PM2.5 concentration rely on meteorological elements, which are closely related to PM2.5 concentration. Moreover, the relationship between meteorological elements and PM2.5 concentration is nonlinear. The discrete wavelet transform (DWT) is used to extract important information from the meteorological elements. ANNs and WANNs have flexible mathematical structures and can map highly nonlinear relationships between meteorological elements and PM2.5 concentration. The WANNs outperformed the ANNs in Shanghai, and most WANN models successfully predicted PM2.5 concentration.
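To illustrate how a DWT splits a series into a low-frequency approximation component and high-frequency detail components, the following minimal Python sketch uses the Haar wavelet. The wavelet family, the number of levels, and the helper names (`haar_dwt`, `haar_idwt`, `multilevel_haar`) are assumptions for illustration; the mother wavelet and decomposition level used in this study may differ.

```python
import math
from typing import List, Sequence, Tuple

SCALE = 1.0 / math.sqrt(2.0)  # orthonormal Haar filter coefficient


def haar_dwt(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One Haar DWT level: returns (approximation, detail) coefficients."""
    if len(x) % 2:
        raise ValueError("a single Haar step expects an even-length series")
    approx = [(x[i] + x[i + 1]) * SCALE for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) * SCALE for i in range(0, len(x), 2)]
    return approx, detail


def haar_idwt(approx: Sequence[float], detail: Sequence[float]) -> List[float]:
    """Invert one Haar level, recovering the original samples."""
    out: List[float] = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * SCALE, (a - d) * SCALE])
    return out


def multilevel_haar(x: Sequence[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Repeatedly split the approximation; returns (final approximation, details per level)."""
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_dwt(approx)
        details.append(d)
    return approx, details


if __name__ == "__main__":
    series = [4.0, 6.0, 10.0, 12.0]
    a, d = haar_dwt(series)
    print(a, d)
    print(haar_idwt(a, d))  # recovers the original series up to rounding
```

In a WANN, sub-series of this kind are fed to the ANN instead of the raw series. Because each coefficient combines neighboring samples, a forecasting setup should decompose only data available up to day t so that no later samples enter the inputs.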
The severe COVID-19 epidemic had an unprecedented impact on PM2.5 concentration in Shanghai. Air quality in Shanghai was clearly better during the lockdown than before it, and the air quality during the 2020 lockdown was clearly improved compared with the same period in 2019.
We examined the feasibility of using artificial intelligence with meteorological elements as input factors to forecast the next day's PM2.5 concentration. The performance of the ANNs and WANNs was evaluated using three criteria (RMSE, MAE, and R). A simple WANN model with 17 input variables is used as the reference case, and the results demonstrate the accurate prediction ability of the WANN model.
China has set the ambitious goals of carbon neutrality and pollution reduction. In this paper, we used only ANNs and WANNs for daily PM2.5 prediction, with the meteorological elements and the PM2.5 concentrations of the previous 3 days as predictors. To further improve forecasting effectiveness, we will use deep learning and hybrid models to predict PM2.5 in future work.