Establishment of a Combined Model for Ozone Concentration Simulation with Stepwise Regression Analysis and Artificial Neural Network

With the development of industrialization and the increase in the number of motor vehicles in megacities in China, ozone pollution has become a prominent problem. Although different models have been used for ozone concentration simulation, their accuracy still varies. In this study, the performance of two models, a linear stepwise regression (SR) model and a non-linear artificial neural network (ANN) model, in simulating ozone concentration was analyzed in the Jing-Jin-Ji region, which is one of the most polluted areas in China. Results showed that the ANN model (adjusted R2 = 0.8299, RMSE = 22.87, MAE = 16.92) performed better than the SR model (adjusted R2 = 0.7324, RMSE = 28.61, MAE = 22.30). The ANN also simulated ozone pollution events better than the SR model, with higher probability of detection (POD) and threat score (TS) values. The model performance for spring, autumn and winter was generally higher than that for summer, which may be because the weights of factors in simulating high and low ozone concentrations were different. The method proposed by this study can be used in ozone concentration estimation.


Introduction
With the development of industrialization in megacities in China and the increase in the number of motor vehicles, ozone pollution has become an increasingly prominent problem [1]. Ozone in the troposphere is mainly produced by photochemical reactions of gaseous pollutants such as volatile organic compounds (VOCs), NO2 and CO, which are emitted by motor vehicle exhaust and factories [2,3]. In addition, vegetation is one of the most important sources of VOCs in urban environments [2]. VOCs react photochemically with nitrogen oxides (NOx) and generate ozone in the presence of ultraviolet radiation [4,5]. Studies show that increased ozone concentration is harmful to human health, causing inflammation of the respiratory system and dysfunction of the cardiovascular system [6]. However, the spatial resolution of air quality monitoring sites is relatively low in China. Thus, it is important to forecast ozone concentration accurately to protect human health.
Many studies have shown that ozone precursors and meteorological conditions are important factors affecting ozone concentration [7][8][9][10]. The photochemical reaction rate of precursors increases under strong solar radiation and high temperature, thus increasing the ozone concentration [8][9][10]. In contrast, rainfall and higher relative humidity lead to a decrease in ozone concentration due to a decrease in photochemical production efficiency and an increase in wet deposition [7]. Wind speed is another factor that affects ozone concentration. Normally, an increase in wind speed reduces O3 concentration because high wind speeds are generally not conducive to local ozone accumulation.
Due to the relatively low spatial coverage of the ozone concentration monitoring network in China, researchers have explored different models to simulate ozone concentration. In previous studies, multiple linear regression models have been used to analyze the relationship between ozone and its influencing factors, including precursors and meteorological factors [11,12]. However, the simulation accuracy of this method is relatively low because linear regression cannot fully explain the nonlinear reaction processes of ozone formation in the air. Researchers have therefore introduced machine learning into ozone concentration prediction. Machine learning is one of the most commonly used artificial intelligence (AI) technologies for solving air pollution problems [13][14][15][16]. As one of the commonly used machine learning models, the artificial neural network (ANN) model solves complex nonlinear problems by imitating the structure and function of the human brain [17]. ANNs have good approximation performance and fast convergence speed, and can approximate any multivariable nonlinear function [18]. Compared with the linear regression model, ANNs better simulate the nonlinear relationship between air pollutant concentration and different influencing factors [19,20]. Bandyopadhyay et al. used single hidden layer ANN models and multiple linear regression (MLR) models to predict the average monthly total ozone concentration in Arosa, Switzerland. Evaluating both with error estimation and least squares, they found that the ANN model outperformed the MLR model [21]. AlOmar et al. applied the wavelet transform (WT) approach to the ANN model; they compared the hybrid model (W-ANN) with the classical ANN in predicting 1 h ahead ozone concentrations and found that the W-ANN performed better than the ANN model [22].
ANN modeling combined with principal component analysis (PCA) was used by Al-Alawi et al. to forecast ozone concentration in the lower atmosphere [15]. They found that the R2 values between the real and predicted ozone values for the ANN, PCA, and the combined model were 0.986, 0.965, and 0.995, respectively; the combined model thus improved the prediction of ozone concentration. Gao et al. estimated O3_8h in Hebei province, China, by ANN using the factors NO2, CO, SO2, wind speed, temperature, pressure, visibility, precipitation, sunlight duration and boundary layer height. Results showed that the ANN has good ozone estimation performance, with an R2 of 0.80 [16].
In order to establish a model that simulates ozone concentration more accurately, the performance of two models, a linear stepwise regression (SR) model and a nonlinear ANN model, in simulating the daily maximum 8 h moving average of O3 concentration (O3_8h) in the Jing-Jin-Ji region was compared using the concentrations of ozone precursors and meteorological factors. The result can provide a reference for designing ozone pollution prevention and control measures, and is beneficial to the environment and public health.

Study Area
As shown in Figure 1, the Jing-Jin-Ji region is located in the North China Plain and is bordered by the Bohai Sea. The topography of the region is high in the northwest and low in the southeast. The region has a temperate monsoon climate, with hot, humid summers and cold, dry winters. Rainfall is concentrated in the summer. The Jing-Jin-Ji region has an area of approximately 21,800 km2 and contains 13 major cities: Beijing, Tianjin, Shijiazhuang, Tangshan, Qinhuangdao, Handan, Baoding, Zhangjiakou, Chengde, Langfang, Cangzhou, Hengshui and Xingtai. The eleven cities other than Beijing and Tianjin belong to Hebei Province. As one of the world's fastest-developing economic zones, the Jing-Jin-Ji region is mainly engaged in heavy industry and manufacturing. Along with economic development, air pollution has increased, mainly due to emissions from automobile exhausts and industry.

Data Collection
The concentrations of air pollutants, including the maximum 8 h moving average of O3 concentration (O3_8h), NO2, CO, PM10 and PM2.5, were collected from the China National Environmental Monitoring Centre [23]. The data were collected from a total of 76 monitoring sites in the cities of the Jing-Jin-Ji region (see Figure 1). The study period was from 1 January 2018 to 31 December 2020. In order to evaluate the effects of meteorological factors on ozone variance, factors including 2 m temperature (T2m, K), surface net solar radiation (SSR, J/m2), total precipitation (TP, m), surface pressure (SP, Pa), boundary layer height (BLH, m), 10 m u-component of wind (U10, m/s) and 10 m v-component of wind (V10, m/s) were downloaded from the European Centre for Medium-Range Weather Forecasts (ECMWF) [24]. The spatial resolution of the hourly meteorological data was 0.25° × 0.25°. When an air pollutant monitoring site was located in a specific grid cell, we assigned the air pollution data to that cell and matched them with the corresponding meteorological data. Wind direction (WD, °) and wind speed (WS, m/s) were not provided by ECMWF; they were calculated by Formulas (1) and (2) from U10 and V10, the u-component and v-component of wind at 10 m height.
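The conversion from wind components to speed and direction can be sketched as follows. This is a minimal illustration, not the authors' code: the speed is the magnitude of the horizontal wind vector, while the direction formula assumes the common meteorological convention (the direction the wind blows from, in degrees clockwise from north), which may differ from the exact form of Formula (1) or (2). The function names are hypothetical.

```python
import math


def wind_speed(u10: float, v10: float) -> float:
    """Wind speed (m/s) as the magnitude of the 10 m wind vector."""
    return math.hypot(u10, v10)


def wind_direction(u10: float, v10: float) -> float:
    """Direction the wind blows FROM, in degrees clockwise from north.

    Assumed convention: u is the eastward component, v the northward one.
    """
    return (180.0 + math.degrees(math.atan2(u10, v10))) % 360.0


# Example: a wind blowing toward the west (u < 0, v = 0) comes from the east.
ws = wind_speed(3.0, 4.0)
wd_easterly = wind_direction(-1.0, 0.0)
wd_southerly = wind_direction(0.0, 1.0)
```

Under this convention a calm wind (both components zero) has no meaningful direction, so such hours would need separate handling before WD is used as a model input.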

Stepwise Regression Model
The SR model is a type of multiple linear regression model that selects the most appropriate combination of independent variables for predicting the dependent variable. In this study, we established an SR model with the O3_8h concentration as the dependent variable and NO2 concentration, CO concentration, PM10 and PM2.5 concentrations, T2m, TP, SP, SSR, WS, WD and BLH as the independent variables. The equation of the SR model is as follows:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_{10} X_{10i} + \mu_i$$

where $Y_i$ is the O3_8h concentration, $\beta_0, \beta_1, \ldots, \beta_{10}$ are regression coefficients, $X_{1i}, X_{2i}, \ldots, X_{10i}$ are the independent variables discussed above, and $\mu_i$ is the random error term. SPSS (version 22.0.0.0, IBM Corp, Armonk, NY, USA) was used for the SR analysis.
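The idea of stepwise variable selection can be illustrated with a small forward-selection sketch. It is not the SPSS procedure used in this study: SPSS enters and removes variables based on F-test probabilities, whereas this simplified version adds one predictor at a time while the adjusted R2 improves by at least a hypothetical threshold `min_gain`. The data and predictor labels below are synthetic.

```python
import numpy as np


def adjusted_r2(y, y_hat, n_predictors):
    """Adjusted coefficient of determination for a fitted linear model."""
    n = len(y)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns (beta, fitted values)."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta, A @ beta


def forward_stepwise(X, y, names, min_gain=1e-3):
    """Greedily add the predictor that most improves adjusted R2."""
    selected, best = [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining:
        scores = []
        for j in remaining:
            cols = selected + [j]
            _, y_hat = fit_ols(X[:, cols], y)
            scores.append((adjusted_r2(y, y_hat, len(cols)), j))
        score, j = max(scores)
        if score - best < min_gain:
            break  # no candidate improves the fit enough
        selected.append(j)
        remaining.remove(j)
        best = score
    return [names[j] for j in selected], best


# Synthetic demonstration: only the first and third columns drive y.
rng = np.random.default_rng(0)
names = ["T2m", "TP", "SSR"]  # labels borrowed for illustration only
X = rng.normal(size=(500, 3))
y = 2.0 * X[:, 0] - 3.0 * X[:, 2] + rng.normal(scale=0.1, size=500)
selected, score = forward_stepwise(X, y, names)
```

A full stepwise procedure would also re-test already selected variables for removal after each addition; that backward step is omitted here for brevity.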

Artificial Neural Network Model
An ANN is a computing system that consists of multiple interconnected processing elements. The model is flexible and valid, and it places fewer demands on the input dataset [25]. The structure of the ANN model is shown in Figure 2. The output layer is the O3_8h concentration. Following the principle of parsimony, hidden layers should be as few as possible [26]; thus, we tested hidden layers with 3, 4 and 5 nodes to find the model that works best. 70% of the data were selected as training samples to complete the self-learning process and to construct an optimal ANN structure, and the remaining 30% of the data were used as test samples to evaluate the performance of the established model. The hyperbolic tangent (tanh) and the S-shaped growth curve (sigmoid), which are commonly used in ozone prediction with ANN models [27][28][29][30], were tested as the activation function of the hidden layer; the formulas are as follows:

$$\tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

$$\mathrm{sigmoid}(x) = \frac{1}{1 + e^{-x}}$$
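A single-hidden-layer network of the kind described above can be sketched in a few lines of NumPy. This is an illustrative sketch on synthetic data, not the configuration used in this study: the learning rate, epoch count, weight initialization, full-batch gradient descent and linear output unit are all assumptions, and the names `train_mlp` and `predict` are hypothetical.

```python
import numpy as np

# Each activation maps to (function, derivative expressed via its output h).
ACTIVATIONS = {
    "tanh": (np.tanh, lambda h: 1.0 - h ** 2),
    "sigmoid": (lambda x: 1.0 / (1.0 + np.exp(-x)), lambda h: h * (1.0 - h)),
}


def train_mlp(X, y, n_hidden=5, activation="tanh", lr=0.05, epochs=4000, seed=0):
    """Fit one hidden layer plus a linear output by full-batch gradient descent."""
    act, dact = ACTIVATIONS[activation]
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.5, n_hidden)
    b2 = 0.0
    n = len(y)
    for _ in range(epochs):
        H = act(X @ W1 + b1)               # hidden activations, shape (n, n_hidden)
        err = H @ W2 + b2 - y              # prediction error, shape (n,)
        dH = np.outer(err, W2) * dact(H)   # error backpropagated to the hidden layer
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2, "act": act}


def predict(model, X):
    """Forward pass through the trained network."""
    return model["act"](X @ model["W1"] + model["b1"]) @ model["W2"] + model["b2"]


# Synthetic nonlinear target and a 70/30 train/test split.
rng = np.random.default_rng(42)
X = rng.uniform(-2.0, 2.0, size=(600, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2
idx = rng.permutation(len(y))
n_train = int(0.7 * len(y))
train, test = idx[:n_train], idx[n_train:]

model = train_mlp(X[train], y[train], n_hidden=5, activation="tanh")
test_rmse = float(np.sqrt(np.mean((predict(model, X[test]) - y[test]) ** 2)))
# Reference error when always predicting the training mean.
baseline_rmse = float(np.sqrt(np.mean((y[test] - y[train].mean()) ** 2)))
```

Repeating the call with `n_hidden` set to 3, 4 and 5 and `activation` set to "tanh" or "sigmoid", then comparing the test errors, mirrors the structure search described in the text.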

Model Validation
The root mean square error (RMSE), mean absolute error (MAE) and coefficient of determination (R2) were used to evaluate the model performance. The formulas are as follows:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (t_i - y_i)^2}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| t_i - y_i \right|$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - t_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

where $n$ is the number of measurements; $t_i$ is the simulated value; $y_i$ is the true value; and $\bar{y}$ is the average of the true values. R2 measures the proportion of the variance in the dependent variable that is explained by the independent variables. It lies between 0 and 1, and the closer it is to 1, the better the model fit. The MAE and the RMSE can be used together to diagnose the variation in the errors in a set of forecasts: the greater the difference between them, the greater the variance in the individual errors in the sample. The closer the RMSE and MAE are to 0, the lower the error and the better the model performance.
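The three evaluation metrics can be computed directly from paired true and simulated values. The sketch below follows the definitions given above; the function names and the four sample values are hypothetical and serve only to show the arithmetic.

```python
import numpy as np


def rmse(y_true, y_sim):
    """Root mean square error between true and simulated values."""
    y_true, y_sim = np.asarray(y_true, float), np.asarray(y_sim, float)
    return float(np.sqrt(np.mean((y_sim - y_true) ** 2)))


def mae(y_true, y_sim):
    """Mean absolute error between true and simulated values."""
    y_true, y_sim = np.asarray(y_true, float), np.asarray(y_sim, float)
    return float(np.mean(np.abs(y_sim - y_true)))


def r2(y_true, y_sim):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_true, y_sim = np.asarray(y_true, float), np.asarray(y_sim, float)
    ss_res = np.sum((y_true - y_sim) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Illustrative values only (not data from this study).
observed = [100.0, 120.0, 160.0, 200.0]
simulated = [110.0, 115.0, 150.0, 210.0]
scores = {
    "RMSE": rmse(observed, simulated),
    "MAE": mae(observed, simulated),
    "R2": r2(observed, simulated),
}
```

For any set of errors the RMSE is at least as large as the MAE, and the two coincide only when all absolute errors are equal, which is why the gap between them indicates error variance.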
In order to evaluate the performance of the models in predicting ozone pollution events, in which the concentration exceeds the limit value of 160 µg/m3, we calculated the probability of detection (POD), threat score (TS) and false alarm rate (FAR) of the two models. The higher the POD and TS values, the better the simulation; the closer the FAR is to zero, the better the simulation. The calculation formulas are as follows:

$$\mathrm{POD} = \frac{A}{A + B}$$

$$\mathrm{TS} = \frac{A}{A + B + C}$$

$$\mathrm{FAR} = \frac{C}{A + C}$$

where $A$ is the number of days when both the monitored and simulated O3 concentrations are greater than 160 µg/m3; $B$ is the number of days when the monitored O3 concentration is greater than 160 µg/m3 and the simulated value is less than 160 µg/m3; and $C$ is the number of days when the monitored O3 concentration is less than 160 µg/m3 and the simulated O3 concentration is greater than 160 µg/m3.

Ozone Concentration in Jing-Jin-Ji Region
Figure 3 shows the variation trend of the daily maximum O3_8h concentrations in Beijing, Tianjin and Hebei province from 2018 to 2020. The annual distribution of ozone concentration has a single peak. The ozone concentration reaches its maximum in summer, mainly because high temperature and intense solar radiation accelerate the photochemical reaction process [10]. A slight downward trend in ozone concentration was observed from 2018 to 2020 (the annual averaged ozone concentrations are shown in Table 1). This may be related to the strict plan named "The 2017 work plan for air pollution prevention and control in Beijing, Tianjin, Hebei, and surrounding areas" issued by China in 2017 [31]. Under this plan, the areas were urged to promote the use of new energy sources and restructure industry, which reduced emissions of industrial pollutants and controlled NOx emissions.
Figure 4 shows the distribution of ozone concentration in the Jing-Jin-Ji region. Ozone concentrations in the southwestern Jing-Jin-Ji region were higher than those in the northeastern region. The reason is that the main industries in the southwestern area are steel, chemicals and coal, which are highly polluting industries [32].
As shown in Figure 5 and Table 3, the ozone concentration in the Jing-Jin-Ji region showed a seasonal variation trend. Ozone concentration was highest in summer, followed by spring and autumn, and lowest in winter. The results were similar to those reported by Cui et al., who analyzed ozone concentration in Beijing from 2013 to 2017 [33]. However, different seasonal variation characteristics of ozone concentration in Shanghai were reported by Li et al., with the highest ozone concentration in spring, followed by summer, autumn and winter [34]. The main reason is that Shanghai enters the "plum rain" season in July and August in summer, when overcast and rainy days lead to a decrease in solar radiation. Other changes in meteorological factors such as air humidity and temperature also affect the photochemical reaction of ozone, resulting in low ozone concentration in the corresponding month. In addition, ozone deposition in urban parks and greenery on rainy days also influenced O3 variance in Shanghai [35]. Therefore, the rule of the highest ozone concentration occurring in summer is not universally applicable across areas, and the influence of climate characteristics should also be considered.

Ozone Concentration Simulated by Stepwise Regression Model
The performance of the SR model in Beijing, Tianjin and Hebei province is shown in Table 4. The averaged adjusted R2, RMSE and MAE values were 0.7564, 26.82 and 20.72, respectively. The performance of the SR model for the 11 cities in Hebei province is shown in Table S1 in the Supplementary Materials. Figure 6 shows the correlation between the O3_8h concentrations simulated by the SR model and the monitored O3_8h values. Liu evaluated the performance of the SR method in simulating ozone concentration using input factors including solar radiation, NO2 and CO concentrations in the Luwan district of Shanghai from May to August in 2017. Their results showed relatively higher errors (RMSE = 31.5 and MAE = 35.1) compared with the values obtained in this study, which is probably related to their smaller sample size and fewer input factors [36]. Zhang estimated the error and goodness of fit between the simulated and real O3 values in Hong Kong in different seasons using multiple linear regression. The mean value of R2 was 0.59 and the averaged RMSE was 25.9 [37]. In general, linear models such as the SR model perform poorly in predicting the nonlinear process of O3 formation, and the performance of a linear model in ozone concentration estimation depends strongly on the selected input parameters.
Atmosphere 2022, 13, x FOR PEER REVIEW

Ozone Concentration Simulated by ANN Model
The parameters used in the input layer are those selected by the SR model discussed above; the selected input parameters for Beijing, Tianjin and Hebei are shown in Table 5. The input parameters for the 11 cities in Hebei province are shown in Table S2 in the Supplementary Materials. Different network structures, varying the number of nodes in the hidden layer (3, 4 and 5 nodes) and the activation function type (tanh or sigmoid), were applied to obtain the optimal structure. After evaluation, the performance of the ANN was best when the hidden layer contained 5 nodes and the activation function was tanh (see Table 6 and Table S3 in the Supplementary Materials). Table 7 shows the performance of the ANN model in Beijing, Tianjin and Hebei province. The results also showed that the performance of the ANN model can be improved by increasing the number of nodes in the hidden layer regardless of the activation function used. In addition, the tanh activation function (averaged RMSE = 23.16, averaged MAE = 17.12 and averaged adjusted R2 = 0.8306) provided better results than the sigmoid function (averaged RMSE = 23.59, averaged MAE = 17.49 and averaged adjusted R2 = 0.8244). The correlation between the real and simulated ozone values for the optimal ANN structure is shown in Figure 7. The ANN model with 5 hidden nodes and the tanh activation function was thus used to predict ozone concentrations in different seasons. The performance of the ANN in different seasons for Beijing, Tianjin and Hebei province is shown in Table 7 and Figure 8. The corresponding results for the 11 cities in Hebei province are shown in Table S4 and Figure S1 in the Supplementary Materials.
Results indicated that the model performance in spring, autumn and winter was better, with lower simulation errors (averaged RMSE = 17.61, 17.68 and 9.31, averaged MAE = 12.02, 12.34 and 7.06, respectively, for spring, autumn and winter) and higher R2 (averaged adjusted R2 = 0.7851, 0.8379 and 0.7938, respectively, for spring, autumn and winter). The performance of the ANN model in summer (averaged RMSE = 26.40, averaged MAE = 20.08, averaged adjusted R2 = 0.6564) was poorer than in the other seasons.
Our results indicated that the ANN model had limitations in estimating O3 concentration peaks, which mainly appeared in summer. This is probably because the weights of the input factors in the ANN model differ between high and low ozone levels [16]. Zhang et al. also found that the performance of the ANN model in ozone prediction was worse in summer than in other seasons [38]. Xue et al. predicted ozone concentration with a BP neural network model in Tianjin, China, using the input factors PM2.5, PM10, CO, temperature and wind direction [39]. The R2 was 0.597, which was lower than the R2 of 0.695 in this study [39]. This is probably due to the small amount of data used in their study. Hoshyaripour et al. simulated O3 concentration in Sao Paulo, Brazil, using the forward selection-neural network (FS-ANN) method. The results showed that the R2 values between real and simulated O3 values at two sampling points were 0.70 and 0.56, respectively. The corresponding RMSE values were 8.12 and 7.76 [40].
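The per-season evaluation described above amounts to grouping daily errors by season before computing the error metrics. The sketch below shows one way to do this; the month-to-season mapping (meteorological seasons, with spring as March to May) is an assumption, since the season boundaries are not stated here, and the records are made-up values for illustration.

```python
import math
from collections import defaultdict
from datetime import date

# Assumed meteorological seasons; the study's exact boundaries are not given.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def seasonal_errors(records):
    """Compute RMSE and MAE per season.

    records: iterable of (date, observed O3_8h, simulated O3_8h).
    """
    errors = defaultdict(list)
    for day, observed, simulated in records:
        errors[SEASON_BY_MONTH[day.month]].append(simulated - observed)
    return {
        season: {
            "rmse": math.sqrt(sum(e * e for e in errs) / len(errs)),
            "mae": sum(abs(e) for e in errs) / len(errs),
        }
        for season, errs in errors.items()
    }


# Illustrative records only (not data from this study).
records = [
    (date(2019, 7, 1), 180.0, 150.0),
    (date(2019, 7, 2), 170.0, 190.0),
    (date(2019, 1, 5), 40.0, 45.0),
]
by_season = seasonal_errors(records)
```

Because summer concentrations are higher in absolute terms, larger summer RMSE and MAE values are partly a matter of scale, which is one reason to read them alongside R2.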

Model Contrast
Based on the R2, RMSE and MAE values obtained by the two models, the ANN was more suitable for ozone concentration estimation due to its excellent ability to learn the nonlinear relationship between input and output variables [41]. In addition, because the SR model selected the most influential factors affecting ozone concentration, the running time of the ANN was reduced and the overfitting caused by high-dimensional parameter sets was avoided [25,42,43]. The performance of the ANN in each season was better than that of the SR model. To compare the performance of the two models in simulating O3 concentrations that exceed the limit value of 160 µg/m3, the POD, TS and FAR values were calculated (see Table 8). The POD and TS values in Beijing, Tianjin and Hebei province for the ANN were higher than those for the SR model, indicating that the ANN model performed better in predicting ozone exceedance. The FAR values for the ANN and SR models are similar. Among the 13 major cities in the Jing-Jin-Ji region, only 3 cities have FAR values for the ANN model that are slightly higher than those for the SR model. A comparison of prediction performance using the SR and ANN models for the 11 cities in Hebei province is shown in Table S5 in the Supplementary Materials. To sum up, the overall performance of the ANN is better than that of the SR model in simulating ozone exceedance; thus, the ANN is better at predicting ozone pollution events.
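The exceedance comparison can be reproduced from paired daily values by counting hits, misses and false alarms against the 160 µg/m3 limit. The sketch below uses the A, B and C counts defined in the Model Validation section; the FAR expression C/(A + C) is the commonly used form and is assumed here, and the function name and sample values are hypothetical.

```python
def exceedance_scores(observed, simulated, limit=160.0):
    """Return POD, TS and FAR for days exceeding the concentration limit."""
    hits = misses = false_alarms = 0
    for obs, sim in zip(observed, simulated):
        if obs > limit and sim > limit:
            hits += 1            # A: exceedance observed and simulated
        elif obs > limit:
            misses += 1          # B: exceedance observed but not simulated
        elif sim > limit:
            false_alarms += 1    # C: exceedance simulated but not observed
    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    ts = hits / (hits + misses + false_alarms) if hits + misses + false_alarms else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    return {"POD": pod, "TS": ts, "FAR": far}


# Illustrative daily O3_8h values in µg/m3 (not data from this study).
observed = [170.0, 180.0, 150.0, 165.0, 140.0, 200.0]
simulated = [175.0, 150.0, 165.0, 170.0, 130.0, 190.0]
scores = exceedance_scores(observed, simulated)
```

In this example there are three hits, one miss and one false alarm, so the scores reward the model for the detected events while the miss lowers POD and TS and the false alarm raises FAR.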

Conclusions
A downward trend in ozone concentration was found in the Jing-Jin-Ji region during 2018-2020. The seasonal variation of ozone concentration was obvious, with the concentration peaks all occurring in summer. Air pollutant concentrations including NO2, CO, PM10 and PM2.5, and meteorological factors including T2m, SSR, TP, SP, BLH, WD and WS, were used as input parameters for the SR and ANN models for O3_8h concentration simulation. The correlation analysis between the real and simulated ozone concentrations showed that the R2, RMSE and MAE values were 0.7324, 28.61 and 22.30, respectively, for the SR model. The hybrid model of SR and ANN significantly improved ozone estimation, with R2 increased to 0.8299 and RMSE and MAE decreased to 22.87 and 16.92, respectively. The results show that the nonlinear ANN model is better than the linear model at simulating ozone concentration. The model performance in spring, autumn and winter was generally higher than that in summer, which indicates that the ANN has limitations in estimating the high O3 concentrations that often occur in summer. The POD and TS values obtained by the ANN model were higher than those obtained by the SR model, indicating that the ANN model is better at forecasting ozone pollution events. The results of this study can provide a technical reference for using an ANN to predict ozone concentration in other regions in China.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/atmos13091371/s1, Figure S1: Relationship between the simulated value of O3_8h and the actual monitored concentration of O3_8h in each season based on the neural network model with activation function of tanh; Table S1: Performance of O3 concentration simulation using SR and ANN models; Table S2: Input parameters for ANN in Shijiazhuang, Baoding, Cangzhou, Chengde, Handan, Hengshui, Langfang, Qinhuangdao, Tangshan, Xingtai and Zhangjiakou; Table S3: Performance of ANN on O3 concentration simulation with different network structures; Table S4: Performance of ANN on O3 concentration simulation in different seasons using the optimal network structure; Table S5: Prediction performance using SR and ANN models.