Short-Term PM2.5 Concentration Changes Prediction: A Comparison of Meteorological and Historical Data

Abstract: Machine learning is being extensively employed in the prediction of PM2.5 concentrations. This study aims to compare the prediction accuracy of machine learning models for short-term PM2.5 concentration changes and to find a universal and robust model for both hourly and daily time scales. Five commonly used machine learning models were constructed, along with a stacking model consisting of Multivariable Linear Regression (MLR) as the meta-learner and an ensemble of Random Forest (RF), Extreme Gradient Boosting (XGBoost), and Light Gradient Boosting Machine (LightGBM) as the base learners. Meteorological datasets, as well as historical PM2.5 concentration data combined with meteorological datasets, were preprocessed and used to evaluate each model's accuracy and stability across hourly and daily time scales using the coefficient of determination (R²), Root-Mean-Square Error (RMSE), and Mean Absolute Error (MAE). The results show that historical PM2.5 concentration data are crucial for the prediction precision of the machine learning models. Specifically, on the meteorological datasets, the stacking model, XGBoost, and RF performed better for hourly prediction, while the stacking model, XGBoost, and LightGBM performed better for daily prediction. On the historical PM2.5 concentration data with meteorological datasets, the stacking model, LightGBM, and XGBoost performed better on both the hourly and daily datasets. Consequently, the stacking model outperformed the individual models, with XGBoost being the best individual model for predicting the PM2.5 concentration from meteorological data, and LightGBM being the best individual model for predicting the PM2.5 concentration from historical PM2.5 data with meteorological datasets.

Because of the uneven distribution of air quality and meteorological stations, researchers have attempted to use meteorological factors for predicting PM2.5 concentrations, and have found that meteorological data can serve as a valuable supplement for missing air quality data [11]. Previous studies have demonstrated a close relationship between PM2.5 concentrations and factors such as wind speed, wind direction, humidity, air pressure, and temperature [12], and some scholars have even utilized meteorological factors to predict changes in PM2.5 concentrations [13]. One study proposed a hybrid spatiotemporal Land Use Regression (LUR) model system that combines Support Vector Regression (SVR), MLR, and the ST algorithm, which yielded good spatial prediction performance in almost all time panels [14].
However, these studies did not compare prediction precision across both hourly and daily time scales. In addition, research is still lacking on a robust PM2.5 concentration prediction model that can be applied simultaneously to both meteorological factor-based and historical data-based PM2.5 concentration prediction.
Recently, in addition to mechanism models [15-17], statistical models and machine learning methods have become the main approaches for PM2.5 concentration prediction. Statistical models, such as the grey prediction model [18,19] and the Multiple Linear Regression (MLR) model [20], are commonly used. Machine learning methods for PM2.5 concentration prediction mainly include Support Vector Regression (SVR) [21-23], Random Forest (RF) [24-27], and Long Short-Term Memory (LSTM) [28-31]. Chen [27] demonstrated that the Random Forest approach can be used to estimate daily PM2.5 concentrations across China. Zhai [31] proposed an LSTM approach for predicting air quality, and the results indicated better prediction performance than traditional methods for the PM2.5, PM10, O3, NO2, SO2, and CO concentrations. However, most studies simply used a single machine learning model to predict the PM2.5 concentration: linear models perform poorly when processing nonlinear and large amounts of data, and neural networks suffer from overfitting, slow convergence, and poor generalization [32]. Therefore, combined models [33-38] and hybrid models [22,39-43] have been used to improve the prediction accuracy for the PM2.5 concentration. However, most of these combined models relied mainly on weighted averages or a simple combination of data preprocessing, modeling, and optimization techniques, without complementary training, and ignored robustness. A general stacked ensemble algorithm that fully integrates multiple machine learning models has been used for PM2.5 concentration prediction [44,45]. Stacking integration technology finds the optimal combination of base learners by training a higher-level learner: it conducts cross-validation training on the base learners and, based on their output results, constructs secondary features to train the meta-learner [46].
As a combination model, stacking technology can overcome the limitations of individual models by integrating multiple machine learning methods [47], and it has shown promising performance in various applications [46,48]. However, in addition to model selection, many studies have faced challenges when dealing with datasets [49]. Existing studies have only focused on using one dataset for PM2.5 concentration prediction, either meteorological data or historical pollution data [50], and a comparison and analysis of model performance using both datasets simultaneously is lacking.
The specific objectives of this study are threefold: (1) to develop a PM2.5 concentration prediction model using stacking technology, (2) to analyze the potential of using meteorological datasets for PM2.5 concentration prediction, and (3) to compare the prediction accuracy obtained using meteorological datasets alone with that obtained using historical PM2.5 concentration datasets combined with meteorological datasets.

Study Area and Datasets
Jiangxi Province (24.29° N-30.04° N, 113.34° E-118.28° E) is located in southeastern China and is an important node of the Yangtze River Economic Belt. In recent years, many high-energy-consuming and highly polluting enterprises have migrated to Jiangxi Province as restrictive environmental policies have been put in place in coastal provinces, resulting in the air quality in some areas of the province failing to meet the national secondary standards (GB 3095-2012) [51].
The datasets included hourly meteorological data and hourly air quality data from 2016 to 2018, in which historical meteorological data were derived from the China Meteorological Information Center website (http://data.cma.cn/ (accessed on 15 April 2020)), and historical air quality data were derived from the China Environmental Monitoring Station (http://www.cnemc.cn/ (accessed on 15 April 2020)). There are 91 meteorological stations covering all cities of Jiangxi Province, and 60 air quality monitoring stations located in the central cities and industrial areas of the province. As the meteorological stations and air quality stations are geographically separate, it was necessary to match them according to their geographical locations. After considering the actual situation of Jiangxi Province and relevant research [52,53], the matching distance between meteorological stations and air quality stations was set to 20 km.
In the station-matching process, meteorological stations were matched to nearby air quality stations within the 20 km distance threshold [52,53], and the meteorological data of stations matched to at least one air quality station, together with the air quality data of the matched stations, were used as the research data. This matching approach was adopted because meteorological data have broader coverage, and the meteorological data for several air quality stations near one meteorological station are consistent [54]. After matching, data from 17 meteorological stations and 57 air quality monitoring stations were used (the specific site information is shown in Tables S1 and S2). Figure 1 displays the distribution and use of meteorological stations (red dots and five-pointed stars) and air quality monitoring stations (blue dots and five-pointed stars) in Jiangxi Province. The base map of Figure 1 was downloaded from the Jiangxi Provincial Geographic Information Public Service Platform (http://bnr.jiangxi.gov.cn/col/col45382/index.html (accessed on 15 April 2022)) without any modifications. For the selection of input variables, considering the quality and completeness of the data in the Jiangxi area and related research on PM2.5 concentration prediction [50,55], 10 variables (Table 1) were finally used as input variables for prediction [56].
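The 20 km station-matching step can be sketched with a great-circle (haversine) distance check. The station IDs and coordinates below are hypothetical, purely for illustration; the real station lists are given in Tables S1 and S2:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius, km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def match_stations(met_stations, aq_stations, max_km=20.0):
    """Pair each meteorological station with all air quality stations within max_km,
    keeping only meteorological stations that have at least one match."""
    pairs = {}
    for met_id, (mlat, mlon) in met_stations.items():
        near = [aq_id for aq_id, (alat, alon) in aq_stations.items()
                if haversine_km(mlat, mlon, alat, alon) <= max_km]
        if near:
            pairs[met_id] = near
    return pairs

# Hypothetical coordinates for illustration (not the real station list)
met = {"M1": (28.68, 115.86)}
aq = {"A1": (28.70, 115.90), "A2": (29.50, 116.00)}
print(match_stations(met, aq))  # A1 is within 20 km of M1; A2 is not
```

Unmatched meteorological stations are simply dropped, mirroring the reduction from 91 to 17 stations described above.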

Data Preprocessing

Data Quality Control
Preprocessing historical data from air quality and meteorological stations is a crucial step in ensuring data accuracy and reliability. This involves identifying and removing abnormal or missing values, which can distort the data and lead to incorrect results. Specifically, if any meteorological data item is missing or abnormal, all data for that hour are removed. Similarly, when preprocessing the historical PM2.5 data, in addition to removing abnormal values, records with PM2.5 concentrations below 0 µg/m³ or above 1000 µg/m³ are excluded as outliers [57]. The station data were then matched in four steps (Figure 2): (1) the geographic location information of the meteorological and air quality stations is used to match the average PM2.5 concentration of the corresponding stations and obtain a simultaneous dataset; (2) based on the simultaneous dataset, future 1-6 h time scale data are matched; (3) the daily average dataset is then calculated from the simultaneous dataset; and (4) finally, based on the daily average dataset, future 1-6-day time scale data are matched.
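A minimal sketch of these quality-control rules, assuming hourly records are stored as dictionaries (the field names TEM and RHU follow the variables discussed later; the record layout itself is an assumption made for illustration):

```python
def clean_hourly_records(records):
    """Apply the quality-control rules: drop any hour with a missing
    meteorological item, and drop PM2.5 values outside 0-1000 µg/m³."""
    cleaned = []
    for rec in records:
        met = rec["met"]    # dict of meteorological input variables for that hour
        pm25 = rec["pm25"]
        if any(v is None for v in met.values()):
            continue        # one missing item removes the whole hour
        if pm25 is None or pm25 < 0 or pm25 > 1000:
            continue        # outlier thresholds from the text [57]
        cleaned.append(rec)
    return cleaned

records = [
    {"met": {"TEM": 12.0, "RHU": 80.0}, "pm25": 35.0},    # valid hour
    {"met": {"TEM": None, "RHU": 75.0}, "pm25": 40.0},    # missing item
    {"met": {"TEM": 10.0, "RHU": 70.0}, "pm25": 1500.0},  # PM2.5 outlier
]
print(len(clean_hourly_records(records)))  # only the first record survives
```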
These steps ensure that the datasets are appropriately matched and that the subsequent analysis is reliable and accurate. The resulting datasets are critical for analyzing the PM2.5 concentration and its relationship with meteorological factors, and their large number of records reflects the extensive data collection effort and underscores the importance of data preprocessing and matching.

Normalization and Division of the Datasets
Before training, the data were normalized and divided into a training set (90%) and a test set (10%) [58] (Table 2). The construction process of an individual model is shown in Figure 3. First, the initial parameters of each algorithm were set based on their characteristics; then, the parameter values and ranges were adjusted according to the specific situation of each machine learning algorithm, and ten-fold cross-validation was used to select the optimal parameters [59]. The hyperparameter optimization of all single models was implemented with the Grid Search method [55] in the Scikit-Learn library of Python 3.6 (Python Software Foundation, Fredericksburg, VA, USA).

(1) Random Forest model

RF is modeled on the Bootstrap idea, achieves high prediction accuracy, and resists over-fitting; it has been widely used in medicine, bioinformatics, and agriculture [60-62]. The algorithm was implemented using the RandomForestRegressor method in the Scikit-Learn library, and the Grid Search method was used to adjust its main parameters, such as n_estimators, oob_score, max_features, max_depth, min_samples_split, and min_samples_leaf. The parameter selection range and results are shown in Table S3.
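The grid-search tuning described above might look as follows with Scikit-Learn's GridSearchCV. The data here are synthetic, and the grid is a small illustrative subset, not the actual ranges of Table S3:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the 10 meteorological input variables
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# Small illustrative grid; the real ranges are given in Table S3
param_grid = {
    "n_estimators": [20, 50],
    "max_depth": [5, 10],
    "max_features": ["sqrt", 1.0],
}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid,
                      cv=10, scoring="r2")  # ten-fold cross-validation, as in the text
search.fit(X, y)
print(search.best_params_)
```

The same pattern applies to the XGBoost and LightGBM parameter grids, swapping in the corresponding estimator and parameter names.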
(2) XGBoost model

XGBoost is a boosting algorithm commonly used in regression and classification, especially in text classification, customer behavior prediction, and advertising click-through rate prediction [63,64]. The algorithm was implemented using the xgboost library together with Scikit-Learn, and its main parameters (booster, n_estimators, max_depth, min_child_weight, gamma, subsample, colsample_bytree, reg_alpha, and reg_lambda) were adjusted step-by-step using the Grid Search method. The parameter selection range and results are shown in Table S4.
(3) LightGBM model

LightGBM [65,66] is a framework developed by Microsoft that can be used for ranking, classification, regression, and many other machine learning tasks. It is a gradient boosting framework based on decision trees, with the advantages of distributed training and high performance. The algorithm was implemented through the Scikit-Learn and LightGBM libraries using the LGBMRegressor method. The main parameters (boosting_type, n_estimators, max_depth, num_leaves, min_child_samples, min_child_weight, subsample, colsample_bytree, reg_lambda, and reg_alpha) were adjusted step-by-step using the Grid Search method. The results are shown in Table S5.
(4) Stacking model

Stacking is an ensemble technique that integrates multiple lower-level learners into a higher-level learner, overcoming the shortcomings of a single model and achieving higher accuracy. The stacking model commonly consists of two layers. The first layer, the "base learners", takes the initial training set as input; the second layer, the "meta-learner", takes the output of the first layer as input data to train and obtain the final result. For the first layer, models with strong learning ability and diversity can greatly improve the overall prediction performance. The input features of the second-layer "meta-learner" are obtained by combining the output features calculated through cross-validation of the "base learners". Because these features are strongly correlated and the input-output relationship is approximately linear, a simple model with good stability is usually selected for the second layer.

(5) AdaBoost model

The AdaBoost model is an iterative algorithm belonging to the boosting family and is commonly used for classification and regression tasks. Its core idea is to train different weak learners on the same training set and then combine them to form a stronger final learner.
(6) DT model

Decision Tree (DT) is a non-parametric supervised learning method for classification and regression. The model predicts the value of the target variable by learning simple decision rules inferred from the data features. DT is commonly used in operations research and decision analysis.
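A brief sketch of these two comparison models with Scikit-Learn; the data are synthetic and the parameter values are illustrative, not those used in the study:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

X, y = make_regression(n_samples=400, n_features=10, noise=5.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

# AdaBoost: many weak tree learners combined into a stronger regressor
ada = AdaBoostRegressor(n_estimators=50, random_state=0).fit(X_tr, y_tr)
# A single decision tree for comparison
dt = DecisionTreeRegressor(max_depth=6, random_state=0).fit(X_tr, y_tr)
print(round(r2_score(y_te, ada.predict(X_te)), 3),
      round(r2_score(y_te, dt.predict(X_te)), 3))
```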
Many previous studies on PM2.5 concentration prediction based on machine learning models have shown that the XGBoost and RF models achieve high prediction accuracy and good performance [67-69], and the LightGBM model offers high prediction accuracy and efficiency in many applications [65,66]. Therefore, considering performance and modeling speed simultaneously, RF, XGBoost, and LightGBM were chosen as the "base learners" for the first layer, and Multiple Linear Regression (MLR) was chosen as the "meta-learner" for the second layer to construct the stacked ensemble model. The parameters of the XGBoost, RF, and LightGBM algorithms were the same as those determined during single-model construction, and the MLR model was implemented using the LinearRegression function of the Scikit-Learn library without parameter selection; the results are shown in Table S6. The stacking model construction process is shown in Figure 4.
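This two-layer architecture maps directly onto Scikit-Learn's StackingRegressor. Since xgboost and lightgbm are separate packages, the sketch below substitutes Scikit-Learn's own GradientBoostingRegressor and ExtraTreesRegressor as stand-ins for XGBoost and LightGBM, on synthetic data:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import (RandomForestRegressor, GradientBoostingRegressor,
                              ExtraTreesRegressor, StackingRegressor)
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Synthetic stand-in data; the real inputs are the 10 meteorological variables
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

# Three diverse tree ensembles as "base learners"; MLR as the "meta-learner"
stack = StackingRegressor(
    estimators=[("rf", RandomForestRegressor(n_estimators=50, random_state=0)),
                ("gb", GradientBoostingRegressor(random_state=0)),
                ("et", ExtraTreesRegressor(n_estimators=50, random_state=0))],
    final_estimator=LinearRegression(),
    cv=5,  # out-of-fold predictions form the meta-learner's training features
)
stack.fit(X_tr, y_tr)
print(round(r2_score(y_te, stack.predict(X_te)), 3))
```

The cv=5 argument reproduces the 5-fold out-of-fold scheme described in the construction steps, so the meta-learner never sees predictions made on data the base learners were trained on.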

Model Evaluation
Three indicators, R², RMSE, and MAE [72,73], were used to evaluate these models. They are calculated as follows:

R^2 = \frac{\left[\sum_{i=1}^{n} (y_{m,i} - \bar{y}_m)(y_{o,i} - \bar{y}_o)\right]^2}{\sum_{i=1}^{n} (y_{m,i} - \bar{y}_m)^2 \sum_{i=1}^{n} (y_{o,i} - \bar{y}_o)^2}

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{m,i} - y_{o,i})^2}

MAE = \frac{1}{n} \sum_{i=1}^{n} \left| y_{m,i} - y_{o,i} \right|

where n represents the number of data points; y_m is the predicted result; y_o is the real value; and ȳ_m and ȳ_o represent the average values of the predicted result and the real value, respectively. Generally, the closer the R² value is to 1 and the smaller the values of RMSE and MAE, the better the model's performance.
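These indicators can be computed directly; the sketch below assumes the squared-correlation form of R², which is consistent with the use of both averages ȳ_m and ȳ_o in the text:

```python
import math

def r2(y_obs, y_pred):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(y_obs)
    mo = sum(y_obs) / n
    mp = sum(y_pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(y_obs, y_pred))
    var_o = sum((o - mo) ** 2 for o in y_obs)
    var_p = sum((p - mp) ** 2 for p in y_pred)
    return cov ** 2 / (var_o * var_p)

def rmse(y_obs, y_pred):
    """Root-Mean-Square Error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(y_obs, y_pred)) / len(y_obs))

def mae(y_obs, y_pred):
    """Mean Absolute Error."""
    return sum(abs(o - p) for o, p in zip(y_obs, y_pred)) / len(y_obs)

obs = [30.0, 45.0, 60.0, 75.0]    # illustrative measured PM2.5 values
pred = [32.0, 44.0, 58.0, 78.0]   # illustrative predicted values
print(round(r2(obs, pred), 3), round(rmse(obs, pred), 3), round(mae(obs, pred), 3))
```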

Current PM 2.5 Concentration Estimation
Meteorological data cannot directly reflect the PM2.5 concentration value, so before using meteorological data to predict the PM2.5 concentration, experiments were conducted to verify the precision of the stacking model and the individual models by using meteorological data to estimate the current hourly and current daily PM2.5 concentrations; the results are shown in Table 3. When estimating the current hourly PM2.5 concentration, the accuracy of each model was good, with R² values above 0.8, and in terms of the comprehensive indicators, the stacking model performed best. However, when using meteorological data to estimate the current daily PM2.5 concentration, the accuracy of each model was only moderate; the average values of R², RMSE, and MAE were 0.76, 12.63, and 9.00, respectively. Thus, using meteorological data to estimate the hourly PM2.5 concentration performed better than estimating the current daily PM2.5 concentration. Air quality monitoring stations are concentrated in city centers and industrial areas, and their spatial distribution is uneven, resulting in a lack of monitoring of PM2.5 and other air pollutant concentrations in some areas. Using meteorological data to estimate the PM2.5 concentration can fill these monitoring gaps and missing data, and compared with spatial interpolation methods, the stacking model was more reliable and accurate.

Based on the hourly dataset of meteorological factors, the prediction of the hourly PM2.5 concentration for the future 1-6 h is shown in Figure 5. The X-axis represents the predicted PM2.5 concentration, and the Y-axis represents the measured PM2.5 concentration; the fitted function between the predicted and measured values, together with R², MAE, and RMSE, is given in the upper-left corner of each panel. The blue dashed line is the 1:1 line, and the red solid line is the fitted function line. If the red line lies above the blue line, the predicted value is greater than the measured value, and vice versa; the farther the scatter points deviate from the 1:1 line, the greater the difference between the predicted and measured results. According to the relationship between the red fitted line and the 1:1 line, the models generally over-predicted when the measured PM2.5 concentration was low and slightly under-predicted when the measured PM2.5 concentration was high. The stacking model performed best of the six models: its R² value was 0.88 on the 4 h scale and 0.89 on the other scales; the RMSE values were 9.49, 9.58, 9.52, 9.87, 9.79, and 9.32; and the MAE values were 6.10, 6.07, 6.15, 6.14, 6.17, and 6.12, respectively. Compared with the best-performing base model at each time scale on this dataset, the stacked ensemble model achieved average improvements of 0.9%, 5.3%, and 1.3%.

Daily Average PM 2.5 Concentration Prediction
The prediction of the daily average PM2.5 concentration for the future 1-6 days based on the meteorological datasets is shown in Figure 6. According to the relationship between the red fitted line and the 1:1 line, the models generally over-predicted when the measured PM2.5 concentration was low and slightly under-predicted when the measured PM2.5 concentration was high. The fitted slopes of the models were all less than 1, and the stacking model had the smallest deviation, lying closest to the 1:1 line. Its performance was optimal at every time scale, with the index values improved by 1.41%, 1.99%, 1.98%, 4.11%, 3.57%, and 3.52% on average compared with the single models. Compared with the best-performing base model at each time scale on this dataset, the stacking model improved R², RMSE, and MAE by 3.1%, 3.6%, and 3.2%, respectively.

Model Stability Comparison Analysis
For the model stability comparison, and to increase the credibility of the research, in addition to comparing the stacked ensemble model with its separate base models, this study also added the commonly used AdaBoost [74,75] and DT [76,77] models to the experiment. The changes in all models at the hourly and daily scales based on the meteorological datasets are shown in Figures 7 and 8. As the time scale increased, the indicators of the Stacking, XGBoost, LightGBM, and RF models did not change significantly, while the AdaBoost and DT models showed obvious changing trends.
When utilizing meteorological data, XGBoost exhibited superior PM2.5 prediction accuracy compared with the other individual models; among all models, the stacking model had the most stable trends and the best prediction performance.

Hourly PM 2.5 Concentration Prediction
The predicted PM2.5 concentration values for the future 1-6 h are shown in Figure 9. The stacking model performed better than the single models in predicting the PM2.5 concentration over the future 1-6 h, with R² values ranging from 0.92 to 0.97. The best prediction performance was for the future 1 h (R²: 0.97, RMSE: 5.03, MAE: 2.91), while the future 6 h prediction was the poorest of the hourly predictions (R²: 0.92, RMSE: 8.24, MAE: 5.29). As the prediction horizon lengthened, R² decreased by an average of 1% per hour, while RMSE and MAE increased by 10% and 16% per hour, respectively; that is, performance gradually degraded over time. On this dataset, the stacking model achieved average improvements in R² of 0.3%, RMSE of 6.3%, and MAE of 3% compared with the best-performing single models at the different time scales.

Daily Average PM 2.5 Concentration Prediction
Figure 10 displays the daily average PM2.5 concentration prediction for the next 1-6 days. The stacking model outperformed the individual models in predicting the PM2.5 concentration for the next 1 day, with an R² value of 0.82. The prediction accuracy decreased for the next 2-6 days, with R² values ranging from 0.71 to 0.73. Nevertheless, the stacking model still performed better than all individual models across all time scales, with average improvements of 2.3% for R², 2.4% for RMSE, and 3.2% for MAE compared with the best individual model at each time scale. When using historical PM2.5 data with meteorological datasets, LightGBM performed better than the other individual models, and among all models, the stacking model again had the most stable trends and the best prediction performance.

Discussion
Many existing studies on PM2.5 concentration prediction based on machine learning models show that the XGBoost and RF models have high prediction accuracy and good performance [67-69]. Hourly PM2.5 concentration forecasting can provide accurate PM2.5 levels and pollution warnings for environmental agencies [56]. However, these studies only deployed hourly prediction experiments, without further comparison of model precision across different time scales, including hourly and daily. In our experiments, twelve meteorological datasets and twelve historical PM2.5 data with meteorological datasets were preprocessed for the comparison of short-term PM2.5 concentration prediction, covering the future 1 h to 6 h and the future 1 day to 6 days. The results demonstrate that the XGBoost model was the best individual model for predicting the PM2.5 concentration based on meteorological data, while the LightGBM model was the best individual model when using historical PM2.5 data with meteorological datasets. In our previous work [56], variable importance analysis was conducted on the saved XGBoost model; the main variables selected, in order of importance, were RHU, TEM, PRS, GST, and WIN_S. A seasonal analysis of the model's prediction accuracy showed that XGBoost and the other individual models performed poorly in spring and summer and better in autumn and winter. The PM2.5 concentration was lower in summer, whereas in winter it was higher and there were multiple pollution sources, with coal-fired heating being one of the largest sources of PM2.5 pollution [78], indicating that the model had a more stable prediction ability in the polluted seasons.
The stacking model performed better than all individual models in predicting the PM2.5 concentration across different time scales on both the hourly and daily datasets (as shown in Table 4). The models using historical PM2.5 data with meteorological datasets performed better than those using only meteorological datasets, indicating the importance of historical pollutant concentrations for accurately predicting the PM2.5 concentration [79]. For hourly PM2.5 concentration prediction, the R², RMSE, and MAE indexes improved on average by 5.6%, 26.4%, and 27.9%, respectively; for daily average PM2.5 concentration prediction, they improved on average by 4.2%, 5.6%, and 7.1%, respectively. The stacking model performed better in hourly than in daily average PM2.5 concentration prediction on the historical PM2.5 concentration dataset. Additionally, the stacking model and all individual models showed high accuracy in hourly prediction, while for daily average prediction, only the next-day prediction had high accuracy. Certain regions may exhibit diverse pollution sources, including both point sources and area sources; variations in climatic conditions across different areas can also influence the trends in the PM2.5 concentration. Moreover, as the forecasting horizon increased, the influence of the prediction factors on changes in the PM2.5 concentration weakened. The reason why the stacking model outperformed the single models can be attributed to two aspects: (1) From the perspective of the choice of base learners, RF, LightGBM, and XGBoost were chosen as the "base learners" for the stacking method in this study. The RF model handles nonlinear problems by integrating multiple decision trees, which resolves the tendency of neural networks to overfit easily, and makes up for the support vector machine model's over-reliance on the user's
choice of kernel function and the need to set the regression function in advance [80]. The LightGBM model uses gradient-based one-side sampling and exclusive feature bundling to improve the traditional GBDT algorithm, which increases prediction speed and reduces computational complexity [80,81]. The XGBoost model performs a second-order Taylor expansion of the cost function and adds a regularization term, which improves training speed and reduces overfitting. The stacking method then achieves an optimal combination of these "base learners" through training. (2) From the perspective of the stacking model's structure, the input of the "meta-learner" is the output of the "base learners", which effectively prevents the over-fitting caused by the repeated use of data and fully combines the advantages of the single base learner models, improving the ensemble effect and the overall predictive capability.
Based on the results, it can be observed that the stacking model is not well suited to long-time-series tasks, for two reasons. First, the complexity of the stacking model increases significantly in long-time-series tasks: as the number of stacked models grows, so do the complexity and computational resource requirements, and training and optimization become more challenging because the model must learn and predict from a longer history of information. Second, the issue of data lag arises: each sub-model may only have access to past observations as input features, making it difficult to capture the relationship between current observations and future time points, which hampers the predictive performance of the stacking model.
Furthermore, the stacking model is not limited to PM2.5 prediction; it can also be applied to other tasks, such as solar radiation [82] and electricity price forecasting [83]. These tasks exhibit clear time-series characteristics and are influenced by multiple factors. To generate accurate prediction results, all such tasks require consideration of the impact of the various relevant factors and the selection and construction of appropriate individual models to capture their relationships.

Conclusions
This study performed data preprocessing and variable selection based on meteorological and air quality data from 2016 to 2018 for Jiangxi Province, China. Five individual PM2.5 concentration prediction models and a stacking model based on the RF, XGBoost, and LightGBM algorithms were then constructed, and the R², RMSE, and MAE indicators were used to evaluate their predictive ability.
The models' prediction performance using historical PM2.5 data with meteorological datasets was better than that using only meteorological data; in addition, when data from a ground PM2.5 monitoring site were missing, meteorological data could be used to predict PM2.5 concentration changes. The stacking model predicted the PM2.5 concentration at different time scales on both types of datasets better than all individual models and was less affected by time factors. The XGBoost model was the best individual model for predicting the PM2.5 concentration from meteorological data, and the LightGBM model was the best individual model on the historical PM2.5 data with meteorological datasets. The results demonstrate that the stacking integration strategy combined the advantages of the RF, XGBoost, and LightGBM models to effectively improve the robustness and accuracy of PM2.5 concentration prediction, and can be used in warning systems for short-term air pollution.
As the time scale increased, the prediction performance of all individual models and the stacking model gradually decreased, especially at daily time scales. For the stacking model, this is because stacked models exhibit higher data dependency for long-time-series data, and their complexity and computational resource requirements are also higher in long-time-series tasks. It therefore becomes challenging to train and optimize stacking models, making them more suitable for short-term prediction tasks. Hence, future efforts will focus on developing a high-precision prediction model for medium- and long-term PM2.5 concentrations.

Figure 1. Distribution map of meteorological stations and air quality monitoring stations in the study area.

Figure 2. Matching process of station data. After matching and merging the data from each site, the final datasets were obtained. The dataset of meteorological factors estimating the PM 2.5 concentration included 419,147 records at the hourly scale and 18,001 records at the daily scale. The different-time-scale PM 2.5 concentration estimation datasets included 414,163 records at the 1 h scale, 413,101 at the 2 h scale, 412,101 at the 3 h scale, 411,264 at the 4 h scale, 410,741 at the 5 h scale, 410,443 at the 6 h scale, 17,783 at the 1-day scale, 17,680 at the 2-day scale, 17,608 at the 3-day scale, 17,534 at the 4-day scale, 17,465 at the 5-day scale, and 17,391 at the 6-day scale. These datasets are critical for conducting accurate and meaningful analyses of the PM 2.5 concentration and its relationship with meteorological factors. The large number of records reflects the extensive data collection efforts and underscores the importance of data preprocessing and matching to ensure the reliability and validity of the data.
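The station matching and horizon-dataset construction can be sketched with pandas. The column names and values below are illustrative stand-ins, not the study's actual variables; the point is the timestamp join and why the record count shrinks slightly as the prediction horizon grows:

```python
import pandas as pd

# Hypothetical hourly records for one matched station pair
met = pd.DataFrame({
    "time": pd.date_range("2016-01-01", periods=8, freq="h"),
    "temp": [5.1, 5.0, 4.8, 4.9, 5.3, 5.6, 5.8, 5.5],
    "rh":   [80, 82, 85, 84, 79, 75, 72, 74],
})
aq = pd.DataFrame({
    "time": pd.date_range("2016-01-01", periods=8, freq="h"),
    "pm25": [38.0, 41.0, 45.0, 44.0, 40.0, 36.0, 33.0, 35.0],
})

# Inner join on the shared timestamp keeps only hours present in both sources
hourly = met.merge(aq, on="time", how="inner")

# Shifting PM2.5 backwards by h hours creates the 1-6 h ahead targets;
# trailing rows lack a future value and are dropped, which is why the
# record count decreases as the horizon lengthens
for h in range(1, 7):
    hourly[f"pm25_t+{h}h"] = hourly["pm25"].shift(-h)
ds_1h = hourly.dropna(subset=["pm25_t+1h"])
ds_6h = hourly.dropna(subset=["pm25_t+6h"])
```

The daily-scale datasets follow the same pattern after aggregating the hourly records to daily means.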

Figure 3. The processes in a single model's construction.

(1) Obtain the original training set and the original test set;
(2) Each model was trained using 5-fold cross-validation [70,71]. The training set was divided into 5 parts, 4 of which were used as training data while 1 was held out as validation data. In each of the five training rounds, the held-out fold was predicted to obtain a prediction result, a, and the test set was predicted by the trained model to obtain a test-set prediction result, b. After the five rounds, the five prediction results, a, were concatenated into one column, A, and the five test-set prediction results, b, were averaged as B. Finally, new datasets A and B were obtained, where the length of the one-dimensional A equals the size of the training set;
(3) The procedure in step (2) was used to train the RF, XGBoost, and LightGBM models, respectively, generating 3 A columns and 3 B columns. The 3 A columns combined with the actual values of the original training set, and the 3 B columns combined with the actual values of the original test set, formed a new training set and a new test set, which were input into the meta-learner;
(4) A multiple linear regression algorithm was trained on the new training set. The trained model was saved, and the stacking model prediction was then performed by inputting the new test set.
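The four steps above correspond to a standard out-of-fold stacking scheme. The sketch below reproduces it with scikit-learn only, so GradientBoostingRegressor stands in for XGBoost and LightGBM, synthetic data stands in for the meteorological features, and step (2)'s averaged per-fold test predictions are simplified to a single fit on the full training set:

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_predict, train_test_split

# Step (1): original training and test sets (synthetic stand-in data)
X, y = make_regression(n_samples=400, n_features=6, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

base_models = [
    RandomForestRegressor(n_estimators=50, random_state=0),
    GradientBoostingRegressor(random_state=0),               # stand-in for XGBoost
    GradientBoostingRegressor(max_depth=2, random_state=0),  # stand-in for LightGBM
]

# Steps (2)-(3): each column of A holds one base model's 5-fold out-of-fold
# predictions on the training set; each column of B holds its test-set predictions
A = np.column_stack([cross_val_predict(m, X_tr, y_tr, cv=5) for m in base_models])
B = np.column_stack([m.fit(X_tr, y_tr).predict(X_te) for m in base_models])

# Step (4): multiple linear regression as the meta-learner
meta = LinearRegression().fit(A, y_tr)
y_pred = meta.predict(B)
```

Fitting the meta-learner on out-of-fold predictions rather than in-sample predictions is what prevents the base models' training-set overfitting from leaking into the second stage.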

Figure 4. The process of stacking model construction.

Figure 5. Scatter plots of predicted and measured results for the future 1-6 h.

Figure 6. Scatter plots of predicted and measured results for the future 1-6 days.

Figure 7. Comparison of the stability of different models for 1-6 h prediction based on meteorological datasets.

Figure 8. Comparison of the stability of different models for 1-6-day prediction based on meteorological datasets.

Figure 9. Scatter plots of predicted and measured results for the future 1-6 h.

Figure 10. Scatter plots of predicted and measured results for the future 1-6 days.

Figures 11 and 12 provide a comparison of the stability of all models on hourly and daily time scales. It can be observed that, as the time scale increased, the performance of all models gradually decreased, as evidenced by decreasing R 2 and increasing RMSE and MAE values. The stacking model exhibited the smoothest broken line and the smallest change, indicating its strong stability and low susceptibility to the effect of time. Conversely, the AdaBoost model displayed the steepest broken line and the widest range of change, indicating the weakest stability among all models.

Figure 11. Comparison of the stability of different models for 1-6 h prediction based on historical PM 2.5 concentration with meteorological datasets.

Figure 12. Comparison of the stability of different models for 1-6-day prediction based on historical PM 2.5 concentration with meteorological datasets.

Figure S1: Simulation results of LUR model based on multiple linear regression; Figure S2: Kriging interpolation results; Figure S3: Elevation map of Jiangxi Province; Figure S4: Daily distribution of errors; Figure S5: Hourly distribution of errors; Figure S6: Daily distribution of errors; Figure S7: Hourly distribution of errors.

Table 2. Datasets and division.

Table 3. Use of meteorological factors to estimate the current PM 2.5 concentration.

Table 4. The prediction performance of the stacking model based on two types of datasets.

Table S1: Meteorological stations; Table S2: Air quality stations; Table S3: /; Table S4: Parameter range and results of the Random Forest model; Table S5: Parameter range and results of the XGBoost model; Table S6: Parameter range and results of the LightGBM model.