A Case Analysis of Dust Weather and Prediction of PM10 Concentration Based on Machine Learning at the Tibetan Plateau

Dust weather is common and destructive on the Tibetan Plateau. This study selected a typical dust weather case in the northeast of the Tibetan Plateau, analyzed its main development mechanism, and then applied six machine learning methods and a time series regression model to predict PM10 concentration in this area. The results showed the following: (1) The surface 24-h pressure change was positive when the cold front intruded; at 700 hPa, a trough led to the convergence of vector winds accompanied by a sudden drop in temperature and humidity; at 500 hPa, a "two troughs and one ridge" pattern appeared, with strong cold advection behind the trough and cyclonic vorticity formed over eastern Inner Mongolia. (2) In this case, the air mass trajectory from the Hexi Corridor was the main path affecting Xining City, as indicated by the marked lag of the PM10 concentration peak in Xining City relative to Zhangye City. (3) Multiple Linear Regression predicted PM10 concentration in a timely and effective way and was good at anticipating the transition periods of particle concentration and the dates of maximum values during such dust weather. (4) The mean absolute error (MA) and mean absolute percentage error (MP) in the clean period were much lower than those in the dust period; the PM10 concentration of Zhangye City, used as a feature, played an important role in predicting the PM10 concentration of Xining City even in clean periods. Unlike in dust periods, Random Forest with Bayesian hyperparameter optimization outperformed Multiple Linear Regression in clean periods.


Introduction
Dust weather is common and destructive on the Tibetan Plateau. It not only seriously degrades the ecological environment but also causes great damage to the economy and to human health [1][2][3][4]. Dust storm behavior depends on five attributes: wind speed, pressure, temperature, humidity, and surface type [5,6]. Dust events significantly increase the concentrations of air pollutants such as PM2.5, CO, and O3, which are directly associated with increases in the spread and severity of the pandemic [7]. Sandstorms are a primary source of air pollutants, including dust, gases, and fine particulate matter, and drive their long-distance transport [8,9]. Dust storms over the Tibetan Plateau occur most frequently in winter and early spring, and their paths shift gradually from south to north, closely coupled with the northward movement of the westerly jet from winter to spring [10]. Past studies have identified the Gobi Desert, the Hexi Corridor, the Qaidam Basin, the Turpan Basin, and the edges of the Tarim and Junggar basins as the main dust sources for the northern Tibetan Plateau [11,12]. The occurrence and development of dust weather are closely related to cold air and geographical conditions; where mountains block the flow, PM10 concentrations increase significantly [13,14]. Aerosol optical depth (AOD) data and related meteorological parameters can be used to analyze sandstorms and to evaluate the spatiotemporal distribution and evolution of dust [15][16][17][18][19].
Applications of artificial intelligence in meteorology mainly involve machine learning methods, in areas such as weather forecasting, meteorological services, medical meteorology, and agricultural meteorological products [20][21][22]. Machine learning comprises computational methods for learning from complex data. In the classification and prediction of meteorological elements and air pollution, some machine learning methods fit nonlinear relationships with superior accuracy and stability [23]. The case-based reasoning (CBR) approach, combined with other AI techniques, can effectively predict dust storm events and help establish effective countermeasures [24]. By capturing temporal dependencies in time series data, Long Short-Term Memory (LSTM) networks achieved better results in forecasting PM2.5 concentrations, with a low root mean square error, showing that this method can be effective for forecasting and controlling air pollution [25][26][27][28][29]. An online forecasting method based on Random Forest was proposed to predict the concentrations of three air pollutants (PM2.5, NO2, and SO2) 24 h in advance, achieving state-of-the-art performance [30]. A hybrid approach based on two-stage decomposition, which embeds sample entropy and an extreme learning machine, was used to forecast particulate matter concentrations (PM10 and PM2.5), overcoming the difficulties caused by the randomness and non-stationarity of air pollutant data [31,32]. A dust assimilation forecasting model has been used to quantify the impact of data assimilation on forecasts of a severe Asian dust storm [33]. When monitoring data are scarce because of equipment limitations, the Autoregressive Integrated Moving Average (ARIMA) model and a Multiple Linear Regression additive model based on linear interpolation perform better [34,35].
It is important to analyze such data with different machine learning models, including linear regression, artificial neural networks, and LSTM recurrent neural networks, to compare the accuracy and capability of each method, and to use different models to predict unhealthy pollution levels [36,37]. However, the various machine learning and deep learning methods for estimating particulate matter concentrations still need further validation on a global scale.
The purpose of this study is to find the most suitable machine learning method for predicting PM10 in the northeast of the Tibetan Plateau. The study is structured as follows: first, a typical dust weather case in the northeast of the Tibetan Plateau is selected and its main development mechanisms are analyzed; second, machine learning methods are applied to predict PM10 concentration, and the advantages and disadvantages of the two best-performing methods in this area are compared; finally, the prediction performance of these two methods is compared between the clean period and the dust period. The aim is to better capture the timing, extent, and intensity of strong sandstorms on the Tibetan Plateau through scientific and reasonable prediction and analysis, and thereby to reduce the adverse impact of sandstorms on the living environment and the economy.

Data Sources and Integration
The meteorological data, provided by the Qinghai Provincial Meteorological Bureau, comprised three-hourly surface observations and upper-air data from December 2020 to March 2021, mainly including: geopotential height, temperature, relative humidity, and wind speed and direction at the 500 hPa pressure level; the same four variables at the 700 hPa pressure level; and surface temperature, surface 24-h pressure change, surface relative humidity, and surface minimum visibility.
The PM10 concentrations were obtained from the national urban air quality real-time publishing platform of the China National Environmental Monitoring Centre and include the hourly and daily mean PM10 concentrations in Xining and Zhangye cities from December 2020 to March 2021 and from January 2022 to April 2022.

Introduction to Machine Learning Methods
This study used the scikit-learn machine learning library, in which most components can be divided into estimators and transformers; an estimator is a model used for regression and prediction [38,39]. Models were trained repeatedly on the samples with basic regression methods in scikit-learn, namely the Support Vector Machine (SVM), Multiple Linear Regression (MLR), and K-Nearest Neighbor (KNN), and with ensemble methods, namely the AdaBoost algorithm (Ada), the Gradient Boosting Regression Tree (GBRT), and Random Forest (RF).
Multiple Linear Regression (MLR) is one of the basic machine learning algorithms. It extends univariate linear regression by adding more feature variables [40][41][42][43]. The regression model is as follows:

$$y_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$$

where $y_\theta(x)$ represents the dependent variable with $\theta$ as the parameter, $\theta_0, \theta_1, \theta_2, \ldots, \theta_n$ are the regression parameters to be solved, and $x_1, x_2, \ldots, x_n$ are the feature variables.
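As a minimal sketch of how the parameters $\theta_0, \ldots, \theta_n$ can be estimated, the pure-Python code below solves the ordinary least-squares normal equations. It assumes an ordinary least-squares fit with an intercept and is an illustration only: the study used scikit-learn, whose corresponding estimator is `sklearn.linear_model.LinearRegression`, and the function names `fit_mlr` and `predict_mlr` are hypothetical.

```python
# Illustrative sketch (hypothetical helper names): ordinary least squares for
# y = theta_0 + theta_1*x_1 + ... + theta_n*x_n, written in plain Python.


def fit_mlr(X, y):
    """Return [theta_0, theta_1, ..., theta_n] minimising the squared error."""
    # Prepend a column of ones so that theta_0 acts as the intercept.
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    # Normal equations (A^T A) theta = A^T y, stored as an augmented matrix.
    M = [[sum(r[i] * r[j] for r in A) for j in range(p)]
         + [sum(r[i] * yi for r, yi in zip(A, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(p):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [M[i][p] / M[i][i] for i in range(p)]


def predict_mlr(theta, x):
    """Evaluate y_theta(x) for one feature vector x."""
    return theta[0] + sum(t * v for t, v in zip(theta[1:], x))


# Toy data generated exactly from y = 1 + 2*x1 + 3*x2 (not observations).
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1, 3, 4, 6, 14]
theta = fit_mlr(X, y)
print([round(t, 6) for t in theta])  # → [1.0, 2.0, 3.0]
```

Solving the normal equations directly is adequate for a handful of well-scaled features such as those used here; library implementations typically use more numerically robust factorizations.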
The Random Forest (RF) algorithm combines the information from many randomized decision trees at the same time and naturally incorporates feature selection and feature interactions into the learning process [44], which makes it suitable for estimating pollutant concentrations. For regression, the output is obtained by averaging the predictions of all the individual trees. There are many hyperparameter optimization methods for Random Forests, including particle swarm optimization, genetic algorithms, differential evolution, and Bayesian optimization; we selected Bayesian optimization in this study.
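The averaging idea can be illustrated with a deliberately simplified random forest, sketched below under stated assumptions: each tree is a depth-one tree (a stump) grown on a bootstrap sample using one randomly chosen feature, and the forest prediction is the mean of the tree predictions. The hyperparameters are fixed, the Bayesian hyperparameter search used in the study is not reproduced, and all names are hypothetical; in practice `sklearn.ensemble.RandomForestRegressor` would be used.

```python
import random

# Simplified random forest regressor (hypothetical helpers): depth-one trees,
# bootstrap samples, one random feature per tree, fixed hyperparameters.


def fit_stump(X, y, feature):
    """Best single threshold on `feature`, minimising the squared error."""
    best = None
    values = sorted(set(row[feature] for row in X))
    for lo, hi in zip(values, values[1:]):
        thr = (lo + hi) / 2
        left = [t for row, t in zip(X, y) if row[feature] <= thr]
        right = [t for row, t in zip(X, y) if row[feature] > thr]
        m_left, m_right = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((t - m_left) ** 2 for t in left)
               + sum((t - m_right) ** 2 for t in right))
        if best is None or sse < best[0]:
            best = (sse, thr, m_left, m_right)
    if best is None:  # feature is constant in this sample: predict the mean
        mean = sum(y) / len(y)
        return (feature, float("inf"), mean, mean)
    return (feature, best[1], best[2], best[3])


def fit_forest(X, y, n_trees=50, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample
        feature = rng.randrange(d)                  # random feature choice
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx], feature))
    return forest


def predict_forest(forest, x):
    """Average the predictions of all trees."""
    preds = [m_l if x[f] <= thr else m_r for f, thr, m_l, m_r in forest]
    return sum(preds) / len(preds)


# Toy data (not observations): the target steps from 0 to 10 along feature 0.
X = [[i, 1.0] for i in range(10)]
y = [0.0] * 5 + [10.0] * 5
forest = fit_forest(X, y)
print(predict_forest(forest, [0, 1.0]), predict_forest(forest, [9, 1.0]))
```

Because the second toy feature is constant, stumps built on it fall back to the bootstrap mean; averaging over many trees is what smooths out such weak individual learners.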
The Support Vector Machine (SVM) is a statistical learning technique based on machine learning and generalization theory. It estimates the empirical risk with a hinge-type loss (for regression, the ε-insensitive loss), adds a regularization term to control model complexity, and handles non-linear problems through the kernel method; it can therefore be regarded as a structural risk minimization method. Its good generalization capability also makes it applicable to dynamic and non-linear data sets [45].
K-Nearest Neighbor (KNN) computes the distance between each sample in the validation set and every sample in the training set, arranges all the distance values in ascending order, and predicts from the k nearest training samples (for regression, typically the mean of their target values) [46,47].
After each round of training, the AdaBoost ensemble algorithm increases the weights of poorly predicted samples and decreases those of well-predicted samples, then passes the reweighted samples to the next weak learner, which therefore concentrates on the harder samples [48].
The Gradient Boosting Regression Tree (GBRT) builds decision trees sequentially: each new tree is fitted to the negative gradient of the loss of the current ensemble (for squared loss, the residuals), and the values of its leaf nodes are added to the prediction [49].
The time series regression model used in this study was the Autoregressive Integrated Moving Average (ARIMA) model. ARIMA first transforms a non-stationary time series into a stationary one by differencing, and then regresses the dependent variable on its own lagged values and on the current and lagged values of the random error. This method was compared with the six machine learning methods above.
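To make the loss-plus-regularization view concrete, the sketch below trains a linear support vector regressor by subgradient descent on the ε-insensitive loss with an L2 penalty. It is a simplification under stated assumptions: no kernel is used (linear model only), the values of `eps`, `lam`, `lr`, and `epochs` are illustrative, and the function names are hypothetical; the study's scikit-learn SVM settings are not given here.

```python
# Linear SVR sketch (hypothetical helpers). Objective:
#   lam * ||w||^2 + mean(max(0, |y - (w.x + b)| - eps))
# minimised by full-batch subgradient descent; no kernel is used.


def fit_linear_svr(X, y, eps=0.1, lam=0.01, lr=0.01, epochs=2000):
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [2 * lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for row, target in zip(X, y):
            r = target - (sum(wj * xj for wj, xj in zip(w, row)) + b)
            if abs(r) > eps:                 # sample lies outside the eps-tube
                sign = 1.0 if r > 0 else -1.0
                for j in range(d):
                    grad_w[j] -= sign * row[j] / n
                grad_b -= sign / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict_svr(w, b, x):
    return sum(wj * xj for wj, xj in zip(w, x)) + b


# Toy data from y = 2x + 1 (not observations).
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_linear_svr(X, y)
print(w, b)
```

Samples inside the ε-tube contribute no loss, so only samples outside it (the support vectors) move the solution, while the regularization term pulls the weights toward smaller norms.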
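The distance-sort-average procedure can be sketched as follows, assuming Euclidean distance and an unweighted mean of the k nearest targets (the study's choice of k and distance metric is not stated; `knn_predict` is a hypothetical name).

```python
import math

# KNN regression sketch (hypothetical helper): Euclidean distance, uniform weights.


def knn_predict(train_X, train_y, x, k=3):
    """Predict the target of x as the mean target of its k nearest training samples."""
    # Distance from x to every training sample, paired with that sample's target.
    pairs = [(math.dist(row, x), target) for row, target in zip(train_X, train_y)]
    pairs.sort(key=lambda p: p[0])  # arrange all distances in ascending order
    nearest = pairs[:k]
    return sum(target for _, target in nearest) / len(nearest)


# Toy data (not observations).
train_X = [[0.0], [1.0], [2.0], [10.0], [11.0]]
train_y = [1.0, 2.0, 3.0, 20.0, 21.0]
print(knn_predict(train_X, train_y, [1.0], k=3))   # → 2.0
print(knn_predict(train_X, train_y, [10.4], k=2))  # → 20.5
```

Because the features in this study have different units (hPa, %, °C, m/s, µg/m³), distances would normally be computed after standardizing each feature, for example with `sklearn.preprocessing.StandardScaler`; otherwise the feature with the largest numeric range dominates.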
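One reweighting round can be sketched with the AdaBoost.R2 update (Drucker, 1997), which is the regression variant implemented by scikit-learn's `AdaBoostRegressor`. The sketch assumes the linear loss and positive sample weights that sum to 1; it is not a full boosting loop, and `adaboost_r2_update` is a hypothetical name.

```python
import math

# One AdaBoost.R2 reweighting step with the linear loss (hypothetical helper).


def adaboost_r2_update(weights, y_true, y_pred):
    """Return (new normalised sample weights, weight of this weak learner)."""
    errors = [abs(t - p) for t, p in zip(y_true, y_pred)]
    max_err = max(errors)
    if max_err == 0:
        return list(weights), None  # perfect fit: boosting would stop here
    loss = [e / max_err for e in errors]                   # per-sample loss in [0, 1]
    avg_loss = sum(w * l for w, l in zip(weights, loss))  # weighted average loss
    if avg_loss >= 0.5:
        raise ValueError("weak learner too inaccurate (average loss >= 0.5)")
    beta = avg_loss / (1 - avg_loss)  # beta < 1
    # Well-predicted samples (small loss) are multiplied by ~beta and shrink;
    # badly predicted samples (loss near 1) keep most of their weight.
    new_w = [w * beta ** (1 - l) for w, l in zip(weights, loss)]
    total = sum(new_w)
    return [w / total for w in new_w], math.log(1 / beta)


# Toy round: the weak learner misses only the last sample.
w, alpha = adaboost_r2_update([0.25] * 4, [1, 2, 3, 4], [1, 2, 3, 8])
print([round(v, 4) for v in w])  # → [0.1667, 0.1667, 0.1667, 0.5]
```

After normalization, the badly predicted sample carries half of the total weight, so the next weak learner concentrates on it.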
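A minimal sketch of the sequential residual-fitting idea, assuming squared loss, a single input feature, depth-one trees, and an illustrative learning rate (the study's GBRT settings are not given; all names are hypothetical):

```python
# Gradient boosting sketch (hypothetical helpers): squared loss, one feature,
# depth-one trees. With L = (y - F)^2 / 2 the negative gradient is y - F.


def fit_stump_1d(x, r):
    """Threshold split of residuals r on feature x minimising squared error.

    Requires at least two distinct values in x.
    """
    best = None
    xs = sorted(set(x))
    for lo, hi in zip(xs, xs[1:]):
        thr = (lo + hi) / 2
        left = [ri for xi, ri in zip(x, r) if xi <= thr]
        right = [ri for xi, ri in zip(x, r) if xi > thr]
        m_left, m_right = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((v - m_left) ** 2 for v in left)
               + sum((v - m_right) ** 2 for v in right))
        if best is None or sse < best[0]:
            best = (sse, thr, m_left, m_right)
    return best[1:]  # (threshold, left leaf value, right leaf value)


def fit_gbrt(x, y, n_rounds=50, learning_rate=0.1):
    f0 = sum(y) / len(y)  # initial constant model
    pred = [f0] * len(y)
    trees = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]  # negative gradient
        thr, m_left, m_right = fit_stump_1d(x, residuals)
        trees.append((thr, m_left, m_right))
        # Add the new tree's leaf values, shrunk by the learning rate.
        pred = [pi + learning_rate * (m_left if xi <= thr else m_right)
                for xi, pi in zip(x, pred)]
    return f0, learning_rate, trees


def predict_gbrt(model, xi):
    f0, learning_rate, trees = model
    return f0 + learning_rate * sum(ml if xi <= thr else mr for thr, ml, mr in trees)


# Toy step data (not observations).
x = list(range(10))
y = [0.0] * 5 + [10.0] * 5
model = fit_gbrt(x, y)
print(round(predict_gbrt(model, 0), 3), round(predict_gbrt(model, 9), 3))
```

Each round removes a fraction (the learning rate) of the remaining residual, so the fit improves gradually rather than in one step, which is the usual guard against overfitting in boosting.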
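The difference-then-regress structure can be sketched with the simplest non-trivial case, an ARIMA(1, 1, 0) model without a constant: the series is differenced once, an AR(1) coefficient is fitted to the differences by least squares, and forecasts are integrated back to the original level. The (p, d, q) orders actually used in the study are not stated here, the moving-average part is omitted, and the names are hypothetical; a full implementation is available as `statsmodels.tsa.arima.model.ARIMA`.

```python
# ARIMA(1, 1, 0) sketch without constant or MA terms (hypothetical helpers).


def difference(series, d=1):
    """Apply first differencing d times."""
    for _ in range(d):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def fit_ar1(z):
    """Least-squares phi for z_t = phi * z_{t-1} + e_t."""
    num = sum(prev * cur for prev, cur in zip(z[:-1], z[1:]))
    den = sum(prev * prev for prev in z[:-1])
    return num / den


def forecast_arima_110(series, steps=1):
    z = difference(series, 1)
    phi = fit_ar1(z)
    level, diff = series[-1], z[-1]
    forecasts = []
    for _ in range(steps):
        diff = phi * diff     # AR(1) forecast of the next difference
        level = level + diff  # undo the differencing
        forecasts.append(level)
    return phi, forecasts


# Toy series whose differences halve each step (not observations).
phi, fc = forecast_arima_110([0, 8, 12, 14, 15, 15.5], steps=2)
print(phi, fc)  # → 0.5 [15.75, 15.875]
```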

HYSPLIT Model
The HYSPLIT model (Hybrid Single-Particle Lagrangian Integrated Trajectory model) was developed jointly by the National Oceanic and Atmospheric Administration (NOAA) and the Australian Bureau of Meteorology. It is used to compute simple air mass trajectories and to simulate complex dispersion and deposition. The model can handle a variety of physical processes, including the transport, dispersion, and deposition of pollutants from emission sources, driven by various meteorological input fields [50].
Using the HYSPLIT model, we configured the model parameters and selected the GDAS (Global Data Assimilation System) meteorological data for March 2021 to analyze the air mass trajectories of the dust weather in Xining City.

Air Quality Standards
This study evaluated the air quality index (AQI) and PM10 concentration strictly according to the "Ambient Air Quality Standards" implemented in China from 1 January 2016 [51]. The evaluation standards are shown in Table 1.
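As a hedged illustration of how a daily mean PM10 concentration maps to an individual AQI (IAQI) value and an AQI level, the sketch below interpolates linearly between breakpoints. It assumes the 24-h PM10 breakpoints of China's AQI technical regulation HJ 633-2012, which accompanies the GB 3095-2012 standard; Table 1 of this study should be treated as authoritative, and the function names are hypothetical. Capping values above the top breakpoint at 500 is an assumption of this sketch.

```python
# Assumed breakpoints (HJ 633-2012): 24-h mean PM10 in ug/m3 and matching IAQI.
PM10_BP = [0, 50, 150, 250, 350, 420, 500, 600]
IAQI_BP = [0, 50, 100, 150, 200, 300, 400, 500]
# Upper IAQI bound of levels I to V; anything above 300 is level VI.
LEVELS = [(50, "I"), (100, "II"), (150, "III"), (200, "IV"), (300, "V")]


def pm10_iaqi(conc):
    """Linear interpolation between the bracketing breakpoints (hypothetical helper)."""
    if conc < 0:
        raise ValueError("concentration must be non-negative")
    for i in range(1, len(PM10_BP)):
        if conc <= PM10_BP[i]:
            lo_c, hi_c = PM10_BP[i - 1], PM10_BP[i]
            lo_i, hi_i = IAQI_BP[i - 1], IAQI_BP[i]
            return (hi_i - lo_i) / (hi_c - lo_c) * (conc - lo_c) + lo_i
    return 500.0  # above the top breakpoint: capped in this sketch (assumption)


def aqi_level(iaqi):
    for upper, name in LEVELS:
        if iaqi <= upper:
            return name
    return "VI"


print(pm10_iaqi(475), aqi_level(pm10_iaqi(475)))  # → 368.75 VI
```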

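Evaluation Method
In this study, the observed and predicted values were compared using the Index of Agreement ($I_A$), the Mean Absolute Error ($M_A$), and the Mean Absolute Percentage Error ($M_P$), calculated as follows:

$$I_A = 1 - \frac{\sum_{i=1}^{n} (P_i - O_i)^2}{\sum_{i=1}^{n} \left( |P_i - \bar{O}| + |O_i - \bar{O}| \right)^2}$$

$$M_A = \frac{1}{n} \sum_{i=1}^{n} |P_i - O_i|$$

$$M_P = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{P_i - O_i}{O_i} \right|$$

where $O_i$ is the observed value, $P_i$ is the predicted value, $\bar{O}$ is the mean observed value, and $n$ is the number of samples. The closer $I_A$ is to 1, the more accurate the predictions; the lower $M_A$ and $M_P$, the smaller the prediction error.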
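The three metrics can be computed directly from paired observations and predictions. The sketch below assumes Willmott's form of the index of agreement and non-zero observations (required by $M_P$); the function names are hypothetical and the values are toy numbers, not results from this study.

```python
# Evaluation metrics sketch (hypothetical helpers).


def index_of_agreement(obs, pred):
    o_bar = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - o_bar) + abs(o - o_bar)) ** 2 for o, p in zip(obs, pred))
    return 1 - num / den


def mean_absolute_error(obs, pred):
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def mean_absolute_percentage_error(obs, pred):
    # Expressed in percent; every observation must be non-zero.
    return 100 * sum(abs((p - o) / o) for o, p in zip(obs, pred)) / len(obs)


# Toy values (not results from this study).
obs = [100.0, 200.0, 300.0]
pred = [110.0, 190.0, 300.0]
print(round(index_of_agreement(obs, pred), 4))  # → 0.9974
```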

Analysis of Dust Weather from
From 14 to 20 March 2021, a dust event swept across northern China from north to south and from west to east. It affected a wide area for a long time and was rated the strongest dust event of the past decade. It seriously disrupted daily life and transportation, caused casualties and left people missing in some areas, and severely affected agriculture and animal husbandry. Figure 2 shows the terrain height of the Tibetan Plateau, the geographical locations of the Qaidam Basin, the Qilian Mountains, the Hexi Corridor, and Inner Mongolia, and the main meteorological observation sites affected by this dust weather.

In the Hehuang Valley urban agglomeration in the northeast of the Tibetan Plateau, dust began at 05:00 on the 16th and ended at 20:00 on the 20th, with dustfall and blowing sand in Xining, Menyuan, Guide, and Tongren. Visibility in Xining City was at its worst from 23:00 on the 18th to 05:00 on the 19th, when it was only 2 km. Dustfall and blowing sand also occurred in Gangcha, Chaka, and Gonghe. In addition, sandstorms occurred in Xiaozaohuo, Golmud, Nuomuhong, Dulan, and Wulan with visibility below 1 km; at Xiaozaohuo, visibility fell to 200 m at 23:00 on the 18th.

These observations show that this dust weather was long-lasting, intense, and wide-ranging in the northeast of the Tibetan Plateau.

Surface Meteorological Conditions
Figure 4 shows the distribution of surface visibility, 24-h pressure change isolines, and the surface wind field at 08:00 on 15 and 16 March. On the 15th, the 24-h pressure change was strongly positive, reaching 14 hPa, over the Hexi Corridor on the northern side of the Tibetan Plateau and over Inner Mongolia; widespread dust weather had already occurred there, reducing visibility to less than 1 km. The gradient of the 24-h pressure change was very steep, indicating that a cold front was passing through the area, and surface wind speeds were high. By 08:00 on the 16th, the 24-h pressure change in the Hehuang Valley was positive and exceeded 3 hPa, showing that the cold front had fully invaded the valley carrying large amounts of dust, which rapidly decreased visibility and increased the AQI there. The southern and southwestern regions of the Tibetan Plateau were not affected by the dust weather because of their high altitude and steep terrain.

700 hPa and 500 hPa Meteorological Conditions
Figure 5 shows the 700 hPa meteorological conditions, including geopotential height, temperature, wind field, and relative humidity. At 08:00 on 16 March, the height contours and isotherms were denser than on the previous day, and the isotherms had moved southward, causing significant cooling in most areas, especially in the Hehuang Valley, where the 700 hPa temperature dropped below 0 °C. A trough could clearly be seen deepening into the area, leading to the convergence of vector winds. Relative humidity at 700 hPa was significantly lower than on the previous day, especially over the northeastern Tibetan Plateau and the central Hexi Corridor, where both relative humidity and temperature decreased as the dust passed.
Figure 6 shows the 500 hPa meteorological conditions, including geopotential height, temperature, wind field, and relative humidity. At 08:00 on 15 March, a "two troughs and one ridge" circulation pattern with a large meridional extent appeared, with maximum wind speeds exceeding 32 m/s. The cold advection behind the trough was strong, and cyclonic vorticity formed over eastern Inner Mongolia.

Such strong dynamic transport and cold advection at 700 hPa and 500 hPa contributed to the formation and intensification of the surface cold front. The temperature difference between the strong cold advection aloft and the surface was the main reason for the formation of this dry convective sandstorm.

HYSPLIT Backward Trajectory Model Analysis in Xining City
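Trajectories were calculated with the NOAA HYSPLIT model using the meteorological fields for March 2021. Xining City was selected as the starting point of the simulation, and 48-h backward trajectories were traced from 08:00 every day. The simulation results showed that the main sources of air masses in Xining City were the westward path, the eastward path, local sources, and the northeast path of backflow from the Hexi Corridor; 33.33% of the trajectories came from local sources and the eastward path, 16.67% came from the westward path, and 8.33% came from the Hexi Corridor (Figure 7).
The trajectories of the westward path were the longest, indicating strong winds and fast-moving air masses in this direction. These air masses moved from high-altitude mountainous areas to low-altitude urban areas, so the pressure differences were large, transport and diffusion conditions were more favorable, and few pollutants were present. The local-source and eastward trajectories were the shortest, indicating that under calm or light easterly winds the air masses moved slowly from the east or stayed in place. Moreover, given the terrain of the Hehuang Valley, pressure differences were small and the atmosphere was stably stratified, resulting in poor transport and diffusion conditions. The trajectories from the Hexi Corridor were shorter than those from the west, indicating that they were blocked by the eastern edge of the Qilian Mountains to the northeast; these air masses slowed down, converged with the easterly flow, and moved slowly into Xining City or stagnated. Lines a and b represent the main air mass paths affecting Xining City during this dust weather, and poor atmospheric diffusion conditions caused the large amounts of dust carried by these air masses to accumulate locally.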

Analysis of Meteorological Elements in Xining City
Xining City, the provincial capital in the northeast of the Tibetan Plateau, was directly affected by this dust weather. The 24-h pressure change began to rise at 11:00, turning from negative (−0.6 hPa) to positive (0.3 hPa). At 21:00, visibility decreased rapidly to 2 km, indicating that the cold air had reached Xining City carrying large amounts of dust. From the 16th onward, the 24-h pressure changes in Xining City were mainly positive, the surface temperature decreased, and southeasterly wind speeds increased day by day (Figure 8). From the 20th, temperatures began to rise day by day, the 24-h pressure changes decreased, and visibility improved, indicating that the dust weather affecting Xining City was coming to an end.

According to the T-lnp diagram (emagram) of Xining City at 08:00 on the 16th (Figure 9), the whole atmospheric column over the area was dry, and two distinct inversion layers appeared between 700 hPa and 600 hPa, producing stable stratification; this indicates that conditions aloft were unfavorable for the diffusion of pollutants.

Comparison of PM10 Concentrations between Xining and Zhangye Cities
Zhangye City is located in the central Hexi Corridor. Figure 10 shows the changes in surface PM10 concentration in Xining and Zhangye cities during this dust weather. Given that Xining City was affected by dust transported along path b, Zhangye City was located upstream of Xining City.
The PM10 concentration in Zhangye City peaked at 2190 µg/m³ at 16:00 on the 13th, while in Xining City it peaked at 475 µg/m³ at 11:00 on the 14th, a lag of 19 h between the two peaks. At 13:00 on the 15th, dust appeared again in Zhangye City; this was the main phase of the dust weather. The PM10 concentration in Zhangye City reached 4975 µg/m³ at 00:00 on the 16th, and a peak of 1691 µg/m³ appeared in Xining City at 01:00 on the 17th, 25 h after the Zhangye peak. Compared with the Hexi Corridor, the Hehuang Valley has higher altitude and more tortuous terrain, and buildings obstruct the flow, so most of the dust diffused, settled, or was removed by backflow transport. Although PM10 concentrations in Xining City were much lower than those in Zhangye City, the AQI remained at level VI for many days, indicating severe air pollution.
Comparing PM10 concentrations between the two places and analyzing the time lag can improve the prediction of particulate pollution in Xining City during such dust weather. The analysis above shows that, in this type of weather, the onset of dust pollution in Xining City lags markedly behind that in Zhangye City. Therefore, when machine learning methods are used to predict PM10 concentration in Xining City, historical PM10 data from Zhangye City can be used as one of the features of the training and validation sets to exploit this lag.
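The lag can be quantified in two simple ways, sketched below under stated assumptions: as the time difference between the two concentration peaks, and, for regularly sampled hourly series without gaps, as the shift that maximizes the Pearson correlation between the upstream and downstream series. The peak timestamps and values are those quoted in the text; the short series passed to `best_lag` are synthetic toy data, and all function names are hypothetical.

```python
from datetime import datetime

# Lag estimation sketch (hypothetical helpers).


def peak_time(series):
    """Timestamp of the maximum value in a list of (datetime, value) pairs."""
    return max(series, key=lambda pair: pair[1])[0]


def peak_lag_hours(upstream, downstream):
    """Hours by which the downstream peak trails the upstream peak."""
    return (peak_time(downstream) - peak_time(upstream)).total_seconds() / 3600


def pearson(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


def best_lag(upstream, downstream, max_lag):
    """Shift (in time steps) at which downstream[t] best matches upstream[t - lag].

    Assumes equal-length, regularly sampled series without gaps.
    """
    scores = {lag: pearson(upstream[:len(upstream) - lag], downstream[lag:])
              for lag in range(max_lag + 1)}
    return max(scores, key=scores.get)


# Peak times and values quoted in the text (full hourly series in practice).
zhangye_first = [(datetime(2021, 3, 13, 16), 2190.0)]
xining_first = [(datetime(2021, 3, 14, 11), 475.0)]
zhangye_main = [(datetime(2021, 3, 16, 0), 4975.0)]
xining_main = [(datetime(2021, 3, 17, 1), 1691.0)]
print(peak_lag_hours(zhangye_first, xining_first))  # → 19.0
print(peak_lag_hours(zhangye_main, xining_main))    # → 25.0
```

The peak-to-peak difference depends on a single pair of hours, whereas the correlation-based lag uses the whole episode and is therefore less sensitive to one noisy hourly value.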

Eigenvalues Selection
Through the analysis of the dust weather in the third section, it was found that when the dust occurred in Xining City, the surface 24-h pressure change, 700 hPa relative humidity, and 700 hPa temperature almost changed synchronously with the dust; the wind speed and direction between 600 hPa and 700 hPa played a decisive role in the transmission and accumulation of pollutants; according to the time lag effect of dust transport, the historical PM10 concentrations of Zhangye City in the Hexi Corridor could be used as data to predict PM10 in Xining city. Therefore, we selected the surface 24-h pressure change (∆P24), 700 hPa relative humidity, 700 hPa temperature, 700 hPa u-component of wind, 700 hPa v-component of wind, and Zhangye City's PM10 concentration as the eigenvalues of machine learning methods to predict Xining PM10 concentration. In order to achieve the purpose of prediction, we used all eigenvalues of the first day to predict the PM10 concentration of the second day in Xining. It was more convenient that grid forecast products from ECMWF of the first day had the prediction results for the meteorological elements of the second day, which could be directly used. The machine learning eigenvalues from December 2020 to March 2021 and from January 2022 to April 2022 were selected in Table  3.
We constructed the machine learning models and data sets to predict the daily mean change of PM10 concentration in Xining City during the dust pollution from March 13 to The PM 10 concentration in Zhangye City reached the peak of 2190 µg/m 3 at 16:00 on the 13th, while the PM 10 concentration reached the peak of 475 µg/m 3 in Xining City at 11:00 on the 14th; the time differences between the two peaks were 19 h. At 13:00 on the 15th, the dust appeared again in Zhangye City, which was the main part of the dust weather. The PM 10 concentration was 4975 µg/m 3 at 00:00 on the 16th in Zhangye City; at 01:00 on the 17th, a peak of 1691 µg/m 3 appeared in Xining City, which was 25 h behind the peak of Zhangye City. Due to the high altitude and tortuous terrain with the obstruction of buildings in the Hehuang Valley compared with the Hexi Corridor, most of the dust diffused, settled, and was removed through backflow transportation. Although the PM 10 concentration in Xining City was much lower than that in Zhangye City, the AQI grades were still at level VI for many days, and the air quality was significantly polluted.
The above showed that in such weather, dust pollution in Xining City lagged markedly behind that in Zhangye City. Comparing the PM10 concentrations at the two sites and analyzing the time differences between their peaks could therefore improve the prediction of particulate pollution in Xining City during this dust weather. Hence, when the machine learning methods were used to predict the PM10 concentration in Xining City, the historical PM10 data of Zhangye City could be extracted as one of the eigenvalues of the training and verification sets to exploit this lead time.
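One way to quantify such a lag before deciding how far back to take Zhangye's PM10 is to scan candidate lags and keep the one with the highest correlation between the upstream and downstream series. This is a hedged sketch, not the procedure used in the study; `best_lag` is a hypothetical helper, and the lag is counted in whatever time step the series uses (hours for hourly data, days for daily means):

```python
import numpy as np


def best_lag(upstream, downstream, max_lag):
    """Lag in 0..max_lag (time steps) maximizing corr(upstream[t], downstream[t + lag])."""
    up = np.asarray(upstream, dtype=float)
    down = np.asarray(downstream, dtype=float)
    best, best_r = 0, -np.inf
    for lag in range(max_lag + 1):
        a = up[: len(up) - lag]      # upstream values at time t
        b = down[lag:]               # downstream values at time t + lag
        n = min(len(a), len(b))      # tolerate series of unequal length
        r = np.corrcoef(a[:n], b[:n])[0, 1]
        if r > best_r:
            best, best_r = lag, float(r)
    return best, best_r
```

For the daily eigenvalues used in this study, a lag of one step corresponds to using Zhangye's PM10 on the first day to predict Xining's PM10 on the second day.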

Eigenvalues Selection
Through the analysis of the dust weather in the third section, it was found that when dust occurred in Xining City, the surface 24-h pressure change, 700 hPa relative humidity, and 700 hPa temperature changed almost synchronously with the dust; the wind speed and direction between 600 hPa and 700 hPa played a decisive role in the transport and accumulation of pollutants; and, owing to the time lag of dust transport, the historical PM10 concentrations of Zhangye City in the Hexi Corridor could be used to predict PM10 in Xining City. Therefore, we selected the surface 24-h pressure change (ΔP24), 700 hPa relative humidity, 700 hPa temperature, 700 hPa u-component of wind, 700 hPa v-component of wind, and Zhangye City's PM10 concentration as the eigenvalues (input features) of the machine learning methods to predict the PM10 concentration in Xining City. For forecasting, we used the eigenvalues of the first day to predict the PM10 concentration in Xining on the second day. This setup was convenient because the ECMWF gridded forecast products issued on the first day already contained predictions of the meteorological elements for the second day, which could be used directly. The machine learning eigenvalues from December 2020 to March 2021 and from January 2022 to April 2022 are listed in Table 3. We constructed the machine learning models and data sets to predict the daily mean PM10 concentration in Xining City during the dust pollution from 13 to 23 March 2021. In addition, a time series regression model was constructed for comparison with the machine learning models.
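The day-ahead setup can be sketched as pairing each day's eigenvalues with Xining's PM10 on the following day. A minimal Python sketch; the key names in `FEATURES` and the row layout are hypothetical. Because the data come from two separate periods (December 2020 to March 2021 and January 2022 to April 2022), the sketch skips any pair of rows that are not consecutive calendar days, so that, for example, 31 March 2021 is never paired with 1 January 2022:

```python
from datetime import date, timedelta

# Hypothetical names for the six eigenvalues listed in the text.
FEATURES = ("dp24", "rh700", "t700", "u700", "v700", "zhangye_pm10")


def make_next_day_dataset(rows):
    """rows: dicts sorted by 'date', each holding the six features and 'xining_pm10'.

    Returns (X, y) where X[i] holds the features of day t and y[i] is the
    Xining PM10 concentration on day t + 1.
    """
    x, y = [], []
    for today, tomorrow in zip(rows, rows[1:]):
        if tomorrow["date"] - today["date"] != timedelta(days=1):
            continue  # gap between sampling periods: no valid next-day target
        x.append([today[name] for name in FEATURES])
        y.append(tomorrow["xining_pm10"])
    return x, y
```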

Methods Optimization
For the Random Forest model, tree depth (max_depth) and the number of trees (n_estimators) were the most important hyperparameters [53]; hence, we used the Bayesian optimization algorithm to tune these two hyperparameters. The core idea of Bayesian optimization is to combine prior knowledge with the evaluations made so far to approximate the posterior distribution of the unknown objective function, and then to select the next hyperparameter combination to sample according to this distribution [54]. We built a Bayesian optimizer and performed the optimization iterations using the Bayesian Optimization package for Python. The optimized max_depth was 23 and n_estimators was 1144.
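The Bayesian optimization loop described above can be sketched with a Gaussian-process surrogate and the expected-improvement acquisition function. This is a self-contained NumPy illustration, not the Bayesian Optimization package used in the study. `toy_objective` is hypothetical and stands in for a real score such as the negative validation Mean Absolute Error of a Random Forest trained with the given max_depth and n_estimators; the kernel length scale, search grid, and iteration counts are illustrative assumptions:

```python
import math

import numpy as np

_erf = np.vectorize(math.erf)


def rbf_kernel(a, b, length_scale=0.2):
    """Squared-exponential kernel between the rows of a (n, d) and b (m, d)."""
    sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-0.5 * sq_dist / length_scale**2)


def gp_posterior(x_obs, y_obs, x_query, jitter=1e-6):
    """Posterior mean and standard deviation of a zero-mean, unit-variance GP."""
    k = rbf_kernel(x_obs, x_obs) + jitter * np.eye(len(x_obs))
    k_star = rbf_kernel(x_obs, x_query)
    chol = np.linalg.cholesky(k)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_obs))
    v = np.linalg.solve(chol, k_star)
    var = np.clip(1.0 - (v**2).sum(axis=0), 1e-12, None)
    return k_star.T @ alpha, np.sqrt(var)


def expected_improvement(mean, std, best, xi=0.01):
    """Expected improvement over the best observed value (maximization)."""
    gap = mean - best - xi
    z = gap / std
    cdf = 0.5 * (1.0 + _erf(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return gap * cdf + std * pdf


def bayes_opt_grid(objective, depths, n_trees, n_init=5, n_iter=20, seed=0):
    """Maximize objective(max_depth, n_estimators) over a discrete grid."""
    grid = np.array([(d, n) for d in depths for n in n_trees], dtype=float)
    # Scale each hyperparameter to [0, 1] so one length scale fits both.
    unit = (grid - grid.min(axis=0)) / (grid.max(axis=0) - grid.min(axis=0))
    rng = np.random.default_rng(seed)
    tried = [int(i) for i in rng.choice(len(grid), size=n_init, replace=False)]
    scores = [objective(int(grid[i, 0]), int(grid[i, 1])) for i in tried]
    for _ in range(n_iter):
        y = np.array(scores)
        y_scaled = (y - y.mean()) / (y.std() if y.std() > 0 else 1.0)
        mean, std = gp_posterior(unit[tried], y_scaled, unit)
        ei = expected_improvement(mean, std, y_scaled.max())
        ei[tried] = -np.inf  # never re-evaluate a combination already tried
        nxt = int(np.argmax(ei))
        tried.append(nxt)
        scores.append(objective(int(grid[nxt, 0]), int(grid[nxt, 1])))
    best = int(np.argmax(scores))
    return int(grid[tried[best], 0]), int(grid[tried[best], 1]), scores[best], scores


def toy_objective(max_depth, n_estimators):
    """Hypothetical smooth score standing in for, e.g., -validation MAE."""
    return -((max_depth - 12) / 29) ** 2 - ((n_estimators - 600) / 1450) ** 2


depth, trees, score, history = bayes_opt_grid(
    toy_objective, range(1, 31), range(50, 1501, 50)
)
```

Each iteration refits the surrogate on all evaluations so far and samples where the expected improvement is largest, which is the prior-to-posterior update the paragraph describes.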
For Multiple Linear Regression, overfitting often affected the accuracy of the results. By the definition of the least squares method, adding independent variables to the model can only reduce (or leave unchanged) the fitting residual. However, with too many independent variables, the collinearity among them became high and the variance of the estimates grew, resulting in overfitting of the model. In this study, the independent variables were entered into the Multiple Linear Regression model in turn, and Equation V, which had the minimum Mean Absolute Error on the training set, was finally selected (Table 4). We also found that the PM10 concentration in Zhangye City was the most important eigenvalue. Y was the dependent variable (PM10 concentration in Xining City), and X1, X2, ..., X6 were the independent variables: X1 was the PM10 concentration in Zhangye City, X2 the u-component of wind at 700 hPa, X3 the v-component of wind at 700 hPa, X4 the temperature at 700 hPa, X5 the relative humidity at 700 hPa, and X6 the surface ΔP24.
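The variable-entry procedure can be sketched as fitting ordinary least squares models Y = b0 + b1X1, then Y = b0 + b1X1 + b2X2, and so on up to all six variables, keeping the model with the lowest training MAE. A minimal NumPy sketch, assuming the columns of `x` are already ordered in the entry sequence; the entry order and helper names are assumptions. Because least squares minimizes squared error rather than absolute error, the training MAE need not fall monotonically as variables are added, which is what lets this criterion discriminate between models; a held-out validation set is a common further safeguard against overfitting:

```python
import numpy as np


def fit_ols(x, y):
    """Least-squares coefficients [b0, b1, ..., bk] for y = b0 + b1*X1 + ... + bk*Xk."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, x):
    """Evaluate the fitted linear equation on the rows of x."""
    return coef[0] + x @ coef[1:]


def select_by_entry_order(x, y):
    """Fit models on the first 1, 2, ..., k columns (variables entered in turn)
    and return (n_vars, coef, train_mae) for the lowest training MAE."""
    candidates = []
    for k in range(1, x.shape[1] + 1):
        coef = fit_ols(x[:, :k], y)
        mae = float(np.mean(np.abs(y - predict(coef, x[:, :k]))))
        candidates.append((k, coef, mae))
    return min(candidates, key=lambda c: c[2])
```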

Prediction Results of PM10 Concentration in the Dust Weather
Six machine learning methods and a time series regression model (ARIMA) were used to establish the prediction models. The Index of Agreement (IA), correlation coefficient, Mean Absolute Error (MA), and Mean Absolute Percentage Error (MP) were used to evaluate the prediction performance, and the advantages and disadvantages of all methods were compared to select the best one.
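The text does not spell out the formulas for these scores. A minimal sketch under common definitions, assuming IA is Willmott's index of agreement, d = 1 - Σ(Pi - Oi)² / Σ(|Pi - Ō| + |Oi - Ō|)², where O are observations, P predictions, and Ō the observed mean; that the correlation coefficient is Pearson's r; and that MP is expressed in percent:

```python
import math


def mean_absolute_error(obs, pred):
    """MA: average absolute deviation between predictions and observations."""
    return sum(abs(p - o) for o, p in zip(obs, pred)) / len(obs)


def mean_absolute_percentage_error(obs, pred):
    """MP in percent; assumes no observed value is zero."""
    return 100.0 * sum(abs(p - o) / abs(o) for o, p in zip(obs, pred)) / len(obs)


def pearson_r(obs, pred):
    """Pearson correlation coefficient between observations and predictions."""
    n = len(obs)
    mo, mp = sum(obs) / n, sum(pred) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(obs, pred))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sp = math.sqrt(sum((p - mp) ** 2 for p in pred))
    return cov / (so * sp)


def index_of_agreement(obs, pred):
    """Willmott's d (assumed definition of IA)."""
    mo = sum(obs) / len(obs)
    num = sum((p - o) ** 2 for o, p in zip(obs, pred))
    den = sum((abs(p - mo) + abs(o - mo)) ** 2 for o, p in zip(obs, pred))
    return 1.0 - num / den
```

For obs = [1, 2, 3] and pred = [2, 3, 4], MA is 1, MP is about 61.1%, r is 1, and IA is 8/11 (about 0.73): a constant bias leaves the correlation untouched but lowers the Index of Agreement, which is how a model can show a moderate correlation coefficient together with a low IA.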
The evaluation results for the predicted and observed values are shown in Table 5. The Multiple Linear Regression algorithm best predicted the daily mean changes of PM10 concentration from the 13th to the 23rd, with the lowest Mean Absolute Error, an Index of Agreement as high as 0.83, and a correlation coefficient as high as 0.93. The Index of Agreement of RF and GBRT was lower than 0.4, although their correlation coefficients were higher than 0.5; the Mean Absolute Error of Random Forest was second only to that of Multiple Linear Regression. The Index of Agreement of the Support Vector Machine algorithm was the lowest, only 0.34, and its correlation coefficient was negative, indicating that its predicted values trended opposite to the observed values; its prediction ability in this dust pollution was therefore the worst. KNN, AdaBoost, and ARIMA also performed poorly in predicting PM10 concentration in the dust weather, and the Index of Agreement of ARIMA was only 0.34. This showed that, with the selected independent variables, Multiple Linear Regression predicted the PM10 concentration in Xining City better than the other models. Figure 11a shows the trend of PM10 concentration predicted by the Multiple Linear Regression algorithm during the dust weather in Xining City, which agreed well with the observed daily mean values. From the 13th to the 15th, before the dust arrived in Xining City, the observed PM10 concentrations were low; the predicted value on the 13th was basically consistent with the observed value, and although the predicted values on the 14th and 15th were higher than the observed values, the deviations were relatively small. During these three days, the prediction of Random Forest (Figure 11b) was better than that of Multiple Linear Regression because its predicted values were more consistent with the observed values.

On the 16th, large amounts of dust arrived in Xining City, and the observed PM10 concentration increased sharply to a daily mean of 959 µg/m³; the AQI grade reached level VI, and the air quality was severely polluted, with the PM10 concentration exceeding 420 µg/m³. The Multiple Linear Regression algorithm predicted the PM10 concentration on this day more effectively than the other methods. Although its predicted value of 596 µg/m³ was lower than the observed value, it also fell into AQI level VI (severe pollution), which showed that this method could make effective predictions for such a transition in the weather.
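The AQI grades quoted for PM10 can be reproduced from the 24-h PM10 breakpoints of China's Technical Regulation on Ambient Air Quality Index (HJ 633-2012), under which a daily mean above 420 µg/m³ gives a PM10 individual index above 300, i.e., level VI. A hedged sketch: it computes only the PM10 individual index (the full AQI is the maximum over all pollutants), uses linear interpolation without the standard's rounding, and, as a simplifying assumption, caps concentrations at or above 600 µg/m³ at an index of 500:

```python
# HJ 633-2012 breakpoints for the 24-h mean PM10 concentration (µg/m³)
# and the corresponding individual AQI (IAQI) values.
PM10_BREAKPOINTS = [0, 50, 150, 250, 350, 420, 500, 600]
IAQI_BREAKPOINTS = [0, 50, 100, 150, 200, 300, 400, 500]
GRADE_UPPER_LIMITS = [(50, "I"), (100, "II"), (150, "III"), (200, "IV"), (300, "V")]


def pm10_iaqi(conc):
    """PM10 individual AQI by linear interpolation between breakpoints."""
    if conc >= PM10_BREAKPOINTS[-1]:
        return float(IAQI_BREAKPOINTS[-1])  # sketch choice: cap at the top of the table
    for i in range(1, len(PM10_BREAKPOINTS)):
        if conc <= PM10_BREAKPOINTS[i]:
            c_lo, c_hi = PM10_BREAKPOINTS[i - 1], PM10_BREAKPOINTS[i]
            i_lo, i_hi = IAQI_BREAKPOINTS[i - 1], IAQI_BREAKPOINTS[i]
            return i_lo + (i_hi - i_lo) * (conc - c_lo) / (c_hi - c_lo)


def aqi_grade(index):
    """Map an index value to the AQI grade (I to VI)."""
    for upper, grade in GRADE_UPPER_LIMITS:
        if index <= upper:
            return grade
    return "VI"


for conc in (959, 596, 594):  # daily means quoted in the text
    print(conc, aqi_grade(pm10_iaqi(conc)))
```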
On the 17th, the observed daily mean PM10 concentration reached its peak, and the value predicted by Multiple Linear Regression also exceeded 1000 µg/m³. The observed and predicted values corresponded closely, showing that this method performed satisfactorily in predicting the maximum value, whereas the other five machine learning methods predicted neither the turning point nor the pollution peak. From the 18th to the 21st, the predicted values decreased day by day, consistent with the observed values, and the downward trend was clear. The observed value on the 18th was 1276 µg/m³; although the predicted value had dropped to 594 µg/m³, a large deviation, its AQI grade was still level VI.
From the 16th to the 21st, the prediction of Random Forest was far weaker than that of Multiple Linear Regression, and its predicted values were far lower than the observed values. Random Forest has strong generalization ability and stability and is relatively insensitive to outliers and missing features, but its performance on multidimensional sparse data is not ideal. Moreover, each tree predicts the average of the training samples in a leaf, so the forest cannot produce values beyond the range of the training targets. In this sudden dust weather, extreme values appeared on a large scale without counterparts in the training set, so the Random Forest was unable to predict the maximum PM10.
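This limitation of tree ensembles can be seen directly: every tree predicts a mean of training targets, so the ensemble output is bounded by the training range, whereas a linear model extrapolates. A minimal NumPy sketch on hypothetical one-feature toy data (not the study's data), using bagged depth-limited regression trees, which is what a Random Forest reduces to when there is only one feature; `fit_tree`, `fit_forest`, and the other helpers are illustrative names:

```python
import numpy as np


def fit_tree(x, y, depth, min_leaf=2):
    """Depth-limited 1-D regression tree: a leaf mean or (threshold, left, right)."""
    if depth == 0 or len(y) < 2 * min_leaf:
        return float(np.mean(y))
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best = None
    for i in range(min_leaf, len(ys) - min_leaf + 1):
        if xs[i] == xs[i - 1]:
            continue  # cannot split between identical feature values
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, 0.5 * (xs[i - 1] + xs[i]), i)
    if best is None:
        return float(np.mean(y))
    _, threshold, i = best
    return (threshold,
            fit_tree(xs[:i], ys[:i], depth - 1, min_leaf),
            fit_tree(xs[i:], ys[i:], depth - 1, min_leaf))


def predict_tree(node, value):
    """Walk down the tree until a leaf mean is reached."""
    while isinstance(node, tuple):
        threshold, left, right = node
        node = left if value <= threshold else right
    return node


def fit_forest(x, y, n_trees=25, depth=4, seed=0):
    """Bagging: each tree is grown on a bootstrap resample of the training data."""
    rng = np.random.default_rng(seed)
    forest = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(x), size=len(x))
        forest.append(fit_tree(x[idx], y[idx], depth))
    return forest


def predict_forest(forest, value):
    """Average the predictions of all trees."""
    return float(np.mean([predict_tree(tree, value) for tree in forest]))


x_train = np.linspace(0.0, 10.0, 50)
y_train = 20.0 * x_train + 50.0           # toy target, maximum 250
forest = fit_forest(x_train, y_train)
slope, intercept = np.polyfit(x_train, y_train, 1)
x_new = 30.0                               # far beyond the training range
print(predict_forest(forest, x_new), slope * x_new + intercept)
```

The forest's answer at x = 30 stays at or below 250, while the linear fit returns about 650, mirroring how Multiple Linear Regression could follow the dust peak when the Random Forest could not.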
From the 22nd to the 23rd, the values predicted by both methods were again consistent with the observed values, with little deviation. In addition, the AQI grades of the predicted values were at level II, and the air quality had improved significantly.
Overall, the Multiple Linear Regression algorithm not only predicted the PM10 concentration in a timely and effective manner but also anticipated well the transition period of particle concentration and the date on which the maximum appeared in such dust weather.

Prediction Results of PM10 Concentration in Clean Period
For comparison with the dust weather, we selected a clean period without dust, from 16 to 24 April 2022, and used Multiple Linear Regression and Random Forest to predict the PM10 concentration again. As shown in Table 6, the Mean Absolute Errors (11 and 14) and Mean Absolute Percentage Errors (16% and 20%) of the two methods during the clean period were much lower than those in the dust period, and the Index of Agreement and correlation coefficient were higher. In addition, when the PM10 of Zhangye City was removed from the eigenvalues, the Mean Absolute Errors of Random Forest and Multiple Linear Regression increased to 17 and 23, respectively, indicating that even during the clean period, the PM10 of Zhangye City still played an important role in predicting the PM10 concentration of Xining City. Random Forest had a higher Index of Agreement and a lower Mean Absolute Error and Mean Absolute Percentage Error than Multiple Linear Regression, and its predicted values were more consistent with the observed values during the clean period (Figure 12).
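The feature-removal comparison above amounts to refitting a model with and without one eigenvalue and comparing the error on the evaluation days. A minimal NumPy sketch using ordinary least squares for brevity (the same loop applies to a Random Forest); the helper names and the train/test split are assumptions, and column index 0 is assumed to hold Zhangye's PM10:

```python
import numpy as np


def ols_test_mae(x_train, y_train, x_test, y_test):
    """Fit OLS with an intercept on the training rows; return MAE on the test rows."""
    design = np.column_stack([np.ones(len(x_train)), x_train])
    coef, *_ = np.linalg.lstsq(design, y_train, rcond=None)
    pred = coef[0] + x_test @ coef[1:]
    return float(np.mean(np.abs(y_test - pred)))


def drop_feature_effect(x_train, y_train, x_test, y_test, col):
    """Test MAE with all eigenvalues and with eigenvalue `col` removed."""
    with_all = ols_test_mae(x_train, y_train, x_test, y_test)
    without = ols_test_mae(np.delete(x_train, col, axis=1), y_train,
                           np.delete(x_test, col, axis=1), y_test)
    return with_all, without
```

A clear rise in the second value relative to the first, as reported for Zhangye's PM10, indicates that the removed eigenvalue carries information the others do not.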


Conclusions and Discussion
For the dust weather that affected the northern Tibetan Plateau from 14 to 20 March 2021, we analyzed the main development mechanisms and explored the leading factors. Using ECMWF prediction products and historical PM10 observation data with machine learning methods, we predicted the trend of PM10 concentration in Xining City during this dust weather.
(1) The main mechanisms influencing the dust were as follows: the 24-h pressure change was positive when the front intruded at the surface; a trough at 700 hPa led to the convergence of vector winds with a sudden drop in temperature and humidity; and a "two troughs and one ridge" pattern appeared at 500 hPa, with strong cold advection behind the trough and cyclonic vorticity formed in the east of Inner Mongolia; (2) The air mass trajectory from the Hexi Corridor was the main path influencing Xining City in this case, as shown by the significant lag of the PM10 peak in Xining City relative to Zhangye City; (3) Multiple Linear Regression not only predicted the PM10 concentration in a timely and effective manner but also anticipated well the transition period of particle concentration and the date on which the maximum appeared in such dust weather;