Research on Monthly Precipitation Prediction Based on the Least Square Support Vector Machine with Multi-Factor Integration

Accurate precipitation prediction is of great significance for regional flood control and disaster mitigation. This study introduced a prediction model based on the least square support vector machine (LSSVM) optimized by the genetic algorithm (GA). The model was used to estimate precipitation at each meteorological station in the source region of the Yellow River (SRYE), China, for 12 months. Ensemble empirical mode decomposition (EEMD) was used to select meteorological factors, and precipitation was predicted without depending on historical data as a training set. The prediction results of the models were compared using the coefficient of determination (R²), mean absolute error (MAE), and root mean square error (RMSE). The results show that sea surface temperature (SST) in the Niño 1+2 region exerts the largest influence on the accuracy of the precipitation prediction model for the SRYE (R² = 0.856, RMSE = 19.648, MAE = 14.363 for the SST-based model). It is followed by the potential energy of gravity waves (Ep) and temperature (T), which have similar effects on precipitation prediction. The prediction accuracy is sensitive to altitude, and accurate predictions are more easily obtained at high altitudes. This model provides a new and reliable method for precipitation prediction in regions without historical data.


Introduction
The source region of the Yellow River (SRYE), located in the northeast of the Qinghai-Tibet Plateau, China, is a key region closely related to adverse weather events in eastern, southern, and northern China. The SRYE contributes a considerable amount of water to the lower reaches of the Yellow River, with an average annual water supply of 2 × 10¹⁰ m³. The SRYE lies on the edge of the East Asian monsoon, and the inhomogeneous temporal and spatial distribution of its precipitation has caused frequent droughts and floods in the middle and lower reaches of the Yellow River. Accurate precipitation prediction is therefore of great significance for flood control and disaster mitigation. However, precipitation is highly nonlinear, random, and complex, and is affected by environmental factors such as terrain, air pressure, and circulation, which lowers the accuracy of precipitation prediction. In recent years, with the rapid development of artificial intelligence technology, artificial intelligence methods have greatly improved the accuracy of precipitation forecasts. The main task of this research was to predict precipitation at 18 ground-based meteorological stations in the SRYE and to quantify the influences of different driving forces of precipitation.
In recent years, data-driven methods based on artificial intelligence have been widely used to estimate and predict global precipitation. The support vector machine (SVM), based on statistical learning theory and kernel-function mapping, is a typical machine-learning algorithm for small samples. It has been widely applied in hydrology and water resources, for example, in runoff prediction and water quality evaluation [1,2]. However, these models need large amounts of accurate data for parameter calibration, and it is difficult for them to quantify the rich spatiotemporal structure of precipitation with long-term dependence. In fact, sufficient and accurate data are hard to obtain in uninvestigated regions, which leads to poor model performance and high uncertainty [3]. Other relatively mature methods, such as least square support vector machine regression (LSSVR), are often used in hydrology and climate research [4][5][6][7]. They perform excellently in modeling complex processes and cope well with excessive noise in short data sets [8,9]. These advantages make data-driven methods suitable for modeling hydrological and climatic processes. Many studies have reported successful applications of data-driven methods to precipitation, rainfall-runoff, runoff, evapotranspiration, water quality, and drought, revealing their capacity to model complex processes [10][11][12][13][14][15][16][17]. However, such methods usually do not involve the physical mechanisms and laws behind the data, which makes it difficult to analyze hydrological and climatic processes in depth [18]. To date, many studies have compared the performance of different types of data-driven methods in modeling and forecasting hydrometeorological variables [19,20].
Atmospheric circulation has a significant impact on precipitation, and persistent circulation systems cause long-term precipitation events [21]. Previous research shows that increasing carbon dioxide concentration and atmospheric circulation affect the spatial pattern of precipitation over the tropical Pacific [22]. Climate model projections for the 21st century show that the tropical circulation slows down [23], while global average rainfall increases by about 2-3% for every degree of surface warming. Studies demonstrate that the North Atlantic Oscillation (NAO) index and the Western Pacific Oscillation (WPO) index are negatively correlated with average precipitation over Siberia and Central Asia [24]. Many numerical models and statistical regression analyses have demonstrated the topographic effect on precipitation, and its influence on the distribution of precipitation has been studied through geographical and topographical factors [25][26][27][28][29][30][31]. Mountain ranges are the main cause of atmospheric gravity waves propagating to high altitudes. Waves generated by convection are further mixed in the lower troposphere, which helps transport water vapor to the upper layers over the Qinghai-Tibet Plateau. In particular, at altitudes of 17-24 km, gravity waves are mostly related to mountain waves [32]. The momentum transfer of topographic gravity waves is of great significance for atmospheric change and circulation over the Qinghai-Tibet Plateau: different peak values produce vertically propagating waves with different wavelengths, and the potential energy of gravity waves depends mainly on the wind component. To quantify the possible impacts of topography on precipitation, the potential energy of gravity waves is extracted as a significant factor and integrated into the precipitation prediction model to explore its effects on the prediction results.
Studies of precipitation prediction rarely address the necessary and sufficient conditions for precipitation. Therefore, precipitation prediction over the SRYE needs further quantitative understanding. With this in mind, this study determined the factors driving precipitation based on long-term precipitation data (1960-2016) from 18 meteorological stations in the study area. After normalizing the samples, the genetic algorithm (GA) was used to optimize the regularization parameter γ and the kernel parameter σ of the least square support vector machine (LSSVM), establishing an LSSVM model that can accurately simulate precipitation. An overview of the study area and the main research methods is given in Section 2. Section 3 presents the precipitation prediction results. Sections 4 and 5 present the discussion and conclusions, respectively.

Study Region and Data Collection
The SRYE, generally defined as the Yellow River basin upstream of the Longyangxia Reservoir (Qinghai Province, China), is located in the northeast of the Qinghai-Tibet Plateau (32.5°-36.5° N, 95°-103.5° E). It covers 6 prefectures and 18 counties in Qinghai, Sichuan, and Gansu Provinces, contains 18 meteorological stations (Figure 1), and has a drainage area of 121,972 km². The region has a typical plateau continental climate, with alternating cold and hot seasons, distinct dry and wet seasons, long sunshine duration, and strong radiation. The annual average temperature is −4.01 °C, and annual precipitation ranges from 350 to 750 mm, decreasing from southeast to northwest. Under the influence of monsoons, precipitation is concentrated in summer and autumn (June to September), when southwest airflow from the Indian Ocean and warm, moist airflow from the Western Pacific are transported to the SRYE, producing stable precipitation. Precipitation is distributed non-uniformly in time and space and varies greatly from year to year. The terrain is high in the west and low in the east, with an average altitude of 4500 m. The highest point is 6282 m (the Animaqing Mountain, northwest of Maqin County, Guoluo Prefecture, Qinghai Province, China), and the lowest is 2572 m (at the outlet of the Longyangxia Reservoir).
The meteorological data used in this study include (1) monthly precipitation and monthly average temperature from 18 meteorological stations in the SRYE over 57 years (1960-2016). The data are from the China Meteorological Administration website (http://data.cma.cn/, accessed on 5 July 2021), and quality controls, including missing-value inspections and extreme-value tests, were strictly implemented. Previous studies show that the spatial distribution and number of meteorological stations determine the accuracy of a prediction model.
When there are too few meteorological stations, model performance is severely weakened, while adding more stations does not improve simulation accuracy indefinitely [33]. Therefore, this study removed data from stations with large deviations and selected as many stations distributed around the SRYE as possible to represent its overall precipitation. Precipitation at the 18 stations in the SRYE from 1960 to 2016 is closely related to station altitude: overall, precipitation first increases and then decreases as altitude increases.

Ensemble Empirical Mode Decomposition
Ensemble empirical mode decomposition (EEMD) is a time-series signal processing method proposed by Wu and Huang to overcome the shortcomings of empirical mode decomposition (EMD) [34]. We therefore used it to extract the in-depth characteristics of the precipitation series. The EEMD procedure is as follows:
(1) A white noise series β_i(t) following a normal distribution is added to the original signal x(t):
$$x_i(t) = x(t) + \beta_i(t), \quad i = 1, 2, \ldots, N$$
where x_i(t) is the signal after white noise is added for the i-th time and N is the number of ensemble trials.
(2) Decomposing x_i(t) with EMD yields the intrinsic mode functions (IMFs) IMF_ij(t), j = 1, ..., J, and the trend component Res_i(t):
$$x_i(t) = \sum_{j=1}^{J} \mathrm{IMF}_{ij}(t) + \mathrm{Res}_i(t)$$
(3) The IMFs and trend components obtained from each decomposition are ensemble-averaged so that the added white noise cancels out, giving the IMFs IMF_j(t) and the trend component Res(t) extracted by EEMD:
$$\mathrm{IMF}_j(t) = \frac{1}{N}\sum_{i=1}^{N} \mathrm{IMF}_{ij}(t), \qquad \mathrm{Res}(t) = \frac{1}{N}\sum_{i=1}^{N} \mathrm{Res}_i(t)$$
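The three steps above can be illustrated with the following minimal NumPy sketch. It is not the implementation used in this study, and it simplifies standard EMD in ways we state as assumptions: the upper and lower envelopes are built by linear interpolation between extrema (`np.interp`) instead of cubic splines, each IMF is sifted a fixed number of times (`n_sift = 10`, a hypothetical setting; the study reports a maximum of 200 sifting iterations) instead of until a stopping criterion is met, and `n_trials` (the ensemble size N) is a hypothetical default. The noise amplitude of 0.2 standard deviations and the eight IMFs follow the settings given in Step 1 of the prediction scheme. All function names are illustrative.

```python
import numpy as np


def local_extrema(x):
    """Return indices of interior local maxima and minima of a 1-D series."""
    d = np.diff(x)
    maxima = np.where((d[:-1] > 0) & (d[1:] <= 0))[0] + 1
    minima = np.where((d[:-1] < 0) & (d[1:] >= 0))[0] + 1
    return maxima, minima


def sift_once(h, t):
    """One sifting step: subtract the mean of the upper and lower envelopes.

    Envelopes use linear interpolation (a simplification of cubic splines).
    Returns None when there are too few extrema to build both envelopes.
    """
    maxima, minima = local_extrema(h)
    if len(maxima) < 2 or len(minima) < 2:
        return None
    upper = np.interp(t, maxima, h[maxima])
    lower = np.interp(t, minima, h[minima])
    return h - 0.5 * (upper + lower)


def emd(x, n_imfs, n_sift=10):
    """Simplified EMD: x = sum(imfs) + residue; unused IMF rows stay at zero."""
    residue = np.asarray(x, dtype=float).copy()
    t = np.arange(residue.size)
    imfs = np.zeros((n_imfs, residue.size))
    for j in range(n_imfs):
        h = residue.copy()
        for _ in range(n_sift):
            h_next = sift_once(h, t)
            if h_next is None:  # residue is close to monotonic: stop decomposing
                return imfs, residue
            h = h_next
        imfs[j] = h
        residue = residue - h
    return imfs, residue


def eemd(x, n_imfs=8, n_trials=100, noise_ratio=0.2, seed=0):
    """EEMD: average the IMFs and residues of many noise-added EMD runs."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    sigma = noise_ratio * x.std()  # white-noise amplitude: 0.2 x sample std
    imf_sum = np.zeros((n_imfs, x.size))
    res_sum = np.zeros(x.size)
    for _ in range(n_trials):
        x_i = x + rng.normal(0.0, sigma, x.size)  # step (1): x_i(t) = x(t) + beta_i(t)
        imfs_i, res_i = emd(x_i, n_imfs)          # step (2): EMD of x_i(t)
        imf_sum += imfs_i
        res_sum += res_i
    return imf_sum / n_trials, res_sum / n_trials  # step (3): ensemble averages
```

Because every EMD run satisfies x_i(t) = Σ_j IMF_ij(t) + Res_i(t) exactly, the ensemble averages reconstruct x(t) up to the mean of the added noise, whose spread shrinks as N grows.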

Extraction of Potential Energy of Gravity Waves
According to linear gravity-wave theory, when only temperature is measured, the activity intensity of gravity waves can be characterized by their potential energy [35]. The background temperature profile and the temperature disturbance profile can be separated from the COSMIC Level 2 dry temperature profiles, and on this basis the potential energy of gravity waves in the SRYE is calculated. Each monthly profile in the SRYE is vertically interpolated to a resolution of 200 m over 10-50 km, and profiles with temperatures outside [−100, +10] °C are eliminated. The data are then screened with the 3-sigma (3σ) criterion, and the processed temperature profiles are averaged and smoothed with a moving average to obtain the background temperature profile T_B of the SRYE. The temperature disturbance profile T′ is obtained by subtracting the background profile from the original temperature profile and is detrended by quadratic fitting. A sixth-order Butterworth band-pass filter with a pass band of 2-10 km removes waves other than gravity waves [36]. The square of the buoyancy frequency and the potential energy of gravity waves are then calculated with Formulas (4) and (5), respectively, by substituting T′ and T_B [37].
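For reference, linear gravity-wave theory conventionally gives the squared buoyancy frequency and the potential energy per unit mass as shown below; we assume, without confirmation from the text, that Formulas (4) and (5) take this standard form. Here g is the gravitational acceleration, c_p the specific heat of dry air at constant pressure, z the altitude, T_B the background temperature (in K), and T′ the temperature disturbance:
$$N^2 = \frac{g}{T_B}\left(\frac{dT_B}{dz} + \frac{g}{c_p}\right)$$
$$E_p = \frac{1}{2}\left(\frac{g}{N}\right)^2\left(\frac{T'}{T_B}\right)^2$$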

LSSVM Optimized by GA
The SVM, created by Vapnik [38], avoids the problems of traditional methods, which may become trapped in local minima and require trial and error based on experience; it has been applied in many fields. The LSSVM inherits the basic idea of the SVM but replaces the inequality constraints of the traditional SVM with equality constraints and uses the squared error as the loss function. It not only avoids the difficulty of determining the number of hidden-layer nodes but also offers high accuracy and fast computation.
Atmosphere 2021, 12, 1076
For a training set D = {(x_k, y_k) | k = 1, 2, ..., n}, x_k ∈ R^d, y_k ∈ R, the optimal decision function is constructed as follows [39]:
$$y(x) = \omega^{T}\varphi(x) + b$$
where ω, φ(x), and b denote the weight vector, the nonlinear mapping function, and the bias, respectively. Using the structural risk minimization (SRM) criterion, this is transformed into the following regression and optimization problem:
$$\min_{\omega, b, e} J(\omega, e) = \frac{1}{2}\omega^{T}\omega + \frac{\gamma}{2}\sum_{k=1}^{n} e_k^{2} \quad \text{s.t.} \quad y_k = \omega^{T}\varphi(x_k) + b + e_k, \; k = 1, \ldots, n$$
where J, γ, and e_k denote the loss function, the regularization parameter, and the error of the k-th sample, respectively. Introducing the Lagrange multipliers α_k from optimization theory gives the Lagrange function
$$L(\omega, b, e, \alpha) = J(\omega, e) - \sum_{k=1}^{n}\alpha_k\left[\omega^{T}\varphi(x_k) + b + e_k - y_k\right]$$
Setting each partial derivative to zero yields the optimality conditions
$$\frac{\partial L}{\partial \omega} = 0 \Rightarrow \omega = \sum_{k=1}^{n}\alpha_k\varphi(x_k), \quad \frac{\partial L}{\partial b} = 0 \Rightarrow \sum_{k=1}^{n}\alpha_k = 0, \quad \frac{\partial L}{\partial e_k} = 0 \Rightarrow \alpha_k = \gamma e_k, \quad \frac{\partial L}{\partial \alpha_k} = 0 \Rightarrow \omega^{T}\varphi(x_k) + b + e_k - y_k = 0$$
Eliminating e_k and ω gives the linear system
$$\begin{bmatrix} 0 & e_L^{T} \\ e_L & \Omega + \gamma^{-1} I \end{bmatrix}\begin{bmatrix} b \\ \alpha \end{bmatrix} = \begin{bmatrix} 0 \\ y \end{bmatrix}$$
where y = [y_1, ..., y_n]^T, e_L = [1, ..., 1]^T, α = [α_1, ..., α_n]^T, Ω_kl = φ(x_k)^T φ(x_l), and I is the identity matrix. Defining the kernel function K(x_k, x_l) = φ(x_k)^T φ(x_l), α and b are derived according to Formula (14), and the LSSVM regression function is
$$y(x) = \sum_{k=1}^{n}\alpha_k K(x, x_k) + b$$
GA searches for the optimal solution by simulating the natural evolution process [40]. This study used GA to optimize the regularization parameter γ and the kernel parameter σ of the LSSVM, building the GA-LSSVM model. The widely used radial basis function (RBF) was selected as the kernel function. The maximum number of generations, the population size, and the range of the crossover probability were 200, 20, and 0.7-0.9, respectively.
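The linear system and regression function of the LSSVM can be turned into a short NumPy sketch. It assumes the RBF kernel in the common form K(x, x_k) = exp(−‖x − x_k‖²/(2σ²)) (some implementations omit the factor 2), and the function names `rbf_kernel`, `lssvm_fit`, and `lssvm_predict` are illustrative rather than taken from the study.

```python
import numpy as np


def rbf_kernel(a, b, sigma):
    """RBF kernel matrix K[i, j] = exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq_dist = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq_dist, 0.0) / (2.0 * sigma**2))


def lssvm_fit(X, y, gamma, sigma):
    """Solve [[0, 1^T], [1, Omega + I/gamma]] [b; alpha] = [0; y] for b and alpha."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0                                            # e_L^T
    A[1:, 0] = 1.0                                            # e_L
    A[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma   # Omega + I/gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y)))
    return sol[0], sol[1:]                                    # bias b, multipliers alpha


def lssvm_predict(X_train, alpha, b, sigma, X_new):
    """Regression function y(x) = sum_k alpha_k K(x, x_k) + b."""
    K = rbf_kernel(np.asarray(X_new, dtype=float), np.asarray(X_train, dtype=float), sigma)
    return K @ alpha + b
```

A useful check on the solution is that the training residuals equal α_k/γ, which follows from the optimality condition α_k = γe_k, and that the multipliers sum to zero.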

Establishment of the Prediction Model for Precipitation
This study built a basic data model based on the 2008 monthly precipitation data and the geographic data of the 18 meteorological stations in the SRYE. To carry out the simulations under the same standard, each meteorological station in turn was used as the test set and the other 17 stations as the training set. On this basis, meteorological factors were integrated to examine the effects of the driving factors on the prediction results at each station. Repeated calculations showed that the results did not change with the order of the training set. Establishing the LSSVM model of the relationship between precipitation and each factor requires appropriate values of the parameters γ and σ, which are difficult to select directly. This study therefore adopted GA for optimization, with the following steps:
1. γ and σ are randomly generated.
2. The LSSVM model is trained with the normalized training samples, and the fitness function is used as the objective function of the GA.
3. The samples are trained and verified separately, and the global optimal solution is searched for iteratively and output.
4. The LSSVM model is constructed with the global optimal solution (γ, σ) found by the search.
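The four GA steps above can be sketched as a small real-coded GA over (log10 γ, log10 σ). This is an illustration under assumptions, not the study's implementation: the fitness is taken to be the validation RMSE of an LSSVM (the text does not define the fitness function), and the search bounds, binary tournament selection, arithmetic crossover, Gaussian mutation rate `pm`, and elitism are hypothetical choices. The population size (20), maximum number of generations (200), and crossover-probability range (0.7-0.9) follow the settings stated for the GA-LSSVM model; drawing one crossover probability per generation from that range is our assumption. All function names are illustrative.

```python
import numpy as np


def rbf_kernel(a, b, sigma):
    """RBF kernel matrix exp(-||a_i - b_j||^2 / (2 sigma^2))."""
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def lssvm_fit_predict(X_tr, y_tr, X_te, gamma, sigma):
    """Train an LSSVM on (X_tr, y_tr) by solving its linear system, then predict at X_te."""
    n = len(y_tr)
    A = np.ones((n + 1, n + 1))
    A[0, 0] = 0.0
    A[1:, 1:] = rbf_kernel(X_tr, X_tr, sigma) + np.eye(n) / gamma
    sol = np.linalg.solve(A, np.concatenate(([0.0], y_tr)))
    return rbf_kernel(X_te, X_tr, sigma) @ sol[1:] + sol[0]


def fitness(params, X_tr, y_tr, X_val, y_val):
    """Hypothetical fitness: validation RMSE for params = (log10 gamma, log10 sigma)."""
    gamma, sigma = 10.0 ** params
    pred = lssvm_fit_predict(X_tr, y_tr, X_val, gamma, sigma)
    return float(np.sqrt(np.mean((pred - y_val) ** 2)))


def ga_optimize(X_tr, y_tr, X_val, y_val, pop_size=20, n_gen=200,
                bounds=((-1.0, 3.0), (-2.0, 1.0)), pc_range=(0.7, 0.9),
                pm=0.1, seed=0):
    """Real-coded GA over (log10 gamma, log10 sigma); returns best (gamma, sigma) and history."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds).T

    def evaluate(population):
        return np.array([fitness(p, X_tr, y_tr, X_val, y_val) for p in population])

    pop = rng.uniform(lo, hi, size=(pop_size, 2))  # step 1: random (gamma, sigma)
    fit = evaluate(pop)                            # steps 2-3: train and verify
    history = [fit.min()]
    for _ in range(n_gen):
        elite = pop[fit.argmin()].copy()
        # Binary tournament selection: keep the fitter of two random individuals.
        i, j = rng.integers(0, pop_size, size=(2, pop_size))
        parents = np.where((fit[i] < fit[j])[:, None], pop[i], pop[j])
        # Arithmetic crossover of consecutive parent pairs with probability pc.
        pc = rng.uniform(*pc_range)
        children = parents.copy()
        for k in range(0, pop_size - 1, 2):
            if rng.random() < pc:
                w = rng.random()
                children[k] = w * parents[k] + (1.0 - w) * parents[k + 1]
                children[k + 1] = (1.0 - w) * parents[k] + w * parents[k + 1]
        # Gaussian mutation, clipped to the search bounds.
        mutate = rng.random(children.shape) < pm
        noise = rng.normal(0.0, 0.1 * (hi - lo), size=children.shape)
        children = np.clip(children + mutate * noise, lo, hi)
        children[0] = elite                        # elitism: keep the best individual
        pop, fit = children, evaluate(children)
        history.append(fit.min())
    gamma, sigma = 10.0 ** pop[fit.argmin()]      # step 4: optimal (gamma, sigma)
    return (gamma, sigma), history
```

Because the best individual is carried over unchanged and the fitness is deterministic, the best fitness in this sketch never increases from one generation to the next.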
The prediction scheme for precipitation integrating meteorological factors is summarized in Figure 2.
Step 1: the monthly precipitation data from the 18 stations (1960-2016) were averaged. In the EEMD model, the amplitude of the white noise was set to 0.2 times the standard deviation of the sample data, and the maximum number of sifting iterations was set to 200. EEMD of the data yielded eight IMF components (IMF1-IMF8) and one residual term, r.
Step 2: the time-delayed correlation coefficients between 20 meteorological factors (1960-2016) and the monthly precipitation series and its decomposed series were calculated, and the two factors with the largest correlation coefficients were selected as the key factors affecting precipitation.
Step 3: the potential energy of gravity waves extracted from COSMIC Level 2 dry temperature profiles (2006-2014) was taken as a significant topographic factor influencing precipitation.
Step 4: the LSSVM model was built and its optimal parameters were searched for with GA. The two selected key factors and the potential energy of gravity waves were taken as the input variables of the model, and precipitation as the output variable.
Step 5: to reveal the influences of each factor on precipitation, the precipitation from each meteorological station was predicted by taking 17 meteorological stations as a training set and one meteorological station as a test set.
Step 6: the accuracy of the model was evaluated by the mean absolute error (MAE), root mean square error (RMSE), and R 2 , and the driving factors for precipitation were analyzed.
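Steps 5 and 6 can be sketched as a leave-one-station-out loop with the three evaluation metrics. This is a minimal sketch under assumptions: R² is computed as 1 − SS_res/SS_tot (the study may instead use the squared Pearson correlation), and `fit_predict` is a hypothetical placeholder for any regression routine, such as a GA-tuned LSSVM; the function names are illustrative.

```python
import numpy as np


def evaluate(y_obs, y_pred):
    """R^2 (coefficient of determination), MAE and RMSE for one test station."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_obs
    ss_res = np.sum(err**2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return {"R2": 1.0 - ss_res / ss_tot,
            "MAE": np.mean(np.abs(err)),
            "RMSE": np.sqrt(np.mean(err**2))}


def leave_one_station_out(X_by_station, y_by_station, fit_predict):
    """Hold out each station in turn, train on the others, and average the metrics.

    fit_predict(X_train, y_train, X_test) -> y_pred is any regression routine.
    """
    scores = []
    for s in range(len(X_by_station)):
        X_tr = np.concatenate([X for i, X in enumerate(X_by_station) if i != s])
        y_tr = np.concatenate([y for i, y in enumerate(y_by_station) if i != s])
        y_hat = fit_predict(X_tr, y_tr, X_by_station[s])
        scores.append(evaluate(y_by_station[s], y_hat))
    # Station-averaged metrics, taken as the accuracy over the whole region.
    mean_scores = {k: float(np.mean([sc[k] for sc in scores])) for k in scores[0]}
    return scores, mean_scores
```

With 18 stations, each call to `fit_predict` trains on 17 stations and tests on the remaining one.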

Analysis on Monthly Precipitation Series in Many Years Based on EEMD
The average monthly precipitation series for 1960-2016 and the measured precipitation at the 18 meteorological stations in the SRYE were decomposed step by step with EEMD, yielding eight IMFs and one trend component (residue). To explore the average oscillation period of the subseries at different time scales in the monthly precipitation series, the average period of each IMF component was obtained by dividing the length of the series by the number of its extreme points. As shown in Figure 3, at the monthly scale, precipitation over the SRYE shows a quasi-four-month climate variability (IMF1). IMF2-IMF6 show longer average periods of about 12-134 months, reflecting the interannual variation of precipitation. The medians of the average periods of IMF7 and IMF8 are about 21 and 49 years, respectively, reflecting the interdecadal variation of precipitation. These IMFs include the periodic changes of the external forcing of the climate system as well as its nonlinear feedback. Such periodic changes at different scales are affected not only by the multi-scale complex topography of the region but also by the local atmospheric circulation system; that is, these periods result from multiple influencing factors.
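The average-period estimate described above can be sketched as follows. Which extreme points are counted is not specified, so this sketch assumes one local maximum per oscillation; counting both maxima and minima would give roughly half the period. The function name `mean_period` and the `dt` parameter (months per sample) are illustrative.

```python
import numpy as np


def mean_period(imf, dt=1.0):
    """Average period of an IMF: series length divided by its number of local maxima.

    Counting only maxima (one per oscillation) is an assumption.
    """
    imf = np.asarray(imf, dtype=float)
    d = np.diff(imf)
    n_max = int(np.sum((d[:-1] > 0) & (d[1:] <= 0)))
    return imf.size * dt / n_max if n_max else float("inf")
```

For a monthly IMF, `mean_period(imf) / 12` expresses the period in years.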

Identification of Significant Meteorological Factors
To identify significant correlations between monthly precipitation and meteorological factors in the SRYE, time-delayed correlation analysis was conducted between each EEMD-decomposed precipitation series and 20 meteorological factors, which are listed in Table 1. The delay was set to 0-11 months, and the two factors with the largest correlation coefficients were selected for each component. Table 2 shows the two largest correlation coefficients between each component at all stations in the SRYE and the corresponding meteorological factors; the number following each correlation coefficient is the delay in months. All correlation coefficients pass the significance test at the 1% level, except that between IMF1 and the NAO. As Table 2 shows, the largest correlation coefficients are those between the original data and temperature (T), and between IMF2 and SST in the Niño 1+2 region delayed by four months; both pass the significance test at the 1% level. This indicates that, at different time scales, precipitation in the SRYE is significantly affected by SST in the Niño 1+2 region four months earlier and by temperature (T) in the same period. Therefore, SST in the Niño 1+2 region and temperature (T) are the main meteorological factors affecting monthly precipitation in the SRYE.
Table 2. Meteorological factors with the highest correlation with the monthly precipitation series and the EEMD-decomposed series in the SRYE, and the corresponding correlation coefficients.

[Table 2: for each component (original data, IMF1, IMF2, IMF3, IMF4), the two largest time-delayed correlation coefficients.]
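The factor screening described above can be sketched as follows: for each factor, the Pearson correlation between precipitation in month t and the factor in month t − lag is computed for lags of 0-11 months, the lag with the largest |r| is kept, and the two factors with the largest |r| are selected. It would be applied to the original series and to each EEMD component in turn. This sketch omits the significance test, and the function names are illustrative.

```python
import numpy as np


def lagged_corr(precip, factor, max_lag=11):
    """Pearson r between precip(t) and factor(t - lag) for lag = 0..max_lag months."""
    p = np.asarray(precip, dtype=float)
    f = np.asarray(factor, dtype=float)
    return {lag: float(np.corrcoef(f[:f.size - lag], p[lag:])[0, 1])
            for lag in range(max_lag + 1)}


def top_factors(precip, factors, k=2, max_lag=11):
    """Rank factors by their largest |r| over all lags and keep the top k.

    factors maps a factor name to a monthly series aligned with precip.
    Returns a list of (name, best_lag, r) tuples.
    """
    best = []
    for name, series in factors.items():
        corr = lagged_corr(precip, series, max_lag)
        lag = max(corr, key=lambda L: abs(corr[L]))
        best.append((name, lag, corr[lag]))
    best.sort(key=lambda item: abs(item[2]), reverse=True)
    return best[:k]
```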

Analysis on the Correlation between Topographic Driving Factors and Precipitation
Gravity waves over the Qinghai-Tibet Plateau are excited by atmospheric convection and atmospheric motion [41], and those in the SRYE, in the eastern part of the plateau, are excited more easily than those in the western region. Gravity wave activity is not necessarily accompanied by a precipitation region, but there is usually at least one arc-shaped rain band or shower band ahead of an approaching wave trough. Not all gravity-wave events have an associated precipitation system, but in an unstable atmosphere gravity waves are one of the trigger mechanisms of rainstorms [42]. Since gravity waves at altitudes of 17-24 km are mostly related to mountain waves [32], the 17-24 km segment of each gravity-wave profile is selected. The monthly variation of gravity-wave potential energy in the SRYE is obtained by averaging the potential energy over 17-24 km after eliminating values that are too large or negative. The Pearson |r| between the potential energy of gravity waves and precipitation is 0.688, indicating a moderate correlation, so the potential energy of gravity waves may be related to precipitation. This is, however, the weakest of the three correlations, below those of SST in the Niño 1+2 region (r = 0.849) and surface temperature (r = 0.933) with precipitation.
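A minimal sketch of the potential-energy calculation and the 17-24 km layer average, assuming the standard linear-theory expressions N² = (g/T_B)(dT_B/dz + g/c_p) and E_p = ½(g/N)²(T′/T_B)² with temperatures in kelvin; g = 9.80665 m s⁻² and c_p = 1004 J kg⁻¹ K⁻¹ are assumed constants. The upper cutoff `ep_max` for implausibly large values is hypothetical, since the text does not give one, and the background-profile construction, 3σ screening, and band-pass filtering are not shown. Function names are illustrative.

```python
import numpy as np

G = 9.80665   # gravitational acceleration, m s^-2 (assumed constant)
CP = 1004.0   # specific heat of dry air at constant pressure, J kg^-1 K^-1 (assumed)


def potential_energy(z_km, t_bg, t_pert):
    """Gravity-wave potential energy per unit mass (J/kg) at each level.

    z_km: altitudes in km; t_bg: background temperature T_B (K);
    t_pert: temperature disturbance T' (K) on the same grid.
    """
    z_m = np.asarray(z_km, dtype=float) * 1000.0
    t_bg = np.asarray(t_bg, dtype=float)
    t_pert = np.asarray(t_pert, dtype=float)
    n2 = (G / t_bg) * (np.gradient(t_bg, z_m) + G / CP)  # squared buoyancy frequency
    return 0.5 * (G**2 / n2) * (t_pert / t_bg) ** 2


def layer_mean_ep(z_km, ep, z_min=17.0, z_max=24.0, ep_max=None):
    """Average Ep over z_min..z_max km, dropping negative or implausibly large values."""
    z_km = np.asarray(z_km, dtype=float)
    ep = np.asarray(ep, dtype=float)
    keep = (z_km >= z_min) & (z_km <= z_max) & (ep >= 0)
    if ep_max is not None:
        keep &= ep <= ep_max
    return float(ep[keep].mean())
```

For an isothermal background the expression reduces to E_p = ½ c_p T′²/T_B, which is a quick sanity check on units.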

Analysis on Simulation Results of Precipitation
SST in the Niño 1+2 region (hereinafter SST), temperature (T), and the potential energy of gravity waves (Ep) are linked to precipitation in terms of both physical mechanisms and correlation analysis. These three factors are therefore integrated into the precipitation prediction model to explore their influence on model accuracy. The test results of the GA-LSSVM model are displayed in Figure 4, where the horizontal and vertical axes represent the measured and predicted data, respectively. Mirabbasi et al. [43] predicted monthly precipitation using the M5Tree model (MTM), multivariate adaptive regression splines (MARS), least square support vector regression (LSSVR), gene expression programming (GEP), and artificial neural networks (ANNs), with geographical information and rainfall data from 61 rain gauge stations in India divided into training, validation, and test sets. In the test stage, the RMSE ranges of the MTM, MARS, LSSVR, ANN, and GEP models were 6.10-40.91 mm, 12.61-43.69 mm, 5.53-31.8 mm, 8.72-34.08 mm, and 26.07-58.91 mm, respectively, and the LSSVR model outperformed the other methods. Kisi and Sanikhani [8] used adaptive neuro-fuzzy inference system (ANFIS), artificial neural network (ANN), and support vector regression (SVR) models to predict long-term monthly precipitation in Iran. In the test stage, they found minimum correlations of 0.696 (Sari station), 0.661 (Urmia station), and 0.785 (Bandar Lengeh station) and maximum correlations of 0.964 (Bam station), 0.944 (Fasa station), and 0.977 (Zabol and Tabas stations) for the ANFIS, SVM, and ANN models, respectively. Compared with these results, the models applied in this study provide relatively accurate estimates of precipitation in the SRYE.
The evaluation metrics at each station are averaged, and the averages are taken as the simulation accuracy for precipitation over the whole SRYE.

Model Verification
In this research, the average curves of the best fitness versus the number of generations for the GA-LSSVM models integrating each of the three factors are shown in Figure 6; each curve represents the optimization process under a different condition. The optimal fitness of all three models decreases markedly and stops changing after a number of generations, indicating that the optimization converges. For the Ep-GA-LSSVM model, the optimal fitness stabilizes at about 7.8 after about 120 generations, indicating that individuals have been found near the optimal solution. The optimal fitness values of the T-GA-LSSVM and SST-GA-LSSVM models stabilize at about 7.8 and 5.6 after 45 and 90 generations, respectively. When GA is used for optimization and training, a smaller optimal fitness means higher prediction accuracy. This suggests that the SST-GA-LSSVM model is better than the other two models in optimizing model parameters.

Model Verification
In this research, the average curves of the best fitness and iterations of the GA-LSSVM model integrating three factors are shown in Figure 6. Each curve in the figure represents the optimization process under different conditions. It can be obtained that the optimal fitness of the three models obviously decreases and will not change after several evolutions, and the optimization results tend to converge. When the Ep-GA-LSSVM model evolves to the 120th generation, the optimal fitness function for optimization results gradually tends to be stable, at about 7.8, indicating that individuals are found near the optimal solution. The values of the optimal fitness functions of the T-GA-LSSVM and SST-GA-LSSVM models gradually stabilize at about 7.8 and 5.6 after 45 and 90 generations. When GA is used for optimization and training, the smaller the optimal fitness of individuals is, the higher the prediction accuracy. This suggests that the SST-GA-LSSVM model is better than the other two models in optimizing network parameters.

Model Verification
In this research, the average curves of the best fitness and iterations of the GA-LSSVM model integrating three factors are shown in Figure 6. Each curve in the figure represents the optimization process under different conditions. It can be obtained that the optimal fitness of the three models obviously decreases and will not change after several evolutions, and the optimization results tend to converge. When the Ep-GA-LSSVM model evolves to the 120th generation, the optimal fitness function for optimization results gradually tends to be stable, at about 7.8, indicating that individuals are found near the optimal solution. The values of the optimal fitness functions of the T-GA-LSSVM and SST-GA-LSSVM models gradually stabilize at about 7.8 and 5.6 after 45 and 90 generations. When GA is used for optimization and training, the smaller the optimal fitness of individuals is, the higher the prediction accuracy. This suggests that the SST-GA-LSSVM model is better than the other two models in optimizing network parameters.

Discussions
There is an obvious nonlinear relationship between precipitation and altitude, but few studies have examined the relationship between precipitation prediction accuracy and altitude. Data-driven methods can be used to reasonably evaluate the information behind the data. Therefore, this study plotted the relationship between the elevation of the meteorological stations and the prediction accuracy R² of each model (Figure 7). Integrating T improves the prediction accuracy most significantly. Overall, the accuracy of the precipitation prediction models integrating the three driving factors increases with elevation. However, the prediction accuracy at an altitude of 3500 m does not follow this overall trend, and this pattern appears in the results of all three models. As shown in the figure, this may be related to the vertical gradient distribution of precipitation in this region.

Discussions
There is an obvious nonlinear relationship between precipitation and altitude, but few studies have examined how precipitation prediction accuracy varies with altitude. Data-driven methods make it possible to evaluate the information contained in the data. This study therefore plotted the elevation of each meteorological station against the prediction accuracy R² of each model (Figure 7). Integrating T produces the most significant improvement in prediction accuracy. Overall, the accuracy of the precipitation prediction models that integrate the three driving factors increases with elevation. However, the prediction accuracy at an altitude of 3500 m does not follow this overall trend, and the same deviation appears in the results of all three models. As Figure 7 indicates, this deviation may be related to the vertical distribution of precipitation in this region.
To explore the possible relationship between precipitation and the prediction accuracy of the model, this study plotted precipitation against altitude and R². The relationship between precipitation and the altitude of the meteorological stations in the study area is shown in Figure 8. Regions with abundant precipitation (average annual precipitation > 600 mm) are concentrated near an altitude of 3500 m, where prediction accuracy is low (R² = 0.70). Regions with less precipitation (average annual precipitation < 400 mm) are concentrated at high and low altitudes and, like the wet regions, show low prediction accuracy (R² = 0.71). In contrast, prediction accuracy is higher in regions where the annual average precipitation is between 400 mm and 600 mm (R² = 0.86). The model developed in this study is therefore better suited to parts of the SRYE with moderate precipitation than to those with high or low annual average precipitation.
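The two station-level summaries discussed above (mean R² per annual-precipitation class, and the overall trend of R² with elevation) can be reproduced in a few lines. This is a hypothetical sketch, not the authors' code: the record fields (`elev_m`, `annual_mm`, `r2`) and function names are illustrative, stations at exactly 400 mm or 600 mm are assumed to fall in the moderate class, and an ordinary least-squares slope is used as one simple way to quantify "accuracy increases with elevation overall".

```python
"""Summarize station-level R² by precipitation class and by elevation.

Hypothetical sketch: field names and class boundaries at exactly 400/600 mm
are assumptions; no data from the study are included.
"""
from statistics import mean
from typing import Dict, List, Mapping


def precipitation_class(annual_mm: float) -> str:
    """Assign a station to the dry (<400 mm), moderate (400-600 mm) or wet (>600 mm) class."""
    if annual_mm < 400:
        return "dry"
    if annual_mm <= 600:  # assumption: the 600 mm boundary counts as moderate
        return "moderate"
    return "wet"


def mean_r2_by_class(stations: List[Mapping[str, float]]) -> Dict[str, float]:
    """Return the mean R² of the stations in each precipitation class."""
    groups: Dict[str, List[float]] = {}
    for s in stations:
        groups.setdefault(precipitation_class(s["annual_mm"]), []).append(s["r2"])
    return {cls: mean(vals) for cls, vals in groups.items()}


def r2_elevation_slope(stations: List[Mapping[str, float]]) -> float:
    """Ordinary least-squares slope of R² against station elevation (per metre)."""
    xs = [s["elev_m"] for s in stations]
    ys = [s["r2"] for s in stations]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need stations at more than one elevation")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx
```

A positive slope corresponds to the overall increase of accuracy with elevation, while a class-wise comparison such as the one behind Figure 8 shows whether moderate-precipitation stations stand out; a single slope cannot capture the local dip near 3500 m, which is why both summaries are useful.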

Conclusions
By integrating the meteorological factors selected by EEMD and the potential energy of gravity waves, which represents topographic factors in the SRYE, into the GA-LSSVM precipitation prediction model, this study compared the effects of different factors on the prediction of long-term precipitation. The results demonstrate that the SST-GA-LSSVM model performs best in the test stage (R² = 0.856, RMSE = 19.648, MAE = 14.363), while the Ep-GA-LSSVM and T-GA-LSSVM models produce similar results. Compared with the other two models, the SST-GA-LSSVM model improves accuracy by 24% in terms of RMSE and by 23% in terms of MAE. For each of the three models, more than half of the meteorological stations have R² above 0.80.
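For reference, the three evaluation metrics and the percentage improvement quoted above can be computed as follows. This is a minimal sketch, not the authors' implementation: it assumes R² is the coefficient of determination (1 - SS_res/SS_tot) rather than the squared Pearson correlation, and that "improves by 24%" means the fractional reduction of an error metric relative to a baseline model. The function names are hypothetical.

```python
"""Evaluation metrics used in the text, plus a relative-improvement helper.

Assumptions: R² = 1 - SS_res / SS_tot; "improvement" = fractional error reduction.
"""
import math
from typing import Sequence


def _check(obs: Sequence[float], pred: Sequence[float]) -> None:
    """Reject empty or mismatched observation/prediction series."""
    if not obs or len(obs) != len(pred):
        raise ValueError("obs and pred must be non-empty and of equal length")


def rmse(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Root mean square error."""
    _check(obs, pred)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mae(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Mean absolute error."""
    _check(obs, pred)
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def r2(obs: Sequence[float], pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    _check(obs, pred)
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    if ss_tot == 0:
        raise ValueError("observations have zero variance")
    return 1.0 - ss_res / ss_tot


def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Fractional error reduction of a new model versus a baseline (0.24 means 24%)."""
    return (baseline_error - new_error) / baseline_error
```

Under this reading, a 24% RMSE improvement means the SST-GA-LSSVM RMSE is about 0.76 times the RMSE of the comparison model; whether the paper compares against each model separately or against their average is not stated in this section.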
Furthermore, this study analyzed the possible relationship of the models' prediction accuracy R² with altitude and annual average precipitation. Prediction accuracy is sensitive to altitude, and accurate predictions are more easily obtained at high altitudes. At low altitudes, the SST-GA-LSSVM model compensates for the low prediction accuracy of the other models. Taking the altitude of 3500 m as the boundary, R² in regions with annual precipitation of 400-600 mm is about 10% higher than in regions with other precipitation levels.
In conclusion, this study proposes a precipitation prediction model that incorporates physical mechanisms and does not depend on historical precipitation data. By integrating the key factors driving precipitation, the model achieves high prediction accuracy and contributes to a better understanding of precipitation prediction. These results can support flood prevention and water-resources risk management. Notably, the framework and methods of this study can be extended to regions lacking historical precipitation data.
Data Availability Statement: All data used in this study are available upon request.