Regional Zenith Tropospheric Delay Modeling Based on Least Squares Support Vector Machine Using GNSS and ERA5 Data

: The meteorological reanalysis data has been widely applied to derive zenith tropospheric delay ( ZTD ) with a high spatial and temporal resolution. With the rapid development of artiﬁcial intelligence, machine learning also begins as a high-efﬁciency tool to be employed in modeling and predicting ZTD . In this paper, we develop three new regional ZTD models based on the least squares support vector machine (LSSVM), using both the International GNSS Service ( IGS )- ZTD products and European Centre for Medium-Range Weather Forecasts Reanalysis 5 ( ERA5 ) data over Europe throughout 2018. Among them, the ERA5 data is extended to ERA5S-ZTD and ERA5P-ZTD as the background data by the model method and integral method, respectively. Depending on different background data, three schemes are designed to construct ZTD models based on the LSSVM algorithm, including the without background data, with the ERA5S-ZTD , and with the ERA5P-ZTD . To investigate the advantage and feasibility of the proposed ZTD models, we evaluate the accuracy of two background data and three schemes by segmental comparison with the IGS - ZTD of 85 IGS stations in Europe. The results show that the overall average Root Mean Square Errors ( RMSE ) value of all sites is 30.1 mm for the ERA5S-ZTD , and 10.7 mm for the ERA5P-ZTD . The overall average RMSE is 25.8 mm, 22.9 mm, and 9 mm for the three schemes, respectively. Moreover, the overall improvement rate is 19.1% and 1.6% for the ZTD model with ERA5S-ZTD and ERA5P-ZTD , respectively. In order to explore the reason of the lower improvement for the ZTD model with ERA5P-ZTD , the loop veriﬁcation is performed by estimating the ZTD values of each available IGS station. In actuality, the monthly improvement rate of estimated ZTD is positive for most stations, and the biggest improvement rate can even reach about 40%. The negative rate mainly comes from speciﬁc stations, these stations are located on the edge of the region, near the coast, as well as the lower similarity between the individual veriﬁed station and training stations.


Introduction
The troposphere constitutes most of the mass and water vapor of the entire atmosphere. The water vapor mainly concentrated in the troposphere below 10-12 km, which is an important meteorological factor causing climate change. With the rapid development of Global Navigation Satellite System (GNSS) technique, it has become an indispensable tool for monitoring water vapor variation [1,2]. When GNSS signals across the troposphere, these signals will slow down and bend because of the refraction of atmospheric molecules. The delay is usually described as the zenith tropospheric delay (ZTD) and the mapping function related to the elevation angle in the processing of GNSS data [3,4], and the water vapor variation can be retrieved and monitored by using high-precision ZTD values. The structure of the article is as follows. Section 2.1 to Section 2.4 present the two mentioned data sources, the calculation method of two background data, the designed three ZTD models, and the strategy of accuracy evaluation, respectively. Sections 3.1 and 3.2 evaluate the accuracy of two background data and the three ZTD models, and Section 3.3 analyzes the improvement rate of the two ZTD models based on ERA5S-ZTD and ERA5P-ZTD in detail. In Section 4, we discuss the dependency of estimated ZTD on the station distribution, as well as on the output parameters. At last, the overview and outlook are given in Section 5.

Data Source
Since 1998, the IGS center has regularly provided the services of tropospheric error correction products. The troposphere products are offered in daily files by site for over 350 GNSS stations in the IGS network, which include five-minute estimates of ZTD and north and east troposphere gradient components, as well as the position of the IGS station. We can access to these products from IGS center (ftp://igs.ensg.ign.fr/pub/igs/products/ troposphere/, accessed on 1 January 2021).
ERA5 data is the fifth-generation global meteorological parameter reanalysis data updated by the European Centre for Medium-Range Weather Forecasts (ECWMF) in January 2019. The ERA5 data is the grid data with a spatial resolution of 0.25 • * 0.25 • and is provided with one-hour time resolution. It is divided into two types of data, namely the hourly data on single level and the hourly data on pressure levels, both of them can be taken from ECWMF center (https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/ era5, accessed on 1 January 2021). The single-level products refer to the meteorological parameters on the surface of Earth. The pressure-level products divide the atmosphere into 37 pressure layers in vertical and provide meteorological parameters on the surface of each pressure layer, which enable ERA5 data to describe changes in meteorological parameters in more detail.
In both types of data, the IGS final product offer ZTD with a temporal resolution of 5 min, while these ZTD values have low spatial resolution owing to the uneven distribution of IGS stations all over the world. Interestingly, the ZTD values can reach a fixed spatial resolution of 0.25 • * 0.25 • with hourly ERA5 data on a global scale. To unify the temporal resolution of the two types of data, we extract the hourly ZTD values from IGS products, namely IGS-ZTD in this contribution.

Two Background Data
By using the model method and the integral method, we derive the ZTD values at IGS stations as two background data based on the ERA5 data, namely ERA5S-ZTD and ERA5P-ZTD, respectively. Our specific calculation steps are as follows: 1.
Obtaining original ERA5 data and the position of IGS station. In this study, the ERA5 hourly geopotential (m), 2-meter dewpoint temperature (K), 2-meter temperature (K), surface pressure (Pa) on single-level data, as well as the ERA5 hourly geopotential (m), temperature (K), and specific humidity (%) on 37 pressure-level data are utilized. The latitude, longitude and altitude of the IGS stations are extracted from the tropospheric products.

2.
Getting the WGS84 ellipsoid height. Based on the EGM2008 [25], the ERA5 hourly geopotential of the single-level and the pressure-level products are corrected to ellipsoid height H s and H P , respectively.

3.
Deriving meteorological parameters of the IGS stations using the single-level ERA5 data. According to the latitude and longitude of the IGS station, we detect the positions of the four nearest grid points of ERA5 data firstly.
where, R h represents the relative humidity, and the values of R 2 = 6.112 hpa, R 3 = 17.502 K, as well as R 4 = 32.19 K.

4.
Calculating ERA5S-ZTD by the model method from T, P and e. These meteorological parameters are substituted into the Saastamoinen model [5], and then ERA5S-ZTD values of the IGS stations are obtained using the following equations.
where, ϕ denotes the latitude of the IGS stations.

5.
Interpolating meteorological parameters above the IGS stations by the pressure-level ERA5 data. According to the relationship of H P and H, the pressure-level data above the IGS stations are retained. Through the plane interpolation and fitting, the temperature T P and specific humidity Q P of the IGS stations at retained height of H P are derived. The water vapor pressure e P is obtained through Q P and P P .
where, P P is the pressure value of each level.

6.
Calculating ERA5P-ZTD by the integral method. The ERA5P-ZTD values of the IGS stations are obtained by integrating as shown in the following equations [4].
where, N S refers to the atmospheric refractive index at the IGS stations, N P represents the atmospheric refractive index above the IGS stations, and N contains the N S and N P in each IGS station. K 1 , K 2 and K 3 are the refractive index constants with the values of 77.604 K/hPa, 64.79 K/hPa and 377,600 K 2 /hPa, respectively.

LSSVM Algorithm
The training sets (x train_i ,y train_i ) i=1,2,3...m and testing sets (x test_i ,y test_i ) i=1,2,3...n are constructed using known data according to the network structure of different schemes, x and y represent the input and output vectors, respectively. The numbers of training sample and testing sample are described as m and n.
For a given training sample (x train_i ,y train_i ), the LSSVM algorithm constructs the special function, that is combined with the least-squares equation constraint conditions, to obtain the corresponding Lagrange function equation. The hyper-parameters of optimal model are obtained through the cross-validation method. The optimal solution conditions and the Radial Basis Function (RBF) kernel function matrix K are assembled to compose the LSSVM regression model. These above-mentioned processing is elucidated by the ensuing equations [26].
where, ω represents the weight vector, ϕ denotes the mapping function, α and b are the model parameters, E refers to the error vector of the model, C refers to the regularization constant, and σ explains the indicator of the kernel function, which is determined by the cross-validation optimization algorithm. In addition, Table 1 lists the initial parameters and optimization strategy settings of the LSSVM algorithms in this contribution.

Three Schemes
Based on the LSSVM algorithm, we design three schemes to establish the ZTD models according to different background data, namely Scheme 1, Scheme 2, and Scheme 3. The final ZTD estimations of three ZTD models are regarded as EST-ZTD1, EST-ZTD2, and EST-ZTD3 respectively. It should be pointed that, the ZTD data is normalized to [0,1] when the ZTD data is taken as one of input parameters to directly participate in the ZTD modeling based on the LSSVM algorithm, in which the ZTD data includes the IGS-ZTD, the ERA5S-ZTD, as well as ERA5P-ZTD.
Scheme 1: The core of this scheme is employing the LSSVM algorithm to depict the functional relationship between the station position and the ZTD value, as shown in Figure 1. The input parameters are the longitude, latitude, altitude of the IGS stations, and the output elements are the estimated IGS-ZTD at the corresponding IGS stations. There is no background data involved in the ZTD model. This scheme is also employed in most ZTD models based on machine learning algorithms. When we verify the trained model by testing stations, the EST-ZTD1 at the test station is the output ZTD value of this model. ZTD models based on machine learning algorithms. When we verify the trained model by testing stations, the EST-ZTD1 at the test station is the output ZTD value of this model. Scheme 2: We introduce the ERA5S-ZTD as background data in the ZTD model. The core of this scheme is to estimate the deviation of ERA5S-ZTD and IGS-ZTD by the LSSVM algorithm, the network structure is shown in Figure 2. In this scheme, the longitude, latitude, altitude, and the ERA5S-ZTD of the IGS station constitute the input vector. The corresponding output parameter is the deviation D-ZTDERA5S depicted through the following equation. In the last, the EST-ZTD2 of the test station is the summation of the estimated D-ZTDERA5S and the corresponding ERA5S-ZTD.     Scheme 2: We introduce the ERA5S-ZTD as background data in the ZTD model. The core of this scheme is to estimate the deviation of ERA5S-ZTD and IGS-ZTD by the LSSVM algorithm, the network structure is shown in Figure 2. In this scheme, the longitude, latitude, altitude, and the ERA5S-ZTD of the IGS station constitute the input vector. The corresponding output parameter is the deviation D-ZTD ERA5S depicted through the following equation. In the last, the EST-ZTD2 of the test station is the summation of the estimated D-ZTD ERA5S and the corresponding ERA5S-ZTD. ZTD models based on machine learning algorithms. When we verify the trained model by testing stations, the EST-ZTD1 at the test station is the output ZTD value of this model. Scheme 2: We introduce the ERA5S-ZTD as background data in the ZTD model. The core of this scheme is to estimate the deviation of ERA5S-ZTD and IGS-ZTD by the LSSVM algorithm, the network structure is shown in Figure 2. In this scheme, the longitude, latitude, altitude, and the ERA5S-ZTD of the IGS station constitute the input vector. The corresponding output parameter is the deviation D-ZTDERA5S depicted through the following equation. In the last, the EST-ZTD2 of the test station is the summation of the estimated D-ZTDERA5S and the corresponding ERA5S-ZTD.     Scheme 3: Similar as Scheme 2, the ERA5P-ZTD replaces the ERA5S-ZTD as the background data in Scheme 3, as shown in the following equation and in Figure 3. ZTD models based on machine learning algorithms. When we verify the trained model by testing stations, the EST-ZTD1 at the test station is the output ZTD value of this model. Scheme 2: We introduce the ERA5S-ZTD as background data in the ZTD model. The core of this scheme is to estimate the deviation of ERA5S-ZTD and IGS-ZTD by the LSSVM algorithm, the network structure is shown in Figure 2. In this scheme, the longitude, latitude, altitude, and the ERA5S-ZTD of the IGS station constitute the input vector. The corresponding output parameter is the deviation D-ZTDERA5S depicted through the following equation. In the last, the EST-ZTD2 of the test station is the summation of the estimated D-ZTDERA5S and the corresponding ERA5S-ZTD.

Accuracy Evaluation
The IGS-ZTD is adopted as the true value because it can reach a high precision of 4 mm. Due to the poor data integrity of some stations in a long time period, we divide the year into 12 months to ensure a sufficient number of samples in the research stage. The monthly bias and the root mean square error (RMSE) of each experimental IGS station are calculated by the following equations. The verifications of three ZTD models are evaluated by the average monthly bias and average RMSE of all testing stations in each month. In addition, piece-wise evaluations of two different background data are performed at each session. The evaluation indexes are the same as that of the ZTD model.
where, N is the monthly number of EST-ZTD values for each available station.

Results and Analysis
The ERA5 hourly data of the European region (32  The IGS-ZTD is adopted as the true value because it can reach a high precision of 4 mm. Due to the poor data integrity of some stations in a long time period, we divide the year into 12 months to ensure a sufficient number of samples in the research stage. The monthly bias and the root mean square error (RMSE) of each experimental IGS station are calculated by the following equations. The verifications of three ZTD models are evaluated by the average monthly bias and average RMSE of all testing stations in each month. In addition, piece-wise evaluations of two different background data are performed at each session. The evaluation indexes are the same as that of the ZTD model.
where, is the monthly number of EST-ZTD values for each available station.

Results and analysis
The ERA5 hourly data of the European region (32N-72N, 15W-40E) in 2018 are extended to the ERA5S-ZTD and the ERA5P-ZTD at the positions of 85 IGS stations. More than 50 IGS stations are available per month, and all of them are employed to investigate the consistency of the extended ERA5S-ZTD and ERA5P-ZTD with IGS-ZTD. Moreover, 10 available IGS stations are selected in each month to verify the accuracy of our three ZTD models, where these models are built by the remaining more than 40 IGS stations derived from the three schemes. Figure Table 2 shows the monthly bias and RMSE of the ERA5S-ZTD and ERA5P-ZTD by comparison with the IGS-ZTD at all available stations. For the ERA5S-ZTD, the average  Table 2 Figure 5 demonstrates the average monthly bias and the average RMSE values of two background ZTD. The average monthly bias of ERA5S-ZTD is larger than that of ERA5P-ZTD from March to October 2018, with the more obvious variation. The average RMSE series of both ERA5S-ZTD and ERA5P-ZTD appear the same trend of rising firstly and then falling over time, and the average RMSE of ERA5S-ZTD is greater than that of ERA5P-ZTD in each month, especially from May to October. These results imply that ERA5S-ZTD has a larger systematic bias with the IGS-ZTD than the ERA5P-ZTD, and the consistency between ERA5P-ZTD and IGS-ZTD are better than that of between ERA5S-ZTD and IGS-ZTD. The reasons are caused by both the difference of the grid data and the processing methods. On one hand, the extended ERA5S-ZTD only uses hourly ERA5 data on the single-level data, while the hourly data on 37 pressure levels are employed in the extended ERA5P-ZTD. On the other hand, the ERA5S-ZTD is obtained by substituting the meteorological parameters into the fixed model, which may have certain system errors, while the ERA5P-ZTD is computed by integrating all the refractive across the path of the GNSS signals. Figures 6 and 7 show the RMSE values of all available IGS stations derived from the ERA5S-ZTD and ERA5P-ZTD in January, April, July, and October of 2018, respectively. This corresponds to winter, spring, summer, and autumn in the northern hemisphere.

Accuracy of Two Background ZTD
The RMSE values of each station in January are relatively small for both the ERA5S-ZTD and ERA5P-ZTD, and the highest RMSE appears for these stations in July. This indicates that the situation is a common feature of the whole selected region. The value for tropospheric delay changes with different seasons and latitudes because there are complex meteorological phenomena in the troposphere [27]. In different seasons, the diverse moisture content brings the seasonal variation of the ZTD values. When the summer is coming, the change of the moisture content can get complex and dramatic in a smaller area, while the ability of minor change capturing is limited for the ERA5 grid data. Moreover, the lower RMSE values are always distributed in high latitudes and inland areas, as well as the higher RMSE values mostly appear in low latitudes and coastal regions. It implies that the high heat and active monsoon activities lead to the large and complex ZTD value in low latitudes and coastal areas. The ERA5 data does not indicate this variety of ZTD values perfectly, which brings about this spatial characteristic of RMSE values.
Remote Sens. 2021, 13, x FOR PEER REVIEW 9 of 19 Figure 5 demonstrates the average monthly bias and the average RMSE values of two background ZTD. The average monthly bias of ERA5S-ZTD is larger than that of ERA5P-ZTD from March to October 2018, with the more obvious variation. The average RMSE series of both ERA5S-ZTD and ERA5P-ZTD appear the same trend of rising firstly and then falling over time, and the average RMSE of ERA5S-ZTD is greater than that of ERA5P-ZTD in each month, especially from May to October. These results imply that ERA5S-ZTD has a larger systematic bias with the IGS-ZTD than the ERA5P-ZTD, and the consistency between ERA5P-ZTD and IGS-ZTD are better than that of between ERA5S-ZTD and IGS-ZTD. The reasons are caused by both the difference of the grid data and the processing methods. On one hand, the extended ERA5S-ZTD only uses hourly ERA5 data on the single-level data, while the hourly data on 37 pressure levels are employed in the extended ERA5P-ZTD. On the other hand, the ERA5S-ZTD is obtained by substituting the meteorological parameters into the fixed model, which may have certain system errors, while the ERA5P-ZTD is computed by integrating all the refractive across the path of the GNSS signals.    Figure 7 show the RMSE values of all available IGS stations derived from the ERA5S-ZTD and ERA5P-ZTD in January, April, July, and October of 2018, respectively. This corresponds to winter, spring, summer, and autumn in the northern hemisphere. The RMSE values of each station in January are relatively small for both the ERA5S-ZTD and ERA5P-ZTD, and the highest RMSE appears for these stations in July. This indicates that the situation is a common feature of the whole selected region. The value for tropospheric delay changes with different seasons and latitudes because there are complex meteorological phenomena in the troposphere [27]. In different seasons, the diverse moisture content brings the seasonal variation of the ZTD values. When the summer is coming, the change of the moisture content can get complex and dramatic in a smaller area, while the ability of minor change capturing is limited for the ERA5 grid data. Moreover, the lower RMSE values are always distributed in high latitudes and inland areas, as well as the higher RMSE values mostly appear in low latitudes and coastal regions. It implies that the high heat and active monsoon activities lead to the large and complex ZTD value in low latitudes and coastal areas. The ERA5 data does not indicate this variety of ZTD values perfectly, which brings about this spatial characteristic of RMSE values.

Accuracy of three schemes
We verify the consistency of our three ZTD models by comparison the estimated ZTD values with IGS-ZTD. Table 3 gives their monthly bias and RMSE values of 10 verification stations. The overall average monthly bias is 1.1 mm, 0.3 mm, and -1.6 mm for Scheme 1, 2, and 3, respectively. The mean value of the average RMSE of 12 months is 25.8 mm, 22.9 mm, and 9.0 mm for Scheme 1, 2, and 3, respectively. Among them, the average monthly bias of 10 verification stations of Scheme 1 is in the range of -3.

Accuracy of Three Schemes
We verify the consistency of our three ZTD models by comparison the estimated ZTD values with IGS-ZTD. Table 3 gives their monthly bias and RMSE values of 10 verification stations. The overall average monthly bias is 1.1 mm, 0.3 mm, and −1.6 mm for Scheme 1, 2, and 3, respectively. The mean value of the average RMSE of 12 months is 25.8 mm, 22.9 mm, and 9.0 mm for Schemes 1, 2, and 3, respectively. Among them, the average monthly bias of 10 verification stations of Scheme 1 is in the range of −3.5-9.3 mm, the corresponding average RMSE values are within 16.9-33.9 mm. The maximum monthly bias of 36.7 mm appears on the individual verification station in August, as well as the maximum RMSE of 52.3 mm. For Scheme 2, the average monthly bias is relatively small with the range of −2.4-4.4 mm in 12 months and the average RMSE is concentrated in the 15.3-30.4 mm. One of 10 verification stations has the maximum monthly bias of 32.4 mm and maximum RMSE value of 54.2 mm in September. In addition, all of the average monthly biases of Scheme 3 are negative values from −0.7 mm to −3.5 mm with the corresponding average RMSE values from 6.1 mm to 12.6 mm.  Figure 8 visualizes the average monthly bias and average RMSE per month. All of the average monthly biases of Scheme 1 and Scheme 2 go up and down around zero in 12 months; that of Scheme 3 is close to zero. These results illustrate that the overall systematic errors are small for all three ZTD models based on the LSSVM algorithm. Interestingly, a near overlap emerges owing to the more negative monthly bias values of three schemes in July 2018. Meanwhile, a temporal feature emerges for the average RMSE values of three ZTD models, where the RMSE values grow in size with the coming of summer. It reflects a common problem that the accumulation of water vapor and the irregular changes of cyclones in summer lead to more difficulty in modeling ZTD. Moreover, the average RMSE values of Scheme 3 and Scheme 2 are smaller than that of Scheme 1, and the value of Scheme 3 is the smallest in each month. It indicates that the two models with extended ERA5 data as background data have a significant advantage compared to the model without background data, and the accuracy of the model based on ERA5P-ZTD is better because the higher consistency occurs between the ERA5P-ZTD and IGS-ZTD.

Improvement rate of the two ZTD models based on ERA5S-ZTD and ERA5P-ZTD
To analyze the advantage of the two ZTD models of Scheme 2 and Scheme 3, we compare the RMSE values of the 10 verification stations derived from the two ZTD models with ERA5S-ZTD and ERA5P-ZTD, and their respective improvement rates are calculated according to the following two equations.

Improvement Rate of the Two ZTD Models Based on ERA5S-ZTD and ERA5P-ZTD
To analyze the advantage of the two ZTD models of Scheme 2 and Scheme 3, we compare the RMSE values of the 10 verification stations derived from the two ZTD models with ERA5S-ZTD and ERA5P-ZTD, and their respective improvement rates are calculated according to the following two equations.
where, the RMSE ERA5S-ZTD and RMSE ERA5P-ZTD refer to the RMSE value of the ERA5S-ZTD and ERA5P-ZTD in each verified station, respectively. The RMSE EST-ZTD2 and RMSE EST-ZTD3 represent the RMSE values of Scheme 2 and Scheme 3 at 10 verification stations. Table 4 lists the statistical result of the improvement rate based on 10 verification stations. For Scheme 2, the overall average improvement rate reaches 19.1%, with the monthly average improvement rate of from 12.7% to 26.3%. In these months, the optimal improvement rate is up to 61.3% at an individual station in July, as well as the worst rate of −64.7% in June. For Scheme 3, only 1.6% overall average improvement rate is achieved, and most of the monthly average improvement rates are close to zero. During those 12 months, the best monthly average improvement rate is 17.5% with the maximum improvement of 42.5% at one of 10 verification stations in March, and the worst rate is −12.5% with the rate of −86.1% at an individual station in December. By comprehensively comparing the improvement rate of Scheme 2 and Scheme 3, it clearly stands out that the LSSVM algorithm has the excellent performance of compensating the errors derived from ERA5S-ZTD, while the situation is not obvious for the ERA5P-ZTD. Even so, the improvement rate of several stations can also exceed 17% in all 12 months for Scheme 3. Table 4. Improvement rate of Schemes 2 and 3. Values in brackets refer to the monthly minimal and maximal improvement rate for 10 verifying stations. Unit is percent.

Improvement-Scheme 3
As a typical, Figure 9 and Table 5 display the result of GLSV site in January, April, July, and October of 2018, where the GLSV site is one of the 10 verification stations. Among them, as shown in Figure 9 that significant periodic error signals exist in the bias time series of ERA5S-ZTD and have significantly improved after modeling with ERA5S-ZTD based on the LSSVM algorithm, especially in summer. For the bias time series of ERA5P-ZTD, the periodic error signals have a small fluctuation amplitude compared with ERA5S-ZTD, especially for the results in winter. In addition, the improvement effects from Scheme 3 are not better than that of Scheme 2. As can be seen from the monthly bias in Table 5, EST-ZTD2 model has better correction effect for the obvious systematic errors in ERA5S-ZTD model compared with the improvement of EST-ZTD3 relative to ERA5P-ZTD. Overall, the LSSVM algorithm has better performance in correcting periodic bias of the ERA5S-ZTD, and the extended ERA5P-ZTD by the integral method has good accuracy property, which constrains the improvement capability of the LSSVM algorithm.    Table 4 lists the statistical result of the improvement rate based on 10 verification stations. For Scheme 2, the overall average improvement rate reaches 19.1%, with the monthly average improvement rate of from 12.7% to 26.3%. In these months, the optimal improvement rate is up to 61.3% at an individual station in July, as well as the worst rate of -64.7% in June. For Scheme 3, only 1.6% overall average improvement rate is achieved, and most of the monthly average improvement rates are close to zero. During those 12 months, the best monthly average improvement rate is 17.5% with the maximum improvement of 42.5% at one of 10 verification stations in March, and the worst rate is -12.5% with the rate of -86.1% at an individual station in December. By comprehensively comparing the improvement rate of Scheme 2 and Scheme 3, it clearly stands out that the LSSVM algorithm has the excellent performance of compensating the errors derived from ERA5S-ZTD, while the situation is not obvious for the ERA5P-ZTD. Even so, the improvement rate of several stations can also exceed 17% in all 12 months for Scheme 3.  Table 5. Monthly bias and RMSE of ZTD estimations from ERA5S-ZTD, EST-ZTD2, ERA5P-ZTD and EST-ZTD3 model at the GLSV site in January, April, July, and October of 2018. Unit is millimeters.

Discussion
To further investigate the feasibility and availability of the ZTD model of Scheme 3, we perform loop verification on each of the all available IGS stations per month. In each month, each of the available IGS stations is as a testing station in turn to estimate ZTD values, and all remaining available IGS stations are used to train the ZTD model based on the LSSVM algorithm. Afterward, the RMSE values of these estimated ZTD values are evaluated by comparison with IGS-ZTD according to Equation (20), as well as the calculated improvement rate according to Equation (22). We calculate the percentage of all available IGS stations per month by dividing them into the accuracy-improved stations and the accuracy-reduced stations, as well as the overall average improvement rate of their RMSE values. Among them, the percentage of stations with accuracy-reduced stations can be subtracted by the percentage of accuracy-improved stations from 100%. As shown in Figure 10, the percentage of the accuracy-improved stations exceeds 50%, while the overall average improvement rate of RMSE values is less than 10% per month, with the exception of March 2018.    Figure 11 unfolds the improvement rate of each available IGS station in January, April, July, and October of 2018. The most red and yellow spots with the improvement rate in the range of 0-40% are mainly located in the central region of Europe, where the available IGS stations evenly are distributed and compact. This allows that the information of input and output parameters can be comprehensively captured with precision in these areas. The mainly cyan and blue spots with the rate of −40%-0 are located in the edge of the station network or near the coast, and few stations adjacent to them. Therefore, no enough samples can be trained by using the LSSVM algorithm, and the stations located in the coastal area suffer from more drastic climate variability. Specially, several cyan and blue spots are surrounded by a dense network of stations in the central region, such as the WTZR site. It implies that good station distribution is not the unique requirement for the improvement of the estimated ZTD. Therefore, we make a further discussion for the dependency of estimated ZTD on the output parameters D-ZTD ERA5P in the next section.

Dependency of the Estimated ZTD on the Output Parameter D-ZTD ERA5P
According to the following formula, we calculate the cosine value of the monthly D-ZTD ERA5P time series between the vector of the individual verified station and each of the training station [28].
where, i refers to the ith training station, D − ZTD train_i ERA5P denotes the vector of the ith training station in the monthly D-ZTD ERA5P time series, as well as D − ZTD test ERA5P for the individual verified station. In order to express more clearly the similarity of the training samples with the verified sample, we give the average-cosine value as where n refers to the number of training stations in each month.
In Figure 12, we divide the available IGS station in January, April, July, and October of 2018 according to the station distribution, the improvement rate < 0, as well as |average cos ine| < 0.2. When |average cos ine| < 0.2, the similarity of the training samples with the verified samples is weak. The overall trend between the average-cosine and the improvement rate tends to be linear from black spots, where the improvement rate increases with the rise of the average-cosine value. The average-cosine values of most stations are more than 0.2 with a positive improvement rate. It implies that the overall accuracy of estimated ZTD from Scheme 3 is superior to the accuracy of ERA5P-ZTD. Moreover, the overall improved accuracy is associated with the similarity between training stations and testing stations. According to the following formula, we calculate the cosine value of the monthly D-ZTDERA5P time series between the vector of the individual verified station and each of the training station [28].
where refers to the number of trainin stations in each month. Figure 12. Similarity of the improvement rate with average-cosine value for each available station in January, April, July, and October of 2018. Among them, the black spots refer to the station with the improvement rate > 0 or |average cosine| > 0.2. The red spots refer to the stations located in the edge region with the improvement rate < 0 and |average cosine| < 0.2, the blue spots for the stations located in the central region, and the magenta spots for the stations close to the coast. For the red, blue, and magenta spots, their corresponding stations are included at least three months in January, April, July, and October of 2018. The other stations are presented by using the green spots.
In Figure 12, we divide the available IGS station in January, April, July, and October of 2018 according to the station distribution, the improvement rate < 0 , as well as |average cosine| < 0.2. When |average cosine| < 0.2, the similarity of the training samples with the verified samples is weak. The overall trend between the average-cosine and the improvement rate tends to be linear from black spots, where the improvement rate in- Figure 12. Similarity of the improvement rate with average-cosine value for each available station in January, April, July, and October of 2018. Among them, the black spots refer to the station with the improvement rate > 0 or |average cos ine| > 0.2. The red spots refer to the stations located in the edge region with the improvement rate < 0 and |average cos ine| < 0.2, the blue spots for the stations located in the central region, and the magenta spots for the stations close to the coast. For the red, blue, and magenta spots, their corresponding stations are included at least three months in January, April, July, and October of 2018. The other stations are presented by using the green spots.
Although the WTZR, OPMT, and LEIJ sites with blue spots are surrounded by a dense network of stations in the central region as shown in Figure 13, their average-cosine values are less than 0.2 at least three months in January, April, July, and October of 2018. The estimated ZTD values of the three sites have a negative improvement rate owing to the trained stations have lower similarity with them. For the sites with red, magenta, and green spots, they are always located on the edge of the region or near the coast. On one hand, their locations are relatively outliers when their ZTDs are estimated, which also leads to a lower similarity and negative improvement rate. On the other hand, the large change of air humidity often occurs near the coast, and the accuracy of ZTD modeling become unstable owing to the influence of the abundant water vapor and active monsoon climate. Overall, this is a common problem for the ZTD modeling. Even though, the improvement rate of estimated ZTD is positive for most stations, and the biggest improvement rate from Scheme 3 can reach about 40%. Although the WTZR, OPMT, and LEIJ sites with blue spots are surrounded by a dense network of stations in the central region as shown in Figure 13, their average-cosine values are less than 0.2 at least three months in January, April, July, and October of 2018. The estimated ZTD values of the three sites have a negative improvement rate owing to the trained stations have lower similarity with them. For the sites with red, magenta, and green spots, they are always located on the edge of the region or near the coast. On one hand, their locations are relatively outliers when their ZTDs are estimated, which also leads to a lower similarity and negative improvement rate. On the other hand, the large change of air humidity often occurs near the coast, and the accuracy of ZTD modeling become unstable owing to the influence of the abundant water vapor and active monsoon climate. Overall, this is a common problem for the ZTD modeling. Even though, the improvement rate of estimated ZTD is positive for most stations, and the biggest improvement rate from Scheme 3 can reach about 40%.

Conclusions
As a fresh meteorological reanalysis data, the ERA5 data has higher spatial resolution than the IGS-ZTD. In addition, the LSSVM algorithm has a more stable computing performance than the traditional BPNN algorithm. In this paper, we combine the ERA5 data with IGS-ZTD by using the LSSVM algorithm. Among them, we extend the ERA5 data to ERA5S-ZTD and ERA5P-ZTD based on the model method and integral method, and design three schemes to build ZTD models.
The consistency of ERA5P-ZTD with IGS-ZTD is better than that of ERA5S-ZTD owing to the integral method, as well as the multiple pressure-level data. The overall average RMSE value is 30.1 mm and 10.7 mm for ERA5S-ZTD and ERA5P-ZTD, respectively. While they have the same trend with the seasonal variation, the RMSE values reach biggest with the coming of summer. In addition, the stations located in low latitudes and coastal regions have always higher RMSE values owing to the high heat and active monsoon activities.

Conclusions
As a fresh meteorological reanalysis data, the ERA5 data has higher spatial resolution than the IGS-ZTD. In addition, the LSSVM algorithm has a more stable computing performance than the traditional BPNN algorithm. In this paper, we combine the ERA5 data with IGS-ZTD by using the LSSVM algorithm. Among them, we extend the ERA5 data to ERA5S-ZTD and ERA5P-ZTD based on the model method and integral method, and design three schemes to build ZTD models.
The consistency of ERA5P-ZTD with IGS-ZTD is better than that of ERA5S-ZTD owing to the integral method, as well as the multiple pressure-level data. The overall average RMSE value is 30.1 mm and 10.7 mm for ERA5S-ZTD and ERA5P-ZTD, respectively. While they have the same trend with the seasonal variation, the RMSE values reach biggest with the coming of summer. In addition, the stations located in low latitudes and coastal regions have always higher RMSE values owing to the high heat and active monsoon activities.
For the three ZTD models, the overall average monthly bias is 1.1 mm, 0.3 mm, and −1.6 mm, with the corresponding mean value of average RMSE values of 25.8 mm, 22.9 mm, and 9.0 mm for Schemes 1, 2, and 3, respectively. Hardly any systematic errors exist for all three ZTD models based on the LSSVM algorithm, and the ZTD modeling accuracy is significantly improved when the ERA5 data is added, especially for the ZTD model with ERA5P-ZTD. Meanwhile, the seasonal characteristics still exist owing to the accumulation of water vapor and the irregular changes of cyclones in summer, as well as the accuracy limitation of ERA5S-ZTD and ERA5P-ZTD.
To investigate the advantage of the combination of ERA5 data using the LSSVM algorithm, the improvement rates are analyzed and discussed in detail. The overall improvement rate has 19.1% for the ZTD model with ERA5S-ZTD, and 1.6% for the ZTD model with ERA5P-ZTD. The LSSVM algorithm shows excellent performance in correcting systematic and periodic errors of ERA5S-ZTD, while the overall improvement effect is weak for the ZTD model with ERA5P-ZTD.
Moreover, we perform loop verification of each available IGS station to explore the reason for the lower improvement for the ZTD model with ERA5P-ZTD. The dependency of estimated ZTD on the station distribution and the output parameters D-ZTD ERA5P are discussed. On one hand, the improvement rate is positive for most stations located in the central region of the regional network, and the biggest monthly improvement rate can even reach about 40%. On the other hand, the improvement rate is negative for several stations. They are always located on the edge of the region, or near the coast. Additionally, they suffer from both the scare training samples and more drastic climate variability around them. Furthermore, a few stations are surrounded by a dense network of stations in the central region with a negative improvement rate, which is mainly caused by the lower similarity between the individual verified station and training stations. Overall, the two discussed dependencies are the common problem for the ZTD modeling, and it is important to refine the strategies of multisource combination based on the excellent machine learning algorithms in the near future.