Weighted Mean Temperature Hybrid Models in China Based on Artiﬁcial Neural Network Methods

: The weighted mean temperature ( T m ) is crucial for converting zenith wet delay to precipitable water vapor in global navigation satellite system meteorology. Mainstream T m models have the shortcomings of poor universality and severe local accuracy loss, and they cannot reﬂect the nonlinear relationship between T m and meteorological/spatiotemporal factors. Artiﬁcial neural network methods can effectively solve these problems. This study combines the advantages of the models that need in situ meteorological parameters and the empirical models to propose T m hybrid models based on artiﬁcial neural network methods. The veriﬁcation results showed that, compared with the Bevis, GPT3, and HGPT models, the root mean square errors ( RMSEs ) of the new three hybrid models were reduced by 35.3%/32.0%/31.6%, 40.8%/37.8%/37.4%, and 39.5%/36.4%/36.0%, respectively. The consistency of the new three hybrid models was more stable than the Bevis, GPT3, and HGPT models in terms of space and time. In addition, the three models occupy 99.6% less computer storage space than the GPT3 model, and the number of parameters was reduced by 99.2%. To better evaluate the improvement of hybrid models T m in the precipitable water vapor (PWV) retrieval, the PWVs calculated using the radiosonde T m and zenith wet delay (ZWD) were used as the reference. The RMSE of PWV derived from the best hybrid model’s T m and the radiosonde ZWD meets the demand for meteorological research and is improved by 33.9%, 36.4%, and 37.0% compared with that of Bevis, GPT3, and HGPT models, respectively. The hypothesis testing results further veriﬁed that these improvements are signiﬁcant. Therefore, these new models can be used for high-precision T m estimation in China, especially in Global Navigation Satellite System (GNSS) receivers without ample storage space.


Introduction
Water vapor is an essential component of the Earth's atmosphere and is crucial for global atmospheric radiation, water cycle, and energy balance [1,2]. Study on water vapor's spatial and temporal distribution is quite important in weather and climate forecasting. The GNSS signals propagating through the troposphere are delayed and bent due to the nonvacuum conditions, known as tropospheric delays [3][4][5]. The tropospheric delay is the zenith total delay (ZTD) multiplied by the tropospheric mapping function, and ZTD consists of zenith hydrostatic delay and zenith wet delay. The GNSS zenith wet delay can be converted into precipitable water vapor using Weighted Mean Temperature (T m ) [6]. Therefore, obtaining a high-precision T m is crucial for improving the accuracy of GNSS retrieving precipitable water vapor [6][7][8][9][10][11][12]. Previous studies have shown that GNSS-derived PWV is accurate and reliable, with RMSE of 1-3 mm [6,13].
An accurate way to obtain high-precision T m is to integrate the vertical profiles of temperature and humidity. However, these profiles are often challenging to obtain in practical work, and thus currently the T m models are usually used to calculate the T m . These models can be divided into two categories. The first category is the model that needs in situ meteorological parameters. Bevis et al. [6] analyzed the correlation between the surface temperature (T s ) and T m of radiosonde data in North America. They found an excellent linear correlation between T m and T s , thereby proposing a linear regression formula of T m (T m = a + bT s ). Since the model coefficients have significant seasonal and local characteristics, to obtain high-precision T m values in other regions, the coefficients of the Bevis formula need to be re-estimated from local radiosonde data [14,15]. Therefore, many scholars have established regional models for different areas [16][17][18]. These models have high accuracy; however, they are only applicable to certain areas and cannot reflect the delicate nonlinear relationship between T m and meteorological factors. Another model [19,20] is an empirical model driven by spatiotemporal information, which does without in situ meteorological parameters and can generate empirical T m for large area or even global locations. Leandro et al. [21] used relative humidity to replace the water vapor pressure in the parameter table of the UNB3 model and established the UNB3m model. The UNB3m model considers the annual cyclic variation of meteorological parameters, has a resolution of 15 latitudes, and can be used to calculate T m . Yao et al. [22] used the GGOS grid T m with a 6 h resolution to analyze the daily variation characteristics of T m , and they constructed a new global T m model (GTm-III) that considers the diurnal variation of T m . Subsequently, Böhm et al. [19] established the GPT2w model, which can provide vital tropospheric parameters such as T m and water vapor pressure with a horizontal resolution of up to 1 • × 1 • . Landskron et al. [23] proposed the GPT3 model, which incorporated meteorological parameter data directly from the GPT2w model; hence, T m in the GPT3 model is consistent with GPT2w. Sun et al. [24] used ERA5 data to establish a new model integrating tropospheric delay correction for GNSS positioning and T m calculation for GNSS meteorology. The model has a spatial resolution of 0.5 • × 0.5 • and a temporal resolution of 1 h. Mateus et al. [25] developed an hourly global pressure and temperature (HGPT) model based on the full spatial and temporal resolution of the new ERA5 reanalysis. The HGPT model can provide hourly surface pressure, surface air temperature, zenith hydrostatic delay, and weighted average temperature information with a spatial resolution of 0.25 • × 0.25 • . As the spatiotemporal factors considered by the empirical model increase, the models' overall accuracy gradually improves, but it also increases the number of model parameters. In addition, scholars have also developed many similar empirical models [20,26,27]. Although these empirical models are driven by spatiotemporal information and convenient for use and have strong universality, their accuracy is inferior to the model that needs in situ meteorological parameters mentioned above, especially in areas with relatively sparse weather stations and large terrain fluctuations. Still, it is difficult to reflect the delicate nonlinear relationship between T m and spatiotemporal factors; therefore, the accuracy of the T m model needs to be further improved. Artificial neural network methods have been widely used in various industries due to their nonlinear fitting ability [28][29][30][31].
In this study, we hope to establish several T m models with higher accuracy than the approved T m models, such as Bevis, GPT3, and HGPT, through artificial neural networks, and these models require fewer parameters. Consequently, the PWV converted by the T m from the new models can meet the requirements of meteorological research. To achieve this goal, we used artificial neural network methods to fit the relationship between high-precision radiosonde T m , empirical T m (provided by the UNB3m model with few parameters), meteorological parameters, and spatiotemporal factors, and then built high-precision T m models.

Study Area and Methods for Calculating T m 2.1. Study Area
The research area spans 70 • E-135 • E and 15 • N-55 • N, which covers the land of China and some surrounding countries and regions (see Figure 1). In this study, we accessed radiosonde data from the Integrated Global Radiosonde Archive (IGRA). Data from 150 radiosonde stations in the study area were obtained for the experiments. The distribution of the stations is indicated by the red triangles in Figure 1.

Study Area
The research area spans 70°E-135°E and 15°N-55°N, which covers the land of China and some surrounding countries and regions (see Figure 1). In this study, we accessed radiosonde data from the Integrated Global Radiosonde Archive (IGRA). Data from 150 radiosonde stations in the study area were obtained for the experiments. The distribution of the stations is indicated by the red triangles in Figure 1.

Calculation of Tm Based on Radiosonde Data
Radiosonde data are important meteorological observation data. The IGRA has provided high-quality sounding observations from more than 1500 radiosondes and sounding balloons worldwide since the 1960s and launches radiosonde twice daily at 00:00 and 12:00 UTC. This study used data from 150 radiosonde stations (2007-2016) to calculate Tm according to the following formula [6]. It is worth mentioning that during long-term continuous observations, the radiosonde data may have discontinuities and outliers. Since these values will affect the results of the new model establishment, this study used the interquartile range (IQR) method to remove outliers in the long-term series of radiosonde data [32].
where is the absolute temperature (K). is the water vapor pressure (hPa), which is calculated from the relative humidity (RH) using Equations (2) and (3)  (2) where es is the saturated vapor pressure (hPa) and Td is the dew point temperature (°C).  Radiosonde data are important meteorological observation data. The IGRA has provided high-quality sounding observations from more than 1500 radiosondes and sounding balloons worldwide since the 1960s and launches radiosonde twice daily at 00:00 and 12:00 UTC. This study used data from 150 radiosonde stations (2007-2016) to calculate T m according to the following formula [6]. It is worth mentioning that during long-term continuous observations, the radiosonde data may have discontinuities and outliers. Since these values will affect the results of the new model establishment, this study used the interquartile range (IQR) method to remove outliers in the long-term series of radiosonde data [32].
where T is the absolute temperature (K). e is the water vapor pressure (hPa), which is calculated from the relative humidity (RH) using Equations (2) and (3) [6,10]: where e s is the saturated vapor pressure (hPa) and T d is the dew point temperature ( • C).

Calculating T m Based on the UNB3m Model
The UNB3m model is based on a look-up table with annual mean and amplitude for temperature, pressure, temperature lapse rate, and water vapor pressure height factor. These parameters are calculated for a particular latitude and day of year using a cosine Remote Sens. 2022, 14, 3762 4 of 22 function for the annual variation and a linear interpolation for latitude. The annual average value of the meteorological parameters is calculated using the following formula: where ϕ is the latitude of the target point ( • ), AVG ϕ is the annual mean value, i is the latitude band closest to the target location that has a smaller value, and LAT i is the value of the corresponding latitude band. The formula for calculating the annual cycle amplitude is as follows: where AMP ϕ denotes the annual cycle amplitude. The meteorological parameter values at a specified time at the target point can be calculated by entering the mean yearly value and annual periodic amplitude value of the target parameter into the trigonometric function: where X ϕ,doy is the annual periodic amplitude value, doy is the day of the year, and doy 0 specifies the initial phase of the regular change, which is 28 in the Northern Hemisphere and 211 in the Southern Hemisphere. The UNB3m model can calculate T m according to the following formula: where T 0 , β, and λ are the meteorological parameters calculated according to procedures (4)- (6), which are temperature (K), temperature lapse rate (K/m), and water vapor pressure height factor, respectively; H is the orthometric height (m); R is the gas constant for dry air (287.054 J/kg/K); and g m is the acceleration of gravity at the atmospheric column centroid (m/s 2 ), which can be expressed as The above model has few parameters and is convenient and straightforward to use; however, the UNB3m model only considers the variation of T m with latitude, and his grid is too sparse, so the accuracy is limited [33].

Calculating T m Based on the GPT3 Model
The global pressure and temperature 3 (GPT3) model proposed by Landskron and Bohm is the latest version of the GPT series [23]. The meteorological parameters of the GPT3 model are the same as that of GPT2w with admirable performance. It is one of the most widely used models [10,31]. GPT3 characterizes T m seasonal variations based on the following Equation (9) and takes into account their geographical variations by 1 • × 1 • or 5 • × 5 • grids. where r(t) is the meteorological parameters to be estimated; t denotes the day of year; A 0 represents its mean value; and (A 1 , B 1 ) and (A 2 , B 2 ) are their annual and semiannual amplitudes, respectively.

Construction of Hybrid Model
In this study, we use Backpropagation neural network (BPNN), Random Forest (RF), and Generalized Regression Neural Network (GRNN) in the neural network toolbox of MATLAB software to optimize UNB3m T m by using high-precision radiosonde T m data, then construct three hybrid models.

BPNN
The BPNN, first proposed by Rumelhart et al. [34], is one of the most widely used ANNs. The network adopts the gradient descent method to minimize the differences between the network output and target output [35].
The BPNN consists of an input layer, a hidden layer, and an output layer. The number of neurons in the input layer of the BPNN is equal to the number of input variables, and the number of neurons in the output layer is equal to the number of output variables. Almost every bounded continuous function can be approximated with an arbitrarily minor error using a neural network with a single hidden layer [36]. Therefore, in this study, we chose the number of hidden layers as a single layer and determined the number of neurons in the hidden layer in Section 3.3.
Because the empirical model optimized in this study is the UNB3m model, UNB3m T m must be used as an essential input for the artificial neural network model. UNB3m T m is established considering the variations in the latitude, height, and day of year, so there must be a strong correlation between UNB3m T m and latitude, height, and day of year. Simultaneously, some studies [4,19,24] found that there exist long-term interannual variations and diurnal variations in T m , so we take year, hour of day (hod), latitude (lat), height, and day of year (doy) as input variables.
It is well known that the surface temperature (T s ) and water vapor pressure (e s ) have strong relationships with the observation T m [6,11], and the T m models based on in situ T s /e s observations can reach higher precision than empirical models. Thus, the T s /e s were employed as inputs for the new model. Li et al. [5] suggested that UNB3m often causes great prediction tropospheric delay biases in some areas because it is only based on latitude-label meteorological parameters that are also needed for calculating T m . Moreover, many T m models [11,19,25], considering the longitude variations, have excellent performance. Therefore, we also took longitude as an input variable. It should be noted that the widespread distribution of surface meteorological observation facilities, T s and e s , can be obtained in real-time in most regions [37]. Therefore, it is not difficult to obtain the temperature and water vapor pressure to support the operation of the new model, and establishing the T m model based on meteorological parameters has a good potential for real-time application. Table 1 shows all the input parameters and the output parameter. In addition, when training the model, the radiosonde-derived T m was loaded into the output layer, and the output layer outputs the corrected T m value when used. The structure of the BPNN T m model is shown in Figure 2.   The process of training the BPNN includes forward and backpropagation. Each neuron in each layer of the BPNN model is directly connected to the neurons in the next layer and had an activation function. This study used the hyperbolic tangent function to activate the input and hidden layer neurons. A linear function was used to activate the neurons of the hidden and output layers. The equations are represented as: Then the final output of BPNN can be expressed as: where , and , represent the weight matrix, and represent the bias matrix. These four matrices store the coefficients of the BPNN model. X and Y are the input and output variables, respectively.

RF
Breiman and Cutler first proposed a random forest in 2001 [38]. RF is an ensemble learning method used for classification and regression. RF works by constructing many decision trees during training and then outputting the mode of the classes or the average prediction of the individual trees [38]. RF has the advantages of fast training speed and handling complex nonlinear relationships between the input and output variables. The The process of training the BPNN includes forward and backpropagation. Each neuron in each layer of the BPNN model is directly connected to the neurons in the next layer and had an activation function. This study used the hyperbolic tangent function to activate the input and hidden layer neurons. A linear function was used to activate the neurons of the hidden and output layers. The equations are represented as: Then the final output of BPNN can be expressed as: where W 2,1 and W 3,2 represent the weight matrix, b 1 and b 2 represent the bias matrix. These four matrices store the coefficients of the BPNN model. X and Y are the input and output variables, respectively.

RF
Breiman and Cutler first proposed a random forest in 2001 [38]. RF is an ensemble learning method used for classification and regression. RF works by constructing many decision trees during training and then outputting the mode of the classes or the average prediction of the individual trees [38]. RF has the advantages of fast training speed and handling complex nonlinear relationships between the input and output variables. The structure of the RF T m model is shown in Figure 3. The input data included time (year, day of year, and hour of day), location (latitude, longitude, and height), e s , T s , and UNB3m-T m . In addition, the output data were radiosonde-derived T m or the corrected T m . Because overfitting can occur with a single decision tree, RF overcomes this problem by introducing randomness into each decision tree and averaging the results. The result of the model was the mean of the consequences of all constructed decisions, as shown in the following formula: where X represents the input variable, Y is the final RF output value, T b denotes the output value of each decision tree, and B represents the number of decision trees. The number of decision trees directly affected the accuracy of the RF model. Because the number of decision trees must be optimally selected, the enumeration method is generally used. The specific number value is determined in Section 3.3.
Remote Sens. 2022, 14, x FOR PEER REVIEW 7 of 23 structure of the RF Tm model is shown in Figure 3. The input data included time (year, day of year, and hour of day), location (latitude, longitude, and height), es, Ts, and UNB3m-Tm. In addition, the output data were radiosonde-derived Tm or the corrected Tm. Because overfitting can occur with a single decision tree, RF overcomes this problem by introducing randomness into each decision tree and averaging the results. The result of the model was the mean of the consequences of all constructed decisions, as shown in the following formula: where represents the input variable, is the final RF output value, denotes the output value of each decision tree, and represents the number of decision trees. The number of decision trees directly affected the accuracy of the RF model. Because the number of decision trees must be optimally selected, the enumeration method is generally used. The specific number value is determined in Section 3.3.

GRNN
Specht first proposed GRNN in 1991 [39]. The GRNN neural network is a radial basis function network based on mathematical statistics. The GRNN has a strong nonlinear mapping ability and learning speed, and the network can also handle unstable data. The GRNN consists of four layers: the input, pattern, summation, and output layers. The number of neurons in the input layer corresponds to the number of input variables; nine neurons correspond to the time information (year, day of year, and hour of day), location information (latitude, longitude, and height), es, Ts and UNB3m-Tm. The number of neurons in the output layer corresponds to the number of output variables. In this experiment, the output layer had only one neuron corresponding to the radiosonde Tm or

GRNN
Specht first proposed GRNN in 1991 [39]. The GRNN neural network is a radial basis function network based on mathematical statistics. The GRNN has a strong nonlinear mapping ability and learning speed, and the network can also handle unstable data. The GRNN consists of four layers: the input, pattern, summation, and output layers.
where p i is the output of the i neuron in the pattern layer, represented by the exponential function of the square of the Euclidean distance between the input variable X i (the i-th learning sample) and its corresponding test sample. σ represents the spread parameter, the only unknown parameter in the network, and needs to be set first. It is necessary to determine the optimal spread parameter as a σ, which, when too large, may make the estimation very smooth or, when too small, may result in a value too close to the sample value [39]. The specific value is defined in Section 3.3. The structure of the GRNN T m model is shown in Figure 4.
4, x FOR PEER REVIEW 8 of 23 the model-corrected Tm. The summation layer includes two types of summation neurons, which perform arithmetic summation and weighted summation of the output values of the pattern layer. The number of neurons in the pattern layer corresponds to the number of training samples, and the transmission of neurons can be expressed as: where is the output of the neuron in the pattern layer, represented by the exponential function of the square of the Euclidean distance between the input variable (the -th learning sample) and its corresponding test sample. represents the spread parameter, the only unknown parameter in the network, and needs to be set first. It is necessary to determine the optimal spread parameter as a , which, when too large, may make the estimation very smooth or, when too small, may result in a value too close to the sample value [39]. The specific value is defined in Section 3.3. The structure of the GRNN Tm model is shown in Figure 4.

Evaluation Indicators Adopted by the Model
This experiment adopted a 10-fold cross-validation method to evaluate the different artificial neural network models [40]. The basic principle of the 10-fold cross-validation technique is to randomly split the dataset into 10 groups and then select 9 groups as the training set and 1 group as the test set. This process was repeated 10 times, so each part of the dataset was tested once, trained 9 times, and all residuals were computed and saved. Note that it can ensure that more data is involved in the training so that the results are closer to the accuracy of the final model and can also prevent overfitting. We calculated five statistical indicators based on these residuals to evaluate model performance. These indicators are the bias, mean absolute error (MAE), standard deviation (STD), root mean square error (RMSE), and Pearson correlation coefficient (R). The formulas of these indicators are as follows:

Evaluation Indicators Adopted by the Model
This experiment adopted a 10-fold cross-validation method to evaluate the different artificial neural network models [40]. The basic principle of the 10-fold cross-validation technique is to randomly split the dataset into 10 groups and then select 9 groups as the training set and 1 group as the test set. This process was repeated 10 times, so each part of the dataset was tested once, trained 9 times, and all residuals were computed and saved. Note that it can ensure that more data is involved in the training so that the results are closer to the accuracy of the final model and can also prevent overfitting. We calculated five statistical indicators based on these residuals to evaluate model performance. These indicators are the bias, mean absolute error (MAE), standard deviation (STD), root Remote Sens. 2022, 14, 3762 9 of 22 mean square error (RMSE), and Pearson correlation coefficient (R). The formulas of these indicators are as follows: where N is the number of samples, T hm k,i is the T m value output by the hybrid models, T rs k,i is the T m values derived from the radiosonde data, and T hm k,i , T rs k,i are the mean values of T hm k,i and T rs k,i , respectively. After verifying the accuracy and reliability of the hybrid models using 10-fold crossvalidation, we fitted all samples to generate the final model for subsequent T m prediction. When users want to use the hybrid models to calculate T m , they only need to collect T s and e s and then input them with the time and location information into the models' code.

Parameter Determination
In this experiment, the number of neurons in the BPNN hidden layer, the number of RF decision trees, and the GRNN spread value are the parameters that need to be set first. We followed Sun et al. [41] to set up the crucial parameters. For the BPNN, the optimal number of neurons in the hidden layer can be selected in the range of 2 √ n + µ to 2n + 1 (n is the number of neurons in the input layer, µ is the number of neurons in the output layer) [42,43]; therefore, the experiment used a 10-fold cross-validation technique to test the BPNN models with 7 to 19 hidden layer neurons. For RF, we set the number of decision trees between 5 and 95 with a step size of 10 and then used 10-fold cross-validation to test the performance of RF models with different numbers of decision trees. For GRNN, the spread value used is usually in the range of 0.01 to 1 [28]. After many tests, we found that the optimal spread value was between 0 and 0.1; therefore, this experiment narrowed the selection range from 0.01 to 0.1, with a step size of 0.01. Similarly, we used a 10-fold cross-validation technique to test the performance of GRNN models with different spread values. Finally, the root mean square errors (RMSE) calculated from the cross-validation residuals were used to evaluate the performance of the other models with different parameter settings. The results are shown in Figure 5. In the BPNN model, the RMSE gradually decreased as the number of neurons in the hidden layer increased. This decline stopped when the number of neurons reaches 18. Therefore, we can set 18 or 19 as the number of hidden layer neurons. When the neuron number changed from 18 to 19, we found that the RMSE was equal, but a very long training time was required. Therefore, we set the number of neurons in the hidden layer to 18. In the RF model, when the number of decision trees increased from 5 to 55, the RMSE decreased significantly. But this decrease closed out after the number of decision trees exceeded 55. Thus, we set the number of decision trees to 55 to reduce the complexity of the model. In the GRNN model, when the spread value increased from 0 to 0.06, the RMSE gradually decreased; however, after the spread value exceeded 0.06, the RMSE gradually increased. Therefore, we finally chose 0.06 as the spread value.

Performance Analyses of Hybrid Models
In this study, the three hybrid models constructed using the three artificial neural network methods of BPNN, RF, and GRNN are named hybrid model 1, hybrid model 2, and hybrid model 3 (referred to as hm1, hm2, and hm3, respectively). The Bevis model ( = 70.2 + 0.72 ) needs in situ meteorological parameters as input and is commonly used internationally. The GPT3 model is an empirical model that is widely used, and it is a grid model that also can reflect the characteristics of Tm in a certain area. Its accuracy performance in China can be used to represent mainstream empirical models. The HGPT model [25] is a recently released Tm model with open source code. Therefore, the Bevis, GPT3 (1° × 1°), and HGPT models were chosen to compare with the new models.

Overall Performance
After determining the parameters of the three artificial neural network methods, the experiment selected 936,034 samples from 2007 to 2016 for training and obtained the corresponding models and results. The cross-validation and model fitting accuracy results are listed in Table 2. Note that since the 10-fold validation uses all the data to verify the model's accuracy, the validation results here span from 2007 to 2016. Scatter plots between the Tm values obtained from different models and the Tm values derived from the radiosonde data are shown in Figures 6 and 7. In Figures 6 and 7, the color indicates data density. In the BPNN model, the RMSE gradually decreased as the number of neurons in the hidden layer increased. This decline stopped when the number of neurons reaches 18. Therefore, we can set 18 or 19 as the number of hidden layer neurons. When the neuron number changed from 18 to 19, we found that the RMSE was equal, but a very long training time was required. Therefore, we set the number of neurons in the hidden layer to 18. In the RF model, when the number of decision trees increased from 5 to 55, the RMSE decreased significantly. But this decrease closed out after the number of decision trees exceeded 55. Thus, we set the number of decision trees to 55 to reduce the complexity of the model. In the GRNN model, when the spread value increased from 0 to 0.06, the RMSE gradually decreased; however, after the spread value exceeded 0.06, the RMSE gradually increased. Therefore, we finally chose 0.06 as the spread value.

Performance Analyses of Hybrid Models
In this study, the three hybrid models constructed using the three artificial neural network methods of BPNN, RF, and GRNN are named hybrid model 1, hybrid model 2, and hybrid model 3 (referred to as hm1, hm2, and hm3, respectively). The Bevis model (T m = 70.2 + 0.72T s ) needs in situ meteorological parameters as input and is commonly used internationally. The GPT3 model is an empirical model that is widely used, and it is a grid model that also can reflect the characteristics of T m in a certain area. Its accuracy performance in China can be used to represent mainstream empirical models. The HGPT model [25] is a recently released T m model with open source code. Therefore, the Bevis, GPT3 (1 • × 1 • ), and HGPT models were chosen to compare with the new models.

Overall Performance
After determining the parameters of the three artificial neural network methods, the experiment selected 936,034 samples from 2007 to 2016 for training and obtained the corresponding models and results. The cross-validation and model fitting accuracy results are listed in Table 2. Note that since the 10-fold validation uses all the data to verify the model's accuracy, the validation results here span from 2007 to 2016. Scatter plots between the T m values obtained from different models and the T m values derived from the radiosonde data are shown in Figures 6 and 7. In Figures 6 and 7, the color indicates data density. Table 2. Relative radiosonde data, the accuracy evaluation results of different models.

Model
Hyperparameter Bias (K)      Table 2 shows that the biases of the UNB3m model, the Bevis model, and the GPT3 model are −1.97 K, 0.80 K, and −0.48 K, respectively. The results indicate that there are systematic biases in the three models. However, the biases of the three hybrid models are all close to zero, which suggests that the artificial neural network method successfully corrects systematic biases. Because the artificial neural network methods successfully removed systematic biases, the STDs and RMSEs were nearly equal. After being corrected by the three hybrid models, the RMSEs are reduced to 2.954 K, 2.703 K, and 2.763 K, which are 73.1%, 75.3%, and 74.8% lower than the UNB3m model, respectively. By using the analysis of variance (ANOVA), it is further demonstrated that the Tm of the three hybrid models is not significantly different from the radiosonde Tm (p > 0.05). Meanwhile, relative to the Bevis model, the RMSEs were reduced by 35.3%, 40.8%, and 39.5%; for the GPT3 model, the RMSEs were reduced by 32.0%, 37.8%, and 36.4%; for the HGPT model, the RMSEs were reduced by 31.6%, 37.4%, and 36.0% respectively. After improving the three artificial neural network methods, the correlation coefficient R increased from 0.540 to 0.969, 0.974, and 0.973. The above analysis shows that all three artificial neural network methods significantly improved the accuracy of the UNB3m model in calculating Tm. These improvements may be accounted for by the strong ability of artificial neural network methods to fit complex nonlinear relationships. (The discussion will be described in the following paragraph.) In addition, when comparing the three hybrid models, we can see that the accuracy validation result of hm2 is smaller than those of hm1 and hm3. Therefore, hm2 exhibited a more stable performance.

Model Hyperparameter Bias (K) MAE (K) STD (K) RMSE (K) R
To illustrate whether the improvement in accuracy of the three hybrid models comes from the data source or the method, we developed a linear model named LS model, which has the same input parameters as the three hybrid models using the least-squares method. The LS model is based on the same modeling data set as the hybrid model, and the specific formula is as follows:  Table 2 shows that the biases of the UNB3m model, the Bevis model, and the GPT3 model are −1.97 K, 0.80 K, and −0.48 K, respectively. The results indicate that there are systematic biases in the three models. However, the biases of the three hybrid models are all close to zero, which suggests that the artificial neural network method successfully corrects systematic biases. Because the artificial neural network methods successfully removed systematic biases, the STDs and RMSEs were nearly equal. After being corrected by the three hybrid models, the RMSEs are reduced to 2.954 K, 2.703 K, and 2.763 K, which are 73.1%, 75.3%, and 74.8% lower than the UNB3m model, respectively. By using the analysis of variance (ANOVA), it is further demonstrated that the T m of the three hybrid models is not significantly different from the radiosonde T m (p > 0.05). Meanwhile, relative to the Bevis model, the RMSEs were reduced by 35.3%, 40.8%, and 39.5%; for the GPT3 model, the RMSEs were reduced by 32.0%, 37.8%, and 36.4%; for the HGPT model, the RMSEs were reduced by 31.6%, 37.4%, and 36.0% respectively. After improving the three artificial neural network methods, the correlation coefficient R increased from 0.540 to 0.969, 0.974, and 0.973. The above analysis shows that all three artificial neural network methods significantly improved the accuracy of the UNB3m model in calculating T m . These improvements may be accounted for by the strong ability of artificial neural network methods to fit complex nonlinear relationships. (The discussion will be described in the following paragraph.) In addition, when comparing the three hybrid models, we can see that the accuracy validation result of hm2 is smaller than those of hm1 and hm3. Therefore, hm2 exhibited a more stable performance.
To illustrate whether the improvement in accuracy of the three hybrid models comes from the data source or the method, we developed a linear model named LS model, which has the same input parameters as the three hybrid models using the least-squares method. The LS model is based on the same modeling data set as the hybrid model, and the specific formula is as follows: Then, the accuracy of the hybrid models, LS model, Bevis model, GPT3 model, and HGPT model are compared. The results are shown in Table 3.  Table 3 shows that the RMSE of the LS model has been improved to different degrees when compared with that of the Bevis, GPT3, and HGPT models, which should be due to the use of the T s , e s , and T m data in the study area to fit the relationship between them. Using the same data source, the RMSE of the three hybrid models increased by 13.2%, 23.7%, and 21.0% compared with the LS model. This improvement should be due to the method that can fit the nonlinear relationship between different parameters.

Spatiotemporal Performance of the Hybrid Models
In this section, we calculate the RMSEs of 150 radiosonde stations to analyze the spatial performance of the hybrid models. Figure 8 shows the specific values and distributions. Figure 9 shows the frequencies of the RMSEs for each interval. The numbers over the bars represent the number of stations within the corresponding RMSE range. Figures 8 and 9 also include the RMSEs of the UNB3m, Bevis, GPT3, and HGPT models at each radiosonde station.       Figure 8, regardless of the model used, low latitudes show a smaller RMSE and high latitudes show a larger RMSE, which is consistent with the results of Sun et al. [24]. This phenomenon occurs because the seasonal variation at high latitudes is more substantial than that at low latitudes, which increases the difficulty of T m modeling and eventually leads to larger RMSEs at high latitudes. At the same time, it can be seen in the figure that the RMSEs of the UNB3m model exhibit apparent differences in different latitude ranges. In contrast, the three hybrid models are uniformly distributed and stable. The RMSEs of the Bevis, GPT3, and HGPT models are much larger than those of the three hybrid models. Therefore, the method proposed in this study improves the accuracy of the model. Further, it makes the model accuracy more evenly distributed in space, owing to the introduction of geographic information into the model input layer. Figure 9 shows that the RMSEs of the three hybrid models are much smaller than the UNB3m model. Simultaneously, compared with the RMSEs of the Bevis model, the GPT3 model, and the HGPT model, the hybrid models also have a significant improvement. The number of sites with an RMSE value smaller than 3.0 K is 0 for the UNB3m model, 74 for the hybrid model 1, 101 for the hybrid model 2, 89 for the hybrid model 3, 22 for the Bevis model, 20 for the GPT3 model, and 21 for the HGPT model. The above results show that the method proposed in this study significantly improves the model's accuracy, and hm2 performs best in all models.

As shown in
Because latitude is an essential factor affecting T m [6], we divided the study area into eight latitude bands with intervals of 5 • to compare and analyze the performance of each model in different latitude bands. The results are shown in Figure 10.
the method proposed in this study significantly improves the model's accuracy, and hm2 performs best in all models.
Because latitude is an essential factor affecting Tm [6], we divided the study area into eight latitude bands with intervals of 5° to compare and analyze the performance of each model in different latitude bands. The results are shown in Figure 10. The biases of the three hybrid models at different latitudes are all close to zero, which indicates that the artificial neural network method has a noticeable effect on correcting the systematic error of the UNB3m model ( Figure 10). The stability of the hybrid models at different latitudes is better than that of the Bevis, GPT3, and HGPT models ( Figure 10). The RMSEs of all models are generally smaller at low latitudes and larger at high latitudes and show an increasing trend with increasing latitude, which is consistent with the results presented in Figure 8. Regardless of the latitude band, the RMSEs of the hybrid models were much lower than those of the Bevis, GPT3, and HGPT models, indicating that the three hybrid models significantly improved the accuracy of Tm. Furthermore, compared with the Bevis, GPT3, and HGPT models, the RMSEs of the three hybrid models are all within 2 to 3.6 K in different latitude bands, and the variation between various bands is slight and uniform, showing more notable advantages. Comparing the three hybrid models, the RMSEs of hm2 at different latitudes are slightly lower than those of the other two models. This result means that hm2 outperforms other models.
Tm is greatly affected by station altitude, and altitude differences may lead to uncertainties of Tm model accuracy [6]. Therefore, we divided the height into eight layers with an interval of 0.5 km and calculated RMSEs for each height layer. The results of the analysis are presented in Figure 11. The biases of the three hybrid models at different latitudes are all close to zero, which indicates that the artificial neural network method has a noticeable effect on correcting the systematic error of the UNB3m model ( Figure 10). The stability of the hybrid models at different latitudes is better than that of the Bevis, GPT3, and HGPT models ( Figure 10). The RMSEs of all models are generally smaller at low latitudes and larger at high latitudes and show an increasing trend with increasing latitude, which is consistent with the results presented in Figure 8. Regardless of the latitude band, the RMSEs of the hybrid models were much lower than those of the Bevis, GPT3, and HGPT models, indicating that the three hybrid models significantly improved the accuracy of T m . Furthermore, compared with the Bevis, GPT3, and HGPT models, the RMSEs of the three hybrid models are all within 2 to 3.6 K in different latitude bands, and the variation between various bands is slight and uniform, showing more notable advantages. Comparing the three hybrid models, the RMSEs of hm2 at different latitudes are slightly lower than those of the other two models. This result means that hm2 outperforms other models.
T m is greatly affected by station altitude, and altitude differences may lead to uncertainties of T m model accuracy [6]. Therefore, we divided the height into eight layers with an interval of 0.5 km and calculated RMSEs for each height layer. The results of the analysis are presented in Figure 11. Remote Sens. 2022, 14, x FOR PEER REVIEW 16 of 23 Figure 11. Biases and RMSEs of each model at different heights. Figure 11 shows that the biases of the three hybrid models tend to be stable and remain near 0, indicating that the accuracies of the hybrid models at different heights are much better than the UNB3m model, Bevis model, GPT3 model, and HGPT model. The RMSEs of the hybrid models obtained after optimization by the artificial neural network methods are 30% lower, relative to the UNB3m model. The RMSEs of the three hybrid models decrease with the increase in altitude and are lower than that of the Bevis, GPT3, and HGPT models. Furthermore, most of the RMSEs of the three hybrid models were below 3 K and were much smaller than those of the UNB3m, Bevis, GPT3, and HGPT models. This result shows that the accuracies of the hybrid models are higher than those of the Bevis, GPT3, and HGPT models and have a more stable accuracy in the vertical direction. 4.3. The hybrid models performance in different seasons Because Tm has prominent seasonal characteristics [6], we calculated the biases and RMSEs for the four seasons to analyze the temporal variation of the performance of each model, as shown in Figure 12.  Figure 11 shows that the biases of the three hybrid models tend to be stable and remain near 0, indicating that the accuracies of the hybrid models at different heights are much better than the UNB3m model, Bevis model, GPT3 model, and HGPT model. The RMSEs of the hybrid models obtained after optimization by the artificial neural network methods are 30% lower, relative to the UNB3m model. The RMSEs of the three hybrid models decrease with the increase in altitude and are lower than that of the Bevis, GPT3, and HGPT models. Furthermore, most of the RMSEs of the three hybrid models were below 3 K and were much smaller than those of the UNB3m, Bevis, GPT3, and HGPT models. This result shows that the accuracies of the hybrid models are higher than those of the Bevis, GPT3, and HGPT models and have a more stable accuracy in the vertical direction.
Because T m has prominent seasonal characteristics [6], we calculated the biases and RMSEs for the four seasons to analyze the temporal variation of the performance of each model, as shown in Figure 12. Figure 12 shows that the UNB3m model shows poor accuracy in all seasons, especially in summer. However, after the correction by the artificial neural network methods, the accuracy in each season has been greatly improved. The RMSE in spring, autumn, and winter dropped to more than 50% of the UNB3m model, and the RMSE in summer directly decreased to more than 80%. In addition, the RMSEs of the hybrid models in each period are smaller than that of the Bevis, GPT3, and HGPT models, indicating the superiority of the hybrid model. The hybrid models have higher and more uniform accuracy in time. When comparing the three hybrid models, we can see that the RMSEs of hm2 are smaller than those of hm1 and hm3. Therefore, hm2 performs best in each season. Remote Sens. 2022, 14, x FOR PEER REVIEW 17 of 23  Figure 12 shows that the UNB3m model shows poor accuracy in all seasons, especially in summer. However, after the correction by the artificial neural network methods, the accuracy in each season has been greatly improved. The RMSE in spring, autumn, and winter dropped to more than 50% of the UNB3m model, and the RMSE in summer directly decreased to more than 80%. In addition, the RMSEs of the hybrid models in each period are smaller than that of the Bevis, GPT3, and HGPT models, indicating the superiority of the hybrid model. The hybrid models have higher and more uniform accuracy in time. When comparing the three hybrid models, we can see that the RMSEs of hm2 are smaller than those of hm1 and hm3. Therefore, hm2 performs best in each season.

Occupancy of Hybrid Models
We also compared the computer storage space and the number of parameters for each model, and the results are presented in Table 4.  Table 4 indicates that the computer storage space occupied by the three hybrid models is small and is reduced by 99.6% compared to that occupied by the GPT3 model. Compared with GPT3 model, the number of parameters was reduced by 99.2%. Therefore, the new model has a tremendous advantage in storage.

Occupancy of Hybrid Models
We also compared the computer storage space and the number of parameters for each model, and the results are presented in Table 4.  Table 4 indicates that the computer storage space occupied by the three hybrid models is small and is reduced by 99.6% compared to that occupied by the GPT3 model. Compared with GPT3 model, the number of parameters was reduced by 99.2%. Therefore, the new model has a tremendous advantage in storage.

Applications of Hybrid Models in Retrieving PWV
From the performance discussions of the three hybrid models, we recommend hybrid model 2 (Hm2). The formula for calculating PWV by combining ZWD and T m is as follows: where ρ w is the density of water and R v is the specific gas constant.
To evaluate the effect of error in T m on its synthesized PWV, a commonly used quantity is the relative error in PWVs calculated using the following formula: where σ pwv is the error in PWV caused by the error in T m and σ T m . Since k 2 /k 3 ≈ 5.9 × 10 −5 K −1 and T m is in the range from 220 K to 310 K in our experiment, Equation (23) can be simplified to [22,44] Assuming that there is no error in the value of ZWD, if the error of T m is small, this will improve the converting accuracy of ZWD to PWV. Therefore, our established hybrid model will indirectly enhance the accuracy of PWV.
To further illustrate the improved accuracy of the hybrid model for PWV inversion, we selected four stations for analysis, and the station information is shown in Table 5. The PWVs calculated from the ZWD and the T m provided by the four radiosonde stations in 2016 were employed as a reference to validate the PWV mapped from radiosonde ZWD using the hybrid model 2/Bevis/GPT3/HGPT model. The results are shown in Figure 13, Tables 6 and 7.    Figure 13. The PWV converted by the Bevis, GPT3, HGPT, and Hm2 models at four radiosonde stations. As shown in Figure 13, the PWV calculated by the hybrid model agrees better with the reference value, which shows that the hybrid model outperforms the Bevis, GPT3, and HGPT models in retrieving the PWV. The RMSEs of PWV derived from the Hm2 model are all less than 1 mm; the accuracy is very appreciable in weather research because the required RMS is 3 mm [45]. Tables 6 and 7 show that compared with the Bevis model, the retrieving PWV accuracy of the hybrid model is greatly improved, and its MAE is 96.5% lower on average than the Bevis model. The RMSE is 89.8% lower on average. The accuracy of the hybrid model in retrieving PWV has been dramatically improved in comparison with the GPT3 model and the HGPT model. Simultaneously, the MAE of the hybrid model decreased by 103.7% and 107.3% on average, and RMSE decreased by 96.9% and 102.7%, respectively. These proved that the hybrid model is superior to the Bevis, GPT3, and HGPT models in retrieving PWV.
To demonstrate the outperformance of the Hm2 for retrieving PWV more intuitively, Figure 14 gives RMSEs reduction at all radiosonde stations. Figure 14 illustrates that the Hm2 model shows smaller RMSE than other models at most stations. The mean RMSE reduction of the Hm2 model reaches up to 33.9% in comparison with Bevis,36.4% in comparison with GPT3, and 37.0% in comparison with HGPT. It is interesting to note that the RMSE of the Hm2 PWV is reduced at all stations in China Mainland. Overall, the Hm2 model performs the best in mapping ZWD onto PWV.
To further verify the Hm2 has a significant improvement in accuracy compared to the comparison model, the RMSE of each model at each site was used as a sample for hypothesis testing. Set the null hypothesis H0: there is no significant difference between the RMSE of the comparison model at each site (sample X1) and the RMSE of the Hm2 model at each site (sample X2). Alternative hypothesis H1: there is a significant difference between X1 and X2. Due to the large sample size, the z-test [46] was used. Set the left-side confidence level to a = 0.05. Calculating the statistic z and looking up the table to get the corresponding p value, the result is shown in Table 8. By comparing the statistic z and the p value (p (|Z| > 1.64) = 0.05), we can see that the absolute value of the z of the three pairs of samples is larger than 1.64 in all cases, with the corresponding p less than 0.05. Therefore, the null hypothesis is finally rejected, indicating significant differences among the Hm2 RMSEs and other model's RMSEs. This result suggests that the Hm2 RMSE is improved significantly compared to that of other models at the 95% confidence level.
To demonstrate the outperformance of the Hm2 for retrieving PWV more intuitively, Figure 14 gives RMSEs reduction at all radiosonde stations. Figure 14 illustrates that the Hm2 model shows smaller RMSE than other models at most stations. The mean RMSE reduction of the Hm2 model reaches up to 33.9% in comparison with Bevis,36.4% in comparison with GPT3, and 37.0% in comparison with HGPT. It is interesting to note that the RMSE of the Hm2 PWV is reduced at all stations in China Mainland. Overall, the Hm2 model performs the best in mapping ZWD onto PWV. To further verify the Hm2 has a significant improvement in accuracy compared to the comparison model, the RMSE of each model at each site was used as a sample for hypothesis testing. Set the null hypothesis H0: there is no significant difference between the RMSE of the comparison model at each site (sample X1) and the RMSE of the Hm2 model at each site (sample X2). Alternative hypothesis H1: there is a significant difference between X1 and X2. Due to the large sample size, the z-test [46] was used. Set the left-side confidence level to a = 0.05. Calculating the statistic z and looking up the table to get the corresponding p value, the result is shown in Table 8. By comparing the statistic z and the p value (p (|Z| > 1.64) = 0.05), we can see that the absolute value of the z of the three pairs of samples is larger than 1.64 in all cases, with the corresponding p less than 0.05. Therefore, the null hypothesis is finally rejected, indicating significant differences among the Hm2 RMSEs and other model's RMSEs. This result suggests that the Hm2 RMSE is improved significantly compared to that of other models at the 95% confidence level.

Conclusions
To overcome the drawbacks of existing Tm models with poor universality, profound local accuracy loss, and difficulty in reflecting the nonlinear relationship between Tm and meteorological parameters, this study used artificial neural network methods for

Conclusions
To overcome the drawbacks of existing T m models with poor universality, profound local accuracy loss, and difficulty in reflecting the nonlinear relationship between T m and meteorological parameters, this study used artificial neural network methods for constructing T m hybrid models in China. Validated with radiosonde T m , the accuracies of the three hybrid models were 2.954 K, 2.703 K, and 2.763 K in terms of RMSE. In view of RMSE, compared with the UNB3m model, the accuracies were improved by 73.1%, 75.3%, and 74.8%; for the Bevis model, accuracies were optimized by 35.3%, 40.8%, and 39.5%; and for the GPT3 model, accuracies were improved by 32.0%, 37.8%, and 36.4%, respectively. Moreover, the hybrid models effectively weakened the spatiotemporal variation in the accuracy of the UNB3m model and achieved higher and more uniform accuracy in space and time. Among the three hybrid models, hm2 exhibited the best performance, followed by hm3. The models constructed in this study had better accuracy than the three international models. Moreover, the computer storage space occupied by the new models is significantly lower than that of the GPT3 model, and the number of parameters is substantially reduced. The accuracy improvement in the best hybrid model's T m on its resultant PWV at four exemplary radiosonde stations and the whole study area were investigated using the PWVs through the radiosonde ZWD and T m . The RMSEs of PWV derived from the Hm2 model are all less than 1 mm at exemplary radiosonde stations; the accuracy meets the needs of weather research. The overall error of the best hybrid model's T m in the resultant PWV is smaller than that of Bevis, GPT3, and HGPT models by 33.9%, 36.4%, and 37.0% in terms of RMSE. The results of hypothesis testing further proved that this accuracy improvement of the best hybrid T m relative to the compared models is significant. The new models can be widely used to calculate high-precision T m and are more suitable for GNSS receivers without large storage space.
However, the study area was mainly conducted in China, and the global regions need to be further examined to validate the new models. In addition, meteorological parameters (surface temperature and water vapor pressure) were considered when constructing the model in this study. In future research, we hope to develop a globally hybrid model that can calculate the T m only based on geographical information (latitude, longitude, height) and temporal information (year, day of year (doy), hour of day (hod)).