1. Introduction
Water vapor plays an important role in climate change and water cycling at various scales. In the past decades, global climate change has exerted noticeable influences on the water vapor distribution in China, which is evidenced by the fact that some regions in western and southern China are becoming wetter while some regions in eastern China are becoming drier [
1,
2]. Besides being an indicator of climate change, water vapor also affects the propagation of radio signals by causing path bending and time delay, which is known as wet tropospheric delay in radio-based geodetic techniques including Global Navigation Satellite System (GNSS) and Interferometric Synthetic Aperture Radar (InSAR). Namely, water vapor is an error source in radio-based geodetic techniques, and its effect should be corrected. Therefore, monitoring the water vapor is a basic requirement for understanding the change in climate and water cycling, and improving the accuracy of radio-based geodetic techniques.
Precipitable water vapor (PWV) is commonly used to describe the amount of water vapor in the atmosphere and is defined as the vertically integrated water vapor over unit area expressed by equivalent water height. PWV can be retrieved from observations and models. The observation methods include radiosonde, satellite remote sensing [
3,
4,
5], GNSS [
6,
7,
8], and water vapor radiometer, etc. The model methods mainly refer to the numerical weather prediction models, including European Centre for Medium-Range Weather Forecasts (ECMWF) [
9,
10], National Centers for Environmental Prediction [
9,
11], etc. Moderate-resolution Imaging Spectroradiometer (MODIS) PWV is one kind of satellite remote sensing PWV, which has a high spatial resolution (1–5 km) but its quality is affected by weather and ground surface conditions. Scientists have found that the accuracy of MODIS PWV is 5–6 mm in the China area [
12,
13], which is much worse than in the US area (2–4 mm) [
14,
15]. The accuracy of GNSS PWV is typically 1–2 mm [
6,
7,
16,
17], but its spatial resolution is limited by the sparse station deployment. The model-simulated PWV has a medium spatiotemporal resolution but its accuracy is dependent on the density and quality of the assimilated observations. Because of the systematic biases among different methods, different spatial and temporal resolutions and coverages, care has to be taken for the joint use of different PWV data sources. Otherwise, contradictory conclusions may be obtained when using different individual PWV data sources [
15,
18].
Data fusion is a promising way to address this problem. Li et al. [
14] used a Global Position System-derived correction linear fit model to modify the accuracy of MODIS PWV and produce a regional 1 km
1 km water vapor field in Southern California. Lindenbergh et al. [
19] fused Medium Resolution Imaging Spectrometer PWV and GNSS PWV based on ordinary kriging interpolation in western Europe and the results were beneficial to numerical weather prediction. Alshawaf et al. [
20] built PWV maps by fusing Persistent Scatterer InSAR and GNSS data over the region of Upper Rhine Graben in Germany and France and used the kriging geostatistical interpolation technique to produce continuous grids of PWV. Alshawaf et al. [
21] used a fixed-rank kriging method to fuse GNSS PWV, InSAR PWV, and the Weather Research and Forecasting model simulated PWV in Europe and found that PWV maps inferred by the data fusion have a better quality than those inferred from a single data source. Zhang et al. [
15] proposed a method based on spherical cap harmonic model and Helmert variance component estimation to fuse GNSS, MODIS, and ECMWF ReAnalyses 5 (ERA5) PWV, and obtained high accuracy (2 mm) in North America. Li and Long [
22] combined MODIS and ERA5 PWV to generate PWV products in the upper reaches of the Brahmaputra river from 2007 to 2013, and the generated PWV products had good consistency with the Global Position System measurements. Zhao et al. [
18] proposed a PWV fusion model in China based on the polynomial fitting and spherical harmonic function, and the results showed that the mean root square (RMS) of the hybrid PWV fusion model was less than 3 mm in any areas of China in all four seasons. Bai et al. [
23] proposed a linear calibration model which uses GNSS PWV to calibrate MODIS PWV in mainland China, and the results showed obvious improvements in calibrated MODIS PWV. Some of the above methods are based on interpolation approach, which may inevitably impose biases or inaccurate information due to the imperfect assumptions made for the interpolation [
24]. Some of the above methods only calibrate a global bias and do not consider spatial or temporal variations in bias. Moreover, the MODIS PWV has poor quality in China [
12,
13], which makes it unqualified for the investigation of subtle PWV variation in China. Because of these problems, efforts are needed to produce high-quality PWV in China. The method proposed by Zhang et al. [
24] does not need any prior information or assumption which could avoid introducing biases before fusing and can fully consider the spatial variation of the systematic biases between different PWV data.
In this study, we use the Generalized Regression Neural Network (GRNN) method proposed by Zhang et al. [
24] to fuse the PWV data from GNSS, MODIS, and ERA5 to generate a unified high-quality PWV dataset in the land area of China. Different from Zhang et al. [
24] that fused PWV data in North America at an annual scale, this study fuses the PWV data at annual, quarterly, and monthly scales in China in an attempt to find out the best timescale for PWV fusion in China. Through this study, we aim to generate a unified PWV product for China with improved accuracy and spatial resolutions and thus providing a better PWV product. The new PWV product not only can be useful for the application of earth observation such as performing the tropospheric refraction correction of InSAR observation, but it will also allow more detailed monitoring of water vapor over China. Besides, this PWV product can also be a data source of numerical weather forecasting and help researchers better study the circulation of water vapor over China.
3. GRNN Method for PWV Fusion
We use the GRNN method to fuse GNSS, MODIS, and ERA5 PWV. In addition to the annual fusion as in Zhang et al. [
24], we also implement the fusions at the quarterly and monthly timescales to explore more subtle variations in the PWVs.
3.1. Generalized Regression Neural Network
The GRNN was proposed by Specht in 1991 [
36] which is based on non-parametric regression and uses sample data as a posterior condition. GRNN has a four-layer structure with an input layer, a pattern layer, a summation layer, and an output layer. The spread parameter in the pattern layer is the only hyperparameter in GRNN which needs to be set at first.
The advantage of the GRNN model is that no model parameters need to be trained, so the convergence speed is fast. In addition, the GRNN is based on radial basis function and has good nonlinear approximation performance. The disadvantage of the GRNN is its high computational complexity and space complexity.
3.2. Model Structure
Figure 2 shows the model structure of the GRNN model for PWV fusion. The input layer has five neurons corresponding to the longitude, latitude, height, time, and MODIS (ERA5) PWV. There is only one neuron in the output layer which is the GNSS PWV or the modified MODIS (ERA5) PWV. The number of neurons in the pattern layer is equal to the number of observations. Considering that ERA5 PWV, MODIS PWV, and GNSS PWV have systematic differences and different error characteristics, we take GNSS PWV as the reference values and use them to improve the MODIS and ERA5 PWV, respectively.
3.3. PWV Data Matching
To form input-output samples, we need to interpolate the dense MODIS (ERA5) PWV to the location of GNSS PWV. For temporal interpolation, we interpolate the GNSS PWV and ERA5 PWV to the epoch of MODIS PWV using a simple linear interpolation. For the spatial interpolation, we use a spherical cap harmonic model to fit the MODIS (ERA5) PWV, which is then used to compute PWV at the GNSS stations. For the ERA5 PWV, the pole of the spherical cap is (35.5327° N, 104.1477° E), and the half angle of the spherical cap is 35°. Because the area covered by MODIS PWV data at different epochs is inconsistent, the pole of the spherical cap is set at the center of the available MODIS PWV. The order of the spherical cap harmonic model is set to 5 for the MODIS PWV while it is 10 for the ERA5 PWV. These settings guarantee that the corresponding spherical cap harmonic model has the smallest misfit RMS error. After temporal and spatial interpolations, the MODIS (ERA5)-GNSS PWV pairs are obtained. Finally, we obtain 353,271 MODIS-GNSS pairs and 960,047 ERA5-GNSS pairs in 2018 and 2019. The monthly number of MODIS-GNSS pairs fluctuates between 9025 and 23,667, while that of ERA5-GNSS pairs fluctuates between 28,474 and 61,961.
Since the data quality of the samples affects the model performance, simple quality control is performed. We first calculate the bias and STD of the differences between the interpolated MODIS (ERA5) PWVs and GNSS PWVs and then use bias ±3 STD as the threshold to reject any pair whose corresponding difference is larger than the threshold. Finally, 7345 MODIS-GNSS PWV pairs and 10,238 ERA5-GNSS PWV pairs are removed, and the remaining samples are used to train the model. The remaining MODIS-GNSS pairs fluctuate between 8785 and 23,387 in one month while the ERA5-GNSS pairs fluctuate between 28,449 and 61,599. The average bias, STD, RMS, and correlation coefficient (R) between the MODIS (ERA5) PWV in 2018 and 2019 after excluding outliers are given in
Table 1. It is seen that MODIS PWV tends to be larger than GNSS PWV while ERA5 PWV is smaller than GNSS PWV in terms of bias. The accuracy of MODIS PWV in China region is poorer than that of ERA5 PWV in terms of RMS.
3.4. Constructing the GRNN Models
After obtaining the MODIS-GNSS PWV pairs and ERA5-GNSS PWV pairs in the research area in 2018 and 2019, we use the GRNN method to fuse them at annual, quarterly, and monthly scales. Because the spread parameter in the pattern layer has a great influence on the model results, we use an enumeration algorithm to determine the optimal spread parameter, and the results can be found in
Appendix B. In
Appendix B, we set a series of optional spread parameters to test the GRNN model and present the model performance against spread parameter values.
In machine learning, cross-validation is often used to test the accuracy of algorithms. K-fold cross-validation has been widely used in machine learning [
37,
38,
39]. Using K-fold cross-validation, every sample can be used for training or testing. In this study, the 10-fold cross-validation is used to test the model. The ten-fold cross-validation randomly divides the dataset into ten equal parts, and selects one of them as the test data and the other nine as the training data without duplication. After repeating ten times, the average statistical results can be used to evaluate the accuracy of the model.
4. Results
4.1. Model Performance at Annual Timescale
We use the samples in 2018 and 2019, respectively, to train and test the MODIS-GNSS PWV model and the ERA5-GNSS PWV model. We train the models using data in a whole year, and obtain a GRNN model in 2018 and another one in 2019.
Table 2 presents the average testing results when training the model based on annual data in 2018 and 2019 with the optimal spread parameter given in
Appendix B. Compared to the RMS values of the original datasets given in
Table 1, the accuracy in terms of RMS is improved by 1.9 mm for MODIS PWV and 1.0 mm for ERA5 PWV. The bias between the modified MODIS (ERA5) PWV and the GNSS PWV is close to 0, indicating that the systemic difference between them has been eliminated after training. The scatter plots of the modified MODIS (ERA5) PWV versus GNSS PWV in the annual models are shown in
Figure 3. Both the modified MODIS and ERA5 PWV have a better correlation with the GNSS PWV.
4.2. Model Performance at Quarterly Timescale
We also train and test the GRNN models based on quarterly MODIS (ERA5)-GNSS PWV samples. This means we use the samples in every individual quarter to construct a corresponding quarterly GRNN model. In this manuscript, spring is defined from March to May, summer from June to August, autumn from September to November, and winter from December to February.
Table 3 presents the overall performance of the quarterly models with the optimal spread parameter given in
Appendix B. The systematic difference between MODIS (ERA5) PWV and GNSS PWV is also eliminated after training since the bias is close to 0. The accuracy in terms of RMS of the quarterly model is improved by 2.5 mm for MODIS PWV and 1.1 mm for ERA5 PWV, which are better than the improvement achieved by the annual models.
Figure 4 shows the average bias, STD, RMS, and R of the quarterly models in different seasons. The original accuracy in terms of RMS for MODIS PWV in the spring and autumn is 5–6 mm, and it is even worse with RMS increasing to about 7.5 mm in the summer. The quarterly MODIS-GNSS PWV models effectively improve the accuracy of the modified MODIS PWV, as demonstrated in
Figure 4c. The accuracy improvement in terms of RMS for MODIS PWV in four seasons is 1.8 mm (spring), 3.3 mm (summer), 3.0 mm (autumn), and 2.3 mm (winter), respectively. The accuracy improvement in terms of RMS in the spring is slightly less than that of the annual model (1.9 mm), but the RMS in the Spring quarterly model is 3.6 mm which is smaller than that of the annual model (3.8 mm). These results indicate that the seasonal differences are accounted for by the quarterly models. As shown in
Figure 4d, the R has improved in all four seasons, especially in the winter.
For the ERA5-GNSS quarterly models, the accuracy improvement in terms of RMS in four seasons is 0.9 mm (spring), 1.5 mm (summer), 1.3 mm (autumn), and 0.7 mm (winter), respectively. The accuracy improvement in terms of RMS in the summer and autumn is better than that of the annual model (1.0 mm). Although the accuracy improvement in terms of RMS in spring and winter is less than the annual model, the RMSs of the quarterly models in spring and winter are 1.8 mm and 1.3 mm, respectively, which are smaller than that of the corresponding annual model (2.0 mm). Those results indicate that considering the seasonal differences can improve the model performance, which is conducive to the generation of high-quality PWV products in China. The systematic difference between the modified ERA5 PWV and GNSS PWV is close to 0, which indicates that the ERA5-GNSS quarterly model can also eliminate the systematic difference. In addition, as shown in
Figure 4h the correlation coefficient R between the modified ERA5 PWV and the GNSS PWV increases to 0.99 by the quarterly models.
4.3. Model Performance at Monthly Timescale
From the above experiments, it is seen that the performance of the quarterly models is better than that of the annual models. It is easy to extend the models from quarterly to monthly.
Table 4 presents the overall performance of the monthly models with the optimal spread parameter given in
Appendix B. The overall accuracy improvement in terms of RMS is 3.1 mm for MODIS PWV and 1.3 mm for ERA5 PWV, which is better than that by the quarterly models (2.5 mm for MODIS PWV and 1.1 mm for ERA5 PWV). The bias between MODIS (ERA5) PWV and GNSS PWV is reduced to 0 in the monthly models, which indicates that the systematic difference is eliminated.
Figure 5 shows the average bias, STD, RMS, and R of the monthly models in different months. For original MODIS PWV, the bias, STD and RMS are increased from winter to summer and then decreased from summer to winter. The original negative bias between GNSS PWV and MODIS PWV in all months indicates that MODIS PWV tends to be larger than GNSS PWV. The accuracy of MODIS PWV in terms of RMS of the monthly models in spring is improved by 2.0 mm for March, 2.4 mm for April, and 2.4 mm for May, respectively, which is better than that of quarterly models (1.8 mm). The results indicate that the MODIS-GNSS monthly models in the spring are better than the corresponding quarterly model. A similar conclusion can be made in other months. The R between MODIS PWV and GNSS PWV is obviously improved.
For ERA5 PWV, the original bias between ERA5 PWV and GNSS PWV shows that ERA5 PWV is larger than GNSS PWV in summer and autumn while smaller than GNSS PWV in spring and winter in China land region. The accuracy improvement in terms of RMS of the monthly models in spring is 0.9 mm for March, 1.1 mm for April, and 1.2 mm for May, respectively, which is not less than the corresponding quarterly model (0.9 mm). Similar conclusion can be made for other months. Those results show that the ERA5-GNSS monthly models could achieve better performance than the corresponding quarterly model. As shown in
Figure 5h, the correlation coefficient R between ERA5 PWV and GNSS PWV is also increased.
4.4. Generating PWV Products in the Research Area
From the above results, the monthly models achieved the best performance among the models of three timescales. Therefore, we use the monthly models to generate PWV products in the research area. The spatial resolution of the products is 1 km × 1 km in areas where MODIS PWV is present, and 0.25° × 0.25° in other areas. The temporal resolution is 1 day. Since the accuracy measured by RMS is 5–6 mm for the MODIS PWV and 3.0 mm for the ERA5 PWV in the research area while the accuracy of the fused PWV is better than 2.6 mm, apparently, the fusion provides a better PWV product for the research area.
5. Accuracy Analysis
To investigate the temporal variations of the model accuracy, we compute the daily bias, STD and RMS of the original MODIS (ERA5) PWV, and the modified MODIS (ERA5) PWV from the monthly models. The results are shown in
Figure 6,
Figure 7 and
Figure 8.
Figure 6 shows that the original MODIS PWV has overall negative biases with obvious temporal variations. The absolute values of the biases of the original MODIS PWV increased from January to July and then decreased from August to December. Most of the biases of the original ERA5 PWV are negative from December to April while positive in the rest months. There are also obvious temporal variations in the biases of the original ERA5 PWV. The biases of the modified MODIS PWV and the ERA5 PWV are close to 0 without noteworthy temporal fluctuations, which indicates the time-varying systematic biases between the MODIS (ERA5) PWV and the GNSS PWV are eliminated by the new methods.
Figure 7 shows the daily STDs of the original MODIS (ERA5) PWV and the modified MODIS (ERA5) PWV. It demonstrates that the STD of the original MODIS PWV has obvious temporal variations with an increase from January to July and a decrease from August to December. The modified MODIS PWV has apparently reduced STD whose temporal variations are also greatly weakened. The temporal pattern of STD of the ERA5 PWV is similar to that of the MODIS PWV.
Figure 8 shows the daily RMS of the original MODIS (ERA5) PWV and the modified MODIS (ERA5) PWV. The temporal patterns of the original MODIS (ERA5) PWV and the modified MODIS (ERA5) PWV are similar to that of STD. The temporal variations of RMS are greatly weakened for both MODIS PWV and ERA5 PWV after modeling.
To investigate the spatial variations of the model accuracy, we compute the bias, STD, and RMS of the original MODIS (ERA5) PWV and the modified MODIS (ERA5) PWV at individual GNSS stations.
Figure 9 shows the biases of the original MODIS (ERA5) PWV and the modified MODIS (ERA5) PWV at the GNSS stations. It can be seen that the original MODIS PWV has negative biases at most stations. The biases are greater in the south and smaller in the north. These results suggest that the biases in the original MODIS PWV are uneven in the research area. Comparing
Figure 9a,b, the biases between MODIS PWV and GNSS PWV are significantly decreased by the GRNN models, and their distribution becomes more even.
Figure 9c shows that most of the biases of the original ERA5 PWV are positive in the south and northeast, and negative in the rest research area. Comparing
Figure 9c,d, the biases of the modified ERA5 PWV are significantly reduced and distributed more evenly over the area.
Figure 10 shows the STDs of the original MODIS (ERA5) PWV and the modified MODIS (ERA5) PWV at individual stations. The STD of the original MODIS PWV is greater in the southeast while smaller in the northwest. Comparing
Figure 10a,b, the STD of modified MODIS PWV is significantly reduced and the distribution is more even. A similar conclusion can be made for the modified ERA5 PWV.
Figure 11 shows the RMSs of the original MODIS (ERA5) PWV and the modified MODIS (ERA5) PWV at individual stations. Comparing
Figure 11a,b, the RMS of MODIS PWV is reduced after modeling, especially for the stations in the southeast. We can find that the RMS of ERA5 PWV is also reduced after modeling by comparing
Figure 11c,d.
6. Conclusions
Many scholars have used different methods to produce PWV products with high spatial resolution and improved accuracy in different regions. Zhang et al. [
15] used a spherical cap harmonic model and Helmert variance component estimation method in North America. Zhao et al. [
18] used a hybrid PWV fusion model in China. Alshawaf et al. [
21] used a fixed-rank kriging method in Europe. All those methods could achieve better performance, but there are still some shortcomings. Some of the methods ignore the spatial or temporal variations in bias and only give a global bias for all observations. Some methods are based on the interpolation approach, which inevitably imposes some biases or inaccurate information due to the imperfect assumptions for the interpolation. In this study, we generate a PWV product in China area by fusing GNSS PWV, MODIS PWV, and ERA5 PWV in 2018 and 2019 through a generalized regression neural network. Models at annual, quarterly, and monthly timescales are constructed and performances are assessed to find out the best model for the research area. The results demonstrate that the monthly models achieve the best performance among the three timescales. Thus, we produce the fused PWV data by training and testing the GRNN models at a monthly timescale. Therefore, we have obtained a unified PWV product with a temporal resolution of 1 day and a spatial resolution better than 31 km. The bias, STD, and RMS of the modified MODIS PWV are 0.0 mm, 2.6 mm, and 2.6 mm. The percentage improvement is as high as 50% in terms of RMS. It becomes 0.0 mm, 1.7 mm, 1.7 mm for the modified ERA5 PWV and the percentage improvement is 40%. In this product, systematic differences between different PWV sources are almost diminished. Compared to the original MODIS (ERA5) PWV, the accuracy of the modified PWV is more even in space and time. With much improved quality, this PWV product can not only benefit the studies regarding water vapor circulation over China, but also serve in numerical weather forecasting.