Geostationary Satellite Observation of Precipitable Water Vapor Using an Empirical Orthogonal Function ( EOF ) based Reconstruction Technique over Eastern China

Water vapor, as one of the most important greenhouse gases, is crucial for both climate and atmospheric studies. Considering the high spatial and temporal variations of water vapor, a timely and accurate retrieval of precipitable water vapor (PWV) is urgently needed, but has long been constrained by data availability. Our study derived the vertically integrated precipitable water vapor over eastern China using Multi-functional Transport Satellite (MTSAT) data, which is in geostationary orbit with high temporal resolution. The missing pixels caused by cloud contamination were reconstructed using an Empirical Orthogonal Function (EOF) decomposition method over both spatial and temporal dimensions. GPS meteorology data were used to validate the retrieval and the reconstructed results. The diurnal variation of PWV over eastern China was analyzed using harmonic OPEN ACCESS Remote Sens. 2015, 7 5880 analysis, which indicates that the reconstructed PWV data can depict the diurnal cycle of PWV caused by evapotranspiration and local thermal circulation.


Introduction
Water vapor plays an important role in many atmospheric processes, such as radiative cooling, latent heating, cloud formation and convective activity [1][2][3].As the most abundant greenhouse gas, characterization of water vapor at high spatial and temporal resolution is important for understanding and forecasting climate change [4].The spatial and temporal variations of water vapor have strong impacts on atmospheric stability and the radiation budget.Lack of real-time water vapor information impedes the further development of weather forecasting systems [5].
While in-situ measurements, such as radiosonde, sunphotometer and Global Positioning System (GPS), provide reliable water vapor measurements, they are restricted by sparse site distributions [6][7][8][9].Space-borne satellite observations of water vapor have provided a synoptic view of the atmosphere for over two decades [10,11].The MODerate resolution Imaging Spectroradiometer (MODIS), on board the NASA TERRA and AQUA Spacecraft platforms, is one of the most widely used sensors for water vapor retrieval.MODIS products have been used for many territory-wide and regional-scale applications [12][13][14].However, most sun-synchronous satellites have relatively low revisit frequency (typically once or twice a day), prohibiting their abilities to capture rapid atmospheric changes.Instead, geostationary satellites in geosynchronous orbits provide frequent observations of the atmosphere.The Geostationary Operational Environmental Satellite (GOES) has been successfully used for weather forecasting and nowcasting over North America and the oceans [15,16].Geostationary satellite platforms in East Asia, including the Chinese Feng-Yun satellite and the Japanese Multi-functional Transport Satellite (MTSAT), were launched recently [17,18].The use of geostationary satellite water vapor data in mainland China can augment the current practices where mainly in-situ measurements or based on sun-synchronous satellite observations are used [19,20].
Remote sensing technique usually measures a vertically integrated column of water vapor in the atmosphere, namely precipitable water vapor (PWV).The split-window algorithm can be employed to derive the PWV column from brightness temperature of the infrared channels of geostationary satellite data [21,22].Chesters et al. applied the split-window algorithm to GOES VISSR Atmospheric Sounder (VAS) data through a linearized radiative transfer model [21,23].Park applied this algorithm to MTSAT-1R data, and the root mean square error (RMSE) compared with ground observations is around 9.4 mm [24].Guillory et al. proposed a physical split-window technique, which is physically based, but it requires assumptions of initial values [25].Sobrino et al. introduced a split-window covariance-variance ratio (SWCVR) to estimate the PWV by assuming surface temperature changed much faster than air temperature within a small field of view [26], but the results showed that SWCVR technique was sensitive to instrumental errors [21].Eck et al. applied a statistical linear retrieval technique to AVHRR channel 4 (centered at 10.8 µm) and channel 5 (centered at 11.8 µm) [27].The Meteorological Satellite Center of the Japanese Meteorological Agency proposed an operational linear retrieval algorithm [28] based on the split-window algorithm [23] and an empirical linear model [27].This semi-empirical algorithm is shown to have promising accuracy with reliable estimates of inputs.Almost all water vapor retrieval algorithms discussed above are only applicable to cloud-free atmospheric conditions, which implies that the PWV column cannot be directly calculated over cloud-contaminated pixels.However, complete water vapor dataset without any data missing that are resulted from cloud contamination or other factors is highly desired for input into the climate and weather forecasting model.Thus several spatial interpolation methods, such as Kriging interpolation and spline interpolation have been used to interpolate the missing data of satellite images [29].Since the PWV column is exponentially correlated with elevation, Morland et al. proposed an elevation-based spatial interpolation method [30].Since geostationary satellites have high temporal resolution with a fixed field of view, high temporal information can be used to improve the interpolation performance.Lindenbergh et al. integrated MERIS-derived PWV data with GPS PWV data using a space-time Kriging model [31].However, these spatiotemporal interpolation methods are computationally demanding and are likely to be biased by data anomalies.Empirical Orthogonal Function (EOF) analysis has been widely used in geophysics to extract spatial and temporal patterns [32].Beckers et al. filled in the missing data from oceanographic datasets using Data Interpolating Empirical Orthogonal Functions (DINEOF) [33], which takes into account both spatial and temporal features.DINEOF is a computationally efficient and non-parametric model, and it provides a self-consistent estimation through cross-validation [34].EOF-based interpolation is thus suitable for filling the missing data in geostationary satellite images.
In this study, an operational and real-time algorithm of retrieving the PWV column from MTSAT satellite images was developed.In order to achieve real-time PWV acquisition, NCEP/Reanalysis forecast temperature data were acquired from NOAA.The PWV column for cloud-contaminated pixels was reconstructed through EOF-based interpolation.The performance of PWV retrieval and interpolation method was evaluated by comparison with hourly GPS meteorological data.Finally, the derived high temporal resolution PWV data were applied to analyze the PWV diurnal cycle characteristics over eastern China.

Study Area
China is known for its vastly diversified landscapes.Influenced by the Asian monsoon, China experiences a significant variety of seasonal variations, with cold dry winters, hot and wet summers.The diverse topography and vast landmass in China cause significant geographical variations in climate, from south to north, and east to west.Due to rapid urbanization and resource exploitation in the past 30 years, eastern China has experienced dramatic environmental changes, such as deforestation, loss of biodiversity, air pollution, intensified climatic imbalance and more frequent severe weather [35,36].As an important parameter in the analysis of weather systems, water vapor distribution in China exhibits high spatial and temporal variations that are closely correlated with topography, latitude and the monsoon [19].
While MTSAT data cover the entire East Asia and western Pacific regions, this study covers only the land area of eastern China.This is because: firstly, the derived coefficients are assumed to be constant over the entire study area, but such an assumption is only applicable to relatively uniform land cover types and climatic conditions.Since most radiosonde and GPS sites are located in eastern China, the derived coefficients may not be applicable to western China.In addition, since the surface emissivity differs from land to sea, biased water vapor results may be obtained if coefficients derived from land stations are used for PWV retrieval over sea.Secondly, most parts of the Qinghai-Tibet plateau are covered by ice and snow.The surface albedo is very high, which may affect the sensitivity of satellite retrieval.Thirdly, western China is far from the nadir of MTSAT (140°E), and the large satellite viewing zenith angle (VZA) not only causes coarser spatial resolution over off-nadir areas, but also has significant uncertainty of cloud detection [37].

Geostationary Satellite Data
The Multi-functional Transport Satellite (MTSAT) was launched by the Japan Meteorological Agency (JMA) in February 2005.MTSAT is operated in geosynchronous orbit, providing synoptic regional coverage of western Pacific region and East Asia on an hourly basis.MTSAT, a successor of the GMS (Geostationary Meteorological Satellite) series of Japan, has made significant improvements, including higher spatial sampling rate, better radiometric sensitivity, higher calibration accuracy and improved performance around local midnight [17].MTSAT has one visible channel and four infrared channels centered at 10.8 µm, 12.0 µm, 6.8 µm, and 3.8 µm.The spatial resolution of MTSAT is 1 km at visible channel and 4 km at infrared channels.In this study, one-year of MTSAT data in 2008 were acquired.MTSAT satellite images were geometrically and radiometrically corrected using the "MTSATGEO" software package [17].The IR1 channel (10.3~11.3μm) and IR2 channel (11.5~12.5 μm) at a temporal resolution of one hour were used to derive PWV.

Radiosonde Data
Radiosonde data from 64 radiosonde stations in eastern China were acquired for the year 2008, with an average of 3~4 stations in each province, as shown in Figure 1.The radiosondes are launched twice daily at 12:00 a.m. and 12:00 p.m. (UTC), respectively.The PWV column is calculated as Equation (1): where PW is the PWV column (unit: mm) within a layer; p2 and p1 are atmospheric pressures (unit: Pascal) at the layer's upper and lower boundaries; x is the mixing ratio (unit: g•kg −1 ), which can be obtained from Equation (2): 0.622 e x p e where p is the atmospheric pressure (unit: Pascal); e is the vapor pressure (unit: Pascal) that can be derived from e = RH•es(T); RH is the relative humidity (unit: %) of the atmosphere, which is contained in the radiosonde data files, es is the saturation vapor pressure (unit: Pascal) that is a function of temperature (unit: degree Celsius).Several factors including dry bias, time lag, calibration error and sensor drift may introduce errors in the PWV estimation from radiosonde data [38].The uncertainty of radiosonde data is typically within 5% [39].

GPS Data
PWV data from 24 ground-based GPS stations in the eastern China were derived (Figure 1).The column of PWV was calculated from the propagation delay of GPS signals between GPS satellites and ground-based GPS receivers (more details are given in the Supplementary Materials).GPS observations have high temporal resolution and can operate under all weather conditions [9].The PWV column retrieved from GPS data has been shown to agree well with radiosonde, sunphotometer and microwave radiometer data [40].The errors of retrieval are approximately 5%-7% for typical PWV amounts [5,38,41].GPS PWV data were used for validation in this study.

Hourly Air Temperature Data
Gridded air temperature data were used as ancillary data for water vapor retrieval.NCEP Climate Forecast System Reanalysis Version 2 (CFSv2) products were acquired from the National Center for Atmospheric Research, Computational and Information System Laboratory.The CFSR is the latest version of reanalysis data with global coverage and high resolution [42].The horizontal resolution of CFSR over eastern China is 0.5° with 40 vertical levels.The spatial resolution of CFSR is coarser than MTSAT IR data, but the scaling effect is negligible since the spatial variation of effective temperature is small [23].The temporal resolution for air temperature data is 1 h, which is similar to the frequency of MTSAT data.

Semi-Empirical PWV Retrieval Algorithm
The PWV column can be derived by comparing the observed brightness temperature for two thermal infrared bands (11 µm and 12 µm).Chester et al. proposed a simplified PWV split-window algorithm, assuming the effects of continuum absorption and the temperature dependence of dry air transmittance are negligible, as in Equation ( 3) [23].
where T11 and T12 (unit: K) are brightness temperatures of MTSAT for the 11 μm and 12 μm channels, respectively; Δa and Δk are differential absorption coefficients in the water vapor and dry air, respectively; Tair is the effective air temperature (unit: K).The derivation of this algorithm is given in the Supplementary Materials.Chesters and coworkers' retrieval algorithm involves the estimation of effective air temperature [23].Park et al. assumes constant effective air temperature and derives the coefficients using least-square error fitting [24].However, the assumption of constant effective temperature does not apply to large-scale retrieval.
where PWV (unit: mm) is the vertically integrated water vapor column; T700 (unit: K) is the atmospheric temperature at 700 hPa; θ (unit: rad) is the satellite zenith angle; and a0 to a7 are coefficients of the retrieval that can be estimated through linear regression with in-situ observation of PWV from radiosonde.Equation (4) requires the air temperature at 700 hpa be lower than the brightness temperature of MTSAT.Thus the pixels meeting conditions of T11 < T700 or T12 < T700 are thus masked.In this study, the coefficients were derived using Takeuchi and coworkers' algorithms [17].The PWV column of 64 radiosonde stations in eastern China and their corresponding brightness temperature were used for the derivation of coefficients.Both algorithms were applied to clear sky conditions only.A two-dimensional and threshold-based (2D-THR) cloud detection method was applied to identify the cloudy pixels [43], as shown in Figure 2. Despite the uncertainties associated with cloud-type identification, the cloud detection algorithm is useful to isolate clear pixels if only the pixels outside the outer boundary were selected.The 2D-THR is shown to have good correlation with the Japanese Meteorological Agency cloud-type classification data over tropical and sub-tropical areas [43].

EOF-Based Interpolation
The split-window algorithm is only applicable over cloud-free pixels, which means a large number of cloud-contaminated pixels are discarded.Beckers et al. first proposed DINEOF reconstruction to interpolate missing data in oceanographic datasets [33].The DINEOF method takes into account both temporal and spatial information of satellite data by calculating EOFs in the spatial and temporal domains.Considering X as the original PWV matrix (with missing data), the reconstruction of cloud-contaminated data can be estimated using Equation ( 5): where X has the dimensions m × n, m denoting the spatial dimension (total number of grid cells in each image) and n denoting the temporal dimension (number of satellite images in the time-series); U is the matrix of spatial EOFs with dimension m × r, V is the matrix of temporal EOFs with dimension r × n, and S is the matrix of eigenvectors with dimension r × r.The first k EOFs with the largest eigenvalues are used for reconstruction, as shown in Equation ( 6): where ep is the eigenvalue of the pth eigenvector; up is the pth column of U; vp is the pth row of V; i and j represent the location and time of the missing data.The implementation of DINEOF follows four major steps [34]: 1.In order to obtain an unbiased initial value, the whole PWV matrix is subtracted from the average value, and the missing values are initialized as zero.One percent of cloud-free pixels is not used for reconstruction and are later used for cross-validation.
2. Spatial and temporal EOFs are obtained from Singular Value Decomposition (SVD) of the PWV matrix.For each given k, the missing data are replaced by the value obtained from the EOFs shown in Equation (6).EOFs are then recomputed for the new matrix with improved approximation of the missing data.The computations of EOFs are repeated until the discrepancy converges.
3. The optimal number of EOFs used for reconstruction is calculated by cross-validating with a set of valid data [32].That is, for given k, the root mean square errors (RMSEs) between the reference value and the reconstructed value are calculated.The optimum number of EOFs is then determined referring to the lowest RMSE.The maximum number of EOFs is set as 10 in this study.
4. After obtaining the optimal number of EOFs, steps 1 and 2 are repeatedly calculated for koptimal with a set of valid data for cross-validation.Finally, the missing data values can be computed.
This process was implemented using the DINEOF package with the 720-hour MTSAT PWV images recorded in each month [33].Since DINEOF method requires at least one valid value for each grid cell in this study, grid cells with no valid value were removed [34].

Diurnal Cycle Analysis
The high temporal resolution of MTSAT data was used to analyze the diurnal cycle of PWV over eastern China.For each pixel, the hourly PWV value was first converted to daily departure, as shown in Equation ( 7) [44]: where PWVD(h) is the daily departure from average at hour h on the local standard time (LST) regime; PWV(h) is PWV column at hour h and PWVavg is the daily average PWV.
The amplitude and phase of the diurnal cycle is characterized using harmonic analysis.For each pixel, the daily departures are fitted to the following Equation ( 8) [32,45]: where the PWVm(h) is the fitted PWV at hour h (LST); PWVavg is the monthly average PWV column; k = 1 is the harmonic index, which represents the diurnal cycle; ck and sk are the regressed coefficients of cosine and sine components of the kth harmonic cycle; ε is the fitting residual that contains higher order of the daily variations not represented by the diurnal cycle.The harmonic (Sk) can be rewritten as Equations (9-11): ( ) cos( ) where Ak represents the amplitude of diurnal cycle; φk is the phase of the diurnal cycle, which can be converted to the hour with maximum (hmax) PWV in a day using Equation ( 12): max 24 2 The extent to which the diurnal variations can be captured by Equation (8), and it can be measured by the percentage of explained variance (Var(%)), as shown in Equation ( 13): (%) 100 ( 1) 100 where RSS is sum of squared differences between diurnal cycles and the sample mean; TSS is the total sum of squared deviations of observed values; ESS is the sum of squared differences between the residuals and their mean [32,45].
The diurnal cycle of PWV over eastern China was modeled and analyzed using the above methods with the reconstructed MTSAT PWV values over eastern China in each month.

Regression Coefficients
A total of 9945 pixels collocated with radiosonde stations was extracted in order to derive the regression coefficients from Equation (4) (Table 1).The correlation coefficient is 0.79 and the root mean square error (RMSE) is 9.19 mm if constant coefficients are assumed throughout the year.Better agreement is observed for individual months, as R increased in nine of the months and the RMSE decreased in all months.The relatively low R observed in February, March and December may partly result from the smaller number of data pairs used for regression (Table 1).The larger percentage of invalid data pairs in early spring may result from either higher cloud frequency or overestimation of effective air temperature.However, as shown in Table 1, the 2D-THR derived cloud frequency does not decrease in early spring, and the cloud frequency is the highest in the rainy monsoon season.The smaller sample size in early spring thus results from the overestimation of effective air temperature.In addition to the smaller sample size, the split-window algorithm is highly influenced by the variation of surface emissivity [46].The heterogeneity of surface emissivity caused by snow melting in early spring may also be one of the reasons of a fairly weak correlation in February and March.In summer, although high cloud frequency is observed, stronger correlation (0.83~0.87) and smaller RMSE (7.00~7.32mm) are also observed, indicating that the proposed retrieval algorithm outperforms in summer, when humidity is high.

Real-Time PWV Retrieval
Monthly regression coefficients were used for the derivation of PWV due to better statistical agreement.As an example, Figure 3 shows the time series of original and reconstructed hourly PWV as well as GPS PWV in August 2008 at "WHJF" site (30.5°N,114.5°E, marked in Figure 4).The original PWV correlates well with GPS PWV (R = 0.83), but a large number of data gaps are observed during 2-3 August, 14-17 August and 27-30 August due to cloudy conditions.The first nine EOFs were retained as the optimal for reconstruction.The reconstructed PWV generally captures the diurnal PWV variation, even during the cloudy periods when the PWV column cannot be directly retrieved from MTSAT.It is noted the reconstructed PWV agrees well with GPS PWV during clear sky periods, but tends to underestimate PWV column during cloudy periods.Since rainfall is usually associated with clouds in summer, PWV is generally higher on cloudy days than on clear days.The temporal EOF is calculated mainly from clear pixels with relatively low humidity, resulting in overall underestimation of PWV during cloudy days.Another noteworthy feature is that the reconstructed PWV shows a more pronounced diurnal cycle than GPS data, and the diurnal peak of PWV generally matches GPS PWV, despite the overall underestimation.
Figure 4a-h show the original and reconstructed PWV maps over eastern China at 0000 UTC (08:00 a.m.LST), 0600 UTC (02:00 p.m. LST), 1200 UTC (08:00 p.m. LST) and 1800 UTC (02:30 a.m.LST) on 16 August 2008.Since the interpolation requires at least one valid value for each pixel, pixels without cloud-free values were discarded.In the original maps (Figure 4a-d), spatial heterogeneity of PWV is observed over eastern China, with high PWV observed in the coastal regions of south China and low PWV observed in northeast China, but a large number of cloud-masked values are observed over central south China in the original PWV maps due to cloud contamination.Figure 4i-l show the corresponding hourly precipitation from Tropical Rainfall Measuring Mission (TRMM) data [47].Intense rainfall is observed in central south China from 12:00 a.m.UTC to 12:00 p.m. UTC on 16 August, which also explains the rising of observed GPS PWV at 'WHJF' site (Figure 3).The reconstructed PWV data show increased PWV from 12:00 a.m.UTC to 06:00 a.m.UTC over most parts of central south China (Figure 4e-f), which lags the peak rainfall by about 3 h.The PWV column decreased from 06:00 a.m.UTC to 06:00 p.m. as a result of decreasing air temperature and precipitation (Figure 4g-h).Despite the high percentage of cloud-masked pixels, EOF-based reconstruction captures the increase of PWV caused by precipitation.The topographical and spatial patterns are preserved in the reconstructed PWV maps, but some distinct features of the original data were smoothed in the reconstructed data.For example, high PWV is observed in Zhejiang Province (30°N, 120°E) at 12:00 p.m. UTC in the original data (Figure 4c), but the high PWV values are smoothed in the reconstructed data (Figure 4g).This smoothing phenomenon can enhance the retrieval accuracy by filtering out noise but on the other hand real anomaly features may be smoothed too [33].The smoothing effect on retrieval accuracy will be quantified and discussed in Section 4.3.2.

Validation of PWV Retrieval Algorithm
Previous studies have demonstrated the GPS meteorological data have high accuracy of PWV [12,38,41].Hourly GPS PWV data were used in this study to evaluate the performance of our proposed method by extraction of the nearest pixels of the GPS stations.Figure 5 shows the statistical results of validating MTSAT water vapour column with GPS water vapour data.Four statistical metrics, including root mean square error (RMSE), correlation coefficient (R), absolute difference (AD) and mean bias (MB), were used for validation.In general, the derived MTSAT PWV agrees well with GPS PWV data (R = 0.86, RMSE = 7.74 mm, MB = −1.34mm, AD = 5.63 mm), as shown in Figure 5c, but overestimation is more pronounced at low PWV levels, while underestimation is more pronounced at high levels.Figure 6 shows the validation of retrieved PWV at each GPS site.The annual average PWV column calculated from MTSAT PWV data is represented in different colors.The upward and downward triangles indicate the positive and negative mean biases, respectively.Negative MBs are observed over the sites in the south coastal region where the annual average PWV is high (>30 mm), while positive biases are observed over inland dry regions (<20 mm).For sites with the highest and lowest humidity, the correlation coefficients between MTSAT and GPS PWV are also comparatively lower, indicating the retrieval algorithm is somehow limited in capturing extreme PWV values.Homogeneous regression coefficients over the entire land area of China were calculated in order to ensure sufficient data for regression, but the coefficients might vary with surface emissivity [46].The generalization of a semi-empirical linear algorithm may thus lead to the overestimation over dry regions and underestimation over wet regions.Over northwestern China, the surface reflectance and emissivity differ from those in eastern China as a result of distinct topography and surface types, which may explain the relatively lower correlation coefficient (R = 0.71) when compared with central China (R = 0.82~0.90).In addition, the view zenith angle increases polewards, and thus a coarser resolution is also expected which will reduce the effectiveness of cloud detection [37].
Since water vapor displays pronounced diurnal variation, it is reasonable to validate the retrieval performance separately in daytime and nighttime.Figures 5f-i show the relationship between MTSAT and GPS PWV in daytime and nighttime, respectively.The overall performances in daytime and nighttime are fairly good, with RMSE of 7.38 mm in the daytime and 8.40 mm at night; R of 0.87 in the daytime and 0.86 at night.The MB is 0.16 mm in the daytime, suggesting the retrieval algorithm slightly overestimate PWV, whereas the nighttime MB is −4.29 mm, indicating overall underestimation.Figure 7 shows the variation of R, MB and RMSE on an hourly basis.R is higher than 0.85 for 22 out of 24 h, except for 10:00 a.m.(0.76) and 07:00 p.m. (0.838), indicating a reasonably good agreement throughout the day.Positive biases are observed from 10:00 a.m. to 04:00 p.m. local time, while negative biases are observed from 05:00 p.m. to 09:00 a.m.The variation of MB resembles the diurnal cycle of air temperature, with maximum positive bias at 02:00 p.m. when the temperature is highest, and maximum negative bias at 05:00 a.m. when the temperature is the lowest, though the underestimation at night is more pronounced than in daytime.The linear algorithm used in the present study takes T700 as an approximation of effective air temperature.Several factors may explain the difference between daytime and nighttime.The effective temperature Tair characterizes the temperature variation of the entire lower troposphere, which is less affected by diurnal heating within the boundary layer than the temperature at a single level (T700), which may lead to overestimation of the diurnal cycle of Tair [21].T700 has a scaling effect on the last four terms in Equation (4).Overestimation of Tair in the daytime may thus lead to higher PWV column as the transmittance ratio of the 11 µm to 12 µm band is amplified, and similarly underestimation of Tair at night results in underestimation of the PWV column.Secondly, since an assumption was made for constant coefficients throughout the day, diurnal variation in surface reflectance might also introduce errors into the PWV retrieval [48].Third, propagation errors in the retrieval algorithm, uncertainties of sensor calibration, spectral response and geometric registration may also introduce errors in retrieval of the PWV column [17].It is shown that the signal to noise ratio (SNR) of the Japanese Advanced Meteorological Imager (JAMI) onboard MTSAT decreases noticeably during local mid-night, which may partly explain the relatively poorer performance at night [49].

Validation of EOF-Based Interpolation
Figure 5a shows the reconstructed MTSAT PWV versus GPS PWV.The statistical metrics show the reconstructed PWV have very good overall accuracy: MB decreases slightly from −1.34 mm to −1.73 mm; RMSE increases from 5.64 to 6.14 mm; and R increases from 0.86 to 0.87 after reconstruction.Figure 6b shows the annual average PWV and the validation results of the reconstructed PWV column over GPS sites.It is shown that the annual average PWV column has decreased in most of the sites after reconstruction, except for the southern site "KMIN" (marked in Figure 6b) where the average PWV increased from 17.46 mm to 21.29 mm.The overall decreasing trend is mainly due to the uneven distribution of clear pixels in the original data.Over northern parts of China, the decreasing PWV weakens the overestimation of original data, thus better agreement was reached over these sites.However, over the coastal sites in southern China, the decreasing trend leads to further underestimation, thus lower correlation coefficients were observed.
As shown in Figure 5d-g, the reconstructed PWV agrees well with GPS-derived PWV, and better agreement is observed in daytime than nighttime.For the cloudy pixels, the value of R is higher than for clear pixels in both daytime and nighttime, indicating that the EOF-based interpolation has reproduced the spatial and temporal variation of PWV over cloudy pixels.The underestimation of MTSAT PWV at night is less pronounced (Figure 5g-h), but the positive MB (0.16 mm) in daytime turns out to be negative (−1.95 mm) after reconstruction (Figure 5d-e).This is because EOF-based interpolation takes into account the temporal variation of PWV, thus the underestimation at night also affects the PWV column in daytime.As shown in Figure 7, the reconstruction tends to smooth the diurnal variation of R, MB and RMSE.In the original PWV dataset, an abrupt decreasing in R and increasing in RMSE are observed at 10:00 a.m. and 07:00 p.m. local time, but the abrupt change has been smoothed after reconstruction.Influenced by the nighttime EOFs, MB remains negative throughout the day, though the extent of underestimation decreases by more than a factor of two at night.

Diurnal Cycle of PWV
Harmonic analysis was conducted over 12 months in 2008 for eastern China.The observed diurnal cycle varies in both space and time.Figure 8 shows the magnitude of diurnal cycle from May to August calculated from harmonic analysis, and Figure 9 shows the time with maximum PWV in a day in the corresponding months.Over the coastal regions of China, the PWV peaks are observed in early afternoon (around 13:00 LST) in all of the months, which is consistent with the diurnal cycle of air temperature [50].Despite the high humidity over coastal areas, the magnitude of diurnal cycle is low, i.e., less than 1 mm in May and 1 to 2 mm from June to August, resulting from the small temperature diurnal variation.Over the North China Plain, maximum PWV is found around noon in most of the months except May, when a morning peak is observed north of 35°N; and an evening peak is observed south of 35°N along the Tainhang Mountains.The morning peak may be associated with the diurnal cycle of stratiform precipitation whose peaks occur in the morning over the North China Plain [44].The evening peak over mountainous areas is likely due to the convergence of water vapor by anabatic wind in the afternoon [51].It is noted that the magnitude and variance of the diurnal cycle over the North China Plain in May is small (<2 mm), which may be due to the contradictory effects between local circulation and diurnal cycle of air temperature.In southern inland China, the PWV peaks in early afternoon in all months, and the spatial pattern of the magnitude of the diurnal cycle resembles that of air temperature.In southern China, since the topography is relatively flat, diurnal variation of PWV is mainly affected by the surface evapotranspiration that is proportional to air temperature, rather than local thermal circulations generated from the topography.The magnitude of diurnal cycle in Sichuan Basin (centered at 30°N, 113°E) is large in summer time (>3 mm), and PWV peaks at noon (12:00 p.m. to 14:00 p.m.) in the basin areas.Previous studies indicate an evening peak of PWV in basin areas of Japan [51], but the evening peak is not observed from the reconstructed MTSAT PWV, which may be attributed to the larger impacts of surface evapotranspiration in the Sichuan Basin.

Conclusions
Our study utilized the IR1 and IR2 channels from MTSAT images to derive the PWV column at a temporal resolution of one hour.A semi-empirical split-window algorithm proposed by Akatsuka et al. was applied to derive the PWV column [28].Since the split-window algorithm only applies over clear sky pixels, the cloudy pixels were filtered out using a 2D-THR algorithm [43], and 64% to 79% cloud pixels were excluded under the strict cloud detection algorithm.To address the limitation of split-window algorithm being only applicable in cloud-free conditions, an EOF-based interpolation method was applied to reconstruct the missed PWV value in cloud-contaminated pixels.The retrieval and interpolation performance was evaluated using in-situ GPS meteorology data.The results indicate a good agreement between MTSAT and GPS PWV data (R = 0.87, RMSE = 8.46 mm), but MTSAT reconstructed data tends to overestimate the PWV column for low humidity and underestimate at high humidity levels.However, the linearized algorithm assumes consistent absorption coefficients across the entire study area, which may introduce some uncertainties due to the impact of topography and surface albedo.
Our study proposed a real-time and cloud-free water vapor retrieval method for geostationary satellite data, which could be used to analyze atmospheric phenomena without introducing sampling biases caused by cloudy condition.As an example, the diurnal cycle of PWV over eastern China was analyzed.It is shown that the diurnal cycle of PWV varies between coastal, inland, mountainous and basin areas.PWV peaks in early afternoon in most part of China, but an evening peak was observed in mountainous areas, which is likely due to the convergence of water vapor by anabatic wind in the afternoon [51].In addition to the diurnal cycle analysis, hourly reconstructed PWV images have multiple applications, e.g., facilitating more accurate and real-time weather services by meteorological agencies, using as input to force large-scale climate models, and using for atmospheric correction in remote sensing applications, which is critical in deriving accurate surface reflectance.

Figure 1 .
Figure 1.Spatial distribution of radiosonde and Global Positioning System (GPS) stations over eastern China.

Figure 4 .
Figure 4. Distribution of PWV and TRMM (Tropical Rainfall Measuring Mission) precipitation over eastern China on 16 August 2008 at 12:00 a.m., 06:00 a.m., 12:00 p.m. and 06:00 p.m. UTC: (a-d) original PWV; (e-h) reconstructed PWV; (i-l) precipitation rate from TRMM data.The location of 'WHJF' is marked as star symbol.Missing data or pixels out of the boundary of China are marked in white.

Figure 5 .
Figure 5. Validation of Multi-functional Transport Satellite (MTSAT) derived PWV with GPS-derive PWV: (a) reconstructed data (whole day); (b) interpolated data over cloudy pixels (whole day); (c) observed data only (whole day); (d) reconstructed data at daytime; (e) interpolated data over cloudy pixels at daytime; (f) observed data at daytime; (g) reconstructed data at night; and (h) interpolated data over cloudy pixels at night; (i) observed data at night.

Figure 6 .
Figure 6.GPS validation results over sites: (a) validation with reconstructed PWV; (b) validation with original PWV.Upward and downward triangles represent positive and negative biases, respectively.Color represents the value of the annual average MTSAT PWV over each site.Numbers represent the correlation coefficients between MTSAT and GPS PWV data.

Figure 7 .
Figure 7. GPS validation results at different hours of the day: (a) correlation coefficient; (b) mean bias; and (c) RMSE.

Figure 8 .
Figure 8. Magnitude of the diurnal cycle calculated from the reconstructed PWV data: (a) May; (b) June; (c) July; and (d) August.Missing data or pixels out of the national map boundary are indicated as white color.

Figure 9 .
Figure 9.Time of the maximum PWV in a day calculated from the reconstructed PWV data: (a) May; (b) June; (c) July; and (d) August.Missing data or pixels out of the national map boundary are indicated as white color.

Table 1 .
Derived coefficients using linear precipitable water vapor (PWV) retrieval algorithm: n is the number of data pairs used for regression; R is the Pearson correlation coefficient; RMSE is the root mean square error of the regression; a0 to a7 are the partial regression coefficients; % C is the percentage of cloudy pixels over the radiosonde sites in each month.