Validation of Water Vapor Vertical Distributions Retrieved from MAX-DOAS over Beijing, China

: Water vapor vertical proﬁles are important in numerical weather prediction, moisture transport, and vertical ﬂux calculation. This study presents the Multi-Axis Di ﬀ erential Optical Absorption Spectroscopy (MAX-DOAS) retrieval algorithm for water vapor vertical proﬁles and the retrieved results are validated with corresponding independent datasets under clear sky. The retrieved Vertical Column Densities (VCDs) and surface concentrations are validated with the Aerosol Robotic Network (AERONET) and National Climatic Data Centre (NCDC) datasets, achieving good correlation coe ﬃ cients (R) of 0.922 and 0.876, respectively. The retrieved vertical proﬁles agree well with weekly balloon-borne radiosonde measurements. Furthermore, the retrieved water vapor concentrations at di ﬀ erent altitudes (100–2000 m) are validated with the corresponding European Centre for Medium-range Weather Forecasts (ECMWF) ERA-interim datasets, achieving a correlation coe ﬃ cient (R) varying from 0.695 to 0.857. The total error budgets for the surface concentrations and VCDs are 31% and 38%, respectively. Finally, the retrieval performance of the MAX-DOAS algorithm under di ﬀ erent aerosol loads is evaluated. High aerosol loads obstruct the retrieval of surface concentrations and VCDs, with surface concentrations more liable to severe interference from such aerosol loads. To summarize, the feasibility of detecting water vapor proﬁles using MAX-DOAS under clear sky is conﬁrmed in this work.


Introduction
Water vapor, as an important atmospheric component, plays a key role in the radiative energy exchange processes that occur there [1,2]. A large portion of the energy transferred between the ground surface and the atmosphere takes the form of latent heat, which determines the global distribution of clouds [3]. The transport and redistribution of water vapor has significant regulatory effects on precipitation and water balance [4]. As the most important greenhouse gas, water vapor's atmospheric concentration increases drastically with temperature, and this increase can amplify global warming processes [5,6]. Moreover, water vapor has a critical influence on atmospheric pollution [7]. It has been demonstrated that under high relative humidity, the volume of aerosol particles can be doubled through the absorption of water vapor onto their surfaces, thereby promoting secondary aerosol formation [8].
The vertical distributions of water vapor are significant for numerical weather prediction, moisture transport, and vertical flux calculations [9,10]. For a long time, balloon-borne radiosonde measurements have been the primary method to monitor water vapor vertical profiles. However, the measurement capabilities of these radiosondes are limited by horizontal drift, poor temporal coverage, and low measurement frequency [11,12]. The water vapor vertical profile is usually monitored twice a day (00:00 and 12:00 UTC) by radiosondes and per flight (from 0 to about 35 km) takes 1-2 h [13]. Furthermore, these devices cannot resolve fast-running weather phenomena such as the development of convective boundary layers and the passage of cold fronts [14]. Numerous passive and active Remote Sens. methods have been developed to compensate for the deficiencies in balloon-borne radiosonde measurements. Ground-based microwave radiometers measure the brightness temperatures of atmospheric radiation (at around 22 GHz) to build water vapor profiles via neural network algorithms in all-weather conditions with the temporal resolution of the order of seconds [9,15]. The Atmospheric Emitter Radiance Interferometer (AERI) collects high-resolution atmospheric emitted infrared radiances from the atmosphere to retrieve the vertical thermal and moisture structure in the earth's atmosphere during daytime with the temporal resolution of 10 min via the infrared radiative transfer equation [16,17]. Recently, Raman lidar and Differential Absorption Lidar (DIAL) have been developed as active Remote Sens. techniques to rapidly detect water vapor around the clock with a high temporal resolution of a few seconds [10,[18][19][20]. Raman lidar detects the Raman backscattered radiation from the atmosphere, and the unique wavelength shifts of this backscattered radiation can be associated with specific molecules [21]. Micropulse DIAL uses a pulsed laser transmitter that can produce two wavelengths (online and offline) around 828 nm. The ratio of the return power between the online and offline signals is used to determine the water vapor number density or absolute humidity [22].
Multi-Axis Differential Optical Absorption Spectroscopy (MAX-DOAS) is a passive Remote Sens. technique that uses the scattered solar spectra at different elevation angles to derive Slant Column Densities (SCDs) [23][24][25][26][27][28]. These SCDs for different elevation angles can then be converted into vertical profiles via retrieval algorithms. The temporal resolution in retrieved vertical profiles is about 15 min, which depends on the integral time and the number of elevation angles in a sequence. Comparing to other Remote Sens. techniques, the advantages of MAX-DOAS are the simple experimental setup, low power consumption, highly automated operation, and without absolute radiometric calibration. Moreover, MAX-DOAS retrieved results have high sensitivities in lower atmosphere, which can compensate the blind area near ground of lidar results [29]. Ground-based MAX-DOAS has been employed to retrieve water vapor Vertical Column Densities (VCDs) in the yellow and red spectral ranges, using a simple and effective geometric Atmospheric Mass Factor (AMF) [30]. An airborne MAX-DOAS instrument was used to detect water vapor vertical profiles for detection altitudes of up to 3500 m with a vertical grid of 250 m [31]. The water vapor profiles retrieved by ground-based MAX-DOAS were first presented in 2011 [32]. In that study, the fitting window 495-515 nm was analyzed for water vapor retrieval at 506 nm. The retrieved vertical profile was roughly represented by VCD and three F values (F 1 , F 2 , and F 3 ) after these values were converted to partial VCD values for 0-1, 1-2, 2-3, and 3-8 km. The partial VCD values for these grids were described as VCD • F 1 , VCD • (1 − F 1 ) F 2 , VCD • (1 − F 1 ) (1 − F 2 ) F 3 and VCD • (1 − F 1 ) (1 − F 2 ) (1 − F 3 ), respectively. The retrieved surface volume mixing ratios were validated using collocated in-situ measurements, achieving a correlation coefficient (R) of 0.75. In this study, due to the limited spectral range of spectrometer, we choose a fitting window of 433-462 nm for water vapor retrieval at 443 nm, where the H 2 O absorption bands are about one order of magnitude weaker than those at 506 nm.
In this paper, we present a MAX-DOAS retrieval algorithm for water vapor vertical profile calculation, and demonstrate that the retrieved results accord well with independent datasets. In the next section, the MAX-DOAS instrument's specifications and MAX-DOAS profile retrieval algorithm are introduced. In Section 3, the ancillary data used for validation (taken from the AERONET, NCDC, and ECMWF ERA-interim datasets) are introduced. In Section 4, the retrieved VCDs, surface concentrations, and vertical profiles are validated with corresponding independent datasets. In Section 5, the error budget estimations for the surface concentrations and VCDs are summarized in this section. Moreover, we further evaluate the retrieval performance of surface concentrations and VCDs under different aerosol loads, by investigating the correlation results at different Aerosol Optical Depths (AODs). Finally, the conclusions are summarized in Section 6.  Figure 1. In this observation experiment, more than 100,000 spectra with outside temperatures varying from 4 to 40 • C (shown in Figure S1) were collected.

The MAX-DOAS Instrument
Remote Sens. 2020, 12, x FOR PEER REVIEW  3 of 19 respectively. The retrieved surface volume mixing ratios were validated using collocated in-situ measurements, achieving a correlation coefficient (R) of 0.75. In this study, due to the limited spectral range of spectrometer, we choose a fitting window of 433-462 nm for water vapor retrieval at 443 nm, where the H2O absorption bands are about one order of magnitude weaker than those at 506 nm.
In this paper, we present a MAX-DOAS retrieval algorithm for water vapor vertical profile calculation, and demonstrate that the retrieved results accord well with independent datasets. In the next section, the MAX-DOAS instrument's specifications and MAX-DOAS profile retrieval algorithm are introduced. In Section 3, the ancillary data used for validation (taken from the AERONET, NCDC, and ECMWF ERA-interim datasets) are introduced. In Section 4, the retrieved VCDs, surface concentrations, and vertical profiles are validated with corresponding independent datasets. In Section 5, the error budget estimations for the surface concentrations and VCDs are summarized in this section. Moreover, we further evaluate the retrieval performance of surface concentrations and VCDs under different aerosol loads, by investigating the correlation results at different Aerosol Optical Depths (AODs). Finally, the conclusions are summarized in Section 6.  Figure 1. In this observation experiment, more than 100,000 spectra with outside temperatures varying from 4 to 40 °C (shown in Figure S1) were collected. The Airyx multi-function DOAS instrument employed in this observation consists of an outdoor telescope and two Avantes spectrometer units (UV: 300-410 nm, full width at half maximum (FWHM) = 0.45 nm; Vis: 407-505 nm, FWHM = 0.4 nm) placed indoors at a constant temperature of 20 ± 0.05 °C. The outdoor telescope and indoor spectrometer units are connected via a multi-mode optic fiber and several control lines. The elevation angle of the telescope is controlled by an internal stepping motor with a precision of less than 0.1°. The solar scattering light collected at different elevation angles is refracted by a rotating prism through a lens onto the fiber, eventually reaching the spectrometers for spectral analysis. The telescope's field of view in the horizontal and vertical directions is less than 0.8° and 0.2°, respectively. Its observation azimuth angle is 130° with respect to north. The full sequence of the elevation scan is as follows: 1°, 2°, 3°, 4°, 5°, 6°, 8°, 10°, 15°, 30°, and The Airyx multi-function DOAS instrument employed in this observation consists of an outdoor telescope and two Avantes spectrometer units (UV: 300-410 nm, full width at half maximum (FWHM) = 0.45 nm; Vis: 407-505 nm, FWHM = 0.4 nm) placed indoors at a constant temperature of 20 ± 0.05 • C. The outdoor telescope and indoor spectrometer units are connected via a multi-mode optic fiber and several control lines. The elevation angle of the telescope is controlled by an internal stepping motor with a precision of less than 0.1 • . The solar scattering light collected at different elevation angles is refracted by a rotating prism through a lens onto the fiber, eventually reaching the spectrometers for spectral analysis. The telescope's field of view in the horizontal and vertical directions is less than 0.8 • and 0.2 • , respectively. Its observation azimuth angle is 130 • with respect to north. The full sequence of the elevation scan is as follows: 1 • , 2 • , 3 • , 4 • , 5 • , 6 • , 8 • , 10 • , 15 • , 30 • , and 90 • . The typical integral time for each elevation is approximately 1 min, and the full sequence for elevation takes approximately 11 min.

DOAS Spectral Analysis
The spectra observed in the aforementioned period were analyzed for H 2 O and O 4 Differential Slant Column Densities (dSCDs) using the DOAS method implemented by the QDOAS software (http://uv-vis.aeronomie.be/software/QDOAS/, last access: 16 June 2020). In this step, the zenith spectrum was used as a Fraunhofer reference spectrum for each sequence to reduce the contribution of the stratospheric absorption in the derived off-axis dSCDs. Wavelength calibration was performed using a high-resolution solar spectrum [33]. The direct results retrieved from the DOAS fitting were the dSCDs, defined as the difference between the SCD for off-axis and zenith-sky measurements. These dSCDs were used as the input for the subsequent vertical profile retrieval. The dSCDs were filtered out with a Root Mean Square (RMS) larger than 0.002 or a Solar Zenith Angle (SZA) larger than 75 • for quality control.
For the retrieval of H 2 O, an optimized fitting window of 433-462 nm and a third-order polynomial were selected via a sensitivity test (shown in Figures S2 and S3). For O 4 , a fitting window of 460-490 nm and a fifth-order polynomial were selected [32]. Examples of the DOAS fitting for H 2 O and O 4 are presented in Figure 2, and the DOAS retrieval settings are listed in Table 1. 90°. The typical integral time for each elevation is approximately 1 min, and the full sequence for elevation takes approximately 11 min.

DOAS Spectral Analysis
The spectra observed in the aforementioned period were analyzed for H2O and O4 Differential Slant Column Densities (dSCDs) using the DOAS method implemented by the QDOAS software (http://uv-vis.aeronomie.be/software/QDOAS/, last access: 16 June 2020). In this step, the zenith spectrum was used as a Fraunhofer reference spectrum for each sequence to reduce the contribution of the stratospheric absorption in the derived off-axis dSCDs. Wavelength calibration was performed using a high-resolution solar spectrum [33]. The direct results retrieved from the DOAS fitting were the dSCDs, defined as the difference between the SCD for off-axis and zenith-sky measurements. These dSCDs were used as the input for the subsequent vertical profile retrieval. The dSCDs were filtered out with a Root Mean Square (RMS) larger than 0.002 or a Solar Zenith Angle (SZA) larger than 75° for quality control.
For the retrieval of H2O, an optimized fitting window of 433-462 nm and a third-order polynomial were selected via a sensitivity test (shown in Figures S2 and S3). For O4, a fitting window of 460-490 nm and a fifth-order polynomial were selected [32]. Examples of the DOAS fitting for H2O and O4 are presented in Figure 2, and the DOAS retrieval settings are listed in Table 1.

HEIPRO Algorithm Description
The Heidelberg Profile Retrieval (HEIPRO) algorithm for the vertical profiles of aerosol extinction and trace gases was used to convert the dSCDs into vertical profiles [40,41]. The universality of the HEIPRO algorithm has been demonstrated in many comparative observation experiments [42][43][44][45][46]. HEIPRO is based on a well-established optimal estimation method, which enables it to overcome the poorly constrained inversion problem [47]. The SCIATRAN 2.2 Radiative Transfer Model (RTM) [48] was used as forward model F to simulate the measurement vector y on the basis of atmospheric state vector x. The vertical profiles (both of aerosol and trace gas) were calculated iteratively by adjusting the input parameters until the optimal agreement between the measurements and RTM simulation was reached. This agreement is quantified as the minimization of the cost function χ 2 , which is given by the following equation (Equation (1)): Here, the state vector x consists of aerosol extinction coefficients or trace gas concentrations in discrete atmospheric layers, the thicknesses of which vary from 100-200 m. The measurement vector y, which corresponds to the dSCDs in MAX-DOAS, is a function of x and the meteorological parameters b (i.e., atmospheric pressure and temperature vertical profiles). The a priori profile x a , which serves as the initial profile for profile retrieval, is iteratively used to constrain the ill-posed problem in the optimization. S a and S ε denote the covariance matrices of a priori uncertainty and measurement error, representing the uncertainty in the a priori and measurement states, respectively.
HEIPRO implements a two-step approach to retrieve aerosol and trace gas profiles. First, aerosol extinction profiles are retrieved from the measured O 4 dSCDs for each MAX-DOAS scan. Then, the retrieved aerosol extinction properties serving as the forward model parameters are combined with the measured trace gas dSCDs and used to retrieve the trace gas vertical profiles. In this study, the aerosol extinction and trace gas profiles were divided into 30 layers on a fixed altitude grid such that the first 20 layers (below 2 km) are on a 100 m grid, and the last ten layers (between 2 km and 4 km) are on a 200 m grid. The temporal interval of inversion was set to 15 min, to cover at least a full elevation sequence. For aerosol retrieval, a fixed set of aerosol properties including an asymmetry parameter of 0.72, a single scattering albedo of 0.90, and a ground albedo of 0.05 were assumed in this study [49]. An exponentially decreasing a priori profile with a surface extinction coefficient of 0.3 km −1 and a scale height of 1 km were used as a priori profiles for aerosol retrieval. The a priori uncertainty, which is used to construct S a , was itself set as an a priori profile (relative error = 100%).
For the water vapor vertical profile retrieval, the aerosol information retrieved in the first step provides the atmospheric light path, which serves as the forward model parameter for trace gas retrieval. The overall shape of the water vapor vertical profile is determined by the Clausius-Clapeyron relationship and temperature lapse rate [30]. Thus, an exponentially decreasing a priori profile with a surface concentration of 4.6 × 10 17 molec/cm 3 and a scale height of 1.9 km was chosen for water vapor profile retrieval, by an exponential fitting of the ECMWF ERA-interim water vapor vertical profiles, as shown in Figure 3. Besides the a priori profile, the reasonable estimation of a priori error in each layer is important because of the large variety of profile shapes. In this paper, half of the a priori profile (relative error = 50%) was chosen as a priori uncertainty, to maintain a balance between the flexibility and stability of the profile shape [50,51].

Ancillary Data for Validation
The hourly averaged water vapor VCDs, surface concentrations, and vertical profiles retrieved from MAX-DOAS were validated with corresponding independent datasets. The locations of the MAX-DOAS site and other independent measurement sites are presented in Figure 1 and Table S1. The detailed data processing methods are presented in Section 5 of the supplement. These validations were all performed under cloud-screened conditions, by synchronizing the timetable to that of AERONET [52]. For VCDs, any retrieved results featuring errors exceeding 1 × 10 22 molec/cm 2 , chi-squares exceeding 100, or less than two degrees of freedom (DOFs) were further filtered out (~23% of the data). For surface concentrations, retrieved results featuring errors exceeding 1 × 10 17 molec/cm 3 , chi-squares exceeding 100, or less than two DOFs were further filtered out (~15% of the dataset) [51]. The frequency distributions of DOFs and chi-squares can be found in Figure S4.

AERONET Observation Data
The AERONET project is a federation of ground-based Remote Sens. aerosol networks established by NASA and PHOTONS [53]. Sun photometers, as components of AERONET, not only provide information to calculate AODs but can also be used to retrieve total column water vapor quantities (water vapor VCDs) in the 940 nm channel. The AERONET data are automatically cloud screened. Thus, the absence of AERONET data often serves as a temporal index for the presence of clouds and is an external selection criterion of MAX-DOAS measurements [54,55].
In this study, the Level 1.5 precipitable water data from the Beijing-CAMS AERONET site (39.933°N, 116.317°E; Elevation: 106.0 m) were used to validate MAX-DOAS VCDs. The unit of AERONET precipitable water is the centimeter (cm), which can be converted to number density (molec/cm 2 ) via the following relationship: 1 cm of precipitable water equals 3.3 × 10 22 molec/cm 2 .

Ancillary Data for Validation
The hourly averaged water vapor VCDs, surface concentrations, and vertical profiles retrieved from MAX-DOAS were validated with corresponding independent datasets. The locations of the MAX-DOAS site and other independent measurement sites are presented in Figure 1 and Table S1. The detailed data processing methods are presented in Section 5 of the supplement.

•
The MAX-DOAS water vapor VCDs were validated with AERONET Level 1.5 precipitable water data from the CAMS site, Beijing (39. These validations were all performed under cloud-screened conditions, by synchronizing the timetable to that of AERONET [52]. For VCDs, any retrieved results featuring errors exceeding 1 × 10 22 molec/cm 2 , chi-squares exceeding 100, or less than two degrees of freedom (DOFs) were further filtered out (~23% of the data). For surface concentrations, retrieved results featuring errors exceeding 1 × 10 17 molec/cm 3 , chi-squares exceeding 100, or less than two DOFs were further filtered out (~15% of the dataset) [51]. The frequency distributions of DOFs and chi-squares can be found in Figure S4.

AERONET Observation Data
The AERONET project is a federation of ground-based Remote Sens. aerosol networks established by NASA and PHOTONS [53]. Sun photometers, as components of AERONET, not only provide information to calculate AODs but can also be used to retrieve total column water vapor quantities (water vapor VCDs) in the 940 nm channel. The AERONET data are automatically cloud screened. Thus, the absence of AERONET data often serves as a temporal index for the presence of clouds and is an external selection criterion of MAX-DOAS measurements [54,55].
In this study, the Level 1.5 precipitable water data from the Beijing-CAMS AERONET site (39. water is the centimeter (cm), which can be converted to number density (molec/cm 2 ) via the following relationship: 1 cm of precipitable water equals 3.3 × 10 22 molec/cm 2 . Triplet variability values of precipitable water vapor in the AERONET data were regarded as errors of the water vapor VCDs.

NCDC Data
The hourly surface meteorological data (including temperature, pressure, dew point, wind speed, and wind direction), measured at the BCIA site (USAF ID: 545110; 40.080 • N, 116.585 • E; Elevation: 35.4 m) approximately 27.8 km north-east of the MAX-DOAS instrument, were obtained from the public FTP server of the NCDC (ftp://ftp.ncdc.noaa.gov/pub/data/noaa/isd-lite/, last access: 16 June 2020). The in-situ measured dew point can be converted into the molecular number density of water vapor (molec/cm 3 ).

ECMWF ERA-Interim Reanalysis Data
The ECMWF ERA-interim reanalysis dataset [56] consists of a reanalysis of the global atmosphere and it is often used to validate other datasets [30,57]. In this study, the spatial and time resolutions of the ECMWF data were 0.125 • × 0.125 • and 6 h, respectively. The region between 39.875 and 40.0 • N and between 116.25 and 116.375 • E was averaged for comparison. The specific humidity in the ECMWF ERA-interim data can be converted to a number density using pressure and temperature. The pressure at each model level can be converted into height information via a barometric height formula. Thus, the vertical profiles of water vapor were illustrated by level heights and the corresponding number densities.

Balloon-Borne Radiosonde Data
Balloon-borne radiosonde measurements were conducted weekly at Nanjiao Observatory, Beijing (39.806 • N, 116.470 • E; Elevation = 31 m). The measurements often took place at approximately 6:00 UTC. The maximum observation height exceeded 30 km, and the temporal sampling resolution was 1 s.

Validation of VCDs Retrieved by MAX-DOAS
The hourly averaged VCDs retrieved by MAX-DOAS were validated using AERONET Level 1.5 precipitable water data under clear sky conditions. The aerosol extinction coefficient profiles retrieved in the first step combined with meteorological profiles were used as inputs in the RTM to calculate the AMF. Thus, the dSCDs could be converted into tropospheric VCDs (denoted VCD trop ) [58][59][60], as shown in Equation (2).
As shown in Figure 4  As shown in Figure 5, a good agreement exists between both the daily and monthly retrieved VCDs from MAX-DOAS and AERONET. From April to July, the monthly averaged water vapor VCD derived from MAX-DOAS increased from 4.94 × 10 22 ± 5.38 × 10 21 to 1.29 × 10 23 ± 5.39 × 10 21 . Then, it decreased to 6.48 × 10 22 ± 3.48 × 10 21 in September. From April to September, the differences between the MAX-DOAS VCDs and AERONET results vary from 0.23 × 10 21 to 0.98 × 10 21 , and the relative differences vary from 1.8% (July) to 15% (September and April). The monthly averaged VCDs of MAX-DOAS exhibit their best agreement with the AERONET results in July. All values are in units of molec/cm 2 .

Validation of Surface Concentrations Retrieved by MAX-DOAS
As shown in Figure 6, the hourly averaged water vapor surface concentrations observed by MAX-DOAS at the CAMS, Beijing were validated using the in-situ measurements from the BCIA under clear sky conditions. Although the distance between the two sites is more than 27 km, a good agreement (R = 0.876 and slope = 1.06) can be seen. The bias between the hourly surface concentrations measured by MAX-DOAS and NCDC in-situ instrument is −3.48 × 10 16 with the standard deviation of 9.43 × 10 16 . The daily and monthly averaged time series results of surface concentration from MAX-DOAS and NCDC are displayed in Figure 7. The monthly averaged water vapor surface concentration retrieved from MAX-DOAS increased from 2.08 × 10 17 ± 3.54 × 10 16 to 5.64 × 10 17 ± 3.50 × 10 16 between April and July. Then, it decreased to 2.69 × 10 17 ± 6.37 × 10 16   As shown in Figure 5, a good agreement exists between both the daily and monthly retrieved VCDs from MAX-DOAS and AERONET. From April to July, the monthly averaged water vapor VCD derived from MAX-DOAS increased from 4.94 × 10 22 ± 5.38 × 10 21 to 1.29 × 10 23 ± 5.39 × 10 21 . Then, it decreased to 6.48 × 10 22 ± 3.48 × 10 21 in September. From April to September, the differences between the MAX-DOAS VCDs and AERONET results vary from 0.23 × 10 21 to 0.98 × 10 21 , and the relative differences vary from 1.8% (July) to 15% (September and April). The monthly averaged VCDs of MAX-DOAS exhibit their best agreement with the AERONET results in July. All values are in units of molec/cm 2 .

Validation of Surface Concentrations Retrieved by MAX-DOAS
As shown in Figure 6, the hourly averaged water vapor surface concentrations observed by MAX-DOAS at the CAMS, Beijing were validated using the in-situ measurements from the BCIA under clear sky conditions. Although the distance between the two sites is more than 27 km, a good agreement (R = 0.876 and slope = 1.06) can be seen. The bias between the hourly surface concentrations measured by MAX-DOAS and NCDC in-situ instrument is −3.48 × 10 16 with the standard deviation of 9.43 × 10 16 . The daily and monthly averaged time series results of surface concentration from MAX-DOAS and NCDC are displayed in Figure 7. The monthly averaged water vapor surface concentration retrieved from MAX-DOAS increased from 2.08 × 10 17 ± 3.54 × 10 16 to 5.64 × 10 17 ± 3.50 × 10 16 between April and July. Then, it decreased to 2.69 × 10 17 ± 6.37 × 10 16

Validation of Surface Concentrations Retrieved by MAX-DOAS
As shown in Figure 6, the hourly averaged water vapor surface concentrations observed by MAX-DOAS at the CAMS, Beijing were validated using the in-situ measurements from the BCIA under clear sky conditions. Although the distance between the two sites is more than 27 km, a good agreement (R = 0.876 and slope = 1.06) can be seen. The bias between the hourly surface concentrations measured by MAX-DOAS and NCDC in-situ instrument is −3.48 × 10 16 with the standard deviation of 9.43 × 10 16 . The daily and monthly averaged time series results of surface concentration from MAX-DOAS and NCDC are displayed in Figure 7. The monthly averaged water vapor surface concentration retrieved from MAX-DOAS increased from 2.08 × 10 17 ± 3.54 × 10 16 to 5.64 × 10 17 ± 3.50 × 10 16 between April and July. Then, it decreased to 2.69 × 10 17 ± 6.37 × 10 16   The water vapor surface concentrations observed by MAX-DOAS at CAMS (urban area) are slightly lower than NCDC in-situ measurements at BCIA (suburban area), especially in July and August. This underestimation can be related to urban dryness island effect in Beijing [59]. It has been reported that the urban dryness island intensities (UDII) are high in central urban areas and the UDII is stronger in summer and autumn than in winter and spring [60]. Thus relative humidity in urban areas is smaller than in suburban areas, indicating the effect of urbanization on near-surface atmospheric moisture. This is consistent with the comparison results between MAX-DOAS and NCDC in-situ measurements.

Validation with Balloon-Borne Radiosonde Measurements
Three representative water vapor profiles were retrieved from the MAX-DOAS measurements taken on three clear days:   The water vapor surface concentrations observed by MAX-DOAS at CAMS (urban area) are slightly lower than NCDC in-situ measurements at BCIA (suburban area), especially in July and August. This underestimation can be related to urban dryness island effect in Beijing [59]. It has been reported that the urban dryness island intensities (UDII) are high in central urban areas and the UDII is stronger in summer and autumn than in winter and spring [60]. Thus relative humidity in urban areas is smaller than in suburban areas, indicating the effect of urbanization on near-surface atmospheric moisture. This is consistent with the comparison results between MAX-DOAS and NCDC in-situ measurements.

Validation with Balloon-Borne Radiosonde Measurements
Three representative water vapor profiles were retrieved from the MAX-DOAS measurements taken on three clear days:  The water vapor surface concentrations observed by MAX-DOAS at CAMS (urban area) are slightly lower than NCDC in-situ measurements at BCIA (suburban area), especially in July and August. This underestimation can be related to urban dryness island effect in Beijing [61]. It has been reported that the urban dryness island intensities (UDII) are high in central urban areas and the UDII is stronger in summer and autumn than in winter and spring [62]. Thus relative humidity in urban areas is smaller than in suburban areas, indicating the effect of urbanization on near-surface atmospheric moisture. This is consistent with the comparison results between MAX-DOAS and NCDC in-situ measurements.

Validation with ECMWF ERA-Interim Data
As shown in Figure 9, we conducted a correlation analysis of the water vapor concentrations in different height layers (derived from the MAX-DOAS and ECMWF ERA-interim data) to validate the profiles. As performed for the balloon-borne radiosonde measurements, the ECMWF ERA-interim water vapor profiles were also interpolated onto the MAX-DOAS grid. During this observation period, the MAX-DOAS instruments collected spectra from 00:00 to 10:00 (UTC). However, the temporal resolution of the ECMWF ERA-interim dataset is 6 h. Thus, only the MAX-DOAS profiles measured at 00:00 and 06:00 can be used to validate the corresponding ECMWF profiles. These validations were conducted under no-cloud conditions by synchronizing the timetable to the AERONET data. In total, 138 profiles measured with MAX-DOAS in this observation period from 18 April to 30 September 2018 were validated using the coincident ECMWF profiles.

Validation with ECMWF ERA-Interim Data
As shown in Figure 9, we conducted a correlation analysis of the water vapor concentrations in different height layers (derived from the MAX-DOAS and ECMWF ERA-interim data) to validate the profiles. As performed for the balloon-borne radiosonde measurements, the ECMWF ERA-interim water vapor profiles were also interpolated onto the MAX-DOAS grid. During this observation period, the MAX-DOAS instruments collected spectra from 00:00 to 10:00 (UTC). However, the temporal resolution of the ECMWF ERA-interim dataset is 6 h. Thus, only the MAX-DOAS profiles measured at 00:00 and 06:00 can be used to validate the corresponding ECMWF profiles. These validations were conducted under no-cloud conditions by synchronizing the timetable to the AERONET data. In total, 138 profiles measured with MAX-DOAS in this observation period from 18 April to 30 September 2018 were validated using the coincident ECMWF profiles. constraint of a priori profile together lead to the decreasing linear fitting slope with increasing height. MAX-DOAS results have high sensitivities in the lower atmosphere and the contribution of a priori profile in retrieved profile is relatively small. With the increasing height, the detection sensitivity gets worse and the dependence of retrieved profile in a priori profile becomes strong gradually. The fixed exponentially decreasing a priori profile with a surface concentration of 4.6 × 10 17 molec/cm 3 and a scale height of 1.9 km may be too high to represent the water vapor concentrations in high altitudes. Together with the low sensitivity in high altitudes, the retrieved concentrations in high altitudes can be higher than actual situation, thus the linear fitting slope decreases and the bias between MAX-DOAS and ECMWF increases, as shown in Figure 9.  Good agreement between the MAX-DOAS and ECMWF results can be observed in height layers below 2000 m, with the Pearson correlation coefficient (R) ranging from 0.695 to 0.857. Under an increase in layer height, the detection sensitivity of MAX-DOAS gradually decreases and the correlation analysis results degrade. In the layer height from 600 to 1200 m, the consistency between the MAX-DOAS concentrations and the ECMWF results is slightly worse, which may be the result of the large uncertainties in these layers. The decreasing detection sensitivity and the enhanced constraint of a priori profile together lead to the decreasing linear fitting slope with increasing height. MAX-DOAS results have high sensitivities in the lower atmosphere and the contribution of a priori profile in retrieved profile is relatively small. With the increasing height, the detection sensitivity gets worse and the dependence of retrieved profile in a priori profile becomes strong gradually. The fixed exponentially decreasing a priori profile with a surface concentration of 4.6 × 10 17 molec/cm 3 and a scale height of 1.9 km may be too high to represent the water vapor concentrations in high altitudes. Together with the low sensitivity in high altitudes, the retrieved concentrations in high altitudes can be higher than actual situation, thus the linear fitting slope decreases and the bias between MAX-DOAS and ECMWF increases, as shown in Figure 9.

Error Budget Analysis
The errors of water vapor profile retrievals can be divided into four different types: noise error, smoothing error, model parameter error, and forward model error [63]. The noise error, defined as the fitting error of the DOAS fits, is related to the propagation of noise in the measurements inputted into the retrieval algorithm [64]. The smoothing error, which quantifies the error owing to the limited vertical resolution of profile retrieval, limits the ability of the retrieval procedure to obtain solutions from the a priori profiles. The smoothing errors S s for the profiles are calculated from the a priori covariance matrix S a and the averaging kernel AK [65], as shown in Equation (3).
The model parameter errors include different parameters as aerosol profiles, cross sections and algorithm error. The forward model error originates from the simplification and uncertainties in the relative radiative model.
The algorithm error in vertical profile cannot be estimated because it is difficult to assign the discrepancies between measured and modeled dSCDs to each altitude of profiles, blocking the estimation of model parameters errors [66]. Moreover, it has been reported that the sum of smoothing error and noise error is the dominant error source in the total error budget [40,65]. The vertical distributions of the smoothing error, noise error, and the sum thereof under clear sky conditions are displayed in Figure 10. The smoothing error contributes substantially to the sum thereof, whereas the noise error has only a small influence. The smoothing and sum errors have similar vertical distributions, both increasing rapidly with height in the lower layers (0-800 m) and then decreasing gradually with height. The noise error varies marginally around 2.6 × 10 16 molec/cm 3 in the layers below 600 m. Then it exhibits an almost linear decrease with height.

Error Budget Analysis
The errors of water vapor profile retrievals can be divided into four different types: noise error, smoothing error, model parameter error, and forward model error [61]. The noise error, defined as the fitting error of the DOAS fits, is related to the propagation of noise in the measurements inputted into the retrieval algorithm [62]. The smoothing error, which quantifies the error owing to the limited vertical resolution of profile retrieval, limits the ability of the retrieval procedure to obtain solutions from the a priori profiles. The smoothing errors Ss for the profiles are calculated from the a priori covariance matrix Sa and the averaging kernel AK [63], as shown in Equation (3).
The model parameter errors include different parameters as aerosol profiles, cross sections and algorithm error. The forward model error originates from the simplification and uncertainties in the relative radiative model.
The algorithm error in vertical profile cannot be estimated because it is difficult to assign the discrepancies between measured and modeled dSCDs to each altitude of profiles, blocking the estimation of model parameters errors [64]. Moreover, it has been reported that the sum of smoothing error and noise error is the dominant error source in the total error budget [40,63]. The vertical distributions of the smoothing error, noise error, and the sum thereof under clear sky conditions are displayed in Figure 10. The smoothing error contributes substantially to the sum thereof, whereas the noise error has only a small influence. The smoothing and sum errors have similar vertical distributions, both increasing rapidly with height in the lower layers (0-800 m) and then decreasing gradually with height. The noise error varies marginally around 2.6 × 10 16 molec/cm 3 in the layers below 600 m. Then it exhibits an almost linear decrease with height. The averaged error budgets from different sources in retrieved water vapor surface concentrations and VCDs are estimated in Table 2. Beside smoothing and noise error introduced above, the uncertainty related to aerosols is estimated by taking the retrieved aerosol profiles at 477 nm plus their corresponding errors as inputs in water vapor retrieval and then comparing the results to those of the standard retrievals [65]. The uncertainty related to the a priori profiles is estimated by comparing the retrieval results using different a priori profiles (scale height: 1.5-2.5 km; surface concentration: 3 × 10 17 to 6 × 10 17 molec/cm 3 ) against the standard retrieval results. The uncertainty related to the H2O cross section is set as 3% [66]. The algorithm errors on the near-surface The averaged error budgets from different sources in retrieved water vapor surface concentrations and VCDs are estimated in Table 2. Beside smoothing and noise error introduced above, the uncertainty related to aerosols is estimated by taking the retrieved aerosol profiles at 477 nm plus their corresponding errors as inputs in water vapor retrieval and then comparing the results to those of the standard retrievals [67]. The uncertainty related to the a priori profiles is estimated by comparing the retrieval results using different a priori profiles (scale height: 1.5-2.5 km; surface concentration: 3 × 10 17 to 6 × 10 17 molec/cm 3 ) against the standard retrieval results. The uncertainty related to the H 2 O cross section is set as 3% [68]. The algorithm errors on the near-surface concentrations and the VCDs using the averaged relative differences between measured and simulated dSCDs for a 5 • and 30 • elevation angle, respectively [66]. The total error budgets for the surface concentrations and VCDs are calculated by adding the different error terms using Gaussian error propagation. For surface concentrations, the total uncertainty is approximately 31%, and the dominant error source is that pertaining to aerosols. However, for VCDs, the total uncertainty is approximately 38%, and the dominant error source is the sum of the smoothing and noise errors.

Retrieval Performance under Different Aerosol Loads
Aerosol loads affect the light path through which scattered photons travel, thereby determining their effective optical paths within the atmosphere [69]. Thus, in MAX-DOAS retrieval, the interpretation of trace gases strongly depends on the proper vertical distribution of the aerosol extinction coefficients [70]. As shown in Figure 11, the AOD at 477 nm (AOD 477 ) from AERONET was divided into seven levels, and we analyzed the consistency between the MAX-DOAS results and the corresponding measurements under each AOD 477 level. The root mean square error (RMSE), correlation coefficient (R), slope, and intercept of the fitted line, and the number of effective data points were selected as evaluation criteria [66].
As shown in Figure 11, aerosol loads affect the retrieval of water vapor VCDs and surface concentrations almost uniformly. Under low aerosol loads (AOD 477 ≤ 1.0), the retrieved results are consistent with the corresponding measurements. The RMSEs between the MAX-DOAS results and corresponding measurements were stable and small (For VCDs, the RMSE varies in 13.92-15.98 × 10 21 molec/cm 2 ; for surface concentrations, the RMSE varies in 6.55-9.29 × 10 16 molec/cm 3 ). The linear fitting results were strong, with correlation coefficients (R) and fitting slopes close to 1.0, and low intercepts. Under high aerosol loads (AOD 477 > 1.0), as the AOD 477 increases, the consistencies degrade and the RMSEs increase almost linearly. The correlation coefficients (R) and linear fitting slopes exhibit a slight decrease, whereas the intercepts increase rapidly. The deterioration of the MAX-DOAS retrieval performance under high aerosol loads can be attributed to the more sophisticated atmosphere optical paths and lower SNRs, which obstruct the retrieval of aerosol extinction profiles in the first step. The incorrect aerosol information inputted into the water vapor retrieval procedure results in the deviation of water vapor VCDs and surface concentrations.
From the variation of RMSEs with AOD 477 levels, the retrieval of surface concentrations is more strongly interfered with by aerosol loads than VCD retrieval. Interestingly, the correlation coefficient (R) and linear fitting slope also vary slightly, while the intercept of the fitting line increases rapidly under increasing AOD 477 levels. In other words, a systematic deviation occurs between the MAX-DOAS results and the corresponding measurements under high aerosol loads (AOD 477 > 1.0), and this deviation increases with increasing AOD 477 level.
Aerosol loads affect the light path through which scattered photons travel, thereby determining their effective optical paths within the atmosphere [67]. Thus, in MAX-DOAS retrieval, the interpretation of trace gases strongly depends on the proper vertical distribution of the aerosol extinction coefficients [68]. As shown in Figure 11, the AOD at 477 nm (AOD477) from AERONET was divided into seven levels, and we analyzed the consistency between the MAX-DOAS results and the corresponding measurements under each AOD477 level. The root mean square error (RMSE), correlation coefficient (R), slope, and intercept of the fitted line, and the number of effective data points were selected as evaluation criteria [64].   Figure 11. (a) Validation results for MAX-DOAS VCDs with AERONET measurements under different AOD levels. The root mean square errors (RMSEs) calculated from the two datasets, correlation coefficients (R), slopes, and intercepts derived using linear regression, and the corresponding numbers of data points under different aerosol optical depth (AOD) levels are displayed as red dots. (b) As (a), but for the correlation between surface concentrations retrieved from MAX-DOAS and NCDC in-situ instruments.

Summary and Conclusions
Long-term MAX-DOAS observations of water vapor in Beijing were performed from 18 April to 30 September 2018 and more than 100,000 spectra were collected. The water vapor dSCDs derived for different elevation angles were converted to VCDs, surface concentrations, and vertical profiles using the HEIPRO algorithm. The retrieved results were compared with corresponding measurements for validation.
The retrieved hourly water vapor VCDs agreed well with the AERONET data, presenting a similar temporal behavior with a correlation coefficient (R) of 0.922 and a linear fitting slope of 0.969. The retrieved hourly surface concentrations showed a good correlation with the NCDC in-situ measurements, achieving a correlation coefficient (R) of 0.876 and a fitting slope of 1.06. For the vertical profiles, the MAX-DOAS retrieved results were validated using balloon-borne radiosonde measurements from the Nanjiao Observatory, Beijing. As shown in Figure 8, three representative MAX-DOAS profiles were consistent with the corresponding balloon-borne radiosonde measurements and ECMWF ERA-interim data. Moreover, the MAX-DOAS-retrieved water vapor concentrations for different height layers (100-2000 m) were validated with the corresponding ECMWF ERA-interim datasets, with the correlation coefficient (R) varying from 0.695 to 0.857. Comparing with the previously published work [32], the consistency between the surface concentrations observed by MAX-DOAS and in-situ instrument is better, raising the Pearson correlation coefficient (R) from 0.75 to 0.876. Additionally, the retrieved vertical profile also has a higher vertical resolution. What's more, the retrieved vertical profiles are validated using balloon-borne radiosonde measurements and ECMWF-interim datasets, which is not involved in the previous work.
Further validations were performed under different aerosol loads identified by the AERONET data. High aerosol loads obstruct the retrieval of surface concentrations and VCDs, with surface concentrations more liable to severe interference from such aerosol loads. Under low aerosol loads (AOD 477 ≤ 1.0), both the surface concentrations and VCDs showed strong consistency with the corresponding measurements. Under higher aerosol loads (AOD 477 > 1.0), the atmosphere optical path may become too complex to reconstruct with the HEIPRO algorithm. Thus the deterioration of the MAX-DOAS retrieval performance became evident. To summarize, the feasibility of detecting water vapor profiles using MAX-DOAS under clear sky is confirmed by this work. The retrieval under cloudy or rainy conditions is still in progress.  Figure S1: The hourly variation of surface temperature, relative humidity, and pressure; Figure S2: A test about the fitting windows of H 2 O SCDs; Figure S3: A test about the polynomial order in DOAS fitting; Figure S4: The normalized frequency distribution of DOFs and Chi-square in retrieved results; Figure