Accuracy of Vaisala RS41 and RS92 Upper Tropospheric Humidity Compared to Satellite Hyperspectral Infrared Measurements

Abstract: Radiosondes are important for calibrating satellite sensors and assessing sounding retrievals. Vaisala RS41 radiosondes have mostly replaced RS92 in the Global Climate Observing System (GCOS) Reference Upper Air Network (GRUAN) and the conventional network. This study assesses RS41 and RS92 upper tropospheric humidity (UTH) accuracy by comparing with Infrared Atmospheric Sounding Interferometer (IASI) upper tropospheric water vapor absorption spectrum measurements. Using single RS41 and RS92 soundings at three GRUAN and DOE Atmospheric Radiation Measurement (ARM) sites and dual RS92/RS41 launches at three additional GRUAN sites, collocated with cloud-free IASI radiances (OBS), we compute Line-by-Line Radiative Transfer Model radiances for radiosonde profiles (CAL). We analyze OBS-CAL differences from 2015 to 2020, for daytime, nighttime, and dusk/dawn separately if data is available, for standard (STD) RS92 and RS41 processing, and RS92 GRUAN Data Processing (GDP; RS41 GDP is in development). We find that daytime RS41 (even without GDP) has ~1% smaller UTH errors than GDP RS92. RS41 may still have a dry bias of 1–1.5% for both daytime and nighttime, and a similar error for nighttime RS92 GDP, while standard RS92 may have a dry bias of 3–4%. These sonde humidity biases are probably upper limits since “cloud-free” scenes could still be cloud contaminated. Radiances computed from European Centre for Medium-Range Weather Forecasts (ECMWF) analyses match better than radiosondes with IASI measurements, perhaps because ECMWF assimilates IASI measurements. Relative differences between RS41 STD and RS92 GDP, or between radiosondes and ECMWF humidity profiles obtained from the radiance analysis, are consistent with their differences obtained directly from the RH measurements.


Introduction
Balloon-borne radiosonde (or "sonde") observations (RAOBs) are critical in numerical weather prediction (NWP), data assimilation and forecasting, satellite data calibration/validation (cal/val), and upper air climate change detection. Vaisala RS92 was a major sonde type in the global operational upper air network and a reference sonde in the Global Climate Observing System (GCOS) Reference Upper Air Network (GRUAN) [1]. However, RS92 has gradually been replaced by Vaisala RS41 starting in late 2013. RS92 production ended in 2017, and all stations analyzed in this study stopped using RS92 for operational flights by early 2019. Vaisala RS41 includes new sensor technologies aimed at improving measurement accuracy for temperature, humidity and other variables throughout the atmosphere. These include a heated humidity sensor to prevent dew or frost formation in clouds and a separate temperature sensor attached to the humidity sensor. When the humidity sensor temperature differs from the free-air temperature sensor (whether the humidity sensor is heated intentionally or by erroneous solar heating), it is simple to express the relative humidity (RH) reading as RH at the free-air temperature. Characterizing the RS41 measurement improvement and accuracy is key to the GRUAN RS92-to-RS41 transition management program.
This study assesses the accuracy of atmospheric humidity observations of Vaisala RS92 and RS41. The first and most-used approach to estimate radiosonde accuracy is to conduct assessments in RH or specific humidity, primarily through comparing the data measured simultaneously by different radiosonde instruments from field experiments, e.g., [2][3][4][5][6][7]. Vömel et al. [7] identify RS92 dry biases in the upper troposphere through comparing with cryogenic frost point hygrometer (CFH) measurements, and they propose a correction method to remove the mean bias.
A second assessment method is conducted in satellite radiance space. "Radiance space" refers to the fact that satellite remote sensing instruments measure the received radiant energy or radiance, which is emitted at each spectral frequency according to temperature and concentration of atmospheric gases, aerosols, and cloud particles. Desired meteorological variables are derived using radiances in carefully selected spectral bands. This study compares observed (OBS) atmospheric satellite radiances in spectral bands sensitive to moisture, with radiances calculated (CAL) from radiosonde temperature and humidity profiles via a forward radiative transfer model (RTM) [8][9][10][11][12][13]. For example, Moradi et al. [11,12] use microwave radiance values at 183 GHz as the base to analyze humidity characteristics of different radiosondes.
In this paper, OBS satellite radiances are hyperspectral infrared radiances measured in the upper tropospheric water vapor absorption spectral band (1400-1900 cm−1) by the Infrared Atmospheric Sounding Interferometer (IASI). The instrument is on the European Organisation for the Exploitation of Meteorological Satellites (EUMETSAT) MetOp-B satellite. IASI is a Fourier Transform Spectrometer that provides 8461 channels covering the IR spectrum from 3.62-15.5 µm (2762-645 cm−1). IASI is a well-characterized IR instrument and has been considered as the in-orbit reference sensor in the Global Space Inter-Calibration System (GSICS) [Tim Hewison at EUMETSAT, personal communication; http://gsics.atmos.umd.edu/pub/Development/AnnualMeeting2019/GRWG_GDWG_2019_Meeting_Minutes]. The instrument radiometric uncertainty is stable with time, its noise-equivalent delta temperature in the upper tropospheric water vapor absorption band is ~0.1-0.3 K at 280 K, and the corresponding noise in radiance units is 0.06 mW m−2 sr cm−1 [14]. This paper calculates radiances (CAL) from radiosonde (or model) temperature and humidity profiles using the Line-by-Line RTM (LBLRTM) [15] over the 1400-1900 cm−1 spectral band, which covers practically all atmospheric levels from ~700 hPa and above. LBLRTM is considered a standard in computing radiances by the IR RTM community. It is a highly accurate radiation code that describes the interaction between atmospheric matter and radiation with a very high wavenumber resolution [13]. Calbet et al. [13] similarly use LBLRTM radiances to estimate Vaisala RS92 accuracy at Nauru, the former (1998-2013) Tropical Western Pacific (TWP) ARM site, by comparing with observed IASI radiances. We adopt their approach to understand upper tropospheric humidity (UTH) accuracy, and we extend their study to several GRUAN and ARM sites and compare Vaisala RS41 with RS92. As in Calbet et al. [13], we analyze only cases where the IASI pixel is cloud-free, as discussed in Section 2.2.2, because clouds lead to contaminated radiances.
This study uses the established formula in this field to compare instruments (e.g., [1]), which is the OBS-CAL difference such as IASI-RS92, where IASI (OBS) is considered the reference. Note that the sign of OBS-CAL is opposite from the usual bias formula, which would be CAL-OBS. For example, if RS92 radiances have a positive (high) bias relative to IASI, OBS-CAL is negative.
Operational soundings, including from GRUAN stations, use standard Vaisala procedures and corrections for rapid processing and transmission, but some biases remain and ongoing efforts to reduce these errors are documented at https://www.vaisala.com/en/sounding-data-continuity. Special GRUAN data processing (GDP, using GRUAN software version 2) aims to remove systematic data biases and provide uncertainty estimates [16] so GRUAN soundings can meet climate data record requirements. The NOAA Products Validation System (NPROVS) [17] routinely collects radiosondes and collocates them with satellite sounding data, but when a GRUAN processed sounding and operational sounding are both available, the GRUAN sounding is retained in NPROVS. A test version of GDP is being developed for RS41, so all RS41 soundings collected in NPROVS have standard Vaisala processing (referred to as "RS41 STD"). Depending on site and time period, RS92 soundings collected in NPROVS were processed either through GRUAN software ("RS92 GDP") or standard Vaisala procedures ("RS92 STD").
Validation of retrieved vertical profiles of temperature and humidity obtained from satellite sounding instruments by comparing them with radiosondes is subject to diverse uncertainties. Among these are significant radiosonde biases, actual profile differences due to collocation time and distance separation, and biases in radiative transfer modelling [10]. A detailed comparison exercise, such as the one presented in this paper, is therefore necessary to properly validate satellite sounder retrievals. In particular, these results are applicable to the validation of satellite-derived products, such as those generated by the NOAA Unique Combined Atmospheric Processing System (NUCAPS, [18,19]) algorithm and several EUMETSAT Nowcasting Satellite Application Facility (SAF) products.
Section 2 describes data and methods. Section 2.1 lists the sites with either single launches (RS92 or RS41) or dual launches (RS92 and RS41 are suspended under the same balloon) collocated with IASI data. Section 2.2 lists methods or procedures used to process radiosonde data as the input to the LBLRTM radiance calculation, select IASI pixels for cloud-free scenes, assess the consistency of radiosonde data with IASI data (in radiance space), and convert the radiance differences (OBS-CAL) into RH differences to compute bias statistics. In Section 3, we first present the OBS-CAL analysis for RS92 GDP vs. RS41 STD, using the dual launch data from three GRUAN sites. Those dual launches allow us to understand the humidity difference of the two sondes in radiance and RH, and verify their consistency using both approaches. We then assess, through analyzing the OBS-CAL difference, the accuracy of RS41 STD, RS92 GDP and RS92 STD based on single launches closely matched with an IASI overpass. Radiosonde and NWP model sounding profiles are the major datasets used as the references for satellite sounding data validation and calibration [20]. Model analysis soundings from the European Centre for Medium-Range Weather Forecasts (ECMWF), closely collocated to radiosondes and IASI measurements, are therefore also analyzed at those single launch sites to determine model accuracy in comparison with radiosondes and IASI. Section 4 summarizes specific uncertainties involved in this analysis.

Collocated Radiosonde Launches and Their Collocations with IASI Data
The target data for the radiosonde data assessment are the radiance measurements of IASI onboard the MetOp-B satellite, which has local equator crossing times of 0930 and 2130. Radiosondes at GRUAN and ARM sites are launched at nominal synoptic times (0000, 0600, 1200, and 1800 UTC), with actual launches usually ~1 h earlier so that the radiosonde is in the stratosphere at the stated time; some stations may not have four launches per day. In addition to synoptic launches, dedicated radiosondes are launched from time to time at ARM sites targeting NOAA polar satellites, including SNPP and NOAA-20 [18,19], with local equator crossing times at 0130 and 1330. Collocation time and distance mismatch errors are the biggest uncertainties in the assessment using IASI measurements [10,13]. We selected only sondes launched between 30 min before and 15 min after satellite overpass and within 50 km of the IASI pixel location.
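The selection rule above can be sketched as a simple filter. This is a minimal illustration, not the NPROVS implementation: the function names, the spherical-Earth haversine distance, and the choice to measure distance from the launch site to the IASI pixel center are all assumptions made for the example.

```python
from datetime import datetime, timedelta
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical Earth is an assumption


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance in km between two points given in degrees."""
    p1, p2 = radians(lat1), radians(lat2)
    dphi = p2 - p1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def is_collocated(launch_time, overpass_time, sonde_latlon, pixel_latlon,
                  before=timedelta(minutes=30), after=timedelta(minutes=15),
                  max_km=50.0):
    """True if the launch falls in [overpass - before, overpass + after]
    and the launch site is within max_km of the IASI pixel (hypothetical helper)."""
    offset = launch_time - overpass_time  # negative means launched before overpass
    in_time = -before <= offset <= after
    in_space = great_circle_km(*sonde_latlon, *pixel_latlon) <= max_km
    return in_time and in_space
```

For example, a sonde launched 20 min before the overpass and about 33 km from the pixel passes the filter, while one launched 20 min after the overpass does not.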
The Eastern North Atlantic (ENA) ARM site at Graciosa Airport, Azores, often meets the criteria with synoptic launches approximately in coincidence with IASI overpasses. In high latitudes, MetOp-B swaths view the same location on several consecutive orbits about 100 min apart (but not necessarily synchronized with synoptic radiosondes). The North Slope of Alaska (NSA) ARM site at Barrow (Utqiaġvik), Alaska, and the Ny Alesund, Norway, GRUAN site are used in this study because their radiosondes have higher chances to be close enough in time to IASI overpasses. While these high-latitude sites are very frequently cloudy, that is not a major concern since our assessment focuses on the upper troposphere.
Additionally, to support the RS92-to-RS41 transition, some GRUAN sites made RS92 and RS41 dual launches starting in 2014. These provide the most rigorous radiosonde comparisons because both radiosondes sample the same air column, but the comparisons are still relative because neither RS41 nor RS92 provides absolute accuracy. For dual launches collected in NPROVS, RS92 soundings are mostly GDP while RS41 soundings are STD. At the Lauder, New Zealand GRUAN site, synoptic soundings are closely matched with IASI overpasses (within ~1 h before satellite overpass), but synoptic launches at Lindenberg (11Z) and Payerne (11Z and 23Z) are mostly 1-3 h after the overpass, which prevents direct determination of radiosonde accuracy using IASI as the reference, as will be discussed for individual stations in Section 3.1. Nevertheless, dual launches allow us to verify that the radiance difference of the dual sondes is consistent with their difference in RH observations. That gives us the confidence to estimate radiosonde RH biases from the radiance analysis (Sections 3.2-3.4). The close match of Lauder dual launches with IASI also provides the opportunity to infer the absolute accuracy of RS92 and RS41. The upper portion of Table 1 lists information about the three dual launch sites and their respective launch numbers.
The lower portion of Table 1 lists three sites with single launches at synoptic times that often coincide within 30 min before and 15 min after IASI overpasses. The sounding processing is a mixture of RS41 STD, RS92 GDP, and RS92 STD. Analysis of those data via OBS-CAL differences is designed to address the absolute accuracy of the radiosonde humidity data. Table 1 lists the numbers of those sondes along with the respective numbers of collocated soundings with corresponding cloud-free and all-sky IASI scenes. Sounding numbers for nighttime, daytime, or dusk/dawn are stated in Section 3, where the radiosonde accuracies are analyzed for those diurnal times if they have enough samples available for analysis.
As mentioned, all of the radiosonde profiles are collected in NPROVS [17,18], supported by the NOAA Joint Polar Satellite System (JPSS) program and operated at NOAA NESDIS office of Satellite Applications and Research (STAR) since 2008. NPROVS provides routine data access, collocation, and intercomparison of multiple satellite temperature and water vapor sounding product suites and NWP model profiles, each matched with (a) global operational radiosondes and (b) GRUAN radiosondes, including dedicated radiosonde observations. The collocation approach is to select the "single closest" sounding from each product suite for each radiosonde.
The EUMETSAT MetOp-B IASI L2 sounding product [21,22] is one of the retrieval products routinely ingested in NPROVS for collocations with radiosonde data. The L2 profiles are physical retrievals generated with an optimal estimation method (OEM), using the all-sky retrievals as the first guess. The all-sky retrievals are generated using piecewise regression methods and infrared and microwave channel data. OEM is attempted for clear-sky only as identified using strict cloud screening and other testing procedures (see Appendix A for more information). The L2 physical retrievals are generated at each IASI field-of-view (FOV). IASI level 1c apodized measurements (smoothed to remove artificial diffractive effects that distort the spectra) are appended to the selected collocations of radiosonde with IASI retrieval profile for use in the study. The level 1c datasets are accessed from the NOAA Comprehensive Large Array-Data Stewardship System (CLASS) (https://www.avl.class.noaa.gov/saa/products/welcome).
Table 1. Data from the radiosonde sites that are used for the analysis. In the last column, the number of soundings given first is those collocated with clear-sky Infrared Atmospheric Sounding Interferometer (IASI) scenes, followed by soundings associated with all sky scenes in parentheses. In that column, the time collocation limits are given in brackets for IASI minus radiosonde observation (RAOB) time.
The selected IASI-RAOB collocations need to go through cloud screening to make sure the IASI FOV scene is cloud-free (see Section 2.2.2 and Appendix A for details) before CAL radiances are computed from radiosonde profiles.

ECMWF operational analysis profiles [23] are also collocated to RAOBs at all IASI-RAOB collocations analyzed. The ECMWF analyses are available at 0000, 0600, 1200, and 1800 UTC, with 91 vertical pressure levels thinned from the 137 model sigma levels and horizontal resolution of 0.25° × 0.25° [24]. The collocated ECMWF profiles are over 1 h from IASI overpasses in most of the dual launch cases, while ~1 h or less from overpasses in most of the single launch cases.

Radiosonde Profile Data Processing
GRUAN RS92 and RS41 soundings report data values at 1-s intervals, or usually ~7000 vertical levels (accessed from gruan.org/data). Those high-density profiles are converted into 100 vertical levels for the rapid transmittance algorithm used in radiative transfer models [20]. The GRUAN sounding objective is to reach 5 hPa, but only about 50% reach 15 hPa and less than 5% reach 5 hPa due to the use of smaller balloons that burst sooner. To apply the radiative transfer equations, the radiosonde profiles must be extended above the burst altitude to the top of the atmosphere (TOA) by appending the collocated ECMWF operational analysis to the top of the radiosonde profile. RS92 and RS41 sensors measure the RH of the ambient air, whereas the RTM requires water vapor concentration, typically specific humidity. We convert from RH to specific humidity using the Hyland and Wexler formula [25].
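The RH-to-specific-humidity step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the coefficients are those commonly quoted for the Hyland and Wexler (1983) saturation vapor pressure over liquid water and should be checked against [25] before use; treating RH as relative to liquid water at all temperatures and using 0.622 for the ratio of molar masses are further assumptions, and the function names are hypothetical.

```python
import math

EPSILON = 0.622  # ratio of molar masses, water vapor to dry air (assumed value)


def saturation_vapor_pressure_pa(temp_k):
    """Saturation vapor pressure over liquid water in Pa.

    Coefficients as commonly quoted for Hyland and Wexler (1983);
    verify against the original reference before relying on them.
    """
    ln_es = (-5800.2206 / temp_k
             + 1.3914993
             - 0.048640239 * temp_k
             + 0.41764768e-4 * temp_k ** 2
             - 0.14452093e-7 * temp_k ** 3
             + 6.5459673 * math.log(temp_k))
    return math.exp(ln_es)


def specific_humidity(rh_percent, temp_k, pressure_hpa):
    """Specific humidity in kg/kg from RH (%), temperature (K), pressure (hPa)."""
    e = rh_percent / 100.0 * saturation_vapor_pressure_pa(temp_k)  # vapor pressure, Pa
    p = pressure_hpa * 100.0                                       # total pressure, Pa
    return EPSILON * e / (p - (1.0 - EPSILON) * e)
```

Because specific humidity is close to linear in RH at fixed temperature and pressure, a fixed RH offset in the dry upper troposphere corresponds to a large relative change in specific humidity.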
To verify the humidity difference estimated from the radiance difference, for example, between two radiosondes or between radiosondes and ECMWF as discussed in Section 3, we compute the humidity (and temperature) difference from their sounding profiles. To minimize the impact of different vertical resolutions on the assessment, the 100-level radiosonde profiles and 91-level ECMWF profiles are averaged to ~1-km coarse layers for temperature and ~2-km coarse layers for humidity. Statistics are then computed in those layers with the mid-point coarse layer pressures shown in vertical profile figures (e.g., Figure 1d,e). This approach is standard in validating satellite retrieval soundings using radiosonde or NWP data [18][19][20][26].

Cloud Screening for IASI Pixels
A key to this sonde humidity data assessment is that IASI pixels collocated with radiosondes should not be cloud-contaminated. Undetected clouds, primarily high clouds, in the "cloud-free" scenes would bias the assessment. Cloud screening flag information included in the EUMETSAT IASI L2 product is used to find the cloud-free IASI pixels (see Appendix A), and their collocations with RAOBs are then used in the study. Table 1 shows the number of accepted cases after IASI cloud screening. On the average, cloud screening rejects ~87% of the soundings with IASI data within the collocation limits.

Figure 1. [...] The solid line is RS92 GDP minus RS41 STD. The dotted lines show ± one standard deviation of the RS92 GDP minus RS41 STD differences from the solid line. (d-e) Mean differences and standard deviations, RS92 GDP minus RS41 STD, at specified pressure levels (hPa), based on same dual launches as in (a-c). (d) Blue line is the mean atmospheric relative humidity (RH) difference, averaged at specified pressure levels, and the red line is its standard deviation. Gray numbers toward the left of the plot are mean RS41 STD RH values (%) at marked pressure levels. (e) As in (d) except for mean atmospheric temperature difference, and gray numbers are RS41 STD mean temperature (K).

Consistency of Radiosonde Data with IASI Measurements
Collocated IASI measurements are compared with the computed radiosonde radiances to find out if the two types of measurements are consistent with each other. Following the proposed rationale [27] for statistical consistency of collocated measurements, IASI and radiosonde data are considered to be consistent with each other if their difference in radiance is within k = 2 times the combined uncertainty,

$$|m_1 - m_2| < k\sqrt{\sigma^2 + u_1^2 + u_2^2} \qquad (1)$$

where $m_1$ and $m_2$ are the OBS and CAL radiances to be compared, $u_1$ and $u_2$ the associated uncertainties, $\sigma$ the uncertainty due to mismatch, and $k$ the agreement parameter. The uncertainty in LBLRTM should also be listed as one of the uncertainty components inside the square root but is included in the $\sigma$ term here to keep the formula general. For this study, the unit for radiance variables is mW m−2 sr cm−1. Ideally, the radiosonde and IASI consistency is assessed for individual collocations by utilizing Equation (1), and based on that, the consistency for the whole collocation sample is then statistically determined. At individual collocations, the IASI instrument uncertainty is generally available (see Introduction), and uncertainty in computed radiance from GRUAN soundings can be estimated via radiative transfer modelling [13], assuming the uncertainty is either fully vertically correlated or uncorrelated. The spatial and temporal collocation error, however, is unknown. The collocation error is suggested to be much bigger than other uncertainty components [13]. Equation (1) is therefore not directly used in the assessment, and that could be a limitation of the analysis.
As described by Immler et al. [27], with normally distributed variables and independent uncertainty factors, the standard error (ste) of the OBS-CAL difference for an ensemble (for example, all of the collocations of RS41 STD with IASI from a site) is equal to the $\sqrt{\sigma^2 + u_1^2 + u_2^2}$ term. The ste value for a specific wavenumber is calculated from the standard deviation (std) of the OBS-CAL difference by dividing std by the square root of the number of samples (i.e., collocations).
The uncertainty derived from ste based on the ensemble average is termed the overall or total uncertainty. In this study, this total uncertainty term is used to assess the consistency of ensemble-averaged radiosonde and IASI data in radiance space. Calbet et al. [13] (their Figures 6 and 8) consider RS92 GDP to be consistent with IASI if the mean OBS-CAL difference is less than 2 times ste, and this paper uses the same definition. This is a 2-sided test of consistency at approximately the 95% statistical significance level. Note that they estimate the "average" collocation uncertainty from the std of the OBS-CAL difference.
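The ensemble test described above can be sketched for a single wavenumber as follows. This is a minimal illustration: the function name is hypothetical, and the use of the sample standard deviation (N − 1 normalization) is an assumption, since the text does not state which normalization is used.

```python
from math import sqrt
from statistics import mean, stdev


def ensemble_consistency(obs_minus_cal, k=2.0):
    """Return (mean difference, ste, consistent?) for one wavenumber.

    obs_minus_cal: OBS-CAL radiance differences, one value per collocation.
    ste = std / sqrt(N); the ensemble is called consistent if |mean| < k * ste.
    """
    n = len(obs_minus_cal)
    mean_diff = mean(obs_minus_cal)
    ste = stdev(obs_minus_cal) / sqrt(n)  # sample std (N - 1), an assumption
    return mean_diff, ste, abs(mean_diff) < k * ste
```

With four synthetic differences of −0.10, −0.12, −0.11, and −0.09 the mean lies far outside ±2 ste, so the ensemble is flagged inconsistent; differences scattered around zero such as 0.01, −0.01, 0.00, and 0.02 are flagged consistent.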

Converting the Radiance Difference to RH Difference
Radiosonde biases estimated from OBS-CAL differences are stated in terms of radiances (or brightness temperatures). The corresponding biases in RH percentage points can be estimated by simply adding various RH values to the corresponding radiosonde profiles and recomputing the radiances until the OBS-CAL difference for RS92 or RS41 becomes negligible.
Calbet et al. [13] conclude, based on their Figures 7 and 8, that their RS92 OBS-CAL difference of −0.11267 mW m−2 sr cm−1 (averaged for the spectral band 1500-1570 cm−1) is equivalent to a 2.5% RH dry bias relative to IASI radiances, and their radiance biases imply a daytime RS92 GDP dry bias of 2.58% and a nighttime dry bias of 0.69%. We use this conversion to estimate the radiosonde (or ECMWF) RH biases from their OBS-CAL differences. Note that channels with wavenumber in the range of 1500-1570 cm−1 are highly water vapor absorptive with their peak absorption in the middle to upper troposphere. They are not affected by low-level clouds or the underlying surface. The mean OBS-CAL difference (DIFF) and its standard deviation (STD) are defined by Equations (2)-(4), where $OBS_{i,j}$ is the IASI-observed radiance in wavenumber i for collocation j, $CAL_{i,j}$ is the corresponding LBLRTM-simulated radiance, and $N_w$ and $N_c$ are the number of wavenumbers (or spectral channels) and collocations included in the average, respectively. For the 1500-1570 cm−1 spectral region, the wavenumber (channel) indices w1 and w2 and the count $N_w$ in Equation (2) are 3421, 3701, and 281, respectively. Equations (2)-(4) can be applied to any spectral region to compute the radiance bias statistics. Bias statistics are also computed for the water vapor absorption band at 1615-1800 cm−1, another spectral region that is not sensitive to low-level features. For this region, w1, w2, and $N_w$ in Equation (2) are 3881, 4621, and 741. An equivalence of the OBS-CAL difference of −0.07239 mW m−2 sr cm−1 averaged for 1615-1800 cm−1 to a 2.5% RH dry bias [13] is applied to estimate the RH bias from the radiance difference averaged for this spectral region.
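The band limits and statistics can be illustrated with a short sketch. It assumes IASI channel 1 sits at 645 cm−1 with 0.25 cm−1 spacing, which reproduces the w1, w2, and N_w values quoted above. The order of averaging (over channels first, then over collocations) and the N − 1 normalization of the standard deviation are assumptions for illustration and may differ from Equations (2)-(4); all function names are hypothetical.

```python
from statistics import mean, stdev

FIRST_WAVENUMBER = 645.0  # cm^-1 at channel 1 (assumed)
CHANNEL_SPACING = 0.25    # cm^-1 between adjacent channels (assumed)


def channel_index(wavenumber):
    """1-based IASI channel index for a wavenumber on the sampling grid."""
    return round((wavenumber - FIRST_WAVENUMBER) / CHANNEL_SPACING) + 1


def band_channels(wn_low, wn_high):
    """Return (w1, w2, N_w) for a spectral band given in cm^-1."""
    w1, w2 = channel_index(wn_low), channel_index(wn_high)
    return w1, w2, w2 - w1 + 1


def band_statistics(obs, cal, w1, w2):
    """Mean and standard deviation of the band-averaged OBS-CAL difference.

    obs, cal: one spectrum (list of radiances) per collocation, where
    list position 0 holds channel 1.
    """
    per_collocation = [
        mean(o[i] - c[i] for i in range(w1 - 1, w2))  # average over channels
        for o, c in zip(obs, cal)
    ]
    return mean(per_collocation), stdev(per_collocation)
```

Calling `band_channels(1500, 1570)` gives `(3421, 3701, 281)` and `band_channels(1615, 1800)` gives `(3881, 4621, 741)`, matching the values in the text.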
The radiance difference statistics computed using Equations (2)-(4) and the RH bias statistics estimated from the radiance differences are listed in all tables except Table 1 to characterize the radiosonde or ECMWF data accuracy. Unless a spectral region is specified, the RH bias estimated from radiance analysis stated in the text is the average of the values computed from those two regions to better represent the upper tropospheric water vapor absorption across the spectrum.
Direct humidity observations from radiosonde and ECMWF profiles are used to verify their consistency with the UTH characteristics estimated from the radiances, as discussed in the next section. The IASI channels at 1400-1900 cm −1 are actually sensitive to the water vapor content accumulated through an upper tropospheric layer, rather than a single level. At any wavelength, radiation detected by the satellite originates from the atmospheric layer where there is appreciable water vapor. Above the layer, there is negligible absorption, nor is there enough emission of infrared radiation to be detected. Any radiation emitted below that layer is simply absorbed by the water vapor above it. The layer emitting enough radiation to be detected does not have sharp boundaries. This poses a challenge to define the upper-tropospheric layer in the radiosonde or ECMWF humidity profile that best matches the layer defined in radiance space.
In this study, the 200.9-407.4 hPa pressure interval is used to represent that upper tropospheric layer for all sites and time periods analyzed. RH differences between two dual sondes or between radiosondes and ECMWF (third line of each row in the last column of Tables 2-5) are computed from that pressure interval.
Note that atmospheric structure, including the tropopause and the height of the upper troposphere, varies with location, season or even time of day. Uncertainty can be introduced by the factors discussed in this and preceding paragraphs when we compare the RH characteristics computed from radiances with radiosonde humidity observations. We therefore include figures with RH difference statistics depicted from the lower to the upper troposphere (e.g., Figure 1d) as examples to better understand the consistency of the RH difference between radiance space and humidity observations.
Table 2. Dual launch sounding comparisons. (Col. 1) Station, and in parentheses, period of day (according to category of solar elevation angle, SEA) and number of dual soundings analyzed with this SEA category. For each station and SEA category, there are 3 pairs of rows showing a set of mean difference comparisons. The first row in each pair is a header (shown in Col. 2 only) that summarizes the two instruments in that difference comparison, with the differences shown in the second row, Cols. 2-5 or 2-6 as applicable. All radiance differences have units of mW m−2 sr cm−1, RH differences are % (percentage points out of 100), and each number inside parentheses is one standard deviation of the corresponding variable. The third set of comparisons (in italics) is based on differences of differences, specifically Payerne (night, 10)

Dual Launches of RS92 GDP and RS41 STD
Dual launches of RS92 and RS41 radiosondes at Lauder, Lindenberg, and Payerne were made at synoptic times. Table 2 shows the number of analyzed clear-sky collocations at each station. As in Sun et al. [28], solar elevation angles (SEAs) computed at the radiosonde launch location and time are used to group soundings into three categories for analysis: Nighttime (SEA < −7.5°), daytime (SEA ≥ +7.5°), and dusk/dawn (any other SEA). While the time of a collocated IASI observation or ECMWF profile may be in a different SEA category from the sounding, the SEA at the IASI or ECMWF location is not different enough from the radiosonde SEA category to reject that case. Fewer cases are analyzed at Lauder and Lindenberg than totals shown in Table 1 due to insufficient night or dusk/dawn collocations for reasonable statistical analysis.
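The three SEA categories can be written as a small classifier. This sketch only encodes the thresholds stated above; the function name and category labels are hypothetical, and computing the SEA itself from launch location and time is outside its scope.

```python
def sea_category(sea_deg, threshold_deg=7.5):
    """Classify a sounding by solar elevation angle (degrees) at launch."""
    if sea_deg < -threshold_deg:
        return "night"      # SEA < -7.5
    if sea_deg >= threshold_deg:
        return "day"        # SEA >= +7.5
    return "dusk/dawn"      # any other SEA
```

Note that a SEA of exactly −7.5° falls into the dusk/dawn category under the strict inequality used for nighttime.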
Lauder, New Zealand. Figure 1 shows the average OBS-CAL differences from 14 daytime collocations for (a) RS92 GDP and (b) RS41 STD. These soundings were launched at ~0900 local time, within ~1 h prior to an IASI overpass. The negative OBS-CAL radiance differences shown for both RS92 and RS41 indicate that both sonde types are dry-biased in the upper troposphere. The positive RS92 GDP-RS41 STD radiance difference (Figure 1c) computed using IASI radiance as the transfer standard indicates that RS92 GDP appears to be more dry-biased than RS41 STD.
The dotted lines in Figure 1a,b show ±2 ste (from zero) of the combined uncertainties (as stated in Section 2.2.3), indicating that the CAL radiances for RS92 GDP (solid black line) are statistically inconsistent with IASI measurements while the CAL radiances for RS41 STD (solid red line) are mostly consistent with IASI.
Spikes in the OBS-CAL difference in the spectral regions of 1400-1500 cm−1 and 1800-1900 cm−1 reflect the sensitivity of narrow spectral lines to the lower troposphere (usually below 700 hPa). Those features are common to all sites analyzed in the study.
As listed in Table 2, the OBS-CAL mean difference averaged for 1500-1570 cm−1 is −0.1291 (±0.085) mW m−2 sr cm−1 for RS92 GDP and −0.0705 (±0.081) mW m−2 sr cm−1 for RS41 STD. Throughout the paper, values inside the parentheses following mean biases or differences are one standard deviation. The RH dry biases in the upper troposphere computed from the radiance differences are 2.86% for RS92 GDP and 1.56% for RS41 STD. Similarly, as indicated in Table 2, the RH dry biases converted from the radiance differences at 1615-1800 cm−1 are 2.82% and 1.33%, respectively, for RS92 GDP and RS41 STD. The daytime dry bias in RS92 GDP humidity data obtained from Lauder is slightly higher (by 0.30% in the absolute RH value) than found from the former TWP Nauru site [13].
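The radiance-to-RH conversion behind these numbers can be checked with a short sketch. It assumes the conversion is a simple linear scaling of the reference equivalences from [13] (−0.11267 and −0.07239 mW m−2 sr cm−1 per 2.5% RH dry bias); this assumption reproduces the 1500-1570 cm−1 values quoted above, but the names and the dictionary layout are hypothetical.

```python
# Reference OBS-CAL differences (mW m^-2 sr cm^-1) equivalent to a 2.5% RH dry bias [13]
REFERENCE_DIFF = {"1500-1570": -0.11267, "1615-1800": -0.07239}
REFERENCE_RH_DRY_BIAS = 2.5  # percentage points of RH


def rh_dry_bias(obs_minus_cal, band):
    """RH dry bias (%) from a band-averaged OBS-CAL difference.

    Assumes linear scaling; a negative OBS-CAL gives a positive dry bias.
    """
    return REFERENCE_RH_DRY_BIAS * obs_minus_cal / REFERENCE_DIFF[band]
```

Under this assumption the Lauder differences of −0.1291 and −0.0705 give dry biases of about 2.86% and 1.56%, in line with the values stated for RS92 GDP and RS41 STD.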
The RH difference between RS92 GDP and RS41 STD estimated from their radiance differences is 1.40% (0.65%), basically consistent with the RH difference of 1.33% (0.8%) based on the measured data (Table 2). Figure 1d indicates that RS92 GDP is systematically drier than RS41 STD by 1-1.5% from the lower troposphere to the upper troposphere during daytime (based on only clear sky data). However, the RS41 STD RH averages 25.5% at 478 hPa and 5.5% at 156 hPa, so in terms of specific humidity RS92 GDP (compared to RS41 STD) averages 3.9% drier at 478 hPa and 18.2% drier at 156 hPa; specific humidity is more fundamental than RH to atmospheric radiative transfer.
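The conversion from an absolute RH difference to a relative moisture difference can be reproduced with a short calculation. The sketch below assumes that both sondes see the same temperature and pressure at a given level, so that the saturation terms cancel and the fractional difference in water vapor equals the RH difference divided by the mean RH. It also assumes an RH difference of 1.0% (the lower end of the 1-1.5% range), which is inferred from the quoted results rather than stated in the text.

```python
def relative_moisture_difference(delta_rh_pct: float, mean_rh_pct: float) -> float:
    """Percent difference in water vapor amount for an absolute RH difference.

    ASSUMPTION: temperature and pressure are identical for both profiles,
    so saturation vapor pressure cancels and only the RH ratio remains.
    """
    return 100.0 * delta_rh_pct / mean_rh_pct


# Mean RS41 STD RH (%) at the two pressure levels (hPa) quoted in the text.
mean_rh_by_level = {478: 25.5, 156: 5.5}

# ASSUMPTION: a 1.0% absolute RH difference between RS92 GDP and RS41 STD.
delta_rh = 1.0

for level_hpa, mean_rh in mean_rh_by_level.items():
    rel = relative_moisture_difference(delta_rh, mean_rh)
    print(f"{level_hpa} hPa: RS92 GDP about {rel:.1f}% drier")
```

The same absolute RH difference therefore represents a much larger fractional moisture error at 156 hPa than at 478 hPa, simply because the air is much drier there.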
In the humidity-sensitive channels, atmospheric temperature may also affect the radiation the satellite receives. In Figure 1e, the RS92 GDP temperature appears to be slightly warmer (by <0.2 K except at the highest level) than RS41 STD above 150 hPa, suggesting the existence of a radiation-related warm bias in GRUAN processed data [28]. However, RS92 GDP appears to be colder than RS41 STD in the troposphere, with a maximum cold difference of ~0.2 K around 300 hPa.
Appendix B further investigates the impact of atmospheric temperature differences between RS92 GDP and RS41 STD on their CAL radiance differences. It appears that the colder upper tropospheric temperature in RS92 GDP (relative to RS41 STD) leads to slightly more negative radiance differences in the spectral range of 1400-1900 cm −1 (Figure A1), interpreted as being slightly moister in the upper troposphere. However, the radiance difference contributed by the temperature difference is small: −0.0125 mW m −2 sr cm −1 averaged over 1500-1570 cm −1 , equivalent to 0.28% in RH. Note that the warm temperature difference in the lower stratosphere does not seem to have an impact on the radiance differences in the 1400-1900 cm −1 band.
Lindenberg, Germany. Given the longitude of this station, most of the dual sondes (11Z) launched at this site are 1 to 3 h after the MetOp-B overpass. Because it generally takes ~30 min for the balloon to reach ~300 hPa [29], the actual time difference in the upper troposphere and lower stratosphere is about 1.5 to 3.5 h, and it is systematic (always after the overpass).
In Table 2, the OBS-CAL radiance differences averaged for 1500-1570 cm −1 for RS92 GDP and RS41 STD are −0.2204 and −0.1307 mW m −2 sr cm −1 , respectively, and the radiance-derived RH dry biases are 4.89% and 2.90%, respectively, again compared to IASI. Similar values are obtained from the spectral region 1615-1800 cm −1 . Those numbers are statistically different from zero and are much larger than the values obtained from Lauder and the single launch sites (see Sections 3.2 and 3.3), where the time differences are within 0.5 h.
The σ term in Equation (1) may increase as the time difference increases, but the mean difference is not affected much as long as the time differences in the ensemble are random in sign [30]. We suspect that the large radiance differences at Lindenberg, and hence the large RH biases estimated from them, are related to the systematic time difference between the radiosonde launch and the IASI overpass.
Consistent with Lauder, the positive radiance difference between RS92 GDP and RS41 STD obtained by using IASI as the transfer standard (Figure 2a) indicates that RS92 GDP is drier than RS41 STD. In Table 2, the RH difference estimated from the OBS-CAL differences averages −2.04% over the 1500-1570 and 1615-1800 cm −1 bands, which is close to the directly observed difference of −1.91%, as also shown in the RH difference vertical profile in Figure 2b (i.e., ~−2% in RH at ~330 hPa).
Appendix B indicates that a small radiance difference is contributed by the small temperature difference at Lindenberg (Figure 2c). Both the Lauder and Lindenberg analyses indicate that lower stratospheric temperatures do not affect upper tropospheric humidity-sensitive radiances (1400-1900 cm −1 ). Mid-upper tropospheric temperatures can have an impact, but the effect of a temperature difference <0.2 K on radiance, expressed in terms of RH, is negligible.
Payerne, Switzerland. Similar to Lindenberg, dual launches at Payerne are mostly 1-3 h after IASI overpasses. As shown in Table 2, the daytime RH dry bias converted from the OBS-CAL difference averaged for 1500-1570 cm −1 is 3.50% for RS92 GDP and 2.38% for RS41 STD, and the corresponding nighttime dry biases are 2.57% and 2.93%. As at Lindenberg, those large RH bias values may be "inflated" by a systematic time difference. In the 1500-1570 cm −1 band, the daytime RS92 GDP minus RS41 STD RH difference estimated from radiance differences is −1.12% (0.9%), and the nighttime RH difference is +0.35% (1.2%). Blue lines in Figure 3a,b show similar radiosonde RH differences at ~300 hPa.
Analysis of dual sonde data in this subsection indicates that the RH differences estimated from radiance space are basically consistent with the measured upper tropospheric RH differences in the radiosonde observations. This lends confidence in using radiance differences to analyze the sonde accuracy for single launched sondes to be presented in Sections 3.2-3.4.

Single Launches of RS41 STD
As mentioned in the Introduction, all single launches of radiosondes (including RS41 STD, RS92 GDP, and RS92 STD) analyzed in the study are within 50 km and between 0.5 h before and 0.25 h after IASI MetOp-B overpasses. ECMWF analyses are typically ~1 h or less from the satellite overpasses in those single launch cases.
ENA. A small UTH dry bias for RS41 STD for both nighttime and daytime is suggested by the slightly negative OBS-CAL radiance differences (Figure 4a,b). The RH dry biases estimated from OBS-CAL are 1.17% (1.25%) and 1.34% (0.95%) for nighttime and daytime, respectively. The ECMWF analyses collocated with radiosondes are ~0.5 h after overpasses at this location. OBS-CAL for ECMWF is close to zero (Figure 4c,d), and the RH biases in Table 3 estimated from the radiance differences are 0.00% (1.35%) and 0.39% (1.65%) for nighttime and daytime. The UTH dry biases of RS41 STD relative to ECMWF estimated from the radiance analysis (1.17% for nighttime and 1.73% for daytime) are basically consistent with those directly computed from the RH profiles (Table 3 and Figure 4e,f). Interestingly, RS41 STD appears to be <1% moister than ECMWF for both nighttime and daytime in the low-middle troposphere (Figure 4e,f).
The standard deviations of the RH differences computed from the RH profile data are, however, much bigger than the ones estimated from the radiance differences (e.g., 7.2% vs. 1.6% for daytime, Table 3). This contrast occurs with all single launches (Tables 3-5), but does not occur with the dual launches, where they are comparable to each other (Table 2). The primary reason is that the standard deviations in column 6 of Tables 3-5 are based on radiosonde RH compared with ECMWF RH that may differ by up to 1 h and 10 km from the radiosonde, while those in Table 2 are computed from dual sondes with no collocation time or distance error.
The OBS-CAL differences (solid curves of Figure 4c,d) across 1400-1900 cm −1 for ECMWF fall within 2 × ste, suggesting that ECMWF and IASI are consistent with each other in the radiance space after taking into account the uncertainty terms discussed in Section 2.2. This consistency happens at other single launch collocations analyzed in this subsection and the following two subsections too (figures not shown).
For RS41 STD (Figure 4a,b), OBS-CAL differences for nighttime marginally fall within 2 × ste, while OBS-CAL differences for daytime are far beyond 2 × ste. This nighttime vs. daytime contrast is partly related to the mean OBS-CAL differences, which are slightly bigger during daytime (Table 2). However, the major factor is that the daytime collocation sample (27) is much bigger than the night sample (12), so the ensemble-averaged σ (and hence ste) is much smaller in the daytime through better averaging out the random collocation noise. Therefore, the consistency evaluation methods discussed in Section 2.2.3 should be exercised cautiously. The collocation sample size and hence the ensemble-averaged uncertainty could play an important role in determining if two variables are consistent with each other.
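The sample-size effect described above can be illustrated with a minimal Python sketch. It assumes that the standard error (ste) scales as σ/√N and that consistency means the mean OBS-CAL difference lies within 2 × ste of zero. This is a simplification of Equation (1) and Section 2.2.3, which combine several uncertainty terms. The mean difference and σ used below are hypothetical values chosen only to show the effect; the sample sizes 12 and 27 are the nighttime and daytime counts quoted in the text.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of an ensemble mean, assuming independent collocations."""
    return sigma / math.sqrt(n)


def within_two_ste(mean_diff: float, sigma: float, n: int) -> bool:
    """True if the mean difference lies within 2 x ste of zero."""
    return abs(mean_diff) <= 2.0 * standard_error(sigma, n)


# HYPOTHETICAL mean OBS-CAL difference and sigma, identical for both cases,
# so that only the sample size changes.
mean_diff = -0.06
sigma = 0.12

for label, n in (("nighttime", 12), ("daytime", 27)):
    ste = standard_error(sigma, n)
    verdict = "consistent" if within_two_ste(mean_diff, sigma, n) else "inconsistent"
    print(f"{label}: N={n}, 2 x ste={2 * ste:.4f}, {verdict}")
```

With identical mean and σ, the smaller nighttime sample passes the 2 × ste test while the larger daytime sample fails it, which is why the consistency verdict cannot be read independently of the collocation sample size.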
NSA. The RS41 and IASI cloud-free collocations occur mostly at night and dusk/dawn. At this site, ECMWF is within ~0.5 h after each MetOp-B overpass. Similar to the ENA OBS-CAL radiance patterns, the OBS-CAL differences for RS41 STD for both nighttime and dusk/dawn (Figure 5b,d) are slightly negative, equivalent to a small dry bias in RS41 STD (1.29% and 1.46% respectively, Table 3).
Relative to the RS41 STD minus ECMWF RH differences estimated from the radiance analysis, 1.02% and 0.63% for nighttime and dusk/dawn, the radiosonde RH differences directly computed over 200.9-407.4 hPa are greater (Table 3). The reason is that the pressure interval does not accurately represent the upper troposphere at the site (see discussion in Section 2.2.4), where the tropopause altitude is generally lower. As a matter of fact, by raising the pressure by ~50 hPa, the UTH dryness of RS41 STD relative to ECMWF obtained from the RH profiles (Figure 5c,d) matches well with RH from the radiance analysis.
Note that in Figure 5a,b, the ste values of OBS-CAL for RS41 STD show fluctuations in the channels from 1800 to 1900 cm −1 for both nighttime and dusk/dawn, but not in other spectral ranges. This feature is not seen at ENA (Figure 4a-d) or the three dual launch sites (e.g., Figure 1a,b), but it is also observed at Ny Alesund (figures not shown). Atmospheric water vapor content over polar regions tends to be low, and channels in 1800-1900 cm −1 could be sensitive to the surface snow/ice that often occurs there.
Ny Alesund. Most of the radiosonde-satellite collocations for cloud-free scenes are for dusk/dawn and daytime. As listed in Table 3, dry biases of 1.46% and 1.82% are estimated for RS41 STD from the OBS-CAL differences for dusk/dawn and daytime, respectively. ECMWF shows a smaller dry bias (<0.7%) estimated from the radiance analysis for both dusk/dawn and daytime, but the bias at this site is slightly greater than that at ENA or NSA. A possible reason is that ECMWF is ~1 h after the satellite overpass at Ny Alesund, while the time difference at the other two sites is ~0.5 h.

Single Launches of RS92 GDP
ENA. The negative OBS-CAL radiance differences for RS92 GDP (Figure 6a,b) indicate that the GRUAN processed RS92 has a small upper tropospheric dry bias in both nighttime and daytime, with the daytime dry bias being larger. RH dry biases estimated from the OBS-CAL difference are 1.13% and 2.57% for nighttime and daytime, respectively. The nighttime biases for RS92 GDP and RS41 STD at the same site are comparable, but the daytime RS92 GDP bias is greater (by ~1% in RH) than for RS41 STD.
The nighttime OBS-CAL differences for RS92 GDP (averaged from 43 collocations) marginally fall within 2 × ste, while daytime OBS-CAL differences (averaged from 50 collocations) are far beyond 2 × ste (Figure 6a,b). That contrast is primarily related to the mean OBS-CAL differences, which are bigger during daytime than nighttime.
NSA. The sample of collocations with cloud-free IASI is small. A dry bias of 1.35% during nighttime and 1.98% during daytime is obtained from the radiance analysis (Table 4). ECMWF is collocated within 0.5 h after the satellite overpass at nighttime and 2-3 h after the overpass for daytime. The OBS-CAL difference for ECMWF averaged over 1500-1570 cm −1 for nighttime is only −0.0043 (0.095) mW m −2 sr cm −1 , equivalent to an RH dry bias of 0.1%. For daytime, the value is −0.0521 (0.029) mW m −2 sr cm −1 , equivalent to a dry bias of 1.16%. We suspect the contrast is related to the difference in ECMWF-IASI collocation time separation, as discussed in Section 3.1 for data at Lindenberg and Payerne.
Ny Alesund. Only daytime collocations are available for a statistical analysis of RS92 GDP launches. The OBS-CAL difference gives an estimated dry bias of 2.29% for RS92 GDP (Table 4). ECMWF is ~1 h after the satellite overpass. Again, the dryness in RS92 GDP relative to ECMWF estimated from the radiance analysis is verified in radiosonde RH observations (Table 4).

Single Launches of RS92 STD
This study has RS92 STD launches and IASI collocations only at station ENA. A striking feature of the OBS-CAL differences for RS92 STD (Figure 7a,b) is that they are greater than for RS41 STD and RS92 GDP for both nighttime and daytime, suggesting that the UTH dry biases of RS92 STD are larger. RH biases estimated from the radiance analysis are 3.90% (1.7%) and 3.25% (2.6%) for nighttime and daytime, respectively. Those big biases exaggerate the statistical inconsistency between RS92 STD and IASI, compared to that between RS92 GDP or RS41 STD and IASI.
The RS92 STD dry biases (Table 5) we obtained from the radiance analysis appear to be smaller than those reported by Miloshevich et al. [5]. They reported dry biases of 4% and 5% for nighttime and daytime, respectively, by comparing with cryogenic frost point hygrometer measurements. A possible explanation of the discrepancy between the two studies is that radiosonde biases estimated from the radiance analysis are for the whole layer with water vapor detected by IASI, while the biases in Miloshevich et al. [5] are for specific levels of the upper troposphere. Also, https://www.vaisala.com/en/soundingdata-continuity documents a change in Vaisala RS92 operational corrections after 2010.

Summary and Discussion
This paper assesses the accuracy of upper tropospheric humidity observations, separately for daytime and nighttime, for Vaisala RS41 STD, RS92 GDP, and RS92 STD. This is achieved by comparing the humidity-sensitive infrared radiances (the 1400-1900 cm −1 spectral band) computed using LBLRTM from radiosonde profiles with collocated cloud-free IASI radiance measurements and with radiances similarly computed from collocated ECMWF model profiles. We primarily use single radiosondes from three GRUAN and ARM sites, with launches (primarily at synoptic times) mostly coincident within 30 min before and 15 min after IASI overpasses. We also compare dual launches (RS92 and RS41) at three other GRUAN sites, with radiosondes within 1 h of IASI overpasses at one station and 1-3 h after overpasses at the other two stations. Dual launches provide a direct comparison of RS92 vs. RS41 and with IASI, and are used as a cross-validation of the results obtained from single launches of RS92 or RS41. Accuracy of ECMWF humidity data is assessed in radiance space utilizing the collocations from single launch sites where ECMWF data is mostly at or within ~1 h of IASI. All comparisons of ECMWF vs. IASI radiances show very small systematic ECMWF biases.
Relative to IASI as a practical reference, daytime RS41 (even without GDP) has ~1% (percentage points of RH) smaller UTH errors than RS92 GDP. RS41 may still have a dry bias of 1-1.5% in both daytime and nighttime, and RS92 GDP may have a similar dry bias at night, while standard RS92 may have a dry bias of 3-4%. Those characteristics are obtained independently from 1500-1570 cm −1 and 1615-1800 cm −1 , indicating the consistency of water vapor spectroscopy between the two bands. The relative differences between RS41 STD and RS92 GDP or between radiosonde and ECMWF obtained from the radiance analysis are consistent with their differences in RH measurements. The small biases of RS41 STD indicate that RS41 at operational stations is probably almost an "absolute" standard. Note also that RS92 GDP improves accuracy to nearly the level of RS41 STD.
Radiosonde-satellite collocation uncertainty plays a big role in assessing their consistency, but collocation uncertainty generally remains unknown for individual collocations. A method was used to investigate the consistency between ensemble-averaged radiosonde (or ECMWF) and IASI radiances by computing an overall uncertainty term, including noise from radiosonde and satellite instruments, collocation uncertainty, and uncertainty in the LBLRTM (Section 2.2.3). Results show that RS92 STD for both daytime and nighttime and RS92 GDP for daytime are not statistically consistent with IASI. RS92 GDP for nighttime and RS41 STD for both nighttime and daytime are consistent with IASI in some cases but not in others. Interpretation of the biases and consistency results presented in the study requires caution, since the size of the collocation sample directly affects the standard deviation of the overall uncertainty term, and thus the consistency (and confidence) of the assessment. It is interesting to notice, however, that ECMWF analyses are statistically consistent with IASI in almost all of the cases analyzed. We are uncertain about the reason for the high model consistency, but ECMWF assimilation of both radiosonde data and IASI radiances may play a role.
The sonde humidity biases obtained from the radiance analysis are likely to be upper limits since the "cloud-free" scenes selected could still be cloud contaminated (Appendix A). The IASI channels used as the target for the analysis sense the water vapor content of an atmospheric layer in the upper troposphere, and caution is needed to compare

Table A1. Mean differences of OBS-CAL (in all cases, IASI-RS92 GDP) calculated for IASI scenes with cloud flag being clear with high confidence ("CLD1") and presumably clear ("CLD2") using collocations of IASI-RS92 GDP data at ENA. The RH biases estimated from OBS-CAL are also listed. Numbers of IASI-RAOB collocations are in the parentheses after CLD1 and CLD2 in the second column. The CLD1 or CLD2 header in Col. 2 applies also to Cols. 3-5. Each value in parentheses in the difference lines (lines 2 and 4 in each station and SEA category) is one standard deviation of the difference to its left.

Appendix B
The impact of atmospheric temperature differences on the calculated radiance differences at 1400-1900 cm −1 is quantified using RS92 and RS41 dual launch radiosonde data, where both sondes sample the same surface and atmosphere. We recalculate the radiances for RS41 STD, but use the RS92 GDP temperature and RS41 STD humidity profiles, and keep other variables needed in the LBLRTM calculation the same. This new RS41 STD is called RS41 STDv. We then compare RS41 STDv radiances with the radiances calculated using RS41 STD temperature and humidity. The difference between the two radiances, CAL(RS41 STDv)-CAL(RS41 STD), if any, is expected to come from their temperature difference. Figure A1 shows the mean difference and ±2 standard deviations of CAL(RS41 STDv)-CAL(RS41 STD), based on the same Lauder dual launch data used in Figure 1. The radiance differences are negative across 1400-1900 cm −1 , indicating that the colder temperature in RS92 GDP (relative to RS41 STD, see Figure 1e) around the upper troposphere tends to "cause" a more "wet" RH. However, the radiance difference is rather small, averaging −0.0125 mW m −2 sr cm −1 for 1500-1570 cm −1 , equivalent to 0.278% in RH.
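The swap experiment described above can be sketched in Python as follows. The forward model is passed in as a callable, because the real calculation uses LBLRTM, which is not reproduced here. The `toy_forward_model` function and all profile values are hypothetical stand-ins, made up only so that the sketch runs; its output has no physical meaning.

```python
from typing import Callable, Sequence

Profile = Sequence[float]
ForwardModel = Callable[[Profile, Profile], float]


def temperature_contribution(
    forward_model: ForwardModel,
    temp_reference: Profile,
    temp_swapped: Profile,
    humidity: Profile,
) -> float:
    """CAL(swapped temperature) minus CAL(reference), with humidity held fixed.

    Mirrors CAL(RS41 STDv) - CAL(RS41 STD): only the temperature profile
    changes, so any difference is attributed to temperature.
    """
    cal_reference = forward_model(temp_reference, humidity)
    cal_swapped = forward_model(temp_swapped, humidity)
    return cal_swapped - cal_reference


def toy_forward_model(temperature: Profile, humidity: Profile) -> float:
    """HYPOTHETICAL stand-in for LBLRTM: linear in mean temperature and RH."""
    mean_t = sum(temperature) / len(temperature)
    mean_rh = sum(humidity) / len(humidity)
    return 0.1 * mean_t - 0.05 * mean_rh


# HYPOTHETICAL two-level profiles: the swapped temperature is 0.14 K colder.
t_rs41 = [230.0, 220.0]
t_rs92 = [229.86, 219.86]
rh_rs41 = [25.0, 5.0]

diff = temperature_contribution(toy_forward_model, t_rs41, t_rs92, rh_rs41)
print(f"temperature-only radiance difference (toy units): {diff:+.4f}")
```

Holding humidity and all other inputs fixed is what allows the resulting radiance difference to be read as the temperature contribution alone.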

Figure A1. Lauder, New Zealand. CAL radiance differences, RS41 STDv minus RS41 STD, based on 14 daytime launches. Dotted lines show ± one standard deviation from the solid line, as in Figure 1c. RS41 STDv includes temperature profile from RS92 GDP and humidity profile from RS41 STD. See text for discussion on the impact of temperature difference (between RS92 GDP and RS41 STD) on the RH difference estimated from the radiance analysis.
Figure A2 is based on the dual launch data at Lindenberg (also used for Figure 2). The radiance difference between RS41 STDv and RS41 STD is −0.0010 mW m −2 sr cm −1 for 1500-1570 cm −1 , equivalent to 0.023% in RH. The temperature difference between RS92 GDP and RS41 STD in the upper troposphere is much smaller at Lindenberg than at Lauder (for example, −0.06 K vs. −0.14 K at 328.6 hPa); the radiance difference between RS92 GDP and RS41 STD is also smaller at Lindenberg than at Lauder. Since the temperature difference is very small in these analyses, the temperature contribution to the CAL radiances and hence the humidity computed from the radiance is negligible.
Figure A2. Lindenberg, Germany. Same as Figure A1 but for 19 daytime dual launches.