Biases of Global Tropopause Altitude Products in Reanalyses and Implications for Estimates of Tropospheric Column Ozone

Abstract: Accuracy of global tropopause altitude products from reanalyses is important to applications of the products, including the derivation of tropospheric column ozone (TCO). Here, monthly biases in lapse-rate tropopause pressure (PLRT) in two reanalyses, NCEP/NCAR and MERRA-2, and associated implications for estimating TCO are examined, based on global radiosonde observations over 1980–2017 at 689 stations. Our analysis suggests that the global mean PLRT is underestimated by 2.3 hPa in NCEP/NCAR and by 0.9 hPa in MERRA-2, mainly attributable to large negative biases around the subtropics (~20°–50°) in both hemispheres, with generally positive biases at other latitudes. Overall, NCEP/NCAR outperforms MERRA-2 in the Northern Hemisphere but underperforms MERRA-2 in the Southern Hemisphere. PLRT biases in the two reanalyses vary more evidently with latitude than with longitude. From winter to summer, the peaks of negative PLRT biases around the subtropics shift poleward by ~10°. Approximately 70% of the reanalysis PLRT biases are within −10 to 10 hPa. A negative (positive) PLRT bias induces a positive (negative) TCO bias. In absolute magnitude, the mean ozonesonde TCO bias attributable to PLRT biases is ~0.2, ~0.8, and ~1.2 Dobson Units (DU) if a PLRT bias is within 0–5, 5–10, and 10–15 hPa, respectively. Using a global ozone climatology, we estimate that the global mean bias in TCO induced by the PLRT biases in both reanalyses is positive, being 0.64 DU (or 2.2%) for NCEP/NCAR and 0.28 DU (or 1.1%) for MERRA-2.

Such assessments have wide implications for studies of tropospheric ozone burden, tropospheric ozone radiative forcing, derivation of TCO from satellite data, and data assimilation. In this study, we select long-term global radiosonde data as a basis for the PLRT bias assessment because of the high quality of the data for identifying PLRT. We employ cosine-weighted latitudinal averaging to minimize the impact of spatial inhomogeneity of radiosonde data on our global and hemispheric assessments of PLRT biases. Although it is beyond the scope of this study, PLRT from GPS-RO satellite data could be used to enhance our understanding of the global variation in PLRT biases for recent decades. Furthermore, what drives biases in tropopause altitude in reanalyses could be further explored to improve reanalysis LRT products.


Introduction
As the boundary between the well-mixed convective troposphere and the radiatively controlled stratosphere, the tropopause closely relates to multiple processes in the atmosphere. Long-term changes in tropopause height are thought to be an indicator of climate change [1–3]. Previous studies [1] suggest that an increase in tropopause height is mainly attributable to warming of the troposphere due to an increase in well-mixed greenhouse gases (GHGs) and cooling of the stratosphere due to a reduction in stratospheric ozone and the increase in GHGs. The tropopause height frequency (THF) methodology is frequently used in exploring the long-term widening of the tropical belt [4]. Furthermore, tropopause properties, including height, pressure, and temperature, are strongly related to stratosphere-troposphere exchange (STE). Changes in tropopause temperature, especially in the tropics, could substantially affect the transport of water vapor from the troposphere to the stratosphere [5–7]. The determination of the tropopause is a prerequisite for identifying and classifying STE events using trajectory methods [8,9]. The tropopause is also the upper limit for integration of tropospheric properties in chemistry and physics, such as tropospheric column ozone (TCO) [10,11] and tropospheric temperature [12]. Satellite retrievals of stratospheric aerosol optical depths (SAOD) are often presented by integrating [...] layer containing the tropopause in a profile of interest with a certain vertical resolution. Examples of such profiles include vertical profiles of temperature and ozone mixing ratio. LRT biases in reanalyses may cause misidentification of the tropopause layer. It is important to analyze how the misidentification rate varies with LRT biases and with the vertical resolution of the profile.
The tropopause is the upper limit for integration of TCO, and is therefore a prerequisite for estimating TCO. However, how large are the TCO biases induced by the LRT biases in reanalyses? How do the induced TCO biases vary spatially? Would the global mean TCO be overestimated or underestimated due to the LRT biases? These questions are worthy of investigation. An assessment of the LRT bias is useful for separating this bias from the overall bias in an estimate of TCO. Fishman et al. [44] and Ziemke et al. [45] developed the tropospheric ozone residual method to derive TCO from satellite observations by subtracting the stratospheric column ozone from the total column ozone. In this method, information on the tropopause is also a critical prerequisite. TCO values are relevant to estimating the global tropospheric ozone burden and the tropospheric ozone radiative forcing [17,46–48], further suggesting the importance of assessing the induced TCO biases. Such assessments are also useful for a wide range of applications of TCO data, for example, in data assimilation studies [49].
This study aims to address the questions discussed above. Through comparison with radiosonde observations, we characterize the biases in global LRT data in two reanalyses, NCEP/NCAR and the National Aeronautics and Space Administration Modern-Era Retrospective analysis for Research and Applications, version 2 (MERRA-2), over 1980–2017 (Section 3.1). We examine how changes in horizontal resolution in the reanalyses affect LRT biases (Section 3.2). We quantify how LRT biases affect the identification of the tropopause layer in a vertical profile of interest (Section 3.2). Finally, we analyze the TCO biases induced by using the reanalysis PLRT as the upper integration limit (Section 3.3). The radiosonde, reanalysis, and ozone data are introduced in Section 2, and conclusions are provided in Section 4.

Tropopause Data from Radiosonde Data
Radiosonde observations used in this study were acquired from the Integrated Global Radiosonde Archive (version 2, IGRA 2), which provides historical sounding records at over 2700 stations globally [50]. We selected 689 radiosonde stations with reasonably complete and recently updated sounding records over 1980–2017. Each station selected for analysis satisfies the criteria that the station still reports data after 2014 (inclusive) and has at least 10,000 archived soundings. Figure 1 shows the number of selected stations in each 10° latitudinal zone.
The method of Zängl and Hoinka [27] was applied to detect the LRT based on the geopotential height, pressure, and temperature profiles provided by the radiosonde data. The first lapse-rate tropopause is defined as "the lowest level at which the lapse rate decreases to 2 °C/km or less, provided also the average lapse rate between this level and all higher levels within 2 km does not exceed 2 °C/km" [18]. A secondary tropopause is detected "if above the first tropopause the average lapse rate between any level and all higher levels within 1 km exceeds 3 °C/km, then a second tropopause is defined by the same criterion". In this paper, the tropopause height and the corresponding pressure refer to the first LRT height (HLRT) and the first LRT pressure (PLRT) unless stated otherwise. To avoid unrealistic PLRT and HLRT values detected in some soundings, a calculated PLRT or HLRT in a sounding was regarded as invalid if (1) the HLRT is lower than 5 km (~550 hPa for PLRT) or higher than 18 km (~75 hPa for PLRT), or (2) the sounding has no records reaching 2 km above the HLRT, or (3) the HLRT falls outside the range of the mean HLRT ± two standard deviations over 1980–2017 at the station. Because both reanalyses provide only PLRT data, we used PLRT, instead of HLRT, from radiosondes to reduce interpolation errors in direct comparisons between radiosonde and reanalysis PLRT. In Table 1, we provide assessments of both PLRT and HLRT. Following the criteria set by Seidel and Randel [28], we calculate a monthly PLRT only if the daily value of PLRT is available on at least 15 days in that month, which is appropriate for balancing temporal homogeneity and completeness of radiosonde soundings. In fact, our result is not sensitive to the selection of the threshold of available daily means (see Appendix A).
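The first-LRT criterion and the validity checks (1) and (2) above can be illustrated with a minimal Python sketch. It is not the implementation of Zängl and Hoinka [27]; the function name `first_lrt`, its argument names, and the synthetic profile are hypothetical, and the sketch assumes a sounding with strictly increasing heights and ignores interpolation between levels.

```python
import math

def first_lrt(z_km, t_c, p_hpa, threshold=2.0, depth_km=2.0,
              z_min_km=5.0, z_max_km=18.0):
    """Return (height_km, pressure_hPa) of the first LRT, or None if invalid.

    Assumes z_km is strictly increasing; all names are illustrative.
    """
    n = len(z_km)
    for i in range(n - 1):
        # Lapse rate (-dT/dz) of the layer directly above level i.
        gamma = -(t_c[i + 1] - t_c[i]) / (z_km[i + 1] - z_km[i])
        if gamma > threshold:
            continue
        # Validity check (2): the sounding must reach 2 km above the candidate.
        if z_km[-1] < z_km[i] + depth_km:
            return None
        # Average lapse rate between level i and every higher level within 2 km.
        thick_enough = all(
            -(t_c[j] - t_c[i]) / (z_km[j] - z_km[i]) <= threshold
            for j in range(i + 1, n)
            if z_km[j] <= z_km[i] + depth_km
        )
        if thick_enough:
            # Validity check (1): plausible height range of 5-18 km.
            if z_min_km <= z_km[i] <= z_max_km:
                return z_km[i], p_hpa[i]
            return None
    return None

# Synthetic sounding: 6.5 degC/km lapse rate up to 11 km, isothermal above.
z = [0.5 * k for k in range(41)]
t = [15.0 - 6.5 * min(zk, 11.0) for zk in z]
p = [1013.25 * math.exp(-zk / 7.0) for zk in z]
h_lrt, p_lrt = first_lrt(z, t, p)
```

Validity check (3), the two-standard-deviation screen, requires the station's full 1980–2017 record and is therefore left out of this single-sounding sketch.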
To compare with the reanalysis data (see Section 2.2), monthly means of PLRT in radiosonde data were binned into the same longitude-latitude grids as the reanalysis data before the bias analysis (Figures 2–8). That is, the monthly mean PLRT at a grid cell was averaged from the monthly means over all stations within the grid cell. The first grid cell center of the gridded dataset was set to 90° S in latitude and 180° W in longitude minus one-half of the grid cell size in latitude and longitude.
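The monthly averaging and binning steps can be sketched as follows. This is a minimal illustration, not the processing code used in the study: the function names are hypothetical, and the grid origin (cell edges starting at 90° S and 180° W, passed as `lat0` and `lon0`) is an assumption that should be replaced by the half-cell offset described above where needed.

```python
from collections import defaultdict

def monthly_mean(daily_values, min_days=15):
    """Monthly mean PLRT for one station, or None if fewer than min_days values."""
    valid = [v for v in daily_values if v is not None]
    if len(valid) < min_days:
        return None
    return sum(valid) / len(valid)

def grid_index(lat, lon, dlat, dlon, lat0=-90.0, lon0=-180.0):
    """(row, col) of the cell containing (lat, lon); lat0/lon0 are assumed edges."""
    n_rows = round(180.0 / dlat)
    n_cols = round(360.0 / dlon)
    row = min(int((lat - lat0) // dlat), n_rows - 1)  # keep the pole in the last row
    col = int((lon - lon0) // dlon) % n_cols          # wrap 180 E onto 180 W
    return row, col

def grid_station_means(stations, dlat, dlon):
    """Average station monthly means (lat, lon, value) within each grid cell."""
    sums = defaultdict(lambda: [0.0, 0])
    for lat, lon, value in stations:
        if value is None:
            continue
        cell = grid_index(lat, lon, dlat, dlon)
        sums[cell][0] += value
        sums[cell][1] += 1
    return {cell: total / count for cell, (total, count) in sums.items()}

# Two hypothetical stations share a 2.5-degree cell; a third falls elsewhere.
gridded = grid_station_means(
    [(40.1, 116.3, 210.0), (41.0, 117.0, 230.0), (-33.9, 18.6, 190.0)], 2.5, 2.5
)
```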

Tropopause Data from Reanalysis Data
Monthly means of PLRT are available from NCEP/NCAR [37] since 1948 and from MERRA-2 (the National Aeronautics and Space Administration Modern-Era Retrospective analysis for Research and Applications, Version 2) [51] since 1980. In this study, we used the monthly means of PLRT over 1980–2017 from these two reanalysis datasets for two reasons: (1) in these two reanalyses, PLRT is directly available, which benefits wide applications of the PLRT products, whereas for other reanalyses users have to derive PLRT from the reanalysis thermal profile data by themselves, as in Xian and Homeyer [40]; and (2) using PLRT directly avoids the calculation errors involved in deriving PLRT from the thermal profile data.
The NCEP/NCAR reanalysis is produced with a global spectral model with T62 horizontal resolution and 28 vertical sigma levels [37], while MERRA-2 is produced with version 5.12.4 of the GEOS atmospheric data assimilation system [51] using a finite-volume dynamical core [52] at a horizontal resolution of 0.5° × 0.625° and 72 hybrid-eta levels from the surface to 0.01 hPa. NCEP/NCAR PLRT is derived only partly following the WMO definition, as the thickness criterion is not applied [53]. NCEP/NCAR PLRT data have been used in a wide range of applications [3,11,54]. The upper and lower limits allowed for PLRT in the NCEP/NCAR calculation are 450 and 85 hPa. Kalnay et al. [37] reported that PLRT is placed in the "A" (first) class of reliability in the reanalysis, suggesting that NCEP/NCAR PLRT data are more influenced by the observations than by the assimilation. Nevertheless, there are still biases in NCEP/NCAR PLRT, as reported in the literature [42]. NCEP/NCAR and MERRA-2 provide data, respectively, on grids of 2.5° × 2.5° and 0.625° × 0.5° in longitude and latitude. In the upper troposphere-lower stratosphere (UTLS) region, the vertical resolutions are ~1.5–2 km for NCEP/NCAR and 1.1 km for MERRA-2 [55]. To examine biases in PLRT under different horizontal resolutions, we remapped PLRT from the original resolution in each of the reanalyses to different resolutions, which are 1° × 1° and 5° × 5° for NCEP/NCAR data and 1° × 1°, 2.5° × 2.5°, and 5° × 5° for MERRA-2 data. We remapped the PLRT value for a grid cell from the PLRT values at all the nearest neighbor grid cells, following the distance-weighted method. The Climate Data Operators (CDO) software (https://code.mpimet.mpg.de/projects/cdo/files, accessed on 14 October 2020) was used to perform the remapping.
Note that the grid cell centers of the reanalysis datasets at the original and remapped resolutions were also adjusted to be consistent with the radiosonde data before the bias analysis (see Section 2.1).
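The idea behind distance-weighted remapping can be sketched in a few lines of Python. This is only an illustration of inverse-distance weighting over nearest neighbors and is not the CDO implementation; the function names, the use of four neighbors (`k=4`), and the great-circle distance on a sphere of radius 6371 km are assumptions made for the example.

```python
import math

def great_circle_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    a = (math.sin((p2 - p1) / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2)
    return 2 * radius_km * math.asin(min(1.0, math.sqrt(a)))

def remap_distance_weighted(target, sources, k=4):
    """Inverse-distance-weighted value at target (lat, lon).

    sources is a non-empty list of (lat, lon, value) source cell centers;
    only the k nearest ones contribute (k=4 is an assumption).
    """
    nearest = sorted(
        (great_circle_km(target[0], target[1], lat, lon), value)
        for lat, lon, value in sources
    )[:k]
    if nearest[0][0] == 0.0:
        return nearest[0][1]  # target coincides with a source cell center
    weights = [1.0 / dist for dist, _ in nearest]
    weighted = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted / sum(weights)

# Four equidistant neighbors contribute equally.
value = remap_distance_weighted(
    (0.0, 0.0),
    [(1.0, 0.0, 100.0), (-1.0, 0.0, 200.0), (0.0, 1.0, 300.0), (0.0, -1.0, 400.0)],
)
```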
To assess mean PLRT biases globally and by hemisphere, cosine-weighted latitudinal averages are taken at 10° latitudinal intervals. In this way, the impact of spatial inhomogeneity of radiosonde data on the global and hemispheric means is minimized. The corresponding standard deviations are calculated in the same way.
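A minimal sketch of cosine-weighted averaging over 10° zones is given below. The function name and the choice of representing each zone by its central latitude are assumptions made for illustration; zones without data are skipped.

```python
import math

def cosine_weighted_mean(zonal_means):
    """Cosine-of-latitude weighted mean of zonal-mean biases.

    zonal_means maps the central latitude (degrees) of each 10-degree zone
    to its mean bias; zones with no data carry None and are skipped.
    """
    numerator = 0.0
    denominator = 0.0
    for lat, value in zonal_means.items():
        if value is None:
            continue
        weight = math.cos(math.radians(lat))
        numerator += weight * value
        denominator += weight
    return numerator / denominator

# A zone at 60 degrees carries half the weight of a zone at the equator.
example = cosine_weighted_mean({0.0: 1.0, 60.0: 3.0})
```

Because the weight follows the area of each latitude zone, the many mid-latitude NH stations do not dominate the global or hemispheric mean.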

Ozone Data and Derivation of TCO
To derive TCO at a grid cell, the following data at that grid cell are needed: the ozone profile, the pressure profile, and the PLRT. We used two sets of ozone profile data. The first is from the global ozonesonde stations, available from the World Ozone and Ultraviolet Radiation Data Centre (WOUDC). Most of the profiles are from the electrochemical concentration cell (ECC)-type ozonesonde, which was introduced in the early 1970s and adopted by a majority of stations in the global network by the early 1980s. The original ozone profiles over 1980–2008, which have various vertical resolutions, were uniformly processed to 1-km vertical resolution, and the ozone volume mixing ratio was calculated for each 1-km layer from sea level upward. A monthly mean ozone volume mixing ratio in a layer was calculated only if at least one ozonesonde measurement was available in that layer and month. Accordingly, pressure profiles were processed at 1-km vertical resolution from radiosonde data. We used the reanalysis PLRT at their original horizontal resolutions, which are 2.5° × 2.5° for NCEP/NCAR and 0.625° × 0.5° for MERRA-2 (longitude × latitude). To be consistent with the horizontal resolution of the reanalyses, all data, including ozonesonde profiles, radiosonde pressure profiles, and radiosonde PLRT, were gridded to 2.5° × 2.5° for NCEP/NCAR and 0.625° × 0.5° for MERRA-2 (longitude × latitude). In this processing, the mean of a variable at a grid cell was taken from one or more data points within that grid cell. Using the same ozone profile and radiosonde pressure profile at a grid cell, we can derive TCO at that grid cell using the PLRT value from the radiosonde and from one of the reanalyses, such as NCEP/NCAR. Therefore, the TCO bias is attributable only to the PLRT bias in that reanalysis. In a given month, the TCO bias at a grid cell can be assessed only if ozonesonde, radiosonde, and reanalysis data are all available at the grid cell. The number of available monthly samples is 6423 for NCEP/NCAR and 4679 for MERRA-2, as the horizontal resolution of NCEP/NCAR is coarser than that of MERRA-2.
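To illustrate how a PLRT bias translates into a TCO bias when the ozone and pressure profiles are held fixed, the following Python sketch integrates layer-mean ozone mixing ratios over pressure up to a given PLRT. It is a simplified example under stated assumptions, not the procedure used in the study: the function name, the hydrostatic conversion from pressure thickness to column amount, the linear-in-pressure treatment of the layer containing the tropopause, and the uniform 50 ppbv profile are all assumptions.

```python
AVOGADRO = 6.02214076e23   # molecules per mole
GRAVITY = 9.80665          # m s-2, assumed constant with height
M_AIR = 0.0289644          # kg per mole of dry air
DU_MOLEC_M2 = 2.6867e20    # molecules per m2 in one Dobson Unit

# Column (DU) of 1 ppmv of ozone across a layer 1 hPa thick (about 0.789).
DU_PER_HPA_PPMV = AVOGADRO / (GRAVITY * M_AIR) * 100.0 * 1e-6 / DU_MOLEC_M2

def tco_du(p_edges_hpa, o3_ppmv, p_lrt_hpa):
    """Integrate ozone from the surface up to the tropopause pressure.

    p_edges_hpa: layer edge pressures decreasing upward (length n + 1).
    o3_ppmv: layer-mean ozone volume mixing ratios (length n).
    """
    total = 0.0
    for k, vmr in enumerate(o3_ppmv):
        p_bottom, p_top = p_edges_hpa[k], p_edges_hpa[k + 1]
        if p_bottom <= p_lrt_hpa:
            break  # this layer and all higher layers lie above the tropopause
        # Only the part of the layer below the tropopause contributes.
        total += DU_PER_HPA_PPMV * vmr * (p_bottom - max(p_top, p_lrt_hpa))
    return total

# Same profiles, two tropopause pressures: radiosonde 200 hPa, reanalysis 190 hPa.
edges = [1000.0 - 100.0 * k for k in range(10)]   # 1000, 900, ..., 100 hPa
ozone = [0.05] * 9                                # hypothetical uniform 50 ppbv
tco_bias = tco_du(edges, ozone, 190.0) - tco_du(edges, ozone, 200.0)
```

In this example a PLRT bias of −10 hPa extends the integration upward and yields a positive TCO bias, which is the sign relationship examined in Section 3.3.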
Considering the limited spatial and temporal coverage of the ozonesonde data, we also used a second set of ozone profile data, the gridded ozone profiles from the Trajectory-mapped Ozonesonde dataset for the Stratosphere and Troposphere (TOST) over 1980–2012. This dataset is derived from ozone soundings using a trajectory-based ozone mapping methodology [56]. The trajectory-mapping approach is an effective method for interpolating sparse ozonesonde measurements [56,57]. TOST provides monthly means of ozone volume mixing ratio binned into grids of 5° × 5° × 1 km (latitude, longitude, altitude) from sea level up to 26 km. For consistency with TOST data, we regridded the PLRT from the two reanalyses onto grids of 5° × 5° (longitude, latitude). This 5° × 5° resolution for PLRT has a minor effect on the comparison between radiosonde and reanalysis data (see Section 3.2). We followed the procedure described in Section 2.1 to derive monthly means of the vertical pressure profile on grids of 5° × 5° × 1 km (latitude, longitude, altitude) from radiosonde data. Using TOST data, we calculated the monthly mean TCO at a grid cell only if ozone mixing ratios are available in over 80% of the vertical layers from the surface to the tropopause at that grid cell. In the TCO calculation, the PLRT from radiosondes and from the two reanalyses were used in turn, while the vertical ozone and pressure profiles are the same. Therefore, the biases in reanalysis-PLRT-based TCO are induced by the PLRT biases in the reanalyses only.

Spatial and Seasonal Variations in PLRT Biases
To evaluate PLRT biases in NCEP/NCAR and MERRA-2, we compared the monthly mean PLRT between the radiosonde and each of the reanalyses at their original horizontal resolutions. Figure 1 shows the climatological PLRT variation with latitude, based on all radiosonde data from 1980–2017. Although stations are predominantly located in the mid-latitudes of the Northern Hemisphere (NH), the latitudinal structure of PLRT is well depicted globally. The climatological PLRT increases from ~100 hPa around the equator to ~300 hPa near the polar regions, consistent with the latitudinal variation reported in the literature from reanalyses [2], GPS radio occultation [58], and radiosonde data [28].

NCEP/NCAR pioneered the development of reanalysis data, so the NCEP/NCAR reanalysis has the longest temporal coverage (1948–present). Figure 2 shows that PLRT in NCEP/NCAR is usually overestimated by 0.5–5.5 hPa in the tropics (20° S–20° N) and at high latitudes of the NH (60°–90° N). In the tropics, positive PLRT biases of 2–6 hPa in NCEP/NCAR were found by Randel et al. [42], associated with a tropopause temperature overestimated by 3–5 K. In the subtropics (~20°–50°) of each hemisphere, PLRT in NCEP/NCAR is underestimated, and the bias reaches ~−14 hPa around 30°–40° S. In MERRA-2, the largest biases of PLRT occur in the subtropics, consistent with the maximum biases of HLRT found there by Xian and Homeyer [40]. This is the location of the "tropopause break", a sharp discontinuity in the first lapse-rate tropopause, found near the subtropical jets at roughly 30° S and 30° N (also evident in Figure 1). The subtropics in both hemispheres are transition zones where PLRT values are remarkably sensitive to the location and intensity of the subtropical jets, since the steepest gradients of PLRT are near the jets [2,19].
A notable negative bias of about −20 hPa appears in the south polar region (70°–80° S) in NCEP/NCAR. The accuracy of PLRT at high latitudes of the Southern Hemisphere (SH) in MERRA-2, which was first released in 2015, is apparently much better. As a result, the mean PLRT bias over the SH is −4.00 ± 5.64 hPa in NCEP/NCAR, in comparison with −0.49 ± 3.6 hPa in MERRA-2 (Table 1). However, the mean PLRT bias over the NH is −0.57 ± 6.04 hPa in NCEP/NCAR, smaller in magnitude than the mean value of −1.34 ± 9.83 hPa in MERRA-2. Overall, the global mean PLRT bias is −2.28 ± 5.84 hPa in NCEP/NCAR and −0.92 ± 6.71 hPa in MERRA-2 over 1980–2017 (Table 1). Correspondingly, the global HLRT is overestimated by 52 m in NCEP/NCAR and 16 m in MERRA-2. Therefore, for applications of PLRT products, MERRA-2 may be the better choice of the two reanalyses on the global scale, whereas by hemisphere, NCEP/NCAR outperforms MERRA-2 in the NH but underperforms MERRA-2 in the SH. PLRT biases vary much more with latitude than with longitude in both reanalyses.

[...] (Figure 2). From 60°–90° S, the distribution of PLRT biases is more varied and extreme biases (over 20 hPa or below −20 hPa) appear more frequently in NCEP/NCAR than in MERRA-2, indicating better PLRT estimates in MERRA-2 for this region. Surprisingly, for NCEP/NCAR PLRT, most extreme biases are negative at the nine stations between 60° S and 80° S but positive at the Amundsen-Scott station located at 90° S. Except for 60°–90° S in NCEP/NCAR, and the subtropics in the two reanalyses, most PLRT biases fall into a range from −10 hPa to 10 hPa at the remaining latitudes (Figure 3a,b). Additionally, positive PLRT biases appear more frequently (60–80%) than negative biases at the remaining latitudes, while the occurrence frequency of negative PLRT biases peaks over 30°–40° near the subtropical jet in each hemisphere (Figure 3c,d). Over the globe, positive and negative biases have comparable frequencies (~50%), and the negative global mean PLRT bias (Table 1) is attributable to the large negative biases around the subtropical jets.
Figure 3. [...] between MERRA-2 and radiosonde data. (c) The total frequency of positive biases (in blue) and negative biases (in red) in each 10° latitudinal band for NCEP/NCAR. (d) The same as (c), but for MERRA-2. The NCEP/NCAR and MERRA-2 data are used in their original resolutions of 2.5° × 2.5° and 0.625° × 0.5° in longitude and latitude.
The latitudinal variation of PLRT biases by season (Figure 4) is generally similar to that of the annual mean (Figure 2). However, some changes with season are notable. The seasonal variation of PLRT biases in each latitudinal zone around the extratropics (20°–90°) appears larger in NCEP/NCAR than in MERRA-2. In both reanalyses, positive PLRT biases in the tropics occur in JJA and SON, while negative PLRT biases in the subtropics are persistent in all seasons. Around the subtropics, negative PLRT biases in MERRA-2 are slightly larger in winter or spring (JJA or SON for the SH; DJF or MAM for the NH), as has been found before for ERA-Interim, JRA-55, and CFSR, as well as MERRA-2 [40]. However, the negative bias in NCEP/NCAR maximizes in summer in each hemisphere (DJF for the SH; JJA for the NH).

Additionally, in summer in each hemisphere, the latitude near the subtropics where negative biases of PLRT peak is shifted poleward by ~10°, relative to the latitude where negative biases peak in the corresponding winter. This is associated with the poleward shift of the tropopause breaks in summer over landmasses [43]. In NCEP/NCAR, the extreme negative biases over high latitudes in the SH are most pronounced in winter (JJA) and spring (SON), especially in JJA when the bias is up to ~−50 hPa. However, such biases are not found in MERRA-2, possibly owing to the assimilation of Microwave Limb Sounder (MLS) stratospheric ozone profiles and Ozone Monitoring Instrument (OMI) column ozone in a system where the assimilated ozone is interactive with radiation [55]. The assimilation of these satellite ozone data improves the representation of ozone profiles in MERRA-2, especially in the SH winter and spring [59–61]. The improved assimilation of ozone profiles in MERRA-2 might correct PLRT at high latitudes of the SH through a modified temperature structure. As a result, the averaged bias over the SH is greatly reduced in MERRA-2, especially in JJA and SON (Table 1).

PLRT Biases in the Reanalyses at Different Horizontal Resolutions and Implication of PLRT Biases in Misidentification of the Tropopause Layer in Profiles with Different Vertical Resolutions
In applications of reanalysis LRT data, two issues are relevant. The first is whether PLRT biases vary with horizontal resolution. To address this question, we regridded PLRT to 1° × 1° and 5° × 5° from its original 2.5° × 2.5° resolution in NCEP/NCAR, and to 1° × 1°, 2.5° × 2.5°, and 5° × 5° from its original resolution of 0.625° × 0.5° in MERRA-2, following the distance-weighted method using the Climate Data Operators (CDO) software (see Section 2.2). Figure 5 shows that there is no systematic difference in PLRT biases in either reanalysis when remapped to different horizontal resolutions, in terms of the mean over 1980–2017 on a monthly basis. For example, in NCEP/NCAR, the PLRT bias averaged over all 10° zones in the NH is −1.2, −0.6, and −1.3 hPa for 1° × 1°, 2.5° × 2.5°, and 5° × 5° resolutions, respectively. At higher resolutions, PLRT biases show variations at smaller latitudinal intervals and have larger fluctuations in the 20°–40° latitude band in both hemispheres. This is the case for the NCEP/NCAR reanalysis at 1° × 1° and for MERRA-2 at 0.625° × 0.5° and 1° × 1° resolutions.
In NCEP/NCAR, PLRT biases are positive in the tropics (20° S–20° N) and in high latitudes of the NH (60° N–90° N) with values close to 5 hPa, but are negative in the subtropics in both hemispheres from 20° to 50°, with the largest negative bias close to −20 hPa around 30° S. Additionally, an extreme negative bias of ~−50 hPa appears around 70° S, which is only observed in NCEP/NCAR. PLRT biases in MERRA-2 vary with latitude similarly to those in NCEP/NCAR, regardless of horizontal resolution.
As noted, PLRT biases are much reduced in the SH over 60° S–90° S in MERRA-2. Before bias evaluation, the first grid cell center of the reanalysis is adjusted to 90° S (180° W) minus one-half of the grid size in latitude and longitude, for a better match with gridded radiosonde PLRT. Since the differences in PLRT are large between the tropical and polar sides of the tropopause break near the subtropical jet [25], a minor mismatch in the locations of the grids between observational and reanalysis data might lead to incorrect interpretations of PLRT biases.
The second relevant issue is the implication of PLRT biases in reanalyses for misidentification of the tropopause layer. The vertical profiles of a variable (e.g., ozone mixing ratio or temperature) are often presented at a fixed vertical resolution. In some applications, it is necessary to identify the layer containing the tropopause in such a vertical profile. PLRT values from reanalyses are often used to make such determinations [10,13,57]. The accuracy of the identification is affected by the magnitude of PLRT biases in reanalyses and the vertical resolution of the variable. We use NCEP/NCAR data as an example (Figure 6a); the results from MERRA-2 are similar (Figure 6b). As PLRT biases vary little with horizontal resolution (Figure 5), we use PLRT in NCEP/NCAR at 5° × 5° resolution in longitude and latitude. In Figure 6a, the frequency distribution of the differences in PLRT between the reanalysis and radiosondes is shown in blue bars. This histogram is based on regridded monthly PLRT from all radiosonde records over 1980–2017, with ~130,000 data points. The histogram is similar to a normal distribution, and ~70% of the PLRT biases range between −10 and 10 hPa. We first examine a case in which the variable of interest is presented at 1-km vertical resolution. In Figure 6, the solid red line indicates the misidentification frequency for a profile at 1-km vertical resolution, which is the number of samples with a misidentified tropopause layer divided by the total number of samples in each of the PLRT bias intervals. If PLRT biases are within an interval of 0–5 hPa, the ratio is below 10%. The ratio increases as PLRT biases increase. When PLRT biases are larger than 40 hPa, the ratio reaches 100%, i.e., the tropopause layer is misidentified in all data points. The ratio varies symmetrically for negative PLRT biases. With a vertical resolution of 0.5 km (the dashed red line in Figure 6), the ratio becomes larger at a given PLRT bias and reaches 100% at a smaller value of PLRT bias. Therefore, attention should be paid to the subtropics, where absolute PLRT biases in the reanalyses are often greater than 10 hPa (Figure 3) and so may cause misidentification of the tropopause layer with a probability larger than 40%. Of all samples, ~30% have a misidentified tropopause layer at 1-km resolution and ~50% at 0.5-km resolution.
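The misidentification ratio can be illustrated with a short Python sketch. A sample counts as misidentified when the radiosonde PLRT and the reanalysis PLRT fall into different layers of the fixed-resolution profile. The function names are hypothetical, and the conversion from pressure to height uses an isothermal scale-height relation (7 km, surface pressure 1013.25 hPa) purely as a stand-in for the radiosonde pressure profile used in the study.

```python
import math

def layer_index(p_hpa, dz_km, p0_hpa=1013.25, scale_height_km=7.0):
    """Index of the dz_km-thick layer containing pressure p_hpa.

    The scale-height conversion is an assumption for illustration only.
    """
    z_km = scale_height_km * math.log(p0_hpa / p_hpa)
    return int(z_km // dz_km)

def misidentification_rate(pairs, dz_km):
    """Fraction of (radiosonde PLRT, reanalysis PLRT) pairs in different layers."""
    pairs = list(pairs)
    wrong = sum(
        1 for p_obs, p_rea in pairs
        if layer_index(p_obs, dz_km) != layer_index(p_rea, dz_km)
    )
    return wrong / len(pairs)

# Hypothetical samples with PLRT biases of 0, -10, and -50 hPa.
samples = [(200.0, 200.0), (200.0, 190.0), (200.0, 150.0)]
rate_1km = misidentification_rate(samples, 1.0)
rate_half_km = misidentification_rate(samples, 0.5)
```

In this example the −10 hPa bias stays within the same 1-km layer but crosses a layer boundary at 0.5-km resolution, so the ratio rises from one third to two thirds for the finer profile.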

Implication of PLRT Biases to TCO Estimates
TCO is expressed in milli-atmo-centimeters of ozone, or Dobson Units (DU). We can use radiosonde PLRT and reanalysis PLRT individually as the upper limit for integration of TCO, while keeping the ozone and pressure profiles the same. Therefore, the difference in TCO between the two calculations (namely, the TCO bias) is caused by the PLRT difference between the reanalysis and the radiosonde.
We first assess the TCO biases at global ozonesonde stations. Figure 7 shows the mean TCO bias for different PLRT bias intervals in the reanalyses. Here, the reanalysis PLRT at their original resolutions were used. There are 6423 monthly mean samples in Figure 7a with NCEP/NCAR PLRT and 4679 in Figure 7b with MERRA-2 PLRT. A positive bias in PLRT induces a negative bias in TCO, while a negative bias in PLRT induces a positive bias in TCO. Regardless of the sign of the biases, TCO biases increase with PLRT biases, up to ~5 DU with PLRT biases of ~40 hPa in absolute value (Figure 7). The distribution of occurrence frequency across the PLRT bias intervals is similar to that in Figure 6. Approximately 85% of the PLRT biases range between −15 and 15 hPa. In absolute magnitude, the induced TCO bias is ~0.2, ~0.8, and ~1.2 DU if the PLRT bias is within 0–5, 5–10, and 10–15 hPa, respectively. Notably, the corresponding standard deviations are larger than the means of TCO biases in the PLRT intervals between −15 hPa and 15 hPa.
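The grouping of TCO biases by PLRT bias interval can be sketched as follows. The function name, the 5-hPa interval width as a parameter, and the use of the population standard deviation are assumptions for illustration; the sample values are hypothetical.

```python
import math
from collections import defaultdict

def bin_tco_bias(samples, width_hpa=5.0):
    """Group (PLRT bias, TCO bias) samples into PLRT bias intervals.

    Returns {(lower, upper): (mean TCO bias, standard deviation, count)}.
    The standard deviation is the population form (an assumption).
    """
    bins = defaultdict(list)
    for plrt_bias, tco_bias in samples:
        lower = math.floor(plrt_bias / width_hpa) * width_hpa
        bins[(lower, lower + width_hpa)].append(tco_bias)
    summary = {}
    for interval, values in sorted(bins.items()):
        count = len(values)
        mean = sum(values) / count
        std = math.sqrt(sum((v - mean) ** 2 for v in values) / count)
        summary[interval] = (mean, std, count)
    return summary

# Hypothetical samples: two in the 0-5 hPa interval, one in -10 to -5 hPa.
summary = bin_tco_bias([(2.0, -0.1), (4.0, -0.3), (-7.0, 0.8)])
```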

As ozonesonde stations are sparse, we further explore the influence of P LRT biases on TCO estimates using ozone profiles in TOST. As shown in Section 3.2, P LRT biases hardly vary with horizontal resolution. Therefore, we can neglect the influence of horizontal resolution and regrid the two reanalysis P LRT datasets to a 5° × 5° longitude-latitude resolution to match that of TOST.
The global distribution of TCO biases (Figure 8) is broadly similar to that of P LRT biases (Figure 2), including the latitudinal behavior. Negative biases in TCO prevail in the tropics and the high latitudes of the NH, with means less than 1 DU in magnitude. TCO in the subtropics is overestimated, and the positive TCO biases peak around 30°–40° in each hemisphere with values of ~1.5–3.5 DU, corresponding to P LRT biases varying from −8 to −14 hPa (Figure 2). For some grid cells in the subtropics, the positive TCO bias even reaches 4 DU or more, corresponding to a negative P LRT bias below −20 hPa. For NCEP/NCAR, the overestimation of TCO reaches ~3.5 DU around 70°–80° S, because extreme P LRT biases (about −20 hPa) appear there. Overall, the globally averaged bias in TCO due to use of the reanalysis P LRT is 0.64 DU in NCEP/NCAR and 0.28 DU in MERRA-2. Note that TCO values from TOST can be affected by multiple factors, while this study only quantifies the impact of LRT biases in reanalysis products on TOST TCO estimates.
Figure 8. (a) Spatial variation in the TCO biases (in DU) between using NCEP/NCAR P LRT and using radiosonde P LRT. Each value is the mean difference in TCO over 1980–2017 at each of the 5° × 5° grid cells. In the right panels, 10° zonal-mean differences are shown. (b) The same as (a), but for the TCO biases using MERRA-2 P LRT. Vertical ozone profile data are based on TOST at 1-km resolution. Blank grids are missing data, mainly due to unavailability of radiosonde data. Both NCEP/NCAR and MERRA-2 data are remapped to the horizontal resolution of TOST, 5° × 5° in longitude and latitude [57]. The text in each panel indicates the global mean bias, in absolute and relative terms, if P LRT from the corresponding reanalysis is used.

Discussion and Conclusions
NCEP/NCAR and MERRA-2 provide P LRT data directly for many applications. Here, through comparison with IGRA2 radiosonde data over 1980–2017, we have examined spatial and seasonal variations in P LRT biases in NCEP/NCAR and MERRA-2 on a monthly basis. P LRT biases in the reanalyses vary more with latitude than with longitude (Figure 2). The latitudinal variation averaged over 1980–2017 is characterized by positive biases of 0.5–5.5 hPa in the tropics and high latitudes of the NH and negative biases of −0.3 to −14.1 hPa near the subtropics (~20°–50°) of each hemisphere in both reanalyses. The large negative P LRT biases in the subtropics, corresponding to positive biases in H LRT there, are linked to the location of the subtropical jet and tropopause break [2,19,62]. A noticeable difference between the two reanalyses is that, in high latitudes of the SH, the negative P LRT biases in NCEP/NCAR are greatly reduced in MERRA-2, possibly owing to the assimilation of MLS stratospheric ozone profiles and OMI column ozone in MERRA-2.
Outside 60°–90° S in NCEP/NCAR and the subtropics in both reanalyses, positive biases prevail and are usually less than 10 hPa. Extreme negative P LRT biases below −20 hPa appear most frequently around 30°–40° S in both reanalyses. Approximately 70% of the reanalysis P LRT biases are within −10 to 10 hPa (Figure 3). Taking cosine-weighted latitudinal averages for the globe, we estimate that the global mean P LRT bias is about −2.3 hPa in NCEP/NCAR and −0.9 hPa in MERRA-2, largely attributable to the large negative P LRT biases around the subtropics (Table 1). Correspondingly, the global mean H LRT bias is about 52 m in NCEP/NCAR and 16 m in MERRA-2. The latitudinal variations in P LRT biases in all four seasons are generally similar. However, around the subtropics, the latitudes with the largest negative biases in summer (JJA for the NH, DJF for the SH) are shifted ~10° poleward from the latitudes of the winter maximum biases (DJF for the NH, JJA for the SH) (Figure 4). Overall, for applications of P LRT products, MERRA-2 may be the better choice of the two reanalyses on the global scale, whereas by hemisphere, NCEP/NCAR outperforms MERRA-2 in the NH but underperforms MERRA-2 in the SH.
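A minimal sketch of cosine-weighted latitudinal averaging is given below. It assumes zonal-mean biases on 10° bands weighted by the cosine of the band-center latitude; the function name, the treatment of empty bands, and the sample values are illustrative and are not taken from this study.

```python
import math


def cosine_weighted_mean(lat_centers_deg, zonal_means):
    """Area-weighted mean of zonal-mean values.

    Each band is weighted by the cosine of its center latitude.
    Bands without data (None) are skipped.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, value in zip(lat_centers_deg, zonal_means):
        if value is None:
            continue
        weight = math.cos(math.radians(lat))
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical zonal-mean P LRT biases (hPa) on 10-degree bands, 85S to 85N.
lat_centers = [-85.0 + 10.0 * i for i in range(18)]
zonal_bias = [None, -6.0, -4.0, -2.0, -8.0, -12.0, -5.0, 2.0, 3.0,
              3.0, 2.0, -4.0, -10.0, -6.0, 1.0, 3.0, 4.0, None]
print(f"Cosine-weighted mean bias: {cosine_weighted_mean(lat_centers, zonal_bias):.2f} hPa")
```

The weighting reduces the influence of high-latitude bands, which cover less area than tropical bands of the same latitudinal width.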
Two issues relevant to the application of reanalysis P LRT data are investigated. The first is related to horizontal resolution in the reanalyses. Although P LRT biases vary little with horizontal resolution, finer horizontal resolution resolves more detailed spatial variation in the biases (Figure 5). Furthermore, caution is needed at latitudes near the subtropics, as a small mismatch of the grid cell center there can lead to large differences in P LRT biases. The second issue is related to identifying the layer that contains the tropopause in a profile of a variable of interest, e.g., ozone or temperature, at a fixed vertical resolution. The accuracy of such identification depends on both the magnitude of the P LRT bias and the vertical resolution of the profile. Overall, the accuracy decreases as the P LRT bias increases. When P LRT biases are above 10 hPa in magnitude, the misidentification rate at 1-km resolution exceeds 40% for both reanalyses (Figure 6). Therefore, attention should be paid to the subtropics, where the absolute P LRT biases in the reanalyses are often more than 10 hPa (Figure 3). The misidentification rate is lower for a vertical profile at 1-km than at 0.5-km resolution. Of all radiosonde samples, those with a misidentified tropopause layer account for ~30% at 1-km resolution and ~50% at 0.5-km resolution (Figure 6).
We assess TCO biases that are attributable to P LRT biases in the reanalyses based on ozone profiles at ozonesonde stations and in TOST. After matching ozonesonde and radiosonde data with the reanalyses at their original horizontal resolutions (2.5° × 2.5° for NCEP/NCAR and 0.625° × 0.5° for MERRA-2 in longitude and latitude), we found ~6400 and ~4700 monthly ozone profiles for NCEP/NCAR and MERRA-2, respectively. As expected, a positive (negative) bias in P LRT leads to a negative (positive) bias in TCO. In absolute magnitude, the induced TCO bias is ~0.2, ~0.8 and ~1.2 DU if the P LRT bias is within 0–5, 5–10, and 10–15 hPa, respectively (Figure 7). Most of the P LRT biases in the reanalyses are less than 15 hPa in absolute magnitude. As TOST ozone data cover the globe, the assessment using TOST provides a detailed description of how the induced TCO biases vary with latitude and longitude. The global distribution of the induced TCO biases (Figure 8) is similar to that of P LRT biases in the reanalyses (Figure 2). The TCO biases vary much more with latitude than with longitude. Negative biases in TCO prevail in the tropics and high latitudes of the NH, with zonal-mean biases less than 1 DU in absolute magnitude. However, positive TCO biases appear in the subtropics and peak around 30°–40° in each hemisphere with values of ~1.5–3.5 DU. For NCEP/NCAR, positive TCO biases also reach ~3.5 DU around 70°–80° S, because extreme P LRT biases (about −20 hPa) appear there. Globally, the mean bias in TCO is estimated to be positive, at about 0.64 DU (or 2.2%) and 0.28 DU (or 1.1%) if NCEP/NCAR and MERRA-2 P LRT products, respectively, are used in deriving TCO.
This study provides a comprehensive assessment of P LRT biases in the NCEP/NCAR and MERRA-2 reanalyses, as well as the related biases in TCO estimates. The P LRT biases could induce biases in other applications, such as estimates of STE mass fluxes, especially near the subtropics where P LRT biases are large. This study has assessed the magnitude, sign, and spatial variations of TCO biases attributable to P LRT biases in the two reanalyses. Such assessments have wide implications for studies of tropospheric ozone burden, tropospheric ozone radiative forcing, derivation of TCO from satellite data, and data assimilation. In this study, we select long-term global radiosonde data as the basis for the P LRT bias assessment because of the high quality of the data for identifying P LRT. We employ cosine-weighted latitudinal averaging to minimize the impact of the spatial inhomogeneity of radiosonde data on our global and hemispheric assessments of P LRT biases. Although it is beyond the scope of this study, P LRT from GPS-RO satellite data could be used to enhance our understanding of the global variation in P LRT biases for recent decades. Furthermore, the drivers of biases in tropopause altitude in reanalyses could be explored further to improve reanalysis LRT products.

Acknowledgments:
The authors thank the agencies that provided reanalysis data used in this study: NCEP/NCAR from the NOAA/OAR/ESRL PSL, Boulder, Colorado, USA and MERRA-2 from the National Aeronautics and Space Administration (NASA) Global Modeling and Assimilation Office (GMAO). We also thank the National Oceanic and Atmospheric Administration (NOAA) National Centers for Environmental Information (NCEI) for providing IGRA radiosonde data. The global ozone sounding data were obtained from the World Ozone and Ultraviolet Radiation Data Centre (WOUDC) operated by Environment Canada, Toronto, Ontario, Canada, under the auspices of the World Meteorological Organization. We acknowledge the Max Planck Institute for Meteorology (MPI-M) for providing the CDO software (https://code.mpimet.mpg.de/projects/cdo/files (accessed on 14 October 2020)) and the many contributors to the CDO development. We would like to thank the anonymous reviewers for their valuable comments and suggestions.

Conflicts of Interest:
The authors declare no conflict of interest.