What Do Observational Datasets Say about Modeled Tropospheric Temperature Trends since 1979?

Updated tropical lower tropospheric temperature datasets covering the period 1979–2009 are presented and assessed for accuracy based upon recent publications and several analyses conducted here. We conclude that the lower tropospheric temperature (TLT) trend over these 31 years is +0.09 ± 0.03 °C decade⁻¹. Because the surface temperature (Tsfc) trends from three different groups agree extremely closely among themselves (~+0.12 °C decade⁻¹), the "scaling ratio" (SR, the ratio of the atmospheric trend to the surface trend, TLT/Tsfc) of the observations is ~0.8 ± 0.3. This is significantly different from the average SR calculated from the IPCC AR4 model simulations, which is ~1.4. This result indicates that the majority of AR4 simulations portray significantly greater warming in the troposphere relative to the surface than is found in observations. The SR, as an internal, normalized metric of model behavior, largely avoids the confounding influence of short-term fluctuations such as El Niño events, which make direct comparisons of trend magnitudes less certain, even over multi-decadal periods.

OPEN ACCESS Remote Sensing 2010, 2 2149


Introduction
The temperature of the tropical lower troposphere (TLT, 20°S–20°N) figures prominently in discussions of climate variability and change because it (a) represents a major geographic portion of the global atmosphere (about one third) and (b) responds significantly to various forcings. For example, when the El Niño–Southern Oscillation mode is active, TLT displays a highly coupled response, though delayed by a few months, with a general warming of the tropical troposphere during El Niño events [1]. TLT also responds readily to solar scattering anomalies when substantial volcanic aerosols shade the Earth following major eruptions such as El Chichón (1982) and Mt. Pinatubo (1991). In terms of climate change due to increasing greenhouse gases from (primarily) energy production, climate models project a prominent tropospheric warming that is, on average, twice as large near 300–200 hPa as the change projected for the surface [2].
Variations and trends of TLT have been discussed extensively in the literature and in government reports (e.g., [3]). Our contribution here is motivated by the need to update this information because of (a) the availability of recent data (most datasets are now updated through 2009), (b) adjustments to some of the datasets that change the conclusions of earlier work, and (c) the now-available body of evidence on problems in the observational datasets, which we shall apply.
The magnitude of the recent-decade trend in TLT has become controversial because of differing views on precisely what its value is and how it should be used in evaluating climate models. The basic issue is whether the relationship between the observed temperature trend of TLT and the observed temperature trend of the surface (Tsfc) is faithfully reproduced in climate model simulations. These simulations indicate that a clear fingerprint of the greenhouse gas response in the climate system to date is that the trend of TLT should be greater than that of Tsfc, by a factor of 1.4 on average (see below).
There have been essentially two groups of publications on this contentious issue: one reporting that TLT trends in observations and models are statistically not inconsistent with each other (e.g., [4,5]), and the other reporting that model representations are significantly different from observations, thus pointing to potential fundamental problems with models (e.g., [2,6–10]). With the new information noted above, we look again at this controversy, which primarily centers on the accepted magnitude of the TLT trend. Thus, in the "Assessment of Products" section we shall examine information that allows us to reduce the uncertainties in the tropospheric trend.
Though we shall examine the temperature trends of the various observational datasets to arrive at trend values, we note here that we shall ultimately study the ratio of the TLT trend to the Tsfc trend. This quantity has no formal name, but is often called a "scaling ratio" (SR). The SR will be a key metric in this study because it is normalized. No climate model is able to simulate the exact progression of the major natural fluctuations in the climate system, such as El Niño events, which may strongly affect the secular trend over 20- to 30-year periods; the SR is therefore able to reveal a more fundamental characteristic of how TLT and Tsfc are related.
To keep the paper focused, we shall consider the bulk-layer temperature represented by a view-angle combination of satellite microwave brightness temperatures, which represents the layer from the surface to about 300 hPa with peak emission originating near 730 hPa (TLT). A deeper layer, commonly called the mid-tropospheric temperature (TMT), is also available; it represents the surface to about 70 hPa and peaks at about 370 hPa, and we shall refer to it only briefly near the end. Because TMT encompasses the upper troposphere and part of the lower stratosphere, and because adjustments to the various datasets diverge considerably above 200 hPa, we shall avoid the long and somewhat contentious discussions about those problems [11]. More importantly, a recent paper provides a detailed analysis of TMT that supplies supporting information for the present study [9].
The time period for this study is the 31 years 1979–2009, though a few datasets have not yet been updated. This is an especially important period because the IPCC AR4 [12] indicated that most of the warming of the past half-century was "very likely" due to additional greenhouse gas forcing. However, there was no rise in Tsfc from 1950 to 1979, so the aforementioned warming is concentrated in the period since 1979, which we study here.
We shall describe the data products examined, intercompare the time series and trends and, of most interest, assess the products for accuracy. This assessment leads to our conclusion on the likely magnitude and error range of the TLT trend. These results are then used in the final section, in which we compare them with model expectations, specifically through the SR metric [12].

Data
The data products used in this study are listed in Table 1 with appropriate references, are identified by commonly used abbreviations, and are briefly described below; for detailed information we refer the reader to the listed publications. The longest time series of TLT are derived from balloon-borne instrument packages that transmit temperature readings back to ground stations as they rise. Over the years, new techniques and improved instrumentation have been adopted in these devices, so corrections are necessary to produce time series that are as homogeneous as possible. Adjustments for HadAT and RATPAC are derived solely from the metadata and internal radiosonde intercomparisons. In analyzing the individual time series, it is apparent that HadAT used a more conservative approach to detecting spurious shifts caused by instrument changes, in that fewer breakpoints passed the thresholds before adjustments were applied [13].
Breakpoints in the radiosonde time series for RAOBCORE and RICH are taken from shifts in radiosonde temperatures as determined in the production of the operational ECMWF global analysis. RAOBCORE goes further in its dependence on the ECMWF process and computes the adjustments from the ERA-40 Reanalyses as well, whereas in RICH the magnitudes of the breakpoint adjustments are determined from nearest-neighbor radiosonde comparisons. HadAT, RAOBCORE and RICH use essentially all available stations (about 60 in the tropics), while RATPAC uses a selection of the best tropical stations (21 sites).
Significant portions of the tropics are not sampled by the radiosonde network, though this under-sampling is partially overcome by the high spatial coherence of TLT, which has an estimated eight spatial degrees of freedom for seasonal anomalies [14]. Pressure-level temperature data from the radiosondes are convolved with the microwave sounding unit weighting function to produce a simulated TLT for direct comparison with satellite data.
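The convolution step can be pictured as a weighted vertical average of level values. The sketch below is a minimal illustration only, assuming static per-level weights; the function name, `LEVELS`, `WEIGHTS` and the level trends are hypothetical placeholders, not the actual MSU TLT weighting function, which comes from radiative transfer calculations and is not reproduced here.

```python
def simulated_layer_temperature(temps, weights):
    """Collapse level values (anomalies or trends) into one layer value.

    temps and weights are aligned with the same pressure levels; the weights
    are normalized so that they sum to one.
    """
    if len(temps) != len(weights):
        raise ValueError("temps and weights must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * t for w, t in zip(weights, temps)) / total


# Hypothetical levels (hPa) and weights peaking in the lower troposphere.
LEVELS = [1000, 850, 700, 500, 400, 300]
WEIGHTS = [0.15, 0.25, 0.28, 0.20, 0.08, 0.04]

# Hypothetical radiosonde trends (deg C per decade) at those levels.
level_trends = [0.12, 0.11, 0.10, 0.12, 0.14, 0.16]
layer_trend = simulated_layer_temperature(level_trends, WEIGHTS)
```

Because the weights are normalized, a profile with the same value at every level returns that value unchanged; only the vertical structure of the profile changes the result.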

Satellite: UAH, RSS
The University of Alabama in Huntsville (UAH) and Remote Sensing Systems (RSS) produce microwave brightness temperatures from the radiometers onboard polar-orbiting NOAA and NASA satellites. The construction of the products differs in (a) how intersatellite biases are determined and removed, (b) how peculiarities in individual calibration issues are resolved, (c) which specific spacecraft are used, and (d) how the effect of spacecraft drift through the Earth's diurnal cycle is handled.

Thermal Wind: AS08, C10
A recent publication [15] (whose dataset we denote AS08) and this paper (C10) follow the original work of [16] in developing indirect calculations of temperature from trends in the vertical differential of zonal wind acceleration measured by radiosondes, i.e., by applying the thermal wind equation (TWE). For example, if in the Northern Hemisphere the zonal wind at one level has accelerated over the 30 years faster than at the level below, this implies a temperature trend that increases toward the equator within that layer. Trends are computed for each layer and 5° latitude band, then convolved with the microwave weighting function to produce TLT trends (i.e., the method yields trends only, not time series). In this paper (C10), we calculated TWE estimates from two different datasets and with three spatial averaging methods, for a total of six calculations. The two datasets were (1) NCDC's monthly zonal mean wind values by pressure level from the monthly station summaries of the Integrated Global Radiosonde Archive (IGRA) datafiles (NOAA/NCDC) [17], and (2) our own analysis, also referred to as "UAH", of individual soundings to create monthly average zonal wind at each pressure level. In all six C10 cases, TWE temperature trend estimates were obtained for each pressure layer and latitude band separately for 00 UTC and 12 UTC and then averaged. The three spatial averaging methods for determining the zonal mean acceleration were (1) taking the median acceleration of all reports in a 5° latitude band, (2) placing the stations in 5° lat × 10° lon grid boxes, taking the grid box averages, and then averaging the available grid boxes into zonal means, and (3) as in (2) but using 30° lon grid boxes. All calculations here began the integration with initial values at 62.5°N (except in one example) and then integrated southward. The value used for C10 is the mean of the six calculations. HadCRUT was used as the surface dataset for the computation of TLT quantities when convolving the profiles with the satellite weighting functions.
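The TWE step can be illustrated with the thermal wind relation in pressure coordinates applied to trends rather than to the fields themselves, $f\,\partial \dot{u}/\partial \ln p = R\,\partial \dot{T}/\partial y$, where $\dot{u}$ is the zonal-mean wind trend, $\dot{T}$ the layer temperature trend, $f$ the Coriolis parameter, $R$ the gas constant for dry air and $y$ the northward distance. The Python sketch below is our own simplified discretization, not the AS08 or C10 code: it treats a single layer, integrates southward from a chosen starting latitude with the trapezoidal rule, and all band centers and wind trends are hypothetical. Because $f$ vanishes at the equator, this simple form is shown only for mid-latitude bands.

```python
import math

R_DRY = 287.0           # gas constant for dry air (J kg^-1 K^-1)
OMEGA = 7.292e-5        # Earth's rotation rate (s^-1)
EARTH_RADIUS = 6.371e6  # mean Earth radius (m)


def coriolis(lat_deg):
    """Coriolis parameter f = 2 * Omega * sin(latitude)."""
    return 2.0 * OMEGA * math.sin(math.radians(lat_deg))


def trend_gradient(lat_deg, du_lower, du_upper, p_lower, p_upper):
    """Northward gradient of the layer temperature trend (K decade^-1 per m).

    du_lower, du_upper: zonal-wind trends (m s^-1 decade^-1) at the bottom and
    top of the layer, with p_lower > p_upper (hPa).
    """
    shear_trend = (du_lower - du_upper) / math.log(p_lower / p_upper)
    return coriolis(lat_deg) * shear_trend / R_DRY


def integrate_southward(lats, du_lower, du_upper, p_lower, p_upper, t_start=0.0):
    """Layer temperature trend at each latitude in `lats` (ordered north to south)."""
    trends = [t_start]
    for i in range(1, len(lats)):
        dy = math.radians(lats[i - 1] - lats[i]) * EARTH_RADIUS  # metres moved south
        g_north = trend_gradient(lats[i - 1], du_lower[i - 1], du_upper[i - 1], p_lower, p_upper)
        g_south = trend_gradient(lats[i], du_lower[i], du_upper[i], p_lower, p_upper)
        # Moving south means y decreases, so the trend changes by -gradient * dy.
        trends.append(trends[-1] - 0.5 * (g_north + g_south) * dy)
    return trends


# Hypothetical 5-degree band centers and wind trends for a 500-300 hPa layer.
LATS = [62.5, 57.5, 52.5, 47.5, 42.5, 37.5, 32.5]
DU_500 = [0.0] * len(LATS)  # m s^-1 decade^-1 at 500 hPa (hypothetical)
DU_300 = [0.1] * len(LATS)  # m s^-1 decade^-1 at 300 hPa (hypothetical)
trends = integrate_southward(LATS, DU_500, DU_300, 500.0, 300.0)
```

With these made-up inputs (300 hPa accelerating 0.1 m s⁻¹ decade⁻¹ faster than 500 hPa in every band), the trend grows to roughly a quarter of a degree per decade by 32.5°N, which shows how small but persistent shear trends accumulate during the southward integration.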

Surface: ERSST, HadCRUT, GISS
Finally, we display three surface temperature datasets well known to the climate community. Over the ocean, Tsfc represents the sea water near the surface (not the air), and over land the temperature is measured in various ways to represent the near-surface air about 1.5 m above the ground. The three datasets are extremely close in trend magnitude, +0.122, +0.119 and +0.109 °C decade⁻¹ for ERSST, HadCRUT and GISS respectively, which is not surprising since, as reported in [18], the best available estimate is that 90–95% of the raw data used in each of the analyses are the same (P. Jones, personal communication, 2003).

Results
At this point we introduce the time series. In Figure 1 we show the monthly anomalies of the average of UAH, RSS, HadAT, RAOBCORE and RICH for TLT. In Figure 2 we show the differences between each dataset and the overall average of TLT seasonal anomalies; this average now includes RATPAC, which is available only as seasonal anomalies (because of the heavy common heritage of RICH and RAOBCORE, the average of the two was used as a single time series in the overall average). The standard deviation of these seasonal differences from the mean ranges from 0.050 °C to 0.081 °C. UAH has the smallest standard deviation of "errors", with progressively larger standard deviations for HadAT, RSS, RAOBCORE, RICH and RATPAC, respectively. RATPAC has the largest differences, likely because of its sparser geographic sampling with only 21 stations, a point referred to later. In Figure 3 [...]

The four TLT radiosonde datasets are constructed in very different ways, so their agreement to within ±0.02 °C decade⁻¹ of their own mean value (+0.10 °C decade⁻¹) gives some confidence, at least in their average result. One can argue that systematic biases still persist in the same way in all datasets, but for TLT these are likely small in the aggregate [6].
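The Figure 2 comparison can be sketched as follows: average the RICH and RAOBCORE group into one member so it is not double-counted, form the all-dataset mean each season, and take the standard deviation of each dataset's departures from that mean. Function names and dataset keys are hypothetical, the series are assumed to be aligned on common seasons with no gaps, and the use of the sample standard deviation is an assumption.

```python
import statistics


def ensemble_mean(series, merge, merged_name):
    """Season-by-season mean over datasets, with the `merge` group first
    averaged into a single member so that it is not double-counted."""
    members = {k: v for k, v in series.items() if k not in merge}
    members[merged_name] = [sum(vals) / len(vals) for vals in zip(*(series[m] for m in merge))]
    return [sum(vals) / len(vals) for vals in zip(*members.values())]


def departure_sd(series, mean):
    """Sample standard deviation of each dataset's departures from the ensemble mean."""
    return {k: statistics.stdev([x - m for x, m in zip(v, mean)]) for k, v in series.items()}
```

Individual members of the merged group (here RICH and RAOBCORE) can still be passed to `departure_sd` separately, which matches how each dataset appears on its own in Figure 2.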
A problematic issue affects RAOBCORE and RICH and is related to a warming shift in 1991 in the upper troposphere of the ERA-40 Reanalyses, on which both datasets rely. This shift was shown in [19] to be likely spurious, caused by mishandling of a change in an infrared channel, a diagnosis acknowledged by ECMWF (see also ECMWF Newsletter No. 119, Spring 2009). The shift also led to sudden and spurious increases in estimates of (a) tropical rainfall, (b) 200 hPa divergence and (c) low-level humidity. Since this has a direct influence on RAOBCORE and, to a lesser extent, RICH, we would expect these products to display warmer-than-actual trends, especially for TMT (as shown in [9]). Even though the spurious warm shift occurs in the upper troposphere and lower stratosphere, the TLT trends of RAOBCORE and RICH will still be slightly affected by a spurious warming tendency of about 0.01 to 0.02 °C decade⁻¹, though our final range will encompass both of these datasets without adjustment. The latest ERA-I Reanalyses, mentioned later, are corrected for this problem, and the discontinuity has largely disappeared [20].

Satellite
The main difference in TLT trends between UAH and RSS occurs around 1992, when RSS drifts to warmer temperatures over a two-year period by about 0.08 °C, although the estimated shift varies from 0.07 to 0.13 °C depending on the specific time periods and the particular region chosen to calculate it (see below). Since this shift is near the center of the time series, it has a noticeable effect on the trend, accounting for at least +0.04 °C decade⁻¹ of the difference in trends between UAH and RSS. It is of interest that UAH and RSS have almost identical trends over the extratropics [21].
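The size of this effect can be checked with a little algebra. For an idealized step of size $s$ at the midpoint of a record of length $L$, the least-squares slope is $\mathrm{cov}(t,\text{step})/\mathrm{var}(t) = (sL/8)/(L^2/12) = 1.5\,s/L$; with $s = 0.08$ °C and $L = 31$ years this is about 0.039 °C decade⁻¹, the same size as the +0.04 °C decade⁻¹ quoted above. The sketch below verifies this numerically on a monthly series; treating the gradual two-year drift as an instantaneous midpoint step is our simplifying assumption, and the function names are hypothetical.

```python
def ols_slope(y):
    """Least-squares slope of y against its index (units of y per time step)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxy = sum((i - t_mean) * (v - y_mean) for i, v in enumerate(y))
    sxx = sum((i - t_mean) ** 2 for i in range(n))
    return sxy / sxx


def step_trend_per_decade(shift, n_months=372):
    """Trend per decade produced by a step `shift` at the midpoint of a monthly series.

    The default of 372 months corresponds to 1979-2009.
    """
    half = n_months // 2
    y = [0.0] * half + [shift] * (n_months - half)
    return ols_slope(y) * 120  # 120 months per decade
```

Calling `step_trend_per_decade(0.11)` for the median shift reported below gives roughly 0.053 °C decade⁻¹.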
The shift or drift in RSS temperatures has been documented relative to (a) U.S. NWS radiosonde stations that maintained VIZ instruments, (b) Australian radiosondes, (c) tropical radiosondes, (d) surface datasets and (e) ERA-I reanalyses [6,9,20,22,23]. In addition, this RSS drift appears as an unphysical event when compared with vertical ratios of other channels [22,24]. To demonstrate the drift, we converted each series into seasonal means (to accommodate RATPAC), then calculated, for each series, the eight-season (2-year) moving averages minus the 2-year average commencing 4 years before the first month of the original 2-year average (i.e., differences of 2-year averages with a 2-year gap in between). We further subtracted from each of these time series the corresponding values, season by season, calculated for RSS. This gave nine time series (six upper-air and three surface datasets) relative to RSS. The maximum value of the average of these nine time series occurred for the February 1993–January 1995 average minus the February 1989–January 1991 average (we could not begin before January 1989 because ERA-I begins at that time).
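The "2-year averages with a 2-year gap" diagnostic can be sketched on seasonal data as follows: each window is 8 seasons long, and the earlier window starts 16 seasons (4 years) before the later one, leaving 8 seasons between them. Function names are hypothetical, and the inputs are assumed to be equal-length seasonal anomaly lists on a common time axis.

```python
def mean(xs):
    return sum(xs) / len(xs)


def gap_difference(seasonal, start, window=8, lag=16):
    """Mean of the window beginning at `start` minus the window that begins
    `lag` seasons earlier (a 2-year gap separates them by default)."""
    if start - lag < 0 or start + window > len(seasonal):
        raise ValueError("window falls outside the series")
    later = seasonal[start:start + window]
    earlier = seasonal[start - lag:start - lag + window]
    return mean(later) - mean(earlier)


def gap_differences(seasonal, window=8, lag=16):
    """Gap differences for every start season where both windows fit."""
    return {s: gap_difference(seasonal, s, window, lag)
            for s in range(lag, len(seasonal) - window + 1)}


def relative_to_reference(series, reference, window=8, lag=16):
    """Season-by-season gap differences of `series` minus those of `reference` (e.g. RSS)."""
    ref = gap_differences(reference, window, lag)
    other = gap_differences(series, window, lag)
    return {s: other[s] - ref[s] for s in ref}
```

Averaging the `relative_to_reference` outputs across the nine datasets, season by season, and scanning for the extreme value would identify the window pair where the datasets depart most from RSS.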
Figure 4 displays the values of the 2-year average temperature differences, as defined above, for all datasets. The differences between RSS and the other datasets are significant, except for RATPAC, with its larger noise relative to RSS, and for ERSST and GISS, which are both marginally significant at the 90% level. Though the impression from the plot is that the differences are not very large, the Student's t statistic for the difference between UAH and RSS, for example, is greater than 6. The median difference between RSS and the other datasets is +0.11 °C. Similarly, previous studies calculated the magnitude of the RSS shift to be +0.136 °C [25], +0.14 °C [22], +0.10 ± 0.03 °C [6] and +0.12 °C [20]. The evidence is therefore substantial that the RSS tropical trend in TLT is too positive by at least +0.04 °C decade⁻¹ specifically because of the slow shift around 1992. The likely explanation for at least part of this difference is that the model-based diurnal corrections applied to the satellite data over land, which account for satellites drifting away from a fixed crossing time, are too large. Additionally, in direct station-by-station comparisons with independent sets of radiosondes, UAH data show lower error characteristics than RSS [6,9,22,23].
A smaller difference between RSS and UAH arises in 1998–2004, when RSS drifts to warmer temperatures relative to UAH, but then begins drifting back toward UAH values by 2008. This creates a relative warm bump in the difference time series between the two and magnifies UAH's overall cooler trend relative to RSS (which already includes the shift around 1992). A comparison of RSS with ERA-I reveals the same differences: the aforementioned RSS warm shift after 1992, and an RSS warming drift until about 2004 followed by a reversal. There is little difference between ERA-I and UAH. For the ERA-I period (1989–2009), TLT trends are +0.16, +0.13 and +0.20 °C decade⁻¹ for ERA-I, UAH and RSS, respectively. This "bump" is consistent with the hypothesis that the RSS diurnal correction is too strong, as warming is added for NOAA-14 (operating through 2004) and cooling is applied to NOAA-15 (through 2009). (As noted in [9], ERA-I contains a warming shift in the upper troposphere after about 2003 due to the introduction of aircraft temperatures, which warmed this layer relative to the reanalysis model's cold bias; this contributes a slight, artificial warming to TLT in ERA-I.) Since 2002, UAH has used the microwave radiometer on AQUA, a NASA satellite with on-board propulsion that keeps the spacecraft on a consistent crossing schedule, i.e., with no drift through the diurnal cycle. Thus the comparisons after 2001 between RSS (using drifting spacecraft) and UAH (using a non-drifting spacecraft), as well as with ERA-I (a third-generation reanalysis), provide evidence suggesting that RSS's differences from the other products arise because its diurnal corrections are too large in the tropics.
Finally, a common satellite feature appears in multiple, independent radiosonde comparisons as a slight warming drift relative to the radiosondes and ERA-I during 1993–1996. This occurred during the life of NOAA-12, which developed calibration problems during its service that may have introduced a small but pervasive warming into all derived products. The impact is small, about 0.01 to 0.02 °C decade⁻¹, if real [22,23]. Though this evidence from NOAA-12 indicates that UAH (and RSS) contain spuriously warm trends, we shall nevertheless assume that our central value of the observed tropospheric trend will be more positive than UAH, based on the more general comparisons with individual families of radiosondes (e.g., [22]). At this point we mention some of the results of [9], which included updated comparisons of individual radiosonde stations in the U.S. VIZ network and the Australian network against UAH v5.3 and RSS v3.2 TLT (and TMT). The results indicate, again, that this latest version of RSS continues to show a shift to warmer temperatures after 1992 of about +0.1 °C that is not found in the other datasets. Additionally, the basic magnitudes of the various error statistics continue to be larger in RSS than in UAH.

Thermal Wind
The temperature trends derived from the thermal wind equation (TWE) (AS08 and C10) are indirect estimates, and their magnitudes are significantly higher than those of the other products, which measure temperature directly. The analysis in [15] recognized the uncertainties of the AS08 wind products, including their limited geographic coverage in the tropics, and assigned quite large error bars (up to ±0.47 °C decade⁻¹ per layer). Analyzing wind data that are very sparse at some latitudes and then applying the TWE is a sensitive derivation, which creates these uncertainties. Indeed, the percentages of 5° lat × 10° lon boxes in each 5° latitude band that contain at least one useful radiosonde time series at 200 hPa, from 20°S to 40°N, are only 14, 8, 8, 6, 17, 19, 22, 22, 31, 39, 33 and 56%, respectively. Vast areas of the tropics and subtropics are not measured at all in this calculation, and it is possible that decadal-scale circulation changes provided a compensating influence in the unmonitored areas. There are also fewer stations that meet the requirements for wind data than for temperature data.
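Coverage percentages like those above can be computed by binning station locations into longitude boxes within each latitude band. A minimal sketch, assuming a hypothetical list of (latitude, longitude) pairs for stations that have already passed whatever "useful time series" screening is applied; that screening is not reproduced here, and the function and variable names are hypothetical.

```python
import math


def band_coverage(stations, lat_south, lat_north, box_lon=10.0):
    """Percentage of longitude boxes in [lat_south, lat_north) holding >= 1 station.

    stations: iterable of (lat, lon) pairs in degrees, with lon in [-180, 180).
    box_lon: box width in longitude (10 degrees for 5 x 10 degree boxes).
    """
    n_boxes = int(round(360.0 / box_lon))
    occupied = set()
    for lat, lon in stations:
        if lat_south <= lat < lat_north:
            # Index of the longitude box, counted eastward from 180 W.
            occupied.add(int(math.floor((lon + 180.0) / box_lon)) % n_boxes)
    return 100.0 * len(occupied) / n_boxes


# Hypothetical station list (lat, lon); not the actual radiosonde network.
STATIONS = [(2.0, 0.5), (3.0, 5.0), (1.0, 100.0), (-3.0, 0.0)]
coverage_0_5n = band_coverage(STATIONS, 0.0, 5.0)
```

Looping `lat_south` over -20, -15, ..., 35 would give one percentage per 5° band from 20°S to 40°N, the same twelve bands listed above.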
Another likely reason for such high calculated trends is related to changes in how often winds were recorded on the windiest days. As radiosonde systems became more sophisticated over time (e.g., onboard GPS), radiosonde data were received with higher success rates in more recent years. In earlier times, operators often lost contact with radiosondes on windier days as the balloons drifted too far away (the limit was generally about 8° above the horizon), and the data were not received (a fact attested to by the personal experience of some of the authors). It is likely that data for the windiest days early in the record were lost, so that average wind speeds at the highest levels were artificially depressed in those years. Since on the windiest days the wind increases with altitude, this would affect the vertical differential of the acceleration.
For the stations used here in C10 (very similar to those of AS08), the positive values of the acceleration differential above 300 hPa and poleward of 30°N are those that rapidly accumulate to lift the trend to a high value, which then increases less rapidly toward the equator. Between 30°N and 50°N, the average number of station observations above 300 hPa has increased by between 2.3 and 3.6 per month since 1979. It is at these higher latitudes that the most common reason for missing observations in the early years appears to be high winds near the jet-stream level. For example, missing one sounding per month in the early years, in which the wind speed difference from one level to the next was only 5 m s⁻¹ (where mean level differences are already about 5 m s⁻¹ for 30°–50°N), leads to apparent differential accelerations of the same order as the currently calculated values that produce the TWE temperature trends displayed. Thus, it is likely that trends in wind speed are biased by the lack of days with the highest winds (and thus the associated vertical differentials) in the early part of the record, at elevations where winds are already highest, and this could explain the apparently large accelerations over time. There are many other issues (some are noted below) that raise concern about the wind as a systematic metric, but these are the major ones.
Additional evidence supporting the claim that computing temperature trends from wind accelerations (derived from radiosondes alone) leads to significant errors is found in the globally complete National Centers for Environmental Prediction (NCEP) reanalysis. In Figure 5 we display the vertical cross-section of 1979–2005 accelerations of the zonal mean wind (m s⁻¹ decade⁻¹). To generate temperature trends of 0.3 °C decade⁻¹ for TLT in the tropics, as calculated by AS08 [15] and C10, accelerations above 300 hPa in the mid-latitudes (30°N to 50°N, white box) would need to exceed 0.3 m s⁻¹ decade⁻¹ (orange or red in Figure 5). As can be seen, the NCEP reanalysis accelerations in this region are less than 0.1 m s⁻¹ decade⁻¹ and thus do not support the result obtained from the radiosondes alone, but do support the conclusions indicated previously.
To understand the thermal wind method more fully, we display the station data that ultimately underpin the calculation of the temperature trends. In Figure 6 we show the vertical differential of the relative acceleration for individual stations by latitude band, together with their median value, for the 500–300 hPa layer. In these figures it is important to note that the error bars are statistical only, i.e., twice the standard error of the mean (the 95% level) of the values in each 5° latitude band, which assumes that (a) all values are independent, (b) all are without error, and (c) they are geographically located so as to give error-free sampling of the entire latitude band. These assumptions are obviously not true, so the error bars displayed underestimate the uncertainty by not including these measurement errors (see later). The temperature trend is essentially produced by integrating the anomalies in Figure 6 from right to left to give Figure 7, so positive values sum to ever-increasing temperature trends toward the equator. In Figure 7 we show the various pressure-layer temperature trends along with the average trend of the four radiosonde datasets used here. Except for some of the C10 median-averaged values, nearly all of the TWE estimates show considerably more warming than the direct observations. In general, the C10 UAH trend values are less positive in the troposphere but rise rapidly above 300 hPa, and are then less positive in the highest layer. The C10 IGRA values tend to be uniformly more positive in the lower troposphere but, in contrast, become less positive above 300 hPa. The AS08 results often lie in the middle of the spread of C10 values. When convolved with the satellite weighting function, the C10 TLT trends for 1979–2005 range from +0.05 to +0.41 °C decade⁻¹. Considering the spread of values at each layer and the spread of the layer integrations producing TLT, we conclude that parametric uncertainty alone (i.e., in terms of how the basic dataset is constructed and spatially averaged) is significant, being near ±0.2 °C decade⁻¹. Other factors, including (a) poor spatial sampling, (b) inconsistent measurements over time and (c) the value of the initial condition for integration, add to this uncertainty. We estimate the 95% error range for all factors to be at least ±0.25 °C decade⁻¹ for the satellite-simulated products from the TWE. This value is very close to the ±0.29 °C decade⁻¹ estimated by [15].
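The statistical-only error bars described for Figure 6 (twice the standard error of the mean within a 5° band) correspond to a per-band summary like the one below. The function name is hypothetical, and the text does not state whether the sample or population standard deviation was used, so the sample form here is an assumption.

```python
import math
import statistics


def band_summary(values):
    """Median, mean and 2 x standard error of the mean for one latitude band.

    The half-width treats the station values as independent, error-free and
    perfectly representative of the band, so it understates the real
    uncertainty in the same way as the statistical-only error bars.
    """
    if len(values) < 2:
        raise ValueError("need at least two station values in the band")
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return statistics.median(values), statistics.mean(values), 2.0 * sem
```

A band with few stations gets a wide half-width automatically, but a band whose few stations all sit in one region can still get a narrow one, which is exactly the sampling weakness noted in the text.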
A comparison of latitudinal trends for the full TLT layer is shown in Figure 8 and reveals large inconsistencies. Here we also show the latitudinal trends of UAH and RSS TLT. As indicated above, the TWE values are integrated from a common trend value at 62.5°N (as in AS08), and southward from that point there is considerable dispersion. A very different picture emerges if the initial latitude is 32.5°N (Figure 9). The tropical values are now much more clustered, giving clear evidence that variations in the higher-latitude radiosonde selection methods are a large source of the dispersion. Whereas UAH TLT was at the low end of the magnitudes in Figure 8, it is now in the center of the spread in Figure 9. Indeed, these results support the hypothesis that variations over time in the measurement of zonal wind speed in regions of high winds (i.e., loss of radiosondes early in the period for 35°N–60°N), and how one constructs the datasets, are plausible causes of the dispersion in Figures 8 and 9.

Given these comparisons, and weighing the evidence in several other publications regarding system intercomparisons (e.g., [6,9,21]), we conclude that trends calculated from the TWE, as applied for AS08 and here (C10), with the current radiosonde coverage and observational limitations (consistency, accuracy, etc.), are not reliable enough for studies such as ours. In particular, the AS08 and C10 TLT trends of +0.29 and +0.28 °C decade⁻¹ are almost three times the mean of the directly measured systems, and are, in our view, simply not consistent with the countervailing, directly measured evidence.

Determining the "Best Guess"
Applying the conclusions from the assessments given above, we estimate the actual tropical TLT trend for 1979–2009 as +0.09 ± 0.03 °C decade⁻¹, where the ±0.03 range defines the measurement error only (see below for statistical error). In other words, the actual tropical troposphere experienced a specific trend for 1979–2009, and based on our analysis we estimate that true trend to lie between +0.06 and +0.12 °C decade⁻¹. We reach this conclusion by applying multiple lines of published and displayed evidence for spurious warming drifts in RSS and for the large uncertainties of the thermal wind calculations. The current calculated TLT trends of UAH, HadAT, RATPAC, RAOBCORE and RICH are consistent with this result. For the shorter period 1989–2009, intercomparisons indicate that ERA-I is also highly consistent with our estimated tropical TLT trend. Thus, based on the analyses above, we are confident that the actual trend of the Earth's tropical troposphere for the specific period 1979 to 2009 is greater than zero.
A different question can also be addressed: "Is this trend significantly different from zero in a statistical sense?" In other words, given the magnitude of interannual variance and the autocorrelation in this time series, how likely is it that another 31-year realization (with the same statistical properties) would have a trend greater than zero? Even a time series containing error-free values, i.e., with zero measurement error, would still carry statistical uncertainty in its trend. This type of error is sometimes referred to as "temporal statistical error" [22]. The statistical error range of our 31-year average time series is ±0.10 °C decade⁻¹, with a t score of 2.27 relative to a null hypothesis of zero slope, and 25 degrees of freedom. This implies only a 3% chance that the trend is non-positive in a statistical sense. For the individual time series, the probabilities of a non-positive slope range from 1% to 12%. Thus we may also conclude that the tropical tropospheric trend, statistically speaking, is at least marginally significantly positive.
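One common way to obtain a trend standard error and reduced degrees of freedom for an autocorrelated series is to shrink the sample size using the lag-1 autocorrelation $r_1$ of the regression residuals, $n_{\mathrm{eff}} = n(1-r_1)/(1+r_1)$, and then compute the standard error with $n_{\mathrm{eff}} - 2$ degrees of freedom. The text does not state how its 25 degrees of freedom were obtained, so this sketch shows an assumed approach for illustration only, applied to a synthetic series; the function name is hypothetical.

```python
import math
import random


def trend_with_ar1_adjustment(y, step_years=1.0):
    """OLS trend (per decade), adjusted standard error (per decade), t score, n_eff.

    The lag-1 autocorrelation r1 of the residuals shrinks the sample size to
    n_eff = n * (1 - r1) / (1 + r1) before the standard error is computed.
    """
    n = len(y)
    t = [i * step_years for i in range(n)]
    t_mean, y_mean = sum(t) / n, sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sxx
    intercept = y_mean - slope * t_mean
    resid = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    ss_resid = sum(e * e for e in resid)
    r1 = sum(resid[i] * resid[i + 1] for i in range(n - 1)) / ss_resid
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    if n_eff <= 2:
        raise ValueError("too few effective degrees of freedom")
    se = math.sqrt(ss_resid / (n_eff - 2) / sxx)  # per year
    return 10.0 * slope, 10.0 * se, slope / se, n_eff


# Synthetic example: 31 annual values, 0.09 deg C/decade trend plus white noise.
rng = random.Random(0)
series = [0.009 * year + rng.gauss(0.0, 0.1) for year in range(31)]
trend, se, t_score, n_eff = trend_with_ar1_adjustment(series)
```

Positive residual autocorrelation lowers `n_eff`, which widens the standard error and lowers the t score; with monthly data one would pass `step_years=1/12`.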

The Scaling Ratio
Directly comparing trend magnitudes between observations and model output over 20- to 30-year periods is confounded by the uncertainty created by short-term, relatively random interannual fluctuations. For example, El Niño and La Niña events can considerably affect a trend calculated on these time scales, yet coupled models will not simulate the exact sequence and magnitude of these events as found in observations. In this section we examine a metric that is much less affected by such variations and thus lends itself to more confident analysis. The scaling ratio (SR), as mentioned earlier, is simply the ratio of the trend magnitudes, TLT/Tsfc [6]. The SR is better described as an internal, normalized characteristic of model behavior, as it varies little over multi-decadal periods. The analysis in [26] examined a number of issues regarding the SR and cautioned that periods for which the surface trends are small (less than 0.05 °C decade⁻¹), or trends calculated over short periods, may produce unstable results.
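The ratio itself is trivial, but the caution from [26] about small surface trends is worth encoding. A minimal sketch; the function name is hypothetical and the 0.05 °C decade⁻¹ guard is taken from the text above. Dividing the central TLT estimate of +0.09 °C decade⁻¹ by a surface trend of about +0.12 °C decade⁻¹ gives 0.75, which rounds to the ~0.8 quoted in the abstract.

```python
def scaling_ratio(tlt_trend, sfc_trend, min_sfc_trend=0.05):
    """SR = TLT trend / Tsfc trend, both in deg C per decade.

    Refuses surface trends smaller in magnitude than min_sfc_trend, since a
    small denominator makes the ratio unstable.
    """
    if abs(sfc_trend) < min_sfc_trend:
        raise ValueError("surface trend too small for a stable scaling ratio")
    return tlt_trend / sfc_trend
```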
The study in [26] examined 49 IPCC AR4 model runs from 19 models (some were multiple runs of the same model) and calculated the SR median and 95% C.I. as 1.41 ± 0.21. For a single model run under different perturbed physics (HadCM3), the SR was 1.44 ± 0.06, suggesting that over multi-decadal periods the SR is a relatively stable characteristic of individual models, though not necessarily one whose magnitude is shared by the different models.
In this study we use the results from 21 IPCC AR4 models, all of which simulated a surface trend at or above +0.08 °C decade−1 (minimizing the instability caused by small denominators in the SR). Some of the 21 models were represented by multiple runs, which we averaged together to represent a single simulation for that model. With 21 model values of SR, we have a fairly large sample from which to estimate the variation created both by structural differences among the models and by their individual realizations of interannual variability. From our sample of 21 models (1979-1999) we determine the SR median and 95% C.I. as 1.38 ± 0.38. We refer to this error range as the "spread" of the SRs, as it encompasses essentially 95% of the results. We may then calculate the standard error of the mean and determine the 95% C.I. for the central value of the 21 models sampled here as 1.38 ± 0.08; we refer to this as the error range that defines our ability to calculate the "best estimate" of the central value of the models' SR. Thus the first error range, or "spread," describes the range of model SRs, while the second describes our confidence in the "best estimate" of the central value of a theoretical complete population of model SRs.
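The step from the ±0.38 "spread" to the ±0.08 "best estimate" is a standard-error-of-the-mean calculation. The arithmetic below is a sketch that treats the median as the central value and assumes both intervals use the same multiplier (e.g., 1.96 for a roughly normal sample); the text does not state exactly how the standard error was computed, and the function name is hypothetical.

```python
import math


def best_estimate_halfwidth(spread_halfwidth, n_models):
    """95% half-width for the central value, given the 95% spread of individual values.

    If spread = z * sigma and the C.I. of the centre = z * sigma / sqrt(n)
    with the same multiplier z, then z cancels and the result is spread / sqrt(n).
    """
    return spread_halfwidth / math.sqrt(n_models)


sigma = 0.38 / 1.96                          # implied standard deviation of model SRs, about 0.19
half = best_estimate_halfwidth(0.38, 21)     # roughly 0.083, i.e. the quoted +/- 0.08
```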
The method of [26] was to apply theoretical error tests to the observations (UAH, RSS, HadAT, RAOBCORE, RATPAC), assuming each dataset was equally likely to be accurate. Based on that assumption, [26] concluded that the SRs produced from these datasets carried a fairly large range of uncertainty (±0.95 for TMT/Tsfc). Three factors work to greatly reduce this error here: (1) an observational time series that is 31 years long rather than 21; (2) the use of TLT, which avoids the much wider range of error contributions from upper-tropospheric and lower-stratospheric uncertainties in the TMT datasets; and (3) published and displayed information that identifies specific errors in some of the datasets and quantifies errors for the others [6,9,20,22-24]. Thus, using information not available to [26], our results produce a narrower range of uncertainty in TLT trends and therefore a narrower range of uncertainty in the SR calculated from observations.
In Table 2 we present the SRs from the long-term datasets described here. Note that the model SRs are based on the TLT and Tsfc trends unique to each model and are not tied to the observed Tsfc trend. Based on the mean and range of the observed TLT trends, we calculate the observed SR as +0.49 to +1.10, or +0.80 ± 0.31. This range encompasses all SRs computed from any of the Tsfc datasets combined with any of the TLT trends except those from RSS and the TWE values; i.e., it is the range of the 15 SR values obtained from the 3 Tsfc and 5 TLT datasets. (Note: had we used the same 21-year period simulated by the AR4 models, 1979-1999, the observational SRs would have been smaller, at +0.58 ± 0.35.) With the exception of one SR case (RSS TLT) out of 18, none of the directly measured observational datasets is consistent with the "best estimate" of the IPCC AR4 [12] model mean. Based on our assumptions about the observational values, we conclude that the AR4 model-mean or "best estimate" of the SR (1.38 ± 0.08) is significantly different from the SRs determined from observations as described above. Note that the SRs from the thermal wind equation (TWE) calculations are significantly larger than the model values in all cases, which provides further evidence that the TWE trends contain large errors.
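The bookkeeping behind the 15-value range can be sketched as follows: every TLT trend is divided by every Tsfc trend, and the extremes are summarized as a center ± half-range. The function name is hypothetical, and the trend lists below are placeholders, not the Table 2 entries.

```python
from itertools import product


def sr_envelope(tlt_trends, tsfc_trends):
    """Return (count, min, max, centre, half-range) of all TLT/Tsfc ratios."""
    ratios = [tlt / sfc for tlt, sfc in product(tlt_trends, tsfc_trends)]
    lo, hi = min(ratios), max(ratios)
    return len(ratios), lo, hi, (lo + hi) / 2.0, (hi - lo) / 2.0


# Placeholder trends (degC/decade), NOT the Table 2 values.
count, lo, hi, centre, half = sr_envelope([0.06, 0.09, 0.12], [0.10, 0.12, 0.15])

# The quoted range +0.49 to +1.10 expressed as centre +/- half-range:
quoted_centre, quoted_half = (0.49 + 1.10) / 2.0, (1.10 - 0.49) / 2.0  # about 0.795 and 0.305
```

With 5 TLT and 3 Tsfc datasets the same function would return 15 ratios; the quoted endpoints convert to about 0.80 ± 0.31.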
In Figure 10 we summarize our results. Here we show (a) the individual model SRs relative to their surface trends for 1979-1999, (b) the error ranges of the "spread" (arrow) and "best estimate" (box) of the model SRs, and (c) the observationally calculated SR ranges (boxes, without RSS) from three studies: [6], [26] and this study. All realizations here are from the satellite era (i.e., beginning in 1979). The key feature to observe on this diagram is the vertical axis (SR); the horizontal axis merely indicates the respective Tsfc trends of the various datasets and simulations. Thus, agreement or disagreement is meaningful only along the vertical axis, not the horizontal axis.
The results of [6] are based on 26 years of data, for which the surface trend was just under +0.13 °C decade−1. The box for [26] (from their Figure 3) is estimated from multiple 21-year-period values of the SR for the satellite era (noticeably shorter than the 31 years of this study), and thus encompasses a wider range of surface temperature trends.
What is immediately apparent is that the individual model trends are in general much warmer than the observations [10], and that the boxes holding the observational values generally lie outside the range of model values. Also, as the model surface temperature trends increase, principally beyond +0.14 °C decade−1, the SR values deviate little from the 1.38 median, indicating that even among different models the SR is fairly stable. Indeed, the full range of model SRs occurs at lower Tsfc values not far removed from the observations. This is consistent with [26], who noted that as a model's Tsfc trend became small, its SR values diverged more. It is in this region of divergence that three model results overlap with the uppermost portion of the box from [26], and two of those also overlap with the extreme upper portion of the box from this study. Had these models been integrated for 31 years instead of 21, however, their SRs might have approached the median value.
The results above indicate that the central tendency of the observationally calculated tropical SR values from three studies is near 0.8, which is well outside the spread of model-calculated SRs for the satellite era. It is in this era that the model response to forcing from the accumulation of greenhouse gases should be most evident. Though not discussed here, we applied the same SR metric to TMT/Tsfc values in [9] and found model SRs averaging 1.4 while the observations average about 0.4, an even more highly significant difference. The conclusion from these results is that the relationship between surface and tropospheric trends differs between models and observations, with models, on average, warming the tropospheric layer faster than observations indicate.
[Note: if we collapse Figure 10 onto its left axis to make a histogram of model and observed SRs, we can apply Chebyshev's inequality, which does not require the distribution of observational results to be "normal." This much less restrictive test indicates that there is at most a 12% chance that the observed SR (without RSS) is greater than 1.20, even though none of the 15 samples reached this value. The model distribution, however, is indeed "normal," which implies a probability of only 2.5% that an individual model SR is less than 1.00. Thus the average model SR is significantly different from the observed SRs, although a small number of individual model SRs do overlap with some of the observed SRs. On the other hand, none of the model SRs overlap with the SRs from UAH, HadAT and RATPAC.]

We do not want to misrepresent the results of [26], who concluded that there was large uncertainty in the SR metric for both models and observations, to the extent that models and observations could not be shown to be significantly different. We believe we have narrowed those uncertainties in both models and observations by (a) using only the longer time series, up to 31 years of data, (b) focusing only on the satellite era, when surface trends are more distinctly positive, and (c) bringing to bear newly published information, as well as new analyses, to reduce the error ranges of the observations. If a fundamental error of one sign were found in the observational datasets used here (i.e., if all were too cool in trend), our calculation of the best estimate of the SR would need to be revisited. However, at this time, the evidence implies that in the satellite era the relationship between surface and tropospheric trends in the tropics is significantly different between observations and models. This result is consistent with that of [7,8], who compared global surface and tropospheric trends between observations and models and found significant differences over composites of both land and ocean.
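The two probabilities in the bracketed note rest on different tail estimates, sketched below. For the models, the sketch assumes a normal distribution whose 95% spread (±0.38 about 1.38) corresponds to ±1.96σ, which gives the 2.5% figure. For the observations, it shows the one-sided Chebyshev (Cantelli) bound P(X − μ ≥ kσ) ≤ 1/(1 + k²); the note does not state which form of the inequality or which σ was used, so `obs_sigma` is a hypothetical placeholder and the 12% value is not reproduced here.

```python
import math


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal distribution with mean mu and std. dev. sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def cantelli_upper_tail(threshold, mu, sigma):
    """Distribution-free upper bound on P(X >= threshold) (one-sided Chebyshev)."""
    k = (threshold - mu) / sigma
    if k <= 0:
        return 1.0  # the bound is uninformative at or below the mean
    return 1.0 / (1.0 + k * k)


# Models: 1.38 +/- 0.38 treated as +/- 1.96 sigma of a normal distribution.
model_sigma = 0.38 / 1.96
p_model_below_1 = normal_cdf(1.00, 1.38, model_sigma)  # about 0.025

# Observations: centre 0.80 with a hypothetical sigma, for illustration only.
obs_sigma = 0.2
p_obs_above_1p2 = cantelli_upper_tail(1.20, 0.80, obs_sigma)
```

The Cantelli bound makes no assumption about the shape of the observational distribution, which is why it gives a much looser (larger) probability than a normal fit would for the same σ.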

Conclusions
We have gathered several tropical tropospheric temperature datasets and analyzed their (up to) 31-year (1979-2009) trends to assess their accuracy, since there are some discrepancies among them. Using newly published information as well as some analyses performed here, we conclude that the 31-year tropical tropospheric trend of the layer represented by TLT is +0.09 ± 0.03 °C decade−1.
A key indicator of the response of tropical temperatures in models forced by enhanced greenhouse gases is that the ratio of the TLT trend to the surface (Tsfc) trend, i.e., the "scaling ratio" (SR), should be about 1.4. Using observed trend values, the observed SRs for TLT are significantly less than 1.4, at ~0.8 ± 0.3. This suggests that, on average, the models overdo the amplification of surface temperature trends, and that the observed atmosphere adjusts to heating processes without allowing (over decades) the troposphere to warm at a higher rate than the surface. An alternative explanation is that the reported Tsfc trends are spatially inaccurate and are actually less positive. A more positive surface temperature trend than reported here would, of course, make the disagreement with the models even more significant [7,8].
We believe this is an important result that allows model developers to examine convective and cloud parameterizations, feedback effects and other energy transfer processes, with the goal of building more effective models [20]. We are well aware that this issue will not be closed by our result. We fully expect others to become engaged and produce defensible estimates of trends that may or may not support our conclusions. We have begun here the process of examining the different datasets to assess trend accuracy, and thus to narrow the uncertainties in useful, characteristic metrics such as the scaling ratio, which avoid some of the pitfalls in model evaluation created by the randomness of natural interannual fluctuations.
Individual decadal trends (least-squares regression) for periods ending in each year from 2005 to 2009 are shown for TLT and Tsfc, for direct comparison of all products.
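A sketch of how such per-end-year trends can be computed is given below. It assumes annual-mean anomalies keyed by year and a start year of 1979 (the start year is taken from the rest of the section, not from the fragment above); the function names and the synthetic series are hypothetical.

```python
def ols_trend_per_decade(years, values):
    """Least-squares slope of values against years, converted to degC per decade."""
    n = len(years)
    x_mean = sum(years) / n
    y_mean = sum(values) / n
    sxx = sum((x - x_mean) ** 2 for x in years)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(years, values))
    return 10.0 * sxy / sxx


def trends_by_end_year(series, start=1979, end_years=range(2005, 2010)):
    """series: dict mapping year -> annual anomaly (degC). Missing years are skipped."""
    out = {}
    for end in end_years:
        yrs = [y for y in range(start, end + 1) if y in series]
        out[end] = ols_trend_per_decade(yrs, [series[y] for y in yrs])
    return out


# Synthetic series with a constant +0.10 degC/decade trend (hypothetical).
synthetic = {y: 0.01 * (y - 1979) for y in range(1979, 2010)}
result = trends_by_end_year(synthetic)
```

For real data the end-year trends differ from one another, which is why listing 2005 through 2009 side by side allows a fair comparison between products whose records end in different years.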

Figure 4. Difference of averages of two two-year periods, 1993-1995 minus 1989-1991, for the TLT datasets (7 left bars) and Tsfc (3 right bars). Note that HadAT indicates a difference of 0.00 °C.

Figure 6. The difference in acceleration measured at 300 and 500 hPa for individual radiosonde stations. Median (circle) and error bars are derived from the data points within each 5° latitude band (see text). "99UTC" is the average of 00 and 12 UTC.

Figure 7. Layer temperature trends for 1979-2005. Symbols: trends calculated by the thermal wind equation (see text). Line: trends calculated directly from radiosonde temperatures, averaged over the 4 radiosonde datasets.

Figure 8. Temperature trends of TLT from various TWE calculations, as well as direct satellite (UAH and RSS) measurements, tied to 62.5°N as the starting integration point.

Figure 10. Relationship of Tsfc to TLT. Filled squares: the 21 model realizations for 1979-1999. Circles (satellite) and diamonds (balloon): observational results reported here (see Table 2). Boxes: extent of SRs calculated from observations in this study and in [6,26].

Table 1. Listing of references for the various datasets used in this study. Acronyms are spelled out in the "Notes" column.

Table 2. Scaling ratios of TLT/Tsfc. The Tsfc trend value used as the denominator is identified by column, except in the last row, where the AR4 model SRs are computed from each model's own trends and are italicized to highlight this aspect. The "best estimate" is calculated from 21 AR4 model realizations, and S is the "spread." Both AS08 and C10 are relative to 1979-2005 trends for Tsfc*. Actual trends are given in brackets for each dataset (°C decade−1).