Weather Variability Induced Uncertainty of Contrail Radiative Forcing

: Persistent contrails and contrail cirrus are estimated to have a larger impact on climate than all CO 2 emissions from global aviation since the introduction of jet engines. However, the measure for this impact, the effective radiative forcing (ERF) or radiative forcing (RF), suffers from uncertainties that are much larger than those for CO 2 . Despite ongoing research, the so called level of scientiﬁc understanding has not improved since the 1999 IPCC Special Report on Aviation and the Global Atmosphere. In this paper, the role of weather variability as a major component of the uncertainty range of contrail cirrus RF is examined. Using 10 years of MOZAIC ﬂights and ERA-5 reanalysis data, we show that natural weather variability causes large variations in the instantaneous radiative forcing (iRF) of persistent contrails, which is a major source for uncertainty. Most contrails (about 80%) have a small positive iRF of up to 20Wm − 2 . IRF exceeds 20Wm − 2 in about 10% of all cases but these have a disproportionally large climate impact, the remaining 10% have a negative iRF. The distribution of iRF values is heavily skewed towards large positive values that show an exponential decay. Monte Carlo experiments reveal the difﬁculty of determining a precise long-term mean from measurement or campaign data alone. Depending on the chosen sample size, calculated means scatter considerably, which is caused exclusively by weather variability. Considering that many additional natural sources of variation have been deliberately neglected in the present examination, the results suggest that there is a fundamental limit to the precision with which the RF and ERF of contrail cirrus can be determined. In our opinion, this does not imply a low level of scientiﬁc understanding; rather the scientiﬁc understanding of contrails and contrail cirrus has grown considerably over recent decades. Only the determination of global and annual mean RF and ERF values is still difﬁcult and will probably be so for the coming decades, if not forever. The little precise knowledge of the RF and ERF values is, therefore, no argument to postpone actions to mitigate contrail’s warming impact.


Introduction
According to the most recent assessment [1] persistent contrails and contrail cirrus contribute a considerable share to the climate impact of global aviation. Even if this impact is measured as effective radiative forcing (ERF) instead of the more traditional radiative forcing (RF) metric, contrails contribute more than all CO 2 from aviation since mankind conquered air space, although the ERF/RF ratio is much smaller for contrails (0.42) than for CO 2 (1). However, the ERF and RF values for contrails are much more uncertain than those for CO 2 . Lee et al. [1] give best estimates and 90% uncertainty ranges (5-95% confidence intervals) for the components of aviation ERF. For CO 2 these values are 34.3 (28,40) [2] for "aviation induced cloudiness" (which includes contrail cirrus) that gives 33 (11,87) mW m −2 , quite considerably lower than the new estimate, but still the uncertainty bars overlap.
The level of scientific understanding (LOSU) or confidence level for contrails and contrail cirrus is rated low and the wide 90% confidence intervals are due to quite a number of uncertainty sources. Appendix E of [1] provides a thorough account of the procedure they employed to estimate the confidence interval. As the various studies that had been used for their estimate refer to different base years and use different traffic inventories, it required scaling factors to bring them on a common level. Further need for scaling arose from different treatment of flight distances, temporal resolution, and feedback of natural clouds. Another important contribution to uncertainty originates from the radiative transfer calculations, the necessary assumptions therein (e.g., crystal habit, cloud overlap, geometry, soot cores of the ice crystals) and the simplifications made in the scheme. Finally, the ambient situation is a great source of variability and uncertainty. Ref.
[1] estimate an overall uncertainty of about 70% for contrail cirrus RF.
It is striking that the LOSU has not improved since the 1999 IPCC Special Report on Aviation and the Global Atmosphere [3] provided the first estimate of the contrail RF (at that time only for young linear contrails). The confidence level does not rise and the confidence interval does not shrink in spite of ongoing research. Indeed there has been progress in measurements [4], local, e.g., [5,6] and global contrail modelling [7][8][9], theories, e.g., contrail formation for hybrid-electric propulsion and for fuel cells, [10,11], the impact of new fuels has been studied experimentally [12] and in models [13], as well as new flight procedures, e.g., formation flight, [14], automatic contrail detection in satellite imagery [15] (CDA) has been extended to tracking the development of contrails [16] (ACTA), to present a selection. However, the uncertainty ranges in IPCC-type charts are not "significantly" smaller than 20 years ago. This state of affairs is common in climate research. An impressive overview of non-shrinking uncertainty ranges considering the radiative forcings of CO 2 and other greenhouse gases is presented in Figure 14-2 of [17]. Interestingly, some "error bars" even grow over time, and the first and last error bars, referring to the first and 5th assessment report, overlap considerably. We are still uncertain, but on a higher level of knowledge.
Naturally, the question arises why the uncertainty range does not become narrower. One reason is certainly that many subtleties either were unknown or simply ignored in the earliest studies on contrail RF. New modelling possibilities regularly need constraints to parameters needed in the new process formulations, but these parameters may be poorly known or variable to an unknown degree. Most uncertainty contributions listed by [1] belong to this category. It seems to the authors that weather variability is at least as important as modelling uncertainty. However, the weather variability only becomes evident in ensemble simulations or otherwise repeated simulations of the same year when the global and annual mean forcing is output for every year.
First one should note that the instantaneous radiative forcing (iRF, often termed cloud radiative effect; it is the difference of the net radiation flux at top of the atmosphere in two situations, one with contrail, one without, but otherwise equal; it is a local value, not in any way averaged over a certain area) often is two to three orders of magnitude larger than the globally and annually averaged radiative forcing which appears in the IPCC charts. Individual contrails may, under certain circumstances, be exothermic (radiate energy off the atmosphere, cooling) by several tens of W m −2 , under other circumstances endothermic contrails (contrails that retain heat energy within the atmosphere) may provide several tens of W m −2 of heating. The overall global and multiannual mean that is relevant for climate is thus the tiny delicate balance of these huge individual contributions.
Two recent examples of the huge variability induced on RF or ERF by weather variability are [18,19]. Gettelman et al. use traffic data for 2006, scaled to 2019 traffic amounts. These data are then used to compute contrail cirrus ERF in the years 2019 and 2020. Ignoring COVID-19 in one set of experiments, the simulations assume the same amount of emissions for both years, such that only the meteorological conditions differ; nudging is applied to run the simulations close to the actual weather conditions. The computation of the contrail cirrus ERF is thus without model or scenario (emissions) uncertainty and the results for these two years differ only because of the varying weather. The difference turns out large: 90 ± 50 mW m −2 for 2019 and 33 ± 35 mW m −2 for 2020 (which gives a mean of 62 ± 59 mW m −2 [18], incidentally close to the mean value given in [1]). [19] find for a large region in western Europe (20 • W to 20 • E, 35 • N to 60 • ), where air traffic (total flight distance above FL 180) was reduced due to COVID-19 by 72%, a strong reduction in contrail cover and radiative forcing. However, they state, as well, that the aviation-induced changes are relatively small (three to ten times smaller) compared to the natural variations seen in cirrus coverage and radiation fluxes at the top of the atmosphere. The considerable variability of the weather makes it difficult to isolate the contribution of aviation to cloudiness and its subsequent effect on RF and ERF unless special techniques are applied that allow to correct for the weather variability, e.g., [20,21].
In the present paper, the natural variability of the instantaneous radiative forcing (iRF) values of individual contrails is considered. For this purpose, in-situ data of temperature and relative humidity from airborne measurements are used to provide the contrail formation and persistence conditions. These are combined with radiation quantities from reanalysis data, interpolated in space and time to the flight positions, to compute the iRF. Ten years of data are used in order to gain robust statistics. In Section 2, the data sources and evaluation methods are introduced. Results of statistical analyses are presented in Section 3. Some issues are discussed in Section 4 and the results are summarised and conclusions are drawn in the final Section 5.

Data from Commercial Aircraft
The Measurement of Ozone and Water Vapour on Airbus In-service Aircraft (MOZAIC) program [22], which was, in 2011, transferred into the new European Research Infrastructure In-service Aircraft for a Global Observing System ( [23], IAGOS), produces measurements for ozone, water vapour and other atmospheric composition variables by automated equipment installed on commercial passenger aircraft flights. Since its start in 1994, over 62,000 flights are available for download at the IAGOS website (iagos-data.fr (accessed on 14 July 2021)). The MOZAIC data have already been used for extensive studies about the occurrence of ice-supersaturated regions, natural cirrus clouds, and contrails [24,25].
For this study, all flights from the years 2000 to 2009 were selected, amounting to 16,588 flights. We only use temperature, relative humidity, and flight position at cruise level with pressure altitudes between 310 hPa and 190 hPa. The research area extends from 30 • to 70 • latitude (which is the zone with most air traffic) and −125 • to 145 • longitude to cover most of MOZAIC flights, while not including too many different climate zones. It is note worthy that only data points with quality label "0", meaning only reliable data, are taken into account. Measurements are available every 4s which translates to a covered flight distance of around one kilometer. Temperature data are offered with precision of ±0.2 K and accuracy of ±0.5 K. Relative humidity is provided with precision ±1% and accuracy ±5% (MOZAIC). Measurements with relative humidity RH > 100% (i.e., supersaturation with respect to supercooled liquid water) are rejected; they would imply flying in a liquid cloud, but water clouds do not exist at temperatures that allow contrail formation. For humidity measurements, the inertia ranges from only 1 s at 300 K to as much as 120 s at 215 K [26]. Hence, inertia is rather large at cruise height of MOZAIC's flights. Fortunately, this is not a problem for the present study. In order to avoid autocorrelation in our dataset and to obtain independent data, we use a random selection procedure that picks only about one percent of all data with the side effect that two data records picked in sequence have usually a temporal distance of 400 s on average, much more than the mentioned 120 s inertia time scale. The selection is completed using a (0-1)-uniform random number generator that selects data points only if the variate is smaller than 0.01. For each selected data point the corresponding weather forecast data from European Centre for Medium-Range Weather Forecast's (ECMWF) ERA-5 reanalysis [27,28] is then collected.

Reanalysis Data
Hourly ERA-5 forecast data were retrieved from the Copernicus Data Service [29] and datasets for synoptic variables and radiation quantities are used. For pressure levels 200, 225, 250, and 300 hPa, specific cloud ice water content, and for single levels the mean top downward shortwave radiation flux, mean top net longwave radiation flux, and the mean top net short wave radiation flux are taken in 1 • × 1 • spatial resolution. The radiation quantities are flux densities in W m −2 averaged over the hour preceding the respective valid (output) time. The data were produced using 4D-Var data assimilation (12-h window) and model forecasts (CY41R2) of ECMWF's Integrated Forecast System (IFS) (ECMWF ERA5 Documentation).

Methods
Temperature and relative humidity from MOZAIC are used to check whether contrails are possible, applying the Schmidt-Appleman criterion [30] assuming an overall propulsion efficiency of η = 0.35. The criterion states that a contrail is formed only when a supersaturated state with respect to the liquid phase, RH max ≥ 100%, is reached during the mixing of the exhaust gas with the ambient air, which itself must be colder than T max , the threshold temperature for contrail formation. If contrails are indeed possible, it is further checked whether there is ice supersaturation meaning they will become persistent. All computations of water vapour available for condensation will take ice saturation as base value. For grid points where persistent contrails are possible, their potential optical thickness is computed assuming a geometrical thickness of 500 m, and assuming that all humidity in excess of RHi> 100% is converted to ice crystals with an effective radius r eff of 30 µm. The optical thickness of nearby natural cirrus is computed directly from the forecast ice water content values, assuming a cloud thickness of 1500 m and an effective crystal radius of 60 µm. The calculations use the parameterisation of [31]. Computation of shortwave, longwave, and net radiative forcing is performed using the parameterisation given by [32] applying the parameters for "Myhre particles", which are hexagonal column shaped particles with wavelength independent constant optical properties [33]. Note that this procedure is the same as that used in [34], so that also the results can be directly compared.
The values of iRF depend on the special choices made above, but this does not matter for the present purpose. It is only necessary that the mapping of cooling/weakly-/stronglywarming contrails on the iRF-scale is bijective, that is, of two contrails the stronger one must have a larger iRF than the weaker one, and that cooling contrails have a negative iRF. These conditions are not violated by the special choices above.
All contrails and nearby cirrus are therefore treated in the same way, with the same effective radii, vertical extensions, and overall propulsion efficiency, leaving no room for model uncertainty. The iRF and synoptic variables are computed by quadrilinear (3D)/(4D) interpolation for all grid and time points that correspond to a selected MOZAIC data record if a persistent contrail is possible. The calculation is completed for both datasets using the respective temperature and relative humidity values.
For the statistical analysis of contrail's iRF values statistical key figures and cumulative distribution functions (cdfs) are produced. For probability densities, the kernel density estimator was computed using the Epanechnikov kernel [35]. This was performed with an IDL program written by David G. Grier, Henrique Moyses, David Ruffner, and Chen Wang. One hundred bins were set for the yearly pdfs and five hundred bins for the pdf of 2000-2009 with minimum and maximum following the iRF values. The binsizes were calculated by the program. A mathematical fit was produced to fit to the pdf of all positive iRF values. A Monte Carlo experiment was also performed to determine how robust the iRF mean of our 47,032 potential contrails and nearby cirrus is and how much it differs depending on sample size. For this, 1% and 0.1% of all iRF values were randomly chosen and of those samples their mean was calculated to investigate their spread. Results will be discussed in the next chapter. Strongly warming contrails with iRF larger than 20 W m −2 occur in around ten percent of all contrail cases. Negative (cooling) iRF occurs rarely compared to positive iRF cases, in about ten percent of all contrail cases. Evidently the variation within the coloured curves is larger when cdfs include only a single month of data (left) than when whole years are included (right), as expected. The range of iRF for the sample dataset is −40 to 90 W m −2 with a mean value of 7 W m −2 and a standard deviation of 9.2 W m −2 . Means of the single years vary from

Results and Interpretation
Considering that there is no model uncertainty in the analysis, this variability must be ascribed solely to natural climate variability. There is also a distinct seasonal variation in the mean value of iRF, with minima in the cold and maxima in the warm season. Figure 2 (left) shows how the mean of iRF ± its standard deviation vary from month to month. In warmer months (June-September) the mean values are twice to three times as high as in winter months (December-April). How can this be understood? The temperature difference between cruise levels and the ground is larger in summer than in winter which favours a large infrared forcing and, thus, the summer maximum. Of course, the cooling solar effect also acts longer in summer than in winter, but the most extreme shortwave forcings occur in winter, probably caused by a preponderance of large solar zenith angles then, which facilitates the back-scattering of solar radiation to space [36]. The result is surprising because earlier research has shown that contrails warm most during winter [37]. However, [9] have a small RF maximum in summer as well because there is more optically thick contrail cirrus in this than in other seasons. Regardless, as we consider iRF, while [37] considered RF, there is no immediate contradiction, but certainly the relation between iRF and RF is an issue for further research. The wide range covered by the standard deviation of the all-monthly iRF values indicates that the interannual variability (that is the differences between the same months over different years) exceeds the average seasonal amplitude. However, not only the mean varies seasonally, the whole distribution does, and, in particular, the extreme values do so. Figure 2 (right) displays for each season (of all years) the distribution function. It is evident that the curve for winter (DJF, blue) makes the steepest increase at small positive iRF, which implies the lowest fraction of very large iRF values. Contrary to this, the curve for summer (JJA, green) increases much less steeply, which, in turn, means that high and very high iRF occurs predominantly in the warmest months of the year. The black curve in the figure separates months with less strongly warming contrails from those that have more. The positive branch of the overall pdf of the instantaneous radiative forcing of persistent contrails is an exponential: where x in this equation is the iRF in W m −2 . It is not obvious immediately why positive iRF are exponentially distributed. However, it is known that ice supersaturation is exponentially distributed [38,39], and the ECMWF model physics reproduces this kind of distribution as well [40]. As supersaturation (and temperature) are major factors of influence for the ice water concentration in contrails, the optical thickness of the contrails turn out nearly exponentially distributed (with a slight stretch, not shown) and eventually this might be the major factor here for the computed iRF values.   The importance of strong contrails is underrepresented by just looking at the probability of their occurrence. It becomes clearer in a display of the 1st order effect function, that is, a plot of x f (x) vs. x (again x is the iRF in W m −2 .) This is shown in Figure 5. The area under the red curve is proportional to the fractional contribution to the total effect, measured as size of the effect times its relative frequency. Now the positive tail of the distribution is quite large, which shows the overproportionally high contribution of strong contrails to their overall endothermic effect on climate. Note also that the iRF represents only a snapshot in the life of the contrail (here the moment of its production by the respective MOZAIC aircraft). The climate impact of contrails must take into account its total coverage (width) integrated over its lifetime, e.g., [41], factors that cannot be determined from the current datasets. However, a regression of width and lifetime to iRF would possibly produce something similar to EF ≈ iRF 1+δ with EF being the energy forcing, integral of iRF over coverage and lifetime [41] and δ a small positive number. If this turns out possible, then the effect of order 1 + δ instead of the first order would be relevant which would give an even higher weight to strong cases. Unfortunately, no efforts to make such a regression are known to the authors, so that this remains speculation for the time being. pdf x iRF all data pdf all data Figure 5. First order effect function for iRF, that is, the product of iRF with its probability vs. iRF. Such a display demonstrates that cases with high iRF, although they have low probability, have quite some importance in the overall RF of contrails.

Monte Carlo Experiments
The RF (or ERF) values in IPCC-type charts are determined from the results of global models as shown for instance in [1]. Consequently, the error bars reflect the difference in these model results. The question may arise whether better and more precise results could be obtained from measurement campaigns. Ref. [42] gives an overview of all contrail measurement campaigns before 2017 and compiles all quantities that one may think of. The compilation contains data for about 230 cases (which can be found in the supplement of that paper). In total, 230 cases would be a fraction of 0.5% of the data in our study and so we can make an experiment to see what mean iRF an overview over such a number of data would result in, and how much this estimate would vary if further samples consisting of 230 data points were selected. Instead of using a 0.5% sample, we will use one smaller with a size that may be typical for a single measurement campaign and one larger that would then represent a meta-study comprising 10 measurement campaigns. The problem of the RF (in the global and annual mean sense) calculation from the individual data (that is iRF measurements) is not discussed here; this is a difficult problem and worth an investigation of its own. That is, we will see how precise a mean iRF can be determined from samples with realistic sizes typical for measurement campaigns. The random character of the weather can induce quite some uncertainty, as demonstrated in the following.
Two Monte Carlo experiments are performed by randomly drawing samples from the set of 47,032 iRF values. Two sample sizes are used, one where about 1% of the data are used and another one where only 0.1% of the data are used. The experiments are performed 1000 times each, and for each sample the sample mean is computed. The two sets of 1000 sample means are plotted in Figure 6 (plus symbols). Obviously the scatter is quite large. For the smaller 0.1% samples the means range from about 2 W m −2 to 12 W m −2 and for the larger 1% samples from about 5.5 W m −2 to 8.5 W m −2 . The average means in the two experiments, marked as blue lines in the figures, are slightly smaller than the overall mean of 7 W m −2 . However, unfortunately, this overall mean does not necessarily appear in a single measurement sample. Instead one has to expect that the sample mean differs from the actual value; on average, it differs more with decreasing sample size. Since the distributions of the sample means approach normal distributions (according to the central limit theorem), one can expect an uncertainty range (standard deviation) for the mean of about σ/ √ N with σ = 9.2 W m −2 and N ≈ 470 for the larger samples, and N ≈ 47 for the smaller ones. Indeed, the standard deviation from the two Monte Carlo experiments is about 0.43 W m −2 for the large and about 1.31 W m −2 for the small samples, as expected from the σ/ √ N formula. The corresponding ±-one-sigma ranges contain only about 68% of all sample means. 90% confidence ranges correspond to ±1.645-sigma ranges, that is, ±0.71 W m −2 and 2.15 W m −2 , respectively. The estimates can be made more precise if larger samples are drawn, but as the improvement is proportional only to 1/ √ N the rate of improvement is slow.

How Large Is the Maximum Instantaneous Radiative Forcing
The current study uses a sample of iRF values for its conclusions. Although the sample is quite large, it still is a sample, it becomes noisy at its upper extremes and it is not certain whether other samples would not include even higher values. We can, however, use some results from extreme value theory [43] to estimate how large the "true" maximum of iRF could actually be. To do this, only values iRF > 1 W m −2 are considered, which have been used for the fit of the exponential distribution above. This considerably simplifies the calculations.
Let the distribution function of the iRF values be F(x), that is, the probability that a certain iRF exceeds x is 1 − F(x). In Section 3.1, it turned out that the positive values of iRF are exponentially distributed with a parameter b = 0.11 (W m −2 ) −1 , such that The sample is quite large; if only the positive values iRF > 1 W m −2 are counted, there are N = 36,508 samples. Let X be any value in the range of iRF values. Then the probability that none of N values exceeds X, provided that all sample members are independent (which we took care of by the random selection process), is F(X) N . Inserting the exponential distribution gives This expression equals the probability that the maximum value does not exceed X, which gives the distribution function of the maximum as Now one can ask for a value X 1/2 at which it is 50% probable that the maximum actually exceeds X 1/2 . This is given implicitly as Solving for X 1/2 yields Inserting the numbers N and b gives a value of about 98.8 W m −2 , that is, there is a 50% chance that the true maximum value of iRF exceeds 98.8 W m −2 . This result can be generalised. Let α be the probability that the true maximum of iRF exceeds a value X 1−α , that is, let Φ(X 1−α ) = 1 − α. Then after a little algebra one arrives at This means for instance that the maximum of iRF exceeds 122.5 W m −2 with a probability of 5% (i.e., α = 0.05). These extreme values are not just the result of an academic exercise. Our database contains one record with a longwave forcing exceeding 100 W m −2 , but counterbalanced by a certain shortwave forcing. If this would have happened at night, the net forcing would have exceeded 100 W m −2 . Although this contrail had a very large optical thickness of 1.88, even larger values (e.g., 2.3) are reported in the literature [44]. Values like 122.5 W m −2 are thus possible, but with probabilities of the order 10 −6 or lower.

Discussion
In this paper, we have considered the weather influence on instantaneous radiative forcing variability. Three points need to be stressed: (1) Weather is only one source of natural variability, (2) instantaneous radiative forcing is only a partial metric of the total contrail climate impact, and (3) the values computed in this paper are only rough approximations.
Let us start with issue 3. The actual iRF values computed in this paper, their mean values and other statistical characteristics should not be taken too literally. The important point here is not the numbers themselves, it is the various relations between these numbers. For instance, the application of "Myhre" particles leads to iRF values that certainly differ from values for other crystal habits; the assumption of certain effective radii of contrail and cirrus crystals and the assumption of certain depths of these clouds give iRF values that differ from iRF values for different sets of assumptions. It is possible but futile to perform sensitivity studies by changing the assumptions because, as said, the actual numbers are not so important, they are contingent. The "substance" of the results will not change: The mean iRF is slightly positive (i.e., contrails are endothermic on average), the positive wing of the distribution is exponential such that large values occur with non-negligible probability, the natural variability is large as well, and the impact of the strong events is larger than their rare occurrence would suggest. These considerations become aggravated by the two other issues mentioned above to which we turn now.
Weather is only one source of variability of individual contrail climate impacts. The first source of variability is the aircraft itself [6,45,46] and its engines: wingspan, actual weight and speed determine the vertical extension of a contrail [47], fuel flow, number of engines [48], and their maintenance state [49] the number of soot particles emitted per meter of flight path and, thus, the final ice crystal number concentration. Turbulence and stratification of the ambient atmosphere have impacts [50,51]. The impact of an individual contrail on the radiation flux through the atmosphere depends also on the traffic density in the ambient environment. The impact of a new contrail in a field of other contrails decreases with the coverage of older contrails that are already there, because the older contrails have already consumed part of the condensible water vapour and the contrail coverage of the sky has a fixed upper boundary [14,52]. Finally the variability of natural cirrus [53,54] has been completely neglected here. Obviously, the actual variability of contrail iRF is considerably larger than that the weather alone causes.
The instantaneous radiative forcing corresponds only to a single moment in the whole contrail lifetime; it is preferable to consider instead the total climate impact of individual contrails, integrated over their spatial extension (length, width) and their lifetime. Such a metric is the energy forcing, as defined in [55] which cannot be computed from the data of this study alone; its computation would need a model of contrail evolution in the actual weather, such as CoCiP, Contrail Cirrus Prediction model [8]. Be that as it may, one can envisage that EF has still larger variability than iRF because there are still more sources of variability: vertical wind shear (in relation to flight direction) affects contrail width, contrail lifetime varies strongly (indeed it varies according to a stretched exponential or Weibull distribution), that is, its distribution has stronger extremes than the simple exponential, see [56], the varying background and weather conditions during a contrail's lifetime, its termination mechanism dynamically or microphysically triggered, see [57]. It is, thus, highly probably that a plot of the first order effect, that is EF· f (EF) vs. EF would display significantly more weight on strong events than the corresponding iRF· f (iRF) vs. iRF plot shown in the previous section. This conjecture is corroborated by the results presented by [41] who found that about two percent of flown distances in their study contributed about 80% of the energy forcing.
A final question to be discussed is how the present "in-situ" results can be compared to results obtained from global simulations, especially those that have been used for various IPCC charts and the recent re-assessment [1]. It seems that this is a similar relation such as the iRF vs. EF relation discussed above; the global models consider contrail coverage or volume, not single points in time in the contrails lifetime. However, the relation is still more complex because contrails are parameterised in such models and not treated individually. These models can treat the interaction between contrail cirrus and natural cirrus (more contrails -> less cirrus), [7], which the present data analysis cannot. Finally, it is not possible to compute a radiative forcing from the iRF values, since the necessary contrail coverage (and lifetime) is not given. A direct comparison of the present results to results from global simulations is thus not possible. However, the present results clarify that the error-bars (or uncertainties) in the IPCC-charts would not become very small even if the global models would be perfectly adequate for representing contrail cirrus in the climate system. The natural variabilities do not allow obtaining precise estimates of contrail RF or ERF. Again, the recent paper of Gettelman et al. [18] provides a striking example. The 2-standard deviations for the two years 2019 and 2020 (ignoring COVID) are 35 and 50 mW m −2 , respectively. They are determined from a 10-member ensemble simulation of the two years with initial temperature noise of the order 10 −10 K. The chaotic nature of the atmosphere then produces this tremendous variability after just one year. The authors note that the standard deviations do not become smaller if the number of members of the ensemble simulation is doubled. It seems that the chaotic nature of the atmosphere sets a fundamental limit to how precisely the RF and ERF of contrail cirrus can be determined using model simulations. Theoretically this would be achievable with data from radiation measurements, but the necessary amount of such data would be enormous.

Summary and Conclusions
To reduce the level of uncertainty of the radiative forcings of the non-CO 2 impacts of aviation has been a goal ever since the 1999 special report on aviation and the global atmosphere has been published by the IPCC [3]. Unfortunately, this goal has not yet been reached, in particular for contrails and contrail cirrus, despite more than 20 years of research and considerable progress in process understanding and increasing modelling and measurement capabilities. In this paper, we ask for potential reasons for the failure of these efforts. Uncertainty or error bars in the IPCC charts have three components [58]: natural variability, model inadequacy, and scenario uncertainty. The present investigation focuses on natural variability, and, in particular, on the weather induced variability. The "model", that is, the method to compute instantaneous radiative forcing of individual contrail events is fixed as is the set of assumptions; this excludes a model uncertainty from the investigation. The study uses data from the past, such that there is neither a scenario uncertainty. The question to be answered in this paper is thus: how large is the weather induced variability on instantaneous contrail radiative forcings, how precisely is it possible to determine its long-term mean?
To answer the question, ten years of measurement data (2000-2009) from commercial passenger aircraft in the MOZAIC framework are used, comprising data of 16,588 flights. Temperature and relative humidity data from these flights are combined to determine whether a persistent contrail was formed at the time and position of the aircraft. About 1% of the data from cruise levels are chosen randomly for the analysis, and for these data the corresponding atmospheric and radiation data are obtained from the ERA-5 reanalysis dataset, interpolated to the flight data positions and times. From these data the iRF is computed for each of 47,032 flight positions where a persistent contrail was formed according to the flight data. The random selection process assures the statistical independence of the 47,032 iRF values.
The statistical distributions of iRF show large annual and interannual variability and a seasonal variation with higher average iRF values in the summer and lower ones in the winter months. The interannual variability of each single all-monthly (Figure 2, left) distribution exceeds the seasonal amplitude, but the seasonal variation is larger than the interannual variation, compare Figure 1, right and Figure 2, right. The iRF pdfs are clearly non-Gaussian, heavily skewed towards positive iRF. The upper wings of the iRF distributions show exponential decay with a small prefactor (b ≈ 0.11 (W m −2 ) −1 ) in the exponent. The exponential character may well have its origin in the exponential distribution of the degree of ice supersaturation, which, in turn, determines the ice water concentration in contrails and thus their optical thickness. The latter is nearly exponentially distributed in our data.
In order to see whether data from measurement campaigns or even compilations of such campaigns (meta-studies) could help to determine the global and annual mean RF (ERF) with more precision than models do, we ran Monte Carlo experiments to determine the average iRF from 1% and 0.1% samples of the 47,032 iRF data. These exercises demonstrate how difficult it actually is to determine a precise average. The scatter of the results-exclusively caused by weather variability-is enormous, but it can, in principle, be reduced by taking larger sample sizes. However, the difficulty is aggravated by the fact that all other natural sources of variability have been intentionally neglected. The actual natural variability of iRF is thus probably much larger than the weather-caused variability alone. It is well conceivable that one will never be able to considerably reduce the uncertainty range associated with contrail cirrus RF. This needs to be acknowledged; the uncertainty is not a sufficient reason for not to act against contrails' impact on climate.
Even though most persistent contrails produce iRF in the range of 0 to 20 W m −2 , strong contrails with iRF > 20 W m −2 still happen in about 10% of all cases. Extreme value theory lets us assume that actually the highest values of iRF could exceed the upper range of the present dataset. Their overall endothermic effect on climate is disproportionate, and proceeding from iRF to the energy forcing, the properly relevant quantity in this respect, would stress the impact of strong contrails even further. This is why an aspiration for the capability of numerical weather forecast models to predict the strong endothermic contrails is justified and timely.
Author Contributions: The work described here is part of L.W.'s Master thesis. She did all the data processing and produced the figures. S.R. provided the data, K.G. conceived the study. All authors wrote the text. All authors have read and agreed to the published version of the manuscript. Acknowledgments: ERA-5 data are provided by the European Copernicus Data Service. The authors thank Lisa Bock for reading and commenting a draft version of this paper. We are grateful to the reviewers who have helped to improve the clarity of the paper.

Conflicts of Interest:
The authors declare no conflicts of interest.