How Well Can Persistent Contrails Be Predicted?

: Persistent contrails and contrail cirrus are responsible for a large part of aviation induced radiative forcing. A considerable fraction of their warming effect could be eliminated by diverting only a quite small fraction of ﬂight paths, namely those that produce the highest individual radiative forcing (iRF). In order to make this a viable mitigation strategy it is necessary that aviation weather forecast is able to predict (i) when and where contrails are formed, (ii) which of these are persistent, and (iii) how large the iRF of those contrails would be. Here we study several data bases together with weather data in order to see whether such a forecast would currently be possible. It turns out that the formation of contrails can be predicted with some success, but there are problems to predict contrail persistence. The underlying reason for this is that while the temperature ﬁeld is quite good in weather prediction and climate simulations with speciﬁed dynamics, this is not so for the relative humidity in general and for ice supersaturation in particular. However we ﬁnd that the weather model shows the dynamical peculiarities that are expected for ice supersaturated regions where strong contrails are indeed found in satellite data. This justiﬁes some hope that the prediction of strong contrails may be possible via general regression involving the dynamical state of the ambient atmosphere. S.M.; formal analysis, K.G.; data curation, S.M. and S.R.;


Introduction
Persistent contrails contribute substantially to the climate effect of aviation (for recent estimates see, e.g., [1][2][3]). Before the shutdown of aviation early this year as a consequence of the COVID-19 pandemic, aviation demand continued its steady increase and the annual total fuel consumption and emission of carbon dioxide increased at an annual rate of 1.1% per year in the period from 2010 to 2018, in spite of growing engine efficiency and load factors [3]. The growth rates are higher in non-OECD than in OECD countries which implies that contrail coverage might grow especially in regions where they were rare in the past. The annual and global mean radiative forcing (RF) of all contrails is of the order 50 mW/m 2 , which represents a third to one half of the total estimated RF of aviation. The instantaneous RF of a single contrail (iRF) can, however, be three orders of magnitude larger with both negative (cooling) and positive (warming) sign, such that the value reported by the Intergovernmental Panel on Climate Change (IPCC) and in the estimates mentioned above is just the small net effect that remains when large negative and positive contributions almost balance. The balance is not perfect, such that a small positive RF remains.
There are two atmospheric conditions necessary for persistent contrails. First, the so called Schmidt-Appleman [4] criterion must be fulfilled, which is a pure thermodynamic criterion that rules whether contrail formation is possible in a given situation or not. This criterion covers both short-term (lifetime seconds to a few minutes) and persistent contrails. Thus a second condition is required for persistent contrails: the ambient air must be in a state of ice supersaturation, that is, the relative than 60,000 flights contributed to the data base. Here we use only temperature and relative humidity. Flight position (longitude, latitude, pressure altitude) are given for each data record. The data contain also quality labels for each record. We use only data with quality flag "0", which means that the measurement is reliable. Records with RH > 1 (or RH > 100%, i.e., water supersaturation) are rejected.
For the current study we select months of data that represent the four seasons: January, April, July and October 2014. For these months we have simulation results available obtained using the Earth System model EMAC [18], see below. 230 flights from Europe to mainly Asia and North-America have been selected from the data base. Only cruise level data have been retained with pressure altitudes between 310 hPa and 190 hPa.
The routes covered in these data are mostly between Europe (Frankfurt, Germany) and either North America (Montreal, Canada, and destinations from the eastcoast to the westcoast in the United States), Africa (Khartoum and Addis Abeba), and Asia (mainly destinations in Japan, China and India, but also Dubai and Tehran).
MOZAIC data are recorded every 4 seconds. In order to avoid autocorrelation we select about 1% of all data points by invoking a (0,1)-uniform random number generator. We proceed with the analysis of an individual record only when the random variate is smaller than 0.01. This procedure secures the mutual independence of single data points. For each selected data point we then collect the corresponding data (humidity, temperature, ice cloud water content, radiation quantities) from ERA-5, see below.

Satellite Data
The Automatic Contrail Tracking Algorithm (ACTA) [19] is a tool to spot and track individual contrails during part of their life. It uses the Rapid Scan Service of the Meteosat Second Generation satellite to track contrails in 5 min intervals over the heavily flown areas of the North-Atlantic and Europe. The ACTA data set spans a full year and contains observational records of 2305 individual contrails, including their optical thickness, computed using the COCS algorithm [20], their effects on the radiation field at the top of the atmosphere, computed using the RRUMS algorithm [21], as well as their length and width. The data set is described in detail in [22].
The ACTA data set has been applied to determine the statistical distribution of contrail lifetimes [23], but here we will use it to study two cases where the RRUMS algorithm indicated very strong positive iRF. We selected two situations: (a) 25  The optical thickness of these contrails (or more precisely, contrail segments) are 0.4 to 0.5 and 0.6 to 0.7, respectively.

Reanalysis Data
ERA-5 data have been retrieved from the Copernicus Data Service [16]. We use dynamic and thermodynamic data on four pressure levels: 200, 225, 250 and 300 hPa, as well as radiation flux densities at the top of the atmosphere (i.e., the highest model level). Data are retrieved in two versions, one for the comparison with aircraft data, and another for a small selection of cases from satellite observations.
For comparison with aircraft data we use data from four months in 2014 selected above that each represents another season: January, April, July and October. The horizontal resolution is 1 • × 1 • and data cover the northern hemisphere between the equator and 60 • N. Temporal resolution is one hour. We selected the pressure level fields cloud ice water content, relative humidity and temperature. Note that relative humidity is to be understood with respect to ice at temperatures lower than −20 • C in ERA-5. Additionally we use the radiation quantities mean top downward shortwave radiation flux, mean top net longwave radiation flux, and mean top net shortwave radiation flux. In ERA-5 these quantities are flux densities in W/m 2 averaged over the hour preceeding the respective valid (output) time.
For the analysis of the two ACTA cases we retrieved reanalysis data from ERA-5 for two 20 • × 20 • longitude-latitude areas, with a spatial resolution of 0.25 • × 0.25 • , and for each 6 hours before and around the event with hourly resolution. Pressure levels are as above. For these case studies we retrieved dynamical quantities as well, that is, geopotential, divergence, vorticity, three velocity components (u, v, w), and potential vorticity.

EMAC Simulation Data
For further comparison we use results from the ECHAM/MESSy Atmospheric Chemistry (EMAC) model which is a numerical chemistry and climate simulation system that includes sub-models describing tropospheric and middle atmosphere processes and their interaction with oceans, land and human influences [24]. It uses the second version of the Modular Earth Submodel System (MESSy2) to link multi-institutional computer codes. The core atmospheric model is the 5th generation European Centre Hamburg general circulation model (ECHAM5) [25]. For the present study we applied EMAC (ECHAM5 version 5.3.02, MESSy version 2.54.0) in the T42L90MA-resolution, i.e., with a spherical truncation of T42 (corresponding to a quadratic Gaussian grid of approx. 2.8 by 2.8 degrees in latitude and longitude) with 90 vertical hybrid pressure levels up to 0.01 hPa. We perform a hindcast simulation with specified dynamics, i.e., ERA-Interim reanalysis data serve as time dependent boundary conditions for the model in order to perform a simulation with specified dynamics for the year 2014 [18]. For the retrieval of atmospheric parameters along the aircraft trajectories of IAGOS aircraft we use a diagnostics module for online sampling, which extracts the value of temperature, relative humidity, etc., in the grid cell at the respective geographic position and altitude with a temporal resolution of 12 min.

Methods
For the comparison of ERA-5 with the selected MOZAIC data we use the actual time and position of the aircraft, including its flight pressure level, and determine the corresponding atmospheric properties by quadrilinear interpolation in the meteorological data set. If the actual aircraft flight level is between 310 and 300 hPa, or between 200 and 190 hPa, interpolation in the vertical direction is not performed. Instead the values at 300 or 200 hPa are used.
For EMAC we use 660 points along various MOZAIC flights in the first half of April 2014. These points are all between 190 and 310 hPa, as before. They are each 12 min apart (time step of the model) which translates to about 150 km flight distance. For the comparison we match the coordinates and times with the closest MOZAIC measurement. We didn't use 12-min averages since contrail formation is a local phenomenon, not an average one.
For the comparison with the ACTA data we use more (6) time points than actually necessary (2). The idea behind this is to not only perform the best matching but to look as well at the development of the synoptic situation over six hours.
For the calculation of iRF we use radiation formulae from [26] with constants for "Myhre particles". These are pseudo-particles with wavelenght-independent optical properties. We consider this simplification sufficient for the present purpose. The required radiation quantities are included in the ERA-5 data. We identify outgoing longwave radiation (OLR) flux with mean top net longwave radiation flux, solar direct radiation (SDR) with mean top downward shortwave radiation flux, and reflected solar radiation (RSR) flux with mean top net shortwave radiation flux. The effective albedo is RSR/SDR during sunlit hours and zero at night. The solar constant is set to 1361 W m −2 . To compute optical thickness of contrails and nearby cirrus we use a simple formulation [27]. The necessary vertical extensions are assumed as 500 m for contrails and 1500 m for cirrus. Effective radii of ice crystals are taken as 30 µm for contrails and 60 µm for cirrus. Furthermore we need the ice concentration, for which we assume that all water mass in excess of ice saturation is condensed as ice. The iRF is computed for all grid and time points that correspond to a selected MOZAIC data record (if a persistent contrail is possible). The calculation is done using the relative humidity and temperature of both data sets, MOZAIC and ERA-5. Differences in iRF are thus due to differences in temperature and relative humidity alone.
The calculation of the Schmidt-Appleman criterion assumes an overall propulsion efficiency of η = 0.35.

Results
We begin with the statistical analysis using the comparison between MOZAIC and ERA-5 data to obtain general results. Then we proceed with the two ACTA cases.

Conditions for Contrail Formation and Persistence
The condition for contrail formation is that the so called Schmidt-Appleman criterion [4] is fulfilled. In words, it says that the mixture consisting of the expanding exhaust gases and ambient air must transiently become water saturated during the expansion of the plume. This can be cast into the physical terms of the maximum temperature, T max , at which contrail formation is still possible and the relative humidity RH max the mixture obtains at the moment when it has temperature T max . The Schmidt-Appleman criterion can then be expressed as follows: contrails form once the ambient temperature is below T max and (logical and) RH max ≥ 1 (100%). Note that RH max can be negative for situations with T > T max . In this case it has no physical meaning. We prefer to express the contrail formation conditions in terms of RH max because its value has implications for contrail microphysics as well: the fraction of exhaust soot particles that serve as condensation and ice forming nuclei increases with RH max − 1, that is, with the attained supersaturation with respect to liquid water. Figure 1 shows RH max from MOZAIC flights on the x-axis and from corresponding ERA-5 data on the y-axis. We use here a random selection of about 1% of all flight data in order to avoid autocorrelation (see Section 2.1). That is, the figure (and others to come) displays a sample. The data pairs follow the diagonal y = x line (black) quite well in the four considered months (colour coded). A seasonal variation can be seen with the lowest RH max values in summer and the highest in winter. The upper half of Table 1 casts the comparison between the two data sets into a 2 × 2 contingency table with the following entries: • Y/Y, hits: MOZAIC data show that a contrail is formed and the corresponding interpolated ERA-5 data have properties that would allow contrail formation as well. The interpretation of the contingency table is explained in Appendix A where we also introduce the equitable threat score, ETS, that we use to characterise the degree of agreement between the two data sets. Reasons for using ETS instead of other score types are given as well in the Appendix A. In short: ETS is unity for a perfect agreement, zero for a totally random relation between the two data sets, and negative when the two datasets tend to contradict each other. Note that the entries in the table are random numbers because of the random data selection method. The statistical stability of these values is discussed later. The rightmost column in the table lists the ETS. The values range from 0.55 in July to 0.85 in January. These values confirm the impression given in Figure 1 that the agreement between the two data sets with respect to the possibility of contrail formation is not bad. It is quite good in fall and winter, a bit less good in spring and, say, mediocre in summer.
Jan Apr Jul Oct Figure 1. Comparison of contrail formation conditions expressed as relative humidity in the exhaust plume in the moment when the temperature reaches T max , for MOZAIC (x-axis) and the corresponding ERA-5 data (y-axis). The seasons are plotted with intuitive colours: winter (January) blue, spring (April) green, summer (July) red, and fall (October) yellow. Contrails are possible when RH max ≥ 1.
Negative RH max signifies conditions warmer than the maxmimum temperature where contrails are possible (that is, the plume does not get as cold as T max ). The Schmidt-Appleman criterion rules only contrail formation but not its persistence. For a contrail to exist more than a few minutes, and thus to be relevant to climate at all, it needs to be formed in an ice-supersaturated region (ISSR), that is, the ambient relative humidity with respect to ice, RH i , must exceed unity (or 100%). Thus we compare now the measured RH i from MOZAIC with the corresponding data from ERA-5. The result is presented in Figure 2 as a scatterplot and the degree of agreement between the data sets is shown in the lower half of Table 1. This comparison is, unfortunately, quite discouraging. The plot shows data pairs scattered all over the place, with only a weak indication of concentration of the points to the diagonal line. The entries of the contingency table (lower part of Table 1) and the corresponding ETS values confirm that the degree of agreement between the two data sets is quite low. Here we have at least a weak coherence in winter and spring with ETS values of 0.25 and 0.11, respectively, but in the other seasons the relation is mostly random, although the computed slightly positive numbers are statistically significant. But this is merely the fortunate result of having a large number of cases. Thus, the summary of this analysis is that while contrail formation can be forecasted quite reliably, their much more important property of persistence cannot currently be predicted reliably. Figure 3 shows cumulative distribution functions (cdf) of instantaneous radiative forcing (iRF) for all cases where persistent contrails are diagnosed by MOZAIC, for the four months considered (colour coded) and separately for MOZAIC and the corresponding ERA-5 interpolated grid and time points. Please note that these curves refer to the mentioned random samples and not to all data. The most general observation is that cooling contrails (negative iRF) occur rarely, in less than 20% of the cases. Rather the cdfs rise steeply at zero and small positive iRF values which shows that most persistent contrails in the data sets have a slightly positive warming effect up to 10 W/m 2 . The sample mean values plus/minus one standard deviation of iRF for the four months (01, 04, 07, 10) are in the MOZAIC data (3.22 ± 5.27, 9.76 ± 8.91, 6.69 ± 11.09, 5.16 ± 6.27) W/m 2 and in the ERA-5 data (0.54 ± 1.13, 0.84 ± 1.27, 1.12 ± 2.82, 0.43 ± 0.69) W/m 2 . That is, the iRF tends generally to higher values in MOZAIC than in ERA-5. As stated in the methods section, the only difference in the calculation is that we use the measured humidity and temperature for the radiation calculation in the MOZAIC case, which then translates into differences in the optical thicknesses. It turns out that these MOZAIC optical thickness values are often larger than their reanalysis counterparts, as Figure 4 demonstrates.

Comparison with EMAC Data
For April 2014 we have a sample of 660 matched data points for comparison. They cover a longitude range from 83 • W to 138 • E and extend in latitude from 14 • N to 70 • N. The temperature ranges from about 207 to 239 K, the model has a slight negative bias (mean difference) of −0.2 K relative to the observations, with a standard deviation of 1.2 K. For the relative humidity with respect to ice there is an average positive model bias of 13-14% (RHi percent, with standard deviation of 24%), although MOZAIC data have much higher maximum values than EMAC: the MOZAIC maximum is 150%, that of EMAC is 113%, but already the 3rd quartile is smaller for MOZAIC than for EMAC, 68% vs. 84%. Obviously there are many points in the EMAC data with RHi just around saturation, whereas MOZAIC, although showing some very high values, has the bulk of data with low RHi values. Note that the temperature bias is too small to explain the bias in relative humidity; there must be a bias in absolute humidity as well. The comparison for the water vapour volume mixing ratio suffers from an outlier in the EMAC data. If this outlier is removed, we find, in accordance to the results regarding RHi, a positive bias of 15 ppmv (with a standard deviation of 62 ppmv) of the EMAC data with reference to MOZAIC.
In spite of these biases the prediction of contrail formation would be quite successful with the EMAC data. A plot like Figure 1 to compare the pairs of RH max looks as promising as in that figure (not shown), and indeed, the corresponding ETS is 0.79. But, unfortunately, the prediction of contrail persistence would be almost random with an ETS of 0.11. We see that EMAC with dynamics specified by using ECMWF meteorological fields as time dependent boundary conditions, has not better contrail formation and persistence prediction capabilities than the ECMWF model itself.

Case Studies
Let us begin with the case of 25 August 2008. This is a short story. We find that at the position and time when ACTA detected very strong contrails ERA-5 confirms the validity of the Schmidt-Appleman criterion with a RH max of about 1.3, but is far from ice supersaturation with RH i ≈ 0.6. This miss is not just a question of an error of a few degrees or a temporal mismatch of an hour or so. In fact, there is no ISSR closer than 10 • in both longitude and latitude directions. For this case ERA-5 predicts short-living contrails where ACTA found very strong persistent ones.
Fortunately the case of 15 May 2009 is more yielding. The Schmidt-Appleman criterion is fulfilled with RH max > 1.8 and there is an ISSR at the time and position of the contrail ( Figure 5). However, the degree of ice supersaturation ERA-5 indicates is low ( Figure 5, right panel); RH i slightly exceeds 100% which has consequences for the estimate of the optical properties of the contrail. While the ACTA data indicate an optical thickness between around 0.5 and 0.7, the contrail as predicted from ERA-5 data would perhaps just be visible under very clear conditions with an optical thickness between 0.01 and 0.03 (Figure 6, left). Consequently, the estimated iRF using ERA-5 data is with less than 1 W/m 2 much smaller than what the ACTA data suggest, more than 55 W/m 2 (Figure 6, right).    Although this looks discouraging, let us consider the meteorological situation for 15 May 2009 a bit closer. We identify the altitude of the contrails with the 225 hPa isobaric surface. At the location of the contrail there is a local minimum in the temperature field (215 K, Figure 7, left), which implies a local minimum in water vapour saturation pressure which hence favours the formation of ISSRs. The vertical air motion is relatively fast upward, about −0.2 Pa/s (or about 5-6 cm/s, Figure 7, right). These circumstances promote not only ice supersaturation but also cirrus formation and indeed the contrail is found close to the western edge of a cirrus cloud that is embedded in a more extended cirrus field in the upper troposphere (Figure 8, left). Right at the position of the contrails there is a local maximum of upward motion. The contrail is in tropospheric air, but the dynamic 2-PVU tropopause is nearby to the west, perhaps 40 km (Figure 8, right). On the 200 hPa surface the PV does already exceed 2 PVU at the coordinates of the contrails; obviously these two contrails are indeed situated in the uppermost troposphere, directly beneath the dynamic 2-PVU tropopause. The upward motion encounters a barrier and is thus accompanied with a relatively strong (horizontal) divergence of 30 to more than 40 · 10 −6 s −1 (Figure 9, left), which is typical for ISSRs but untypical for non-ISSRs [ Table 1 of [5]]. The relative vorticity has a very strong gradient at the time and position of the contrails. On Figure 9, right, the contrails are situated just at the boundary between cyclonic and anticyclonic rotation. It seems that these Big Hits are situated at the top of an anticyclonic vortex that moves eastward. At 200 and 225 hPa the anticyclonic system is narrow. The vorticity in this anticyclonic system has at 1600 UTC quite large negative values that are again typical for ISSRs, around −100 · 10 −6 s −1 , but the values at the contrail location increase steeply while the whole system moves west. At 1900 UTC the location is in cyclonic rotation and belongs to the stratosphere [or perhaps better, the extra-tropical transition zone, [28]]. Accordingly, there is then a strong decrease of the relative humidity to substantial subsaturation.
The ACTA cases thus show an ambivalent picture. In one case (25 August 2008) the forecast and observation disagree seriously, but in the second case (15 May 2009) there are a number of features present at the location of the Big Hits that are typical for ice supersaturation, in particular the dynamic situation characterised by cold air, upward motion close to the tropopause, strong divergence and anticyclonic rotation. The strong discrepancy between the retrieved and forecasted optical thickness and hence instantaneous radiative forcing results from the very small supersaturation in the reanalysis. The presence of nearby cirrus clouds in the model may yield an explanation, namely that supersaturation has been consumed by cirrus formation and growth. If cirrus formation takes place too early in the model, supersaturation is consumed too early as well and then there is not enough water to form thick contrails.

Statistical Stability
In order to avoid effects of autocorrelation in the data analysis we selected randomly about 1% of the flight data to determine the entries in the contingency table. These entries are thus random numbers and the resulting ETS is random as well. If we had another random sample, the values would differ. Here we study how variable the numbers are when many samples are drawn. For this purpose we use all data of April 2014. From these 216,809 data we draw 1000 samples of size 2000 (i.e., N = 2000) and determine for each sample the ETS values for both the Schmidt-Appleman criterion and for supersaturation. From the resulting 1000 ETS values we determine their statistical properties. We obtain the following results: • For the Schmidt-Appleman criterion we get an ETS range from 0.616 to 0.795, the interquartile range is 0.709 to 0.740, the mean value plus/minus one standard deviation is 0.725 ± 0.024.

•
For ice supersaturation we get an ETS range from 0.018 to 0.149, the interquartile range is 0.068 to 0.099, the mean value plus/minus one standard deviation is 0.083 ± 0.023.
When we use all data, the corresponding ETS values are 0.72 and 0.08, respectively, quite close to the sample mean. The entries in Table 1 are 0.73, which is close to the mean as well, and 0.11, which is in the upper quartile and thus not close to the mean.

Spin-Up Effects
Ice supersaturation was introduced in the ECMWF forecast system in 2006 [29], but only three years later in the analysis system. During this time the forecast ice supersaturation displayed spin-up in both extent and degree [30], leading to the recommendation to use forecasts with, say, 24 h lead time for applications that need supersaturation. However, since September 2009 this problem seems to be solved because the analysis system allows ice supersaturation as well. Ref. [31] show (in their (Figure 6, right)) that analyses and forecasts with 18 to 24 h lead time show quite similar statistics of relative humidity with respect to ice after September 2009. ERA-5 provides only short forecasts (18 h) so that tests of spin-up effects are confined to a comparison of 1 to 6 h lead time to 13-18 h lead time. We have tested these differences for the April data without finding significant changes relative to the evaluation presented above.
Ref. [31] show also that in-situ measurements of relative humidity from the CARIBIC measurement program [32] yields a RHi distribution with considerably higher values than both ECMWF analysis and forecast. This is probably due to the necessary assumption in the model that as soon a cirrus cloud forms all supersaturation within the cloud is immediately converted to ice (which actually can take hours, depending on crystal size distribution and number concentration). This might explain why the model indicates much thinner contrails than are found in the data sets.

Alternative Vertical Interpolation Procedures
So far we have used linear interpolation in p (pressure), which is not the best choice when a quantity like temperature rather varies linearly with altitude. We tested alternatives by replacing the interpolation in p with an interpolation in log p which is like a linear interpolation in geometric height (according to the barometric height formula). We also tested still another alternative weighting following [33], who use p κ instead of p for vertical interpolation, where κ = R/c p is the adiabatic coefficient for dry air (0.286). This measure allows a better estimation of the tropopause pressure in data with coarse vertical resolution like reanalyses. We have tried both methods for a 1% sample of the April data: for the Schmidt-Appleman criterion we obtain ETS = 0.73, and for ice supersaturation ETS = 0.11. These are the same values that we had before in the p-system. Indeed it turns out that the alternative weights differ only very little from the weights in the linear p interpolation. The reason is that the pressure ratios of the ERA-5 levels (e.g., 300 to 250 hPa) are close to unity, and thus the functions

The Relative Humidity Field
The results of this analysis seem to indicate that the relative humidity field in ERA-5 is unreliable, at least for the considered upper-tropospheric pressure levels. But this impression is not correct. In fact, the linear correlation between RH i from MOZAIC and ERA-5 is relatively high for the drawn samples, ranging from 0.67 for October to 0.92 in July. However, high linear correlation is not enough for RH i from ERA-5 to be a good predictor of contrail persistence. If we restrict the 4 samples to cases where both MOZAIC and ERA-5 find supersaturation, we find mean RH i differences (ERA-5 minus MOZAIC) plus/minus corresponding standard deviations of (−0.14 ± 0.12, −0.17 ± 0.15, −0.02 ± 0.06, −0.02 ± 0.08) for the four months, respectively; that is, the mean difference is in all cases negative, albeit insignificantly in July where ice supersaturation is rare, and as well insignificantly in October where the amount of flight data is lower than in the other months. These results indicate that ERA-5 underpredicts high humidity values and this is consistent with the result from above that the diagnosed optical thickness and resulting iRF values from ERA-5 are much smaller than the estimates based on the measurements. A possible reason for the underprediction is that the reanalysis provides grid-mean humidity values. When a grid box contains a cloud, the humidity in the cloudy part of the box is immediately (within one time step) reduced to saturation. The provided value of RH i is a weighted mean: RH i = C × 1 + (1 − C) × RH i,clear . When the humidity in the clear part of the grid box, RH i,clear , increases and surpasses a critical value, the cloud fraction C increases as well; this balancing effect inhibits the increase of grid mean humidity to observed maxima. For a successful forecast of contrail persistence it would be probably better to use the clear sky humidity values, RH i,clear . Although the clear-sky value can in principle be computed from RH i and cloud fraction C, it is not a good strategy for our purpose, since RH i,clear becomes undefined when C approaches unity. Instead, one can simply try to choose instead of the grid-mean RH i a C-weighted mean between the in-cloud value 1 and a maximum value that accounts for the threshold for homogeneous nucleation, given in Equation (4) of [29]. We tried both a linear (weights C and (1 − C)) and a non-linear (weights C 1/4 and (1 − C 1/4 )) weighting; the resulting ETS values did not improve for contrail formation, but slightly for the prediction of persistence. However, with ETS values of (0.31/0.29) for January, 0.12 for both weightings for April, (0.06/0.07) for July, and (0.11/0.07) for October the resulting improvements are too low (and indeed within the statistical noise determined above) to base a successful contrail persistence prediction on this kind of clear-sky relative humidity.

Conclusions
In order to assess whether a promising strategy to eliminate a considerable share of contrail's warming impact on climate is ready to be realized, namely to avoid the formation of Big Hits, which are contrails with a particularly large positive individual radiative forcing [15], we tested whether a state-of-the-art weather prediction model, represented by the ERA-5 reanalysis, is able to predict reliably (i) the formation of contrails, (ii) their persistence, and (iii) the magnitude of their instantaneous radiative forcing. As a representation of the actual conditions in the atmosphere we used measurements conducted on regular commercial passenger flights in the framework of the IAGOS/MOZAIC project. Additionally we used data from a global climate simulation from the EMAC model, extracted along the flight paths of a number of MOZAIC flights. These data have been analysed in a statistical fashion. Two case studies have been presented using strong contrail cases observed by satellites, that are available in the ACTA data.
It turns out that the thermodynamic condition for contrail formation can be predicted quite reliably, and this seems to work both for a pure weather prediction model and for a climate model in specified dynamics mode. The comparison of the results from the climate model simulation with the MOZAIC/IAGOS data shows that while the simulated temperature deviates only little from the measured one, the deviations in relative humidity are large. It is apparently the quality of the temperature prediction that leads to a promising result concerning contrail formation.
However, the problems of both weather and climate models to predict the relative humidity correctly, and in particular the underestimation of frequency and degree of ice supersaturation [see also [31]], lead to very low reliability in a forecast of contrail persistence. The underprediction of the degree of supersaturation leads to too low estimates of contrail optical thickness and thus to too low estimates of instantaneous radiative forcing. The results do not get substantially better with alternative vertical interpolation procedures or longer prediction lead times, nor with modified relative humidity values that might rather represent cloud-free areas than the standard grid-mean values that are provided in the reanalysis data.
We admit that this study may lack representativeness, since it covers only four months in one year and two case studies. We might have picked unlucky situations and the results might be better in other years or for other cases. Also using forecasts instead of reanalysis data may result in improved predictions of ISSR. The observations have their own uncertainties, but the statistical approach leads to cancellation of random errors. The determination of optical thickness and radiative forcing for the ACTA data is a problem when a contrail is located in cloudy background, because the determination of radiation fluxes for a cloud free environment turns out difficult. Thus there is a chance that the situation is better than what we have found here, but the necessary improvement is substantial.
Prediction of ice supersaturation at the right time and correct locations is basic to all operational contrail avoidance strategies. Some of them are quite complex and sophisticated, e.g., [34,35], but the current state of their concepts assume that the weather forecast is correct and that ISSRs will be where and when they are forecast. Before these concepts can be made operational, the prediction of ice supersaturation must become more reliable.
It seems that this reanalysis data has currently only limited capabilities for estimating real-world contrail formation along an aircraft trajectory. However, the ACTA data demonstrate that where thick contrails are found the model shows peculiar dynamical properties that are characteristic for ice supersaturation. This fact might offer new ways for prediction of Big Hits via general regression methods. Acknowledgments: This work contributes to an ambition initiated by John Green and the Greener by Design working group of the Royal Aeronautical Society to take steps towards a real trial of reducing contrail formation by ATM measures. We thank Margarita Vázquez-Navarro (EUMETSAT) for provision of the ACTA data. Furthermore we thank Luca Bugliaro for critically reading the manuscript before submission and for his suggestions for improvement. A discussion with Robert Sausen on the methods was fruitful for the interpretation of the results. ERA-5 data are provided by the European Copernicus Data Service. Two anonymous reviewers helped to improve the paper. We are grateful for their positive assessment.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A. The Equitable Threat Score
There are several possibilities to describe how a certain dichotomous (e.g., 0/1, Y/N, black/white, etc.) feature in one set of data is reflected (or predicted) in another. Here we use 2 × 2 contingency tables together with a certain measure of skill, the equitable threat score ETS. This and further skill scores are listed in [36].
A 2 × 2 contingency table has 4 entries that are usually labelled hit (e.g., Y/Y), miss (Y/N), false alarm (N/Y), and correct negative (N/N). Let the numbers of these events be a, b, c, d so that a is the number of hits, d is the number of correct negatives, and b, c are the number of mixed results. The two datasets compare the better the larger a, d is compared to b, c and the degree of similarity can be expressed with skill scores. The nature of the data should guide the choice of the skill score. If the feature one is looking for is a rare event, d will be the largest number in the contingency table and this can feign a quite positive impression of the similarity of the two data sets, that is merely due to the rarity of the feature. It is thus appropriate to select a skill score that avoids to use d as much as possible. Furthermore, b and c should appear symmetrically in the formula if the same trust is given to both data sets or if one does not want to make one of the data sets the reference. The equitable threat score, ETS = (a − r)/(a + b + c − r) with r = (a + b)(a + c)/(a + b + c + d) has the desired properties. ETS = 1 for a perfect match (b = c = 0). For a completely random relation between the two data sets (a = b = c = d), ETS = 0, and for a completely inverse relation (a = d = 0) ETS is negative. In the case that N/N is predominant (i.e., the feature one is looking for occurs rarely), ETS approaches the threat score TS = a/(a + b + c), that is, the predominant nature of d is filtered.
Monte Carlo experiments show that a purely random relation of the data gives an ETS of very nearly zero, since N = a + b + c + d is quite large. Thus, all ETS values in the table indicate a statistically significant, albeit partly very weak agreement beween the data sets.