On Hens, Eggs, Temperatures and CO 2 : Causal Links in Earth’s Atmosphere

: The scientiﬁc and wider interest in the relationship between atmospheric temperature ( T ) and concentration of carbon dioxide ([CO 2 ]) has been enormous. According to the commonly assumed causality link, increased [CO 2 ] causes a rise in T . However, recent developments cast doubts on this assumption by showing that this relationship is of the hen-or-egg type, or even unidirectional but opposite in direction to the commonly assumed one. These developments include an advanced theoretical framework for testing causality based on the stochastic evaluation of a potentially causal link between two processes via the notion of the impulse response function. Using, on the one hand, this framework and further expanding it and, on the other hand, the longest available modern time series of globally averaged T and [CO 2 ], we shed light on the potential causality between these two processes. All evidence resulting from the analyses suggests a unidirectional, potentially causal link with T as the cause and [CO 2 ] as the effect. That link is not represented in climate models, whose outputs are also examined using the same framework, resulting in a link opposite the one found when the real measurements are used.


Introduction
A recent (2020) study [2] examining data from measurements of temperature (T) and atmospheric concentration of carbon dioxide ([CO 2 ]) challenged the conventional, and well established, wisdom that increased [CO 2 ] causes an increase in temperature.The study examined whether the commonly assumed causal chain is supported by data or, alternatively, whether a hen-or-egg (HOE) causal relationship is more plausible.The phrase "hen or egg" (originally in Greek,

Introduction
A recent (2020) study [2] examining data from measurements of temperature () and atmospheric concentration of carbon dioxide ([CO2]) challenged the conventional, and well established, wisdom that increased [CO2] causes an increase in temperature.The study examined whether the commonly assumed causal chain is supported by data or, alternatively, whether a hen-or-egg (HOE) causal relationship is more plausible.The phrase "hen or egg" (originally in Greek, ὄρνις ἢ ᾠὸν) was first used in a philosophical context by Plutarch [3] to describe situations in which it is not clear which of two interrelated events or processes is the cause and which the effect.) was first used in a philosophical context by Plutarch [3] to describe situations in which it is not clear which of two interrelated events or processes is the cause and which the effect.
The study examined a case where the causal link is not between two events but between two processes, represented as stochastic processes.Denoting these processes as x(t) and y(t) (where we follow the Dutch notational convention of underlining stochastic variables), in a typical causal system, denoted as x → y , earlier realizations of x(t) affect the current realization of y(t).In an HOE causal system, earlier realizations of x(t) affect the current realization of y(t), but also earlier realizations of y(t) affect the current realization of x(t).
In terms of its applications, the study used global temperature data from satellites (University of Alabama in Huntsville-UAH) and ground-based (CRUTEM.4.6.0.0 global T 2 m land temperature) and [CO 2 ] data at several sites (Mauna Loa, HI, USA; Barrow, AK, USA; South Pole; global average) with monthly time steps for the period 1980-2019.An innovative element of this study was that it explained the reasons why using the original T and [CO 2 ] data series yielded spurious results, and it proposed using the changes (differences in time) thereof instead.We note that differencing is of very common use in economics literature (e.g., [4,5]).In particular, for the [CO 2 ] it proposed taking the logarithm before differencing (something resembling techniques used in economics [5]) and thus the time series that were correlated were ∆T and ∆ln[CO 2 ], where the differences are taken over 12 months.By studying lagged correlations of the two, the study asserted that, while both causality directions exist, the results support the hypothesis that the dominant direction is T → CO 2 .Changes in [CO 2 ] follow changes in T by about six months on a monthly scale or about one year on an annual scale.In turn, the study attempted to interpret this mechanism by referring to biochemical reactions, since at higher temperatures soil respiration, and hence CO 2 emission, increases.
In a subsequent (2022) two-paper study, Koutsoyiannis et al. [6,7] developed a more complete theoretical framework by revisiting causality over the entire knowledge tree, from philosophy to science and to scientific and technological application.By reviewing various approaches to causality, the study located several problems in identifying causal links.Hence, the study developed the theoretical background of a stochastic approach to causality, with the objective of formulating necessary conditions that are operationally useful in identifying or falsifying causality claims.It also developed an effective algorithm applicable to large-scale open systems, which are neither controllable nor repeatable.The proposed framework was illustrated and showcased in a number of case studies, some of which were controlled synthetic examples and others real-world ones arising from interesting scientific problems in geophysics and, in particular, hydrology and climatology.The relationship of globally averaged temperature with [CO 2 ] concentration (again in terms of differences ∆T and ∆ln[CO 2 ] over 12 months) was included in the thirty case studies presented.In brief, the related analyses pointed to the following (quoting from [7]): Clearly, the results [. ..] suggest a (mono-directional) potentially causal system with T as the cause and [CO 2 ] as the effect.Hence the common perception that increasing [CO 2 ] causes increased T can be excluded as it violates the necessary condition for this causality direction.
[. ..] in other words, it is the increase of temperature that caused increased CO 2 concentration.Though this conclusion may sound counterintuitive at first glance, because it contradicts common perception [. ..], in fact it is reasonable.The temperature increase began at the end of the Little Ice Period, in the early nineteenth century, when human CO 2 emissions were negligible [. ..].
However, the scope of that study [6,7] was to formulate a general methodology for the detection of causality-in particular, necessary conditions thereof-rather than to study a specific system in detail.Therefore, no detailed modeling was made in the case studies, including the hydrological and climatic applications.However, given the enormous interest in the T-[CO 2 ] relationship, here we will go deeper into this.
Specifically, this paper, after summarizing the methodology (Section 2) and the data used (Section 3), focuses on the latter relationship with the following objectives: 1.
To expand the time frame of the investigation backward and forward by exploiting the longest available data series (Section 4). 2.
To check whether seasonality, as reflected in different phases of [CO 2 ] time series at different latitudes, plays any role that could modify or possibly reverse the detected causality relationship (Section 5). 3.
To propose and apply a method for investigating the effect of the timescale in causality detection (Section 6). 4.
To extend the methodology for disambiguating cases in which the type of causality, HOE or unidirectional, is not quite clear (Section 7). 5.
To exploit the methodology in defining a type of data analysis that, regardless of the detection of causality per se, could shed light on modeling performance by comparing observational data with model results (Section 8).6.
To discuss possible extensions of the scope of the methodology, i.e., from detecting possible causality to building a more detailed model of stochastic type (Section 9). 7.
To provide logical support for the findings by revisiting the carbon balance in the atmosphere (Appendix A.1) and investigating additional processes that may have caused increases in temperature (Appendices A.2-A.4).

Summary of the Stochastic Approach to Causality
The methodology in [6,7] is based on the impulse response function (IRF) between two processes x(t), y(t), denoted as g(h) where h denotes time lag, based on the convolution where v(t) is another stochastic process representing the part that is not explained by the causal link.To see that the function g(h) is the impulse response function (IRF) of the system (x(t), y(t)), we set v(t) ≡ 0 and x(t) = δ(t) (the Dirac delta function, representing an impulse of infinite amplitude at t = 0 and attaining the value of 0 for t = 0), and we readily get y(t) = g(t).
On the other hand, if we set g(h) = b δ(h − h 0 ) (with constant b and h 0 ), which means that the IRF is zero for every lag except for the specific lag h 0 , then Equation (1) becomes y(t) = bx(t − h 0 ) + v(t).This special case is equivalent to simply correlating y(t) with x(t − h 0 ) at any time instance t.It is easy to find (cf.linear regression) that in this case the multiplicative constant b is the correlation coefficient of y(t) and x(t − h 0 ) multiplied by the ratio of the standard deviations of the two processes.In general, however, we expect that the actual g(h) is not a Dirac delta function but a continuous one over some domain.Thus, the IRF is a much more powerful tool than correlation as it integrates the correlations in the entire spectrum of lags.
For any two processes x(t) and y(t), Equation (1) has infinitely many solutions in terms of the function g(h) and the process v(t).An obvious and trivial one is g(h) ≡ 0, y(t) ≡ v(t).The sought solution is the one that corresponds to the minimum variance of v(t), called the least-squares solution.Equivalently, this solution maximizes the explained variance ratio: where γ υ and γ y denote the variances of the processes v(t) and y(t), respectively.(This is similar as in the correlation at a single time lag.)If the attained maximum e is close to zero, this will mean that the two processes are uncorrelated and thus no causality condition can exist between them (a non-causal system).Otherwise, we may assume, without loss of generality, that processes x(t) and y(t) are positively correlated (i.e., an increase in x(t) would result in an increase in y(t)).In the opposite case (if the processes are negatively correlated), by multiplying one of the two series by −1 we make the correlation positive.Therefore, we impose a nonnegativity constraint for the sought IRF, g(h) ≥ 0 In the estimation of IRF, we may also impose a roughness constraint, where E is the roughness of the IRF determined in terms of the second derivative of g(h): Further justification for the two constraints is provided in [6].In applications, the continuous time representation is replaced by a discrete time one, the IRF becomes a sequence of values g j , where j denotes the time lag, the infinite range of the time lag h becomes a finite window of time lag j, specified in the interval [−J, J], the integrals are replaced by sums, and the true values of statistics are replaced by estimates.Furthermore, the roughness E is standardized as where ε ranges in (0,1) for nonnegative g j .The determination of the IRF ordinates g j is thus formulated as a constrained optimization problem, whose numerical solution is always possible, simple and fast, and can be attained even by commonly available solvers, e.g., in commercial or open source spreadsheet software.We note that in applications, each of the directions x → y and y → x is investigated separately, as there is no symmetry (or antisymmetry) in the produced IRFs in the two directions.When we refer to direction y → x we mean that we interchange the time series x and y and still estimate the IRF in the same way, as described in our equations in which the direction x → y is assumed.
In applications investigating causality, we start by assuming a potentially hen-or-egg (HOE) causal model with a fixed number of weights g j , j = −J, . . ., 0, . . .J, where in most of the cases studied in [7] the value J = 20 was adopted, and this is also followed here.Depending on the results of the estimation procedure, if e is non-negligible, the system is deemed:

•
Potentially HOE causal if we have g j > 0 for both some positive and some negative lags j,

•
Potentially causal if g j = 0 for all j < 0, and Potentially anticausal if g j = 0 for all j > 0 Note that anticausal means that the actual causality direction is opposite to that assumed.These three cases are graphically illustrated in Figure 1.The adverb "potentially" in the above characterizations highlights the fact that the conditions tested provide necessary but not sufficient conditions for causality.
In a potentially causal (or anticausal) system, the time order is explicitly reflected in the above characterizations.In a potentially HOE causal system, the time order needs to be clarified by defining the principal direction.The most natural indices for this are: (a) the time lag h c maximizing the absolute value of cross-covariance; (b) the mean (time average) µ h of the function g(h); and (c) the median h 1/2 of the function g(h).We note that h c , which was the basis in the original study [2], is completely independent from the g(h).The other two, µ h and h 1/2 , depend on the g(h) that is determined.However, extensive analyses in [7] showed that their estimation is quite robust; for example, the use of the roughness constraint, while affecting the resulting g(h), practically does not affect the values of µ h and h 1/2 .In general, the characteristic lags µ h and h 1/2 do not differ substantially from each other, and any of them could be chosen for further use.Here we prefer to note both, as well as h c , as they all provide useful information about the relationship of the two processes (like in the case of using both mean and median in the characterization of the probability distribution of a single variable).
average)  of the function  ℎ ; and (c) the median ℎ / of the function  ℎ .We note that ℎ , which was the basis in the original study [2], is completely independent from the  ℎ .The other two,  and ℎ / , depend on the  ℎ that is determined.However, extensive analyses in [7] showed that their estimation is quite robust; for example, the use of the roughness constraint, while affecting the resulting  ℎ , practically does not affect the values of  and ℎ / .In general, the characteristic lags  and ℎ / do not differ substantially from each other, and any of them could be chosen for further use.Here we prefer to note both, as well as ℎ , as they all provide useful information about the relationship of the two processes (like in the case of using both mean and median in the characterization of the probability distribution of a single variable).Needless to say, the literature offers a spectrum of alternative methods for estimating an IRF, using different tools such as auto-and cross-correlations functions [8,9], power spectra and cross-spectra based on the Fourier transform [10] or on a wavelet transform [11], as well as principal component analysis [12].The method described above has some advantages over these alternatives, as it is a direct method that can work with time series of observations per se, rather than transformations thereof, being easily understandable and reproducible by any reader using simple computational means.Additionally, it is more reliable as it avoids using uncertain estimates of autocorrelation functions or their more uncertain transformations, such as the power spectrum, i.e., the Fourier transform of the autocorrelation function.Note that here we also used autocorrelation, but only to validate and confirm our results-not in the estimation procedure.
Additionally, as detailed in [6], our method differs conceptually and computationally from the so-called "Granger causality" [13,14] (a misnomer, as the method does not infer causality but prediction) and more recently reformulated in the study of Moraffah et al. [15], which also discusses other similar methods.Finally, our method has substantial differences from a framework proposed by Pearl and collaborators [16][17][18] as again discussed in detail in [6].Needless to say, the literature offers a spectrum of alternative methods for estimating an IRF, using different tools such as auto-and cross-correlations functions [8,9], power spectra and cross-spectra based on the Fourier transform [10] or on a wavelet transform [11], as well as principal component analysis [12].The method described above has some advantages over these alternatives, as it is a direct method that can work with time series of observations per se, rather than transformations thereof, being easily understandable and reproducible by any reader using simple computational means.Additionally, it is more reliable as it avoids using uncertain estimates of autocorrelation functions or their more uncertain transformations, such as the power spectrum, i.e., the Fourier transform of the autocorrelation function.Note that here we also used autocorrelation, but only to validate and confirm our results-not in the estimation procedure.
Additionally, as detailed in [6], our method differs conceptually and computationally from the so-called "Granger causality" [13,14] (a misnomer, as the method does not infer causality but prediction) and more recently reformulated in the study of Moraffah et al. [15], which also discusses other similar methods.Finally, our method has substantial differences from a framework proposed by Pearl and collaborators [16][17][18] as again discussed in detail in [6].

Data and Case Studies
To explore the T-[CO 2 ] relationship, case studies #23 and #24 in [7] used satellite-based temperature series (UAH) for the lower troposphere and [CO 2 ] data from Mauna Loa.Temperature data of the other two satellite levels for the troposphere were also examined, where the results were very similar to those reported for case studies #23 and #24.
Here we present additional case studies, listed in Table 1.In addition to the satellitebased temperature series in [7], here we use surface data (at 2 m) from the NCEP/NCAR Reanalysis 1 by the National Centers for Environmental Prediction (NCEP) and the National Center for Atmospheric Research (NCAR) [19], which are publicly available.The temporal coverage of the NCEP/NCAR Reanalysis 1 includes data collected four times daily to provide daily and monthly values from 1948 to the present at a horizontal resolution of 1.88 • (∼210 km at the equator).It uses a state-of-the-art analysis and forecast system to perform data assimilation using observations and a numerical weather prediction model.The data assimilation and the model used are identical to the global system implemented operationally at NCEP, except for the horizontal resolution.A large subset of the data is available as daily and monthly averages.Table 1.Main case studies and resulting summary indices.∆t is the time step of differencing; h c is the time lag maximizing the cross-covariance c yx (h), or equivalently the cross-correlation r yx (h) := c yx (h)/ c xx (0)c yy (0); µ h is the mean (time average) of the function g(h); h 1/2 is the median of the function g(h); e is the explained variance ratio; and ε is the roughness ratio.The case studies #1 and #2 are contained in [7]  * The roughness was calculated without considering the second derivative at zero.
For CO 2 concentration, in addition to the Mauna Loa data set, which we updated to include the latest measurements of more than one year, we also added the South Pole data set compiled by the US National Oceanic and Atmospheric Administration (NOAA).The measurements began in 1975 and are taken for two cases, flask and in situ, of which we used the former on a mean monthly basis, except in a few cases of missing data where we filled in with in situ data.Some of the analyses presented here refer to climate model outputs.Here we used the mean (CMIP6 mean) of the output series of the Coupled Model Intercomparison Project (CMIP6) averaged over the globe.The model outputs also go back to the past, extending over the time period of 1850-2100.When we studied past behaviors, we used the data up to 2021, as in the other case studies, but in some cases, we also used the entire data set up to 2100.In the latter case, among the scenarios provided, we used Scenario Shared Socio-Economic Pathways 245 (SSP2-4.5, [20]).
The [CO 2 ] time series used in climate models for scenario SSP2-4.5 have also been retrieved and analyzed.We note, though, that these time series are given on an annual timescale, unlike all other data that are provided on a monthly scale.
To check whether the results of our methodology would change if we chose any particular member of the ensemble instead of the mean, we also retrieved outputs from a single model, namely the UK Earth System Model (UKESM1 [21]).For the sake of brevity of this paper, we give this latter analysis (whose results eventually do not differ from those of the CMIP6 mean) in the Supplementary Information (and therefore we do not list it in Table 1).The main case studies in which these data were used are summarized in Table 1, along with the summary indices of g j that are related to potential causality.The details of the case studies are given in the following sections.In all of them, we started by assuming a potentially hen-or-egg (HOE) causal model with a fixed number of weights g j , namely 41 (i.e., J = 20, as in [7]).
Additional case studies examining additional data sets have also been performed but are kept out of the body of the paper and contained in Appendices A.2-A.4.All data used are available online for free and the related links are given in the Data availability section.

Investigating the Maximum Time Span That Modern Data Allow
The longest time series of systematic [CO 2 ] measurements is that of Mauna Loa, which began in 1958.The global temperature at 2 m of the NCEP/NCAR reanalysis series goes back to 1948 and thus allows studying the T-[CO 2 ] relationship for the period 1958-2022 (two additional decades of data in comparison to those studied in [7]).
As seen in Table 1, this provided better characteristics than the UAH/Mauna Loa case examined in [7]: maximum cross correlation r yx (h c ) = 0.56 against 0.48; explained variance e = 34% against 31%, at time lags greater than or equal to those in [7] (close to 8 months).As seen in Figure 2, again, we have a potentially causal system with the directionality being ∆T → ∆ln[CO 2 ] , while the possible causality ∆ln[CO 2 ] → ∆T can be excluded as not satisfying the necessary condition of time precedence.

Investigating the Possible Effect of Seasonality
To enrich our results and also check whether seasonality, as reflected in different phases of [CO2] time series at different latitudes, could modify or possibly reverse the detected causality relationship, we have conducted an additional analysis with the South Pole [CO2] measurements, which began in 1975.
As seen in Table 1, this again provided better characteristics than the UAH/Mauna Loa case examined in [7] in terms of explained variance (e = 35% against 31%) and time lags higher than in [7] (close to 10 months).As seen in Figure 3, again, we have a poten-

Investigating the Possible Effect of Seasonality
To enrich our results and also check whether seasonality, as reflected in different phases of [CO 2 ] time series at different latitudes, could modify or possibly reverse the detected causality relationship, we have conducted an additional analysis with the South Pole [CO 2 ] measurements, which began in 1975.
As seen in Table 1, this again provided better characteristics than the UAH/Mauna Loa case examined in [7] in terms of explained variance (e = 35% against 31%) and time lags higher than in [7] (close to 10 months).As seen in Figure 3, again, we have a potentially causal system with the directionality being ∆T → ∆ln[CO 2 ] , while again the possible causality ∆ln[CO 2 ] → ∆T can be excluded as not satisfying the necessary condition of time precedence.

On the Timescale of Validity of Results
Overall, our results in this paper are those allowed by the available data at the time periods and timescales resolved by those data-more than 6 decades at the monthly scale.What would happen at other times-or if the data sets were longer and would resolve intermediate or even longer timescales-we cannot tell.The climate system is too complex to allow for hasty generalizations.
One may not exclude the case that the timescale of analysis is important in a detected potential causality relationship and that the latter would change if the timescale changed.This raises the question: up to which timescale does the validity of certain results hold?Certainly, this timescale is a fraction (not greater than 1/2) of the length of the time series.An indication can be obtained by inspecting the empirical cross-covariance function and how this compares with the theoretical one.As shown in [6], the latter (  ℎ ∶ cov   ℎ ,   ) for lag ℎ , is related to the autocovariance function of  ,  ℎ ∶ cov   ℎ ,   , by: Thus,  ℎ can be estimated from the IRF and the empirical  ℎ from the data.Figure 4 shows the autocorrelation and cross-correlation functions (which are covariances standardized by standard deviations).For the Δ → Δln CO₂ case (i.e., case study #3; upper panels of Figure 4), the reconstructed cross-correlation function, calculated from the IRF (seen in Figure 2) and the empirical autocorrelation function of temperature (seen in the upper left panel of Figure 4) using the discretized version of equation ( 7), agrees well with the empirical function for time lags up to ±10 years, i.e., spanning 20 years (1/3 of the series length).As time lag has an equivalence relationship with timescale [22], we may conclude that the potential causality relationship detected is valid for timescales of two decades.For comparison, the lower panels of Figure 4 show the reverse case, Δln CO₂ → Summarizing the two case studies in Sections 4 and 5-and similarly to what we found in [7]-we note that:

•
The system T-[CO 2 ] appears to be potentially causal (unidirectional) in the direction ∆T → ∆ln[CO 2 ] , rather than hen-or-egg causal.
(ranging from about 7 to about 10 months), and negative in the direction ∆ln[CO 2 ] → ∆T .

On the Timescale of Validity of Results
Overall, our results in this paper are those allowed by the available data at the time periods and timescales resolved by those data-more than 6 decades at the monthly scale.What would happen at other times-or if the data sets were longer and would resolve intermediate or even longer timescales-we cannot tell.The climate system is too complex to allow for hasty generalizations.
One may not exclude the case that the timescale of analysis is important in a detected potential causality relationship and that the latter would change if the timescale changed.This raises the question: up to which timescale does the validity of certain results hold?Certainly, this timescale is a fraction (not greater than 1/2) of the length of the time series.An indication can be obtained by inspecting the empirical cross-covariance function and how this compares with the theoretical one.As shown in [6], the latter (c yx (h) := cov y(t + h), x(t) ) for lag h, is related to the autocovariance function of x, c xx (h) := cov[x(t + h), x(t)], by: Thus, c yx (h) can be estimated from the IRF and the empirical c xx (h) from the data.Figure 4 shows the autocorrelation and cross-correlation functions (which are covariances standardized by standard deviations).For the ∆T → ∆ln[CO 2 ] case (i.e., case study #3; upper panels of Figure 4), the reconstructed cross-correlation function, calculated from the IRF (seen in Figure 2) and the empirical autocorrelation function of temperature (seen in the upper left panel of Figure 4) using the discretized version of equation ( 7), agrees well with the empirical function for time lags up to ±10 years, i.e., spanning 20 years (1/3 of the series length).As time lag has an equivalence relationship with timescale [22], we may conclude that the potential causality relationship detected is valid for timescales of two decades.For comparison, the lower panels of Figure 4 show the reverse case, ∆ln[CO 2 ] → ∆T , (case study #4), where there is no longer good agreement between empirical and reconstructed cross-correlation, which provides additional support for the claim that the true causality link is A final observation in Figure 4 is the appearance of six peaks in 20 years, which may be interpreted as indicating a quasi-periodic behavior with an average period of 3.33 years, i.e., much different from the annual.However, this does not reflect periodicity but rather antipersistence imposed by the differencing operation, which necessarily results in some negative autocorrelations [23].
A more direct technique to deal with timescales is to average the time series on aggregate timescales and again investigate the causality relationships on these scales.This technique could detect whether a similar relationship holds for the larger timescales.In our case, since we are differencing the process, taking the average at a timescale k is equivalent to taking a difference for a time step k (notice that ( . Therefore, to increase the timescales it suffices to increase the time step of differencing.In Figure 2, this was 1 year.Now we increase the time step of differencing, replacing the 1-year step with 2, 4, 8 and 16 years.The results are shown in Figure 5 and in Table 1 (case studies #5 to #13).They are essentially the same, except that, as the differencing time step increases, the explained variance slightly decreases (from 0.34 down to 0.27) in the direction ∆T → ∆ln[CO 2 ] , and is again much higher than that in the direction ∆ln[CO 2 ] → ∆T .The time lags increase in the former direction and are again negative in the latter direction.
A characteristic pattern is that, as the time step increases, so does the rightmost ordinate of the IRF, g 20 .This behavior, i.e., the increasing limb of g j beyond some time lag, has been explained in [7] and is an artifact of an insufficient (small) window of time lag for determining IRF.Thus, it suggests that a higher J should be used.It is quite reasonable to expect that if the differencing time step increases, so should the window size do.In particular, it is interesting to observe that in the lowest panel of Figure 5, corresponding to a differencing time step of 16 years, while the time window of IRF is only 40 months (a little more than 3 years), the IRF is a monotonously increasing function.Apparently, this is an artifact due to the too small window of time lag.In such a case, it is also reasonable to expect some nonnegative estimates of IRF ordinates for negative lags, even if the system is unidirectionally causal.Indeed, this has appeared in the case of a differencing time step of 16 years, even though the lowest panel of Figure 5 shows the purely unidirectional version of the IRF.This may create some ambiguity in the identification of causality, which we analyze and resolve in the next section.A final observation in Figure 4 is the appearance of six peaks in 20 years, which may be interpreted as indicating a quasi-periodic behavior with an average period of 3.33 years, i.e., much different from the annual.However, this does not reflect periodicity but rather antipersistence imposed by the differencing operation, which necessarily results in some negative autocorrelations [23].
A more direct technique to deal with timescales is to average the time series on aggregate timescales and again investigate the causality relationships on these scales.This technique could detect whether a similar relationship holds for the larger timescales.In our case, since we are differencing the process, taking the average at a timescale k is equivalent to taking a difference for a time step k (notice that     ⋯     ).Therefore, to increase the timescales it suffices to increase the time step of differencing.In Figure 2, this was 1 year.Now we increase the time step of differencing, replacing the 1-year step with 2, 4, 8 and 16 years.The results are shown in Figure 5 and in Table 1 (case studies #5 to #13).They are essentially the same, except that, as the differencing time step increases, the explained variance slightly decreases (from 0.34 down to 0.27) in the direction Δ → Δln CO₂ , and is again much higher than that in the direction Δln CO₂ → Δ.The time lags increase in the former direction and are again negative in the latter direction.
A characteristic pattern is that, as the time step increases, so does the rightmost ordinate of the IRF,  .This behavior, i.e., the increasing limb of  beyond some time lag, has been explained in [7] and is an artifact of an insufficient (small) window of time lag for determining IRF.Thus, it suggests that a higher  should be used.It is quite reasonable to expect that if the differencing time step increases, so should the window size do.In particular, it is interesting to observe that in the lowest panel of Figure 5, corresponding to a differencing time step of 16 years, while the time window of IRF is only 40 months (a little more than 3 years), the IRF is a monotonously increasing function.Apparently, this is an artifact due to the too small window of time lag.In such a case, it is also reasonable to expect some nonnegative estimates of IRF ordinates for negative lags, even if the system is unidirectionally causal.Indeed, this has appeared in the case of a differencing time step of 16 years, even though the lowest panel of Figure 5 shows the purely unidirectional version of the IRF.This may create some ambiguity in the identification of causality, which we analyze and resolve in the next section.

Possible Ambiguities and Disambiguation
In some applications, the detection of the type of causality, unidirectional or HOE, is direct, but in other cases, it may be more difficult.This is illustrated in Figure 6.The left panel is equivalent to that of Figure 2 (left) but not neglecting some very small ordinates  that originally the algorithm produced for negative .The right panel is equivalent to that of Figure 5 (bottom left) but letting the optimization algorithm produce  for negative , which in this particular case seem not negligible.Both figure panels refer to the same processes with the differencing time step being 1 and 16 years for the left and right panels, respectively.For the 16-year step, in comparison to the IRF of Figure 5, which explains a fraction 0.31 of the variance, that of Figure 6 yields a slightly higher explained variance, 0.325, while it has some small weights at negative lags.Should we conclude then that it indicates a potential HOE, rather than unidirectional, causality?Even if our reply is affirmative, it is important to note that the characteristic lags are again positive, suggesting a principal direction Δ → Δln CO₂ .

Possible Ambiguities and Disambiguation
In some applications, the detection of the type of causality, unidirectional or HOE, is direct, but in other cases, it may be more difficult.This is illustrated in Figure 6.The left panel is equivalent to that of Figure 2 (left) but not neglecting some very small ordinates g j that originally the algorithm produced for negative j.The right panel is equivalent to that of Figure 5 (bottom left) but letting the optimization algorithm produce g j for negative j, which in this particular case seem not negligible.Both figure panels refer to the same processes with the differencing time step being 1 and 16 years for the left and right panels, respectively.For the 16-year step, in comparison to the IRF of Figure 5, which explains a fraction 0.31 of the variance, that of Figure 6 yields a slightly higher explained variance, 0.325, while it has some small weights at negative lags.Should we conclude then that it indicates a potential HOE, rather than unidirectional, causality?Even if our reply is affirmative, it is important to note that the characteristic lags are again positive, suggesting However, the reply is not necessarily affirmative.The explained variance of 0.31 is associated with 21 IRF ordinates, while that of 0.325 is associated with 41 IRF ordinates.Is it reasonable to accept 20 additional parameters for an increase in the explained variance of 0.015?Arguably, it is more reasonable in this case to change from a symmetric to a nonsymmetric window of time lag.Hence, we use a window of length 21 and slide it so that the lowest nonzero time lag vary from −20 to 0. Only the case where this is 0 denotes a potential unidirectional causality, where all 20 other cases correspond to potential HOE causality.The resulting IRFs are plotted in Figure 7, while the fraction of explained variance is plotted in Figure 8 as a function of the lowest nonzero time lag.It may be noticed that the curve for the lowest time lag 0 is different from Figure 5 (bottom left).In particular, the ordinate at 0 is higher in Figure 7, thus producing a concave shape of IFR.This is a result of the fact that the window length is fixed, while the lowest ordinate no longer contributes to the roughness (no second derivative can be determined for the lowest point and thus the algorithm can increase the ordinate at 0 at no cost).In turn, the explained variance further increased to 0.327.
Clearly, the result of this investigation is a unidirectional, rather than HOE, potential causality, as the explained variance reaches its maximum when the lowest j is 0. However, the reply is not necessarily affirmative.The explained variance of 0.31 is associated with 21 IRF ordinates, while that of 0.325 is associated with 41 IRF ordinates.Is it reasonable to accept 20 additional parameters for an increase in the explained variance of 0.015?Arguably, it is more reasonable in this case to change from a symmetric to a nonsymmetric window of time lag.Hence, we use a window of length 21 and slide it so that the lowest nonzero time lag vary from 20 to 0. Only the case where this is 0 denotes a potential unidirectional causality, where all 20 other cases correspond to potential HOE causality.The resulting IRFs are plotted in Figure 7, while the fraction of explained variance is plotted in Figure 8 as a function of the lowest nonzero time lag.It may be noticed that the curve for the lowest time lag 0 is different from Figure 5 (bottom left).In particular, the ordinate at 0 is higher in Figure 7, thus producing a concave shape of IFR.This is a result of the fact that the window length is fixed, while the lowest ordinate no longer contributes to the roughness (no second derivative can be determined for the lowest point and thus the algorithm can increase the ordinate at 0 at no cost).In turn, the explained variance further increased to 0.327.
Clearly, the result of this investigation is a unidirectional, rather than HOE, potential causality, as the explained variance reaches its maximum when the lowest  is 0.

Comparing Observational Data with Model Results
Investigation of causality in systems that can be isolated from the environmen generally based on experiments.These are typically performed in laboratories and suppose control actions (intervention) by the experimenter.

Comparing Observational Data with Model Results
Investigation of causality in systems that can be isolated from the environment is generally based on experiments.These are typically performed in laboratories and presuppose control actions (intervention) by the experimenter.In the absence of such intervention, it has long been regarded that we cannot say what causes what (Strotz and Wold, 1960 [24]).In complex systems, such as Earth's climatic system, experimentation is impossible.Yet it is a widespread belief that climate models are faithful representations of the climatic system and hence offer the possibility of so-called in silico experimentation (Hannart et al. [25]).Furthermore, it has been claimed (Hannart and Naveau [26]) that "in silico experimentation [is] the only option" and that "the increasing realism of climate system models renders such an in silico approach plausible".Such claims are epistemologically problematic.A hypothetical "causality" that is incorporated in any model, particularly of a complex system, is not necessarily identical to natural causality.In addition, the agreement of climate model outputs with reality has been questioned (e.g., [27][28][29][30][31]).
Our methodology can help with this epistemological problem in two ways.First, it provides a different option to test causality, showing that the so-called in silico experimentation is not the only option as claimed.Second, it can additionally test whether there is indeed realism in the representation of causality of the climatic system by the climate models.As already stated in the Introduction, our methodology, regardless of the detection of causality per se, can define a type of data analysis that could shed light on modeling performance by comparing observational data with model results.This is particularly useful in the case of climate modeling.In other words, it could help in verifying or falsifying the commonly accepted theory, which is incorporated in the climate models.
Specifically, we can test whether the climate model results are consistent with the findings of our T-[CO 2 ] causality analysis, which is based on measurements.To this aim, we use climate model outputs as specified in Section 3, in the case studies #16 to #23 detailed in Table 1.Numerical results of our analysis are shown in Table 1, and graphical depictions of IRFs are shown in Figure 9 for the cases in which no roughness constraint is used and in Figure 10 for the cases in which the roughness constraint is used.
Unfortunately, and unlike the [CO 2 ] time series of measurements, which are available on a monthly scale, the SSP2 [CO 2 ] data series is provided on an annual scale.Therefore, case studies #16 to #23 had to be made on an annual scale.If we did the analysis for the period 1958-2021, as in case studies #3 and #4 (NCEP/NCAR Reanalysis temperature at 2 m; and Mauna Loa [CO 2 ] time series), the annual data would be too few to support estimation of the IRFs (63 data values to estimate 41 coefficients).Therefore, in case studies #16 to #23 we extended the period back to 1850, which is covered by the climate model outputs.We performed separate analyses for the periods 1850-2100 (entire period covered by climate models) and 1850-2021 (only the past).
The results of these case studies allow us to make the following observations.

1.
There is no essential difference between the results for the periods 1850-2100 and 1850-2021.2.
While, as expected, the IRFs differ if they are calculated with or without constraining roughness, the characteristic lags are similar in the two cases (with the exception of h 1/2 in cases #17 and #21).

3.
In all cases, the lags are negative in the direction ∆T → ∆ln[CO 2 ] and positive in the direction ∆ln[CO 2 ] → ∆T , suggesting a HOE causality with principal direction Clearly, the finding in point 3, resulting from climate model outputs, is opposite to the results found when real measurements are used (NCEP/NCAR Reanalysis temperature and Mauna Loa [CO 2 ] time series).5.
Oddly, while the principal direction suggested by the models is ∆ln[CO 2 ] → ∆T , the explained variance is impressively low (10-15%) in this direction and impressively high (reaching 90%) in the opposite direction, ∆T → ∆ln[CO 2 ] .One may argue that the main result of this analysis, i.e., the point 4 above, may be affected by the difference in the study periods, i.e., 1958-2021 for the real measurements and 1850-2021 for the model outputs.To examine whether the origin of the different behavior is the time period or the system dynamics (actual vs. modeled), we performed an additional analysis, graphically depicted in Figures 11 and 12.
Sci 2023, 5, x FOR PEER REVIEW 15 of 36 1960 [24]).In complex systems, such as Earth's climatic system, experimentation is impossible.Yet it is a widespread belief that climate models are faithful representations of the climatic system and hence offer the possibility of so-called in silico experimentation (Hannart et al. [25]).Furthermore, it has been claimed (Hannart and Naveau [26]) that "in silico experimentation [is] the only option" and that "the increasing realism of climate system models renders such an in silico approach plausible".Such claims are epistemologically problematic.
A hypothetical "causality" that is incorporated in any model, particularly of a complex system, is not necessarily identical to natural causality.In addition, the agreement of climate model outputs with reality has been questioned (e.g., [27][28][29][30][31]).
Our methodology can help with this epistemological problem in two ways.First, it provides a different option to test causality, showing that the so-called in silico experimentation is not the only option as claimed.Second, it can additionally test whether there is indeed realism in the representation of causality of the climatic system by the climate models.As already stated in the Introduction, our methodology, regardless of the detection of causality per se, can define a type of data analysis that could shed light on modeling performance by comparing observational data with model results.This is particularly useful in the case of climate modeling.In other words, it could help in verifying or falsifying the commonly accepted theory, which is incorporated in the climate models.
Specifically, we can test whether the climate model results are consistent with the findings of our T-[CO2] causality analysis, which is based on measurements.To this aim, we use climate model outputs as specified in Section 3, in the case studies #16 to #23 detailed in Table 1.Numerical results of our analysis are shown in Table 1, and graphical depictions of IRFs are shown in Figure 9 for the cases in which no roughness constraint is used and in Figure 10 for the cases in which the roughness constraint is used.The results of these case studies allow us to make the following observations.
1.There is no essential difference between the results for the periods 1850-2100 and 1850-2021.2. While, as expected, the IRFs differ if they are calculated with or without constraining roughness, the characteristic lags are similar in the two cases (with the exception of ℎ / in cases #17 and #21).
3. In all cases, the lags are negative in the direction Δ → Δln CO₂ and positive in the direction Δln CO₂ → Δ , suggesting a HOE causality with principal direction Δln CO → Δ. 4. Clearly, the finding in point 3, resulting from climate model outputs, is opposite to the results found when real measurements are used (NCEP/NCAR Reanalysis temperature and Mauna Loa [CO2] time series).5. Oddly, while the principal direction suggested by the models is Δln CO₂ → Δ, the explained variance is impressively low (10-15%) in this direction and impressively high (reaching 90%) in the opposite direction, Δ → Δln CO₂ .
One may argue that the main result of this analysis, i.e., the point 4 above, may be affected by the difference in the study periods, i.e., 1958-2021 for the real measurements and 1850-2021 for the model outputs.To examine whether the origin of the different behavior is the time period or the system dynamics (actual vs. modeled), we performed an additional analysis, graphically depicted in Figures 11 and 12.
The upper left panel of Figure 11 is similar to that in Figure 4 (upper right), where, in addition, we have plotted the empirical cross-correlation function for the same data but averaged at the annual scale.This agrees relatively well with the empirical function at the monthly scale, which provides a basis for the comparisons that follow.
The empirical cross-correlation function at the monthly scale is copied in all other panels of Figure 11 and serves as a basis for the comparisons.In the upper right panel, where we replaced the Mauna Loa time series for [CO2] with the CMIP6 time series for the same period, 1958-2021, while keeping the NCEP/NCAR time series for T, there is still agreement of the annual with the monthly cross-correlations.However, when we also replace the NCEP/NCAR time series for T with the CMIP6 series (lower left plot), the two plots decouple.The decoupling is even more prominent if we go to the longer period 1850-2021 (lower right panel).
Similar observations can be made about autocorrelations in Figure 12.In particular, the CMIP6 autocorrelation of T decouples from the actual one for both periods 1958-2021 and 1850-2021, while the two latter do not differ substantially from each other.These observations allow us to assert that the main cause of disagreement has to do with problems with the modeling of system dynamics rather than the time period of study.The upper left panel of Figure 11 is similar to that in Figure 4 (upper right), where, in addition, we have plotted the empirical cross-correlation function for the same data but averaged at the annual scale.This agrees relatively well with the empirical function at the monthly scale, which provides a basis for the comparisons that follow.
The empirical cross-correlation function at the monthly scale is copied in all other panels of Figure 11 and serves as a basis for the comparisons.In the upper right panel, where we replaced the Mauna Loa time series for [CO 2 ] with the CMIP6 time series for the same period, 1958-2021, while keeping the NCEP/NCAR time series for T, there is still agreement of the annual with the monthly cross-correlations.However, when we also replace the NCEP/NCAR time series for T with the CMIP6 series (lower left plot), the two plots decouple.The decoupling is even more prominent if we go to the longer period 1850-2021 (lower right panel).
Similar observations can be made about autocorrelations in Figure 12.In particular, the CMIP6 autocorrelation of T decouples from the actual one for both periods 1958-2021 and 1850-2021, while the two latter do not differ substantially from each other.These observations allow us to assert that the main cause of disagreement has to do with problems with the modeling of system dynamics rather than the time period of study.

Discussion and Further Results
The mainstream assumption of the causality direction [CO 2 ] → T makes a compelling narrative, as everything is blamed on a single cause, the human CO 2 emissions.Indeed, this has been the popular narrative for decades.However, popularity does not necessarily mean correctness, and here we have provided strong arguments against this assumption.Since we have identified atmospheric temperature as the cause and atmospheric CO 2 concentration as the effect, one may be tempted to ask the question: What is the cause of the modern increase in temperature?Apparently, this question is much more difficult to reply to, as we can no longer attribute everything to any single agent.
We do not claim to have the answer to this question, whose study is far beyond the article's scope.Neither do we believe that mainstream climatic theory, which is focused upon human CO 2 emissions as the main cause and regards everything else as feedback of the single main cause, can explain what happened on Earth for 4.5 billion years of changing climate.
Nonetheless, as a side product, in the Appendices to the paper, we provide several indications of the following: 1.
The dependence of the carbon cycle on temperature is quite strong and indeed major increases of [CO 2 ] can emerge as a result of temperature rise.In other words, we show that the natural [CO 2 ] changes due to temperature rise are far larger (by a factor > 3) than human emissions (Appendix A.1).

2.
There are processes, such as the Earth's albedo (which is changing in time as any other characteristic of the climate system), the El Niño-Southern Oscillation (ENSO) and the ocean heat content in the upper layer (represented by the vertically averaged temperature in the layer 0-100 m), which are potential causes of the temperature increase, unlike what is observed with [CO 2 ], their changes precede those of temperature (Appendices A.2-A.4).

3.
On a large timescale, the analysis of paleoclimatic data supports the primacy of the causal direction T → [CO 2 ], even though some controversy remains about this issue (Appendix A.5).
In terms of the carbon cycle (point 1 above), several physical, chemical, biochemical and human processes are involved in it.The human CO 2 emissions due to the burning of fossil fuels have largely increased since the beginning of the industrial age.However, the global temperature increase began succeeding the Little Ice Period, at a time when human CO 2 emissions were very low.To cast light on the problem, we examine the issue of CO 2 emissions vs. atmospheric temperature further in the Supplementary Information, where we provide evidence that they are not correlated with each other.The outgassing from the sea is also highlighted sometimes in the literature among the climate-related mechanisms.On the other hand, the role of the biosphere and biochemical reactions is often downplayed, along with the existence of complex interactions and feedback.This role can be summarized in the following points, examined in detail and quantified in Appendix A.1.

•
Terrestrial and maritime respiration and decay are responsible for the vast majority of CO 2 emissions [32], Figure 5.12.

•
Overall, natural processes of the biosphere contribute 96% to the global carbon cycle, the rest, 4%, being human emissions (which were even lower in the past [33]).

•
The biosphere is more productive at higher temperatures, as the rates of biochemical reactions increase with temperature, which leads to increasing natural CO 2 emission [2].

•
Additionally, a higher atmospheric CO 2 concentration makes the biosphere more productive via the so-called carbon fertilization effect, thus resulting in greening of the Earth [34,35], i.e., amplification of the carbon cycle, to which humans also contribute through crops and land-use management [36].
In addition to the biosphere, there are other factors that drive the Earth's climate in periodic and non-periodic way.Orbital parameters of Earth's revolution change quasicyclically in a multi-millennial scale (variations in eccentricity, axial tilt, and precession of Earth's orbit), as interpreted by Milanković [37][38][39][40][41], and changes in the orbit geometry influence the amount of insolation.The non-periodic drivers of the Earth's climate variability include volcanic eruptions and collisions with large extraterrestrial objects, e.g., asteroids.An important climate driver is water in its three phases [33].Another apparent factor is solar activity (including solar cycles) and the solar radiation (im)balance on Earth (e.g., albedo changes; see [33] and Appendix A.2). Notably, a recent study [42], by assessing 20 years of direct observations of energy imbalance from Earth-orbiting satellites, showed that the global changes observed appear largely from reductions in the amount of sunlight scattered by Earth's atmosphere.
ENSO and ocean heating, both of which affect temperature, are examined in Appendices A.3 and A.4, respectively.The results of Appendices A.2-A.4 are summarized in the schematic of Figure 13.Changes in all three examined processes, albedo, ENSO and the upper ocean heat, precede in time the changes in temperature and even more so those in [CO 2 ].Generally, the time lags shown in Figure 13 complete a consistent picture of potential causality links among climate processes and always confirm the T → [CO 2 ] direction.
Sci 2023, 5, x FOR PEER REVIEW 20 of 36 In addition to the biosphere, there are other factors that drive the Earth's climate in periodic and non-periodic way.Orbital parameters of Earth's revolution change quasicyclically in a multi-millennial scale (variations in eccentricity, axial tilt, and precession of Earth's orbit), as interpreted by Milanković [37][38][39][40][41], and changes in the orbit geometry influence the amount of insolation.The non-periodic drivers of the Earth's climate variability include volcanic eruptions and collisions with large extraterrestrial objects, e.g., asteroids.An important climate driver is water in its three phases [33].Another apparent factor is solar activity (including solar cycles) and the solar radiation (im)balance on Earth (e.g., albedo changes; see [33] and Appendix A2).Notably, a recent study [42], by assessing 20 years of direct observations of energy imbalance from Earth-orbiting satellites, showed that the global changes observed appear largely from reductions in the amount of sunlight scattered by Earth's atmosphere.
ENSO and ocean heating, both of which affect temperature, are examined in Appendices A3 and A4, respectively.The results of Appendices A2-A4 are summarized in the schematic of Figure 13.Changes in all three examined processes, albedo, ENSO and the upper ocean heat, precede in time the changes in temperature and even more so those in [CO2].Generally, the time lags shown in Figure 13 complete a consistent picture of potential causality links among climate processes and always confirm the  → CO₂ direction.
The examined processes in the Appendices are internal to the climatic system.Other processes affecting T, not examined here, could also be external (e.g., solar and astronomical [43,44] and geological [45][46][47][48][49]).Generally, in complex systems, an identified causal link, even though it gives some explanation of a phenomenon, raises additional questions, e.g., what caused the change in the identified cause, etc.In turn, causal links in complex systems may form endless sequences.For this reason, it is naïve to expect complete answers to problems related to complex systems or to assume that a complex system is in permanent equilibrium and that an external agent is needed to "kick" it out of the equilibrium and produce change.Yet the investigation of a single causal link between two processes, as is the focus of this paper, provides useful information, with possible significant scientific, technical, practical, epistemological and philosophical implications.These are not covered in this paper.Readers interested in epistemological and philosophical aspects of causality are referred to Koutsoyiannis et al. [6], while those interested in the perennial changes in complex systems are referred to Koutsoyiannis [50,51].The examined processes in the Appendices are internal to the climatic system.Other processes affecting T, not examined here, could also be external (e.g., solar and astronomical [43,44] and geological [45][46][47][48][49]).Generally, in complex systems, an identified causal link, even though it gives some explanation of a phenomenon, raises additional questions, e.g., what caused the change in the identified cause, etc.In turn, causal links in complex systems may form endless sequences.For this reason, it is naïve to expect complete answers to problems related to complex systems or to assume that a complex system is in permanent equilibrium and that an external agent is needed to "kick" it out of the equilibrium and produce change.Yet the investigation of a single causal link between two processes, as is the focus of this paper, provides useful information, with possible significant scientific, technical, practical, epistemological and philosophical implications.These are not covered in this paper.Readers interested in epistemological and philosophical aspects of causality are referred to Koutsoyiannis et al. [6], while those interested in the perennial changes in complex systems are referred to Koutsoyiannis [50,51].
As already clarified, the scope of our work is not to provide detailed modeling of the processes studied but to check causality conditions.We highlight the fact that the relationship we established explains only about 1/3 of the actual variance of ∆ln[CO 2 ].This is not negligible for investigating causality, but also leaves a margin for many other climatic factors to act.
Nonetheless, our results can certainly be improved if we change our scope to more detailed modeling.To illustrate this, we provide the following toy model.Based on our result that the T-[CO 2 ] system is potentially causal with direction ∆T → ∆ln and we proceed a step further, assuming that the mean µ v also depends on past temperature, averaged at timescale m, i.e., where T m is the average temperature of the previous m years, and α and T 0 are constants (parameters).Such a simple linear relationship is supported by the above-listed points related to the productivity of the biosphere.Equation (9) will result in negative values µ v if T m < T 0 and positive otherwise.By re-evaluating the IRF coordinates g j simultaneously with the parameters of Equation (9), we find the modified version of the IRF plotted in Figure 14.The optimized additional parameters are m = 4 (years), α = 0.0034, T 0 = 285.84K. Similarly to [6], we used a common spreadsheet software solver to perform the optimization, adding the two parameters α and T 0 to the unknown coordinates g j of the IRF and performing the (nonlinear) optimization for all unknowns (m was found by trial-and-error).A graphical comparison of the actual ∆ln[CO 2 ] and [CO 2 ] with those simulated by the model of Equa- tions ( 8) and ( 9) is given in Figure 15.The explained variance for ∆ln[CO 2 ] was drastically increased from 34% to 55.5% and that for [CO 2 ] is an impressive 99.9%.
For the convenience of the readers who are interested in repeating the calculations, we also give a parametric expression of IRF and summarize the toy model as: g j = 0.00076 j 0.67 e −0.2j /K, µ v = 0.0034 (T 4 /K − 285.84) (10) (where K is the unit of kelvin).
We emphasize, however, that we do not exploit the impressive result of explained variance of 99.9% to assert that we have built a decent model, even though this toy model is both accurate (in the lower panel of Figure 15, the simulated curve is indistinguishable from the actual) and parsimonious (the model expression in (10) contains only 5 fitted parameters).We prefer to highlight the fact that the hugely complex climate system entails high uncertainty, and its study needs reliable data that provide the basis for the modeling and testing of hypotheses.8) and (9).

Conclusions
With reference to points 1-7 of the Introduction setting the paper's scope, the results of our analyses can be summarized as follows.

1.
All evidence resulting from the analyses of the longest available modern time series of atmospheric concentration of [CO 2 ] at Mauna Loa, Hawaii, along with that of globally averaged T, suggests a unidirectional, potentially causal link with T as the cause and [CO 2 ] as the effect.This direction of causality holds for the entire period covered by the observations (more than 60 years).

2.
Seasonality, as reflected in different phases of [CO 2 ] time series at different latitudes, does not play any role in potential causality, as confirmed by replacing the Mauna Loa [CO 2 ] time series with that in South Pole.

3.
The unidirectional T → ln[CO 2 ] potential causal link applies to all timescales resolved by the available data, from monthly to about two decades.4.
The proposed methodology is simple, flexible and effective in disambiguating cases where the type of causality, HOE or unidirectional, is not quite clear.5.
Furthermore, the methodology defines a type of data analysis that, regardless of the detection of causality per se, assesses modeling performance by comparing observational data with model results.In particular, the analysis of climate model outputs reveals a misrepresentation of the causal link by these models, which suggest a causality direction opposite to the one found when the real measurements are used.6.
Extensions of the scope of the methodology, i.e., from detecting possible causality to building a more detailed model of stochastic type, are possible, as illustrated by a toy model for the T-[CO 2 ] system, with explained variance of [CO 2 ] reaching an impressive 99.9%.7.
While some of the findings of this study seem counterintuitive or contrary to mainstream opinions, they are logically and computationally supported by arguments and calculations given in the Appendices.
Overall, the stochastic notion of a causal system, based on the concept of the impulse response function, proved to be very effective in studying demanding causality problems.A crucial characteristic of our methodology is its direct use of the data per se, in contrast with other methodologies that are based on uncertain estimates of autocorrelation functions or on the more uncertain tool of the power spectrum, i.e., the Fourier transform of the autocorrelation function.The methodology has the potential for further advances, as we first reported here (e.g., the asymmetric time lag window, the definition of a type of data analysis to be used in assessing modeling performance, and the extensions of its scope from detecting possible causality to building a more detailed model).

Supplementary Materials:
The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/sci5030035/s1:Section SI1, Additional analysis of climate model behavior; Section SI2, On correlations of temperature with CO 2 emissions.Funding: This research received no external funding but was motivated by the scientific curiosity of the authors.
Institutional Review Board Statement: Not applicable.

Figure A1.
Annual carbon balance in the Earth's atmosphere in Gt C/year, based on the IPCC [32] estimates.The balance of 5.1 Gt C/year is the annual accumulation of carbon (in the form of CO2) in the atmosphere.
The greatest part of the inflows is due to the respiration of the biosphere, i.e., the biochemical reaction whereby living organisms convert organic matter (e.g., glucose) to CO2, releasing energy and consuming molecular oxygen [32].As seen in Figure A1 (and in several publications, e.g., [55]), respiration has increased in recent years, the main reason for this being the increased temperature.Photosynthesis, the biochemical process that removes CO2 from the atmosphere, producing carbohydrates in plants, algae and bacteria using the energy of light [32], has also increased, resulting in the greening of Earth [34][35][36] due to the increased atmospheric concentration of CO2, which is plants' food.
It is not difficult to quantify the increase in respiration due to the temperature rise.The mechanism has been known in chemistry for more than a century.The rate of a chemical reaction  at temperature T is an increasing function of T, given by the Arrhenius equation [56]: where A and a are free parameters and  * is the universal gas constant.Typically, the rate is measured in moles per unit volume, but it can readily be expressed in mass units.
Expressing the relationship at a reference temperature  and dividing with (A1), we obtain: and assuming that Δ/ is small (nb.,  is of the order of 300 K, while typical values of Δ is of the order of 1-10 K).We can neglect all terms beyond first order and find: The greatest part of the inflows is due to the respiration of the biosphere, i.e., the biochemical reaction whereby living organisms convert organic matter (e.g., glucose) to CO 2 , releasing energy and consuming molecular oxygen [32].As seen in Figure A1 (and in several publications, e.g., [55]), respiration has increased in recent years, the main reason for this being the increased temperature.Photosynthesis, the biochemical process that removes CO 2 from the atmosphere, producing carbohydrates in plants, algae and bacteria using the energy of light [32], has also increased, resulting in the greening of Earth [34][35][36] due to the increased atmospheric concentration of CO 2 , which is plants' food.
It is not difficult to quantify the increase in respiration due to the temperature rise.The mechanism has been known in chemistry for more than a century.The rate of a chemical reaction k T at temperature T is an increasing function of T, given by the Arrhenius equation [56]: where A and a are free parameters and R * is the universal gas constant.Typically, the rate is measured in moles per unit volume, but it can readily be expressed in mass units.
Expressing the relationship at a reference temperature T 0 and dividing with (A1), we obtain: Taking the logarithms and setting ∆T := T − T 0 we find ) and assuming that ∆T/T 0 is small (nb., T 0 is of the order of 300 K, while typical values of ∆T is of the order of 1-10 K).We can neglect all terms beyond first order and find: where Notice that both Q 1 and Q 10 are dimensionless numbers > 1.The exponential expression in which Q 10 is raised to power ∆T/10 is known as the Q 10 model [57].
The exponential increase of the process rate with temperature is a general chemical behavior, also extending to biochemical reactions.This is not a hypothesis or speculation but a proven fact that is widely used in engineering.For example, the metabolic rate in wastewater and sewer systems is expressed by the so-called effective BOD (EBOD, with BOD standing for biochemical oxygen demand).It has been known for more than 75 years that the metabolic rate increases with temperature, as microorganism activity generally increases with temperature.The relationship of EBOD with temperature has been expressed by Pomeroy and Bowlus [58] as [EBOD] = [BOD] (1.07) T−20 , which is similar to (A4), where the reference temperature is T 0 = 20 • C and Q 1 = 1.07 (Q 10 = 2.0).The latter relationship is the standard of engineering design in sewer systems.
To apply this framework to find the increase of respiration in the last 65-year period investigated in our study, we first retrieved the global temperature separately for land and sea from the NCEP/NCAR Reanalysis data set.These are shown on an annual timescale in Figure A2.The resulting linear trends, also shown in Figure A2, yield an increase ∆T = 1.69 • C for the terrestrial and 0.78 • C for the maritime part for the 65-year period.Now the literature gives representative average Q 10 values of 3.05 for terrestrial respiration [57] and 4.07 for maritime respiration [59].If R B and R E denote the respiration rate at the beginning and the end of the 65-year period, and ∆R := R E − R B , then according to (A4), and hence Notice that both  and  are dimensionless numbers > 1.The exponential expression in which  is raised to power Δ/10 is known as the  model [57].
The exponential increase of the process rate with temperature is a general chemical behavior, also extending to biochemical reactions.This is not a hypothesis or speculation but a proven fact that is widely used in engineering.For example, the metabolic rate in wastewater and sewer systems is expressed by the so-called effective BOD (EBOD, with BOD standing for biochemical oxygen demand).It has been known for more than 75 years that the metabolic rate increases with temperature, as microorganism activity generally increases with temperature.The relationship of EBOD with temperature has been expressed by Pomeroy and Bowlus [58] as EBOD BOD 1.07 , which is similar to (A4), where the reference temperature is  20 °C and  1.07 ( 2.0 .The latter relationship is the standard of engineering design in sewer systems. To apply this framework to find the increase of respiration in the last 65-year period investigated in our study, we first retrieved the global temperature separately for land and sea from the NCEP/NCAR Reanalysis data set.These are shown on an annual timescale in Figure A2.The resulting linear trends, also shown in Figure A2, yield an increase Δ 1.69 °C for the terrestrial and 0.78 °C for the maritime part for the 65-year period.For the above given values of Q 10 and ∆T, the expression in parentheses becomes 0.172 for the terrestrial part and 0.104 for the maritime part.Multiplying these by the R E values shown in Figure A1, i.e., 136.7 and 77.6 Gt C/year, respectively, we find ∆R = 23.5 and 8.1 Gt C/year, respectively, i.e., a total global increase in the respiration rate of ∆R = 31.6Gt C/year.This rate, which is a result of natural processes, is 3.4 times greater than the CO 2 emission by fossil fuel combustion (9.4 Gt C /year including cement production).

Appendix A.2. Investigation of Causality between Albedo and Atmospheric Temperature
There are several factors causing changes to the Earth's temperature.Solar radiation is a principal one, yet the changes in it have not been substantial at timescales of a few years.However, the Earth's response to solar radiation may change on such scales.Here, we investigate the changes of Earth's albedo.In the 21st century, the albedo at the top of the atmosphere (TOA) can be estimated from satellite data.Specifically, this can be done using the data from the ongoing project Clouds and the Earth's Radiant Energy System (CERES).This is part of NASA's Earth Observing System, designed to measure both solar-reflected and Earth-emitted radiation from the TOA to the Earth's surface.The data we used here are from the TERRA platform for the monthly timescale and are available online [60] for the period of March 2000 to date.The global TOA albedo was calculated as the ratio for each month of the globally integrated observed TOA shortwave flux (all-sky) over the globally observed TOA solar insolation flux.The resulting time series is shown in Figure A3.A falling linear trend of -0.0019/decade is also shown in the figure .A falling trend means that less solar radiation is reflected by the Earth, which may result in an increase in temperature.For the entire period, the decline of the albedo is about 0.004.As the average incoming solar radiation (insolation), according to the same data set, is 340 W/m 2 , this implies a difference (imbalance) of received energy by Earth of 0.004 × 340 = 1.4 W/m 2 .This result does not disagree with that of Hansen et al. [54], who found that in the period January 2015 through March 2022, the global absorbed solar energy is +1.01 W/m 2 higher than the mean for the first 10 years of data (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009).These figures are greater than the average imbalance (net absorbed energy) of the Earth, which, if calculated from the ocean heat content data, is about 0.4 W/m 2 [33].According to mainstream science, this imbalance is attributed to the increase of [CO 2 ], but the analyses in this study do not support this hypothesis.Moreover, it is hard to see how the albedo fall could be caused by increased [CO 2 ] (and for this reason, it is usually blamed on aerosols).However, the potential causal link of albedo (α) with atmospheric T can be more thoroughly investigated by the stochastic framework discussed here.The resulting characteristics are shown in Table A1 and the resulting IRFs are shown in Figure A4.Notice that, as an increase of α is expected to cause a decrease of T, we changed the sign in albedo However, the potential causal link of albedo (α) with atmospheric T can be more thoroughly investigated by the stochastic framework discussed here.The resulting characteristics are shown in Table A1 and the resulting IRFs are shown in Figure A4.Notice that, as an increase of α is expected to cause a decrease of T, we changed the sign in albedo differences (−∆α) so as to have a positive correlation with temperature differences (∆T).
The IRF determined suggests a potential HOE causation, with principal direction α → T and time lags of 1-3 months.However, the explained variance is small, 13%.However, the potential causal link of albedo (α) with atmospheric T can be more thoroughly investigated by the stochastic framework discussed here.The resulting characteristics are shown in Table A1 and the resulting IRFs are shown in Figure A4.Notice that, as an increase of α is expected to cause a decrease of T, we changed the sign in albedo differences ( Δ) so as to have a positive correlation with temperature differences (Δ).
The IRF determined suggests a potential HOE causation, with principal direction  →  and time lags of 1-3 months.However, the explained variance is small, 13%.A second process that is known to affect atmospheric temperature globally is the El Niño-Southern Oscillation (ENSO) (see [12,61] and a recent study by Kundzewicz et al. [62] for details).ENSO is associated with irregular variations in sea surface temperature and air pressure over the tropical Pacific Ocean.Several indices associated with ENSO are used in climatic studies, among which the most popular is US NOAA's Southern Oscillation Index (SOI), the time series of which is plotted in Figure A5.
Our stochastic methodology was previously applied with SOI along with satellite temperature data for the period 1979-2021 in [7].Here we repeat the investigation replacing the temperature data with those of the NCEP/NCAR reanalysis and expanding the data back to 1951 (the beginning of the availability of SOI data) and forth to 2022.We also examined the potential causality between SOI and [CO 2 ].In both cases, we tested differences with a time step of differencing of 1 year (thus reducing the effect of seasonality) and to make the correlation positive, we used −∆SOI (as in the albedo case).
ences with a time step of differencing of 1 year (thus reducing the effect of seasonality) and to make the correlation positive, we used ΔSOI (as in the albedo case).
The results are shown in Table A2, Figures A6 and A7.The principal directions are SOI →  and SOI → CO₂ .In the former case, the explained variance is 33%, and the causality type is HOE but very close to unidirectional with a time lag of 4 months.In the latter case, the explained variance is 30%, and the causality type is unidirectional with a lag of about a year.The results are shown in Table A2, Figures A6 and A7.The principal directions are SOI → T and SOI → [CO 2 ] .In the former case, the explained variance is 33%, and the causality type is HOE but very close to unidirectional with a time lag of 4 months.In the latter case, the explained variance is 30%, and the causality type is unidirectional with a lag of about a year.A third process examined in connection to both T and [CO 2 ] is the heat content of the upper-layer ocean.This is indirectly represented by data of the upper ocean mean temperature of the layer 0-100 m [63] (OMT0-100), also known as Vertically Averaged Temperature Anomaly (0-100 m layer) [64].These are based on historical ocean temperature data, bathythermograph data and Argo data [65] and are made available at the timescale of three months by the National Oceanographic Data Center of the US NOAA [64], as well as by the ClimExp platform, which we used to download the data.The time series, starting in 1955, is plotted in Figure A8.A third process examined in connection to both T and [CO2] is the heat content of the upper-layer ocean.This is indirectly represented by data of the upper ocean mean temperature of the layer 0-100 m [63] (OMT0-100), also known as Vertically Averaged Temperature Anomaly (0-100 m layer) [64].These are based on historical ocean temperature data, bathythermograph data and Argo data [65] and are made available at the timescale of three months by the National Oceanographic Data Center of the US NOAA [64], as well as by the ClimExp platform, which we used to download the data.The time series, starting in 1955, is plotted in Figure A8.The results of the stochastic analyses are shown in Table A3, Figures A9 and A10.The principal directions are OMT0-100 →  and OMT0-100 → CO₂ and the causality type is HOE.In the former case, the explained variance is substantial, 53%, and the time lag is 3-7 months, depending on the metric used (nb., the time lags in Table A3 are given in 3monthly steps).In the latter case, the explained variance is 35%, and the time lag is greater, 7-9 months.The results of the stochastic analyses are shown in Table A3, Figures A9 and A10.The principal directions are OMT0-100 → T and OMT0-100 → [CO 2 ] and the causality type is HOE.In the former case, the explained variance is substantial, 53%, and the time lag is 3-7 months, depending on the metric used (nb., the time lags in Table A3 are given in 3-monthly steps).In the latter case, the explained variance is 35%, and the time lag is greater, 7-9 months.Table A3.Summary indices of the case studies related to OMT0-100.Data are on a three-month timescale and the time step of differencing is 1 year; for explanation of symbols see Table 1.Table A3.Summary indices of the case studies related to OMT0-100.Data are on a three-month timescale and the time step of differencing is 1 year; for explanation of symbols see Table 1.

Figure 1 .
Figure 1.Explanatory sketch for the definition of the different potential causality types.For each graph, the mean  is also plotted with a dashed line.

Figure 1 .
Figure 1.Explanatory sketch for the definition of the different potential causality types.For each graph, the mean µ h is also plotted with a dashed line.

Figure 4 .
Figure 4. (Left column) Empirical autocorrelation function for the period 1958-2021 and for monthly timescale of (upper) the NCEP/NCAR Δ time series and (lower) the Δln CO₂ time series for Mauna Loa.(Right column) Empirical cross-correlation function of the two time series (continuous lines in blue), compared with those reconstructed from the IRF and the autocorrelation function on the left panel using the discretized version of Equation (7) (dashed line), for case studies (upper) #3 and (lower) #4.

Figure 4 .
Figure 4. (Left column) Empirical autocorrelation function for the period 1958-2021 and for monthly timescale of (upper) the NCEP/NCAR ∆T time series and (lower) the ∆ln[CO 2 ] time series for Mauna Loa.(Right column) Empirical cross-correlation function of the two time series (continuous lines in blue), compared with those reconstructed from the IRF and the autocorrelation function on the left panel using the discretized version of Equation (7) (dashed line), for case studies (upper) #3 and (lower) #4.

36 Figure 6 .
Figure 6.IRFs for temperature-CO2 concentration based on the NCEP/NCAR Reanalysis temperature at 2 m and Mauna Loa time series, respectively, also enabling negative lags (HOE) for causality direction Δ → Δln CO₂ and for differencing time step of 1 year (left, corresponding to Figure 2, left) and 16 years (right, corresponding to Figure 5, bottom left).

Figure 6 .Figure 7 .
Figure 6.IRFs for temperature-CO 2 concentration based on the NCEP/NCAR Reanalysis temperature at 2 m and Mauna Loa time series, respectively, also enabling negative lags (HOE) for causality direction ∆T → ∆ln[CO 2 ] and for differencing time step of 1 year (left, corresponding to Figure 2, left) and 16 years (right, corresponding to Figure 5, bottom left).Sci 2023, 5, x FOR PEER REVIEW 1

Figure 7 .
Figure 7. IRFs for temperature-CO 2 concentration based on the NCEP/NCAR Reanalysis temperature at 2 m and Mauna Loa time series, for 21 time lags, differencing time step of 16 years and direction ∆T → ∆ln[CO 2 ].The lowest nonzero lag of each IRF is marked at the upper-right end of its curve.

Figure 7 .
Figure 7. IRFs for temperature-CO2 concentration based on the NCEP/NCAR Reanalysis temp ture at 2 m and Mauna Loa time series, for 21 time lags, differencing time step of 16 years and rection Δ → Δln CO₂ .The lowest nonzero lag of each IRF is marked at the upper-right end o curve.

Figure 8 .
Figure 8. Explained variance and median of IRF as a function of the lowest nonzero lag of the in Figure 7 for the investigation of Section 7.

Figure 8 .
Figure 8. Explained variance and median of IRF as a function of the lowest nonzero lag of the IRFs in Figure 7 for the investigation of Section 7.

Figure 9 .Figure 9 .
Figure 9. IRFs for temperature-CO2 concentration based on the CMIP6 mean temperature and SSP2-4.5 CO2 time series, respectively, calculated without using the roughness constraint; upper row:

Figure 11 .Figure 11 .
Figure11.Empirical cross-correlation functions for monthly and annual timescales (continuous lines in blue without markers and red lines with circles, respectively) for the data sets indicated in each panel.In all panels, the plot for the monthly scale is that of the NCEP/NCAR data for T and

Figure 12 .Figure 12 .
Figure12.Empirical autocorrelation functions for monthly and annual timescales (continuous lines in blue without markers and red lines with circles, respectively) for the data sets indicated in each panel.In all panels, the plot for the monthly scale is that of the NCEP/NCAR data for T and Mauna Loa data for [CO2], for the period 1958-2021.

(Figure 13 .
Figure13.Schematic of the examined possible causal links in the climatic system, with noted types of potential causality, HOE or unidirectional, and its direction.Other processes, not examined here, could be internal of the climatic system or external.

Sci 2023, 5 , 36 Figure 14 .
Figure 14.Modified IRF for temperature-CO2 concentration based on the NCEP/NCAR Reanalysis temperature at 2 m and Mauna Loa time series, respectively, similar to Figure 2 but with IRF coordinates simultaneously optimized with the parameters of Equation (9).

Figure 14 . 36 Figure 14 .
Figure 14.Modified IRF for temperature-CO 2 concentration based on the NCEP/NCAR Reanalysis temperature at 2 m and Mauna Loa time series, respectively, similar to Figure 2 but with IRF coordinates simultaneously optimized with the parameters of Equation (9).

Figure A1 .
Figure A1.Annual carbon balance in the Earth's atmosphere in Gt C/year, based on the IPCC[32] estimates.The balance of 5.1 Gt C/year is the annual accumulation of carbon (in the form of CO 2 ) in the atmosphere.

Figure A2 .Figure A2 .
Figure A2.Evolution of global land (terrestrial) and sea (maritime) temperature at 2 m from the NCEP/NCAR Reanalysis data set, retrieved from the ClimExp platform, and resulting slopes of linear trends.Now the literature gives representative average  values of 3.05 for terrestrial respiration [57] and 4.07 for maritime respiration [59].If  and  denote the respiration

Figure A3 .
Figure A3.TOA albedo time series (continuous line), as provided by NASA's Clouds and the Earth's Radiant Energy System (CERES), along with linear trend (dashed line).

Figure A3 .
Figure A3.TOA albedo time series (continuous line), as provided by NASA's Clouds and the Earth's Radiant Energy System (CERES), along with linear trend (dashed line).

Figure A5 .
Figure A5.SOI time series (continuous line) along with rolling (right-aligned) 10-year average (dashed line).Negative and positive values indicate the El Niño and La Niña phases, respectively.

Figure A5 .
Figure A5.SOI time series (continuous line) along with rolling (right-aligned) 10-year average (dashed line).Negative and positive values indicate the El Niño and La Niña phases, respectively.

Sci 2023, 5 ,
x FOR PEER REVIEW 31 of 36 Appendix A4.Investigation of Causality between Ocean Heat Content, Atmospheric Temperature and CO2
as case studies #23 and #24 and are included in the table only for comparison.
a monthly scale, the SSP2 [CO2] data series is provided on an annual scale.Therefore, case studies #16 to #23 had to be made on an annual scale.If we did the analysis for the period 1958-2021, as in case studies #3 and #4 (NCEP/NCAR Reanalysis temperature at 2 m; and Mauna Loa [CO2] time series), the annual data would be too few to support estimation of the IRFs (63 data values to estimate 41 coefficients).Therefore, in case studies #16 to #23 we extended the period back to 1850, which is covered by the climate model outputs.We performed separate analyses for the periods 1850-2100 (entire period covered by climate models) and 1850-2021 (only the past).

Table A1 .
Summary indices of the case studies related to albedo.Data are on a monthly timescale and the time step of differencing is 1 year; for explanation of symbols see Table1.

Table A1 .
Summary indices of the case studies related to albedo.Data are on a monthly timescale and the time step of differencing is 1 year; for explanation of symbols see Table1.

Table A2 .
Summary indices of the case studies related to ENSO.Data are on a monthly timescale and the time step of differencing is 1 year; for explanation of symbols see Table1.

Table A2 .
Summary indices of the case studies related to ENSO.Data are on a monthly timescale and the time step of differencing is 1 year; for explanation of symbols see Table1.

Table A3 .
Summary indices of the case studies related to OMT0-100.Data are on a three-month timescale and the time step of differencing is 1 year; for explanation of symbols see Table1.