1. Introduction
The extent of the consensus among scientists on the anthropogenic origin of modern global warming has become a key issue in the “anthropogenic global warming” (AGW) debate (see, e.g., [1] for an excellent introduction). While a consensus alone clearly does not serve as scientific proof or substantiate a specific scientific hypothesis, it is nonetheless influential in bolstering the reception of a particular thesis within the broader public sphere. This influence is amplified by the inherent trust that society places in scientists, who provide informed opinions grounded in empirical evidence. This has crucial importance in the context of the AGW debate. The well-defined scientific hypothesis that “humans, through the emission of CO2, are responsible for most of the recent (since the mid-20th century) changes in global average temperature” (dubbed the AGW hypothesis hereafter) is typically linked with various, somewhat less quantifiable, statements, such as that humanity is facing an imminent climate crisis, and is followed by calls for action on individual, community, country, and global levels.
In order to quantify the consensus on the AGW hypothesis, Lynas and co-authors [2] recently conducted a survey of the literature, reaching the titular conclusion that the consensus on the AGW hypothesis in the scientific literature exceeds 99%. Here, we demonstrate that this conclusion does not follow from their data. We highlight direct flaws and biases in their study and show that even with the data presented by Lynas et al., the consensus can be quantified in other ways which must be considered since they provide lower bounds for the consensus assessment. Then, by directly evaluating the so-called “skeptic” papers, i.e., papers whose contents directly or indirectly oppose the AGW hypothesis or its implications, we show that many such papers would support the AGW hypothesis according to the methodology of Lynas et al. due to what we describe as the “mellow abstract bias”.
Before we describe the methodology of Lynas et al. and its apparent flaws, it is useful to recall what it is that we mean when we say “consensus”. The definition (from the Merriam-Webster English Dictionary) is a “general agreement” or “the judgment arrived at by most of those concerned”. As we will demonstrate, no claim for “consensus” can be made from the data presented in Lynas et al.
2. The Blurriness of the AGW Hypothesis
When endeavoring to evaluate a consensus concerning a certain hypothesis, precision in formulating the hypothesis is of the utmost importance. A scientific hypothesis should contain a direct, quantitative statement (for example, “the speed of light is constant, irrespective of the velocity of the light source”). Lynas et al. do not give such a quantification but rather, as their title suggests, their hypothesis is “the existence of human-caused climate change”. However, the effect of humans on climate, and specifically on global temperature, is quantifiable. For example, one quantification could be obtained by assessing the role of anthropogenic emissions of CO2 on the equilibrium climate sensitivity [3]. Then, a specific hypothesis would be “Humans are causing the majority (more than 50%) of the recent rise in global temperatures”, and support for such a specific hypothesis should be searched for in the literature. Conversely, one could write a hypothesis of the form, “Humans are contributing to some degree (more than 0%) to the recent rise in global temperatures”. Clearly, these two statements are very different, and one paper can implicitly support the latter but reject the former. Indeed, Bray [1,4] has already pointed out that consensus, in the context of the climate discussion, can mean different things and has different dimensions. We further stress that within each dimension, the specific consensus statement must be as specific, and quantitative, as possible.
By blurring the hypothesis, Lynas et al. leave room for a subjective decision on supporting some version of the hypothesis. On the one hand, they rate papers as pro-consensus if they explicitly state that humans are the primary cause of global warming (“1”) or just refer to anthropogenic climate change as a known fact (“2”). On the other hand, they conclude that the consensus is about “the principal role of greenhouse gases (GHG) emission from human activities” (the word “principal” interpreted as “most prominent factor”). As we shall show in the next sections, this choice of a blurred hypothesis may lead to a substantial bias.
3. The Lynas et al. Methodology and No-Position Papers
In order to establish whether a certain scientific paper supports the AGW hypothesis, Lynas et al. read through the abstracts and titles of 3000 papers and rated the papers in accordance with how AGW was referred to in the abstract (Table 2 in [2]) in the following way: (1) explicit quantitative support of the AGW hypothesis, (2) explicit non-quantitative support, (3) implicit support, (4) no position or uncertainty, (5) an implicit rejection of the AGW hypothesis, (6) explicit rejection without quantification, and (7) an explicit quantitative rejection. Their results are summarized in Table 1 below (Table 3 in [2]).
It is clear that there are more papers which, based on their abstracts, support the AGW hypothesis than reject it. However, it is also clear that most papers do not take any position regarding the AGW hypothesis. Thus, the following question arises: how should these no-position papers be accounted for when attempting to quantify the consensus?
One possible route would be to discard them altogether. In this case, among the papers taking a position, there is a striking 892/896 ≃ 99.5% which support the AGW hypothesis. However, this was not the main result presented by Lynas et al., although they briefly mention this point. The reason seems clear: discarding the “neutral papers” would leave out of the survey the majority of the examined papers, thereby failing to provide an accurate representation of the true state within the studied community. In other words, leaving in the survey only abstracts which explicitly state agreement/disagreement with the “consensus statement” would generate a strong bias, and the goal (as stated by Lynas et al.) was to examine the level of consensus among the broader scientific literature.
Indeed, the central result of Lynas et al. (Section 3.1 in [2]) is a claim of 99.85% support for the AGW hypothesis, obtained in the following way. First, out of the 3000 total papers, they subtracted 282 papers which were categorized as “non-climate-related papers”, leaving 2718 “climate-related” papers. Since there were only four papers rejecting the AGW hypothesis, they defined c = 1 − 4/2718 = 99.85%. This demonstrates a direct bias in the definition of the consensus, which can be described as “if you are not against, you are for”. In simple words: when counting the consensus this way, all papers which had no opinion regarding the AGW hypothesis (i.e., neutral papers) are counted as supporting.
The authors’ justification for such a broad definition of a consensus, as given in their paper, relies on the following argument: “…defines consensus as lack of objection to a prevailing position or worldview”. In other words, they assume the existence of a consensus and then use this assumption in order to strengthen it. However, the presupposed existence of a consensus is not substantiated; indeed, it is precisely what Lynas et al. sought to demonstrate. Unlike the “bootstrap” method [5], here, the authors did not show that this self-consistent method is justified and accurately captures the original view of the different papers they reviewed.
Equivalently, one could conduct a similar calculation, counting “how many papers support the AGW hypothesis”. In this case, the answer would be 892/2718 ≈ 32.8%, leading to the (somewhat unrealistic) result that 1 − 892/2718 ≈ 67.2% reject the AGW hypothesis. Such a conclusion would be perfectly consistent with the methodology presented in Lynas et al., although it is quite clear that this is a biased result. A methodologically correct statement of their analysis should have been that the consensus regarding the AGW thesis is 32.8% < c < 99.85%. Of course, this is a theoretical range that probably does not reflect the true level of consensus, which has been examined in other ways [1,4], and it is quite unlikely that all neutral papers oppose the consensus. However, at least from the methodological perspective, it is equally unlikely that all neutral papers support the consensus.
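The three ways of counting discussed above can be reproduced with a few lines of arithmetic. The following is a minimal sketch that uses only the counts quoted in the text; the function and key names are hypothetical and are introduced here purely for illustration.

```python
# Counts quoted in the text from the Lynas et al. survey.
TOTAL_SAMPLED = 3000   # abstracts examined
NON_CLIMATE = 282      # categorized as "non-climate-related"
SUPPORTING = 892       # rated as supporting the AGW hypothesis
REJECTING = 4          # rated as rejecting the AGW hypothesis


def consensus_estimates(total, non_climate, supporting, rejecting):
    """Return three alternative consensus figures from the same counts.

    The key names are illustrative labels, not terms used by Lynas et al.
    """
    climate_related = total - non_climate      # 2718 in the text
    taking_position = supporting + rejecting   # 896 in the text
    return {
        # Neutral papers discarded altogether.
        "position_only": supporting / taking_position,
        # Neutral papers counted as NOT supporting (lower bound).
        "lower_bound": supporting / climate_related,
        # Neutral papers counted as supporting (upper bound).
        "upper_bound": 1 - rejecting / climate_related,
    }


estimates = consensus_estimates(TOTAL_SAMPLED, NON_CLIMATE, SUPPORTING, REJECTING)
for name, value in estimates.items():
    print(f"{name}: {value:.2%}")
```

The spread between the lower and upper bounds is determined entirely by how the neutral papers are assigned, which is the point of the argument above.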
This inherent bias is not special to the work of Lynas et al., and in fact, many consensus studies take this route, as was discussed, e.g., by the authors of [6,7]. The justification for this methodology was that it is not reasonable that scientists working in a scientific field in which a certain paradigm prevails would disagree with this paradigm, and that disagreement with a prevailing paradigm must be expressed via an explicit rejection of the paradigm. First, we point again to the circular logic here: if studies of consensus aim at understanding if and to what extent there is a prevailing paradigm, they cannot use the prevalence of the paradigm as an a priori assumption, which leads to a direct bias. Second, as we show below, the data indicate that the situation may actually be the reverse: authors will tend not to explicitly (or even implicitly) reject a prevailing paradigm, or even a perceived paradigm, existing or not, in order to minimize objections to their paper. We corroborate this idea with data in the next section.
4. The Rating of Papers and the “Mellow Abstract Bias”
The analysis above shows that the “no position” papers are extremely important in quantifying the consensus. Clearly, the correct way to establish whether “no position” papers support or reject the AGW hypothesis would be to read them thoroughly and rate them according to their conclusions. Such an analysis was not performed by Lynas et al.
Here, we take a slightly different, yet effective, route. In order to quantify the possibility that a paper which was rated as “no position” rejects (or supports) the AGW hypothesis, we proceeded with the following scheme. We chose 50 papers which are either known to have conclusions which reject the AGW hypothesis or are known to have been written by known (and publicly active) “climate skeptics” (i.e., scientists taking a public stand in the AGW debate in favor of rejecting the AGW hypothesis and/or its consequences, as described in the introduction). Each paper out of the 50 was read thoroughly and its conclusions were verified to indeed reject the AGW hypothesis, either qualitatively or quantitatively. Then, the papers were rated by two independent raters according to the methodology of Lynas et al. The full list of papers, their conclusions, and their Lynas ratings are provided in the Supplementary Information, and readers are encouraged to browse through them and determine the authors’ stances on the hypothesis, preferably after reading the full texts.
The results are quite surprising: 54% of the papers examined scored between 3 and 4 in the Lynas rating system (see Appendix A) and would thus qualify as supporting the AGW consensus, even if it is clear from the paper that they do not. While this does not mean that 54% of the papers rated by Lynas et al. as “no position” actually reject the AGW hypothesis, it does mean that there is some non-zero chance that a substantial portion of them do. How much? To determine this, one must go through each paper in the Lynas et al. data and establish its position regarding the AGW hypothesis. This should have been carried out, at least for a few samples, by Lynas et al., in order to substantiate their conclusions.
We find it useful to give a practical example for this case. In a recent paper [8], Soon and co-authors examined the role of the urban heat island (UHI) effect as a possible source of bias in estimations of northern-hemisphere temperatures over the last century, showing that indeed the UHI effect may lead to a bias of ~40%. Importantly, the first three authors of the paper are very publicly outspoken climate skeptics who have expressed their objections to the AGW hypothesis on many occasions, for instance, in a recent joint podcast appearance [9]. Nonetheless, the abstract of the paper contains the conclusion that “the scientific community is not yet in a position to confidently establish whether the warming since 1850 is mostly human caused, mostly natural, or some combination”, which would, under Lynas et al.’s methodology, put their paper in the “uncertain” category and have it counted as supporting the consensus. Indeed, out of the 3000 papers surveyed by Lynas et al., 18 discussed the UHI effect in their abstracts, but only one of them was rated “2”, i.e., stating explicitly that humans are causing global warming, and four were rated “3” (implicit endorsement). The vast majority (13 papers) were rated “4a”, which means that they have no position, but they were still counted as supporting the AGW hypothesis.
A natural question then arises: why do so many of the papers which reject the AGW hypothesis quite clearly in their text convey a “no-position” (or in some instances an opposite endorsing position) in the abstract? We believe that this is a result of what we refer to as the “mellow abstract bias”, which has to do with the way scientific abstracts are written. Typically, the abstract is the most widely read part of a scientific paper [10], and the tendency of writers is to make them as general as possible. This is especially true in a field as socially significant (and highly debated, sometimes emotionally) as climate study. Therefore, it is only reasonable that authors whose results are not in line with (what they believe to be) the perceived widespread belief held by their peers or simply the general public would “tone down” the abstract, making it mellow in comparison to the main text. Put simply, there is no reason for an author whose results may disagree with the common belief, or the perception thereof, to put a “contrarian” statement in the abstract, given that some of the referees may develop a negative bias toward the paper at the abstract-reading stage.
This “mellow abstract bias”, which deserves a study of its own, could be considered a side-effect of the well-known “positive publication bias” (e.g., [11]) where negative results tend to be published less than positive results. In the various fields of climate science, clearly AGW-hypothesis-rejecting results might be considered negative results. However, instead of not publishing these results, it is likely that many authors would tend to “mask” their negative results via a mellow abstract.
This bias has a well-studied counterpart in social science known as the “good subject” effect [12]. The necessity of using human subjects in social or psychological experiments brought to light the fact that many of them, by willingly agreeing to participate, are actually motivated by the noble cause of advancing the scientific community and are thus biased in favor of the experimenter. In other words, participants may respond by exhibiting behaviors designed to confirm the hypothesis, thereby serving as a “good subject” [13]. We argue that a somewhat similar motivation, i.e., the need to cooperate with a “good” cause (saving humanity from the effects of global warming), drives the authors to omit from their abstracts any controversial statements which are not directly related to the subject of the paper in hand, or in some cases, to support the perceived consensus even if the scientific content of the paper does not.
We end this section by stressing again that our results, which show that there is a ~50% chance that a skeptic paper would pass as “neutral” (or even supporting) in the rating scheme used by Lynas et al., do not mean that 50% of the papers rated “neutral” by Lynas et al. in fact do not support the consensus. Rather, these results clarify that there is a real possibility of bias when using abstracts to determine the attitude of a particular paper (and more so its authors) toward the consensus statement. Probably the only way to overcome this bias is to scan the papers themselves (or at least a sample) and not the abstracts.
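The distinction between the two percentages can be made explicit with Bayes' rule. As a sketch, let R denote the event that the full text of a paper rejects the AGW hypothesis and N the event that its abstract is rated 3 or 4 in the Lynas scheme; these symbols are introduced here for illustration only and do not appear in the analysis above.

```latex
P(R \mid N) \;=\; \frac{P(N \mid R)\,P(R)}{P(N \mid R)\,P(R) + P(N \mid \neg R)\,P(\neg R)}
```

The sample of skeptic papers constrains only the term P(N | R), which is roughly 0.5. The base rate P(R) and the term P(N | ¬R) are not determined by that sample, so P(R | N), the fraction of neutrally rated papers that actually reject the hypothesis, cannot be inferred from it without reading the papers themselves.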
5. Inter-Rater Variability
Another fundamental issue which arises from Lynas et al. is the reliability of the rating process. Considering the aforementioned subjectivity of the criteria, agreement between independent raters is a possible proxy for inter-rater reliability. Unfortunately, the paper does not indicate the degree of agreement among different examiners, nor does it indicate how many examiners participated in the grading process. It is extremely difficult to rely on these data in the absence of this information, especially given that the measure itself is subjective. From a careful reading of Lynas et al., there is no way of knowing whether each abstract was rated by a single examiner or whether papers were cross-rated. However, since this point was not mentioned, a reasonable assumption is that abstracts were divided among the authors and each paper was examined by a single rater.
Here, we performed the following procedure: each of the papers rated in Section 4 was examined by two examiners independently. One can define various metrics for cross-rater variability. For example, a simple binary test (whether the examiners agree or not) shows that only 42% of the papers were given the same score (from 1 to 7) by two independent examiners (see Appendix A). A more sophisticated and well-suited tool for quantifying inter-rater correlation is the so-called Cohen’s kappa coefficient [14,15]. We found Cohen’s kappa coefficient to be κ = 0.19 (p = 0.01), which indicates a very weak agreement. If we collapse the scoring into a binary one, i.e., assign the score “pro-AGW + no-position” to the papers that scored 1–4 and “skeptic” to the rest, we obtain an agreement of 62%. In the binary case, Cohen’s kappa coefficient is κ = 0.11 (p = 0.427), which indicates an even weaker agreement which has no statistical significance. These results point to the strong subjectivity of Lynas et al.’s rating methodology and place bounds on the certainty of their results. Put simply, when the classification criterion is as vague as the one suggested by Lynas et al., there is a large chance that two independent raters will score the same paper differently. The rating subjectivity implied by inter-rater variability means that there is a potential for bias if the raters have a presupposition in favor of the consensus status of the AGW hypothesis. Of course, if all raters are subject to the same presupposition, then having two or more raters would not be useful in overcoming the bias. Thus, a sound rating system must have two or more blind raters, who are not part of the author list, reading the full paper and determining its stance on the topic at hand. And even then, it might still be incorrect to project from the attitude of a paper (rated by external raters) to the opinions of all its authors.
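For readers unfamiliar with the statistic, the following is a minimal sketch of how Cohen's kappa can be computed for two raters, both on the full 1 to 7 scale and after collapsing it to the binary split described above. The rating lists are hypothetical placeholders, not the ratings reported in Appendix A, and the function names are illustrative. The sketch does not compute the p-values quoted in the text.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(ratings_a)
    # Observed agreement: fraction of items given identical scores.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal frequencies per category.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[k] * freq_b[k] for k in freq_a) / n**2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical category")
    return (observed - expected) / (1 - expected)


def to_binary(ratings):
    """Collapse the 1-7 scale: 1-4 -> non-rejecting, 5-7 -> rejecting."""
    return ["non-rejecting" if r <= 4 else "rejecting" for r in ratings]


# Hypothetical ratings of ten papers by two independent raters (illustration only).
rater_1 = [4, 4, 3, 5, 4, 6, 3, 4, 5, 4]
rater_2 = [4, 3, 3, 4, 4, 6, 4, 5, 5, 3]

kappa_full = cohen_kappa(rater_1, rater_2)
kappa_binary = cohen_kappa(to_binary(rater_1), to_binary(rater_2))
print(f"kappa (1-7 scale): {kappa_full:.2f}")
print(f"kappa (binary):    {kappa_binary:.2f}")
```

Because kappa subtracts the agreement expected by chance, it can be low even when the raw percentage of matching scores looks moderate, which is why it is reported alongside the simple agreement rate.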
6. Summary and Conclusions
We pointed out several flaws in the Lynas et al. manuscript claiming a consensus of >99% on the AGW hypothesis. These flaws can be summarized as (i) a blurred hypothesis, which allows for the subjective rating of support, (ii) a positive-consensus bias, which assumes that all no-position papers are in support of the hypothesis, and (iii) the subjective rating of papers, whereby papers which we assess as “no position” are included as “implicit support”. We demonstrated these flaws quantitatively, first by setting error bounds on the consensus using the data of Lynas et al. We then showed quantitatively that even “skeptic papers”, which clearly oppose the consensus, tend not to emphasize this point in the abstract (a phenomenon we called “the mellow abstract bias”), thus leading to a possible bias in support of the AGW hypothesis. These flaws should be borne in mind as cautionary guidelines for future studies of consensus based on literature surveys.
We note here that ours is not the first criticism of consensus studies regarding the AGW hypothesis (see, e.g., [16,17,18,19,20,21,22,23,24] for some examples). The criticism raised here also applies in general to earlier consensus studies based on abstract scanning. It is regrettable that some of the flaws which we raised here (and demonstrated quantitatively) were already raised by other authors pertaining to earlier attempts to quantify the AGW consensus and yet were not taken into consideration by Lynas et al.
We stress here that this work does not wish to discuss the level of the climate consensus nor to express support or objection to the claim of an existing climate consensus. Indeed, other indications for a “climate consensus” (in the form of, e.g., surveys and questionnaires [1,4]) have been published in the literature, and some degree of consensus seems to be plausible. However, the work of Lynas et al. (and of Cook et al. before them, who used similar methods [22]) has attracted significant public attention, much beyond the academic scope. It is thus crucially important to understand the limitations of, and the good practices required of, these types of consensus studies.
The debate on how and how much humans are affecting the climate is important: it affects many aspects of modern life, and its conclusions will affect major parts of humanity. Clarifying what scientists think about it is indeed important from a societal point of view. However, claims for consensus should be made carefully; we need to understand exactly (and quantitatively) what statement the scientific literature supports and eliminate possible biases and statistical errors in the quantification of the consensus. This matter is too important to be left blurry and subjective.