Having reviewed the ontological and epistemological principles of Bunge’s medical philosophy, which rest heavily on his general account of systemism, we now formulate our major critique of Bunge’s account. In the following three subsections, we justify our claims (C1) to (C3) in more detail.
4.1. Evidence and Confirmation as Separate Concepts
According to von Bertalanffy, who is cited in Bunge’s book as a pioneer of systemism, every living system is an open system [14]. In systems terms, open systems are characterized by steady interactions between the different components of a system, between the system as a whole and its components, and between the system and its components and other systems and their components in the environment. These interactions give rise to events and hence causality, because interactions represent causal relationships between events; however, they are so complex and dynamic that causality cannot be conceived as a constant conjunction of events. That would only be the case in closed systems, which can only be established artificially through particular experimental setups in some natural sciences such as physics. In natural open systems, causality instead arises from a tendency of the system to produce certain patterns or regularities in particular contexts [32].
As a consequence, there are no universal regularities of the form “whenever event X, then event Y”, only what appear as such on average and what Lawson has named “demi-regularities”. Conceptualizing humans as open systems, it becomes clear that the same medical intervention applied in different study settings does not always lead to the same outcome, since the outcome depends on the environment/context in which it occurs. It is crucial to point out that statistical methodologies not only presuppose closed systems but, if the goal is to establish causality, only work in them [33]. One must therefore accept that variations in regularity are predicted to occur in experimental studies in biology, medicine, and sociology. This requires the use of probabilities:
Variations in regularity are generally specified probabilistically or stochastically, as random processes occurring in the ontic domain. Probability is a measure of the likelihood of an event occurring. The re-conceptualization of stochastic event regularities using the concepts of probability, might be styled ‘whenever event x, then on average event y’.
For realists, the truth of causal hypotheses cannot be established in an objective way through statistical data alone due to the unavoidable limitations of experiments conducted on biological open systems, or in other words, the impossibility of achieving complete closure of a biological open system in order to nail down the true causal effect of an intervention. It follows that the confirmation or disconfirmation of a hypothesis by statistical data is not about assigning (objective) truth values, as Bunge claims, but about raising or lowering an agent’s (subjective) belief in the truth of the hypothesis. Once framed, a realist will seek to scrutinize a causal hypothesis in further tests which hopefully provide stronger and stronger confirmation of it [32]. At the same time, the realist will consider different competing hypotheses/models about the data-generating causal processes that she attributes to different entities that are or may be real; the data may then decide between these hypotheses in an objective way.
We have developed two distinct Bayesian accounts to capture these two concepts about the testing of statistical hypotheses [34]. The first is an account of belief/confirmation, the second of evidence. Bayesians interpret confirmation relations in various ways. For us, an account of confirmation explicates a relation C(D,H,B) among data D, hypothesis H, and the agent’s background knowledge B. For Bayesians, degrees of belief need to be fine-grained, and a satisfactory Bayesian account of confirmation should be able to capture this notion of degree of belief. In formal terms:
D confirms H to some degree if and only if P(H|D) > P(H)
Both the posterior and the prior probability of H can vary between 0 and 1. Confirmation becomes strong or weak depending on how great the difference is between the posterior probability, P(H|D), and the prior probability of the hypothesis, P(H). P(H|D) represents an agent’s degree of belief in the hypothesis after the data are accumulated.7 P(H) stands for an agent’s degree of belief in the hypothesis before the data for the hypothesis have been acquired. The likelihood function, P(D|H), provides an answer to the question “how likely are the data given the hypothesis?” P(D) is the marginal probability of the data, averaged over the hypothesis being true or false. The relationships between the terms P(H|D), P(H), P(D|H), and P(D) are succinctly captured in Bayes’ theorem:
P(H|D) = P(D|H) × P(H)/P(D), provided P(D) > 0.
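As a minimal numerical sketch of this confirmation relation (all probability values below are hypothetical illustrations, not data from any study), Bayes’ theorem can be computed directly:

```python
def posterior(prior, p_d_given_h, p_d_given_not_h):
    """Bayes' theorem: P(H|D) = P(D|H) * P(H) / P(D),
    with P(D) expanded by the law of total probability."""
    p_d = p_d_given_h * prior + p_d_given_not_h * (1 - prior)
    return p_d_given_h * prior / p_d

# Hypothetical illustration: the agent's prior belief in H is 0.3,
# and the data are three times as likely under H as under not-H.
prior = 0.3
post = posterior(prior, p_d_given_h=0.6, p_d_given_not_h=0.2)
print(post)           # posterior degree of belief P(H|D), about 0.5625
print(post > prior)   # D confirms H to some degree iff P(H|D) > P(H)
```

The gap between `post` and `prior` then serves as the graded measure of how strongly D confirms H.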
While this account of confirmation is concerned with belief in the truth of a single hypothesis, our account of evidence compares the merits of two hypotheses, H1 and H2 (which could be ¬H1), relative to the data D, auxiliaries A, and background information B. We conceive the evidence for one hypothesis versus the other as an objective function of the data-generating process8, which takes place via observed or unobserved mechanisms within the system under study. This is also how Bandyopadhyay, Brittan, and Taper interpret the likelihood that determines the evidence [35] (p. 30):
It is natural to assume that the “propensity” of a model to generate a particular sort or set of data represents a causal tendency on the part of natural objects being modeled to have particular properties or behavioral patterns and this tendency or “causal power” is both represented and explained by a corresponding hypothesis.
As such, evidence provides the link between “the Real” about which we construct hypotheses and “the Empirical” which we observe as patterns or regularities.9 Our concept of evidence is therefore consistent with a realist-systemic ontology. Note that this concept also fulfills Bunge’s postulate (S5) by explicitly taking background knowledge into account. Such background knowledge and auxiliaries allow deriving evidence through a variety of methodologies, as long as the data are relevant to an aspect of the hypotheses being compared. For example, observing a high correlation between treatment X and effect Y in a RCT may in theory provide the strongest evidence for the claim that X causes Y when the alternative is that X is no direct cause of Y but X and Y are correlated because both are caused by some third (confounding) factor. In contrast, a single case report of a patient taking a drug and developing a serious side effect, together with background knowledge about the biological actions of the drug, may provide strong evidence for the hypothesis that the drug is harmful in particular contexts.10 Finally, preclinical in vitro and in vivo studies may provide strong evidence in favor of a particular mechanism underlying an observed correlation between treatment and outcome.
Because evidence is not a belief relation but a likelihood ratio, it need not satisfy the probability calculus. The data D constitute evidence for H1&A1&B against H2&A2&B if and only if

[P(D|H1,A1&B)/P(D|H2,A2&B)] > 1. (E)
Bayesians use the Bayes factor (BF) to make this comparison, while others use the likelihood ratio (LR) or other functions designed to measure evidence. For simple statistical hypotheses with no free parameters, the Bayes factor and the likelihood ratio are identical and capture the bare essentials of an account of evidence without any appeal to prior probability. However, the LR becomes an inadequate measure of evidence whenever there are free parameters to estimate; the greater the number of parameters, the more biased the LR becomes. This is what information criteria such as AIC or BIC try to account for [39]. For hypotheses under which there are unknown parameters θ, the densities11 P(D|H,A&B) are obtained by integrating over the parameter space [42], so that

P(D|H,A&B) = ∫P(D|θ,H,A&B)π(θ|H,A&B)dθ.
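The difference between a simple likelihood and a marginalized one can be sketched numerically. In this hypothetical illustration, H2 is a simple hypothesis fixing a binomial success probability at θ = 0.5, while H1 leaves θ free with a uniform prior that is integrated out as in the equation above; the ratio of the two marginal likelihoods is the Bayes factor:

```python
import math

def binom_pmf(k, n, p):
    """Binomial likelihood P(D|theta): probability of k successes in n trials."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def marginal_likelihood(k, n, n_grid=10_000):
    """P(D|H) = integral of P(D|theta,H) * pi(theta|H) dtheta with a uniform
    prior pi(theta) = 1 on [0,1], approximated on a midpoint grid."""
    h = 1.0 / n_grid
    return sum(binom_pmf(k, n, (i + 0.5) * h) * h for i in range(n_grid))

# Hypothetical data D: 8 successes in 10 trials.
k, n = 8, 10
p_d_h1 = marginal_likelihood(k, n)   # H1: theta unknown, uniform prior
p_d_h2 = binom_pmf(k, n, 0.5)        # H2: simple hypothesis theta = 0.5
bf = p_d_h1 / p_d_h2
print(bf > 1)  # the data evidentially favor H1 over H2 by condition (E)
```

For a binomial model with a uniform prior the marginal likelihood has the known closed form 1/(n+1), which the numerical integration reproduces; the sketch is merely a toy stand-in for the integral in the text.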
An immediate corollary of the evidential condition (E) is that there is equal evidential support for both hypotheses only when BF = 1 (or LR = 1). The numerical value of the BF or LR which distinguishes weak from strong evidence for H1 versus H2 is determined contextually and may vary depending on the nature of the problem. It also follows that, in the special case that two hypotheses are mutually exclusive and jointly exhaustive, evidence is accompanied by confirmation and vice versa. In this case, if the data provide evidential support for H against ¬H, i.e., P(D|H) > P(D|¬H), then it follows from Bayes’ theorem that P(H|D) > P(H). However, even in this case, a hypothesis for which the evidence is very strong may not be very well confirmed, while a claim that is very well confirmed may have no more than weak evidence going for it [35] (p. 38). Finally, we note that in most scientific studies, no precise quantitative determination of likelihoods, priors, and posteriors of hypotheses might be possible. Even then our concepts remain useful for making qualitative or comparative statements about hypotheses. For example, a qualitative evidential statement may be “the data provide more/equal/less evidence for H1 compared to H2”; a comparative statement relating to confirmation may be “H1 is better confirmed/equally confirmed/less confirmed by the data than H2”.12
We will now demonstrate the usefulness of our confirmation/evidence distinction using an example provided by Bunge himself, the purpose of which was to reject Bayesianism as unreasonable [1]:
It is well known that HIV infection is a necessary cause of AIDS: no HIV, no AIDS. In other words, having AIDS implies having HIV, though not the converse. Suppose now that a given individual b has been proved to be HIV-positive. A Bayesian will ask what is the probability that b has or will eventually develop AIDS. To answer this question, the Bayesian assumes that the Bayes’ theorem applies, and writes down this formula: P(AIDS|HIV) = P(HIV|AIDS) · P(AIDS)/P(HIV), where an expression of the form P(A) means the absolute (or prior) probability of A in the given population, whereas P(A|B) is read (or interpreted) as “the conditional probability of A given (or assuming) B.”
If the lab analysis shows that b carries the HIV, the Bayesian will set P(HIV) = 1. And, since all AIDS patients are HIV carriers, he will also set P(HIV|AIDS) = 1. Substituting these values into Bayes’ formula yields P(AIDS|HIV) = P(AIDS). But this result is false, since there are persons with HIV but no AIDS. What is the source of this error? It comes from assuming tacitly that carrying HIV and suffering from AIDS are random facts, hence subject to probability theory. The HIV-AIDS connection is causal, not casual; HIV infection is only a necessary cause of AIDS. In conclusion, contrary to what Bayesians (and rational-choice theorists) assume, it is wrong to assign probabilities to all facts. Only random facts, as well as facts picked at random, have probabilities.
The example is supposed to show a paradox arising from Bayesian reasoning. The paradox is that a positive HIV test result provides no confirmation for the hypothesis AIDS, i.e., that b has or will develop AIDS, since the posterior probability of AIDS after obtaining a positive test result is the same as its prior probability. However, the paradox only arises because Bunge is wrong in two assumptions: first, that a positive test result is “true”, so that P(HIV) = 1, and second, that the test has perfect sensitivity, so that P(HIV|AIDS) = 1. Both assumptions are at odds with realistic assumptions about tests on open systems, which are never perfect. Regarding his first assumption, Bunge mistakenly identifies “knowing or observing the data” with “the probability of the data” [35] (p. 137). For a Bayesian realist, the positive test result is the realization of some data-generating mechanisms (in this case, mechanisms of the disease AIDS) modelled by a binary random variable taking on the value of either 0 or 1, so that the correct way of writing P(HIV) is
P(HIV = 1) = ∑i P(HIV = 1|Hi)P(Hi) = P(HIV = 1|AIDS)P(AIDS) + P(HIV = 1|¬AIDS)P(¬AIDS).
This expression includes both the true positive rate (sensitivity) and the false positive rate (1 − specificity), which are never exactly 100% and 0%, respectively, in medical tests. In this specific example, the assumptions P(HIV|AIDS) ≈ 1 and P(HIV|¬AIDS) ≈ 0 can indeed be justified based on the generally very high sensitivity and specificity of HIV tests (although there is clear variation in these test performances across different settings [44], emphasizing the importance of environment/context). However, the prior probability of b having or not having AIDS before the test result is known is also important. In general, therefore, observing that HIV is the case does not imply P(HIV) = 1.
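The dissolution of Bunge’s paradox can be made concrete with a small computation. The sketch below uses hypothetical sensitivity, specificity, and prior values (not real HIV test characteristics) and shows that once the test is modelled realistically, a positive result raises the posterior above the prior rather than leaving it unchanged:

```python
def posterior_aids(prior_aids, sensitivity, specificity):
    """P(AIDS | HIV = 1) via Bayes' theorem with an imperfect test.
    P(HIV = 1) is expanded by the law of total probability, so observing
    a positive result never forces P(HIV) = 1, contra Bunge."""
    p_pos = sensitivity * prior_aids + (1 - specificity) * (1 - prior_aids)
    return sensitivity * prior_aids / p_pos

# Hypothetical illustration values, not real test characteristics:
prior = 0.01
post = posterior_aids(prior, sensitivity=0.99, specificity=0.98)
print(post > prior)   # positive result raises belief: confirmation occurs

# Likelihood ratio P(HIV|AIDS)/P(HIV|not-AIDS): strong evidence regardless
# of the prior, in line with the evidence/confirmation distinction.
lr = 0.99 / (1 - 0.98)
print(lr > 1)
```

Note that with these numbers the evidence (the likelihood ratio) is strong even though the posterior remains modest, which is precisely the divergence between evidence and confirmation argued for above.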
On our account, a positive test indeed provides strong evidence for the AIDS hypothesis because P(HIV|AIDS) ≫ P(HIV|¬AIDS). In accordance with our intuition, this does not depend on our prior beliefs about the person having or not having AIDS in the first place. What we should believe about b having AIDS after the positive test result has been obtained is, however, a different question, and again in accordance with our intuition, the answer should now depend on the context, e.g., what we know about the individual and their social relationships. Bunge is not able to capture these intuitions. On his account, solving the inverse problem of going from the results of a medical test or some sign S of a disease D to the precise diagnosis of D can only be achieved if there is a single mechanism M that, when conjoined with S, is necessary and sufficient for D to occur. In the AIDS example above, his reasoning goes as follows:

AIDS occurs ⇔ HIV infection & slow immune reaction,

where slow immune reaction describes the mechanism by which HIV leads to immune system failure. More generally, his reasoning is (p. 88):
For all x: (Dx ⇔ Mx) & For all x: (Mx ⇔ Sx) ∴ For all x: (Dx ⇔ Sx)
Bunge’s solution presupposes that the mechanisms causally linking the signs and the disease always operate the same way, regardless of the context. In other words, he presupposes a closed system, which is not even approximately the case, given that medical tests have sensitivities and/or specificities that vary across contexts and are often less than 100%. Bunge simply fails to realize that medical (as well as biological and social) observations are never “facts”, because we deal with open systems and hence with uncertain inferences.
4.2. RCTs and the Truth Claim
We now investigate Bunge’s claim that RCTs are necessary to infer the truth of a causal hypothesis in more detail. To this aim, it is helpful to first review some methodological principles of RCTs. A good overview has recently been provided by Deaton and Cartwright [27], and we follow their account to a large extent. Without loss of generality, we assume that the medical hypothesis to be tested in a RCT is a proposition of the form “treatment T is effective”, the truth of which is typically assessed by measuring some particular outcomes in the randomly allocated treatment and control groups. To measure the truth of a medical hypothesis then means to measure the true average treatment effect (ATE) of the intervention, where the ATE is the difference between the average outcome in the treatment group and the average outcome in the control group.13
Assuming a linear causal model for the individual treatment effects, one could write for an individual outcome

Yi = βiTi + ∑j γjxij.

Here, Yi is the outcome for patient i, Ti is a treatment indicator (Ti = 1 if treatment, Ti = 0 if control), βi the individual treatment effect for patient i, and the xij’s are observed or unobserved other linear causes of the outcome. By averaging the effects in both the treatment (T) and control (C) group and subtracting the means, one obtains an estimate for the ATE:

ȲT − ȲC = β̄T + ∑j γj(x̄jT − x̄jC). (ATE)

The major interest in conducting a RCT is on β̄T, which is the true ATE in case the averages of the other causes are exactly balanced between both groups. Bunge claims that the aim of randomization is to bring the error term on the right-hand side of the (ATE) equation as close to zero as possible. However, this is not what any RCT can guarantee [24]. What randomization actually does is guarantee that the error term is zero only in expectation.
The expectation refers to an infinite number of repeated randomizations of the trial sample into treatment and control groups; for an individual randomization, the estimated ATE can be arbitrarily far away from the true ATE. Repeating the trial and estimating the ATE many times allows one to estimate a mean ATE and its standard error; this is the true benefit of randomization. Contrary to what Bunge claims, therefore, randomization will not guarantee that an individual RCT provides us with an estimate of the ATE that is close to the truth. Instead, if there is background knowledge about the main other causes of the outcome, one would be better off matching patients according to these other causes without randomization. But background knowledge is exactly what is omitted by Bunge when he proposes RCTs as the gold standard of clinical trials. Deaton and Cartwright put it this way [27]:
The gold standard or “truth” view does harm when it undermines the obligation of science to reconcile RCTs results with other evidence in a process of cumulative understanding.
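The point that randomization removes the error term only in expectation can be illustrated with a minimal simulation sketch (all numbers are hypothetical): the same trial sample is re-randomized many times under a linear outcome model with a true ATE of 1 and unbalanced other causes.

```python
import random

def simulate_trial(effects, other_causes, n, rng):
    """One randomization of n patients into treatment/control groups
    under the linear model Y_i = beta_i * T_i + x_i, returning the
    estimated ATE (mean outcome treated minus mean outcome control)."""
    idx = list(range(n))
    rng.shuffle(idx)
    treated, control = set(idx[: n // 2]), set(idx[n // 2:])
    y = {i: effects[i] * (1 if i in treated else 0) + other_causes[i]
         for i in range(n)}
    return (sum(y[i] for i in treated) / len(treated)
            - sum(y[i] for i in control) / len(control))

rng = random.Random(0)
n = 100
beta = [1.0] * n                          # true ATE = 1 for every patient
x = [rng.gauss(0, 2) for _ in range(n)]   # other (possibly unbalanced) causes
estimates = [simulate_trial(beta, x, n, rng) for _ in range(2000)]
mean_est = sum(estimates) / len(estimates)

print(abs(mean_est - 1.0) < 0.05)            # unbiased averaged over randomizations
print(max(abs(e - 1.0) for e in estimates))  # but a single trial can be far off
```

Averaged over the 2000 re-randomizations the estimate is close to the true ATE, while individual randomizations can miss it substantially, which is exactly the distinction drawn in the text.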
The conception of patients as open systems forces us to accept that we can never infer the true effect of a treatment through RCTs, even if we were able to repeat one and the same trial an infinite number of times. The reason is that in each repetition, some changes in the environment or context in which the RCT is conducted are unavoidable. The best we can therefore do is to seek higher and higher confirmation for our hypotheses and to determine their evidence against realistic competing hypotheses. These goals are achievable by collecting relevant data across a variety of study types of both statistical and mechanistic character. A famous example is the establishment of the hypothesis that smoking causes lung cancer, which was based on observational and laboratory data, but not on RCTs. Surprisingly, Bunge himself has used this example in one of his previous papers [16]:
For example, since the mid-20th century, it has been known that lung cancer and smoking are strongly correlated, but only laboratory experiments on the action of nicotine and tar on living tissue have succeeded in testing (and confirming) the hypothesis that there is a definite causal link underneath the statistical correlation: we now know definitely that smoking may cause lung cancer.
Note that for Bunge it was knowledge of the mechanisms that confirmed the hypothesis that smoking causes lung cancer. However, on our account, the strong correlational data between smoking and lung cancer on their own provided a strong degree of confirmation for the hypothesis. Knowing the mechanism of how smoking may cause lung cancer provided an additional, independent confirmation, so that the total confirmation became higher than with either the statistical or the mechanistic data alone.14 However, causation was not established by the observational studies alone, since the data they provided for a direct causal relationship between smoking and lung cancer were interpreted as not providing strong enough evidence compared to alternatives such as a “smoking gene” increasing both the tendency to smoke and to develop lung cancer.15 Only by knowing the carcinogenic mechanisms of tar and nicotine directly linking smoking and lung tumorigenesis could the observed correlation be interpreted as strong evidence that smoking directly causes lung cancer instead of both being due to some third factor. Given our likelihood-based account of evidence, which must be comparative, there are three possible ways to compare two hypotheses: between a causal hypothesis and a statistical hypothesis, between two causal hypotheses, or between two statistical hypotheses. The current scenario is concerned with the first case, in which a causal hypothesis is compared with a non-causal statistical hypothesis. Given the accumulated data from several observational studies on the proportion of tar in tobacco, the hypothesis that smoking causes cancer was supported more strongly than the hypothesis that smoking and cancer are merely correlated without any causal connection between them. This evidential relationship between the two hypotheses, given the data, holds independent of what an agent believes about those hypotheses and data. Therefore, on our account of evidence, we can say that the data provide strong evidential support for the hypothesis that smoking causes cancer as against its alternative. So, from the perspectives of both accounts, the hypothesis “smoking causes cancer” is more evidentially supported by the data than its alternative as well as strongly confirmed.
This both-way vindication is possible because of the theorem: If two hypotheses are mutually exclusive and jointly exhaustive as well as simple statistical hypotheses, then data will provide evidential support for a hypothesis over its alternative if and only if data will confirm the hypothesis to some degree:
[Pr(D|H)/Pr(D|¬H)] > 1 iff Pr(H|D) > Pr(H).
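A quick numerical check of this biconditional, and of the earlier point that strong evidence need not mean strong confirmation, can be given with hypothetical probability values:

```python
def evidence_and_confirmation(prior, p_d_h, p_d_not_h):
    """For mutually exclusive and jointly exhaustive H and not-H,
    evidence (LR > 1) and confirmation (P(H|D) > P(H)) coincide."""
    lr = p_d_h / p_d_not_h
    post = p_d_h * prior / (p_d_h * prior + p_d_not_h * (1 - prior))
    return lr > 1, post > prior, post

# With a very low prior, the evidence can be strong (LR = 90) while the
# hypothesis remains poorly confirmed in absolute terms (hypothetical numbers):
ev, conf, post = evidence_and_confirmation(prior=0.001, p_d_h=0.9, p_d_not_h=0.01)
print(ev, conf)      # both True: the biconditional holds
print(post < 0.1)    # yet the posterior degree of belief is still low
```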
Another example, also mentioned by Bunge himself [1] (p. 147), is appendectomy to treat appendicitis. In this case, conducting a RCT with a control group receiving a sham operation would not only be unethical, but also unnecessary to highly confirm the hypothesis that appendectomy is an effective treatment. The reason is that the mechanism (infection of the appendix) and the way to shut it down (by removing the appendix) are very well known. This example is noteworthy since it illustrates that a causal hypothesis may be established by knowledge of mechanisms alone, without the necessity for statistical data.16
A third example is the treatment of the rare glucose transporter 1 (GLUT1) deficiency syndrome through prescribing a high-fat, low-carbohydrate ketogenic diet: despite only “low level” clinical evidence being available, a recent consensus guideline recommends ketogenic diets as the treatment of choice for GLUT1 deficiency syndrome, mainly based on the physiological mechanism that ketone bodies are able to cross the blood–brain barrier independently of GLUT1, providing an alternative fuel for the brain instead of glucose [51].
As these examples show, RCTs are neither necessary nor sufficient for determining whether some factors cause a disease or an intervention is effective. Rather, causal claims may be established based on a variety of data from mechanistic, observational, and other study types that conventionally sit below RCTs in the “evidence hierarchy”. We note that the difference between mechanistic and statistical (or probabilistic) data is not one between qualitative and quantitative data, nor one between observations stemming from laboratory versus clinical studies. In fact, mechanistic hypotheses may be framed as statistical models and applied to clinical data. For example, in radiotherapy, mathematical models describing the mechanisms of cell killing through DNA damage caused by ionizing radiation are frequently utilized. They may be used clinically to convert between different fractionation schemes having the same biological effect or for predicting radiotherapy outcomes such as tumor control and normal tissue complication probability [52]. In our interpretation, the main distinction between mechanistic and statistical data is that the former can explain why the latter are observed. In the radiotherapy example, the mechanism itself is stochastic, as it describes the killing of cells, which obeys statistical laws; however, it also explains why higher radiation doses result in a higher probability of tumor control, and it even allows the derivation of the mathematical form of the dose–response relationship [52].
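As a concrete sketch of such a mechanistic model cast in statistical form, the widely used linear-quadratic (LQ) model of cell killing yields a biologically effective dose (BED) for comparing fractionation schemes and, combined with Poisson statistics, a tumor control probability. All parameter values below are hypothetical illustrations, not clinical recommendations:

```python
import math

def bed(n_fractions, dose_per_fraction, alpha_beta):
    """Biologically effective dose of the linear-quadratic model:
    BED = n * d * (1 + d / (alpha/beta))."""
    return n_fractions * dose_per_fraction * (1 + dose_per_fraction / alpha_beta)

def tcp(n_fractions, dose_per_fraction, alpha, alpha_beta, n_clonogens):
    """Poisson tumor control probability TCP = exp(-N * SF), with the
    LQ surviving fraction SF = exp(-alpha * BED) after all fractions."""
    sf = math.exp(-alpha * bed(n_fractions, dose_per_fraction, alpha_beta))
    return math.exp(-n_clonogens * sf)

# Hypothetical tumor parameters: alpha/beta = 10 Gy, alpha = 0.3 per Gy,
# 1e7 clonogenic cells.
conventional = bed(30, 2.0, 10.0)   # 30 fractions of 2 Gy
alternative = bed(20, 2.75, 10.0)   # hypothetical alternative scheme
print(conventional, alternative)    # similar BEDs, similar biological effect
print(tcp(30, 2.0, 0.3, 10.0, 1e7))  # predicted control probability
```

The mechanistic model thus both explains the statistical dose–response pattern and allows different fractionation schemes to be compared on a common biological scale, which is the clinical use described in the text.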
Each causal claim has both probabilistic and mechanistic consequences that may be observed or not. Therefore, either mechanistic or statistical data are able to confirm a causal claim to some degree, whereas data from both sources provide even stronger confirmation according to the “variety-of-evidence thesis” [46]. Mechanisms can also serve as background knowledge to increase the evidence for a causal hypothesis over a merely statistical one; this was the case in the smoking and lung cancer example. Finally, the optimal methodology for establishing a causal claim may depend on the exact type of hypothesis posed, e.g., the claim that an intervention worked in some setting versus the claim that it works for a particular patient, or a claim about a harmful effect [28].
4.3. RCTs and Background Knowledge
For any researcher, prior or background knowledge plays a crucial role in the evaluation of causal hypotheses. This is naturally captured in our account of evidence and confirmation described in Section 4.1, but not in Bunge’s account relying on RCTs as necessary methods for hypothesis testing. Andrew Gelman [54] has pointed out that using tools such as randomization and p-values to enforce scientific rigor misses the most important point of causal inference, which is interpreting and understanding the results within the context of background knowledge. RCTs are not designed to rely on background knowledge, which “is an advantage when persuading distrustful audiences, but it is a disadvantage for cumulative scientific progress, where prior knowledge should be built upon, not discarded” [27]. Thus, demanding the conduct of a RCT as a necessary condition for confirming a causal hypothesis, as Bunge does, violates his own postulates (S4) and (S5), because it discourages grouping hypotheses into medical theories and makes only minimal assumptions about the validity of other benchmark items, i.e., background knowledge. Judea Pearl has emphasized that if an agent is able to use her background knowledge to frame a causal model of reality, RCTs are no longer the only means to estimate the effect of interventions. In this case, observational studies can do just as well, with two additional advantages: they are often more practical to conduct, and they study populations in their natural environment instead of an artificial environment created by experimental protocols [55].
It is also well known that RCTs are not immune to bias, so that poorly designed RCTs may provide less certain results than well designed observational studies [27]. Background knowledge of the structure and mechanisms in the system under study is also important for meaningfully interpreting RCT results, or generally results from any statistical study. Deaton and Cartwright illustrate this using Bertrand Russell’s famous chicken example [27] (p. 11):
The bird infers, on repeated evidence, that when the farmer comes in the morning, he feeds her. The inference serves her well until Christmas morning, when he wrings her neck and serves her for dinner. Though this chicken did not base her inference on an RCT, had we constructed one for her, we would have obtained the same result that she did. Her problem was not her methodology, but rather that she did not understand the social and economic structure that gave rise to the causal relations that she observed.
The importance of taking background knowledge into account also arises each time results are discordant, either between individual RCTs or between a RCT and another study type. From a purely empiricist standpoint that ignores what we know about the interplay between an intervention, the context under which it is applied, and its mechanisms, such discrepancies are usually explained by invoking certain quality criteria based on design and statistical arguments [56]. For example, if there is a discrepancy between RCTs and observational studies, results from the former are usually taken to “override” results from the latter. However, acknowledging that observing an intervention effect presupposes some mechanisms at work, whose activation in turn may be context-dependent, opens up many more possibilities for interpreting negative study results or discrepancies between study types. In particular, studies investigating an intervention may vary in context, in the mechanism that is exerted, or in both simultaneously [56]. An example of the latter situation is the supplementation of antioxidant vitamins for preventing cardiovascular disease, which has been declared ineffective based on mostly negative findings in RCTs, although evidence from observational studies showed preventive effects. Connelly [56] emphasizes the realist standpoint that different antioxidant vitamins may act via different mechanisms, which in turn may depend on the age of an individual and whose effects may only be observed over much longer follow-up periods than usually used in RCTs. He concludes:
It seems that an alternative realistic perspective on this question [whether antioxidant vitamins can prevent cardiovascular disease] is again ignored in favour of what purports to be an unassailable scientific observation of the results from RCTs. Here, once more, the effect of ignoring differences in mechanisms and contexts may be to close down research in this area prematurely.
By claiming that only RCTs can establish or fail to establish the efficacy of medical interventions, both Bunge and EBM discourage realist thinking about mechanisms and contexts whenever RCT results are available. As the antioxidant example shows, such thinking may preclude scientific progress, especially when interventions and effects are related via complex mechanisms. This is despite EBM explicitly stating the importance of evaluating “the totality of evidence” as one of its epistemological principles [19]. Maybe this is also the reason why Bunge downgrades complementary and alternative medicine (CAM) as “unscientific”: by its very nature, CAM works with complex interventions different from simple drug administrations, for which an evidence hierarchy with RCTs on top appears inadequate [29]. While for Bunge mechanisms are essential for understanding empirical phenomena through what he calls mechanismic explanation, he restricts their main epistemological role within the context of scientific medicine to “boosting” the confirmation of causal hypotheses provided by RCTs; in fact, no mention is made in his book of how mechanisms may be used in conjunction with methodologies other than RCTs. In our opinion, this underestimates the role mechanisms should play in medical hypothesis testing and treatment design. As the examples of appendectomy and GLUT1 deficiency syndrome given in Section 4.2 show, there are situations where data on mechanisms become equally or more important than statistical data for establishing a causal claim of treatment efficacy.
As another example, consider the establishment of a causal relationship between benzo[a]pyrene exposure and carcinogenesis in humans despite a lack of clinical data. Wilde and Parkkinen [57] have argued that one is justified in believing that benzo[a]pyrene causes cancer in humans because animal studies have provided evidence for both the robustness of the causal association and the mechanisms at work. In this example, knowledge of mechanisms provides the basis for the extrapolation of study results across different contexts, or more generally for building a causal theory that can then be applied to varying contexts. Such a theory resting on mechanismic explanation is much more flexible than individual hypotheses. In particular, predictions can be made from one context to another. At the same time, once particular mechanisms have confirmed a causal theory, “we should attempt to eliminate alternative explanations by testing the potential effects of these mechanisms, particularly in contexts other than the one where the theory was created” [58]. In other words, we should try to establish evidence that the mechanisms that are part of our theory are at work across a variety of contexts. To this aim, even case studies are valuable tools, because they usually study individuals in more natural environments, different from the ones artificially created by clinical study protocols, while at the same time performing more thorough measurements than epidemiological studies.