Hypothesis
Peer-Review Record

Are Scientific Models of Life Testable? A Lesson from Simpson’s Paradox

by Prasanta S. Bandyopadhyay 1,2, Nolan Grunska 1,*, Don Dcruz 3 and Mark C. Greenwood 4
Submission received: 22 August 2019 / Accepted: 7 December 2020 / Published: 22 December 2020
(This article belongs to the Special Issue Molecules to Microbes)
Version 1
DOI: 10.3390/sci1020054

Version 2
DOI: 10.3390/sci2030073

Round 1

Reviewer 1 Report

This paper presents two separate mechanisms for studying the origins of life, along with a brief description of some of the evidence to support these hypotheses. The authors then point out that both groups discount the other hypothesis as chemically implausible and provide reasons for each. The overview given is a reasonable representation of proponents of both fields given the very succinct nature of these descriptions, although perhaps a more modern point of view may be that these processes are not exclusive and likely both needed to occur simultaneously (i.e. neither need be first for the origin of life).

The authors then describe Simpson’s paradox, which shows that two groups can have a similar trend, but when mixed together, they show the opposite trend. This statistical anomaly can result in the misinterpretation of real world data by ignoring “hidden” groups within the data set. The authors then show that this could occur in either origins of life scenario (MFT or RWT) by chance when many reactions are occurring in different locations/times by choosing seemingly random values for variables.
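
The reversal described above can be illustrated with a minimal sketch. The counts below are hypothetical (they are not taken from the manuscript), and the names `COUNTS`, `rate` and `pooled_rate` are assumptions introduced only for this example. In each environment the functional molecules have the lower rate, yet the pooled data show the opposite ordering.

```python
from fractions import Fraction

# Hypothetical (successes, trials) counts for two environments.
# These numbers are illustrative only and are not taken from the manuscript.
COUNTS = {
    "env_A": {"functional": (80, 100), "non_functional": (9, 10)},
    "env_B": {"functional": (2, 10), "non_functional": (30, 100)},
}


def rate(successes: int, trials: int) -> Fraction:
    """Exact success rate for one group in one environment."""
    return Fraction(successes, trials)


def pooled_rate(kind: str) -> Fraction:
    """Success rate after pooling the raw counts of both environments."""
    successes = sum(env[kind][0] for env in COUNTS.values())
    trials = sum(env[kind][1] for env in COUNTS.values())
    return Fraction(successes, trials)


# Within every environment, the functional rate is the lower one.
lower_in_every_env = all(
    rate(*env["functional"]) < rate(*env["non_functional"])
    for env in COUNTS.values()
)
# After pooling, the functional rate is the higher one.
higher_when_pooled = pooled_rate("functional") > pooled_rate("non_functional")

print(lower_in_every_env, higher_when_pooled)  # prints: True True
```

The reversal here comes entirely from the unequal group sizes: most functional trials sit in the high-rate environment, while most non-functional trials sit in the low-rate one.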

It seems like it must be mathematically known what conditions must be met for the paradox to occur, yet the authors only choose one data set under which this is possible without explaining the underlying conditions. They then suggest that this should be testable in the real sense, although it may be technologically limited.
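
For the two-group case, two necessary conditions for a reversal follow from writing each pooled rate as a weighted average. The notation below is an assumption made for this sketch and is not the manuscript's: $f_i$ and $n_i$ are the functional and non-functional rates in group $i$, and $w$ and $v$ are the shares of group 1 in the functional and non-functional totals. The conditions are necessary, not sufficient.

```latex
% Assumed notation (not from the manuscript):
%   f_i, n_i : functional / non-functional rates in group i = 1, 2
%   w, v     : share of group 1 in the functional / non-functional totals
\begin{align*}
  &\text{Within each group:} && f_1 < n_1, \qquad f_2 < n_2, \\
  &\text{Pooled rates:} && F = w f_1 + (1-w) f_2, \qquad
                           N = v n_1 + (1-v) n_2, \qquad w, v \in [0,1]. \\
  &\text{Condition (a):} && w \neq v,
     \text{ since } w = v \text{ gives }
     F = w f_1 + (1-w) f_2 < w n_1 + (1-w) n_2 = N. \\
  &\text{Condition (b):} && \max(f_1, f_2) > \min(n_1, n_2),
     \text{ since } F \le \max(f_1, f_2) \text{ and } N \ge \min(n_1, n_2).
\end{align*}
```

In words: the two groups must contribute in different proportions to the functional and non-functional totals, and the best functional rate must exceed the worst non-functional rate.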

If I imagine what this looks like as an experimentalist: I have a 96-well plate and put in some of the molecules that can react (in either scenario) under different environmental conditions (like pH, salt, temperature, minerals, etc.). Each well has a different amount of functional and non-functional molecules. I measure the amount of functional and non-functional molecules in each well at the end of the experiment, and, lo and behold, all of the functional molecules grew at a slower pace than non-functional molecules. If I combinatorially mix one well with every other well, according to Simpson’s paradox, 1/60 of the resulting wells would have a higher rate of formation of functional molecules.
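
The pairwise-mixing thought experiment above can be sketched as a small calculation. Everything here is an assumption made for illustration: the `Well` record, the definition of growth as the end amount divided by the start amount, the rule that mixing two wells adds their amounts, and the three example wells. The sketch does not attempt to reproduce the 1/60 figure; it only shows how the fraction of reversing pairs could be counted.

```python
from fractions import Fraction
from itertools import combinations
from typing import NamedTuple


class Well(NamedTuple):
    """Hypothetical start/end amounts of functional (f) and non-functional (n) molecules."""
    f_start: int
    f_end: int
    n_start: int
    n_end: int


def functional_growth(w: Well) -> Fraction:
    return Fraction(w.f_end, w.f_start)


def non_functional_growth(w: Well) -> Fraction:
    return Fraction(w.n_end, w.n_start)


def mix(a: Well, b: Well) -> Well:
    """Assumed mixing rule: amounts from the two wells simply add."""
    return Well(*(x + y for x, y in zip(a, b)))


def reversal_fraction(wells: list[Well]) -> Fraction:
    """Fraction of well pairs in which functional growth is slower in both
    wells separately but faster in the mixed well."""
    pairs = list(combinations(wells, 2))
    reversed_count = sum(
        1
        for a, b in pairs
        if functional_growth(a) < non_functional_growth(a)
        and functional_growth(b) < non_functional_growth(b)
        and functional_growth(mix(a, b)) > non_functional_growth(mix(a, b))
    )
    return Fraction(reversed_count, len(pairs))


# Illustrative wells only; in each one the functional molecules grow more slowly.
WELLS = [
    Well(f_start=100, f_end=800, n_start=10, n_end=90),
    Well(f_start=10, f_end=20, n_start=100, n_end=300),
    Well(f_start=10, f_end=80, n_start=100, n_end=900),
]

print(reversal_fraction(WELLS))  # prints: 1/3
```

Only the first pair reverses, because it combines a well dominated by fast-growing functional molecules with a well dominated by slow-growing non-functional ones. The other two pairs share a composition or a growth regime, so mixing preserves the original ordering.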

While this very simple experiment is likely testable, I cannot tell if it is meaningful. With “growth rate” as the variable for both functional and non-functional molecules, life may have simply preferred the fastest growth rate for functional molecules, not necessarily a higher growth rate than that of non-functional molecules.

Also, if the environments needed to create this effect must be very diverse, then a gradient of environments, such as would be found in a geological setting, may not produce the “two sets of reaction rates mix” effect that is proposed by the authors. Also, this needs to happen not once but hundreds (thousands? millions?) of times to generate a complex set of molecules for life. While it seems possible that this could have overcome a single instance of chemical implausibility, leading to abiogenesis, I feel it is unlikely to be the solution to chemical implausibility as a whole.

Without knowing more about the limitations of Simpson’s paradox, it seems like the authors could predict what the limits of the reaction rates would be to allow for this to take place. I also have questions:

How does degradation fit into the model? What if the values are both negative and positive for G?

Some of the values (90% functional molecules compared to only 10% non-functional molecules) seem a bit ambitious from a chemical plausibility standpoint. If your model requires using an implausible chemical situation to explain away chemical implausibility, is it solving the problem?

Author Response

We would like to thank Dr. Maurer (of Central Connecticut State University) for taking the time to read our manuscript and for making outstanding observations. On this occasion, it is not possible to extrapolate on what would happen during degradation of molecules, etc. This is simply because her comments have to be taken into account in any repeat of the original experiments in order to arrive at a conclusive answer to her queries. It should be borne in mind that it is not possible to cover all angles of any experiment at the outset, because queries such as those highlighted by Dr. Maurer only arise with hindsight. However, we maintain that Simpson’s Paradox (SP) is a statistical analysis tool which can help resolve or iron out certain trends or other anomalies that may be present in several different datasets, to the extent that trends may arise, disappear or even be reversed within the datasets depending on how such datasets are treated. In general, SP is used in social-science as well as in medical treatment settings, where it has been shown to be a valuable statistical tool. We have applied SP to statistically analyse the Metabolism First and the RNA World hypotheses, and our findings are reported in this paper; we also note that the application of SP to these hypotheses is an innovative concept in itself.

Reviewer 2 Report

This is an interesting paper as it uses the mathematical principles of Simpson's paradox (SP) to provide an argument against the "inefficiency problem", a problem levelled equally against the RNA World and Metabolism First hypotheses in origins of life (OOL) studies.

The paper itself is logically well-constrained and the results that the authors draw from their main argument based on the SP are internally consistent.

There are some points, however, that I'd like to flag up in terms of how best to interpret the results of this paper:

(i) One of the key take-home points of this paper which I feel could be flagged up as a little more important is that, rather than focus on individual reaction sequences, one would do better in OOL to consider networks of interconnected chemical processes. In other words, one should consider the chemical environment as being intimately connected to those reactions that lead to molecules of function. This is a positive of the SP approach in that it allows us to consider the effects on a larger global whole rather than a part of that whole. 

(ii) Overall, the thrust of this paper appears to be that the "inefficiency principle" should not be considered a significant objection to OOL studies. Indeed, this is well worth supporting, as it is well recognised (in catalysis, for example) that it is not so much the thermodynamic stability of key molecules that is important in their being selected for but what other reactions result from them in a dynamic, interconnected chemical network.

(iii) Thus, whether or not the SP in itself is a valuable consideration to OOL studies is connected to what the R1 and R2 etc. systems actually are, how they interconnect in a dynamic manner and how certain key chemicals in these combined networks have functional value.

(iv) Going back to point (i), the overall reason that biological life on earth exists is ultimately linked to the benefit that such life provides to the universe in its entirety. That is life's true environment and it is to that set of processes that we should aim to apply SP if we can! 

Author Response

We are grateful to Prof Kee for reading the MS and for making great suggestions to improve it. However, we feel that it is not possible, on this occasion, to overhaul the MS completely, as the “interconnected chemical network” is much more complex; this means having to unpick each and every reactant and product within the said network. Further, the reactants and products within the network may also require calculus, especially differential equations, in order to refine and improve the algorithm. In addition, we don’t quite know what Prof Kee means by an interconnected chemical network, as this is much more of a theoretical concept than an experimentally workable solution to the question of OOL altogether. Thus, we feel that the only way to address Prof Kee’s suggestion is to write another paper afresh, with the interconnected chemical network at the heart of experimental/simulation runs, rather than tamper with the present MS.

Round 2

Reviewer 1 Report

The authors did not address any of the previous concerns in this "revision".

More specifically, it seems like Simpson's Paradox can only ever occur if one of the data sets has a much higher value for the functional compared to the non-functional molecules. As the authors declined to comment on whether this must be true or not, I am going to assume that this is a condition that must be met for Simpson's Paradox to apply. I don't know that this condition could be met in a "messy" prebiotic environment, even a small portion of the time, say twice in 600,000,000 years.

Reviewer 2 Report

I'm happy to let this paper proceed through to final publication. A nice contribution to the field.
