Appendix A.1. Choosing Parents
The probability that two children will pick the same parent is, operationally, the reciprocal of the effective population size [30, 31]. The same interpretation was made by [5] in the construction of the coalescent. Given a set of ’s in a given generation t, the probability that two children will pick the same parent is . While the ’s define the probability that children pick their parents, N does not play a direct role in determining the course of the algorithm in constructing the book of populations, but it does affect the shape of the ARG that is traced in the second stage.
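As a minimal sketch of the statement above: if each child independently picks parent i with probability p_i, then two children pick the same parent with probability equal to the sum of the p_i squared, and the reciprocal of that sum can be read as an effective population size. The function names, the normalization of raw weights, and the example weights are assumptions made for illustration and are not taken from the simulator.

```python
import random


def same_parent_probability(weights):
    """Probability that two independent children pick the same parent.

    Assumption: each child picks parent i with probability
    weights[i] / sum(weights), independently of every other child.
    """
    total = sum(weights)
    return sum((w / total) ** 2 for w in weights)


def pick_parents(weights, n_children, rng):
    """Draw one parent index per child, with replacement, using the weights."""
    return rng.choices(range(len(weights)), weights=weights, k=n_children)


if __name__ == "__main__":
    rng = random.Random(1)
    neutral = [1.0] * 1000              # equal weights for every parent
    skewed = [2.0] * 100 + [1.0] * 900  # hypothetical: 100 favoured parents
    for label, w in (("neutral", neutral), ("skewed", skewed)):
        p = same_parent_probability(w)
        print(label, "P(same parent) =", p, "effective size =", 1.0 / p)
    print("parents of 5 children:", pick_parents(skewed, 5, rng))
```

With equal weights over N parents the probability reduces to 1/N, so the implied effective size equals N; any unevenness in the weights raises the probability and lowers the implied effective size.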
With selection on a single locus in effect, each generation will have individuals that contain the allele under selection, yielding , and individuals without the allele, with .
Transfer of Genetic Material
After the children have randomly selected their parents, each child requests one chromosome from each of its parents. Each parent randomly decides whether to pass on one of its two chromosomes unchanged or to construct a new chromosome via a recombination event, that is, a crossover between its two chromosomes, according to the recombination rate r. If a crossover is generated, the parent randomly selects a location and transfers the genetic material up to that location from one chromosome and the rest from its other copy. This is done in part to reconstruct the ARG and to characterize genetic variation along chromosomes, yielding the final recombinations [32]. If there is no recombination, the parent randomly decides which chromosome’s genetic material is passed to the child (see Figure 1 of the main manuscript).
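The transfer step can be sketched as follows. This is an illustration under stated assumptions, not the simulator's implementation: chromosomes are modelled as equal-length Python lists with at least two sites, r is treated as the probability of a single crossover per transmitted chromosome, and the choice of which copy supplies the leading segment is made uniformly at random. The name `transmit_chromosome` is hypothetical.

```python
import random


def transmit_chromosome(chrom_a, chrom_b, r, rng):
    """Return (child_chromosome, breakpoint) for one parent-to-child transfer.

    Assumptions: chrom_a and chrom_b have the same length (>= 2), and r is
    the probability that a single crossover occurs in this transfer.
    breakpoint is None when no crossover happened.
    """
    # Randomly decide which of the two copies comes first.
    if rng.random() < 0.5:
        first, second = chrom_a, chrom_b
    else:
        first, second = chrom_b, chrom_a

    if rng.random() < r:
        # Crossover: material up to the breakpoint from one copy,
        # the rest from the other copy.
        breakpoint = rng.randrange(1, len(first))
        return first[:breakpoint] + second[breakpoint:], breakpoint

    # No recombination: pass one whole chromosome, chosen at random above.
    return list(first), None


if __name__ == "__main__":
    rng = random.Random(7)
    maternal = [0] * 10
    paternal = [1] * 10
    for _ in range(3):
        print(transmit_chromosome(maternal, paternal, r=0.5, rng=rng))
```

Returning the breakpoint alongside the chromosome mirrors the bookkeeping described in the text, where recombination sites are recorded so that the ARG can be traced later.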
Each newly constructed chromosome is painted with new SNP mutations, randomly generated according to a mutation rate probability , a randomly selected location, and a randomly selected allele value. With probability of mutation on each polymorphic site, the resulting mutated chromosomes are finally passed from the parents to the child, along with the sites of mutations and recombinations.
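A minimal sketch of the mutation step is given below. It assumes a per-site mutation probability (called `mu` here), a biallelic 0/1 alphabet by default, and that a mutation always changes the allele to a different value; these choices and the name `mutate` are illustrative assumptions rather than details confirmed by the text.

```python
import random


def mutate(chromosome, mu, rng, alleles=(0, 1)):
    """Return (mutated_copy, mutated_sites) for one chromosome.

    Assumptions: every site mutates independently with probability mu,
    and a mutated site takes a randomly chosen allele different from
    its current one.
    """
    child = list(chromosome)
    sites = []
    for pos, current in enumerate(child):
        if rng.random() < mu:
            # Pick a new allele value among those that differ from the current one.
            child[pos] = rng.choice([a for a in alleles if a != current])
            sites.append(pos)
    return child, sites


if __name__ == "__main__":
    rng = random.Random(11)
    mutated, sites = mutate([0] * 20, mu=0.1, rng=rng)
    print(mutated)
    print("mutated sites:", sites)
```

The list of mutated sites is returned with the chromosome so that it can be stored together with the recombination sites for the later ARG trace.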
Throughout the generations, forward in time, we keep track of the sites of recombinations and mutations so that the ARG can be traced efficiently from the extant individuals to their GMRCA.
Appendix A.5. Experiments and Comparison Study
Here we exhaustively list all the box-and-whisker diagrams, Q-Q and CDF plots for two-way epistasis, and P-P plots for all experiments conducted while comparing the two simulators fwd-EpiSimRA and EpiSimRA (Figure A3, Figure A4, Figure A5).
Figure A1.
Comparing the height of the ARG (H) for different scenarios of selection in EpiSimRA with epistasis and recombination for , kbp, , , with epistatic parameters for for and . The box-and-whisker diagram summarizes the result for each m and each selection scenario: neutral (), single locus (), and epistatic interaction at two loci and three loci, respectively.
Figure A2.
Comparing the height of the ARG (H) for different scenarios of selection in EpiSimRA with and without epistasis in recombination for N = 10,000, kbp, , , with epistatic parameters for for and . The box-and-whisker diagram summarizes the result for each m and each selection scenario, with and without epistatic interaction at three loci.
Figure A3.
Comparing the height of the ARG (H) between fwd-EpiSimRA and EpiSimRA with and without epistasis at two loci with recombination for , kbp, , , with epistatic parameters for . (A) The box-and-whisker diagram summarizes the result for each. On each box, the central mark is the mean, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually. (B) Q-Q plot and (C) CDF plot of the backward and forward models show similar distributions.
Figure A4.
Comparing the height of the ARG (H) between fwd-EpiSimRA and EpiSimRA for selection at a single locus with recombination for , kbp, , , . (A) The box-and-whisker diagram summarizes the result for each. On each box, the central mark is the mean, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers, and outliers are plotted individually. (B) Q-Q plot and (C) CDF plot of the backward and forward models show similar distributions.
Figure A5.
P-P plots of the distributions of the height of the ARG (H) between fwd-EpiSimRA and EpiSimRA for (A) single-locus selection, (B) epistatic interaction at two loci, and (C) epistatic interaction at three loci , , , , and .
We also provide the test statistics and p-values obtained by running the K-S test, which does not reject the null hypothesis that the samples of H returned by the two simulators are drawn from the same distribution, as shown in Table A1.
Table A1.
K-S test statistics with corresponding p-values, showing that the probability distributions of H as returned by fwd-sSimRA and back-sSimRA match each other very closely.
| 3 Interacting Loci | | | | m | p-Value | Test Statistic |
|---|---|---|---|---|---|---|
| | | | | 10 | 0.1400 | 0.16 |
| | | | | 20 | 0.4431 | 0.12 |
| × | × | × | × | 30 | 0.3439 | 0.13 |
| | | | | 40 | 0.9995 | 0.05 |
| | | | | 10 | 0.6766 | 0.08 |
| | | | | 20 | 0.7942 | 0.08 |
| | × | × | × | 30 | 0.6766 | 0.10 |
| | | | | 40 | 0.5750 | 0.11 |
| | | | | 10 | 0.9921 | 0.06 |
| | | | | 20 | 0.5560 | 0.11 |
| | | × | × | 30 | 0.7942 | 0.09 |
| | | | | 40 | 0.8938 | 0.08 |
| | | | | 10 | 0.8938 | 0.08 |
| | | | | 20 | 0.9995 | 0.05 |
| | | × | | 30 | 0.9710 | 0.06 |
| | | | | 40 | 0.7942 | 0.09 |
| | | | | 10 | 0.3439 | 0.13 |
| | | | | 20 | 0.7942 | 0.08 |
| | | | × | 30 | 0.6766 | 0.10 |
| | | | | 40 | 0.5576 | 0.11 |
| | | | | 10 | 0.9610 | 0.07 |
| | | | | 20 | 0.9610 | 0.07 |
| | | | | 30 | 0.3556 | 0.13 |
| | | | | 40 | 0.6766 | 0.10 |
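For readers who wish to reproduce a comparison of this kind, the two-sample K-S test statistic is the largest absolute difference between the empirical CDFs of the two samples. The sketch below computes only that statistic, in plain Python; the function name `ks_statistic` and the toy samples are illustrative assumptions, and the p-value computation is not reproduced here.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    F_a and F_b are the empirical CDFs of the two samples. Both are
    right-continuous step functions, so the maximum is attained at one
    of the observed data points.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    d = 0.0
    for x in a + b:
        f_a = bisect_right(a, x) / len(a)  # fraction of sample_a <= x
        f_b = bisect_right(b, x) / len(b)  # fraction of sample_b <= x
        d = max(d, abs(f_a - f_b))
    return d


if __name__ == "__main__":
    heights_forward = [1, 2, 3, 4]   # toy values, not simulator output
    heights_backward = [3, 4, 5, 6]  # toy values, not simulator output
    print(ks_statistic(heights_forward, heights_backward))
```

A small statistic with a large p-value, as in Table A1, means the test finds no evidence that the two samples come from different distributions. In practice a library routine such as SciPy's `scipy.stats.ks_2samp` returns both the statistic and the p-value.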