# Evaluating DNA Mixtures with Contributors from Different Populations Using Probabilistic Genotyping


## Abstract


## 1. Introduction

The STRmix^{TM} software implements a population stratified likelihood ratio that uses weighted averages of the likelihoods across population groups in the numerator and the denominator:
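In the notation used for the worked example in Section 2.2.3, and writing ${\pi}_{a}$ for the weight given to population $a$ (a symbol introduced here for illustration only), a likelihood ratio of this weighted-average type can be sketched as below. The choice of weights is an assumption of this sketch, so it should be read as one interpretation of the description above rather than as a statement of the software's implementation.

$$
LR=\frac{\sum_{a}{\pi}_{a}\,{P}_{a}(O|{H}_{1},{G}_{p})}{\sum_{a}{\pi}_{a}\,{P}_{a}(O|{H}_{2},{G}_{p})}
$$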

The DBLR^{TM} software [16] version 1.2 implements a likelihood ratio framework for the evaluation of propositions involving one or more samples where contributors may be related according to a pedigree [17]. The comparison of a POI to a mixture can be formulated within this framework. Our intention is to develop an updated version of the software that is able to flexibly model population stratification in this general framework so that the population of each founder in the pedigree can be modelled separately. As a first step towards this development, we have implemented population stratification in a probabilistic way for pedigrees comprising founders only. The population from which each person originates is modelled using a prior distribution. The joint prior distribution of the populations for all persons can be constrained such that all sample donors are from the same ethnic group, or all permutations of populations can be considered.

We consider two approaches to population stratification within the DBLR^{TM} framework. In the first approach, which we call simple stratification, all persons are assumed to originate from the same population. In the second approach, which we call full stratification, the populations of all persons are considered to be independent a priori. Hence, all permutations of assignments of persons to populations are considered. These two approaches are compared to the method that is implemented in the STRmix^{TM} software (2) and the pragmatic option of taking the minimum across population-specific likelihood ratios.

## 2. Methods

#### 2.1. Likelihood Ratio Framework

#### 2.2. Likelihood Ratio for Comparison of a POI to a Single Sample

Each likelihood in the STRmix^{TM} likelihood ratio (2) is the probability of the observed mixture profile conditional on the genotype of a POI. The DBLR^{TM} approach, on the other hand, includes the probability of all genotypes in both the numerator and denominator likelihoods. Below, we derive expressions for the DBLR^{TM} likelihood ratio for the specific case of the comparison of a POI to a two-person mixture to facilitate comparison with the STRmix^{TM} approach.

#### 2.2.1. Simple Stratification

#### 2.2.2. Full Stratification

#### 2.2.3. Example Calculations

**Population-specific likelihood ratios**. Using allele frequencies from population A only, we write ${LR}_{A}={P}_{A}(O|{H}_{1},{G}_{p})/{P}_{A}(O|{H}_{2},{G}_{p})$, where the subscript A emphasises the population that is used in the calculations. Because this is a two-person mixture and we evaluate the sub-source likelihood ratio, we consider two sub-hypotheses ${H}_{11}$ and ${H}_{12}$ stating respectively that the POI is the first or the second contributor. These sub-hypotheses are assumed to have equal prior probability. In this example, the POI genotype is $(13/14)$. Hence, ${P}_{A}(O|{H}_{1},{G}_{p})=\frac{1}{2}\left({P}_{A}(O|{H}_{11},{G}_{p})+{P}_{A}(O|{H}_{12},{G}_{p})\right)$. Under each sub-hypothesis there is only a single matching genotype combination (Table 1), so we obtain ${P}_{A}(O|{H}_{11},{G}_{p})={\left({f}_{11}^{A}\right)}^{2}\times 0.3$ and ${P}_{A}(O|{H}_{12},{G}_{p})={\left({f}_{11}^{A}\right)}^{2}\times 0.2$, which yields ${P}_{A}(O|{H}_{1},{G}_{p})={\left({f}_{11}^{A}\right)}^{2}\times 0.25$. To evaluate ${P}_{A}(O|{H}_{2},{G}_{p})$, we sum over the genotype combinations:
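The population-specific calculation can be sketched in Python using the weights of Table 1 and the allele frequencies of Table 2. This is a minimal sketch, assuming ${F}_{ST}=0$ (Hardy-Weinberg genotype probabilities) and treating the deconvolution weights as relative likelihoods of the genotype combinations; the function names are ours and purely illustrative.

```python
# Table 1: (contributor 1, contributor 2) genotype combinations and weights.
WEIGHTS = {
    ((13, 14), (11, 11)): 0.30,
    ((11, 14), (11, 13)): 0.27,
    ((11, 13), (11, 14)): 0.23,
    ((11, 11), (13, 14)): 0.20,
}
# Table 2: allele frequencies in populations A and B.
FREQS = {"A": {11: 0.6, 13: 0.3, 14: 0.1}, "B": {11: 0.5, 13: 0.3, 14: 0.2}}
POI = (13, 14)


def genotype_prob(genotype, pop):
    """Genotype probability under Hardy-Weinberg proportions (assumes F_ST = 0)."""
    a, b = genotype
    f = FREQS[pop]
    return f[a] ** 2 if a == b else 2 * f[a] * f[b]


def likelihood_h1(pop):
    """P_pop(O | H1, G_p): the POI is contributor 1 (H11) or 2 (H12), equal priors."""
    h11 = sum(w * genotype_prob(g2, pop) for (g1, g2), w in WEIGHTS.items() if g1 == POI)
    h12 = sum(w * genotype_prob(g1, pop) for (g1, g2), w in WEIGHTS.items() if g2 == POI)
    return 0.5 * (h11 + h12)


def likelihood_h2(pop):
    """P_pop(O | H2, G_p): both contributors unknown, sum over all combinations."""
    return sum(
        w * genotype_prob(g1, pop) * genotype_prob(g2, pop)
        for (g1, g2), w in WEIGHTS.items()
    )


def population_lr(pop):
    """Likelihood ratio using allele frequencies from a single population."""
    return likelihood_h1(pop) / likelihood_h2(pop)


for pop in FREQS:
    print(pop, likelihood_h1(pop), likelihood_h2(pop), population_lr(pop))
```

For population A the numerator reproduces ${\left({f}_{11}^{A}\right)}^{2}\times 0.25$ from the text; the denominator is the weighted sum over all four rows of Table 1.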

**Simple stratification**. To evaluate the likelihood ratio using simple stratification, we need the posterior probabilities that the POI’s genotype originates from population A or B. The probability of the POI’s genotype in population A is ${P}_{A}\left({G}_{p}\right)=2{f}_{13}^{A}{f}_{14}^{A}=0.06$ and the probability of the POI’s genotype in population B is ${P}_{B}\left({G}_{p}\right)=2{f}_{13}^{B}{f}_{14}^{B}=0.12$. Hence, with equal prior probabilities of $\frac{1}{2}$ for both populations, the posterior probability that the POI’s genotype originates from population A equals ${p}_{A|{G}_{p}}=\frac{\frac{1}{2}\times 0.06}{\frac{1}{2}\times 0.06+\frac{1}{2}\times 0.12}=\frac{1}{3}$, and ${p}_{B|{G}_{p}}=\frac{2}{3}$. Next, we need to evaluate
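The simple stratification calculation can be sketched as follows. The sketch assumes ${F}_{ST}=0$, equal prior population probabilities, and that the simple stratified likelihood ratio weights the population-specific numerator and denominator likelihoods by the posterior probabilities ${p}_{A|{G}_{p}}$ and ${p}_{B|{G}_{p}}$. That last point is our reading of the method, not a confirmed description of the DBLR^{TM} implementation, and all function names are illustrative.

```python
# Table 1 weights and Table 2 allele frequencies (see the text).
WEIGHTS = {
    ((13, 14), (11, 11)): 0.30,
    ((11, 14), (11, 13)): 0.27,
    ((11, 13), (11, 14)): 0.23,
    ((11, 11), (13, 14)): 0.20,
}
FREQS = {"A": {11: 0.6, 13: 0.3, 14: 0.1}, "B": {11: 0.5, 13: 0.3, 14: 0.2}}
PRIOR = {"A": 0.5, "B": 0.5}  # assumed equal prior population probabilities
POI = (13, 14)


def genotype_prob(genotype, pop):
    """Genotype probability under Hardy-Weinberg proportions (assumes F_ST = 0)."""
    a, b = genotype
    f = FREQS[pop]
    return f[a] ** 2 if a == b else 2 * f[a] * f[b]


def likelihood_h1(pop):
    """P_pop(O | H1, G_p), averaging over the POI being contributor 1 or 2."""
    h11 = sum(w * genotype_prob(g2, pop) for (g1, g2), w in WEIGHTS.items() if g1 == POI)
    h12 = sum(w * genotype_prob(g1, pop) for (g1, g2), w in WEIGHTS.items() if g2 == POI)
    return 0.5 * (h11 + h12)


def likelihood_h2(pop):
    """P_pop(O | H2, G_p), summing over all genotype combinations."""
    return sum(
        w * genotype_prob(g1, pop) * genotype_prob(g2, pop)
        for (g1, g2), w in WEIGHTS.items()
    )


def simple_stratified_lr():
    """Posterior population probabilities given G_p and the resulting LR."""
    joint = {pop: PRIOR[pop] * genotype_prob(POI, pop) for pop in FREQS}
    total = sum(joint.values())
    posterior = {pop: joint[pop] / total for pop in FREQS}
    numerator = sum(posterior[pop] * likelihood_h1(pop) for pop in FREQS)
    denominator = sum(posterior[pop] * likelihood_h2(pop) for pop in FREQS)
    return posterior, numerator / denominator


posterior, lr = simple_stratified_lr()
print(posterior, lr)
```

Because everyone is assumed to share one population, a single population label indexes both the POI's genotype probability and the mixture likelihoods, which is why the posterior weights appear in both numerator and denominator.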

**Full stratification**. The full stratification method considers all permutations of population assignments. Under ${H}_{1}$, this means that the populations of the two mixture donors $({E}_{1},{E}_{2})$ can take the values $(A,A)$, $(A,B)$, $(B,A)$ or $(B,B)$. Under ${H}_{2}$, the populations of the two mixture donors and the POI $({E}_{1},{E}_{2},{E}_{p})$ can take eight values. We first decompose the sub-source hypothesis ${H}_{1}$ into ${H}_{11}$ and ${H}_{12}$ and then consider different population assignments in the calculation.
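The enumeration over population assignments can be sketched as below. This is an illustrative sketch, assuming ${F}_{ST}=0$, independent and equal prior population probabilities for every person, and joint likelihoods that include the probability of the POI's genotype in both numerator and denominator. The names are ours and it is not the DBLR^{TM} code.

```python
from itertools import product

# Table 1 weights and Table 2 allele frequencies (see the text).
WEIGHTS = {
    ((13, 14), (11, 11)): 0.30,
    ((11, 14), (11, 13)): 0.27,
    ((11, 13), (11, 14)): 0.23,
    ((11, 11), (13, 14)): 0.20,
}
FREQS = {"A": {11: 0.6, 13: 0.3, 14: 0.1}, "B": {11: 0.5, 13: 0.3, 14: 0.2}}
PRIOR = {"A": 0.5, "B": 0.5}  # assumed independent, equal priors per person
POI = (13, 14)


def genotype_prob(genotype, pop):
    """Genotype probability under Hardy-Weinberg proportions (assumes F_ST = 0)."""
    a, b = genotype
    f = FREQS[pop]
    return f[a] ** 2 if a == b else 2 * f[a] * f[b]


def full_stratified_lr():
    """LR = P(O, G_p | H1) / P(O, G_p | H2), summed over population assignments."""
    pops = list(FREQS)

    numerator = 0.0
    for e1, e2 in product(pops, repeat=2):  # four assignments under H1
        prior = PRIOR[e1] * PRIOR[e2]
        for (g1, g2), w in WEIGHTS.items():
            if g1 == POI:  # H11: the POI is contributor 1, from population e1
                numerator += 0.5 * prior * w * genotype_prob(POI, e1) * genotype_prob(g2, e2)
            if g2 == POI:  # H12: the POI is contributor 2, from population e2
                numerator += 0.5 * prior * w * genotype_prob(g1, e1) * genotype_prob(POI, e2)

    denominator = 0.0
    for e1, e2, ep in product(pops, repeat=3):  # eight assignments under H2
        prior = PRIOR[e1] * PRIOR[e2] * PRIOR[ep]
        for (g1, g2), w in WEIGHTS.items():
            denominator += (
                prior * w * genotype_prob(g1, e1) * genotype_prob(g2, e2)
                * genotype_prob(POI, ep)
            )

    return numerator / denominator


print(full_stratified_lr())
```

Under ${H}_{2}$ the POI's population is independent of the donors' populations, so the POI's genotype probability factors out of the denominator. Under ${H}_{1}$ it is tied to whichever contributor position the POI occupies.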

**STRmix^{TM} stratification**. The population stratification method implemented in STRmix^{TM} considers a weighted average of the likelihoods in each population. For our example, we obtain
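A weighted-average calculation of this kind can be sketched for the example as follows, together with the pragmatic minimum across the population-specific likelihood ratios. The sketch assumes ${F}_{ST}=0$ and equal weights of $\frac{1}{2}$ for populations A and B. Those weights are an assumption made here for illustration and are not taken from the STRmix^{TM} documentation.

```python
# Table 1 weights and Table 2 allele frequencies (see the text).
WEIGHTS = {
    ((13, 14), (11, 11)): 0.30,
    ((11, 14), (11, 13)): 0.27,
    ((11, 13), (11, 14)): 0.23,
    ((11, 11), (13, 14)): 0.20,
}
FREQS = {"A": {11: 0.6, 13: 0.3, 14: 0.1}, "B": {11: 0.5, 13: 0.3, 14: 0.2}}
POP_WEIGHT = {"A": 0.5, "B": 0.5}  # assumed population weights
POI = (13, 14)


def genotype_prob(genotype, pop):
    """Genotype probability under Hardy-Weinberg proportions (assumes F_ST = 0)."""
    a, b = genotype
    f = FREQS[pop]
    return f[a] ** 2 if a == b else 2 * f[a] * f[b]


def likelihood_h1(pop):
    """P_pop(O | H1, G_p), averaging over the POI being contributor 1 or 2."""
    h11 = sum(w * genotype_prob(g2, pop) for (g1, g2), w in WEIGHTS.items() if g1 == POI)
    h12 = sum(w * genotype_prob(g1, pop) for (g1, g2), w in WEIGHTS.items() if g2 == POI)
    return 0.5 * (h11 + h12)


def likelihood_h2(pop):
    """P_pop(O | H2, G_p), summing over all genotype combinations."""
    return sum(
        w * genotype_prob(g1, pop) * genotype_prob(g2, pop)
        for (g1, g2), w in WEIGHTS.items()
    )


def weighted_average_lr():
    """Ratio of population-weighted averages of the likelihoods."""
    numerator = sum(POP_WEIGHT[pop] * likelihood_h1(pop) for pop in FREQS)
    denominator = sum(POP_WEIGHT[pop] * likelihood_h2(pop) for pop in FREQS)
    return numerator / denominator


def minimum_lr():
    """Pragmatic alternative: the smallest population-specific LR."""
    return min(likelihood_h1(pop) / likelihood_h2(pop) for pop in FREQS)


print(weighted_average_lr(), minimum_lr())
```

Unlike the simple stratification sketch, the weights here do not depend on the POI's genotype, because the likelihoods are conditional on ${G}_{p}$ rather than joint with it.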

#### 2.3. Simulation Study

^{TM} kit using the `simDNAmixtures` [18,19] package for `R`. The template parameter of the major contributor was kept fixed at 500 rfu, while the minor contributor had a template parameter sampled uniformly between 25 (a third of the analytical threshold) and 125 rfu. The low template values for the minor contributor ensure that a large proportion of the samples exhibit severe dropout. The genotypes of the two contributors were sampled according to NIST allele frequencies [10], one from the African American sample and the other from the Caucasian sample. It was randomised which one was the major and which one was the minor contributor. All samples were interpreted using STRmix^{TM} version 2.9.
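The sampling design described above can be illustrated with a short Python sketch. It only draws the template parameters and the major/minor population assignment; it does not simulate genotypes or peak heights and does not use the `simDNAmixtures` API. The number of mixtures (250) is taken from the figure captions, and all names are hypothetical.

```python
import random

MAJOR_TEMPLATE = 500.0                 # rfu, fixed for the major contributor
MINOR_TEMPLATE_RANGE = (25.0, 125.0)   # rfu, uniform range for the minor contributor
POPULATIONS = ("African American", "Caucasian")
NUM_MIXTURES = 250                     # two-person mixtures, as in the figure captions


def sample_design(rng):
    """Draw template parameters and a random major/minor population assignment."""
    # Shuffle the two populations so that either can be the major contributor.
    major_pop, minor_pop = rng.sample(POPULATIONS, k=2)
    minor_template = rng.uniform(*MINOR_TEMPLATE_RANGE)
    return {
        "major": {"population": major_pop, "template": MAJOR_TEMPLATE},
        "minor": {"population": minor_pop, "template": minor_template},
    }


rng = random.Random(2022)  # arbitrary seed, for reproducibility of the sketch only
designs = [sample_design(rng) for _ in range(NUM_MIXTURES)]
print(designs[0])
```

Each design always contains one donor from each population, so the simple stratification assumption of a shared population is false by construction for every simulated mixture.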

Table 3 gives an overview of the likelihood ratios and the implementations (DBLR^{TM} and STRmix^{TM}) that were used in the calculations. The effect of not using population stratification was investigated first by evaluating likelihood ratios with African American allele frequencies only and with Caucasian allele frequencies only, using both DBLR^{TM} and STRmix^{TM}. Next, the different ways of population stratification were applied. All likelihood ratio calculations were first done with ${F}_{ST}=0$ and then repeated with ${F}_{ST}=0.01$. The simulations were not repeated, i.e., the mixture donors were simulated according to a population genetic model with ${F}_{ST}=0$ in both cases.

#### 2.4. Contributors from the Same Population

## 3. Results

#### 3.1. The Case of ${F}_{ST}=0$

The simple stratification method implemented in DBLR^{TM} assumes that all mixture donors are from either the one or the other population, which does not correspond to the ground truth. For some POIs this leads to an inflated WoE, with the largest WoE difference being about two bans, which may be a meaningful difference in some circumstances. The median WoE difference for the simple stratification method is $-0.30$ with a standard deviation of 0.57. The full stratification method implemented in DBLR^{TM} does take into account the possibility that mixture donors originate from different populations. The median WoE difference for the full stratification method is $-0.03$ with a standard deviation of 0.32. Hence, the full stratification method is more accurate and less biased than the simple stratification method in these simulations. The STRmix^{TM} stratification method (bottom-right panel) employs a weighted average of likelihoods in the numerator and denominator of the likelihood ratio. This method is on average slightly anti-conservative with a median WoE difference of 0.31 and a standard deviation of 0.82. For all simulated POIs the WoE difference was less than four bans.

#### 3.2. The Case of ${F}_{ST}=0.01$

The WoE differences for simple stratification and STRmix^{TM} stratification are lower when ${F}_{ST}=0.01$ is used compared to when ${F}_{ST}=0$ is used, while the WoE difference for the full stratification method is shifted less. Using ${F}_{ST}=0.01$, the simple stratification method has a median WoE difference of $-0.53$ with a standard deviation of 0.57. The full stratification method is close to unbiased with a median WoE difference of $-0.05$ and a smaller standard deviation of 0.32. For the STRmix^{TM} stratification method the median WoE difference is $-0.07$ and the standard deviation is 0.72. The fraction of POIs for which the WoE difference is positive (i.e., the method is anti-conservative) is 12.4% for the simple stratification method, 15.4% for the full stratification method and 46.6% for the STRmix^{TM} stratification method. Although these fractions are non-negligible, most of the positive WoE differences are small in magnitude. The fraction of POIs for which the WoE difference exceeds one ban is 0.6% for the simple stratification method, 0% for the full stratification method and 6.6% for the STRmix^{TM} stratification method.
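The summary statistics reported in this section can be computed along the following lines. The sketch assumes that the WoE is the base-10 logarithm of the likelihood ratio (measured in bans), that the WoE difference is the WoE of a method minus the WoE under the ground truth populations, and that the sample standard deviation is used. The input values are hypothetical toy numbers, not results from the simulation study.

```python
import math
import statistics


def woe_difference_summary(lr_method, lr_truth):
    """Summarise log10(LR_method) - log10(LR_truth) over a set of POIs."""
    diffs = [math.log10(m) - math.log10(t) for m, t in zip(lr_method, lr_truth)]
    return {
        "median": statistics.median(diffs),
        "sd": statistics.stdev(diffs),  # sample standard deviation (assumption)
        # A positive difference means the method overstates the WoE.
        "fraction_positive": sum(d > 0 for d in diffs) / len(diffs),
        "fraction_above_one_ban": sum(d > 1 for d in diffs) / len(diffs),
    }


# Hypothetical toy likelihood ratios for four POIs.
toy_method = [1e5, 1e2, 1e3, 2e3]
toy_truth = [1e3, 1e3, 1e3, 1e3]
print(woe_difference_summary(toy_method, toy_truth))
```

Reporting the fraction above one ban alongside the fraction of positive differences separates frequent but small overstatements from rare but large ones.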

#### 3.3. Contributors from the Same Population

The STRmix^{TM} stratification method yields results that are, most of the time, very close to the ground truth population likelihood ratios.

## 4. Discussion

In the absence of sub-population correction (i.e., when ${F}_{ST}=0$), the findings indicate that the full stratification method is most accurate when the mixture contributors originate from different populations. The standard deviation of the WoE difference between this method and the ground truth is also the smallest among the compared methods. The simple stratification method is on average slightly conservative; however, the standard deviation of the WoE difference between the simple stratification method and the ground truth is comparatively large (0.57 versus 0.32 for the full stratification method). This means that the simple stratification method is more often slightly anti-conservative. The highest WoE differences observed for the simple stratification method are about two bans. The stratification method implemented in STRmix^{TM} is on average slightly anti-conservative, with the median WoE difference being 0.31 bans. The standard deviation of 0.82 is higher than the standard deviation obtained for the other two stratification methods. The highest WoE differences were close to four bans. Besides the stratification methods, the simulation study also investigated the possibility of taking the minimum WoE across the two populations that were considered. This is, on average, a conservative approach. However, it is important to note that this approach does not take into account that the mixture donors originate from different populations and is therefore not always conservative.

## 5. Conclusions

## Author Contributions

## Funding

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Meester, R.; Slooten, K. Probability and Forensic Evidence: Theory, Philosophy, and Applications; Cambridge University Press: Cambridge, UK, 2021.
2. Buckleton, J.S.; Bright, J.A.; Taylor, D. Forensic DNA Evidence Interpretation; CRC Press: Boca Raton, FL, USA, 2018.
3. Balding, D.J.; Steele, C.D. Weight-of-Evidence for Forensic DNA Profiles; John Wiley & Sons: Hoboken, NJ, USA, 2015.
4. Balding, D.J.; Nichols, R.A. DNA profile match probability calculation: How to allow for population stratification, relatedness, database selection and single bands. Forensic Sci. Int. **1994**, 64, 125–140.
5. Balding, D.J.; Bishop, M.; Cannings, C. Handbook of Statistical Genetics; John Wiley & Sons: Hoboken, NJ, USA, 2008.
6. National Research Council. The Evaluation of Forensic DNA Evidence; National Research Council: Washington, DC, USA, 1996.
7. Gill, P.; Haned, H.; Bleka, O.; Hansson, O.; Dørum, G.; Egeland, T. Genotyping and interpretation of STR-DNA: Low-template, mixtures and database matches—Twenty years of research and development. Forensic Sci. Int. Genet. **2015**, 18, 100–117.
8. Butler, J.M. The future of forensic DNA analysis. Philos. Trans. R. Soc. B Biol. Sci. **2015**, 370, 20140252.
9. Devesse, L. Characterisation and Differentiation of five UK Populations Using Massively Parallel Sequencing of Forensic STRs. Ph.D. Thesis, King’s College London, London, UK, 2022.
10. Steffen, C.R.; Coble, M.D.; Gettings, K.B.; Vallone, P.M. Corrigendum to ‘US population data for 29 autosomal STR loci’ [Forensic Sci. Int. Genet. 7 (2013) e82–e83]. Forensic Sci. Int. Genet. **2017**, 31, e36–e40.
11. Budowle, B.; Moretti, T.R.; Baumstark, A.L.; Defenbaugh, D.A.; Keys, K.M. Population data on the thirteen CODIS core short tandem repeat loci in African Americans, US Caucasians, Hispanics, Bahamians, Jamaicans, and Trinidadians. J. Forensic Sci. **1999**, 44, 1277–1286, Erratum in J. Forensic Sci. **2015**, 60, 1114–1116.
12. SWGDAM Ad Hoc Working Group on Genotyping Results Reported as Likelihood Ratios. Recommendations of the SWGDAM Ad Hoc Working Group on Genotyping Results Reported as Likelihood Ratios. 2018. Available online: https://www.swgdam.org/_files/ugd/4344b0_dd5221694d1448588dcd0937738c9e46.pdf (accessed on 12 October 2022).
13. Ge, J.; Budowle, B. Kinship index variations among populations and thresholds for familial searching. PLoS ONE **2012**, 7, e37474.
14. Gill, P.; Hicks, T.; Butler, J.M.; Connolly, E.; Gusmão, L.; Kokshoorn, B.; Morling, N.; van Oorschot, R.A.; Parson, W.; Prinz, M.; et al. DNA commission of the International society for forensic genetics: Assessing the value of forensic biological evidence-Guidelines highlighting the importance of propositions: Part I: Evaluation of DNA profiling comparisons given (sub-) source propositions. Forensic Sci. Int. Genet. **2018**, 36, 189–202.
15. Triggs, C.; Harbison, S.; Buckleton, J. The calculation of DNA match probabilities in mixed race populations. Sci. Justice J. Forensic Sci. Soc. **2000**, 40, 33–38.
16. Kelly, H.; Kerr, Z.; Cheng, K.; Kruijver, M.; Bright, J.A. Developmental validation of a software implementation of a flexible framework for the assignment of likelihood ratios for forensic investigations. Forensic Sci. Int. Rep. **2021**, 4, 100231.
17. Kruijver, M.; Taylor, D.; Bright, J.A. Evaluating DNA evidence possibly involving multiple (mixed) samples, common donors and related contributors. Forensic Sci. Int. Genet. **2021**, 54, 102532.
18. Kruijver, M. simDNAmixtures: Simulate Forensic DNA Mixtures. R Package Version 0.2. 2022. Available online: https://linkinghub.elsevier.com/retrieve/pii/S1872497321000703 (accessed on 12 November 2022).
19. Kruijver, M.; Bright, J.A. A tool for simulating single source and mixed DNA profiles. Forensic Sci. Int. Genet. **2022**, 60, 102746.
20. Toscanini, U.; Salas, A.; García-Magariños, M.; Gusmão, L.; Raimondi, E. Population stratification in Argentina strongly influences likelihood ratio estimates in paternity testing as revealed by a simulation-based approach. Int. J. Leg. Med. **2010**, 124, 63–69.
21. Laurent, F.X.; Fischer, A.; Oldt, R.F.; Kanthaswamy, S.; Buckleton, J.S.; Hitchin, S. Streamlining the decision-making process for international DNA kinship matching using Worldwide allele frequencies and tailored cutoff log10LR thresholds. Forensic Sci. Int. Genet. **2022**, 57, 102634.
22. Oldt, R.F.; Kanthaswamy, S. Expanded CODIS STR allele frequencies–Evidence for the irrelevance of race-based DNA databases. Leg. Med. **2020**, 42, 101642.

**Figure 2.** WoE difference between the compared methods and the WoE evaluated using the ground truth populations with ${F}_{ST}=0$ for the 500 donors to the 250 simulated two-person mixtures with contributors from different populations.

**Figure 3.** WoE difference between the compared methods and the WoE evaluated using the ground truth populations with ${F}_{ST}=0.01$ for the 500 donors to the 250 simulated two-person mixtures with contributors from different populations.

**Figure 4.** WoE difference between the compared methods and the WoE evaluated using the ground truth population with ${F}_{ST}=0.01$ for the 500 donors to the 250 simulated two-person mixtures with contributors from the same population.

**Table 1.** Mixture deconvolution for the electropherogram shown in Figure 1.

Genotype Combination (s) | Weight (${\mathit{w}}_{\mathit{s}}$) |
---|---|
(13/14, 11/11) | 0.3 |
(11/14, 11/13) | 0.27 |
(11/13, 11/14) | 0.23 |
(11/11, 13/14) | 0.2 |

**Table 2.** Allele frequencies used in likelihood ratio calculation for the electropherogram shown in Figure 1.

Allele (a) | Frequency in Population A (${\mathit{f}}_{\mathit{a}}^{\mathit{A}}$) | Frequency in Population B (${\mathit{f}}_{\mathit{a}}^{\mathit{B}}$) |
---|---|---|
11 | 0.6 | 0.5 |
13 | 0.3 | 0.3 |
14 | 0.1 | 0.2 |

**Table 3.** Overview of likelihood ratios calculated for comparison of each of the 500 true donors to the 250 simulated two-person mixtures.

Population | Implementation | ${\mathit{F}}_{\mathbf{ST}}$ |
---|---|---|
African American only | DBLR^{TM}, STRmix^{TM} | 0, $0.01$ |
Caucasian only | DBLR^{TM}, STRmix^{TM} | 0, $0.01$ |
Simple stratified | DBLR^{TM} | 0, $0.01$ |
Fully stratified | DBLR^{TM} | 0, $0.01$ |
STRmix^{TM} stratified | STRmix^{TM} | 0, $0.01$ |
Ground truth (mixed) | DBLR^{TM} | 0, $0.01$ |


© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Kruijver, M.; Kelly, H.; Bright, J.-A.; Buckleton, J.
Evaluating DNA Mixtures with Contributors from Different Populations Using Probabilistic Genotyping. *Genes* **2023**, *14*, 40.
https://doi.org/10.3390/genes14010040
