Single nucleotide polymorphisms (SNPs) support robust analysis on degraded DNA samples. However, the development of a systematic method to interpret the profiles derived from the mixtures is less studied, and it remains a challenge due to the bi-allelic nature of SNP markers. To improve the discriminating power of SNPs, this study explored bioinformatic strategies to analyze mixtures. Then, computer-generated mixtures were produced using real-world massively parallel sequencing (MPS) data from the single samples processed with the Precision ID Identity Panel. Moreover, the values of the frequency of major allele reads (FMAR
) were calculated and applied as key parameters to deconvolve the two-person mixtures and estimate mixture ratios. Four custom R language scripts (three for autosomes and one for Y chromosome) were designed with the K-means clustering method as a core algorithm. Finally, the method was validated with real-world mixtures. The results indicated that the deconvolution accuracy for evenly balanced mixtures was 100% or close to 100%, which was the same as the deconvolution accuracy of inferring the genotypes of the major contributor of unevenly balanced mixtures. Meanwhile, the accuracy of inferring the genotypes of the minor contributor decreased as its proportion in the mixture decreased. Moreover, the estimated mixture ratio was almost equal to the actual ratio between 1:1 and 1:6. The method proposed in this study provides a new paradigm for mixture interpretation, especially for inferring contributor profiles of evenly balanced mixtures and the major contributor profile of unevenly balanced mixtures.
This is an open access article distributed under the Creative Commons Attribution License
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.