Population Genetic Data of 30 Insertion-Deletion Markers in the Polish Population

(1) Background: Insertion-deletion (InDel) markers combine the advantages of both short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) and are considered alternative markers in forensic genetics. (2) Methods: Allelic frequencies and corresponding forensic efficiency parameters of 30 autosomal polymorphic InDel loci included in the Investigator DIPplex kit (Qiagen) were obtained in a sample of 631 unrelated Polish individuals. Allelic frequency data were compared with those reported for selected populations. (3) Results: All the loci conformed with Hardy-Weinberg equilibrium after applying a Bonferroni correction, and no significant pairwise linkage disequilibrium was detected. (4) Conclusions: Differences in DIPplex allelic frequencies among worldwide populations were high. The InDel markers are highly discriminating for human identification purposes in the Polish population.


Introduction
Insertion-deletion (InDel) diallelic polymorphisms are spread across the human genome on all 24 chromosomes (approximately one InDel per 7.2 kb) and result from the insertion and/or deletion of short sequences of 1 to 10,000 bp in length [1]. Owing to short amplicon sizes (50-160 bp), relatively low mutation rates (less than 1 × 10⁻⁸), the absence of microvariant products and stutter peaks, and automated typing on capillary electrophoresis-based instruments, InDel markers have received special attention in forensic genetics practice as alternative markers to short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs) [2][3][4]. In addition to forensic DNA casework, InDels exhibiting large differences in allelic frequencies among different ancestral groups or geographically distant populations may serve as ancestry-informative markers (AIMs) to ascertain population substructure and predict biogeographical origin [5]. The Investigator DIPplex kit (Qiagen) contains 30 forensically relevant InDel loci and the sex marker amelogenin for simultaneous PCR amplification. Numerous Investigator DIPplex datasets have been reported, mainly for Asian and European populations; however, data on these markers for other populations are still limited.
Poland is located in central Europe at latitude 51.919438° N and longitude 19.145136° E. According to the allochthonous theory, Poles descended from Western Slavs from the Upper Dnieper basin who expanded to the region between the Rivers Warta (Varta) and Wisla (Vistula) in the 5th or 6th century [6]. Since the early Middle Ages, the country has been invaded successively by Germans, Balts, and Mongols, yet sustained its national integrity. From 1772 to 1918 the country was partitioned by the empires of Russia, Prussia, and Austria. Before World War II Poland was inhabited by a variety of ethnic communities including Germans, Ukrainians, and Yiddish-speaking Jews. The official figure of Polish war losses issued in 1947 was 6,028,000 and referred exclusively to losses within the post-war frontiers. The post-war period starting in 1946 witnessed intense demographic processes and an unprecedented birth rate, with the number of inhabitants increasing by ca. 14 million by 1988. Since then, the natural increase rate (balance of births against deaths) has been close to zero [6]. According to UN estimates, the population of Poland was expected to reach 39,857,145 by 1 July 2022 [7]. The observed genetic homogeneity within Poland, accompanied by minor differences at the regional level, is most probably due to a potentially homogeneous population of ancestral Slavs, a substantial loss of both major and minor ethnic communities from the country's territory, and/or the forced displacement, expulsion, and deportation during and soon after WWII. Currently, around 97% of the population claim sole or partial Polish nationality, with only 450,000 members of ethnic groups of non-Polish ancestry, including Belarusian, Czech, German, Lithuanian, Russian, Slovak, and Ukrainian minorities settled near Poland's borders as a result of population displacements from bordering pre-war areas [8].
Furthermore, minority ethnic identity was not officially cultivated until the collapse of communism in 1989. Previous population genetic studies confirmed that the Polish population is homogeneous in terms of autosomal STR and mtDNA polymorphisms [9,10]. In addition, studies on Y-SNP and Y-STR distributions indicated that paternal lineages are homogeneous within Poland and distinct from the patrilineages in the neighboring populations [11][12][13].
The aim of our study was to provide reference allelic frequencies of 30 autosomal InDels for a Polish population sample and to calculate forensic efficiency parameters to be used in forensic genetics practice. We were also interested in whether between-population differences can be detected using the InDel set, based on our results and selected published data.

Sampling
Buccal swabs were collected from 631 unrelated healthy Polish individuals (319 males and 312 females) living in the Poznan, Warsaw, and Bialystok regions. DNA was extracted using the QIAamp DNA Mini Kit (Qiagen, Hilden, Germany) and quantified using the Quantifiler Human DNA Quantification Kit on a 7500 Real-Time PCR instrument (Thermo Fisher Scientific, Waltham, MA, USA). Sample concentrations were adjusted to 0.5 ng/µL according to the manufacturer's recommendations.

PCR Amplification and InDel Genotyping
Approximately 0.5 ng of genomic DNA template was amplified in a 25 µL reaction volume on a GeneAmp PCR System 9700 (Thermo Fisher Scientific, Waltham, MA, USA), following the kit manufacturer's manual. The PCR products were separated and detected on the 3500 Genetic Analyzer (Thermo Fisher Scientific, Waltham, MA, USA). The SST-BTO size standard and reference allelic ladder provided in the kit were used for data analysis and genotyping with GeneMapper ID-X v.1.5 software (Thermo Fisher Scientific, Waltham, MA, USA). The experiments were carried out using the 9948 male DNA as a positive control and ddH2O as a negative control. The recommendations of the DNA Commission of the International Society for Forensic Genetics (ISFG) on the analysis of forensic markers [14] and internal quality control requirements according to the ISO 17025 standard were strictly followed. The HLD (human locus deletion/insertion polymorphism) numbers were used to designate InDel loci in the DIPplex kit. The corresponding RefSNP (rs) numbers are listed in Table 1.

Results
The allele frequency distributions and corresponding forensic efficiency parameters based on the raw genotypes (submitted in Supplementary Table S1) are shown in Table 1. Insertion allelic frequencies (DIP+) of the 30 markers range from 0.3487 for HLD114 to 0.6648 for HLD56. After applying the Bonferroni correction for multiple comparisons (0.05/30 = 0.00167), no deviations from Hardy-Weinberg equilibrium (HWE) were observed (0.0157 < p < 1.0000), with the lowest p-value at the HLD93 locus. No significant linkage disequilibrium (LD) (p > 0.000115) was observed between pairs of InDels after applying the Bonferroni adjustment (0.05/435), which indicates random association among the 30 InDel loci in the studied population. Moreover, no significant linkage was detected (p > 0.0190) for the three most likely candidate STR-DIPplex locus pairs (CSF1PO-HLD67, D7S820-HLD81, and D8S1179-HLD84).
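The two Bonferroni thresholds quoted above follow directly from the number of tests: 30 single-locus HWE tests and 435 unordered locus pairs for LD. The sketch below reproduces that arithmetic and adds a simple chi-square goodness-of-fit test for HWE at a diallelic locus. The chi-square test is a textbook stand-in chosen for illustration and is not necessarily the test used in this study; the function names and the genotype counts are hypothetical.

```python
from math import comb, erfc, sqrt


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level after a Bonferroni correction."""
    return alpha / n_tests


def hwe_chi_square(n_ins_ins, n_ins_del, n_del_del):
    """Chi-square HWE test for one diallelic locus (1 degree of freedom).

    Assumes a polymorphic locus (both alleles present); an exact test
    would usually be preferred for small expected counts.
    """
    n = n_ins_ins + n_ins_del + n_del_del
    p = (2 * n_ins_ins + n_ins_del) / (2 * n)  # insertion allele frequency
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ins_ins, n_ins_del, n_del_del)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


N_LOCI = 30
hwe_alpha = bonferroni_threshold(0.05, N_LOCI)  # 0.05 / 30
n_pairs = comb(N_LOCI, 2)                       # unordered locus pairs
ld_alpha = bonferroni_threshold(0.05, n_pairs)  # 0.05 / 435

# Hypothetical genotype counts for one locus in a sample of 631 individuals.
chi2, p_value = hwe_chi_square(160, 315, 156)
print(f"HWE threshold {hwe_alpha:.5f}, LD threshold {ld_alpha:.6f}")
print(f"chi2 = {chi2:.4f}, p = {p_value:.4f}, "
      f"deviation from HWE: {p_value < hwe_alpha}")
```

A locus would be flagged only if its p-value fell below the corrected threshold, which is why the lowest reported HWE p-value (0.0157) does not count as a deviation.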

Discussion
The single-locus parameters and the cumulative forensic efficiency indexes calculated in this study indicate that this panel is informative in the Polish population and can be useful for forensic individual identification. As has been shown previously [27,34], based on the calculated TPI value, the Investigator DIPplex kit is not sufficient as a standalone system in paternity tests; however, due to their reduced mutation rates [35], InDels may serve as an extension to STR platforms in deficient or inconclusive cases [36,37]. As estimated by Krawczak, at least 60 maximally informative SNPs would be required to yield the same power of paternity exclusion as a set of 14 microsatellites with an average allele number of 9.5 and an average gene diversity of 0.77, since the gene diversity of an SNP will normally be smaller than 0.5 [38].
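The ceiling on the gene diversity of a diallelic marker, and the way single-locus values accumulate over a panel, can be illustrated with a short sketch. It assumes Hardy-Weinberg genotype proportions, whereas forensic parameters are often computed from observed genotype counts; the function names are hypothetical, and the 30-locus panel with every allele at frequency 0.5 is an idealized best case rather than the data of this study.

```python
from math import prod


def gene_diversity(p_ins):
    """Expected heterozygosity of a diallelic locus: 1 - (p^2 + q^2)."""
    q_del = 1.0 - p_ins
    return 1.0 - (p_ins ** 2 + q_del ** 2)


def match_probability(p_ins):
    """Sum of squared genotype frequencies, assuming HWE proportions."""
    q_del = 1.0 - p_ins
    genotype_freqs = (p_ins ** 2, 2 * p_ins * q_del, q_del ** 2)
    return sum(g ** 2 for g in genotype_freqs)


def combined_power_of_discrimination(insertion_freqs):
    """1 minus the product of single-locus match probabilities.

    Multiplying across loci assumes the loci are independent.
    """
    return 1.0 - prod(match_probability(p) for p in insertion_freqs)


# The two extremes of the DIP+ frequency range reported in the Results,
# plus the theoretical optimum of 0.5.
for p in (0.3487, 0.5, 0.6648):
    print(f"p = {p:.4f}: gene diversity = {gene_diversity(p):.4f}, "
          f"PD = {1.0 - match_probability(p):.4f}")

# Idealized panel: 30 independent loci, each with p = 0.5.
ideal_cpd = combined_power_of_discrimination([0.5] * 30)
print(f"combined PD (idealized panel): {ideal_cpd:.15f}")
```

Gene diversity peaks at 0.5 when both alleles are equally frequent, so a single InDel or SNP can never match a multiallelic STR; the discriminating power of the panel comes from combining many independent loci.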
Genetic distance is an important indicator of relatedness among populations. The fixation index (FST) measures the proportion of genetic variation that is attributable to differentiation between populations. To compare the Polish population with the 19 previously investigated populations, sample bias-corrected FST distances were calculated among all pairs of populations based on allelic frequencies of the 30 InDel loci (Table 2). In general, higher FST values represent more genetic differentiation between two populations. Large genetic distances were found between the Polish population and Somalians, Nigerians, South African Zulus, South Koreans, Vietnamese, and Chinese. We then reconstructed phylogenetic relationships on the basis of FST genetic distances (Figure 1). As shown in the graphic representation of these distances, the populations in our study are grouped in separate branches according to continental or regional biogeographical ancestry. Among the other populations, Poles share the most genetic relatedness with Slovenians and Lithuanians, followed by Finns and Spanish in the same cluster. Black African and East Asian populations form two separate clusters distant from European populations, and Pakistanis are found on a separate branch. To further investigate genetic relationships between Poles and worldwide populations, a multidimensional scaling (MDS) plot was drawn from the FST values to represent the populations in multidimensional space. As shown in Figure 2, East Asians, Pakistanis, and most Black Africans are located apart on the bottom left, upper, and bottom right of the plot, respectively, and can thus be clearly distinguished from the other groups. Europeans are found in the middle of the plot. The other populations are distributed between the Europeans and Africans.
South African Afrikaners tend to be in a close relationship with Europeans, which may be due to their descent from predominantly Dutch settlers in the 17th and 18th centuries.
Most of the evaluated InDels show an observed heterozygosity (Ho) of approximately 0.5, which makes them suitable for forensic human identification as identity-informative markers. On the other hand, markers that exhibit low heterozygosity and different allelic frequency distributions between populations (high individual locus-specific FST) may potentially be used as ancestry-informative markers (AIM-InDels) for distinguishing between populations of interest. Three candidate AIM-InDels can therefore be identified in our set: HLD111, HLD118, and HLD81, with FST = 0.2607, 0.2781, and 0.2221, respectively. Given the growing interest in ancestry inference, enhanced sets of more effective AIMs still need to be developed commercially and validated to identify ancestry contributions in admixed samples.
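The FST-based reasoning in this section, both the pairwise distances between populations and the locus-specific values used to shortlist AIM candidates, can be sketched from allele frequencies alone. The example below uses the basic Wright/Nei formulation without the sample-size bias correction applied in this study, so its values would not reproduce Table 2 exactly. The function names, locus labels, and frequencies are hypothetical and serve only as an illustration.

```python
from statistics import fmean, pvariance


def locus_fst(freqs):
    """Uncorrected Wright's FST for one diallelic locus.

    freqs holds the insertion allele frequency in each population;
    FST = Var(p) / (p_bar * (1 - p_bar)), with the population variance.
    """
    p_bar = fmean(freqs)
    if p_bar <= 0.0 or p_bar >= 1.0:
        return 0.0  # locus fixed in all populations: no differentiation
    return pvariance(freqs) / (p_bar * (1.0 - p_bar))


def pairwise_fst(freqs_a, freqs_b):
    """Uncorrected multi-locus FST between two populations.

    Computed as a ratio of sums over loci: sum(H_T - H_S) / sum(H_T).
    """
    numerator = 0.0
    denominator = 0.0
    for p_a, p_b in zip(freqs_a, freqs_b):
        p_bar = (p_a + p_b) / 2
        numerator += (p_a - p_b) ** 2 / 2          # H_T - H_S at this locus
        denominator += 2 * p_bar * (1.0 - p_bar)   # H_T at this locus
    return numerator / denominator if denominator > 0 else 0.0


# Hypothetical insertion allele frequencies in three populations.
freqs_by_locus = {
    "locus_1": [0.50, 0.52, 0.48],
    "locus_2": [0.45, 0.55, 0.60],
    "locus_3": [0.20, 0.50, 0.85],
}

# Rank loci by locus-specific FST; high values suggest AIM candidates.
ranked = sorted(
    ((name, locus_fst(f)) for name, f in freqs_by_locus.items()),
    key=lambda item: item[1],
    reverse=True,
)
for name, fst in ranked:
    print(f"{name}: FST = {fst:.4f}")

# Pairwise distance between the first two hypothetical populations.
pop_a = [f[0] for f in freqs_by_locus.values()]
pop_b = [f[1] for f in freqs_by_locus.values()]
print(f"pairwise FST (population 1 vs 2) = {pairwise_fst(pop_a, pop_b):.4f}")
```

Loci with near-identical frequencies across populations stay close to FST = 0 and are better suited to identification, whereas a locus such as the hypothetical `locus_3` would be shortlisted as an ancestry-informative candidate.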

Conclusions
We provided the first comprehensive analysis of DIPplex Kit markers for the Polish population with details to calculate forensic efficiency parameters and investigate genetic diversity.
Based on interpopulation comparisons, the differences at the 30 InDel loci are large enough to support intercontinental forensic population analysis.
Our findings indicate that the DIPplex Kit can be used in forensic applications in the Polish population to increase the power of evidence of the conventional STR markers.
Supplementary Materials: The following supporting information can be downloaded at: www.mdpi.com/xxx/s1 Table S1: Raw genotype data of 30 InDels in the Polish population (n = 631).