An MPS-Based 50plex Microhaplotype Assay for Forensic DNA Analysis

Microhaplotypes (MHs) are widely accepted as powerful markers in forensic studies. They have the advantage of both short tandem repeats (STRs) and single nucleotide polymorphisms (SNPs), with no stutter and amplification bias, short fragments and amplicons, low mutation and recombination rates, and high polymorphisms. In this study, we constructed a panel of 50 MHs that are distributed on 21 chromosomes and analyzed them using the Multiseq multiple polymerase chain reaction (multi-PCR) targeted capture sequencing protocol based on the massively parallel sequencing (MPS) platform. The sizes of markers and amplicons ranged between 11–81 bp and 123–198 bp, respectively. The sensitivity was 0.25 ng, and the calling results were consistent with Sanger sequencing and the Integrative Genomics Viewer (IGV). It showed measurable polymorphism among sequenced 137 Southwest Chinese Han individuals. No significant deviations in the Hardy–Weinberg equilibrium (HWE) and linkage disequilibrium (LD) were found at all MHs after Bonferroni correction. Furthermore, the specificity was 1:40 for simulated two-person mixtures, and the detection rates of highly degraded single samples and mixtures were 100% and 93–100%, respectively. Moreover, animal DNA testing was incomplete and low depth. Overall, our MPS-based 50-plex MH panel is a powerful forensic tool that provides a strong supplement and enhancement for some existing panels.


Introduction
Microhaplotypes (MHs) are novel genetic markers, proposed by the Kidd lab in 2013, to complement current DNA genotyping tools used in forensic genetics [1,2]. They are characterized by the presence of two or more closely linked single nucleotide polymorphisms (SNPs) within 300 bp, with three or more alleles (haplotypes). Therefore, they provide more information than single SNPs, and exhibit a low rate of recombination over such short distances (assuming an average of 1% recombination per megabase and no recombination hotspots within the locus) [3,4]. Microhaplotypes do not preferentially amplify certain alleles within a locus because all alleles at a locus are the same size. Compared to short tandem repeats (STRs), MHs have no stutters, lower mutation rates, and fewer alleles [1]. A large set of MHs can approach the same discrimination power as a set of STRs [5] and provide valuable information on individual identification, mixture interpretation, ancestry prediction, kinship testing, and medical diagnostic applications [1,6,7].

MH Selection
We used the homemade MH screening software combined with PHASE v2.1.1 (https://stephenslab.uchicago.edu/phase/download.html, accessed on 1 January 2022, Seattle, WA, USA) to analyze the 1000 G data (Phase 3). Based on a previous study by our research group [27], we extracted MHs consisting of two or more SNPs within 80 bp in the CHS and an effective number of alleles (A e ) value ≥ 3, and estimated the theoretical value of population haplotype frequencies. On this basis, we screened candidate MHs according to the following criteria: (1) all SNPs of MHs must show a minor allele frequency (MAF) > 0 in the dbSNP database; (2) an A e value ≥ 4 because MHs with high A e can enhance individual identification, mixture interpretation, and kinship analysis [18]; (3) the MH with the largest A e from all overlapping sequences in each group, taking each autosome as a unit; (4) the MHs with apparent repeat motifs in the base sequence were removed; (5) the initial set of MHs with a physical position ≥10 Mb were selected as an interval to avoid linkage disequilibrium (LD) among the selected MHs; and (6) only MHs for which functional primers could be designed.

Primer Design
After obtaining the candidate MHs, we handed over the region of interest (ROI), that is, the physical location information of the MHs, to iGeneTech Biotechnology Beijing Co., Ltd. using the online MFEprimer v3.1 (https://mfeprimer3.igenetech.com/muld, accessed on 19 January 2022, Beijing, China) to design and validate multiple PCR primers that targeted the genomic sequence of the MHs in our panel. Based on thermodynamic stability [33], highly specific multiplex primers were designed on both sides of the ROI; the amplicon was 120-200 bp. We then evaluated primer dimerization and non-specific amplification, tested the designed and synthesized primers, and replaced primers with a poor detection effect.

Sample Collection
Peripheral blood samples of 137 unrelated Southwest Chinese Han individuals were collected after obtaining informed consent with the approval of the Medical Ethics Committee of Sichuan University (No. KS2022770). Genomic DNA over 18 ng/µL, extracted using the phenol-chloroform method, were quantified using the Qubit™ dsDNA HS Assay Kit on a Qubit ® 4.0 Fluorometer according to the manufacturer's protocol (https://assets. thermofisher.com/TFS-Assets/LSG/manuals/MAN0017209_Qubit_4_Fluorometer_UG.pdf, accessed on 14 April 2022, Thermo Fisher Scientific, Waltham, MC, USA).
Seven unrelated samples were randomly selected, and their original bam files obtained from MPS were input into the Integrative Genomics Viewer (IGV) v2. 16.0 (https://software. broadinstitute.org/software/igv/userguide, accessed on 10 October 2022, Cambridge, MA, USA) to analyze the genotype of all the target 50 MHs. Among them, two MH loci and four unrelated samples were randomly selected for Sanger sequencing (Tsingke Biotechnology Co., Ltd., Beijing, China). Finally, the MH genotypes, obtained using the pipelines developed by our laboratory, were compared with those obtained by IGV and Sanger sequencing simultaneously.

Sequencing Data Analysis
The raw image data obtained after sequencing were converted and deduplicated from base calling files using the bcl2fastq v2.20.0.422 (Illumina, San Diego, CA, USA). The resulting raw sequencing sequences (FASTQ files) were submi ed to Trimmomatic v0.38 (Max Planck Institute, Potsdam, BB, Germany) and FastQC v0.11.3 (Babraham Institute, Cambridge, UK) in-house quality control software to remove low-quality reads, followed by the Bwa v0.7.12 (Wellcome Trust Sanger Institute, Cambridge, UK) [34] and Samtools to align them with the reference human genome (Hg19, GRCh37). Single BAM files were submi ed to variant calling at SNP/INDEL sites using Samtools v1.9 (UChicago, Chicago, IL, USA) and Varscan v2.4.3 (UWashington, Sea le, WA, USA) to generate VCF files [35]. Raw identification calls for SNV and InDels were further filtered using the thresholds read depth > 4, mapping quality > 20, and variant quality score > 20. Variation loci were annotated using Annovar v201707 (UPenn, Philadelphia, PA, USA). Annotation databases included ExAC, ESP6500, 1000 Genomes, gnomAD, SIFT, CADD, and Polyphen 2. We then used our laboratory pipelines for MH calling using the CIGAR and MD: Z tag information of BAM files [12]. The minimum DOC for each target region and threshold for each MH allele were set to 100× and 25×, respectively, for further analysis. After initial filtering with a threshold of 25 reads, the default minimum read coverage for an allele was set at 5%. If the number of reads for an allele are below this value, the alleles will not be called. The default minimum value for allele frequency for heterozygous markers was set at 10%. If two or more alleles are detected at a marker, any single allele must have coverage of at least this percentage of total reads at the marker to be called. The default minimum value for allele frequency for homozygous markers was set at 90%. A single allele at a marker The first round of multiple PCR reactions is to obtain the amplicon product of the target region. By the NovaSeq 6000 S4 Reagent Kit v1.5, the multiple PCR reaction system contained 3.5 µL of Enhancer buffer NB (1N), 2.5 µL of enhancer buffer M, 10 µL of IGT-EM808 polymerase, 5 µL of primer pool, 1-5 ng a DNA/reaction tube, and finally made up to 30 µL with ddH 2 O. The multiple PCR reaction conditions consisted of a preincubation at 95 • C for 3 min 30 s, followed by 22 cycles of 98 • C for 20 s, 60 • C for 4 min, and a final extension at 72 • C for 5 min on an ETC811 PCR thermocycler (Dongsheng Innovation Biotechnology Co., Ltd., Beijing, China) using a customized MultipSeq ® Custom Panel (iGeneTech Biotechnology Beijing Co., Ltd., Beijing, China) with amplicons between 120 and 200 bp. The pure amplification product was obtained through the first round of magnetic bead purification, which was used as the template for the second round of PCR reaction.
In the second round of adapter PCR reaction, sequencing adapters were introduced to both sides of the amplicon product to obtain a library. The adapter sequence PCR reaction system contained 2.5 µL of Enhancer buffer M, 10 µL of IGT-EM808 polymerase, 2 µL of CDI Primer (premix adapter primer), 13.5 µL of PCR product mixture, and finally made up to 30 µL with ddH 2 O. The adapter sequence PCR reaction conditions consisted of a preincubation at 95 • C for 3 min 30 s, followed by 9 cycles of 98 • C for 20 s, 58 • C for 60 s, 72 • C for 30 s, and a final extension at 72 • C for 5 min on an ETC811 PCR thermocycler. The pure amplicon library was obtained through the second round of magnetic bead purification.

Sequencing Data Analysis
The raw image data obtained after sequencing were converted and deduplicated from base calling files using the bcl2fastq v2.20.0.422 (Illumina, San Diego, CA, USA). The resulting raw sequencing sequences (FASTQ files) were submitted to Trimmomatic v0.38 (Max Planck Institute, Potsdam, BB, Germany) and FastQC v0.11.3 (Babraham Institute, Cambridge, UK) in-house quality control software to remove low-quality reads, followed by the Bwa v0.7.12 (Wellcome Trust Sanger Institute, Cambridge, UK) [34] and Samtools to align them with the reference human genome (Hg19, GRCh37). Single BAM files were submitted to variant calling at SNP/INDEL sites using Samtools v1.9 (UChicago, Chicago, IL, USA) and Varscan v2.4.3 (UWashington, Seattle, WA, USA) to generate VCF files [35]. Raw identification calls for SNV and InDels were further filtered using the thresholds read depth > 4, mapping quality > 20, and variant quality score > 20. Variation loci were annotated using Annovar v201707 (UPenn, Philadelphia, PA, USA). Annotation databases included ExAC, ESP6500, 1000 Genomes, gnomAD, SIFT, CADD, and Polyphen 2. We then used our laboratory pipelines for MH calling using the CIGAR and MD: Z tag information of BAM files [12]. The minimum DOC for each target region and threshold for each MH allele were set to 100× and 25×, respectively, for further analysis. After initial filtering with a threshold of 25 reads, the default minimum read coverage for an allele was set at 5%. If the number of reads for an allele are below this value, the alleles will not be called. The default minimum value for allele frequency for heterozygous markers was set at 10%. If two or more alleles are detected at a marker, any single allele must have coverage of at least this percentage of total reads at the marker to be called. The default minimum value for allele frequency for homozygous markers was set at 90%. A single allele at a marker must have coverage of at least this percentage of total reads at the marker to be called.
We displayed the alleles of each MH and compiled the DoCs (i.e., depth of sequencing) and ACRs in an Excel output format. The ACR was defined as the lower coverage of the allele at a heterozygous locus divided by the higher coverage in a single gDNA sample. It is commonly used to assess the balance between the two alleles of heterozygotes detected by high-throughput sequencing of genetic markers.

Statistical Analysis
Based on the above pipeline, we obtained the allelic genotype, AF, and forensic statistical parameters of 50 MHs among 137 Southwest Chinese Han individuals, including homozygosity (Hom), heterozygosity (Het), match probability (MP), discrimination power (DP), probability of exclusion (PE), polymorphism information content (PIC), and the typical paternity index (TPI) by using the Modified-Powerstates v. 1.2 (Promega, Madison, WI, USA) [36]. Then we used the following formula to calculate combined match probability (CMP), combined discrimination power (CDP), and combined probability of exclusion (CPE), respectively, including CMP

Mixture Design
Two unrelated individuals were randomly selected to simulate the two-person DNA mixtures. The minor DNA amount was fixed at 0.5 ng, and different major DNA amounts were then added to form mixtures at ratios of 1:1, 1:3, 1:5, 1:10, 1:20, and 1:40. For MPS detection to evaluate the efficiency of the panel, 1 µL of each mixture was used. All mixtures were prepared using TE (Solarbio Science & Technology Co., Ltd. Beijing, China) and sterile 0.2 mL amplification tubes (Axygen Scientific, Union City, CA, USA), and samples were stored at −20 • C until use. The degree of mixing was detected using the AGCU EX22 kit (Applied ScienTech, Suzhou, Jiangsu, China) on an ABI 3500 Genetic Analyzer according to the manufacturer's protocol (https://tools.thermofisher.com/content/sfs/manuals/4401661.pdf, accessed on 14 July 2022, Applied Biosystems, Thermo Fisher Scientific, Waltham, MC, USA). The results were analyzed using the GeneMapper ID-X v1.2 according to the manufacturer's protocol (https://assets.thermofisher.com/TFS-Assets/LSG/manuals/cms_072557.pdf, accessed on 17 July 2022, Applied Biosystems, Thermo Fisher Scientific, Waltham, MC, USA).

Degradation Design
To simulate single-source degraded samples, two randomly extracted DNA samples were diluted to a concentration of 5 ng/µL and treated with DNase I (Thermo Fisher Scientific, Waltham, MC, USA), respectively [38]. Subsequently, 45 µL of intact DNA (5 ng/µL) was mixed with 3.75 µL of 10× MgCl 2 buffer (Thermo Fisher Scientific, Waltham, MC, USA). To the mixture, 0.6 µL of 0.3 U/µL DNase I was added, followed by incubation at 37 • C, after which 10 µL of degraded DNA from the incubated mixture was removed at predetermined time intervals (2.5, 5, 10, and 15 min, respectively), and placed in separate sterile 0.2 mL amplification tubes (Axygen Scientific, Union City, CA, USA), respectively. EDTA (1.6 µL, 30 mM) was immediately added to each tube and incubated at 65 • C for 10 min to stop DNA degradation. The degree of degradation was then evaluated using the AGCU EX22 Kit on an ABI 3500 Genetic Analyzer and the High Sensitivity DNA Kit on an Agilent 2100 Bioanalyzer according to the manufacturer's protocol (https://www. agilent.com/cs/library/usermanuals/public/2100_Bioanalyzer_Expert_USR.pdf, accessed on 30 July 2022, Agilent Technologies, Santa Clara, CA, USA). For the MPS, 1 µL of each sample treated with DNase I was used.
To simulate mixed degradations, one of the above single-source degradations was set as the minor DNA and fixed at 0.5 ng, and the other was set as the major DNA. The major DNA, degraded at different times, was added to corresponding minor DNA to form mixtures at a ratio of 1:10. The subsequent evaluation and detection processes of degraded degrees were the same as the above single-source degradations. For the MPS, 1 µL of each 1:10 mixed degradation was used.

Species Specificity
We tested common animal DNA to assess the specificity of our panel because nonhuman DNA may be present in forensic biological evidence. Thus, animal DNA samples from cats, bovines, chickens, ducks, fish, pigs, rabbits, and sheep were sequenced using multi-PCR targeted capture sequencing in the same manner as human DNA, with an input DNA amount of 3.753-6.506 ng.

MH Selection and Primer Design
A total of 178 candidate MHs were screened from 1000 G (Phase 3), and the MPS-based protocol allowed primer design and multiplex detection of 128 of these MHs in a single assay. Six rounds of optimization were performed on the initially constructed panel using six samples (the company's internal standard DNA H01, 2800 M, and four experimental samples). Some MHs were excluded, such as those with many nonspecific amplification products, large amplification and sequencing deviations between different samples, and low sequencing coverage. Fifty MHs were reserved to ensure the best system performance of the panel (Table 1, Figure 2) and distributed on 21 autosomes (no target MH on chr22 after six rounds of optimization). We observed 1-5 MHs on each autosome (average 2.38), with each MH comprising 3-15 SNPs (total 251, average 4.83), marker lengths of 11-81 bp (average 65.58 bp), and an amplicon of 123-198 bp (average 156.02 bp). Specific information on the 50 MHs and primers is provided in Supplementary Table S1.

Sensitivity and Accuracy Analysis
For three replicates with different inputs of 2800 M (10, 5, 1, 0.5, 0.25, and 0.125 ng), we detected complete profiles for all 50 MHs at 0.25 ng. Only one MH (MH-37) dropout was observed in the third replicate at 0.125 ng, as the reads were 20×, which is below the analytical threshold of 25× (Supplementary Figure S1A). The overall DoCs were 801.24-11,010.84× (average 5623.39×) and decreased gradually with decreasing DNA input (linear correlation coefficient R 2 = 0.8814) (Supplementary Figure S1B). The minor DNA of the

Sensitivity and Accuracy Analysis
For three replicates with different inputs of 2800 M (10, 5, 1, 0.5, 0.25, and 0.125 ng), we detected complete profiles for all 50 MHs at 0.25 ng. Only one MH (MH-37) dropout was observed in the third replicate at 0.125 ng, as the reads were 20×, which is below the analytical threshold of 25× (Supplementary Figure S1A). The overall DoCs were 801.24-11,010.84× (average 5623.39×) and decreased gradually with decreasing DNA input (linear correlation coefficient R 2 = 0.8814) (Supplementary Figure S1B). The minor DNA of the non-degraded and degraded mixtures in the next simulation study was fixed at 0.5 ng.
The MH, sample numbers, and Sanger primers are shown in Supplementary Table S2. We did not observe inconsistent haplotypes among Sanger sequencing, IGV, or our pipeline in the analyzed MH loci or unrelated individuals. Figure 3 shows the corresponding genotypes of the three analysis methods for a random MH in a random sample. The results showed 100% concordance. Supplementary Figure S2

Panel Performance
Fifty MHs of all 137 unrelated Southwest Chinese Han individuals in this study were consistently captured and sequenced to obtain complete MH alleles. These samples were genotyped at 1.825-25.992 ng of input DNA using the DoCs and ACRs of all 50 MHs to assess the panel sequencing performance. The average DoC was 7928.39 ± 4990.952× (Figure 4). The average ACR was 0.90 ± 0.045, and 96% of the MHs (48/50) exhibited a proportion of allele balance ≥ 80% (Figure 4), indicating the panel had a good balance in detecting heterozygotes (i.e., good heterozygosity balance). No correlation was found between the DoCs and ACRs (linear correlation coefficient R 2 = 0.0771).

Panel Performance
Fifty MHs of all 137 unrelated Southwest Chinese Han individuals in this study were consistently captured and sequenced to obtain complete MH alleles. These samples were genotyped at 1.825-25.992 ng of input DNA using the DoCs and ACRs of all 50 MHs to assess the panel sequencing performance. The average DoC was 7928.39 ± 4990.952× (Figure 4). The average ACR was 0.90 ± 0.045, and 96% of the MHs (48/50) exhibited a proportion of allele balance ≥ 80% (Figure 4), indicating the panel had a good balance in detecting heterozygotes (i.e., good heterozygosity balance). No correlation was found between the DoCs and ACRs (linear correlation coefficient R 2 = 0.0771).
consistently captured and sequenced to obtain complete MH alleles. These samples were genotyped at 1.825-25.992 ng of input DNA using the DoCs and ACRs of all 50 MHs to assess the panel sequencing performance. The average DoC was 7928.39 ± 4990.952× (Figure 4). The average ACR was 0.90 ± 0.045, and 96% of the MHs (48/50) exhibited a proportion of allele balance ≥ 80% (Figure 4), indicating the panel had a good balance in detecting heterozygotes (i.e., good heterozygosity balance). No correlation was found between the DoCs and ACRs (linear correlation coefficient R 2 = 0.0771).

Polymorphism Information
All the 50 MHs in our panel were successfully sequenced. Haplotype (i.e., allele) frequencies calculated from sequencing data from all 137 unrelated individuals are shown in Figure 5 and Supplementary Table S3 Table S4). The highest Het (0.801-0.867) also had the highest A e (5.026-7.547). In general, Het and A e were larger when there were more alleles of an MH, and the frequency of each allele tended to be the same.

Mixture Analysis
The 50-MH panel was developed as a stand-alone forensic panel but could also be used as a complement to STR markers. To explore the detection threshold of the mixture ratio, the simulated two-person mixtures were genotyped after a series of dilutions (1:1, 1:3, 1:5, 1:10, 1:20, and 1:40). Based on the sensitivity results, the minor DNA was fixed at 0.5 ng, and the major DNA was added at the mixing ratio. The AGCU EX22 Kit (Applied ScienTech, Jiangsu, China) can only detect the complete genotype of the major and minor DNA at a 1:1 ratio. Thus, minor DNA was incompletely genotyped at the mixing ratios of 1:3, 1:5, and 1:10, and the identity-informative alleles of STR were partially dropped. Minor DNA was undetectable at 1:20 and 1:40, and the identity-informative alleles of STR completely dropped out (Supplementary Figure S3). The overall DoCs of the MPS-based 50plex MH panel were 24,597.48-41,927.99× (average 31,121.65×) and was able to detect the complete genotype of major and minor DNA at a ratio as low as 1:40, with a maximum number of individual alleles of 132 (Table 2). For a two-person mixture with 1 µL of input DNA, complete MH profiles of the minor DNA were observed at a ratio as low as 1:40, and 100% (61/61) of unique alleles for the minor DNA were reported.
Genes 2023, 14, x FOR PEER REVIEW 10 of 19 All the 50 MHs in our panel were successfully sequenced. Haplotype (i.e., allele) frequencies calculated from sequencing data from all 137 unrelated individuals are shown in Figure 5 and Supplementary Table S3   MHs, 10 Ae were <3.0, 8 Ae were ≥3.0, 24 Ae were ≥4.0, 3 Ae were ≥5.0, 3 Ae we 2 Ae were ≥7.0. We observed that both the Ae and Het increased with increa with R 2 of 0.6294 and 0.3166, respectively ( Figure 6). Meanwhile, Ae increas creasing Het (R 2 = 0.9222, Supplementary Table S4). The highest Het (0.801-0.8 the highest Ae (5.026-7.547). In general, Het and Ae were larger when there alleles of an MH, and the frequency of each allele tended to be the same.  Table S5 and S6).

Mixture Analysis
The 50-MH panel was developed as a stand-alone forensic panel but co used as a complement to STR markers. To explore the detection threshold of ratio, the simulated two-person mixtures were genotyped after a series of di 1:3, 1:5, 1:10, 1:20, and 1:40). Based on the sensitivity results, the minor DNA w 0.5 ng, and the major DNA was added at the mixing ratio. The AGCU EX22 K ScienTech, Jiangsu, China) can only detect the complete genotype of the major DNA at a 1:1 ratio. Thus, minor DNA was incompletely genotyped at the mixi 1:3, 1:5, and 1:10, and the identity-informative alleles of STR were partially dr nor DNA was undetectable at 1:20 and 1:40, and the identity-informative all completely dropped out (Supplementary Figure S3). The overall DoCs of the 50plex MH panel were 24,597.48-41,927.99× (average 31,121.65×) and was ab the complete genotype of major and minor DNA at a ratio as low as 1:40, with a number of individual alleles of 132 (Table 2). For a two-person mixture with 1 DNA, complete MH profiles of the minor DNA were observed at a ratio as l and 100% (61/61) of unique alleles for the minor DNA were reported.

Analysis of Degraded Samples
The lengths of the DNA fragments ranged from 120 to 320 bp after different DNase I treatment times (2.5, 5, 10, and 15 min). The degree of degradation of single and mixed samples detected using the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) is shown in Supplementary Figure S4. The degradation results were consistent with the fragment distribution of STR genotypes (Supplementary Figure S3). Long-STR genotyping failed when random single DNA was treated with DNase I at 37 • C for 2.5, 5, 10, and 15 min (Supplementary Figure S5A,B). In contrast, the MPS-based 50-MH panel successfully obtained complete alleles in all single degraded DNAs (Supplementary Table S7 Table 2, Supplementary Table S7). These results suggested that 50plex MHs were more efficient than CE-STRs in sequencing and genotyping degraded single and mixed DNAs.

Species-Specific Analysis
Complete genotypes of 50-plex MHs were not achieved for all eight animal DNA samples with 1 µL of DNA input. For animal DNA, the overall DoCs ranged from 103.00 to 548.00× (average 322.00×) and 2-8 MHs were detected for each DNA (Supplementary  Table S8). For MHs, the overall DoCs ranged from 33.00 to 337.00× (average 103.04×), of which only 25 MHs containing 1-4 alleles were genotyped. The current data showed that our panel incompletely genotyped different animal samples with very low signals, so the species specificity of the 50-plex assay is sufficient for routine casework situations.

Discussion
In this study, we developed a thermodynamic stability-based multiple PCR (i.e., highly specific multiplex primers) capture-sequencing protocol targeting 50 MHs based on the Illumina HiSeq platform. The forensic power of the 50-plex-MH panel in 137 unrelated individuals was evaluated based on DoCs and ACRs. The sensitivity, accuracy, polymorphism, forensic parameters, degraded samples, mixtures, and animal samples of the panel performed adequately, thereby indicating that our panel was a powerful forensic tool and could provide a good supplement and enhancement to existing detection methods. Based on our previous studies of 15 SNP-SNP MHs [16,17], we comprehensively optimized the MH screening, sequencing, and analysis protocols in this study.
Microhaplotypes combine the advantages of STRs and SNPs, with no stutter peak or amplification bias, short markers and amplicons, low mutation and recombination rates, and high polymorphism. They are recognized as powerful markers for various forensic purposes [7,39]. Compared with phased Sanger sequencing and CE platforms, single sequence reads of MPS can cover a wide range of analyzed MHs and are highly informative following MH detection. Therefore, they can be used to analyze true haplotypes. Moreover, MPS is a powerful platform for simultaneously analyzing several target areas and different sample types, thereby addressing relevant forensic questions in a single assay [22].
At present, most MHs of reported panels are selected from published articles [8]. Therefore, the current screening method is not systematic, and its genome coverage is not extensive. The number of MHs in some panels is small, and the detection platform still uses first-generation sequencing [16,17,32,40]. Moreover, the analysis methods of some MPS panels, such as Flfinder [11] and MHtyper [41], are more suitable for their own research analyses. These panels are limited by the number of loci, so the performances of polymorphism, forensic parameters, and mixture detection are limited. To compensate for these deficiencies, we aimed to develop a method that quickly and effectively screens short and high-A e MHs sets (including SNPs only) in a target population using our developed MH screening software combined with PHASE software based on the 1000 G [27]. Highthroughput sequencing of multiple markers and different sample types was performed using the MPS platform. Finally, automatic sequencing data analysis was performed using our developed pipeline.
We initially selected 178 candidate MHs and retained 50 MHs after six optimizations to ensure the best system efficiency of the panel. Only one of the 50 MHs (MH-32) was included in the Kidd- [30], respectively. Studies have shown that Het > 0.4 and A e > 3.0 loci can be effectively used to analyze individual identification, kinship testing, degradation, mixtures, and ancestral inferences [18]. Therefore, our panel has significant research value for forensic applications. Among the 50 MHs, one MH (MH-24) had three pairs of primers after optimization and testing. The amplicons were the same, and therefore did not affect data analysis.
We added sensitivity gradients of 10, 5, 1, 0.5, 0.25, and 0.125 ng, with three replications showing sensitivities as low as 0.25 ng (Supplementary Figure S1). This provided a theoretical basis for the scientific setting of minor DNA amounts for subsequent studies on non-degraded and degraded mixtures. A multiple PCR-targeted capture and sequencing protocol based on MPS was used to obtain the complete genotypes of 50 MHs from 137 unrelated Southwest Chinese Han individuals. Combined with the sensitivity results, the DNA input for sequencing was 0.25-26 ng; the greater the DNA input, the higher the sequencing depth. The average DoC was 7928.39 ± 4990.95×, the average ACR was 0.90 ± 0.05, and 96% of MHs (48/50) showed an allele balance ratio ≥ 80% (Figure 4), indicating that the sequencing efficiency of our panel was high. Each MH had an average of seven alleles, and 85.7% (300/350) of alleles had a frequency ≥0.01, with the highest being 0.803, indicating good polymorphism in our panel ( Figure 5, Supplementary Table S3). The sensitivity of 250 pg is in the range reported for other MPS-based systems used for forensic STR analysis [48] and will be sufficient for many routine applications. For samples with low DNA amounts, such as minute traces, touch DNA, or degraded samples, further improvement of our system will be required.
For sequencing data analysis, we tried the Flfinder we had developed earlier [11], but because of the proximity of SNPs in some MHs, it could not meet the input file format requirements of Flfinder. Therefore, on the basis of Flfinder, we created a set of scripts using the Python and R languages for MHs calling. We compared read thresholds of 15×, 20×, 25×, and 30×, and found that at 25× the alignment accuracy of calling obtained by our pipeline and IGV was the highest, which was also consistent with Sanger sequencing (Figure 3, Supplementary Table S2, Supplementary Figure S2).
Heterozygosity (Het) is the most important parameter for familial identification, as a higher Het at the locus increases the chance that the associated allele will be uncommon in a given population, but is more likely to be found in relatives than in unrelated individuals [3]. In our study, we observed that A e increased with increasing Het, and the highest A e corresponded to the highest Het. This is related to the number and frequency of the alleles in the population. Therefore, the selection of the most informative marker for familial identification depends on the A e value. The A e value is also an important index for evaluating the ability of a mixture analysis [49]. For our 50-plex MH panel, Het values of more than 98% (49/50) of MHs were >0.40, A e values of more than 80% (40/50) MHs > 3.0, and CDP and CPE were 1-3.109 × 10 −49 and 1-8.727 × 10 −16 , respectively (Supplementary Table S4). The results showed that our panel has surpassed the capacity of commonly used 23 STRs [50,51] or 52 SNPs [52] and several other reported MH panels [30,53], indicating that our panel could be potentially effective for future applications in individual identification, kinship testing, mixture interpretation, and non-invasive prenatal paternity testing (NIPPT) [3,42].
For undegraded mixtures, single-degraded samples, and degraded mixtures, complete STR genotypes could not be detected using the AGCU EX22 Kit (Applied ScienTech, Jiangsu, China) (except for the 1:1 undegraded mixture) ( Supplementary Figures S3 and S5). However, our MPS-based panel was able to observe all complete MH genotypes ( Table 2  and Table S7). For the degraded mixtures, a ratio of 1:10 was selected for analysis because it was the lowest limit at which STR could detect the mixture, and matched the actual proportion of cell-free fetal DNA (cffDNA) in maternal plasma (range 5-20%, average 9-10%) [17,54]. We set 1, 3, and 5 µL of DNA input to explore the effects of sequencing genotypes corresponding to different sequencing inputs. The degraded fragment at 15 min was too short to be combined with STR genotypes, so only degradations at 2.5, 5, and 10 min were simulated. The Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) performed well in detecting degraded samples, basically conforming to the fragment distribution of STR genotypes (Supplementary Figure S4). The detection rate of minor DNA unique (effective) alleles was 93-100% in the nine simulated degraded mixtures (Table 2). When the DNA input was 1, 3, and 5 µL, the results showed that 3 and 5 µL performed better, which provided a solid theoretical basis for the DNA input in further degraded mixtures research. In addition, maternal plasma DNA containing cffNDA is a special degraded mixture essentially, in which cffNDA accounts for about 10% on average, and the median fragment length is about 143 bp owing to its apoptotic nature [55]. Therefore, we suggest that the DNA input for MPS be 3 or 5 µL to improve the detection rate for the future degraded mixtures and NIPPT study.

Conclusions
In this study, we constructed an MPS-based 50-plex MH panel for forensic DNA analysis combined with multiple PCR-targeted capture sequencing technology and a homemade calling pipeline. We comprehensively explored the potential of the panel for forensic applications, including sensitivity, accuracy, polymorphism, forensic parameters, undegraded mixtures, single-degraded samples, degraded mixtures, and species specificity. We also improved the primer optimization of our panel, explored the influence of different DNA inputs on the efficiency of MH detection in mixtures, and developed a universally applicable MHs forensic analysis software package. Furthermore, our panel characterized a new set of 49 MHs, which may contribute to an international community consensus on a possible MH core panel.
In a nutshell, the current findings demonstrated that our MPS-based 50plex MH panel is a unique and powerful DNA tool. It is also an alternative method that can complement and improve the interpretation ability of mixtures and the efficiency of kinship testing with traditional STRs. Our future studies will focus on more family sample pairs to evaluate the value of the panel in NIPPT.   Figure S3: STR profiles of two-person mixtures. 1 µL of each mixture was used to obtain 1:1, 1:3, 1:5, 1:10, 1:20, and 1:40 genotypes using the AGCU EX22 Kit (from left to right). Figure S4: Degree of degradation of single and mixed DNA detected using the Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). Random individual DNA C51, C131 and its two 1:10 mixtures were treated with DNase I at 37 • C for 2.5, 5, 10, and 15 min, respectively. Then, 1 µL of each was collected to obtain the corresponding electropherogram using a High Sensitivity DNA Kit (Agilent Technologies, Santa Clara, CA, United States) (from left to right). (A) For C51, the degraded fragments were dispersed at 2.5 min, concentrated at 200 bp at 5 min, and then concentrated at about 150 bp. (B) For C131, the degraded fragments were dispersed at 2.5 min, concentrated at 150 bp at 5 min, and then concentrated in shorter fragments. (C) For 1:10-C51 + C131, the degraded fragments were dispersed at 2.5 min, concentrated at 150 bp at 5 min, and then concentrated in shorter fragments. The last picture is a summary of electropherograms. Both degraded single and mixed samples were treated with DNase I to achieve ideal simulated degradation results. Figure S5: STR profiles of degraded single and mixed DNAs. Random individual DNAs C51, C131, and its two 1:10 mixtures were treated with DNase I at 37 • C for 2.5, 5, 10 and 15 min, respectively. 1 µL of each was taken to collected to obtain the corresponding electropherogram using an AGCU EX22 kit (Applied ScienTech, Jiangsu, China) (from left to right). (A) For C51, the peak height decreased significantly at 2.5 min, dropped from 320 bp at 5 min, and then decreased gradually. (B) For C131, the peak height dropped from 280 bp at 2.5 min, from 120 bp at 5 min, and then gradually decreased. (C) For 1:10-C51 + C131, the peak height dropped from 280 bp at 2.5 min, from 150 bp at 5-10 min, and from 120 bp at 15 min. Regardless of whether degraded single or mixed samples were used, the STR kit could not obtain complete profiles at different degradation times.

Institutional Review Board Statement:
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Medical Ethics Committee of Sichuan University (No. KS2022770).
Informed Consent Statement: All participants gave their informed consent and their right to privacy was protected.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest:
The authors declare no conflict of interest.