Genetic Reconstruction and Forensic Analysis of Chinese Shandong and Yunnan Han Populations by Co-Analyzing Y Chromosomal STRs and SNPs

Y chromosomal short tandem repeats (Y-STRs) have been widely harnessed for forensic applications, such as pedigree source searching from public security databases and male identification from male–female mixed samples. For various populations, databases composed of Y-STR haplotypes have been built to provide investigative leads for solving difficult or cold cases. Recently, the supplementary application of Y chromosomal haplogroup-determining single-nucleotide polymorphisms (SNPs) for forensic purposes has been under heated debate. This study provides Y-STR haplotypes for 27 markers typed by the Yfiler™ Plus kit and Y-SNP haplogroups defined by 24 loci within the Y-SNP Pedigree Tagging System for Shandong Han (n = 305) and Yunnan Han (n = 565) populations. The genetic backgrounds of these two populations were explicitly characterized by the analysis of molecular variance (AMOVA) and multi-dimensional scaling (MDS) plots based on 27 Y-STRs. Then, population comparisons were conducted by observing Y-SNP allelic frequencies and Y-SNP haplogroup distributions, estimating forensic parameters, and depicting distribution spectrums of Y-STR alleles in sub-haplogroups. The Y-STR variants, including null alleles, intermediate alleles, and copy number variations (CNVs), were co-listed, and a strong correlation between Y-STR allele variants (“DYS518~.2” alleles) and the Y-SNP haplogroup QR-M45 was observed. A network was reconstructed to illustrate the evolutionary pathway and to identify the ancestral mutation event. Also, a phylogenetic tree on the individual level was constructed to observe the relevance of the Y-STR haplotypes to the Y-SNP haplogroups. This study provides evidence that basic genetic backgrounds, as revealed by both Y-STR and Y-SNP loci, are useful for uncovering detailed population differences and, more importantly, demonstrates the contributing role of Y-SNPs in population differentiation and male pedigree discrimination.

Direct amplification (i.e., the DNA extraction-free method) was performed using the GeneAmp® PCR System 9700 (Thermo Fisher Scientific, Foster City, CA, USA) in accordance with the
The Y-SNP data in concert with the Y-STR data obtained were validated and submitted to the release R62 of the Y-chromosomal haplotype reference database (YHRD, https://yhrd.org). The assigned accession numbers of Shandong and Yunnan Han populations were YA004617 and YA004618, respectively.

Forensic Parameters and Statistical Analysis
The allele of the locus DYS389II was defined by subtracting that of DYS389I. Forensic parameters regarding haplotype information were calculated for the 27 Y-STR loci included in the AmpFLSTR™ Yfiler™ Plus Kit [22]. Haplotype diversity (HD) was estimated by the formula $HD = \frac{n}{n-1}\left(1 - \sum_i p_i^2\right)$, in which n denotes the sample size and $p_i$ represents the frequency of the ith haplotype [23]. Match probability (MP) and discrimination capacity (DC) were calculated as $MP = \sum_i p_i^2$ and $DC = N_d/N_t$, according to Shannon [24]. In other words, MP was the sum of the squared frequencies of the distinct haplotypes, while DC denoted the ratio of the number of unique haplotypes ($N_d$) to the total number of haplotypes ($N_t$). The allele frequency of each Y-STR locus was generated by direct counting. The estimation method of gene diversity (GD) was analogous to that used for HD, where $p_i$ represents the frequency of the ith allele. To calculate these parameters, the "basic.stats()" function included in the "hierfstat" package developed by Thierry et al. [25] was utilized. GD values were generated for all samples, as well as for several major Y-SNP haplogroups of the two Chinese Han populations. The chi-square test was performed using the "chisq.test()" function in the R language (version 3.5.3, https://www.r-project.org/).
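The parameters described above can be illustrated with a minimal Python sketch. It is not the R workflow used in this study; it assumes the standard forms HD = n/(n−1)·(1 − Σp_i²), MP = Σp_i², and DC = (number of distinct haplotypes)/(number of samples), and the function names and toy haplotypes are hypothetical.

```python
from collections import Counter


def forensic_parameters(haplotypes):
    """Return (HD, MP, DC) for a list of hashable Y-STR haplotypes.

    Assumed definitions (not taken from the study's code):
      MP = sum of squared haplotype frequencies
      HD = n / (n - 1) * (1 - MP)
      DC = number of distinct haplotypes / number of samples
    """
    n = len(haplotypes)
    if n < 2:
        raise ValueError("at least two samples are required")
    counts = Counter(haplotypes)
    mp = sum((c / n) ** 2 for c in counts.values())
    hd = n / (n - 1) * (1 - mp)
    dc = len(counts) / n
    return hd, mp, dc


def gene_diversity(alleles):
    """Gene diversity of one locus, computed like HD but on allele frequencies."""
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two samples are required")
    counts = Counter(alleles)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


# Toy data (hypothetical): five males typed at two loci, one haplotype shared.
toy = [(14, 30), (14, 30), (15, 29), (13, 31), (16, 30)]
hd, mp, dc = forensic_parameters(toy)
gd_locus1 = gene_diversity([h[0] for h in toy])
print(f"HD={hd:.4f} MP={mp:.4f} DC={dc:.4f} GD(locus 1)={gd_locus1:.4f}")
```

Haplotypes are represented as tuples so that they can be counted directly; any hashable encoding (for example, a joined string of alleles) would work equally well.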

Y-Chromosomal Haplogroup-Based Network Analysis
A median-joining (MJ) network based on the haplotypes of 27 Y-STRs was constructed using Network 5.0.0.3 software (http://www.fluxus-engineering.com/sharenet.htm) [26] in order to uncover the phylogenetic relationships among the samples carrying special rare variants. Weights were set for all Y-STR loci according to their mutation rates [4].

Phylogenetic Reconstruction on the Individual Level
Haplotypes with intermediate alleles, copy number variations, and null alleles were removed from the individual-level phylogenetic reconstruction. The pairwise genetic distance (d-value) over the 27 Y-STR loci was calculated as $d = \frac{1}{n}\sum_{i=1}^{n}\frac{(a_i - b_i)^2}{2m_i}$, according to Nei's molecular evolutionary theory [43].
In the formula, n denotes the number of Y-STR loci and i indexes the ith locus, while $m_i$ is the mutation rate of the ith Y-STR locus and $a_i$ and $b_i$ are the alleles of the two individuals at that locus. As for the 24 Y-SNPs, all samples were covered. The genetic distance (D-value) based on the Y-SNP loci was calculated according to Nei as $D_{ij} = \frac{1}{L}\sum_{k=1}^{L} d_{kij}$, where i and j denote the ith and jth individuals, L is the number of Y-SNP loci, and $d_{kij}$ equals 0 if the two individuals carry the same allele at the kth SNP and 1 otherwise [44]. The phylogeny was reconstructed and illustrated by the "complete" method of the Hierarchical Clustering ("hclust") function [45] of the R language. The MDS plots of 27 Y-STRs and 24 Y-SNPs on the individual level were illustrated as described above.
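The two pairwise distances described above can be sketched in Python as follows. This is an illustration only: it assumes the Y-STR distance is the mean over loci of (a_i − b_i)²/(2m_i) and the Y-SNP distance is the proportion of loci at which two individuals differ. The function names, alleles, and mutation rates are hypothetical.

```python
def ystr_distance(a, b, mutation_rates):
    """Mean over loci of (a_i - b_i)^2 / (2 * m_i) for two Y-STR haplotypes.

    a, b           -- repeat numbers of two individuals, locus by locus
    mutation_rates -- per-locus mutation rates m_i (must be positive)
    """
    if not (len(a) == len(b) == len(mutation_rates)) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    return sum((x - y) ** 2 / (2 * m)
               for x, y, m in zip(a, b, mutation_rates)) / n


def ysnp_distance(snps_i, snps_j):
    """Proportion of Y-SNP loci at which two individuals carry different alleles."""
    if len(snps_i) != len(snps_j) or not snps_i:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(x != y for x, y in zip(snps_i, snps_j)) / len(snps_i)


# Hypothetical example: three Y-STR loci and four Y-SNP loci.
d = ystr_distance([14, 30, 17], [15, 30, 19], [0.002, 0.004, 0.001])
D = ysnp_distance("AGCT", "AGTT")
print(d, D)
```

A full matrix of such distances could then be passed to a complete-linkage hierarchical clustering routine, analogous to the "hclust" step.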

MDS and AMOVA
To verify the sampling representativeness of the Shandong and Yunnan Han populations and to reveal the genetic backgrounds of these two populations, located in the northern and southern parts of China, population data of 13 other representative Chinese populations (Figure 1) were selected, and the population structure was reconstructed. AMOVA (Table S1) based on all 27 Y-STR markers was conducted for the 15 populations and visualized in an MDS plot (Figure 2).
After Bonferroni correction, the significance threshold was set to 0.05/105 ≈ 0.0005, where 105 is the number of pairwise comparisons among the 15 populations. Non-significant differences, i.e., p-values above 0.0005 (not indicated in bold in Table S1), were observed only among the three southern Chinese Han populations from Guangdong, Shenzhen, and Yunnan, which indicated their close genetic affinity. The Altaic-speaking populations were all significantly distant from the Sino-Tibetan-speaking groups. Additionally, explicit differences could be found among some Han populations with ancestry from the northern, central, and southern Chinese Han divisions.
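The Bonferroni threshold quoted above follows from the number of pairwise population comparisons. A minimal Python sketch of that arithmetic (the function name is hypothetical):

```python
from math import comb


def bonferroni_threshold(n_populations, alpha=0.05):
    """Return (number of pairwise comparisons, corrected significance threshold)."""
    n_pairs = comb(n_populations, 2)  # unordered pairs of populations
    return n_pairs, alpha / n_pairs


n_pairs, threshold = bonferroni_threshold(15)
print(n_pairs, round(threshold, 4))
```

With 15 populations there are 105 unordered pairs, so the corrected threshold is 0.05/105, which rounds to 0.0005.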
In Figure 2, two MDS plots were drawn to illustrate the genetic landscape of various Chinese ethnic groups (initial stress = 0.0573) and the genetic make-up of the Han groups (initial stress = 0.0824). Both plots reached a good quality of configuration. The results for the Altaic- and Sino-Tibetan-speaking groups agreed with the AMOVA results, except for the Yunnan Yi population, which may be due to its small sample size. The distribution pattern along the abscissa reflected the divergence between the Chinese Han and Tibetan groups. Genetics is known to be strongly correlated with linguistics, as language carries cultural information. Interdisciplinary efforts in archaeology, genetics, and linguistics help provide insights into human evolutionary history [46,47]. A recent linguistic study by Zhang et al. showed that the Chinese Han and Tibetan subgroups originated from a Sino-Tibetan language family that diverged about 4200-7800 years BP (before present), with an average value of 5900 years BP [48]. The phylogenetic evidence could be traced back to the late Neolithic.
Specific to the cluster of Chinese Han populations, another MDS plot focused on the inner structure of the Han population to dissect subtle population relationships. The pattern matched the substructure of Han Chinese described in a previous study [9]. The eight populations were divided into four clusters which approximately matched their geographic locations: Shandong and Jining (top), Shanghai (bottom left), Hainan (right), and Changzhou, Jiangsu, Guangdong, Shenzhen, and Yunnan (middle). Though geographically close, the Han population from Shanghai was not genetically close to those from Changzhou and Jiangsu, which may be related to the persistent migration into metropolitan Shanghai [49]. In addition, the Changzhou Han population was genetically close to three southern Chinese Han populations (Guangzhou Han, Shenzhen Han, and Yunnan Han) rather than to other central or northern Han populations, which indicated that the major component of the male Changzhou Han population was from southern China. Primarily, the reconstructed genetic structure demonstrated the different genetic backgrounds of the two Han populations analyzed, each of which was genetically similar to its geographically close populations. The samples enrolled had a high degree of population representativeness.
In order to demonstrate substructure differences, we analyzed the Y chromosomal sub-haplogroup characteristics of the two Han populations. Y-SNP allelic frequencies of the two populations were compared (Figure 3A) to identify the primary differences. Four Y-SNP loci with no derived allele (E-M96, I-M170, G-M201, and D1a2a-P47) are not shown. Significant differences were discovered in 5 of the 24 Y-SNP allelic frequencies (significance threshold p = 0.05/20 = 0.0025, corrected for the 20 loci compared). In detail, the Yunnan Han population showed much higher frequencies of O1a-M119 and O1b-M268, but lower frequencies of O2-M122, O2a2-P201, and O2a2b-P164, which was also reflected by the disparity in haplogroup distribution (Figure 3B).
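A per-locus comparison of this kind can be sketched in Python as a 2 × 2 chi-square test followed by the corrected threshold of 0.05/20. This is an illustration, not the analysis performed in this study: the allele counts are hypothetical, and the sketch uses the plain Pearson statistic, whereas R's "chisq.test()" applies a continuity correction to 2 × 2 tables by default.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, p-value). For 1 degree of freedom the upper-tail
    probability of the chi-square distribution is erfc(sqrt(x / 2)).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical counts of derived vs. ancestral alleles at one Y-SNP locus:
# 30 / 275 in one population (n = 305) and 120 / 445 in the other (n = 565).
chi2, p = chi_square_2x2(30, 275, 120, 445)
threshold = 0.05 / 20  # Bonferroni-corrected threshold for 20 loci
print(f"chi2={chi2:.2f}, p={p:.2e}, significant={p < threshold}")
```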
Haplogroup C is typical of the residents of the Eurasian temperate steppe and can be found in northern Han populations as well. The high proportion of the C2 haplogroup in northern Han (Shandong Han) might be related to historical nomad incursions into the Central Plain [52]. The differences in haplogroups O1 and O2 between the Shandong and Yunnan Han populations might be the result of the initial founder effect of the early large-scale northward migration and of geographical isolation [53]. The O2 haplogroup was probably dominant among northward migrants after population expansion in southern China, causing a higher proportion of the O2 haplogroup in Shandong Han.
Four haplogroups (O1a, O1b, O2a1, and O2a2b) were dominant in both populations but differently distributed between them. The O1a haplogroup is mainly distributed among males from southern China, Malaysia, Vietnam, and Indonesia. The O1b haplogroup is unique to modern Eastern Eurasian populations. Previous findings indicated that the proportion of the O1 haplogroup is significantly higher in southern China than in northern China [54], which was also confirmed in this research. In contrast, the two haplogroups O2a1 and O2a2b, which are also dominant in East Asian populations [55], had relatively higher proportions in the Shandong Han population.

Y-STR Allele Variants
Interestingly, among the 69 samples with micro-variants, 29 carried a ".2" mutation at DYS518. All of these were found to be descendants of QR haplogroup ancestors, accounting for 64.4% of the individuals assigned to the QR-M45 haplogroup. Although it has been concluded that the QR haplogroup is not a major haplogroup of the East Asian population [65], 92.2% of the "DYS518~.2" mutations reported in the YHRD database were found in Chinese samples. Additionally, Lang et al. found a relationship between the "DYS518~.2" alleles and haplogroup Q [66]. In order to define the evolutionary history of the "DYS518~.2" allele, a median-joining network was used to reconstruct the inner structure of the QR-M45 haplogroup (Figure 4). The ancestral structure is indicated by the red torso of the network, which carries no mutation event at DYS518, indicating that allele 37 was possibly the ancestral allele. All sample nodes are linked so as to form two independent clusters. In addition, two nodes for allele 37.2 and two other nodes for the unmutated allele 38 are located close together at the joint of the two clusters. The closeness of these samples indicated that the "DYS518~.2" alleles likely derived from the mutated allele 37.2, which might be characteristic of the QR haplogroup in Chinese populations. However, the underlying evolutionary pathway leading to the shift from the unmutated allele 38 to the mutated allele 37.2 remains unclear. In order to explain this observation, more samples from the QR haplogroup should be collected and profiled using massively parallel sequencing technology. A higher-resolution definition of Y-DNA paragroups will provide insights for a comprehensive knowledge of Y-STR haplotype evolution in ancient major haplogroups.

Distribution Spectrums of Y-STR Alleles within C2, O1, and O2 Haplogroups in Shandong and Yunnan Han Populations
To uncover the varying patterns of Y-STRs within haplogroups from different populations, the three major haplogroups to which the majority of the Shandong and Yunnan Han samples belonged, i.e., C2, O1, and O2, were selected (Figures S1-S3).
Significant differences in allelic frequencies were observed at DYS627 within all three haplogroups, and this was the only difference observed in both O2 and C2. In O1, however, significant differences could also be observed at the single-copy loci DYS481, DYS389I, DYS389II, and DYS570 and at the multi-copy locus DYS385a/b. Although the varying patterns of most Y-STR loci correlated closely with Y haplogroups rather than with populations, significant differences at some Y-STR loci within the same major Y haplogroup may reflect different ancestral sources. In general, the regularity of the varying patterns demonstrated that the Y-STR gene pool remained stable, regardless of the different haplotypes in the Yunnan and Shandong Han populations or the large geographic and cultural differences between them. Further, the different patterns for different major haplogroups revealed the primary superiority of Y-SNP haplogroups in classifying male groups, dissecting population structure, and exploring population migration. In forensic practice, especially for the Chinese Y-STR haplotype database, which includes tens of millions of Y-STR haplotypes, Y-SNPs could play a critical role in pedigree discrimination as well as in biogeographic inference.

Forensic Parameters
The GDs of all 27 Y-STR markers were calculated both for the three major haplogroups (C2, O1, and O2) of the two populations separately and for the total population (Table S3, Figure 5). For the total population, although the gene diversity of most Y-STR loci was high (>0.5), it was low at some loci, such as DYS438 (0.2664), DYS437 (0.1777), DYS391 (0.3769), DYS392 (0.3847), and DYS393 (0.3339). Furthermore, some Y-STRs presented different gene diversities in different Y haplogroups, such as DYS437, which had an extremely low gene diversity in both the C2 and O1 haplogroups but very high values in the O2 haplogroup. In addition, although the gene diversity of DYS533 was low in the C2 haplogroup in Shandong, it was very high in the same haplogroup in Yunnan. The same was observed for DYS456 in the O1 haplogroup. This indicated the presence of sub-structures within different haplogroups from different regions.
Moreover, all Y-STR markers were used to analyze the classic forensic parameters in the two populations. Among all 870 samples, 864 distinct haplotypes were observed (Table S4): four haplotypes occurred twice, and one occurred three times. According to the different panels of the Yfiler and Yfiler Plus amplification systems (17 and 27 Y-STR loci, respectively), standard forensic parameters (HD, DC, and MP) were calculated separately for the 305 Shandong Han samples and the 565 Yunnan Han samples. These three parameters were also estimated for the K, O2, O2a2, O2a2b, and O2a2b1a1 haplogroups from the two Han populations, as repetitive Y-STR haplotypes were observed only within the major and in-depth clades of the K haplogroup (Table 1, Figure 6). Repetitive Y-STR haplotypes are haplotypes shared by two or more different males. If all Y-STR haplotypes (number = n) in one population are different from each other, the forensic parameters HD, MP, and DC equal 1, 1/n, and 1, respectively, and comparing them is uninformative.
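The limiting case stated above can be checked in a few lines. The derivation below assumes the standard definitions HD = n/(n−1)·(1 − Σp_i²), MP = Σp_i², and DC = N_d/N_t; when all n haplotypes are distinct, every haplotype has frequency 1/n and N_d = N_t = n:

```latex
p_i = \frac{1}{n}\quad (i = 1,\dots,n)
\;\Longrightarrow\;
\mathrm{MP} = \sum_{i=1}^{n} p_i^{2} = n\cdot\frac{1}{n^{2}} = \frac{1}{n},
\qquad
\mathrm{HD} = \frac{n}{n-1}\left(1-\frac{1}{n}\right) = 1,
\qquad
\mathrm{DC} = \frac{N_d}{N_t} = \frac{n}{n} = 1 .
```

In this case the three parameters depend only on the sample size, which is why they carry no information for comparing such sub-populations.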
The HD and DC values of the Yunnan Han population were somewhat greater than those of the Shandong Han population, though they were high for both populations. The MP value of Yunnan Han was smaller. The varying patterns of the forensic parameters indicated that the 10 new Y-STR loci incorporated in the Yfiler™ Plus kit significantly increased the haplotype diversity and discrimination capacity and decreased the match probability in populations at various scales. For the sub-populations composed of samples assigned to higher-resolution haplogroups, MP was higher, which conformed to the common knowledge that a Y-STR haplotype is more likely to match haplotypes from the same Y-SNP haplogroup. Thus, for samples assigned to the same high-resolution haplogroup, more Y-STRs are required to distinguish unrelated males.

Phylogenetic Reconstruction on the Individual Level
Figure 7 shows that the phylogenetic tree could broadly cluster the Y-STR haplotypes from the same Y-SNP sub-haplogroups. However, there were also several disparities, since a total of 51 samples (proportion = 6.4%) were observed in various unfitting regions of the phylogenetic tree. Of these, 28 were located inside their major haplogroup, while the rest crossed major haplogroups. In addition, though the individuals assigned to the same Y chromosomal haplogroup were clustered, the phylogeny based on 27 Y-STR loci was significantly different from the reported Y-DNA tree [19]. Some samples from the same haplogroup were located in several clusters. Clearly, despite the limited number of Y-SNP loci and Y chromosomal haplogroups selected, Y-STRs combined with Y-SNPs would help increase the discriminability of male pedigrees in the simulated Y chromosome database.
Notably, the structures observed in the two MDS plots were different. The 27 Y-STR loci were not able to distinguish Han males of the Shandong and Yunnan populations (Figure 8A), because most individuals clustered together. However, the 24 Y-SNPs showed the potential to clearly classify male individuals into various haplogroups (Figure 8B). In addition, individuals assigned to the C-M130 and D-JST021355 haplogroups were all from the Yunnan Han population, while those belonging to the K-M9 haplogroup were all from the Shandong Han population, indicating the possible biogeographic discrimination ability of Y-DNA-haplogroup-determining SNPs.

Conclusions
In summary, the genetic backgrounds of the Shandong and Yunnan Han populations, representatives of northern and southern Chinese Han, were investigated via 27 commonly used Y-STRs and 24 East-Asian-haplogroup-determining Y-SNPs. Among the 870 samples, 864 haplotypes were unique. The observed Y-STR allele variants, including null alleles, intermediate alleles, and CNVs, were summarized. Of these, the "DYS518~.2" alleles were all found within QR haplogroup individuals, and a network was constructed to characterize the evolutionary pathway of this kind of variant. Primarily, the forensic parameters (GD, HD, DC, and MP) within different Y chromosomal haplogroups furnished evidence that the co-application of Y-STR and Y-SNP analysis would provide more informative characteristics of various populations. A phylogenetic reconstruction on the individual level further showed that Y-STRs combined with Y-SNPs would help increase the discriminability of male pedigrees using a Y chromosome database. This study sheds light on basic genetic backgrounds utilizing both Y-STR and Y-SNP loci, showing their usefulness for uncovering detailed population differences. More importantly, this tentative study will likely help to build a Y-SNP databank to promote Chinese male pedigree discriminability.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/11/7/743/s1, Figure S1: Allele frequency distribution in the C2 haplogroup in the populations from Yunnan and Shandong, Figure S2: Allele frequency distribution in the O1 haplogroup in the populations from Yunnan and Shandong, Figure S3: Allele frequency distribution in the O2 haplogroup in the populations from Yunnan and Shandong, Table S1: RST genetic distances and respective p-values calculated based on 27 Y-STR haplotypes for Shandong Han, Yunnan Han, and 13 referenced Chinese populations, Table S2: Detailed variant information for various Y-STR loci within different Y-SNP haplogroups, Table S3: Gene diversities (GDs) of 27 Y-STR loci in 6 sub-populations, as well as in the total population, Table S4