Forensic Analysis and Genetic Structure Construction of Chinese Chongming Island Han Based on Y Chromosome STRs and SNPs

Y-chromosome short tandem repeats (Y-STRs) and Y-chromosome single nucleotide polymorphisms (Y-SNPs) are genetic markers on the male Y chromosome used for individual identification, forensic applications, and paternal genetic history analysis. In this study, we successfully genotyped 38 Y-STR loci and 24 Y-SNP loci in Pudong Han (n = 689) and Chongming Han (n = 530) individuals in Shanghai. Among the genotyping systems compared, the Yfiler Platinum system showed the highest haplotype diversity in both the Pudong Han (0.99996) and Chongming Island Han (0.99997) populations. The proportions of unique haplotypes were 97.10% (Pudong) and 98.49% (Chongming). Multidimensional scaling analysis and phylogenetic analysis were performed using the genetic distance Rst, which was calculated based on the Y-STR gene frequency data. Moreover, we compared the haplogroup frequency distributions and performed principal component analysis of haplogroups in both populations. As a result, Shanghai Pudong Han, Chongming Island Han, and Jiangsu Han were determined to have a strong genetic affinity. The haplogroup distribution characteristics of the Pudong Han and Chongming Han populations were similar to those of the southern Han population. Haplotype network analysis showed that Jiangsu Wujiang Han and Jiangsu Changshu Han made larger paternal genetic contributions to the formation of Shanghai Pudong Han and Chongming Island Han. Through the joint analysis of SNPs and STRs, this study analyzed in depth the paternal genetic structure of the Pudong Han and Chongming Han populations. Adding Y-SNP haplogroups to forensic applications can provide information for pedigree investigation.


Introduction
The male-specific Y chromosome is an ideal tool for genealogical research, forensic applications, and research on patrilineal migration. About 95% of the chromosome cannot exchange or recombine with the X chromosome; this portion is called the nonrecombining region of the Y chromosome (NRY) [1]. There are many kinds of genetic markers on the Y chromosome, of which the two most studied are Y-chromosome short tandem repeats (Y-STRs) and Y-chromosome single nucleotide polymorphisms (Y-SNPs). As markers that are stably inherited along male lineages, Y-STR loci have long been used for paternity testing and individual identification. Y-SNPs in the NRY play an important role in tracing the patrilineal history of populations. Currently, Y-SNPs are being investigated as genetic markers for pedigree mapping in forensic cases. The mutation rate of Y-STRs is 3.78 × 10⁻⁴ to 7.44 × 10⁻² mutations/generation, and the mutation rate of Y-SNPs is 1 × 10⁻⁹ mutations/generation [2,3]. Y-SNPs thus have an extremely low mutation rate relative to Y-STRs (approximately 1/30,000,000 of that of Y-STRs). Combined analysis of Y-STRs and Y-SNPs can be valuable for the forensic application of population genetic structure construction [4,5].
Chongming Island is an alluvial island at the mouth of the Yangtze River, at the eastern end of the Yangtze River Delta. It is the third largest island and the largest alluvial sand island in China. More than 1300 years have passed since Chongming Island first emerged and began growing to its current scale. According to the seventh census in 2020, the population of Chongming Island is 637,900, more than 99% of whom are Han Chinese [6].
The Pudong area is located in the east of the Huangpu River in Shanghai, at the entrance of the Yangtze River. With the continuous advancement of Shanghai's urbanization process, a large number of migrants have gathered in Shanghai. According to the 2020 census, there are 5.68 million floating and living populations in Pudong. Investigating Y-STR genetic structure and Y-SNP lineage mapping of the native population in Pudong can be valuable for the case application of paternal biogeographic ancestry inference. Due to the special geographical location of these two places, the two Han populations have unique genetic backgrounds.
In this study, we analyzed 38 Y-STR loci and 24 Y-SNP genetic markers in Chongming Han and Pudong Han populations. This study evaluated the four most common Y-STR genotyping systems and used forensic parameters such as individual identification probability, matching probability, Y-STR haplotyping system discrimination ability, and haplotype diversity value to evaluate their efficacy. Subsequently, based on the Y-STR and Y-SNP genetic markers, population genetic structure analysis was performed, and the results revealed that the Han people in Chongming Island and Pudong have a close genetic relationship with the Han people in Jiangsu. The Jiangsu Wujiang Han population and Jiangsu Changshu Han population have more paternal genetic contributions to the Pudong population and Chongming Han population. Therefore, in our research, a total of 1219 Han Chinese male samples from Chongming Island and the Pudong area were collected to provide raw data for further research.

Materials and Methods
This study was approved by the Ethics Committee of Fudan University (code: BE1806; date: 3 March 2018) and was strictly implemented in accordance with the relevant requirements of the Declaration of Helsinki [7].

DNA Sample Preparation
In this study, peripheral blood samples from 1219 unrelated Han Chinese male individuals in the Pudong and Chongming areas of Shanghai were collected, and blood samples were retained in Flinders Technology Associates (FTA) blood sample collection cards (Whatman International Ltd., Maidstone, UK). All male individuals in this study are local people whose families have lived there for at least three generations, and all household registration information was verified by the administrative department. Based on the principle of informed consent, the research subjects in the Chongming area (n = 530) and Pudong area (n = 689) all signed the informed consent form. The geographic locations of Pudong and Chongming Han populations are marked with stars (shown in Figure 1).

DNA Typing of 38 Y-STR Loci and 24 Y-SNP Markers
In this study, the Yfiler™ Platinum PCR Amplification Kit (Thermo Fisher Scientific, Waltham, MA, USA) [8] was used to analyze 38 Y-STR loci, which are listed in Table S1. Amplification was performed according to the Yfiler™ Platinum PCR Amplification Kit workbook. Direct amplification was used: a 1 mm² piece was punched from each blood card with a Harris micropunch and used for PCR amplification in a 10 µL reaction volume. Amplification was carried out on the GeneAmp® PCR System 9700 (Thermo Fisher Scientific, Foster City, CA, USA). PCR amplification products were separated by capillary electrophoresis (CE) using an ABI 3500xL Genetic Analyzer (Thermo Fisher Scientific, Foster City, CA, USA). Typing results were analyzed using GeneMapper ID-X v1.4 software (Thermo Fisher Scientific, Foster City, CA, USA). DNA 007 (Thermo Fisher Scientific, Foster City, CA, USA) included in the kit was used as a positive control.

Y-STR Analytic Methods
We calculated the allele frequencies and gene diversity (GD) values [14] of 38 loci in the Chongming and Pudong populations using the direct calculation method [14,15]. The formula is shown below:
$$GD = \frac{n}{n-1}\left(1 - \sum_{i} p_i^{2}\right)$$
where n represents the number of samples, and p_i represents the frequency of the i-th allele.
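As an illustration of the gene diversity calculation, the following minimal Python sketch computes GD = n/(n − 1) × (1 − Σ p_i²) for a single locus, which is the standard unbiased estimator that matches the definitions of n and p_i given here. The function name `gene_diversity` and the toy allele calls are hypothetical and are not taken from the study data.

```python
from collections import Counter

def gene_diversity(alleles):
    """Unbiased gene diversity: n/(n-1) * (1 - sum of squared allele frequencies)."""
    n = len(alleles)
    if n < 2:
        raise ValueError("at least two samples are required")
    counts = Counter(alleles)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)

# Toy example (hypothetical): repeat numbers observed in 10 males at one locus.
toy_alleles = [14, 14, 15, 15, 15, 16, 16, 16, 16, 17]
print(round(gene_diversity(toy_alleles), 4))  # 0.7778
```

For multicopy loci such as DYS385a/b, the same function could be applied to the combined allele pair (for example, a tuple such as `(11, 14)`) instead of a single repeat number.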

Y-SNP Analytic Methods
We directly calculated the frequencies of the primary haplogroups C, D, N, O, and QR and the frequencies of 18 subhaplogroups. Based on research on haplogroups of Chinese populations in other studies, we selected 16 reference populations. Since the haplogroup classifications of each reference population were different, we redefined the haplogroup results of these reference populations using the 24 Y-SNP Markers of the Pedigree Tagging System and calculated the haplogroup frequency.
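The harmonization of reference populations to a common haplogroup panel can be sketched as follows. This is only an illustrative simplification: it collapses finer haplogroup labels to the deepest panel haplogroup by name prefix, whereas the study redefined haplogroups from the Y-SNP markers themselves. The `PANEL` list is a subset of the haplogroups named in this paper, and the function names and toy labels are hypothetical.

```python
from collections import Counter

# Subset of the panel haplogroups named in this study (prefix-friendly names only).
PANEL = ["C", "C2", "D", "D1a1a1", "N", "N1a1", "O1a", "O1b", "O1b2",
         "O2", "O2a1", "O2a2", "O2a2a1a2", "O2a2b", "O2a2b1a1"]

def collapse_to_panel(label, panel=PANEL):
    """Return the longest panel haplogroup that is a prefix of `label`, or None."""
    matches = [hg for hg in panel if label.startswith(hg)]
    return max(matches, key=len) if matches else None

def haplogroup_frequencies(labels, panel=PANEL):
    """Collapse each label to the panel, then count relative frequencies."""
    collapsed = [collapse_to_panel(x, panel) for x in labels]
    n = len(collapsed)
    return {hg: c / n for hg, c in Counter(collapsed).items()}

# Toy labels (hypothetical), at a finer resolution than the panel.
toy = ["O1a1a", "O2a2b1a1a", "O2a2b1a1", "N1a2", "O1b2a"]
print(haplogroup_frequencies(toy))
```

Name-prefix matching only works when all sources follow the same nomenclature version. Combined groups such as QR or IJ would need an explicit mapping and are omitted here.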

Y-STR and Y-SNP Joint Analytic Methods
The analysis was performed using the software Network 10.1 [55] (http://www.fluxusengineering.com, accessed on 26 December 2021), and the graphs were drawn using the software Network Publisher. We selected 15 Y-STR loci, including DYS19, DYS389b (equivalent to DYS389II − DYS389I) [56], DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS635, and YGATA-H4. The weights in the network were set according to the mutation rates of these single-copy loci: weights ranged from 1 to 5, and the lower the mutation rate, the higher the weight [20].
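The two preprocessing steps described above, deriving DYS389b and assigning locus weights, can be sketched in Python. The exact rate-to-weight mapping is not stated here, so the log-scale linear interpolation below is an assumption, and the locus names and mutation rates in the example are placeholders rather than published values.

```python
import math

def dys389b(dys389_i, dys389_ii):
    """DYS389b is the DYS389II repeat count minus the DYS389I repeat count."""
    return dys389_ii - dys389_i

def weights_from_rates(rates, w_min=1, w_max=5):
    """Map mutation rates to integer weights in [w_min, w_max].

    Assumed scheme: interpolate linearly on a log10 scale so that the
    slowest-mutating locus gets w_max and the fastest gets w_min.
    """
    logs = {locus: math.log10(rate) for locus, rate in rates.items()}
    lo, hi = min(logs.values()), max(logs.values())
    if hi == lo:
        return {locus: w_max for locus in rates}
    weights = {}
    for locus, x in logs.items():
        frac = (hi - x) / (hi - lo)  # 1.0 for the slowest locus, 0.0 for the fastest
        weights[locus] = w_min + round(frac * (w_max - w_min))
    return weights

# Placeholder loci and rates (hypothetical, for illustration only).
toy_rates = {"locus_A": 1e-4, "locus_B": 1e-3, "locus_C": 1e-2}
print(dys389b(13, 29))                # 16
print(weights_from_rates(toy_rates))  # {'locus_A': 5, 'locus_B': 3, 'locus_C': 1}
```

In practice, the resulting weights would be entered per locus in the network software; any monotone decreasing mapping from rate to weight would satisfy the rule stated in the text.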

Results
The Y-STR and Y-SNP genotyping results successfully obtained for all samples are provided in Supplementary Materials Table S2.

Allelic Diversity Analysis
The allelic frequencies and corresponding GD values for each Y-STR locus are listed in Supplementary Materials Table S3 (Chongming) and Table S4 (Pudong). A total of 247 alleles were detected at 32 single-copy Y-STR loci in the Han population of Chongming Island, and the allele frequencies ranged from 0.0019 to 0.9623. At the multicopy loci DYS385a/b, DYS387S1a/b, and DYS527a/b, 53, 37, and 30 haplotypes were detected, respectively, and the haplotype frequencies ranged from 0.0019 to 0.1698. GD ranged from 0.0731 (DYS645) to 0.9401 (DYS385a/b). A total of 250 alleles were detected at 32 single-copy Y-STR loci in the Pudong population, and the allele frequencies ranged from 0.0015 to 0.9550. A total of 67, 51, and 40 haplotypes were detected at the multicopy loci DYS385a/b, DYS387S1a/b, and DYS527a/b, respectively, and the haplotype frequencies ranged from 0.0015 to 0.1553. GD ranged from 0.0864 (DYS645) to 0.9514 (DYS385a/b).
In this study, we compared the GD values of 17 Han Chinese populations, including the Shanghai population [18], Liaoning population [19], Jiangsu Wujiang Han [16], Jiangsu Changshu Han [17], and other Chinese populations [20]. The GD values of other loci were all greater than 0.5, and the GD values of multicopy loci were all greater than 0.9. In general, the 38-locus Y-STR detection system exhibited a high degree of genetic polymorphism in the Pudong and Chongming populations and can provide rich genetic information for population genetic research and forensic applications.

Haplotype Diversity Analysis
Y-STR haplotypes play an important role in paternity testing [57], forensic evidence identification [58], and individual identification [59]. Currently, there are many commercial Y-STR DNA typing systems, such as the AmpFLSTR® Yfiler™ PCR Amplification Kit [23], Yfiler™ Plus [24], PowerPlex® Y23 System [25], and Yfiler™ Platinum PCR Amplification Kit [8]. Different Y-STR typing systems use different Y-STR locus panels, which differ in discrimination power. We evaluated the performance of the typing systems according to haplotype diversity (HD), haplotype matching probability (MP), and discrimination capacity (DC). The results are shown in Table 1. The Yfiler Platinum genotyping system exhibited the highest haplotype diversity values in the Shanghai Pudong and Chongming Island Han populations, which were 0.99996 (Pudong) and 0.99997 (Chongming), respectively. The proportions of unique haplotypes were 97.10% (Pudong) and 98.49% (Chongming), respectively. Of the four Y-STR genotyping systems compared in this study, Yfiler Platinum had the strongest discriminative ability, with the highest proportion of unique haplotypes.
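To make the comparison of typing systems concrete, the sketch below computes the three parameters for any chosen locus panel, under the commonly used definitions MP = Σ p_i², HD = n/(n − 1) × (1 − Σ p_i²), and DC = number of distinct haplotypes / n. These definitions, the treatment of "unique" as haplotypes observed exactly once, the function name, and the toy profiles are assumptions for illustration and are not the study data.

```python
from collections import Counter

def haplotype_stats(profiles, loci):
    """Compute HD, MP, DC and the singleton fraction for the given locus panel.

    profiles: list of dicts mapping locus name -> allele call
    loci:     list of locus names defining the panel
    """
    haplotypes = [tuple(p[locus] for locus in loci) for p in profiles]
    n = len(haplotypes)
    counts = Counter(haplotypes)
    mp = sum((c / n) ** 2 for c in counts.values())   # matching probability
    hd = n / (n - 1) * (1.0 - mp)                     # haplotype diversity
    dc = len(counts) / n                              # discrimination capacity
    unique = sum(1 for c in counts.values() if c == 1) / n
    return {"HD": hd, "MP": mp, "DC": dc, "unique": unique}

# Toy profiles (hypothetical allele calls at three loci).
toy_profiles = [
    {"DYS19": 15, "DYS390": 24, "DYS391": 10},
    {"DYS19": 15, "DYS390": 24, "DYS391": 10},
    {"DYS19": 15, "DYS390": 24, "DYS391": 11},
    {"DYS19": 14, "DYS390": 23, "DYS391": 10},
    {"DYS19": 16, "DYS390": 25, "DYS391": 10},
]
small_panel = haplotype_stats(toy_profiles, ["DYS19", "DYS390"])
large_panel = haplotype_stats(toy_profiles, ["DYS19", "DYS390", "DYS391"])
print(small_panel)
print(large_panel)
```

In this toy case the larger panel separates one additional haplotype, so HD and DC rise and MP falls, which is the pattern expected when a kit adds informative loci.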

Variation Analysis
Microvariants are rare alleles at Y-STR loci. We observed 15 microvariants in the Chongming Han population, including three single-copy loci, i.e., DYS627 ( It is worth noting that, of the 1219 samples in this study, 22 carried a ".2" microvariant at the DYS518 locus, and 21 of these belonged to the QR haplogroup. This result is consistent with previous studies on the relationship between the "DYS518~.2" allele and haplogroup Q [20,53].

Genetic Affinity Analysis
To further examine the genetic background of Shanghai Pudong Han and Chongming Han, we used multidimensional scaling (MDS) analysis, which can effectively reveal similarities and differentiation in the genetic background of different populations. Based on the pairwise Rst genetic distances between populations, an MDS plot (Figure 3) was constructed to visualize the genetic relationships among 28 populations (initial stress = 0.04649). The pairwise genetic distances are shown in Supplementary Material Table S6. The plot shows that the 18 Han Chinese populations all cluster in the middle of the graph and lie relatively close together. The six ethnic minorities are distributed around the Han populations and are relatively scattered.
Among them, Guangxi Han and Hainan Li cluster together. Guangxi Han have been reported to be a group formed from ethnic minorities, so Guangxi Han and the ethnic minorities appear to have a high affinity [60]. The clustering of Guizhou Han and Sichuan Han is more obvious because Guizhou and Sichuan are adjacent provinces in southwestern China [61]. The populations closest to Chongming Han were Pudong Han, Jiangsu Nantong, Jiangsu Wujiang, and Jiangsu Changshu, and we observed strong affinity among them. Pudong Han clustered with Jiangsu Wujiang, Jiangsu Changshu, and Jiangsu Nantong and had a very close genetic relationship with them. The two Japanese populations, used as the outgroup, also clearly formed a single cluster.
The Pudong Han population is farther from the Jiangsu Changzhou Han population than from the other four reference populations in Jiangsu Province. This result confirms that there is internal genetic structure within the Jiangsu Han population. Pudong Han is closer to these Jiangsu populations than Chongming Island Han is. Pudong and Jiangsu are geographically closer, so genetic exchange is more likely to occur. The Chongming Island population, in contrast, is the result of immigration from the mainland. Because transportation between the island and the mainland was difficult, the island has been partially geographically isolated, resulting in genetic drift.
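The way a two-dimensional map is obtained from a pairwise distance matrix can be sketched with classical (Torgerson) MDS. This is a swapped-in, closed-form variant chosen for brevity: the reported stress value suggests that an iterative, stress-minimizing MDS was used in the study, and the software is not stated here. The sketch assumes NumPy is available, and the toy distances are hypothetical rather than values from Table S6.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Classical MDS: embed n items in k dimensions from an n x n distance matrix."""
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering   # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(b)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]           # indices of the k largest
    # Distances such as Rst need not be Euclidean, so negative eigenvalues
    # can occur; they are clipped to zero here.
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

# Hypothetical pairwise distances among four populations.
labels = ["pop_A", "pop_B", "pop_C", "pop_D"]
toy = [[0.00, 0.01, 0.02, 0.10],
       [0.01, 0.00, 0.02, 0.11],
       [0.02, 0.02, 0.00, 0.09],
       [0.10, 0.11, 0.09, 0.00]]
for name, (x, y) in zip(labels, classical_mds(toy)):
    print(f"{name}: {x:+.4f} {y:+.4f}")
```

Populations separated by small distances (here `pop_A`, `pop_B`, and `pop_C`) land close together on the map, while the distant `pop_D` is placed apart, which is how clusters such as the Han group and the outgroup become visible in an MDS plot.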
To further reveal the genetic structure among populations, we constructed a phylogenetic tree using the neighbor-joining method (Figure 4). We observed that Han populations were almost all in the upper half of the phylogenetic tree, while ethnic minorities and outgroups were in the lower half of the tree.

Y-Chromosomal Haplogroup Distribution
In this study, 24 Y-SNP loci were analyzed, and 18 haplogroups were defined (D, D1a1a1, C, C2, IJ, K, QR, N, N1a1, O1a, O1b, O1b2, O2, O2a1, O2a2, O2a2a1a2, O2a2b, and O2a2b1a1). The distribution of haplogroups within the Pudong and Chongming Han populations is shown in Table S7. Haplogroups worldwide can be divided into more than 20 major groups, labeled A-T, among which C, D, N, and O are the four major haplogroups in East Asia, accounting for approximately 93% of East Asian males [63]. More than 70% of the Chinese Han population belongs to the O haplogroup. The O haplogroup frequencies in Chongming Han and Pudong Han in this study were 77% and 82%, respectively. This result is consistent with the overall haplogroup distribution in the Chinese Han population. The O haplogroup can be further divided into two major clades, the O1 and O2 haplogroups, which account for 60% of East Asian males. The Han Chinese population is large, and haplogroup frequency data can reflect the genetic differences between people in different geographical locations. The distribution of the O1 haplogroup follows a geographic pattern. The O1 haplogroup has a low frequency in northern provinces, almost always below 20%. Its frequency is higher in Han populations in southern provinces, always above 25% [64]. In this study, O1a-M119 accounted for 29.62% in Chongming and 24.53% in Pudong. It was the haplogroup with the highest proportion in both populations. The O1a-M119 haplogroup is concentrated on the southeastern coast of China, in the Dong-Dai population, and among the Taiwan aborigines. Shandong Province has always been an area with a low frequency of O1a-M119. In previous studies, the proportion of O1a-M119 in Shandong Han was only 3.1% [20] and 3.0% [53].
The proportions of the O1b-M268 haplogroup in Pudong and Chongming Han were 6.97% and 4.43%, respectively. O1b-M268 is widely distributed in northern Eurasia and is a common haplogroup in southern Han populations. Among the Han people in Pudong and Chongming Island, the O2 haplogroup accounted for 50.81% and 42.63%, respectively. The O2 haplogroup is the predominant haplogroup in East Asian populations. The results of previous studies on O2a2b-P164 and O2a1-KL1 revealed that the average distribution ratio of O2a2b-P164 in the southern Han was 21.54%, and the distribution in the northern Han was 34.11% [65]. O2a2b-P164 reflects differences between the northern and southern populations. In this study, the proportions of the O2a2b-P164 haplogroups in Pudong and Chongming Han were 21.78% and 23.58%, respectively. The results of this frequency distribution are more in line with the characteristics of the southern Han. The O2a1-KL1 haplogroup is widely distributed in northern, southern, and eastern China, and the distribution ratio is approximately 20%, which is relatively evenly distributed throughout the Chinese population. In the Chongming Han population, we observed that the proportion of the N-M231 haplogroup was 13.59%, while the proportion of the N-M231 haplogroup in the Pudong Han population was only 7.26%. The N haplogroup is widely distributed in northern Eurasia. Previous studies have suggested that the high frequency of the N-M231 haplogroup in Eastern Europe is the result of the westward migration of populations from inland Asia [66]. The high frequency distribution of haplogroup N-M231 in Chongming Island may be caused by the "founder effect" of genetic drift. After the initial population of the N-M231 haplogroup settled on the island, due to the small population on the island, the gene frequency of the N-M231 haplogroup population dominated and expanded, resulting in the current frequency distribution.
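The reported haplogroup O totals can be cross-checked against the clade-level percentages given in this section. The short sketch below assumes that, in this panel, the O total is the sum of O1a-M119, O1b-M268, and O2 (with O2 including its subclades); all numbers are those quoted in the text.

```python
# Percentages as reported in the text for the three O clades.
reported = {
    "Chongming": {"O1a-M119": 29.62, "O1b-M268": 4.43, "O2": 42.63},
    "Pudong": {"O1a-M119": 24.53, "O1b-M268": 6.97, "O2": 50.81},
}

def total_o(percentages):
    """Sum the clade percentages to obtain the haplogroup O total."""
    return sum(percentages.values())

for population, pct in reported.items():
    print(f"{population}: O total = {total_o(pct):.2f}%")
# Chongming: O total = 76.68%
# Pudong: O total = 82.31%
```

Rounded to whole percentages, these sums give 77% and 82%, matching the O haplogroup frequencies stated for Chongming Han and Pudong Han.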

Principal Component Analysis
To further study the paternal genetic relationship among the populations, we inte-  [67,68].
The distribution of the southern population in this figure is relatively loose, and the distribution of the northern Han population in the figure is relatively tight. This shows that southern Hans have greater Y chromosome genetic structural variation than northern Hans [20]. The Guangxi Han people appear as outliers in the graph, probably because the Guangxi Pinghua population was formed by aboriginal minorities who embraced Han culture [20,69]. The PCA results demonstrated that Chongming Han, Pudong Han, Jiangsu Wujiang Han, and Jiangsu Changshu Han had a close genetic relationship.

Network Analysis
To examine the genetic structure among Pudong Han, Chongming Han, and Jiangsu Han in detail, we used the median-joining (MJ) method to construct an STR haplotype network under the O1a-M119 haplogroup. The network plot was based on 15 Y-STR loci and explores the connections and differentiation among STR haplotypes under the O1a-M119 haplogroup (Figure 7). The ancestral haplotype of O1a-M119 was determined from the 1000 Genomes Project III [70,71] Y-SNP and Y-STR datasets for Central and East Asia. Using EA YPredictor [72], the ancestral haplotype was defined as the haplotype with the smallest genetic distance to the other haplotypes.

The joint analysis of Y-SNPs and Y-STRs can be used to infer paternal migration routes [73]. The O1a-M119 network showed that Chongming Han and Pudong Han were located downstream of Jiangsu Wujiang Han and Jiangsu Changshu Han, suggesting that Chongming Han and Pudong Han might descend from Jiangsu Han immigrants. This result indicates that, under the O1a-M119 haplogroup, the Jiangsu Wujiang and Jiangsu Changshu populations made larger paternal genetic contributions to the formation of the Chongming and Pudong Han populations. The haplogroup network exhibits a star-like spread, illustrating the phenomenon of population expansion in the Pudong and Chongming areas. This may be due to genetic drift. According to some historical records, the original source of the Han population in Chongming Island was the Han population in Jiangsu Province [6].

Conclusions
In this study, we analyzed 38 Y-STR loci and 24 Y-SNP genetic markers in Chongming Han and Pudong Han populations. Genetic diversity analysis was performed based on the Y-STR results. We evaluated four different Y-STR genotyping systems separately using four forensic parameters. Variation analysis counted microvariants and copy number variations and verified the correlation between the ".2" microvariant of DYS518 and the QR haplogroup. The population genetic affinity analysis indicated a paternal genetic correlation among Pudong Han, Chongming Han, and Jiangsu Han. The Y-SNP haplogroup analysis revealed that the primary haplogroup of the two populations was O1a-M119, with proportions of 29.62% (Chongming) and 24.53% (Pudong). The Y-SNP haplogroup frequencies of the two populations were closer to those of the southern Han. In addition, under the O1a-M119 haplogroup, the haplotype network analysis suggested a paternal genetic contribution from Jiangsu Wujiang Han and Jiangsu Changshu Han to Chongming Han and Pudong Han. This study explored the genetic structure of the two populations from the perspective of molecular biology and compared them to other Han populations in China to determine the correlations and differences in genetic structure between these populations. The results of this study provide new ideas for the gradual refinement of population research on the Han population in China and provide original population data for the practical application of forensic medicine.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes13081363/s1, Table S1: The Y-STR loci of the four genotyping systems used in this study; Table S2: The Y-STR and Y-SNP genotyping results of all the samples in the Pudong Han population and Chongming Han population; Table S3: Allele frequencies and gene diversities of 38 Y-chromosome STR loci for the Chongming Han population; Table S4: Allele frequencies and gene diversities of 38 Y-chromosome STR loci for the Pudong Han population; Table S5: The detailed variant information at various Y-STR loci in this study; Table S6: The pairwise Rst genetic distances between the two studied populations and 26 reference populations; Table S7
Funding: This study was supported by grants from the National Natural Science Fund of China (81930056), National Youth Top-notch Talent of Ten Thousand Program (WRQB2019), and the Youth Science and Technology Innovation Leader of Ten Thousand Program (2018RA2102). The funders had no role in study design, data analysis, publishing decisions, or manuscript preparation.

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Ethics Committee of School of Life Science, Fudan University (protocol code No. BE1806).

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the sample donors to publish this paper.
Data Availability Statement: Data are available within the article or its Supplementary Materials.

Conflicts of Interest:
The authors declare no conflict of interest.