Identification and Validation of Marketing Weight-Related SNP Markers Using SLAF Sequencing in Male Yangzhou Geese

Growth performance is a complex economic trait for avian production. The swan goose (Anser cygnoides) has never been exploited genetically like chickens or other waterfowl species such as ducks. Traditional phenotypic selection is still the main method for genetic improvement of geese body weight. In this study, specific locus amplified fragment sequencing (SLAF-seq) with bulked segregant analysis (BSA) was conducted for discovering and genotyping single nucleotide polymorphisms (SNPs) associated with marketing weight trait in male geese. A total of 149,045 SNPs were obtained from 427,093 SLAF tags with an average sequencing depth of 44.97-fold and a Q30 value of 93.26%. After SNPs’ filtering, a total of 12,917 SNPs were included in the study. The 31 highest significant SNPs—which had different allelic frequencies—were further validated by individual-based AS-PCR genotyping in two populations. The association between 10 novel SNPs and the marketing weight of male geese was confirmed. The 10 significant SNPs were involved in linear regression model analysis, which confirmed single-SNP associations and revealed three types of SNP networks for marketing weight. The 10 significant SNPs were located within or close to 10 novel genes, which were identified. The qPCR analysis showed significant difference between genotypes of each SNP in seven genes. Developed SLAF-seq and identified genes will enrich growth performance studies, promoting molecular breeding applications to boost the marketing weight of Chinese geese.


Introduction
Growth performance is the most important economic factor in the poultry industry. In poultry breeding, males are usually selected for growth and meat production, while females are selected for reproduction [1]. Chinese geese are very prolific and are considered the best egg laying breed, but are listed in the lightweight class [2]. A synthetic breed, Yangzhou geese, is a main breed in China, approved as the "first national geese breed" by the National Examination and Approval Committee of Domestic Animal and Poultry Breeds in 2006 [3]. It is a medium body-sized dual-purpose breed for meat and egg production [4]. The average body weight for Yangzhou goslings at nine weeks of age is 3.26 kg [5].
Genetic variations at candidate genes-associated with economic traits such as growth and meat production-have stimulated research in marker-aided selection (MAS) and evolutionary relationships studies [6]. DNA-based molecular markers techniques are costly, time-consuming, and some of them lack reproducibility of genotyping results [7]. Nowadays, single nucleotide polymorphisms (SNPs) are the most adopted and stable technique for studying genomic polymorphism [8]. They provide an assessable association on best linear unbiased prediction (BLUP) of individual estimated breeding values (EBV) for MW trait, which had been assessed using full and half sibs' information by the lme4 package of R software [23] according to [24].

Preparation and Construction of SLAF Library
Blood samples (2 mL) were collected from wing veins of 167 male goslings at nine weeks of age and immediately transferred into 5 mL tubes containing acid citrate dextrose and stored at −20 • C pending the DNA extraction. The conventional phenol/chloroform method was used to extract genomic DNA. The concentration and purity of DNA for each individual sample was assessed using the Thermo Scientific NANODROP2000 spectrophotometer (Thermo Fisher Scientific Inc., Waltham, MA USA). DNA concentration was adjusted to 100 ng/µL for each sample. Twenty samples from each of the low estimated breeding values (LEBV) and high estimated breeding values (HEBV) groups were used to prepare two DNA pools by mixing an equal amount of genomic DNA. The SLAF-seq technique was used to develop and obtain molecular markers across the whole genome data of the two goslings groups by Beijing Biomarker Biotechnology Co., Ltd. (http://www.biomarker.com.cn/ accessed on 27 April 2018) [10]. Briefly, the geese (GOOSE) genome was used as the reference genome for electronic enzyme digestion prediction based on the actual size of goose genome and GC content. The genomic DNA of each sample was digested separately. The restriction enzymes (RsaI and HaeIII) were used to digest genomic DNA. After sequencing, the libraries were qualified with Illumina HiSeq TM2500 (Illumina, Inc., San Diego, CA, USA). In order to evaluate the accuracy of the enzyme digestion experiments, Japanese rice (Oryza sativa ssp. japonica) was used as a control for sequencing (http://rapdb.dna.affrc.go.jp/ accessed on 27 April 2018).

Sequencing Analysis and Detection of MW-Related SNPs
A total of 384,079 SLAF tags with an average sequencing depth of 25-fold were developed for each sample. In order to ensure the quality of the project analysis, a reading length of 100 bp × 2 was used as the subsequent data evaluation and analysis. The short oligonucleotide alignment program (SOAP) was used to compare the sequencing reads of control with its reference genome [25]. For the development of SNP markers, the Burrows-Wheeler alignment tool (BWA) was used to align the sequencing reads with reference genome [26]. Two methods of the sequence alignment/map format (SAMtools) [27] and the genome analysis toolkit (GATK) [28] were used to develop the SNPs list. The SNP marker intersection obtained by both methods was used as the final reliable SNP marker dataset, thus a total of 149,045 SNPs were obtained. LEBV and HEBV SlAF-seq libraries were submitted to NCBI's Sequence Read Archive with the accession numbers SRR14113841 and SRR14113842, respectively.

Quality Control of SLAF Tags
The sequencing quality value (Q) was used to assess the quality of raw SLAF reads. The corresponding formula of base sequence error rate p and sequence quality value is (Q-score = −10 × log10 p). If the sequence accuracy is 99.9%, the quality value for the base should be 30. Both the Euclidean distance (ED) and SNP index methods were then used to filter data for either LEBV or HEBV cohorts according to [29,30]. Genotypes' differences in the SNP loci between the both pools were used to calculate the depth of each base and, therefore, the values of ED for each locus. The following equation was used to calculate ED at each SNP location: where, A HEBV , C HEBV , T HEBV , and G HEBV are the depth of A, C, T, and G bases on a site in HEBV bulk, respectively. A LEBV , C LEBV , T LEBV , and G LEBV are the depth of A, C, T, and G bases on a site in LEBV bulk, respectively. The ∆(SNP index) formula used to calculate the differences of genotypic frequency between LEBV and HEBV bulks was as follows: where M and P stands for HEBV and LEBV, respectively. HEBV and LEBV denote the genotype from high and low estimated breeding values pools, respectively. M HEBV and P HEBV are the depth of the HEBV population derived from M and P, respectively. M LEBV and P LEBV are the depth of the LEBV population derived from M and P, respectively. All SNPs that met the condition ED ≥ 0.7, SNP index ≥ 0.5, and Q ≥ 30 were involved in the present analysis. After data filtering, 12,917 SNPs were used for further analysis. Allele frequency differences for each SNP were used to compare between LEBV and HEBV pools. SNPs with a highly significant difference in the allelic distribution of both pools were selected as candidate loci for further verification in the population.

Replication Association Study
For the replication study, blood and MW of 114 male Yangzhou goslings were collected from individuals of the second population (P2) derived from the same farm. MW data were checked for normality before conducting any further analysis. P2 was kept under the same management system of P1. Blood samples (2 mL) were collected, reserved, and DNA was extracted and its quality was assessed with the same protocol described in Section 2.3.

Genotyping of Female Goslings of First and Second Populations
In order to investigate if the candidate SNPs are gender-related, blood samples and MW of 323 and 345 female Yangzhou goslings belonging to the same farm were collected from individuals of the P1 and P2, respectively. Firstly, the discovered SNPs that showed significantly MW-related SNPs between HEBV and LEBV male goslings groups were genotyped using AS-PCR method in 24 individuals from each of the lowest and the highest female gosling groups of P1 based on EBV. The SNPs that showed a significant difference between female goslings of LEBV and HEBV were secondly genotyped using the AS-PCR method in 323 and 345 female goslings of P1 and P2, respectively. Female goslings of P1 and P2 were kept under the same management system of male goslings of P1 and P2. Blood samples were collected and reserved, and DNA was extracted and its quality was assessed with the same protocol described in Section 2.3.

Verification of MW-Related SNP Genotypes
Based on SLAF-BSA analysis and Chi square test, the 31 highest significantly candidate SNPs of different allelic distribution were selected for male individual-based genotyping in the LEBV and HEBV cohorts of P1. A total of 13 SNPs that showed significantly different allele distribution for MW in the LEBV and HEBV cohorts were further verified in all male individuals of P1 and P2 using the AS-PCR genotyping method. The 13 SNPs were also genotyped using the AS-PCR method in the lowest and the highest female gosling groups of P1 and the significant SNPs were further genotyped in all female individuals of P1 and P2. Primer Premier 5 software (PREMIER Biosoft, Palo Alto, CA, USA) was used to design each pair of primers to amplify fragments based on GOOSE genomic DNA sequence (https://www.ncbi.nlm.nih.gov/ accessed on 30 June 2018). The primers for AS-PCR were designed according to [8,31]. An additional mismatch base pair was inserted at the third base from the 3 end to improve the specificity of PCR amplification and reliable discrimination between the alleles. The SNPs, their primers, and PCR product length are shown in Table S1. Genotypes with two specific primers were performed in duplicates in a total of 20 µL reaction volumes containing 10 µL r-taq (Takara, Dalian, China), 1 µL from each of the forward and reverse primers (Tsingke, Nanjing, China), 1 µL DNA template, and 7 µL ddH 2 O. PCR amplification was carried out by preliminary denaturation at 94 • C for 5 min followed by 32 cycles of amplification (denaturing at 94 • C for 30 s, annealing at ATm • C for 30 s, and extension at 72 • C for 30 s) and a final extension at 72 • C for 7 min. All PCR products quantity were fractionated by agarose gel (2%) electrophoresis, visualized with gold view staining, and quantified with Tanon 3500 Gel Imaging system (Tanon Science and technology Co., Ltd., Shanghai, China).

Quantitative Real-Time PCR (RT-qPCR)
Seven male goslings from each of the HEBV and LEBV groups of P2 were selectedaccording to their genotypes-and slaughtered to obtain brain tissues (pre-experiments of gene expression were performed to select the suitable tissue). Tissues were collected and immediately frozen in liquid nitrogen pending the expression analysis. Total RNA was extracted from the brain tissues using TRIZOL Reagent (Invitrogen, Carlsbad, California, USA) according to the manufacturer's protocol. RNA quality and quantity were assessed by NanoDrop spectrophotometer at 260/280 nm. The cDNA first strand was synthesized from 1 µg of purified total RNA using ProtoScript First Strand cDNA Synthesis kit (Takara, Dalian, China). The gene specific primers of the identified 10 candidate genes harboring MW-related SNPs were designed using Primer Premier 5 software to determine the expression levels of the 10 genes in geese brain tissue by qPCR. The primers (Table S2) were designed based on the geese genome sequence of these genes in the Genbank database (Anser cygnoides, taxid:8845). SYBR ® Green Master Mix (Vazyme, Nanjing, China) was used in a StepOne Plus Real-Time PCR system (Applied Biosystems, Foster City, CA, USA). The PCR reaction (20 µL) consisted of 1 µL cDNA, 0.4 µL from each primer (10 µmol), 0.4 µL ROX Reference Dye, 10 µL SYBR Green Master Mix, and 7.8 µL nuclease-free water. Amplification thermal conditions were as follows: pre-denaturation at 95 • C for 5 min, 40 amplification cycles (95 • C for 10 s and 60 • C for 30 s). Melting curve analysis was performed from 60 • C to 95 • C by reading plate every 0.1 • C. Each sample was analyzed three times. The house-keeping gene glyceraldehyde-3-phosphate dehydrogenase (GAPDH) was used as internal control.

Statistical, Bioinformatics, and Data Analysis
Function Shapiro test of R [23] was used to perform a Shapiro-Wilk test for goslings MW normality. In male and female goslings of both populations, the best linear unbiased predictions of individual estimated breeding values for MW trait were estimated using full and half sibs' information by lme4 package of R software [23] according to [24]. The formula used to predict the BV is as follows: â ijk = 1/2 (â j + â k ) +ŵ ijk , where â ijk = the BV i th progeny of parent j x parent k, â j = the BV of parent j, â k = the BV of parent k, and w ijk = the within-family Mendelian additive effect of the individual ijk.ŵ was calculated asŵ ijk = h 2 w (e ijk ), where e ijk = the residual from the linear mixed model associated with progeny from parent j × parent k, and h 2 w = the within-family heritability = 1/2 σ 2 A/σ 2 e. The allelic frequency variation of SLAF tags between LEBV and HEBV male groups were tested using contingency tables and chi-square statistics. Two methods, false discovery rate and Bonferroni correction, were used to obtain adjusted p-values and estimate the significance threshold level at 5% overall Type I error rate [32,33]. One-way analysis of variance (ANOVA) in the statistical language R was used to test the association between different genotypes and MW. Mean separation test of Duncan was used to compare between means [34]. PLINK software was used to estimate genetic and allelic frequencies [35]. All single SNP-trait associations that reached a significance level of p < 0.05 were included in further multiple marker analysis. Multiple-marker associations were analyzed with two quantitative trait modes (additive mode: PAa (PAA + Paa)/2) and dominant mode: PAa either PAA or Paa) by the linear regression procedure of SAS software using forward or backward stepwise comparison [36] according to [37]. All data were expressed as mean ± SE. Spearman rank correlation matrices were performed by SAS software using SNPs' genotype values to design heat map. According to the 10 significant SNPs found to be associated with MW in male gosling of P1 and P2, a BLAST analysis against the NCBI public database (https://blast.ncbi.nlm.nih.gov/Blast.cgi accessed on 20 March 2020) using the SLAF-taqs of 100 bp sequence was performed to retrieve orthologous sequences. The geese sequences in the whole-genome shotgun contigs (wgs) database of Anser cygnoides (taxid:8845) were used for alignment. The 2 −∆∆CT method was used to calculate relative quantification of gene expression. GAPDH was used as an internal control. Two tailed t-tests were used to analyze mRNA expression variation between different genotypes in each tested SNP.

Analysis of Goslings' Marketing Weight
Gander had a higher marketing weight (MW) than goose. The average MWs were 4.12 and 4.09 kg for males and 3.54 and 3.47 kg for females in the first (P1) and second (P2) populations, respectively. The average estimated breeding values (EBVs) of males (0.01 and 0.003) were approximately equal to those of females (−0.006 and 0.002) in both populations. The heritability of MW was 0.29 and the environmental proportion of total variance was 0.71. Figure 1 shows MW (kg) and EBVs for the MW trait of each individual, divided into low, average, and high groups in the male and female goslings of P1 and P2. between high, average, and low EBV groups. The X axis represents low, average, and high groups of males and females from P1 and P2. The Y axis represents MW (kg) or EBV. The error bars represent the standard error of mean. * p < 0.05, ** p < 0.01 and *** p < 0.001. Table 1 shows the statistics of sequencing data for each sample including the number of reads, quality value (Q30), and GC content. A total of 5.4 Gb of sequencing data were generated by SLAF sequencing containing more than 20 M paired-end mapped reads representing 95.39% of the total reads. Through bioinformatics analysis, 427,093 SLAF tags were obtained with an average sequence depth of 44.97-fold. A total of 149,045 SNPs were obtained from polymorphism of 84,594 SLAF tags. The average Q30 sequence was 93.26% with an average GC content of 42.88%. After filtering by SNP index and Euclidean distance (ED), 12,917 SNPs were included in downstream analysis. The ratio of transition/transversion (ti/tv) was 2.33, where 69.96% was transition and 30.04% was transversion. For the transitions category, the percentage of A-G substitution (35.54%) roughly equals that of C-T substitution (34.42%), while the percentages of G-T (7.58%), A-C (8.09%), A-T (7.07%), and C-G (7.30%) substitutions are almost equal in transversions.

Discovering of Goslings MW-Related SNPs
For all 12,917 SNPs detected by SLAF sequencing data, an independent Chi square test was used to estimate allele frequency differences between males of high (HEBV) and low (LEBV) estimated breeding value cohorts. A total of 3145 SNPs showed a significant effect in the Chi square test (p < 0.05), 382 SNPs reached a 5% false discovery rate (FDR), and only 68 SNPs reached 5% Bonferroni correction. The 31 highest significant SNPs (p < 2.26 × 10 −6 -9.22 × 10 −33 ) were selected as candidate MW-related SNPs (Table S1). Allele-specific polymerase chain reaction (AS-PCR) was used to genotype forty male individuals; as twenty from each of the HEBV and LEBV cohorts. Thirteen out of 31 SNPs showed significant (p < 0.04-8.44 × 10 −7 ) allele frequency differences between HEBV and LEBV male groups ( Figure S1). Details on the significant SNPs including their genomic positions and p-values for males of P1 are summarized in Table 2.

Verification of MW-Related SNPs in Male Goslings
Using the AS-PCR method, a total of 13 SNPs that showed a significant influence on MW of HEBV and LEBV male cohorts were selected as candidate SNPs to be associated with MW in 167 males of P1. Ten out of the 13 SNPs reached a 5% Bonferroni distribution over five different chromosomes. Duncan separation test showed that goslings with TT genotype of SNPs Record_1102, Record_7099, and Record_396 loci were of higher MW than those with CC genotype. For the heterozygous, there was a significant difference in MW between goslings with CT and CC genotypes in Record_1102 and goslings with TT genotype in Record_7099 loci. At loci of Record_1056 and Record_7097, the GG was linked with higher MW than the genotype CC. The CC genotype at Record_8964 locus was related to higher MW than the GG genotype. There was a significant difference between CG and GG genotypes in Record_1056 and Record_8964 loci. There was a significant difference in MW between goslings with CG and both homozygous in Record_7097 locus. Individuals having CC genotype showed higher MW than the individuals having TT genotype in Record_1009 and Record_1115 loci. There was a significant difference in MW between CT and CC genotypes in Record_1009 locus. There was a significant difference between CT and both of homozygous genotypes in Record_1115 locus. For Record_1111, the GG genotype was associated with higher MW than the AG and AA genotypes. Heterozygous AG genotype was related to higher MW than both homozygous genotypes at Record_2315 locus. Figure 2 shows the MW of each genotype of the 10 significant SNPs in P1 males.

Replication Association Analysis for Male Goslings
In order to validate the significance of the 10 SNPs that were shown to be associated with MW of males from P1, 114 individuals of males from P2 were genotyped using AS-PCR. The AS-PCR genotypes association analysis of the individuals from P2 confirmed the results obtained from the genotypes analysis of P1. Allelic and genotypic frequencies for males of both populations are shown in Figure S2. Goslings with TT genotype of Record_1102 locus had a significantly higher MW than those with CC and CT genotypes. Goslings with TT genotype of Record_396 locus had a significantly higher MW than those with CC genotype. Goslings with CT genotype of both SNPs loci had a significantly higher MW than those with CC genotype. Goslings with GG genotype in Record_1056 locus showed a significantly higher MW in comparison with those with CC and CG genotypes and goslings with CG genotype had also a significantly higher MW than those with CC genotype. In contrast, goslings with CC genotype in Record_8964 locus showed a significantly higher MW in comparison with those with GG and CG genotypes. Individuals with GG and CG genotypes of Record_7097 locus had a significantly higher MW than those with CC genotype. The homozygous CC genotype of Record_1009 locus had a significantly higher MW than CT genotype. The CC genotype of Record_1115 locus had a significantly higher MW than TT genotype. Goslings with TT and CT genotypes of Record_7099 locus had a significantly higher MW than those with CC genotype. Goslings with AA genotype of Record_1111 locus had a significantly higher MW than those with the GG genotype. Furthermore, goslings with AA genotype of Record_2315 locus had a significantly higher MW than those with AG and GG genotypes. Figure 2 shows the MW of each genotype of the 10 significant SNPs in P2 males. SNPs in males of the P1 and P2, respectively. The X axis represents different genotypes groups of each SNP in males of the P1 and P2. The Y axis represents MW (kg). The error bars represent the standard error of mean. * p < 0.05, ** p < 0.01 and *** p < 0.001.

Linear Regression Model and SNP Networks Analysis of Male Goslings Marketing Weight
Pairwise comparison test with either forward or backward methods was used to identify SNP-SNP combinations using linear regression model according to [37]. The 10 significant SNPs were involved in multiple regression model analysis, which verified single-SNP associations and revealed three types of SNP networks for MW of male goslings.
The three types of SNPs networks are as follows: four SNP-networks; three SNP networks and two SNP networks (Figures 3 and 4). All SNPs had an additive mode except Record_2315 and Record_7099, which a had dominant mode, and Record_1009, which had an over dominant mode, regarding single SNP association with MW. For the three types of SNP networks, all SNP networks showed additive-dominant combinations between SNPs. The substitution of GG-AA homozygotes reduced MW by 110 (SNP network 4) to 180 g (SNP network 2) at Record_1111 locus. For Record_2315 locus, the substitution of AA-GG homozygotes decreased MW by 330 (SNP network 6) to 400 g (SNP network 1). Substitution of CC-TT homozygotes at Record_1009 locus resulted in a decrease of MW by 140 (SNP-network 1) to 170 g (SNP-network 2) and at Record_1115 locus (SNP network 4). The substitution of GG-CC homozygotes at Record_1056 locus descended MW by 190-200 g. The MW increased by 150 g (SNP network 8) owing to the substitution of CC-TT at Record_7099 locus, while this increase was approximately 160 (SNP network 3 and 7) to 250 g (SNP network 2) at Record_396 locus owing to the substitution. The substitution of GG-CC resulted in a weight gain of 120 (SNP network 8) to 290 g (SNP network 6) at Record_7097 locus. On the contrary, this substitution at Record_8964 locus resulted in a weight reduction of 250 g (SNP network 6).

Additive, Dominance, and Recessive Effects of Significant SNPs
The genotypic effects of the 10 significant SNPs of males in both populations were further divided into additive, dominant, and recessive effects. In P1, the values ranged between −0.118-0.265, −0.214-(−0.058), and −0.453-(−0.073) for additive, dominant, and recessive effects, respectively. For P2, the values ranged between −0.198-0.030, −0.430-0.104, and −0.247-(−0.006) for additive, dominant, and recessive effects, respectively. In general, all tested SNPs in the present study showed significant effects of one or more genotypes in P1 or P2. Additive, dominance, and recessive values of males in both populations are shown in Table 3.

Correlations between MW and SNPs' Genotypes
For more confirmation on SNP networks, two correlation matrixes based on coefficients of Spearman rank correlation were performed using SNPs' genotype values. All SNPs of males from P1 and P2 showed a significant correlation with MW and, at the same time, most of them showed a significant correlation with themselves. The highest correlation coefficients were found between MW and SNPs of Record_1111, Record_7099, and Record_396 loci for P1 and SNPs Record_1056, Record_1102, and Record_396 loci for P2. The correlation coefficients between SNPs ranged between medium to unity in P1 and low to medium in P2. First and second populations heat map of all pairwise correlation between SNPs were constructed based on both correlation matrixes and visualized in Figure 5 with their clustering analyses. Genes 2021, 12, x FOR PEER REVIEW 11 of 22   . The X and Y axes represent actual and predicted MW (kg), respectively.

Additive, Dominance, and Recessive Effects of Significant SNPs
The genotypic effects of the 10 significant SNPs of males in both populations were further divided into additive, dominant, and recessive effects. In P1, the values ranged between −0.118-0.265, −0.214-(−0.058), and −0.453-(−0.073) for additive, dominant, and recessive effects, respectively. For P2, the values ranged between −0.198-0.030, −0.430-0.104, and −0.247-(−0.006) for additive, dominant, and recessive effects, respectively. In general, all tested SNPs in the present study showed significant effects of one or more genotypes in P1 or P2. Additive, dominance, and recessive values of males in both populations are shown in Table 3.  . The X and Y axes represent actual and predicted MW (kg), respectively.

Verification of MW-Related SNPs in Female Goslings
In order to investigate if the discovered SNPs are gender-related, the 13 significantly MW-related SNPs in HEBV and LEBV of male cohorts from P1 were used to genotype

Verification of MW-Related SNPs in Female Goslings
In order to investigate if the discovered SNPs are gender-related, the 13 significantly MW-related SNPs in HEBV and LEBV of male cohorts from P1 were used to genotype HEBV and LEBV of female cohorts from the same population by the AS-PCR method. The results showed that only 5 out of 13 SNPs showed significant allele frequency differences between HEBV and LEBV of female groups from P1 ( Figure S1). Details on the significant SNPs including their genomic positions and p-values for females from P1 are summarized in Table 2. The five significantly associated SNPs of LEBV and HEBV female cohorts were further selected for individuals genotyping of 323 and 345 females of P1 and P2, respectively. The results showed a strong significant effect (p < 0.05) for only Record_1056. Goslings with GG genotype showed significantly higher MW than those with CC and CG genotypes in both populations ( Figure 6).

Verification of MW-Related SNPs in Female Goslings
In order to investigate if the discovered SNPs are gender-related, the 13 significantly MW-related SNPs in HEBV and LEBV of male cohorts from P1 were used to genotype HEBV and LEBV of female cohorts from the same population by the AS-PCR method. The results showed that only 5 out of 13 SNPs showed significant allele frequency differences between HEBV and LEBV of female groups from P1 ( Figure S1). Details on the significant SNPs including their genomic positions and p-values for females from P1 are summarized in Table 2. The five significantly associated SNPs of LEBV and HEBV female cohorts were further selected for individuals genotyping of 323 and 345 females of P1 and P2, respectively. The results showed a strong significant effect (p < 0.05) for only Record_1056. Goslings with GG genotype showed significantly higher MW than those with CC and CG genotypes in both populations ( Figure 6).  Figure 6. Average MW (kg) of each genotype of SNP Record_1056 in females of (A) first population and (B) second population. The X axis represents different genotypes groups of Record_1056 in females of P1 and P2. The Y axis represents MW (kg). The error bars represent the standard error of mean. * p < 0.05 and ** p < 0.01.

Annotation of Genes Harboring SNPs Associated with Goslings MW
According to the 10 significant SNPs that were found to be associated with MW of males from P1 and P2, a BLAST analysis against the NCBI public database (https: //blast.ncbi.nlm.nih.gov/Blast.cgi accessed on 20 March 2020) using the SLAF-taqs of 100 bp sequence was performed to retrieve orthologous sequences. The geese sequences in the whole-genome shotgun contigs (wgs) database of Anser cygnoides (taxid:8845) were used for alignment. The 10 significant SNPs were found to be located within or close to 10 genes that were concluded to be significantly associated with MW of male goslings. In general, the 10 SNPs are distributed in five different chromosomes including five SNPs (Record_1102, Record_1111, Record_1009, Record_1056, and Record_1115) on KZ155846.1, one SNP (Record_2315) on KZ155852.1, two SNPs (Record_7099 and Record_7097) on KZ155908.1, one SNP (Record_8964) on KZ155945.1, and one SNP (Record_396) on KZ155843.1 chromosomes. As shown in Table 4, SNP Record_1102 is located on intron four of H6 family homeobox 1 (HMX1) gene. SNP Record_1111 is located at 29.5 Kb upstream region of uncharacterized LOC106034756 gene. SNP Record_2315 is located at 10 Kb upstream region of LRR binding FLII interacting protein 1 (LRRFIP1) gene. SNP Record_1009 is located at 14.5 Kb upstream region of uncharacterized protein K02A2.6-like (LOC106035299) gene. SNP Record_1056 is located on intron one of protein phosphatase 2 regulatory sub-unit B γ (PPP2R2C) gene. SNP Record_1115 is located on intron one of uncharacterized LOC106034755 gene. SNP Record_7099 and Record_7097 are located at 3.3 Kb and 3 Kb, respectively, downstream region of ubiquitin C-terminal hydrolase L1 (UCHL1) gene. SNP Record_8964 is located at 28.3 Kb downstream region and 41.2 Kb upstream region of uncharacterized LOC106034143 and platelet derived growth factor D (PDGFD) genes. SNP Record_396 is located at 24.6 Kb upstream region of extracellular leucine-rich repeat and fibronectin type III domain containing 1 (ELFN1) gene.

Relative Gene Expression
For further confirm that these genes are strongly related to MW, relative gene expression using qPCR was performed. The substitution of reference-alternative genotypes was shown to lead to upregulated mRNA expression levels in five genes (HMX1, LOC106034756, PPP2R2C, UCHL1, and PDGFD) and downregulated levels in five genes (LRRFIP1, LOC106035299, LOC106034755, LOC106034143, and ELFN1) (Figure 7). The mRNA expression levels in all genes showed significant differences (p < 0.05) between genotypes except LOC106034756, LOC106035299, and LRRFIP1 genes. For LOC106034756 and LRRFIP1 genes, the alternative homozygous genotype is difficult to detect and there is no significant difference between reference homozygous and alternative heterozygous genotypes for MW in the second population ( Figure 2).

Discussion
Goslings' growth is a crucial factor for the whole life with a major impact on the ciency of production and reproduction. Many factors influence body weight and gro performance of goslings; some are hereditary in origin (biological factors) and other environmental factors. China dominates the global geese industry by producing app mately 4.8 million tons of meat, owning more than 90% of the global goose popula [38]. Chinese geese have been kept for eggs and meat. They are relatively good egg la They actively forage and produce the least greasy meat of all but Pilgrim geese. Howe Chinese geese are listed in the lightweight category [2]. Chinese geese are relatively s in size with an average mature body weight of 3.5-4.5 Ib for goose and 4.5-5.5 Ib for der. Their average mature body weights are lower than the other global geese breeds as Toulouse (10-13 for goose and 12-15 Ib for gander), Embden (10-13 for goose and 15 Ib for gander), and American Buff (9-12 for goose and 10-12 Ib for gander) [39]. In study, the average marketing weight (MW) of Yangzhou goslings (3.50 kg for goose 4.11 kg for gander) at nine weeks of age is higher than those previously reported fo same breed [5,40], but lower than the other global geese meat breeds such as Pomera (5.29), Landes (4.74), and Steinbacher (4.33) goose breeds at ten weeks of age [41]. No days, the tissue composition of carcasses, body weight at slaughter, and meat quality more important in the selection of breeds for meatiness.
Owing to the low average MW in Yangzhou geese, it was imperative to find a me for genetic improvement of growth, which can be utilized in selection programs. Ach ing significant success using quantitative genetics tools depends on accurate data, a g

Discussion
Goslings' growth is a crucial factor for the whole life with a major impact on the efficiency of production and reproduction. Many factors influence body weight and growth performance of goslings; some are hereditary in origin (biological factors) and others are environmental factors. China dominates the global geese industry by producing approximately 4.8 million tons of meat, owning more than 90% of the global goose population [38]. Chinese geese have been kept for eggs and meat. They are relatively good egg layers. They actively forage and produce the least greasy meat of all but Pilgrim geese. However, Chinese geese are listed in the lightweight category [2]. Chinese geese are relatively small in size with an average mature body weight of 3.5-4.5 Ib for goose and 4.5-5.5 Ib for gander. Their average mature body weights are lower than the other global geese breeds such as Toulouse (10-13 for goose and 12-15 Ib for gander), Embden (10-13 for goose and 12-15 Ib for gander), and American Buff (9-12 for goose and 10-12 Ib for gander) [39]. In this study, the average marketing weight (MW) of Yangzhou goslings (3.50 kg for goose and 4.11 kg for gander) at nine weeks of age is higher than those previously reported for the same breed [5,40], but lower than the other global geese meat breeds such as Pomeranian (5.29), Landes (4.74), and Steinbacher (4.33) goose breeds at ten weeks of age [41]. Nowadays, the tissue composition of carcasses, body weight at slaughter, and meat quality are more important in the selection of breeds for meatiness.
Owing to the low average MW in Yangzhou geese, it was imperative to find a method for genetic improvement of growth, which can be utilized in selection programs. Achieving significant success using quantitative genetics tools depends on accurate data, a good management system, and unbiased distribution of ganders. Furthermore, it required several years to achieve the desired success. Genetic variations at candidate genes regarding economic traits have stimulated research interest [6]. GWA studies were used for rapidly and efficiently discovering thousands of associated SNPs and identifying genes for complex economic traits. SLAF-seq is the most efficient method of large-scale de novo SNP discovery that can be used in GWAS [10].
In this study, an economic and effective method of SLAF-seq with BSA techniques (SLAF-BSA) was employed for discovering and genotyping MW-related SNPs. A total of 427,093 SLAF-tags were obtained with an average sequencing depth of 44.97-fold. A total of 149,045 SNPs were obtained from polymorphism of 84,594 SLAF tags. After quality control by SNPs index and ED corrections, 12,917 SNPs were included in this study. Based on Chi square test, the 31 highest significant SNPs that reached 5% Bonferroni correction were selected as candidate MW-related SNPs of male goslings. Thirteen out of 31 SNPs showed significantly variation (p < 0.05) in allele frequency between males of HEBV and LEBV groups. The 13 SNPs were genotyped in 167 males of P1. Ten out of the 13 SNPs reached 5% Bonferroni distributing. A follow-up replication study in 114 males of P2 was conducted for further verification of the SNP markers' impact on MW. The results verified the significant association of the 10 SNPs with MW. Only Record_1056 showed a significant (p < 0.05) association with MW for 323 and 345 females of P1 and P2, respectively. These results suggested that these SNPs are associated with MW in Yangzhou geese, and this effect might relate to gender. The ti/tv value is an important property for evolution of the DNA-sequence where transition has less impact than transversion in changing amino acids in the protein, despite its higher frequency in the genome [42]. The value of ti/tv ratio (2.33) mediated previously published drosophila (2.00) and human (4.00) values [43,44].
To the best of our knowledge, this is the first time the SLAF-BSA approach has been used to distinguish large-scale de novo SNPs related to MW in Yangzhou geese. Different ways have been used for discovering and genotyping SNPs in goose populations. Next generation sequencing was performed using reduced representation (RR) sequencing from a DNA pool to detect 2188 SNPs for Barnacle goose [45]. Using the candidate gene approach, two SNPs in exon two of the growth hormone gene were detected in Huoyan goose by genotyping 552 individuals using polymerase chain reaction [46]. Using restriction site associated DNA of two DNA pools, 139,013 SNPs related to egg laying in Yangzhou goose were discovered [16]. Moreover, two SNPs in exon one and one SNP in intron two of SMAD family member 9 gene were discovered in goose by the PCR-SSCP method [47]. Using reduced representation (RR) sequencing, 277,362 SNPs were detected for pink-footed goose [48]. In additional, one SNP in exon three of Myostatin gene was detected in Landes and Kielecka geese breeds [49]. Recently, a total of 26 SNPs and 14 annotated genes significantly associated with quality traits and egg production were identified in Sichuan white geese by GWAS [50].
Regarding the 10 significant SNPs of male goslings in both populations-which were detected in the current study-five SNPs (Record_1102, Record_1111, Record_1009, Record_1056, and Record_1115 loci) are distributed in the~4 Mb region (3215955-7261356) of KZ155846.1 chromosome; two SNPs (Record_7099 and Record_7097 loci) are distributed in the 250 b region (2481905-2482155) of KZ155908.1 chromosome; and the remaining three SNPs (Record_2315, Record_8964, and Record_396 loci) are distributed in KZ155852.1, KZ155945.1, and KZ155843.1 chromosomes, respectively. Linear regression analysis procedure was conducted on the 10 significant SNPs to narrow down the genomic region of chromosome KZ155846.1, verify the impact of the SNP marker on MW, improve the accuracy of the analysis model, and detect SNP-SNP combinations (SNP networks). In the construction of SNP networks-related to quantitative traits-several unique features of linear regression procedure were shown such as determining a leading marker within a gene related to a particular trait of interest, improving the correlation between the actual and predicted network values and predicting the average values of genotypes substitution in each SNPs [37]. Our linear regression analysis confirmed single-SNP associations and revealed three types of SNP networks related to MW (four SNP networks, three SNP networks, and two SNP networks). SNP network 4 trimmed the region of KZ155846.1 chromosome from~4 Mb to~56 Kb. As Record_1009 locus in SNP network 4 has an over dominant effect-which is rarely used in a practical selection program [37]-only two SNPs (SNPs Record_1111 and Record_1115 loci) are considered in the~56 Kb region. Therefore, the SNP networks used in this study are a very powerful tool for optimizing candidate genes' selection, thus facilitating the delineation of genetic variation that underlies complex traits such as growth performance [51]. Two SNP networks were established by linear regression model, which revealed two and three SNP networks for egg laying related SNPs in Yangzhou goose [16]. Multiple SNP networks using linear regression model analysis were established, which revealed two and three SNP networks for seven and five production traits in beef cattle, respectively [37].
The ten discovered genes in this study are novel and their association with growth in general and body weight of geese in particular has not yet been verified. Four out of the 10 genes are uncharacterized in geese genome. Despite there being some evidence indicating that the other six genes are associated with growth, their specific function has not yet been clearly defined in geese and further in-depth studies are needed. A nonsense mutation in the first exon of Hmx1 gene causes a semi-lethal mutation in mouse called Dumbo, resulting in reducing body mass in survival mutants from~3 days postpartum onwards with microphthalmic at puberty [52]. Eight SNPs of human Lrrfip1 gene were found to be associated with abdominal fat, body fat percent, and C-reactive protein levels [53]. Micro RNA-132 blocks proliferation of vascular smooth muscle cells by inhibiting Lrrfip1 expression [54]. Additionally, inhibition in cell growth and an increase in apoptosis caused by GCF2/Lrrfip1 knockdown in human HepG2 cells [55]. Ppp2r2c plays various biological roles in dynamics and mobility of cytoskeleton [56], and control of cell growth and division [57]. A recent study revealed that Ppp2r2c is involved in the differentiation of bovine skeletal myoblast [58]. Pdgfd gene is a member of the platelet-derived growth factor family that is important for several types of connective tissue cells in terms of survival, function, and growth. Pdgfd was found to be potentially differentially regulated in obese rats' skeletal muscle by essential polyunsaturated fatty acids [59]. A pair of Uncx and Elfn1 genes are encoding transcription factors that coordinate growth and innervation of somatic muscles in zebrafish [60]. Uchl1 may have a potential role in energy metabolism, as the evidence showed its unique role in muscle differentiation and lipid deposition [61]. Uchl1 was the most downregulated gene involved in oxidative stress during both short-and long-term weight loss [62,63]. Uchl1 was also identified as a differentially expressed gene in the longest dorsal and semi-membranous muscles in two Polish pig breeds differing in fat and meat qualities [64]. Based on these study results, we identified a novel mutation in the promoter region of UCHL1 gene, which can alter transcriptional activity of UCHL1 gene and affect the growth performance of male Yangzhou goslings [65].
In conclusion, the annotation analysis for the 10 genes and gene expression analysis emphasizes that these genes might be related directly or indirectly to growth performance. The mechanism by which the polymorphism in genes affects growth performance in geese has not been established. Whereas the polymorphisms evaluated in this study do not result in amino acid substitutions (all SNPs at down or upstream regions or in intron region of genes), these polymorphisms may be associated with other mutation(s) in other site(s) of the nucleotide sequence or other genes closely linked with these genes. This is the first case of discovery of novel genes that may be associated with growth performance in Yangzhou geese. The 10 SNPs can be recommended as potential genetic markers for growth traits in geese, but further verification studies are still needed to emphasize the current obtained results.

Conclusions
SLAF sequencing with the BSA approach was applied for discovering and genotyping SNPs related to MW of male goslings. In general, 10 SNPs were confirmed to be associated with MW of males and only one SNP was associated with females. Ten genes located in five chromosomes were suggested to be associated with it. Interestingly, five genes are located at the~4 Mb region of KZ155846.1 chromosome, which includes a~56 Kb region containing three genes. We suggest this region with the 250 b region of KZ155908.1 chromosome to be related to the MW of male goslings, as shown by SNP network analysis. Moreover, all 10 SNPs can be potential target markers for marker-assisted selection to help identifying goslings with greater growth performance. Finally, our results confirmed that molecular methods such as SLAF sequencing with BSA can be used for determining molecular mechanisms underlying certain physiological cascade. To the best of our knowledge, the specific function of any of the 10 identified genes has not been clearly defined in geese and further in-depth studies are needed to explore the new functional role in MW of male goslings.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/genes12081203/s1, Table S1: Information about candidate SNPs and their primers for goslings marketing weight, Table S2: Primer pairs of the candidate genes used for quantitative real-time PCR, Figure S1: The allele frequincies of the first population for males of LEBV (first row), males of HEBV (second row), females of LEBV (third row), and females of HEBV groups (forth row), Figure   Data Availability Statement: The raw sequence data have been submitted to the National Center for Biotechnology Information Short Read Archive with project accession of PRJNA718613; LEBV and HEBV SLAF-seq libraries were submitted with the accession numbers SRR14113841 and SRR14113842, respectively.
Acknowledgments: All authors thank Jiangsu Lihua Animal Husbandry Co., Ltd. for providing us with animal samples and data for this study.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

AIC
Akaike's Information Criterion AS-PCR Allele-specific polymerase chain reaction BSA Bulked segregant analysis EBV Estimated breeding value ED Euclidean distance ELFN1 Extracellular leucine-rich repeat and fibronectin type III domain containing 1 gene FDR False discovery rate GATK Genome analysis toolkit GC (%) The percentage of G and C bases in the total bases in the sequencing results

GWASs
Genome-wide association studies HMX1 H6 family homeobox 1 gene LRRFIP1 LRR binding FLII interacting protein 1 gene MW Marketing weight (body weight at nine weeks of age) P1 First population P2 Second population PDGFD Platelet-derived growth factor D gene PPP2R2C Protein phosphatase 2 regulatory subunit B γ gene Q30 (%) The percentage of bases with a sequencing quality value greater than or equal to 30 SAMtools Sequence alignment/map format SLAF-seq Specific locus amplified fragment sequencing SNPs Single nucleotide polymorphisms SOAP Short oligonucleotide alignment program SSCP Single-strand conformation polymorphism ti/tv Transition/transversion ratio UCHL1 Ubiquitin C-terminal hydrolase L1 gene