GWAS Discovery of Candidate Genes for Yield-Related Traits in Peanut and Support from Earlier QTL Mapping Studies

Peanut (Arachis hypogaea L.) is one of the most important oil crops worldwide, and its yet increasing market demand may be met by genetic improvement of yield related traits, which may be facilitated by a good understanding of the underlying genetic base of these traits. Here, we have carried out a genome-wide association study (GWAS) with the aim to identify genomic regions and the candidate genes within these regions that may be involved in determining the phenotypic variation at seven yield-related traits in peanut. For the GWAS analyses, 195 peanut accessions were phenotyped and/or genotyped; the latter was done using a genotyping-by-sequencing approach, which produced a total of 13,435 high-quality single nucleotide polymorphisms (SNPs). Analyses of these SNPs show that the analyzed peanut accessions can be approximately grouped into two big groups that, to some extent, agree with the botanical classification of peanut at the subspecies level. By taking this genetic structure as well as the relationships between the analyzed accessions into consideration, our GWAS analyses have identified 93 non-overlapping peak SNPs that are significantly associated with four of the studied traits. Gene annotation of the genome regions surrounding these peak SNPs have found a total of 311 unique candidate genes. Among the 93 yield-related-trait-associated SNP peaks, 12 are found to be co-localized with the quantitative trait loci (QTLs) that were identified by earlier related QTL mapping studies, and these 12 SNP peaks are only related to three traits and are almost all located on chromosomes Arahy.05 and Arahy.16. Gene annotation of these 12 co-localized SNP peaks have found 36 candidates genes, and a close examination of these candidate genes found one very interesting gene (arahy.RI9HIF), the rice homolog of which produces a protein that has been shown to improve rice yield when over-expressed. Further tests of the arahy.RI9HIF gene, as well as other candidate genes especially those within the more confident co-localized genomic regions, may hold the potential for significantly improving peanut yield.


Introduction
As one important source of edible oil, the cultivated peanut (Arachis hypogaea L.) has been planted in more than 100 countries (FAOSTAT; http://faostat.fao.org) [1], including China, which has become the largest producer and exporter of peanut in the world [2]. Nevertheless, there is still a huge market demand for peanut, which may be resolved with genetic improvement of yield-related traits [3,4]. weight (SW), hundred-pod weight (PW), yield per plant (YP), mature pod number per plant (MPP), pod number per plant (PNP), pod branch number per plant (PBP), and total branch number per plant (TBP). Each yield-related trait for each harvested accession was repeatedly measured three times for each study location and year, and these three measurements were then averaged out, so in total, nine mean values (3 locations × 3 years) for each trait of each accession were acquired. To minimize environmental effects, one BLUP (best linear unbiased prediction) value were estimated for each trait of each accession from the nine mean values obtained above, and it was these BLUP values that would be used later on in the GWAS analyses (cf. [26]). The correlation coefficients of each pair of the analyzed traits were calculated with the R function "cor" (https://cran.r-project.org/bin/windows/base/), and the broad-sense heritability (H 2 ) for each trait was estimated using the R package "lem4" [27].

Genotyping by Sequencing of Peanut Cultivars
Fresh leaves per accession were collected at Shandong Peanut Research Institute, Shandong Academy of Agriculture Science, and it was from these leaves that DNA was extracted using the DNeasy Plant Mini Kit (QIAGEN, Beijing, China). The extracted DNA was firstly electrophoresized and visualized in agarose gel containing Super GelRed (US Everbright Inc., Suzhou, China), and then had their quality and concentration measured by the Nanodrop™ 2000 spectrophotometer (Thermo Scientific, Shanghai, China) and the Qubit ® 2.0 fluorometer (Thermo Scientific, Shanghai, China). Each DNA sample has a concentration of no less than 50 ng/µl and a total DNA of more than 2 µg.
The genotyping by sequencing (GBS) libraries were constructed using a double digest GBS approach that employed a rare-common pair of restriction enzymes [28,29] (EcoR I and Nia III, New England Biolabs, Ipswich, MA, USA) to perform digestion reactions on the above-prepared DNA samples [30]. The digested products first had both of the DNA ends ligated with A1 or A2 adapters separately and were then pooled together to produce the libraries for all the 195 studied peanut accessions. From the pooled libraries, 350 bp DNA fragments were separated effectively on a 1% agarose gel, column-cleaned using PCR purification kit (New England Biolabs), and amplified for 12 cycles using Phusion DNA polymerase (New England Biolabs). The constructed GBS libraries have their concentration adjusted to 10 nmol/L and sequenced on an Illumina HiSeq Xten platform (Illumina, Guangzhou, China).
SNP calling was performed for all samples using the GATK's Unified Genotyper (https://software. broadinstitute.org/gatk) [32]. The identified SNPs were filtered to reduce the false positive errors using GATK Variant Filtration. To improve SNP data quality, the SNP candidates were further filtered based on the following criteria: (i) Quality score >2.0; (ii) coverage depth >3 fold; (iii) missing ratio within each population <20%; (iv) a global minor allele frequency (MAF) >0.05. All high-quality SNPs were annotated using ANNOVAR [33].

Population Genetic Analysis
Based on the SNPs that were identified above, a phylogenetic tree was constructed by a neighbor-joining (NJ) method as implemented in PHYLIP v3.69 [34], with the bootstrap values calculated with 10,000 replicates. The population structure of the analyzed samples was first preliminarily inferred with the principal component analysis (PCA) incorporated in the software package GCTA (genome-wide complex trait analysis) [35] and was then further analyzed using Admixture v1.3.0 [36]. For the Admixture analysis, 10 potential numbers (1 to 10) of ancestral populations (K) were tested using a cross-validation procedure, and the one with the lowest cross-validation error was chosen as the best K value (http://software.genetics.ucla.edu/admixture/admixture-manual.pdf) [36]. The matrix of pairwise kinship coefficients among the studied accessions was calculated using the software SPAGeDi v1.5 [37].

Genome-Wide Association Study Analysis
Genome-wide association study (GWAS) analyses of the seven aforementioned yield-related traits based on the acquired high-quality SNPs were conducted using TASSEL v5.0 [38,39]. Four different models were tried for each trait: A general linear model that focuses only on the SNP effect (here referred to as the GLM model), a second general linear model that also takes population stratification (represented by the Q-matrix of ancestry coefficients) into consideration (the Q model), a mixed linear model that considers degree of genetic covariance among the studied individuals (estimated as the K-matrix of relative kinship coefficients) instead of population stratification (the K model), and another mixed linear model that includes both the population and family structure effects (here referred to as the MLM model). The Q-matrix was estimated using Admixture v1.3.0 with the most likely numbers (i.e., K = 4) of ancestral populations, while the K-matrix was acquired from SPAGeDi v1.5, and both matrices are based on all the high-quality SNPs identified in the present study. For each trait, the best model out of the tested four based on Q-Q plot was accepted as the final model. We used Bonferroni corrected p-value to take the multiple testing problems into consideration, and a p-value of 0.05/(the total number of SNP markers) (i.e., 0.05/13435 = 3.72 × 10 −6 ) or less was required to establish the significance. A candidate genome region that may be responsible for the studied yield-related traits was defined as the most strongly associated SNP (thereafter referred to as peak SNP) plus a 200kb-long genomic region that centered on the peak SNP [10,40].

Candidate Gene Identification
The candidate genome regions identified above were compared with the QTLs from previous relevant QTL/GWAS mapping studies [6][7][8][9][12][13][14], and the candidate genomic regions that co-localized with QTLs from those earlier studies received close examination. The genes within the candidate genomic regions were defined as candidate associated genes (CAGs). Both gene ontology (GO) enrichment analysis and KEGG (Kyoto Encyclopedia of Genes and Genomes pathway database) pathway enrichment analysis were carried out on the CAGs using the Omicshare web server (www. omicshare.com/tools), and provided information about which biological processes/KEGG pathways these genes were enriched in. To accommodate the multiple testing problem in the enrichment analyses, FDR adjusted p-values were calculated with a value of 0.05 being the significance threshold.

Characterisation and Distribution of SNPs in The Peanut Genome
To reveal the genetic base of seven yield-related traits, a total of 195 peanut accessions were genotyped using the genotyping-by-sequencing (GBS) approach. Sequencing of the GBS libraries produced approximately 1695 million clean reads, which were of high quality (93.1% reads with a Phred score >30) and had a GC content of 37.6-40.2%. About 98.88% of those clean reads were successfully mapped to the peanut reference genome. These clean reads were deposited in the sequence read archive database under SRA accession: PRJNA525244.
Following a stringent SNP calling protocol, a total of 13,435 SNPs were identified (Table S3). The peanut genome had an average SNP density of 5.93 SNPs/Mb with the densities at chromosomes Arahy.08 (3.70 SNPs/Mb) and Arahy.19 (7.45 SNPs/Mb) being, respectively, the lowest and the highest ( Figure 1; Table S4). Most of the identified SNPs were found at intergenic regions (89.9%), while the exonic, intronic, up-, and down-stream regions only accounted for, respectively, 3.1%, 2.6%, and 3.6% of the total SNPs (Table S5). Of those SNPs within exonic regions, 35.06% were nonsynonymous while 62.07% were synonymous. The genome-wide transition/transversion (Ts/Tv) ratio for the analyzed peanut genome data was 1.94 ( Figure S2). while the exonic, intronic, up-, and down-stream regions only accounted for, respectively, 3.1%, 2.6%, and 3.6% of the total SNPs (Table S5). Of those SNPs within exonic regions, 35.06% were nonsynonymous while 62.07% were synonymous. The genome-wide transition/transversion (Ts/Tv) ratio for the analyzed peanut genome data was 1.94 ( Figure S2).

Genetic Diversity, Population Structure, and Genetic Diversity
Overall, the levels of genome-wide nucleotide diversity (π) of the irregular accession group (π = 0.00042) being higher than those of the four studied peanut botanical varieties (π = 0.00006-0.00025). To be specific for the botanical varieties, the highest level of nucleotide diversity (π = 0.00025) was found in A. hypogaea var. hirsuta while the lowest (π = 0.00006) in var. hypogaea, with those of var. fastigiata (π = 0.00011) and var. vulgaris (π = 0.00018) being in the middle.
A neighbor-joining (NJ) tree inferred from the acquired SNPs showed that the analyzed peanut accessions could be approximately classified into two major groups, with the first group being dominated by samples from A. hypogaea ssp. hypogaea (var. hypogaea and var. hirsuta), while the second group mostly comprised accessions from A. hypogaea ssp. fastigiata (var. vulgaris and var. fastigiata), but also with a considerable proportion of members coming from ssp. hypogaea (mostly var. hypogaea) (Figure 2e,d).
Results from the principal component analysis (PCA) were in accordance with those of the phylogenetic analysis (Figure 2c

Genetic Diversity, Population Structure, and Genetic Diversity
Overall, the levels of genome-wide nucleotide diversity (π) of the irregular accession group (π = 0.00042) being higher than those of the four studied peanut botanical varieties (π = 0.00006-0.00025). To be specific for the botanical varieties, the highest level of nucleotide diversity (π = 0.00025) was found in A. hypogaea var. hirsuta while the lowest (π = 0.00006) in var. hypogaea, with those of var. fastigiata (π = 0.00011) and var. vulgaris (π = 0.00018) being in the middle.
A neighbor-joining (NJ) tree inferred from the acquired SNPs showed that the analyzed peanut accessions could be approximately classified into two major groups, with the first group being dominated by samples from A. hypogaea ssp. hypogaea (var. hypogaea and var. hirsuta), while the second group mostly comprised accessions from A. hypogaea ssp. fastigiata (var. vulgaris and var. fastigiata), but also with a considerable proportion of members coming from ssp. hypogaea (mostly var. hypogaea) (Figure 2e,d).
Results from the principal component analysis (PCA) were in accordance with those of the phylogenetic analysis (Figure 2c,f). The population structure of the studied peanut samples was further investigated by the Admixture software. Out of the 10 tested potential numbers (K: 1-10) of ancestral populations, K = 4 represented the most sensible choice according to cross-validation error value: It has the lowest value (Figure 2b), however its difference with K = 3 was small (Figure 2b). For K = 4, the first three ancestral populations (green, blue, and red, Figure 2a) were dominated by individuals from A. hypogaea var. hirsuta and var. hypogaea, while the fourth ancestral population (purple) prevailed the genetic makeups of var. vulgaris, var. fastigiata and ca. 40% of var. hypogaea. For K = 3, again var. hirsuta and a big part of var. hypogaea dominated the first two ancestral populations (green and red) while the rest of the peanut accessions constituted the majority of the third population (blue). Overall, the results of both K = 3 and 4 agreed well with that of K = 2; for the latter, the first population was mostly composed of individuals from var. hirsuta and a big part of var. hypogaea, whereas the second ancestral population had var. vulgaris, var. fastigiata and ca. 40% of the var. hypogaea individuals as its main members.
Genes 2019, 10, x FOR PEER REVIEW 6 of 17 individuals from A. hypogaea var. hirsuta and var. hypogaea, while the fourth ancestral population (purple) prevailed the genetic makeups of var. vulgaris, var. fastigiata and ca. 40% of var. hypogaea. For K = 3, again var. hirsuta and a big part of var. hypogaea dominated the first two ancestral populations (green and red) while the rest of the peanut accessions constituted the majority of the third population (blue). Overall, the results of both K = 3 and 4 agreed well with that of K = 2; for the latter, the first population was mostly composed of individuals from var. hirsuta and a big part of var. hypogaea, whereas the second ancestral population had var. vulgaris, var. fastigiata and ca. 40% of the var. hypogaea individuals as its main members.

Phenotypic Correlation and Heritability for Different Traits
One hundred and sixty-five out of the 195 studied peanut accessions have been phenotyped for three years at three different locations, and based on these phenotype data, the correlation and the heritability of seven agronomic traits (hundred-seed weight (SW), hundred-pod weight (PW), yield per plant (YP), mature pod number per plant (MPP), pod number per plant (PNP), pod branch number per plant (PBP), and total branch number per plant (TBP)) were estimated. All seven traits were found to follow the normal distribution without any significant skewness and kurtosis ( Figure  3). The correlation between the seven traits were all positive, with the correlation coefficient (r) estimates being, or less than, 0.45, except those for the trait pairs YP-PBP/MPP/SW/PW (0.60-0.66), as well as SW-PW (0.91) and MPP-PBP (0.93) ( Figure S3). The broad-sense heritability (H 2 ) for SW (H 2 = 0.72) and PW (0.63) were the highest among the seven agronomic traits, while those for PBP (0.04), PNP (0.14), and MPP (0.19) were the lowest, and those for TBP (0.54) and YP (0.39) were in the middle (Table 1).

Phenotypic Correlation and Heritability for Different Traits
One hundred and sixty-five out of the 195 studied peanut accessions have been phenotyped for three years at three different locations, and based on these phenotype data, the correlation and the heritability of seven agronomic traits (hundred-seed weight (SW), hundred-pod weight (PW), yield per plant (YP), mature pod number per plant (MPP), pod number per plant (PNP), pod branch number per plant (PBP), and total branch number per plant (TBP)) were estimated. All seven traits were found to follow the normal distribution without any significant skewness and kurtosis ( Figure 3). The correlation between the seven traits were all positive, with the correlation coefficient (r) estimates being, or less than, 0.45, except those for the trait pairs YP-PBP/MPP/SW/PW (0.60-0.66), as well as SW-PW (0.91) and MPP-PBP (0.93) ( Figure S3). The broad-sense heritability (H 2 ) for SW (H 2 = 0.72) and PW (0.63) were the highest among the seven agronomic traits, while those for PBP (0.04), PNP (0.14), and MPP (0.19) were the lowest, and those for TBP (0.54) and YP (0.39) were in the middle (Table 1). Genes 2019, 10, x FOR PEER REVIEW 7 of 17

Genome-Wide Association Studies in Peanut
Genome-wide association analyses of the phenotypically characterized peanut accessions were conducted to see if any of the acquired SNPs had variants that were associated with the considered yield-related traits. Significantly associated SNPs were found for four of the studied agronomic traits (SW, PW, YP, and PBP), with the MLM and GLM models being the best statistical models, respectively, for traits SW, PW, PBP, and for trait YP based on the Q-Q plots ( Figure 4). It should be noted that there is considerable residual inflation in the Q-Q plots for the best models of traits PW and SW, therefore the associated SNPs identified for these two traits should be interpreted with caution.

Genome-Wide Association Studies in Peanut
Genome-wide association analyses of the phenotypically characterized peanut accessions were conducted to see if any of the acquired SNPs had variants that were associated with the considered yield-related traits. Significantly associated SNPs were found for four of the studied agronomic traits (SW, PW, YP, and PBP), with the MLM and GLM models being the best statistical models, respectively, for traits SW, PW, PBP, and for trait YP based on the Q-Q plots ( Figure 4). It should be noted that there is considerable residual inflation in the Q-Q plots for the best models of traits PW and SW, therefore the associated SNPs identified for these two traits should be interpreted with caution.
To sum it up, there were 93 non-overlapping peak SNPs that had been identified to be associated with the studied yield-related traits, and annotation of the 200kb genome regions that centered on these peak SNPs found a total of 311 unique preliminary candidate associated genes (CAGs) ( Table S7).

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) Pathway Enrichment Analyses
Gene annotation of the 200kb genomic regions that centered on the peak SNPs, which had been identified to be associated with the studied yield-related traits, found 107 candidate associated genes (CAGs) for trait PBP, 70 CAGs for PW, 132 CAGs for SW, and 88 CAGs for YP (Table S7). GO enrichment analyses of these candidate genes revealed that there were in total 24 GO processes that were significantly enriched among the PBP-associated CAGs, with hydrogen ion transmembrane transport (GO: 1902600) and glucose metabolic process (GO: 0006006) being the two most significant ones (Table S7). There were 33 GO processes significantly enriched among PW-associated CAGs, with chlorophyll biosynthetic process being the most significant one. For SW, there were only three significantly enriched GO processes: Non-recombinational repair (GO: 0000726), DNA repair (GO: 0006281), and cellular response to DNA damage stimulus (GO: 0006974). No GO process is significant for YP.
Regarding the KEGG pathway enrichment analyses, the two significant pathways enriched in the CAGs for PBP were autophagy (ko04136, Cellular processes) and oxidative phosphorylation (ko00190, Metabolism). There was only one significantly enriched pathway (ko03020, RNA polymerase) for PW (genetic information processing). For SW, four pathways were significantly enriched among the associated CAGs: Non-homologous end-joining (ko03450) and proteasome (ko03050) (both were Genetic information processing), as well as tyrosine metabolism (ko00350) and nitrogen metabolism (ko00910) (both belonging to Metabolism). No pathway was significant for YP (Table S7).

Literature Survey of Previous Identified QTLs
The previous QTL mapping/GWAS studies (using different sets of peanut accessions comparing to the present study) that identified significant QTL regions for traits PBP, SW, PW, and YP have been summarized (Table 2 and Table S8), and the QTL physical intervals of these identified QTLs for each trait were then determined based on the available left/right marker primer sequences (Table S8). From these earlier QTL/GWAS studies, 2, 46, 29, and 35 matched genomic regions were, respectively, found to be associated with traits PBP, PW, SW, and YP (Table S8).
Gene annotation of the 12 peak SNPs that were co-localized with earlier identified QTLs found a total of 36 CAGs. KEGG pathway enrichment analyses showed that these 36 CAGs were enriched in 19 different KEGG pathways (Table S9), however none were significant (corrected p value < 0.05). Ten of the enriched pathways were involved in metabolic pathways (e.g., nitrogen metabolism, fatty acid biosynthesis, and fatty acid degradation) (Table S9). GO enrichment analysis of these 36 CAGs found no significantly enriched GO process, but one of the top 10 processes is the nucleotide-sugar metabolic process (GO: 0009225). However, one should interpret this result with caution, because the number of genes that were analyzed is very limited (only 36), and only six of these genes were, respectively, recognized/considered by the GO and KEGG analyses.

Figure 5.
Quantitative trait loci (QTLs) identified to be associated with yield-related traits by both the current GWAS and earlier QTL mapping studies. The orange, green, blue, and magenta colors, respectively, for the yield-related traits PBP, PW, SW, and YP. The colorful lines represent the QTLs identified by the present study, the solid squares point to the QTLs that are co-localized with earlier identified QTLs. Most of these co-localized QTLs are located on chromosomes Arahy.05 and Arahy. 16. Gene annotation of the 12 peak SNPs that were co-localized with earlier identified QTLs found a total of 36 CAGs. KEGG pathway enrichment analyses showed that these 36 CAGs were enriched in 19 different KEGG pathways (Table S9), however none were significant (corrected p value < 0.05). Ten of the enriched pathways were involved in metabolic pathways (e.g., nitrogen metabolism, fatty acid biosynthesis, and fatty acid degradation) (Table S9). GO enrichment analysis of these 36 CAGs found no significantly enriched GO process, but one of the top 10 processes is the nucleotide-sugar metabolic process (GO: 0009225). However, one should interpret this result with caution, because the number of genes that were analyzed is very limited (only 36), and only six of these genes were, respectively, recognized/considered by the GO and KEGG analyses.     [23] a The physical intervals of the co-localized QTLs have been determined based on the available left/right marker primer sequences (Table S8).

Discussion
The cultivated peanut is an important oilseed crop, which is widely cultivated across tropical, subtropical, and warm temperate area [41,42]. It originated in South America, from where it spread around the world and at the same time evolved phenotypically and genotypically, which allows it to adapt to various agro-ecological environments [42,43]. In the present study, we have analyzed 195 peanut accessions, which represent the majority of the Chinese peanut landraces and encompass rich genetic variations [22]. China is the largest producer and exporter of peanut in the world [2,42]. Based on the acquired genotype and phenotype data, we carried out genome-wide association analyses with the aim to discover the genetic basis for several yield-related traits [17,44].

Genome-Wide SNP Discovery
The genotyping in this study was performed with the genotyping-by-sequencing approach that uses genome sequencing (with reduced genome representation, but on multiplexed samples) to complete the genome-wide molecular marker (SNP) discovery and genotyping at the same time [29]. This high-throughput genotyping approach is considered to be efficient, reliable, and cheap [15,45,46]. The (tetraploid) cultivated peanut comprises AA and BB sub-genomes, which are closely related to each other, and transcriptome assembly of cultivated peanut has been shown to be challenging due to the difficulty in separating the A and B sub-genome gene sequences [41]. To avoid mixing the homologous regions between these two sub-genomes, a very stringent SNP-calling procedure has been adopted, especially when mapping the high-quality reads onto a reference genome using BWA, only one mismatch (-r 1) for each read is allowed; this strategy may filter out a large number of SNPs, but make the identified SNPs and their genome locations highly confident. In total, 13,435 SNPs are identified from the 195 peanut accessions and consistent with Zhang et al. [10], these SNPs are not evenly distributed on the 20 chromosomes of the domesticated peanut, with the B sub-genome (Arahy.11-Arahy.20) containing more SNPs (8134) than the A sub-genome (Arahy.01-Arahy.10) (5359) (Figure 1 and Table S4). The genome-wide average SNP density is 5.93 SNPs/Mb, which is comparable with a relevant earlier result (seven SNPs/Mb in [10]) and acceptable for GWAS analyses in peanut (cf. [10]). In addition, our results show that there is a higher frequency of transitions over transversions within peanut genomes (ratio = 1.94), which is consistent with earlier observations from other species, e.g., rice [47], maize [48], potato [49], pepper [50], and soya bean [18]. This "transition bias" is due to the conformational similarity between purines (A, G) (or between pyrimidines (T, C)), mis-pairing caused by transitional mutations; (C ER REVIEW 12 of 17 e-wide molecular marker (SNP) discovery and genotyping at the same time [29]. ut genotyping approach is considered to be efficient, reliable, and cheap ploid) cultivated peanut comprises AA and BB sub-genomes, which are closely er, and transcriptome assembly of cultivated peanut has been shown to be he difficulty in separating the A and B sub-genome gene sequences [41]. To avoid gous regions between these two sub-genomes, a very stringent SNP-calling adopted, especially when mapping the high-quality reads onto a reference , only one mismatch (-r 1) for each read is allowed; this strategy may filter out a Ps, but make the identified SNPs and their genome locations highly confident. In e identified from the 195 peanut accessions and consistent with Zhang et al. [10], venly distributed on the 20 chromosomes of the domesticated peanut, with the y.11-Arahy.20) containing more SNPs (8134) than the A sub-genome (Arahy.01gure 1 and Table S4). The genome-wide average SNP density is 5.93 SNPs/Mb, with a relevant earlier result (seven SNPs/Mb in [10]) and acceptable for GWAS (cf. [10]). In addition, our results show that there is a higher frequency of sversions within peanut genomes (ratio = 1.94), which is consistent with earlier ther species, e.g., rice [47], maize [48], potato [49], pepper [50], and soya bean [18]. s" is due to the conformational similarity between purines (A, G) (or between ), mis-pairing caused by transitional mutations; (C↔T, A↔G) is therefore ore favorable than that generated by transversions (A↔C, C↔G, A↔T, G↔T) nsitions give rise to less amino acid replacements and tend to conserve the f an amino acid if it does bring on amino acid replacements. ture ted peanut has been grouped into two subspecies (ssp. hypogaea and ssp. be further sorted out into six botanical varieties [44]. The 195 studied peanut udy are mostly samples from four botanical varieties (ssp. hypogaea: var. hirsuta p. fastigiata: var. vulgaris and var. fastigiata) and almost equally represent the two ). Our PCA, phylogenetic, and Admixure analyses that are based on genomic genome-wide molecular marker (SNP) discovery and genotyping at the same time [29]. roughput genotyping approach is considered to be efficient, reliable, and cheap e (tetraploid) cultivated peanut comprises AA and BB sub-genomes, which are closely ch other, and transcriptome assembly of cultivated peanut has been shown to be ue to the difficulty in separating the A and B sub-genome gene sequences [41]. To avoid omologous regions between these two sub-genomes, a very stringent SNP-calling as been adopted, especially when mapping the high-quality reads onto a reference g BWA, only one mismatch (-r 1) for each read is allowed; this strategy may filter out a r of SNPs, but make the identified SNPs and their genome locations highly confident. In NPs are identified from the 195 peanut accessions and consistent with Zhang et al. [10], re not evenly distributed on the 20 chromosomes of the domesticated peanut, with the e (Arahy.11-Arahy.20) containing more SNPs (8134) than the A sub-genome (Arahy.01-59) (Figure 1 and Table S4). The genome-wide average SNP density is 5.93 SNPs/Mb, parable with a relevant earlier result (seven SNPs/Mb in [10]) and acceptable for GWAS peanut (cf. [10]). In addition, our results show that there is a higher frequency of er transversions within peanut genomes (ratio = 1.94), which is consistent with earlier from other species, e.g., rice [47], maize [48], potato [49], pepper [50], and soya bean [18]. ion bias" is due to the conformational similarity between purines (A, G) (or between (T, C)), mis-pairing caused by transitional mutations; (C↔T, A↔G) is therefore ally more favorable than that generated by transversions (A↔C, C↔G, A↔T, G↔T) er, transitions give rise to less amino acid replacements and tend to conserve the perty of an amino acid if it does bring on amino acid replacements.
n Structure esticated peanut has been grouped into two subspecies (ssp. hypogaea and ssp. ich can be further sorted out into six botanical varieties [44]. The 195 studied peanut this study are mostly samples from four botanical varieties (ssp. hypogaea: var. hirsuta gaea; ssp. fastigiata: var. vulgaris and var. fastigiata) and almost equally represent the two igure 2). Our PCA, phylogenetic, and Admixure analyses that are based on genomic G) is therefore conformationally more favorable than that generated by transversions (A REVIEW 12 of 17 wide molecular marker (SNP) discovery and genotyping at the same time [29]. t genotyping approach is considered to be efficient, reliable, and cheap loid) cultivated peanut comprises AA and BB sub-genomes, which are closely , and transcriptome assembly of cultivated peanut has been shown to be difficulty in separating the A and B sub-genome gene sequences [41]. To avoid us regions between these two sub-genomes, a very stringent SNP-calling dopted, especially when mapping the high-quality reads onto a reference nly one mismatch (-r 1) for each read is allowed; this strategy may filter out a , but make the identified SNPs and their genome locations highly confident. In identified from the 195 peanut accessions and consistent with Zhang et al. [10], enly distributed on the 20 chromosomes of the domesticated peanut, with the .11-Arahy.20) containing more SNPs (8134) than the A sub-genome (Arahy.01-re 1 and Table S4). The genome-wide average SNP density is 5.93 SNPs/Mb, ith a relevant earlier result (seven SNPs/Mb in [10]) and acceptable for GWAS f. [10]). In addition, our results show that there is a higher frequency of ersions within peanut genomes (ratio = 1.94), which is consistent with earlier er species, e.g., rice [47], maize [48], potato [49], pepper [50], and soya bean [18].
is due to the conformational similarity between purines (A, G) (or between mis-pairing caused by transitional mutations; (C↔T, A↔G) is therefore e favorable than that generated by transversions (A↔C, C↔G, A↔T, G↔T) itions give rise to less amino acid replacements and tend to conserve the n amino acid if it does bring on amino acid replacements.
re d peanut has been grouped into two subspecies (ssp. hypogaea and ssp. be further sorted out into six botanical varieties [44]. The 195 studied peanut y are mostly samples from four botanical varieties (ssp. hypogaea: var. hirsuta fastigiata: var. vulgaris and var. fastigiata) and almost equally represent the two C, C OR PEER REVIEW 12 of 17 enome-wide molecular marker (SNP) discovery and genotyping at the same time [29]. ughput genotyping approach is considered to be efficient, reliable, and cheap (tetraploid) cultivated peanut comprises AA and BB sub-genomes, which are closely h other, and transcriptome assembly of cultivated peanut has been shown to be e to the difficulty in separating the A and B sub-genome gene sequences [41]. To avoid mologous regions between these two sub-genomes, a very stringent SNP-calling been adopted, especially when mapping the high-quality reads onto a reference BWA, only one mismatch (-r 1) for each read is allowed; this strategy may filter out a f SNPs, but make the identified SNPs and their genome locations highly confident. In Ps are identified from the 195 peanut accessions and consistent with Zhang et al. [10], not evenly distributed on the 20 chromosomes of the domesticated peanut, with the (Arahy.11-Arahy.20) containing more SNPs (8134) than the A sub-genome (Arahy.01-9) (Figure 1 and Table S4). The genome-wide average SNP density is 5.93 SNPs/Mb, arable with a relevant earlier result (seven SNPs/Mb in [10]) and acceptable for GWAS anut (cf. [10]). In addition, our results show that there is a higher frequency of r transversions within peanut genomes (ratio = 1.94), which is consistent with earlier om other species, e.g., rice [47], maize [48], potato [49], pepper [50], and soya bean [18]. n bias" is due to the conformational similarity between purines (A, G) (or between , C)), mis-pairing caused by transitional mutations; (C↔T, A↔G) is therefore lly more favorable than that generated by transversions (A↔C, C↔G, A↔T, G↔T) r, transitions give rise to less amino acid replacements and tend to conserve the rty of an amino acid if it does bring on amino acid replacements.
Structure esticated peanut has been grouped into two subspecies (ssp. hypogaea and ssp. h can be further sorted out into six botanical varieties [44]. te the genome-wide molecular marker (SNP) discovery and genotyping at the same time [29]. igh-throughput genotyping approach is considered to be efficient, reliable, and cheap 6]. The (tetraploid) cultivated peanut comprises AA and BB sub-genomes, which are closely to each other, and transcriptome assembly of cultivated peanut has been shown to be ging due to the difficulty in separating the A and B sub-genome gene sequences [41]. To avoid the homologous regions between these two sub-genomes, a very stringent SNP-calling ure has been adopted, especially when mapping the high-quality reads onto a reference e using BWA, only one mismatch (-r 1) for each read is allowed; this strategy may filter out a umber of SNPs, but make the identified SNPs and their genome locations highly confident. In ,435 SNPs are identified from the 195 peanut accessions and consistent with Zhang et al. [10], NPs are not evenly distributed on the 20 chromosomes of the domesticated peanut, with the enome (Arahy.11-Arahy.20) containing more SNPs (8134) than the A sub-genome (Arahy.01-10) (5359) (Figure 1 and Table S4). The genome-wide average SNP density is 5.93 SNPs/Mb, s comparable with a relevant earlier result (seven SNPs/Mb in [10]) and acceptable for GWAS s in peanut (cf. [10]). In addition, our results show that there is a higher frequency of ons over transversions within peanut genomes (ratio = 1.94), which is consistent with earlier tions from other species, e.g., rice [47], maize [48], potato [49], pepper [50], and soya bean [18]. ransition bias" is due to the conformational similarity between purines (A, G) (or between ines (T, C)), mis-pairing caused by transitional mutations; (C↔T, A↔G) is therefore ationally more favorable than that generated by transversions (A↔C, C↔G, A↔T, G↔T) oreover, transitions give rise to less amino acid replacements and tend to conserve the al property of an amino acid if it does bring on amino acid replacements.
ulation Structure e domesticated peanut has been grouped into two subspecies (ssp. hypogaea and ssp. a), which can be further sorted out into six botanical varieties [44]. complete the genome-wide molecular marker (SNP) discovery and genotyping at the same time [29]. This high-throughput genotyping approach is considered to be efficient, reliable, and cheap [15,45,46]. The (tetraploid) cultivated peanut comprises AA and BB sub-genomes, which are closely related to each other, and transcriptome assembly of cultivated peanut has been shown to be challenging due to the difficulty in separating the A and B sub-genome gene sequences [41]. To avoid mixing the homologous regions between these two sub-genomes, a very stringent SNP-calling procedure has been adopted, especially when mapping the high-quality reads onto a reference genome using BWA, only one mismatch (-r 1) for each read is allowed; this strategy may filter out a large number of SNPs, but make the identified SNPs and their genome locations highly confident. In total, 13,435 SNPs are identified from the 195 peanut accessions and consistent with Zhang et al. [10], these SNPs are not evenly distributed on the 20 chromosomes of the domesticated peanut, with the B sub-genome (Arahy.11-Arahy.20) containing more SNPs (8134) than the A sub-genome (Arahy.01-Arahy.10) (5359) (Figure 1 and Table S4). The genome-wide average SNP density is 5.93 SNPs/Mb, which is comparable with a relevant earlier result (seven SNPs/Mb in [10]) and acceptable for GWAS analyses in peanut (cf. [10]). In addition, our results show that there is a higher frequency of transitions over transversions within peanut genomes (ratio = 1.94), which is consistent with earlier observations from other species, e.g., rice [47], maize [48], potato [49], pepper [50], and soya bean [18]. This "transition bias" is due to the conformational similarity between purines (A, G) (or between pyrimidines (T, C)), mis-pairing caused by transitional mutations; (C↔T, A↔G) is therefore conformationally more favorable than that generated by transversions (A↔C, C↔G, A↔T, G↔T) [51]. Moreover, transitions give rise to less amino acid replacements and tend to conserve the chemical property of an amino acid if it does bring on amino acid replacements.

Population Structure
The domesticated peanut has been grouped into two subspecies (ssp. hypogaea and ssp. fastigiata), which can be further sorted out into six botanical varieties [44]. The 195 studied peanut accessions in this study are mostly samples from four botanical varieties (ssp. hypogaea: var. hirsuta and var. hypogaea; ssp. fastigiata: var. vulgaris and var. fastigiata) and almost equally represent the two T) [51]. Moreover, transitions give rise to less amino acid replacements and tend to conserve the chemical property of an amino acid if it does bring on amino acid replacements.

Population Structure
The domesticated peanut has been grouped into two subspecies (ssp. hypogaea and ssp. fastigiata), which can be further sorted out into six botanical varieties [44]. The 195 studied peanut accessions in this study are mostly samples from four botanical varieties (ssp. hypogaea: var. hirsuta and var. hypogaea; ssp. fastigiata: var. vulgaris and var. fastigiata) and almost equally represent the two subspecies ( Figure 2). Our PCA, phylogenetic, and Admixure analyses that are based on genomic data also reveal two major groups within the domesticated peanut, and overall, these two groups agree with the taxonomic grouping, except some (ca. 40%) var. hypogaea individuals (Figure 2). One major group corresponds to ssp. hypogaea and comprises mostly individuals from var. hirsuta and var. hypogaea (ca. 60% of the individuals from this species), while the other major group more or less matches up with ssp. fastigiata and is composed of individuals from var. vulgaris and var. fastigiata, as well as ca. 40% of the var. hypogaea individuals. Perhaps the unexpected behaviour of the ca. 40% of the var. hypogaea individuals that are genetically more closely related to ssp. fastigiata instead of ssp. hypogaea (Figure 2) is not surprising, because similar ambiguous genetic boundary between the two domesticated peanut subspecies have been reported in other studies [10,15] as well, even though different sets of peanut accessions that represent the two subspecies differently were used in these studies. These unexpected behaviors may suggest genetic introgression between ssp. fastigiata and ssp. hypogaea during artificial selection [52,53]. In addition, all the analyzed 195 peanut accessions are recently diverged, and it has also been shown that more than 40% of Chinese peanut cultivars originated from a limited number of elite germplasms (e.g., 'Fuhuasheng', 'Xuzhou68-4', 'Shitouqi', or 'Yueyou551') [2,54,55], therefore it may be hard to avoid that the shared ancestral polymorphism complicates the genetic delimitation among subspecies/cultivars.

Yield-Related Candidate Genes
Our GWAS analyses have identified a total of 93 non-overlapping SNP peaks that are associated with four yield-related traits, and seven peak SNPs were identified to be associated simultaneously with three traits (YP, PW, SW), which is not surprising considering that traits YP, SW, and PW are highly correlated with each other (r > 0.6, Figure S3). Genes responsible for the yield-related traits may include or be linked to the identified SNP peaks. In order to look for candidate genes, the 200 kb genomic regions that centered on, and are also very likely to be linked to [10], these SNP peaks were annotated; a total of 311 unique candidate associated genes (CAGs) were identified. However, it should be noted that because there is considerable residual inflation in the Q-Q plots for the best models of traits PW and SW (Figure 4), which may be due to unknown factors that were not considered in the models, the CAGs identified for these two traits should be interpreted with caution. Among the 311 unique identified CAGs, 36 are from genomic regions surrounding the 12 SNP peaks (co-localized genomic regions, on chromosomes Arahy.04, Arahy.05, and Arahy. 16) that have been identified to be associated with yield-related traits in peanut by not only the present study but also earlier QTL mapping studies (Table 2 and Table S9). It should be mentioned that five (arahy.7B5I5W, arahy.KBV6L0, arahy.Y4GE00, arahy.A3WBL0, and arahy.H97NC3) of these 36 CAGs surround two SNP peaks (rs9379 and rs9380) that are associated with trait SW, and none of the 36 CAGs are for trait PW.
A close examination of these 36 CAGs around the 12 relatively confident QTLs for yield found that the arahy.RI9HIF gene (predicted to produce high affinity nitrate transporter 2.4 in A. hypogaea) (within the 200 kb genomic region centering on the SNP peak, rs9411, on chromosome Arahy.16) may deserve more attention. This is because one rice homolog (OsNRT2.3) of this gene produces a protein (OsNRT2.3b) that has been shown to improve yield in rice when over expressed [56,57]. Whether variation at arahy.RI9HIF or at some other CAGs especially in the co-localized genomic regions significantly influences peanut yield needs further investigation.

Conclusions
By carrying out GWAS analyses, we have identified 93 non-overlapping SNP peaks that are significantly associated with four yield-related traits in peanut, and gene annotation of the genomic regions surrounding these SNP peaks identified 311 unique candidate genes. A comparison with earlier related QTL mapping/GWAS studies show that 12 of the 93 yield-related-trait-associated SNP peaks are co-localized with earlier identified QTLs, and 36 candidate genes have been identified from these 12 co-localized genomic regions.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/10/10/803/s1, Figure S1: Geographic distribution of the analyzed peanut accessions. Each accession is displayed as a dot. Different colors stand for the different botanical varieties. (NOTE: The map used here represents only part of China), Figure S2: A summary of the transitions and transversions that occurred within the acquired SNPs (Ts/Tv = 1.94), Figure S3: Correlation coefficients between the studied phenotypic traits, Table S1: Basic information on the 195 studied peanut accessions, Table S2: A summary of the predicted genome-wide restriction reaction result using the enzyme pair EcoR I and Nia III, Table S3: Match statistics for the BWA mapping of the acquired high-quality reads to a peanut reference genome and the number of acquired SNPs after different filtering steps, Table S4: The distribution and frequency of the identified SNPs on the 20 peanut chromosomes, Table S5: A summary of the distributions of the acquired SNPs in different genic and intergenic regions, Table S6: The peak SNPs that are associated with the yield-related traits PBP, PW, SW, and YP under the respective best model for each traits, Table S7: Information on the annotated genes from the 200kb genome regions centered on the peak SNPs that are, respectively, associated with traits PBP, PW, SW, and YP, Table S8: Information on 199 earlier identified related QTLs and their physical locations. These QTLs were found to be associated with the yield-related traits PW, SW, YP, or PBP, Table S9: Information on 36 candidate genes that are identified, by the present GWAS study, to be associated with yield traits SW, YP, and PW and are at the same time co-localized with earlier identified QTLs.