Fine-Mapping of a Wild Genomic Region Involved in Pod and Seed Size Reduction on Chromosome A07 in Peanut (Arachis hypogaea L.)

Fruit and seed size are important yield component traits that were selected during crop domestication. In previous studies, Advanced Backcross Quantitative Trait Loci (AB-QTL) and Chromosome Segment Substitution Line (CSSL) populations were developed in peanut by crossing the cultivated variety Fleur11 with a synthetic wild allotetraploid (Arachis ipaensis × Arachis duranensis)4x. In the AB-QTL population, a major QTL for pod and seed size was detected in a ~5 Mb interval in the proximal region of chromosome A07. In the CSSL population, the line 12CS_091, which carries the QTL region and produces smaller pods and seeds than Fleur11, was identified. In this study, we used a two-step strategy to fine-map the seed size QTL region on chromosome A07. We developed new SSR and SNP markers, as well as near-isogenic lines (NILs), in the target QTL region. We first located the QTL in a ~1 Mb region between two SSR markers by genotyping a large F2 population of 2172 individuals and applying a single-marker analysis. We then used nine new SNP markers evenly distributed in the refined QTL region to genotype 490 F3 plants derived from 88 F2 plants, and we selected 10 NILs. Phenotyping of the NILs and marker/trait association allowed us to narrow down the QTL region to a 168.37 kb chromosome segment between the SNPs Aradu_A07_1148327 and Aradu_A07_1316694. This region contains 22 predicted genes. Among these genes, Aradu.DN3DB and Aradu.RLZ61, which encode a transcriptional regulator STERILE APETALA-like (SAP) and an F-box protein SNEEZY (SNE), respectively, were of particular interest. The possible role of these genes in regulating fruit and seed size variation is discussed. This study will contribute to a better understanding of the genes that were targeted during peanut domestication.


Introduction
The domestication of today's food crops began approximately 10,000 years ago, with the advent of agriculture. Human actions on wild crop relatives have drastically changed a wide range of morphological and physiological traits such as plant architecture, fruit size and seed dispersal. These changes are collectively referred to as the domestication syndrome [1,2]. Cultivated species generally have larger fruits or seeds than their wild ancestors, indicating that fruit and seed size are major agronomic traits that were selected during domestication [2].
Peanut, one of the most economically important legumes in the world, is a recent allotetraploid (AABB) species domesticated in South America. Cultivated peanut (Arachis hypogaea) resulted from a single hybridization event between the two wild diploid species A. duranensis (A genome) and A. ipaensis (B genome), followed by chromosome doubling [3-7]. This polyploidization event first gave rise to the wild allotetraploid species Arachis monticola and, after subsequent domestication, to the cultivated species A. hypogaea [3,4,8,9]. Recent genome sequencing of all these species and comparison of their genomes confirmed their phylogenetic relationships [5-7,9]. A. duranensis, A. ipaensis, A. monticola and the induced allotetraploid IpaDur1 (A. ipaensis KG30076 × A. duranensis V14167)4x have similar seed sizes, and their seeds are smaller than those of the cultivated species A. hypogaea [10]. This indicates that chromosome doubling is not directly responsible for the increased seed size of cultivated peanut, although it changed other traits such as plant architecture, biomass and photosynthetic pigment production during domestication [10,11]. Other genome modifications, such as mutations, deletions, insertions and/or homeologous recombination, could have been involved in the increase in pod size during peanut domestication [9]. It is therefore important to identify the genes governing pod size variation and to understand the genetic changes that occurred during peanut domestication.
Interspecific populations are important genetic resources for mapping genomic regions involved in the morphological changes that distinguish crops from their wild relatives. Fonceka et al. [12,13] developed AB-QTL and CSSL populations from the cross between the cultivated variety Fleur11 and a wild synthetic allotetraploid combining the genomes of A. ipaensis and A. duranensis. Using the AB-QTL population, these authors detected three genomic regions, on chromosomes A07, B02 and B05, where several QTLs for pod and seed size clustered. At these QTLs, the wild alleles were associated with a 10% to 26% reduction in pod and seed size. The authors hypothesized that these regions were targets of human selection during peanut domestication. More recently, using the CSSL population, Tossim et al. [14] confirmed that the QTL in the proximal region of chromosome A07, carried by the line 12CS_091, is involved in pod and seed size variation.
QTL fine-mapping is one means of identifying the genes underlying phenotypic variation. It has been applied extensively in crop species for candidate gene identification and cloning [27]. Fine-mapping reduces the size of a QTL region (approximately 20 cM or more) to a few cM or less. Its precision depends on the recombination frequency and the marker density in the QTL region [28,29]. Several marker and population types have been used in QTL fine-mapping, including biparental populations (RILs, NILs, etc.), multiparent populations (NAM, MAGIC, etc.) and core collections [29]. In peanut, Agarwal et al. [30] and Khan et al. [31] reported the fine-mapping of QTLs for resistance to diseases (ELS, LLS and rust) and to Aspergillus, respectively, using RIL populations and high-density SNP genetic maps. Luo et al. [20] mapped major and stable QTLs related to pod weight and size to 280 kb and 1.48 Mb intervals on chromosomes A05 and A07, respectively, using a genetic map of 817 SSRs and a RIL population of 187 lines. Zhuang et al. [7] combined bulk segregant analysis (BSA) with a RIL population to fine-map a QTL for seed size to a 1 Mb interval on chromosome A07. As in other crop species, including rice [32-34], tomato [35] and Medicago truncatula [36], near-isogenic lines (NILs) have also been developed in peanut. However, they have mostly been used to confirm QTLs involved in resistance to nematodes and rust [37-39]. The genome sequences of several wild and cultivated peanut species are now available [5-7], easing the development of thousands of molecular markers that, combined with high-resolution mapping populations, can accelerate QTL fine-mapping.
In the present study, we developed a NIL population targeting a pod and seed size QTL region on chromosome A07. We used a two-step genotyping strategy combined with genotype/phenotype associations to increase marker density and narrow down the QTL region to a 168.37 kb interval containing 22 genes. Two interesting genes involved in the ubiquitin-proteasome pathway are highlighted and discussed. The discovery of the gene(s) involved in pod and seed size reduction will provide new insights into peanut domestication and useful information for peanut geneticists and breeders.

Plant Materials
The parents used in this study are the cultivated variety Fleur11 and the line 12CS_091. Fleur11 is an improved variety grown in Senegal. It is a Spanish type with an erect growth habit, low to moderate pod constriction, a short cycle (90 days), high yield and moderate drought tolerance [12,13,40]. The line 12CS_091 is a chromosome segment substitution line (CSSL) derived from a backcross program involving Fleur11 and a synthetic wild allotetraploid (A. ipaensis KG30076 × A. duranensis V14167)4x. Genotypically, this CSSL differs from Fleur11 by the introgression of an A. duranensis chromosome segment in the proximal region of chromosome A07, relative to the published genome sequence. Phenotypically, it has smaller seeds than Fleur11 [14]. From these parents, a large F2 population of 2172 individuals was developed by a single cross followed by self-pollination of the F1 generation. Eighty-eight F2 plants were selected based on genotyping data and advanced to the F3 generation. A total of 490 F3 plants were used for SNP genotyping. Finally, 10 F3 plants were selected and advanced to produce NILs (F3:5 and F3:6) that were used for phenotyping and for identifying candidate genes. Figure 1 shows the scheme used for developing the NILs.

DNA Isolation
DNA was extracted as described by Fonceka et al. [40]. Briefly, 20 mg of dried young leaves were ground for 2 min in a Mixer Mill. The samples were then suspended in 750 µL of MATAB buffer and incubated at 65 °C for 20 min in a water bath. A volume of 750 µL of chloroform-isoamyl alcohol (CIAA) was added to each sample, followed by centrifugation at 13,000 rpm for 20 min. A total of 600 µL of supernatant was transferred to a new tube, and the DNA was precipitated by adding 600 µL of 2-propanol. After centrifugation, the pellets were washed with 500 µL of 70% ethanol and dried at room temperature before being dissolved in 500 µL of 1X Tris-EDTA. The extracted DNA was stored at −20 °C until quantification and genotyping.

Development and Validation of the New SSR Markers
The QTL involved in pod and seed size reduction was located in the proximal region of chromosome A07, between the SSR markers RN13D04 and TC23E04 [12,14]. This segment is approximately 5 Mb in size and is tagged by four markers (RN13D04, Seq2E06, Seq5D05 and TC23E04) [12,40]. First, we downloaded from PeanutBase (https://peanutbase.org/home) the entire A. duranensis sequence located upstream of marker TC23E04 on chromosome A07. Then, all di-, tri- and tetranucleotide microsatellites whose motifs were repeated at least 15 times were searched using SSR Finder (http://fresnostate.edu/csm/faculty-research/ssrfinder/). Finally, primers were designed for 30 markers with Primer3 [41]. SSR amplification and polymorphism were assessed on seven diploid species of the A and B genomes (A. batizocoi, A. duranensis, A. ipaensis, A. cardenasii, A. correntina, A. stenosperma and A. villosa), four synthetic tetraploids and two cultivated varieties (Fleur11 and 73-33). SSRs polymorphic between Fleur11 and A. duranensis were used for genotyping.
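The motif search criteria used with SSR Finder can be approximated in a few lines of code. The following is a minimal Python sketch, not SSR Finder itself: it assumes perfect (uninterrupted) repeats only, scans di-, tri- and tetranucleotide motifs with at least 15 repeats as in the criteria above, and discards motifs that are themselves repeats of a shorter unit (e.g. "AA" or "ATAT"). The function names are hypothetical.

```python
import re

def is_primitive(motif):
    """True if the motif is not a repeat of a shorter unit (e.g. 'ATAT', 'AA')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)

def find_ssrs(seq, motif_lengths=(2, 3, 4), min_repeats=15):
    """Return perfect SSRs as (0-based start, motif, number of repeats)."""
    seq = seq.upper()
    hits = []
    for k in motif_lengths:
        # One k-mer followed by at least (min_repeats - 1) identical copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

example = "GG" + "AT" * 15 + "GGGG" + "ATC" * 15 + "G" + "AT" * 5
print(find_ssrs(example))
```

Because a repeat can be read in several phases, the reported motif is the one starting at the leftmost position; compound repeats, such as the two reported in the results, would appear as separate adjacent hits and would need an extra merging step.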

Development and Validation of New SNP Markers
Fifty new SNP markers were identified using genotyping-by-sequencing (GBS) data from the CSSL population developed by Fonceka et al. [13] and the "Axiom-Arachis" SNP array data [42]. KASP® markers were designed for each SNP using 50 bp flanking sequences and validated on A. duranensis, 12CS_091, Fleur11 and a subset of 21 F3 individuals of known genotype at selected SSR markers.
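Extracting the 50 bp flanks around each SNP for assay design can be scripted. This is a hypothetical helper, not the pipeline used in the study: it assumes a reference sequence string, a 1-based SNP position and the widely used bracketed [REF/ALT] notation for assay design input. The exact input format required by the assay provider should be checked before use.

```python
def kasp_input(seq, snp_pos, alt, flank=50):
    """Format a SNP as <5' flank>[REF/ALT]<3' flank> (hypothetical helper).

    seq: reference sequence; snp_pos: 1-based SNP position in seq;
    alt: alternative allele. REF is read from the sequence itself.
    """
    i = snp_pos - 1
    if not 0 <= i < len(seq):
        raise ValueError("SNP position outside sequence")
    if i < flank or i + flank >= len(seq):
        raise ValueError("not enough flanking sequence on one side")
    ref, alt = seq[i].upper(), alt.upper()
    if ref == alt:
        raise ValueError("ALT allele must differ from REF")
    return f"{seq[i - flank:i]}[{ref}/{alt}]{seq[i + 1:i + 1 + flank]}"

reference = "A" * 50 + "C" + "G" * 50
print(kasp_input(reference, 51, "t"))
```

Raising an error when a flank is too short, rather than silently truncating it, keeps every design input at the same 50 bp length.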

Near-Isogenic Lines (NILs) Development
In a first step, the 2172 individuals of the F2 population were genotyped with three mapped SSRs (RN13D04, Seq2E06 and TC23E04) that tag the wild chromosome segment containing the QTL of interest. A total of 188 F2 plants that showed at least one recombination event between two adjacent markers were selected and genotyped with 23 polymorphic SSR markers developed in this study. All PCR amplifications were performed as described by Fonceka et al. [13]. The genotypic data were used to anchor the new SSRs on the genetic map of the proximal region of chromosome A07 using MapDisto software [43]. In a second step, 88 of the 188 F2 plants were selfed to produce 490 F3 plants, which were genotyped with nine SNP markers developed in this study. SNP genotyping data were generated at CERAAS using a LightCycler 96 (Roche, Basel, Switzerland). Amplification and data analysis followed the SNP validation protocol, except that the DNA and mix volumes were doubled. Ten F3 individuals were identified as NILs based on the distribution of recombination events between the RN13D04 and Seq2E06 markers, as revealed by the SNP genotyping data.
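Selecting F2 recombinants in the first step amounts to looking for a change of genotype between adjacent markers. A minimal Python sketch, assuming genotype calls coded "A" (homozygous Fleur11), "B" (homozygous A. duranensis) and "H" (heterozygous), with any other value treated as missing and skipped; the function names and example plants are hypothetical.

```python
VALID_CALLS = {"A", "H", "B"}

def breakpoints(calls):
    """Return (left, right) marker indices between which the genotype changes.

    Missing calls are skipped, so a change is reported between the nearest
    informative markers on either side.
    """
    informative = [(j, c) for j, c in enumerate(calls) if c in VALID_CALLS]
    return [
        (j1, j2)
        for (j1, c1), (j2, c2) in zip(informative, informative[1:])
        if c1 != c2
    ]

def select_recombinants(population):
    """Return the IDs of plants with at least one recombination event."""
    return sorted(pid for pid, calls in population.items() if breakpoints(calls))

# Markers in map order, e.g. RN13D04, Seq2E06, TC23E04 (calls are made up).
population = {
    "p1": ["A", "A", "A"],
    "p2": ["B", "H", "H"],
    "p3": ["H", "-", "B"],
    "p4": ["B", "B", "B"],
}
print(select_recombinants(population))  # ['p2', 'p3']
```

In an F2, a change such as B to H between two markers reflects a crossover in one of the two parental gametes, which is enough to flag the plant for genotyping with the new markers.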

Phenotyping of F 2:3 Families and the NILs
Hundred-seed weight (HSW) was first measured on selected F2:3 families to perform the single-marker analysis.
Ten NILs and the two parents (Fleur11 and 12CS_091) were evaluated under rainfed conditions in Senegal, between July and October in 2018 and 2019, at the ISRA-Nioro research station (13°45′28.8″ N; 15°47′13.6″ W). The experimental design was a randomized complete block design with three replications. In each replication, the NILs were arranged in rows of ten plants, with 30 cm between plants and 50 cm between rows. Weeds were controlled manually before sowing and throughout the experiments. After harvest and drying of pods, hundred-pod weight (HPW), hundred-seed weight (HSW), pod length (PL), pod width (PW), seed length (SL) and seed width (SW) were measured.
All statistical analyses of the phenotypic data were performed with R (R Core Team, Vienna, Austria; http://www.R-project.org/). The range, mean and standard deviation (SD) were calculated for each trait. An analysis of variance (ANOVA) was performed for each year to estimate the effects of genotypes and replications on each variable. Based on the ANOVA results, Tukey's multiple mean comparison test (HSD, honestly significant difference) was applied to identify differences between genotypes. All variables were also analysed as a multi-environment trial (MET) using the following mixed model:
$$y_{ijk} = \mu + \alpha_i + \tau_j + (\alpha\tau)_{ij} + \gamma_{jk} + \varepsilon_{ijk}$$
where $y_{ijk}$ is the response variable observed for genotype $i$ in block $k$ and environment $j$; $\mu$ is the overall mean; $\alpha_i$ is the effect of genotype $i$; $\tau_j$ is the effect of year $j$; $(\alpha\tau)_{ij}$ is the interaction effect of genotype $i$ with year $j$; $\gamma_{jk}$ is the effect of block $k$ within year $j$; and $\varepsilon_{ijk}$ is the random error. Genotype was treated as a fixed effect, while genotype-by-year and block effects were treated as random. Broad-sense heritabilities were then calculated for all traits using the following formula:
$$H^2 = \frac{\sigma^2_G}{\sigma^2_G + \sigma^2_{GY}/n_y + \sigma^2_\varepsilon/(n_y n_r)}$$
where $\sigma^2_G$ is the genotypic variance, $\sigma^2_{GY}$ the genotype-by-year interaction variance, $\sigma^2_\varepsilon$ the residual variance, and $n_y$ and $n_r$ the numbers of years and replications, respectively.
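The per-year ANOVA and the heritability calculation can be sketched as follows. This is a minimal illustration in plain Python, not the R code used in the study: it assumes a balanced randomized complete block design with one value per genotype and block, computes the sums of squares and the genotype F statistic only (p-values, Tukey HSD and the mixed-model fit are left to R), and assumes an entry-mean heritability computed from variance components with a given number of years and replications. Function names and the example data are hypothetical.

```python
from statistics import mean

def rcbd_anova(data):
    """Two-way ANOVA (genotype + block) for a balanced RCBD.

    data maps each genotype to its values in blocks 1..b (same order for all).
    """
    genotypes = list(data)
    g = len(genotypes)
    b = len(data[genotypes[0]])
    values = [v for x in genotypes for v in data[x]]
    grand = mean(values)
    geno_means = [mean(data[x]) for x in genotypes]
    block_means = [mean(data[x][k] for x in genotypes) for k in range(b)]
    ss_total = sum((v - grand) ** 2 for v in values)
    ss_geno = b * sum((m - grand) ** 2 for m in geno_means)
    ss_block = g * sum((m - grand) ** 2 for m in block_means)
    ss_error = ss_total - ss_geno - ss_block
    df_geno, df_error = g - 1, (g - 1) * (b - 1)
    ms_error = ss_error / df_error
    f_geno = (ss_geno / df_geno) / ms_error if ms_error > 0 else float("inf")
    return {"ss_geno": ss_geno, "ss_block": ss_block,
            "ss_error": ss_error, "f_geno": f_geno}

def broad_sense_h2(var_g, var_gy, var_e, n_years, n_reps):
    """Entry-mean broad-sense heritability from variance components."""
    return var_g / (var_g + var_gy / n_years + var_e / (n_years * n_reps))

trial = {"G1": [1, 2, 4], "G2": [2, 3, 4], "G3": [4, 5, 6]}  # made-up values
print(rcbd_anova(trial))
print(broad_sense_h2(10, 2, 6, n_years=2, n_reps=3))
```

With two years and three replications, the genotype-by-year and residual variances are divided by 2 and 6, so the entry-mean heritability is usually higher than a single-plot estimate.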

Fine Mapping
A single-marker analysis was first performed using the SSR genotyping data and the HSW data measured on selected F2:3 families. At each SSR locus, phenotypic data of lines homozygous for the cultivated Fleur11 allele or for the A. duranensis allele were grouped, and the phenotypic means of the two groups were compared using two-sample t-tests.
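The single-marker test can be illustrated with a short Python sketch. The text does not state whether equal variances were assumed; this sketch assumes Welch's unequal-variance t-test and returns only the t statistic and the Welch-Satterthwaite degrees of freedom (with SciPy, `scipy.stats.ttest_ind(x, y, equal_var=False)` also returns the p-value). Genotype codes, family IDs and values are hypothetical; heterozygous families are excluded, as in the analysis described above.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df

def single_marker_test(genotypes, hsw, cultivated="A", wild="B"):
    """Compare HSW of families homozygous for each parental allele at one SSR.

    genotypes: family ID -> call at the SSR ("A", "B" or "H");
    hsw: family ID -> hundred-seed weight. Heterozygous calls are ignored.
    """
    fleur_group = [hsw[f] for f, call in genotypes.items() if call == cultivated]
    duranensis_group = [hsw[f] for f, call in genotypes.items() if call == wild]
    return welch_t(fleur_group, duranensis_group)

calls = {"f1": "A", "f2": "A", "f3": "A", "f4": "B", "f5": "B", "f6": "B", "f7": "H"}
weights = {"f1": 50, "f2": 52, "f3": 54, "f4": 40, "f5": 42, "f6": 44, "f7": 47}
print(single_marker_test(calls, weights))
```

A positive t indicates that families homozygous for the Fleur11 allele have the higher mean HSW, which is the direction expected at a locus where the wild allele reduces seed size.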
Phenotypic differences among NILs, and between the NILs and their parents, were compared with their genotypic constitution. Three groups of NILs were defined: phenotypic value similar to 12CS_091, intermediate phenotype, and phenotypic value similar to Fleur11. The QTL position was refined based on phenotypic and genotypic differences and/or similarities among NILs. The most probable location of the QTL was identified and the flanking SNP markers were noted. MapChart [44] was used to draw the physical map of the markers and the position of the QTL of interest. Finally, the sequence between the two flanking SNPs was downloaded from the A. duranensis genome sequence available in PeanutBase (https://www.peanutbase.org/gbrowse_aradu1.0). All annotated genes were analysed to identify candidate genes.
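The logic used to refine the QTL position from NIL genotypes and phenotypic groups can be expressed as a simple co-segregation scan. A minimal sketch under stated assumptions: NILs are homozygous at every marker (coded "D" for A. duranensis and "F" for Fleur11), a single causal locus is assumed, and lines in the intermediate group are left out. A marker is concordant when every small-seeded line carries "D" and every large-seeded line carries "F"; the QTL is then placed between the nearest discordant markers flanking a run of concordant markers. The marker and line names in the example are hypothetical.

```python
EXPECTED = {"small": "D", "large": "F"}  # intermediate lines are ignored

def concordant_markers(genotypes, phenotype, n_markers):
    """For each marker, True if all small lines are 'D' and all large lines 'F'."""
    return [
        all(genotypes[line][j] == EXPECTED[cls]
            for line, cls in phenotype.items() if cls in EXPECTED)
        for j in range(n_markers)
    ]

def candidate_intervals(markers, genotypes, phenotype):
    """Return (left flank, right flank) for each run of concordant markers.

    None means the interval extends to the end of the genotyped region.
    """
    conc = concordant_markers(genotypes, phenotype, len(markers))
    intervals, j, n = [], 0, len(markers)
    while j < n:
        if not conc[j]:
            j += 1
            continue
        start = j
        while j < n and conc[j]:
            j += 1
        left = markers[start - 1] if start > 0 else None
        right = markers[j] if j < n else None
        intervals.append((left, right))
    return intervals

markers = ["m1", "m2", "m3", "m4", "m5"]
genotypes = {
    "S1": "FFDDD", "S2": "DDDDD", "S3": "DDDDF",  # small-seeded lines
    "L1": "DDFFF",                               # large-seeded line
    "I1": "FFFDD",                               # intermediate line (ignored)
}
phenotype = {"S1": "small", "S2": "small", "S3": "small",
             "L1": "large", "I1": "intermediate"}
print(candidate_intervals(markers, genotypes, phenotype))
```

Intermediate lines are excluded here because, as discussed for 1388-03 and 1575-02, their phenotype may reflect a recombination inside the interval or a change in genome composition rather than a simple allele class.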

Sequence Alignment of Selected Candidate Genes
Sequence alignments (coding DNA sequences (CDS) and proteins) were performed for two putative candidate genes using ClustalW (https://www.genome.jp/tools-bin/clustalw). The protein sequences corresponding to the genes were obtained using MEGA software 10.1.7 (https://www.megasoftware.net/). The A. duranensis gene sequences were aligned against the homologous genes from A. hypogaea subsp. hypogaea var. Tifrunner and A. hypogaea subsp. fastigiata var. Shitouqi. Homologous genes were found by BLASTn searches and/or the keyword search option in PeanutBase for var. Tifrunner (https://peanutbase.org) and in the Peanut Genome Resource for var. Shitouqi (http://peanutgr.fafu.edu.cn/).

New SSR Markers
We developed 30 new SSR markers in the target A07 chromosome region. Among these 30 SSRs, 28 were perfect repeats and 2 were compound repeats. Twenty were trinucleotide repeats and 10 were dinucleotide repeats (Table S1). A total of 29 SSRs gave clear PCR amplification products. Across all accessions used for validation, 25 SSRs were polymorphic, with an average of 7.68 alleles per locus (Table S1). A total of 23 SSRs were polymorphic between Fleur11 and A. duranensis.

New SNP Markers
Fifty new SNPs (31 from GBS data and 19 from the "Axiom-Arachis" SNP array) were developed in the target region. Among these SNPs, 23 were polymorphic, 23 were monomorphic and four did not amplify. Validation plots of selected SNPs are shown in Figure S1, and a summary of the polymorphic SNPs is presented in Table 1. GBS data yielded a higher proportion of polymorphic markers than the Axiom-Arachis SNP array (67.7% vs. 10.5%). All markers developed from GBS data were codominant. Nine SNPs (Table 1) were used for genotyping the F3 plants.

Development of the NILs
In a first step, genotyping the 2172 F2 plants with previously mapped SSR markers identified 188 F2 plants that had at least one recombination between RN13D04 and Seq2E06 or between Seq2E06 and TC23E04. These F2 individuals were then genotyped with the 23 SSRs polymorphic between Fleur11 and A. duranensis (Table S1), of which 15 showed the expected segregation profiles. All 15 new markers mapped on chromosome A07 between Seq2E06 and TC23E04 (Figure S2). We then performed a single-marker analysis for HSW to identify the marker interval most likely to contain the QTL. The two-sample t-tests showed that HSW reduction was significantly associated with A. duranensis alleles at RN13D04 (p = 0.0001) and at Seq2E06 (p = 0.01), whereas no significant reduction in HSW was associated with A. duranensis alleles at TC23E04. These results indicated the RN13D04-Seq2E06 interval (about 1 Mb) as the most likely location of the QTL.
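A common way to check that a codominant marker segregates as expected in an F2 is a chi-square goodness-of-fit test against the 1:2:1 ratio. The text does not say how the segregation profiles were assessed, so this is an illustrative sketch only; note also that the 188 F2 plants were selected for recombination, which can itself shift genotype ratios. With two degrees of freedom, the chi-square upper-tail probability has the closed form exp(-x/2), so no statistics library is needed.

```python
from math import exp

def chi2_f2_121(n_fleur, n_het, n_wild):
    """Chi-square test of F2 genotype counts against a 1:2:1 ratio (df = 2)."""
    observed = (n_fleur, n_het, n_wild)
    n = sum(observed)
    expected = (n / 4, n / 2, n / 4)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = exp(-chi2 / 2)  # exact survival function of chi-square with 2 df
    return chi2, p_value

print(chi2_f2_121(40, 40, 20))
```

For the made-up counts 40:40:20, the statistic is 12.0 (p ≈ 0.0025), so such a marker would be flagged as distorted at the usual 5% threshold.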
In a second step, we selected nine SNP markers based on their genomic positions to cover the RN13D04-Seq2E06 region. One SNP was located upstream of RN13D04, five between RN13D04 and Seq2E06, and three downstream of Seq2E06. These SNPs and the SSR markers RN13D04 and Seq2E06 were used to genotype the 490 F3 plants derived from 88 selected F2 plants. The genotypic data were used to identify F3 lines that showed at least one recombination event. Forty such lines were identified, of which 10 were selected as NILs (Figure 2a). Among these 10 lines, nine were homozygous at all markers, whereas one line (1575-02) had an unusual genotypic constitution. In the signal-intensity plots of SNPs Aradu_A07_1136308 and Aradu_A07_1148327, this line fell between the cluster formed by genotypes similar to Fleur11 and the cluster formed by heterozygous lines. To further investigate the genotype of line 1575-02, we analysed the segregation of the same two markers in 20 offspring obtained by self-fertilization of the line. Surprisingly, the segregation pattern of the progeny corresponded neither to that of a homozygous line nor to that of a heterozygous line, for which the expected ratio would be 1/4 homozygous for A. duranensis alleles, 1/2 heterozygous and 1/4 homozygous for Fleur11 alleles. The offspring fell into two clusters: one with 15 individuals that clustered with Fleur11, and one with 5 individuals corresponding to heterozygous genotypes. None of the offspring clustered with 12CS_091 or A. duranensis (Supplementary File S1, Figures A and B). The polymorphisms at SNPs Aradu_A07_1136308 and Aradu_A07_1148327 are C/A and A/G, respectively. Based on the SNP clustering plots, Fleur11 is AᵃAᵃCᵇCᵇ at SNP Aradu_A07_1136308 and GᵃGᵃAᵇAᵇ at SNP Aradu_A07_1148327; the superscript letters a and b designate the peanut subgenomes A and B, respectively.
As shown in Supplementary File S1 (Figures A and B), line 1575-02 was located in the same cluster as Fleur11. We therefore hypothesized that 1575-02 derives from a homeologous recombination between the A and B genomes and has the CᵃAᵃCᵇAᵃ genotype at SNP Aradu_A07_1136308 and the AᵃGᵃAᵇGᵃ genotype at SNP Aradu_A07_1148327. The tetrasomic segregation ratios for SNP Aradu_A07_1136308 and the expected positions in the SNP signal-intensity plot are shown in Supplementary File S1 (Tables B-F). These ratios fit well with a model of homeologous recombination with gamete lethality or adverse selection, as can be found in segmental allopolyploids [45,46].
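The reasoning behind the hypothesized genotype of line 1575-02 rests on allele dosage: in an allotetraploid, a KASP assay does not tell which homeolog carries an allele, so lines with the same proportion of each allele across the four copies are expected to fall into the same cluster. A minimal sketch at SNP Aradu_A07_1136308, assuming that the signal ratio tracks allele dosage; the Fleur11 and 1575-02 genotypes follow the text, while the heterozygote and 12CS_091 genotypes are inferred for illustration (wild C allele on the A subgenome, since Fleur11 carries A there) and are not taken from the supplementary data.

```python
from fractions import Fraction

def allele_dosage(copies, allele):
    """Proportion of the four homeologous copies of a tetraploid carrying allele."""
    if len(copies) != 4:
        raise ValueError("an allotetraploid carries four copies")
    return Fraction(sum(c == allele for c in copies), 4)

def group_by_dosage(genotypes, allele):
    """Group lines that should fall into the same SNP cluster."""
    clusters = {}
    for name, copies in genotypes.items():
        clusters.setdefault(allele_dosage(copies, allele), []).append(name)
    return clusters

# Copies listed as (A sub-genome, A sub-genome, B position, B position).
genotypes = {
    "Fleur11": ("A", "A", "C", "C"),                 # A^a A^a C^b C^b
    "1575-02 (hypothesized)": ("C", "A", "C", "A"),  # C^a A^a C^b A^a
    "heterozygote (inferred)": ("A", "C", "C", "C"),
    "12CS_091 (inferred)": ("C", "C", "C", "C"),
}
for dose, names in sorted(group_by_dosage(genotypes, "A").items()):
    print(f"dosage of allele A = {dose}: {names}")
```

Under this assumption the hypothesized homeologous exchange keeps a 2:2 allele ratio, which is why 1575-02 could not be distinguished from Fleur11 on the SNP plot alone and its progeny had to be examined.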

Phenotypic Variations of Pod- and Seed-Related Traits in NILs
The ten NILs were phenotyped in 2018 and 2019 along with their parents. As expected, the small-seeded donor parent 12CS_091 produced significantly smaller pods and seeds than Fleur11 (Figure 2b,c and Figure S3). HPW, PL and PW ranged from 61.57 to 170.69 g, from 23.45 to 30.48 mm and from 9.51 to 14.09 mm, respectively. HSW, SL and SW ranged from 27.00 to 64.93 g, from 11.69 to 15.04 mm and from 7.43 to 9.32 mm, respectively. All measured traits had very high heritability (Table 2). Both the single-environment and the MET analyses showed a significant genotype effect for all traits (Table 2 and Table S2). Moreover, a significant genotype-by-year effect was found for HPW and HSW (Table 2). The comparison between lines and their parents showed that line 1383-03 produced pods and seeds as large as those of Fleur11, and that lines 0761-04, 0761-11, 1436-06, 1436-08, 2207-01, 2207-04_03 and 2207-04_07 produced pods and seeds statistically similar to those of 12CS_091 (Figure 2b and Figure S3). In contrast, lines 1388-03 and 1575-02 had an intermediate phenotype for all traits. We therefore defined three groups of NILs based on the phenotypic data: a small pod/seed group (0761-04, 0761-11, 1436-06, 1436-08, 2207-01, 2207-04_03 and 2207-04_07), a large pod/seed group (1383-03) and an intermediate pod/seed group (1388-03 and 1575-02). We hypothesize that the three phenotypic groups correspond to three genotypic classes at the locus of interest: homozygous for the Fleur11 allele (large pods and seeds), homozygous for the 12CS_091 allele (small pods and seeds) and carrying both alleles (intermediate phenotype). This phenotypic distribution of the NILs suggests the action of a single gene or a few tightly linked genes.

Fine-Mapping of the QTL and Identification of Candidate Genes
As shown in Figure 2a, the wild fragment responsible for pod and seed size reduction was subdivided into smaller fragments carried by different NILs. Combining the phenotyping and genotyping results, we observed that the seven small-seeded lines (0761-04, 0761-11, 1436-06, 1436-08, 2207-01, 2207-04_03 and 2207-04_07) carried introgressions of different sizes on chromosome A07 (Figure 2). However, although these NILs carried wild chromosome segments of different sizes, they all differed from the large-seeded NIL 1383-03 in their genotypic constitution in the region located downstream of SNP Aradu_A07_1316694 (Figure 2). In this region, the seven NILs are homozygous for A. duranensis alleles, whereas 1383-03 is homozygous for Fleur11 alleles. This finding allowed us to exclude the chromosome region spanning from the top of the chromosome to SNP Aradu_A07_1148327 as a possible location of the gene(s). Line 1388-03 appeared to have the same genotype as 1383-03 but had an intermediate phenotype for all traits measured. These two lines could, however, differ in the location of the recombination event between SNPs Aradu_A07_1148327 and Aradu_A07_1316694. Interestingly, line 1575-02, which has a genotypic constitution similar to that of the small-seeded lines 1436-06 and 1436-08 but carries a homeologous recombination in the region delimited by SNPs Aradu_A07_1148327 and Aradu_A07_1316694, also had intermediate phenotypes for all traits. These results suggest that the phenotypic differences between 1575-02 and 1436-06/08, on the one hand, and between 1383-03 and 1388-03, on the other hand, can be explained by differences in the recombination events that occurred in the interval defined by SNPs Aradu_A07_1148327 and Aradu_A07_1316694. Altogether, these results suggest that the gene(s) involved in pod and seed size reduction on chromosome A07 are located in this interval.
This interval is 168,367 nucleotides long based on the A. duranensis V14167 genome sequence of chromosome A07 (from position A07: 1,148,327 to position A07: 1,316,694) (Figure 3). This region contains 22 annotated putative genes (https://www.peanutbase.org/gbrowse_aradu1.0). Among these genes, nine had no Gene Ontology (GO) term assigned and 13 were assigned at least one GO term (Table 3). The molecular functions of the GO-assigned genes were related to protein binding (seven genes), catalytic activity (five genes) and transmembrane transport and/or translation (one gene), while the biological processes in which they participate are cellular and metabolic processes. Among these genes, four are involved in the regulation of cell division and elongation through the ubiquitin-proteasome pathway: Aradu.RLZ61, Aradu.DN3DB, Aradu.FX37I (and/or Aradu.XJ0L1) and Aradu.X1L7N, which encode, respectively, an F-box family protein, a transcriptional regulator STERILE APETALA-like (SAP), a BTB/POZ domain-containing protein and an armadillo repeat-containing 8-like (armc8) protein (Figure 3 and Table 3).
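Listing the annotated genes in such an interval can be done directly from a GFF3 annotation file. This is a hypothetical sketch: the seqid ("Aradu.A07"), the source column and the gene IDs in the example are placeholders, not the actual PeanutBase annotation, and only features of type "gene" that overlap the interval (1-based, inclusive coordinates, as in GFF3) are reported. The sketch also checks the interval length quoted above.

```python
def genes_in_interval(gff_lines, seqid, start, end):
    """Return (ID, start, end) of GFF3 'gene' features overlapping [start, end]."""
    hits = []
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[0] != seqid or cols[2] != "gene":
            continue
        g_start, g_end = int(cols[3]), int(cols[4])
        if g_start <= end and g_end >= start:  # partial overlaps are kept
            attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
            hits.append((attrs.get("ID", "?"), g_start, g_end))
    return hits

START, END = 1_148_327, 1_316_694  # flanking SNP positions on A07
interval_bp = END - START          # distance between the two SNPs

example_gff = [
    "##gff-version 3",
    "Aradu.A07\tsrc\tgene\t1150000\t1152000\t.\t+\t.\tID=geneA;Name=x",
    "Aradu.A07\tsrc\tmRNA\t1150000\t1152000\t.\t+\t.\tID=geneA.1;Parent=geneA",
    "Aradu.A07\tsrc\tgene\t1000\t2000\t.\t-\t.\tID=geneB",
    "Aradu.A08\tsrc\tgene\t1150000\t1152000\t.\t+\t.\tID=geneC",
]
print(interval_bp, genes_in_interval(example_gff, "Aradu.A07", START, END))
```

Keeping partially overlapping genes is a deliberate choice here: a gene straddling a flanking SNP could still carry the causal variant inside the interval.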

Sequence Analysis of Selected Candidate Genes
We compared the CDSs and corresponding protein sequences of two promising candidate genes to identify differences between the wild and cultivated genes. Aradu.RLZ61 has a single exon and is an ortholog of A. thaliana SNEEZY (At5g48170) (Figure S4a). The alignment of the CDSs of A. duranensis SNEEZY (Aradu.RLZ61), A. hypogaea subsp. hypogaea var. Tifrunner SNEEZY (Arahy.IU3Y9Z) and A. hypogaea subsp. fastigiata var. Shitouqi SNEEZY (AH07G01180) showed no difference between the two subspecies and three SNPs between the wild and cultivated genes (Figure 4). These SNPs are synonymous: no change is observed in the protein sequences, and the SNEEZY proteins of A. duranensis and of the two subspecies are all 160 amino acids long. We also analysed Aradu.DN3DB. This gene encodes a transcriptional regulator STERILE APETALA-like (SAP) (Table 3 and Figure 3) and is an ortholog of M. truncatula SLB1 (Figure S4b). The alignment of Aradu.DN3DB (A. duranensis SAP), Arahy.5EZV1I (A. hypogaea subsp. hypogaea var. Tifrunner SAP) and AH07G01210 (A. hypogaea subsp. fastigiata var. Shitouqi SAP) showed no difference between the two subspecies. However, three codons are deleted in the A. duranensis CDS, two at positions 25-30 and one at positions 61-63 (Figure 5). According to the protein sequences of the two cultivated subspecies, these codons encode two serines and one proline. In addition, three SNPs were observed at positions 600, 774 and 895 of the A. hypogaea sequences. Two of these SNPs (positions 600 and 774) are synonymous. However, the cytosine (C) to guanine (G) change at position 895 of the A. hypogaea sequences results in the substitution of a proline (CCC) in A. duranensis by an alanine (GCC) in A. hypogaea. Aradu.DN3DB and Arahy.5EZV1I consist of two exons separated by a long intron of 2527 bp and 2511 bp, respectively.
Arahy.5EZV1I and AH07G01210 encode proteins of 484 amino acids, whereas Aradu.DN3DB encodes a protein of 481 amino acids (Figure 5).
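The classification of CDS differences into synonymous, non-synonymous and indel changes can be reproduced from an in-frame pairwise alignment. A minimal Python sketch using the standard genetic code, assuming both aligned sequences start at the start codon, code gaps as "-" and have a length that is a multiple of three; it is not the ClustalW/MEGA workflow used in the study, and the toy sequences are made up. Reported positions are 1-based alignment columns, which equal CDS positions only where no gap occurs upstream.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODE = dict(zip(CODONS, AMINO))  # standard genetic code, '*' = stop

def classify_differences(wild, cultivated):
    """List differences between two in-frame aligned CDSs.

    Each entry is (position, kind, x, y): for SNPs x and y are the encoded
    amino acids, for indels they are the aligned codons.
    """
    if len(wild) != len(cultivated) or len(wild) % 3:
        raise ValueError("expected two in-frame aligned CDSs of equal length")
    changes = []
    for i in range(0, len(wild), 3):
        c1, c2 = wild[i:i + 3].upper(), cultivated[i:i + 3].upper()
        if c1 == c2:
            continue
        if "-" in c1 or "-" in c2:
            changes.append((i + 1, "indel", c1, c2))
            continue
        aa1, aa2 = CODE[c1], CODE[c2]
        kind = "synonymous" if aa1 == aa2 else "non-synonymous"
        for k in range(3):
            if c1[k] != c2[k]:
                changes.append((i + k + 1, kind, aa1, aa2))
    return changes

print(classify_differences("ATGCCCAGTTCT", "ATGGCCAGCTCT"))
print(classify_differences("ATG---TCT", "ATGTCCTCT"))
```

As a consistency check on the SAP alignment, in a gap-free A. hypogaea CDS positions 600 and 774 are third codon positions (both divisible by 3), where synonymous changes are most frequent, whereas position 895 is the first position of a codon, matching the CCC to GCC change.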

NILs are Resolutive Genetic Material for Fine Mapping Applications
In our study, we developed a NIL population as a tool for fine-mapping a pod/seed size QTL region on chromosome A07. NILs are widely used genetic material for QTL validation and fine-mapping. The success of NILs in fine-mapping depends essentially on the marker density and the frequency of recombination in the QTL region [28]. We used a cost-effective two-step strategy to increase marker density, detect recombination events and develop NILs. In the first step, previously mapped SSR markers were used to select, from a total of 2172 F2 plants, a subset of 188 plants that showed at least one recombination between two adjacent markers. This subset was then genotyped with 23 newly developed polymorphic SSRs. This reduced by at least 86% the total number of data points that would have been generated if the entire F2 population had been genotyped with all SSR markers. In the second step, we used nine SNP markers located in the refined QTL region to genotype 490 F3 plants derived from 88 F2 plants. Using F3 plants derived from heterozygous F2 plants increased the number of recombination events in the target region and allowed the selection of relevant NILs. Finally, the NIL genotype/phenotype associations identified a region of 168.37 kb as the most likely location of the gene(s). This interval is larger than those reported in other studies that used NILs to fine-map QTL regions. For example, in rice, Fan et al. [32] used backcross-derived NILs to delimit a QTL for grain weight and size to a 7.9 kb segment using 1384 BC3F2 plants and 11 molecular markers. Similarly, Liu et al. [34] delimited a QTL for stigma exsertion rate in rice to a 28.4 kb interval using 3192 BC4F2 plants and eight molecular markers. In M. truncatula, four NILs derived from heterogeneous inbred families (HIFs) and SSR markers allowed the fine-mapping of a QTL involved in resistance to a root disease to a 135 kb fragment [36].
However, larger regions have also been reported without being a major bottleneck for candidate gene identification. For example, Xue et al. [47] cloned the Ghd7 gene in rice although the genomic region had been narrowed only to a large segment of 2284 kb.
In our study, two NILs (1575-02 and 1388-03) had an intermediate phenotype in both years of field evaluation. The segregation ratio of SNPs Aradu_A07_1136308 and Aradu_A07_1148327 in the offspring of line 1575-02 indicated a homeologous recombination between the A and B chromosomes, changing the genome composition of the region between the two SNPs from AˢAˢBᶜBᶜ to AˢAᶜBᶜAᶜ (A and B represent the peanut subgenomes, and the superscripts s and c indicate wild and cultivated origin, respectively). This change in genome composition could explain the shift of line 1575-02 towards intermediate pod and seed size. Bertioli et al. [6] reported that homeologous exchanges are involved in changes in flower colour in peanut. Line 1388-03 was genotypically identical to line 1383-03 at all markers. Both lines carried one recombination between SNPs Aradu_A07_1148327 and Aradu_A07_1316694, but the exact location of the recombination events is unknown. If the region contains two candidate genes, the recombination event in line 1388-03 could have occurred between them, combining in the same genotype an A. duranensis allele at one gene and a Fleur11 allele at the other. Assuming that the two genes act additively, this could explain the intermediate phenotype of line 1388-03. A graphical representation of this two-gene additive model is shown in Supplementary Figure S5. These results reinforce the usefulness of NILs for deciphering the genetic components of complex phenotypes [48].
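The two-gene additive hypothesis for line 1388-03 can be made explicit with a toy model. All values below are hypothetical (they are not measured HSW values), and the model assumes homozygous NILs, no dominance and no epistasis.

```python
def additive_phenotype(base, effects, alleles):
    """Toy additive model for homozygous lines.

    base: phenotype with Fleur11 alleles ("F") at every gene;
    effects: shift caused by the A. duranensis allele ("D") at each gene;
    alleles: "F" or "D" at each gene, in the same order as effects.
    """
    if len(effects) != len(alleles):
        raise ValueError("one allele per gene is required")
    return base + sum(e for e, a in zip(effects, alleles) if a == "D")

BASE, EFFECTS = 60.0, (-10.0, -10.0)  # hypothetical HSW-like values (g)
for label, alleles in [("Fleur11-like", "FF"), ("recombinant", "DF"),
                       ("recombinant", "FD"), ("12CS_091-like", "DD")]:
    print(label, alleles, additive_phenotype(BASE, EFFECTS, alleles))
```

Only a recombination falling between the two genes produces the mixed DF or FD combinations and thus an intermediate value; a recombination outside the pair leaves both genes with the same parental origin.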

Candidate Genes Associated with Seed Size are Found in the QTL Region
A few studies have reported QTLs for pod and seed size in the proximal region of chromosome A07 in peanut. This region was first reported as housing a major QTL for pod and seed size-related traits by Fonceka et al. [12]. Luo et al. [20] also identified a similar region on chromosome A07 (from 0.06 to 1.54 Mb) containing major, stable and co-localized QTLs for pod weight and size that explained up to 43.62% of the phenotypic variation. This region co-localized with the pod and seed size QTL cluster reported by Chavarro et al. [26] at around 0.63-1.03 Mb. Finally, Zhuang et al. [7] reported the same region on chromosome A07 (0.87 to 1.9 Mb) as containing a QTL controlling pod and seed size. These authors reported a total of 99 genes in that region, among which 19 were putative candidates. In our study, a 168.37 kb region delimited by two SNPs, from 1,148,327 bp to 1,316,694 bp, was identified on chromosome A07 (Figure 3). Given that this interval lies in the chromosome A07 region repeatedly associated with pod and seed size in these studies, it is likely that it contains major determinants of pod and seed size variation in peanut. In the A. duranensis genome sequence, the region we identified contains 22 putative genes. Among them, four genes (Aradu.RLZ61, Aradu.DN3DB, Aradu.FX37I and Aradu.X1L7N) involved in the ubiquitin-proteasome-mediated cell proliferation pathway were of particular interest. This pathway has been shown to regulate seed size in plants [49,50]. Aradu.FX37I (and/or Aradu.XJ0L1) and Aradu.X1L7N encode a BTB/POZ domain-containing protein and an armadillo repeat-containing protein 8-like (armc8), respectively. BTB/POZ domain-containing proteins have been described as specific substrate adapters of SKP1/Cullin/F-box E3 ubiquitin ligase complexes, and armc8 is a component of the CTLH (C-terminal to LisH motif) complex, which is involved in the regulation of microtubule dynamics and cell division [51-53].
To our knowledge, there is no evidence in the literature that these two genes are associated with fruit or seed size variation. The gene Aradu.RLZ61 encodes an F-box family protein similar to the F-box protein SNEEZY (SNE). In Arabidopsis, SNE is involved in the regulation of gibberellin (GA) signalling: its overexpression is associated with the loss of the dwarf phenotype in a mutant in which SLY1 (a homolog of SNE) is not functional [54,55]. Overexpression of SNE was also shown to decrease the levels of the DELLA proteins RGA and GAI [56], which are differentially expressed during fruit development in Arabidopsis [56]. Studies of GA action have mostly focused on seed germination and plant growth, although GA has been hypothesized to act, together with auxin and ABA, in controlling final seed size in the model plant Medicago truncatula [57]. The role of GA in cell division and elongation has been established [58]. Thus, although the protein sequence encoded by Aradu.RLZ61 is conserved between A. duranensis and A. hypogaea (Figure 4), this gene could play a role in pod and seed size variation in peanut. Another interesting candidate is Aradu.DN3DB, which encodes a transcriptional regulator STERILE APETALA-like (SAP) protein. The SAP gene was first identified in Arabidopsis thaliana as a regulator of flower development. Loss-of-function sap mutants show severe aberrations in inflorescence, flower and ovule development, leading to sterile flowers with small petals [59]. Reduced SAP activity in Capsella rubella shortens the period of cell proliferation and reduces the number of petal cells [60].
SAP was described as an F-box protein and a component of an SKP1/Cullin/F-box (SCF) E3 ubiquitin ligase complex that controls not only flower size but also leaf and fruit size in Arabidopsis by targeting the repressors PEAPOD1 and PEAPOD2 (PPDs) and the kinase-inducible domain interacting proteins KIX8 and KIX9 (KIXs) for degradation [61,62]. Recently, in cucumber, LITTLELEAF, an ortholog of A. thaliana SAP (AtSAP), was associated with variation in leaf, fruit and seed size and in the number of lateral branches, mainly through the regulation of cell number [63]. In legumes, BIG SEED1 (BS1) was described as a homologue of PEAPOD1 and PEAPOD2; in M. truncatula and soybean, BS1 suppression leads to increased leaf, fruit and seed size [64]. More recently, SLB1, an ortholog of AtSAP, was identified in M. truncatula and shown to interact with BS1 and target it for degradation, leading to larger leaves, fruits and seeds [65]. SAP was reported to be overexpressed in the embryo of a large-seeded A. hypogaea genotype compared with a small-seeded one [7]. Sequence alignment between the A. duranensis SAP and those of the two cultivated subspecies (represented by Tifrunner and Shitouqi) (Figure 5) revealed deletions of several codons in the serine/glycine-rich region of the A. duranensis sequence. In addition, three SNPs were observed in the WD40 repeat-containing domain, two of which do not change the protein sequence. The third SNP, at position 895 of the A. hypogaea sequence, results in a proline (CCC) in A. duranensis being replaced by an alanine (GCC) in A. hypogaea (Figure 5). In the SAP protein, the serine/glycine-rich region is a target for phosphorylation, and the WD40 repeat-containing domain has been suggested to coordinate protein-protein interactions [60,62]. Finally, sequence alignment of A. hypogaea SAP and M. truncatula SLB1 supports their orthology (Figure S4), and the CDS is well conserved between the A. hypogaea subspecies compared in this study. Taken together, these results suggest that the transcriptional regulator STERILE APETALA-like is a good candidate gene for pod and seed size variation during peanut evolution. However, its role still needs to be confirmed by functional validation.
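The effect of the SAP SNP at position 895 can be traced with codon arithmetic. The sketch assumes, as a hypothesis not stated in the text, that position 895 is counted 1-based from the first base of the A. hypogaea SAP coding sequence. Under that assumption the SNP falls on the first base of codon 299, which is consistent with the CCC to GCC change reported in Figure 5. The codon table is deliberately reduced to the two codons involved.

```python
# Minimal sketch: map a 1-based CDS position to its codon and compare the two alleles.
# Assumption (hypothetical): position 895 is counted from the first base of the CDS.
CODON_TABLE = {"CCC": "Pro", "GCC": "Ala"}  # reduced table: only the codons involved

def codon_position(cds_pos):
    """Return (codon number, base within codon), both 1-based, for a CDS position."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3 + 1

def differing_bases(codon_a, codon_b):
    """Return the 1-based positions at which two codons differ."""
    return [i + 1 for i, (x, y) in enumerate(zip(codon_a, codon_b)) if x != y]

codon_no, base_in_codon = codon_position(895)
wild_codon, cultivated_codon = "CCC", "GCC"  # A. duranensis vs A. hypogaea (Figure 5)

print(f"Position 895 -> codon {codon_no}, base {base_in_codon}")
print(f"A. duranensis {wild_codon} = {CODON_TABLE[wild_codon]}; "
      f"A. hypogaea {cultivated_codon} = {CODON_TABLE[cultivated_codon]}")
print("SNP on expected codon base:",
      differing_bases(wild_codon, cultivated_codon) == [base_in_codon])
```

If position 895 were instead counted from a genomic or alignment coordinate, the codon number would change, so the mapping should be rechecked against the sequence used in Figure 5.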

Conclusions
In the present study, we increased marker density in the proximal region of chromosome A07 by developing and mapping new SSR and SNP markers. These markers and marker/trait association analysis allowed us to identify different NILs and to narrow down the QTL region involved in pod/seed size variation to a 168.37 kb segment containing 22 genes. Putative candidate genes were identified and discussed, two of which were of particular interest. However, further studies are needed to validate them and to identify the other genes with which they interact. Identifying the gene(s) involved in pod/seed size variation will improve our knowledge of the genes targeted during peanut domestication and will facilitate decisions in future peanut breeding programs.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4425/11/12/1402/s1; Table S1. SSRs developed in this study; Table S2. Analysis of variance of pod and seed traits in NILs; Figure S1. Plots of validation of some SNPs; Supplementary File S1. Detailed explanation of the genotypic constitution of the line 1575-02 and the pattern of segregation in its offspring; Figure S2. Genetic map of the target region with the new SSR markers; Figure S3. Multiple mean comparison of pod and seed related traits between genotypes; Figure S4. Protein sequences alignment of the most promising candidate genes identified with their orthologues; Figure S5. Graphical representation of the genotype at the gene loci in two genes model with additive action.