Construction of a High-Density Recombination Bin-Based Genetic Map Facilitates High-Resolution Mapping of a Major QTL Underlying Anthocyanin Pigmentation in Eggplant

High-density genetic maps can significantly improve the resolution of QTL mapping. We constructed a high-density recombination bin-based genetic map of eggplant based on 200 F2 plants from an interspecific cross (Solanum melongena × S. incanum) using the whole genome resequencing strategy. The map was 2022.8 cM long, covering near 99% of the eggplant genome. The map contained 3776 bins, with 3644 (96.5%) being effective (position non-redundant) ones, giving a nominal average distance of 0.54 cM and an effective average distance of 0.56 cM between adjacent bins, respectively. Using this map and 172 F2:3 lines, a major QTL with pleiotropic effects on two anthocyanin pigmentation-related traits, leaf vein color (LVC) and fruit pericarp color (FPC), was steadily detected in a bin interval of 2.28 cM (or 1.68 Mb) on chromosome E10 in two cropping seasons, explaining ~65% and 55% of the phenotypic variation in LVC and FPC, respectively. Genome-wide association analysis in this population validated the QTL and demonstrated the correctness of mapping two bins of chromosome E02 onto E10. Bioinformatics analysis suggested that a WDR protein gene inside the bin interval with reliable effective variation between the two parents could be a possible candidate gene of the QTL.


Introduction
During the development of quantitative genetics, one of the most important events was the advent of molecular marker technologies, which enables the construction of genetic maps for mapping quantitative trait loci (QTLs), dissecting the genetic basis of a quantitative trait into individual Mendelian factors. The resolution of QTL mapping is determined by many factors, such as the complexity and heritability of the trait, the type and size of the population, the marker density of the genetic map, and the power of the statistical method. High marker density is a precondition for high mapping resolution. The fast development of the next-generation sequencing (NGS) technologies has dramatically reduced the costs of high-throughput genotyping and made it practical to construct a high-density genetic map.
There can be two strategies of NGS-based genotyping for constructing a high-density genetic map. One is genotyping by partial genome sequencing (PGS), namely, genotyping individuals by sequencing a selected part of the genome. A few methods for PGS have been developed, such as restriction site-associated DNA sequencing (RAD-seq) [1,2], genotypingby-sequencing (GBS) [3], and specific length amplified fragment sequencing (SLAF-seq) [4]. The merit of PGS is that it does not need a reference genome sequence and can reduce the cost, especially useful for the species with large genome size. So far, high-density genetic map construction has been mainly performed based on this strategy. It has been used in many plant species, such as soybean [5], cucumber [6], chickpea [7], durum wheat [8], rice [9,10], grape [11], Brassica rapa [12], wheat [13], and potato [14].
The other strategy is genotyping by whole genome resequencing (WGR). This strategy was first proposed by Huang et al. [15]. It roughly involves four steps: (1) resequencing the whole genome of each individual in the mapping population at a very low sequencing depth; (2) collectively examining genome-wide polymorphisms for genotype calling and recombination breakpoint determination using a sliding window approach; (3) dividing the genome into recombination bins; and (4) analyzing the genetic linkage among the bins. As WGR can identify all polymorphisms in a population in principle, it has the potential to uncover all recombination events in the population so as to maximize the density and the resolution of the genetic map. This is an advantage of WGR over PGS. Apart from in rice [15,16], this strategy has been used in some other species, such as soybean [17], maize [18], pepper [19], melon [20], peanut [21], and radish [22]. As reference genome sequence is available in more and more species and NGS becomes cheaper and cheaper, the WGR strategy will be used more widely in high-density genetic map construction.
Eggplant (Solanum melongena L.) is an important vegetable crop cultivated and consumed worldwide. To facilitate the genetic studies in eggplant, a number of genetic maps of eggplant have been constructed since the early of this century [23][24][25][26][27][28][29][30][31][32]. Most of the maps, however, were constructed based on traditional molecular markers and had a low marker density. Recently, two high-density genetic maps were constructed with the PGS strategy in eggplant. One was constructed based on an interspecific (S. melongena × S. linnaeanum) F 2 population consisting of 121 individuals using the SLAF-seq approach [33]. The map comprises 2122 single nucleotide polymorphism (SNP) markers, with a total length of 1530.75 cM and an average distance of 0.72 cM between adjacent markers. The other map was constructed based on a population of 163 recombinant inbred lines (RILs) derived from a cross between two cultivated eggplant varieties using the GBS approach [34]. The map contains 7249 SNP markers, with a total length of 2169.23 cM and an average distance of 0.4 cM between adjacent markers. More recently, a bin-based genetic map (or termed bin map for short) was constructed based on a population of 100 F 2 individuals developed from an interspecific cross (S. melongena × S. insanum) using the WGR strategy [35]. The map consists of 3918 bins, with a length of 1384.62 cM in total and a distance of 0.58 cM between adjacent bins on average. These maps have exhibited the merit of high marker density for increasing the resolution of gene/QTL mapping.
Eggplant is rich in anthocyanins, which are beneficial for human health [36]. So, increasing anthocyanin content is an important goal in eggplant breeding. For this purpose, it is necessary to understand the genetic basis and regulatory mechanism of anthocyanin accumulation in eggplant. In the past decade, especially in recent years, remarkable progress has been made in the molecular genetic research of anthocyanin accumulation in eggplant. A number of QTLs related to anthocyanin accumulation have been mapped on all eggplant chromosomes except chromosome E04 [23,34,37,38], and some anthocyanin metabolism-related genes have been identified using the approaches of reversed genetics, including transcriptomic or proteomic analysis [39][40][41] and homologous or candidate gene analysis [42][43][44][45]. However, as many genes/QTLs are involved in the control of anthocyanin accumulation, the genetic basis and regulatory mechanism underlying the anthocyanin metabolism in eggplant is still far from being clarified.
In this study, we constructed a high-density recombination bin-based genetic map based on an inter-specific F 2 population and applied it to mapping QTLs for anthocyanin pigmentation in eggplant. We identified a major QTL displaying pleiotropic effects on the anthocyanin pigmentation in both leaf and fruit and demonstrated that the high-density map facilitated high-resolution mapping of the major QTL and prediction of its possible candidate gene.

Genotyping Results
After sequencing,~178.06 M and 138.30 M clean reads were obtained in the parental lines YZQ (S. melongena) and YQ (S. incanum), respectively, of which 98.69% and 98.83% could be mapped to the reference genome. The number of clean reads obtained in each of the 200 F 2 plants was~17.83M on average, varying 13.51 M-31.33 M, of which 99.12% (97.48-99.51%) could be mapped to the reference genome. The Q20 of each F 2 plant varied 94.72-97.75%, with an average of 96.75%. These results indicated that the sequencing data were good in quality overall.
Based on the sequencing data, a set of 1,273,701 markers (SNPs and short InDels) meeting the three criteria (diallelic; homozygous in YZQ; minor allele frequency, MAF ≥ 0.3) were selected in the F 2 population. The markers were not evenly distributed in the genome (Figure 1a). In most of the chromosomes, markers were mainly located close to the two ends. In chromosomes E01, E07 and E11, however, there was a long segment with higher marker density in the middle region, respectively. In addition, marker density was very low in the left terminal region of E02. The proportion of missing marker genotype data in each F 2 plant was~0.14 on average, varying 0.03-0.26. So, the number of informative markers in each F 2 plant would not be smaller than 1.27 M × (1 − 0.26) = 0.94 M, which should be adequate for constructing a recombination bin-based genetic map.

Genotyping Results
After sequencing, ~178.06 M and 138.30 M clean reads were obtained lines YZQ (S. melongena) and YQ (S. incanum), respectively, of which 98. could be mapped to the reference genome. The number of clean reads obt the 200 F2 plants was ~17.83M on average, varying 13.51 M-31.33 M, o (97.48%-99.51%) could be mapped to the reference genome. The Q20 of e ied 94.72%-97.75%, with an average of 96.75%. These results indicated tha data were good in quality overall.
Based on the sequencing data, a set of 1,273,701 markers (SNPs an meeting the three criteria (diallelic; homozygous in YZQ; minor allele fre 0.3) were selected in the F2 population. The markers were not evenly d genome (Figure 1a). In most of the chromosomes, markers were mainly the two ends. In chromosomes E01, E07 and E11, however, there was a lon higher marker density in the middle region, respectively. In addition, mar very low in the left terminal region of E02. The proportion of missing m data in each F2 plant was ~0.14 on average, varying 0.03-0.26. So, the num tive markers in each F2 plant would not be smaller than 1.27 M × (1 − 0.26) should be adequate for constructing a recombination bin-based genetic m  To evaluate the quality of the F 2 population, we analyzed the following parameters based on the marker data: (1) The MAFs of markers. The MAFs were largely distributed between 0.45 and 0.5 in the F 2 population, peaking at~0.4875 (Figure 1b), very close to the expected peak value 0.5. (2) The Proportion of Heterozygous marker Genotypes (PHG) in each F 2 plant. The observed PHG (p o ) was~0.25 on average, varying 0.10-0.42. Theoretically, the expected PHG (p e ) in an F 2 individual is 0.5. However, because there is a probability (p m ) that a heterozygous genotype is misjudged as a homozygous genotype depending on the sequencing depth, p o must be smaller than p e . It is easy to find that p m = 1/2 k−1 , where k is sequencing depth. In this study, the sequencing depth for each F 2 plant was~2×. Therefore, p m ≈ 0.5 on average. It can be predicted that the 25. This is consistent with the average PHG observed. (3) The genetic relationship between individuals [46]. The standardized relationship coefficient (or termed relatedness score) approximately followed a normal distribution in the F 2 population (Figure 1c), suggesting that the genetic segregation in the F 2 population was symmetric and unbiased overall. The above results together indicated that the F 2 population was normal in structure, suitable for genetic mapping.

Recombination Bin-Based Genetic Map Constructed
The genetic map constructed based on the F 2 population was 2022.8 cM in length, containing 14 linkage groups and 3776 bins (Table 1; Figure 2; Table S1). The bin length was 135.9 kb on average, varying from 10 to 16,490 kb, with~82%, 87%, 90%, 94% and 96% smaller than or equal to 150, 200, 250, 300 and 400 kb, respectively. The total length of the bins was~498.32 Mb, accounting for 46.44% of the eggplant genome sequence, indicating that about half of the genome sequence was not included as bins in the genetic map. There were 132 redundant bins in terms of the position in the genetic map (when there were multiple bins at a position in the map, they function only as one effective marker, so the extra bins were redundant). So, the number of effective bins (bin markers) in the map was 3776 − 132 = 3644. Each chromosome corresponded to a linkage group in the map except for E04, which was divided into three linkage groups (LG4a, 4b and 4c). Overall, the bin density was high throughout the map. The nominal average distance (between adjacent bins, including redundant bins) in each linkage group varied from 0.42 cM (LG4c) to 0.78 cM (LG4b), and was 0.54 cM in the whole map; whereas the effective average distance (redundant bins excluded) was close to the nominal one in each linkage group, varied from 0.42 cM (LG4c) to 0.98 cM (LG4b), and was 0.56 cM in the whole map ( Table 1). The largest spacing between adjacent bins was 12.3 cM in LG7 ( Figure 2). The physical span (from the beginning of the first bin to the end of the last bin) of each linkage group (or the sum of the three linkage groups of E04) was exactly or nearly equal to the full length of the corresponding chromosome except for LG2 (Table 1). In total, the genetic map physically spanned~1.06 Gb, covering near 99% of the eggplant genome (Table 1). Therefore, the genetic map basically represented the whole eggplant genome. LG, linkage group, which is numbered in accordance with the corresponding chromosome. b. Number of bins or effective bins (in parenthesis). c. Average distance between adjacent bins or effective bins (in parenthesis). d. Physical span from the first bin to the last bin. e. Full physical length of the chromosome or chromosomal segment. f. Ratio of physical span to full physical length. By comparing the genetic map with the physical map (reference genome), it was found that each chromosome could be divided into three regions (I, II and III) with two obvious distinguishing features ( Figure 3). One feature is bin density. In all the chromosomes except for E02, regions I and III had a very high bin density, while region II had much lower bin density ( Figure 3). This was basically consistent with the marker density distribution in the genome (Figure 1a). In addition, this result also indicated that the sequences not contained as bins in the genetic map were mainly in the middle region (region The genetic map constructed based on the F 2 population was 2022.8 cM in length, containing 14 linkage groups and 3776 bins (Table 1; Figure 2; Table S1). The bin length was 135.9 kb on average, varying from 10 to 16,490 kb, with~82%, 87%, 90%, 94% and 96% smaller than or equal to 150, 200, 250, 300 and 400 kb, respectively. The total length of the bins was~498.32 Mb, accounting for 46.44% of the eggplant genome sequence, indicating that about half of the genome sequence was not included as bins in the genetic map. There were 132 redundant bins in terms of the position in the genetic map (when there were multiple bins at a position in the map, they function only as one effective marker, so the extra bins were redundant). So, the number of effective bins (bin markers) in the map was 3776 − 132 = 3644. Each chromosome corresponded to a linkage group in the map except for E04, which was divided into three linkage groups (LG4a, 4b and 4c). Overall, the bin density was high throughout the map. The nominal average distance (between adjacent bins, including redundant bins) in each linkage group varied from 0.42 cM (LG4c) to 0.78 cM (LG4b), and was 0.54 cM in the whole map; whereas the effective average distance (redundant bins excluded) was close to the nominal one in each linkage group, varied from 0.42 cM (LG4c) to 0.98 cM (LG4b), and was 0.56 cM in the whole map ( Table 1). The largest spacing between adjacent bins was 12.3 cM in LG7 ( Figure 2). The physical span (from the beginning of the first bin to the end of the last bin) of each linkage group (or the sum of the three linkage groups of E04) was exactly or nearly equal to the full length of the corresponding chromosome except for LG2 (Table 1). In total, the genetic map physically spanned~1.06 Gb, covering near 99% of the eggplant genome (Table 1). Therefore, the genetic map basically represented the whole eggplant genome.
According to the reference genome sequence, the order of the three linkage groups of E04 is LG4a-LG4b-LG4c (Table S1). However, the three linkage groups could not be linked accordingly because the estimated genetic distance between LG4a and LG4b and that between LG4b and LG4c were both very large (>90 cM). When there was no restriction on the order, LG4b still could not be linked to LG4a and LG4c. Hence, LG4b appeared to be genetically independent from LG4a and LG4c. Surprisingly, LG4a and LG4c could be linked in the absence of LG4b with a seemingly acceptable gap (25.8 cM) between them.
There were 45 bins that could not be mapped according to the physical order in the reference genome, but 13 of them were successfully mapped to the linkage groups of other chromosomes, including one from E12 to LG5; two from E02 to LG10; one from E04 and three from E05 to LG9; and one from E02, E05 and E09 each and three from E10 to LG11 (Table S2). The two bins of E02 (B02_129 and 130) mapped to LG10 were adjacent and therefore actually represented a continuous sequence in the reference genome. Similar situations also existed for the three bins of E05 (B05_207, 208 and 209) mapped to LG9 and the three bins of E10 (B10_194, 195 and 196) mapped to LG11 (Table S2). After mapped to LG10 and LG11, the two bins of E02 and the three bins of E10 were merged (as B02_129 and B10_194), respectively (Table S1).
By comparing the genetic map with the physical map (reference genome), it was found that each chromosome could be divided into three regions (I, II and III) with two obvious distinguishing features ( Figure 3). One feature is bin density. In all the chromosomes except for E02, regions I and III had a very high bin density, while region II had much lower bin density ( Figure 3). This was basically consistent with the marker density distribution in the genome (Figure 1a). In addition, this result also indicated that the sequences not contained as bins in the genetic map were mainly in the middle region (region II) of each chromosome. The other feature is Ratio of Genetic distance to Physical distance (RGP). The average RGPs (=total genetic length/total physical length) in regions I and III were high in general, while that in region II was very low ( Table 2). This can be intuitively seen in Figure 3, where the overall slope of the genetic position-physical position curve appears to be very small in region II in each chromosome, suggesting that a very large physical distance is only equivalent to a small genetic distance in region II. Therefore, genetic recombination must occur mainly in regions I and III, but rare in region II. This explains why a high-density genetic map spanning the whole chromosome could be constructed although there were very few bins from region II. LG4b+LG4c, respectively, while that in the whole chromosome was estimated based on LG4a+LG4b+LG4c. LG4b, while the left and right black parts are LG4a and LG4c, respectively. The blank arrowheads indicate the breakpoints between linkage groups.

QTLs Mapped and Candidate Gene Predicted
Based on the field experiments of 172 F2:3 lines in two different cropping seasons, a major QTL was mapped on E10 in each of the two traits, leaf vein color (LVC) and fruit pericarp color (FPC), in both seasons, named as qLVC10 and qFPC10b, respectively (Table  3; Figure S1). The positions and bin intervals of the major QTLs detected in different traits and different seasons were completely the same, all being 63 cM and B10_240-B10_249 (60.97-63.25 cM), respectively (Table 3; Table S1). The results suggested that these major QTLs were actually the same QTL with pleiotropic effects, and the mapping result was reliable and of high resolution (within a marker interval of 2.28 cM) for a primary mapping population of a median size. The major QTL had very high heritability, which could explain ~65% and 55% of the phenotypic variation in LVC and FPC, respectively (Table  3). It showed very large additive effects but negligible dominance effects in the two traits ( Table 3). The allele from the cultivated parent YZQ increased the trait value, while the Among the 12 chromosomes, three appeared to be special. The first was E02. Its first 10 Mb region was not covered by the genetic map ( Figure 3). The reason was that the marker density in this region was very low (Figure 1a). It was unable to identify the region I in E02 (Figure 3). The marker density distribution also suggested that there was no region I in E02 (Figure 1a). Hence, E02 only contained two regions: region II with low average RGP and region III with high average RGP ( Table 2). The second was E04. Its regions I and III were both broken into two parts.
LG4b spanned the three regions, whereas LG4a and LG4c were only a part of region I and region III, respectively (Figure 3). The third was E07. The average RGP in the region II of E07 was much larger than those of other chromosomes (Table 2; Figure 3). In addition, although region II had a small average RGP in general, it contained some positions displaying obviously steep slopes (large RGPs) in some chromosomes, especially in E02, E04 and E07 (Figure 3). These positions must be the hotspots of genetic recombination.

QTLs Mapped and Candidate Gene Predicted
Based on the field experiments of 172 F 2:3 lines in two different cropping seasons, a major QTL was mapped on E10 in each of the two traits, leaf vein color (LVC) and fruit pericarp color (FPC), in both seasons, named as qLVC10 and qFPC10b, respectively (Table 3; Figure S1). The positions and bin intervals of the major QTLs detected in different traits and different seasons were completely the same, all being 63 cM and B10_240-B10_249 (60.97-63.25 cM), respectively (Table 3; Table S1). The results suggested that these major QTLs were actually the same QTL with pleiotropic effects, and the mapping result was reliable and of high resolution (within a marker interval of 2.28 cM) for a primary mapping population of a median size. The major QTL had very high heritability, which could explaiñ 65% and 55% of the phenotypic variation in LVC and FPC, respectively (Table 3). It showed very large additive effects but negligible dominance effects in the two traits ( Table 3). The allele from the cultivated parent YZQ increased the trait value, while the allele from the wild parent YQ acted negatively (Table 3). This was consistent with the phenotypes of the two parents, namely, the colors of leaf vein and fruit pericarp were purple in YZQ, but not purple (no anthocyanin pigmentation) in YQ, while those in F 1 were medium between the two parents ( Figure 4). allele from the wild parent YQ acted negatively (Table 3). This was consistent with the phenotypes of the two parents, namely, the colors of leaf vein and fruit pericarp were purple in YZQ, but not purple (no anthocyanin pigmentation) in YQ, while those in F1 were medium between the two parents ( Figure 4).   Apart from the major QTL, there were three and two minor QTLs detected in LVC and FPC, respectively ( Table 3). All of the minor QTLs were detected only in the first season, and there were no common minor QTLs between the two traits. In addition, the LOD scores of all the minor QTLs except qFPC6 were only slightly over the threshold. Therefore, most of these minor QTLs detected might not be very reliable.
Based on the marker (SNP and short InDel) genotype data of the F 2 population, we also performed genome-wide association study (GWAS) on the two anthocyanin pigmentation traits using the method of mixed linear model (MLM) implemented by the R program rMVP [47]. The GWAS results also indicated the existence of a major QTL in the region between 60 Mb and 75 Mb on chromosome E10 ( Figure S2). In addition, some of the minor QTLs mapped by the linkage analysis (e.g., qFPC6) also showed obvious signal in the GWAS ( Figure S2; Table 3). Therefore, the results of linkage analysis and GWAS were consistent. However, it is noticeable that there was an obvious steady -log10(P) peak detected at~30 Mb on chromosome E02 in both traits and both cropping seasons ( Figure S2), but no QTL mapped at that position ( Figure S1). By careful examination, we found that this -log 10 (P) peak was just located at the two bins, B02_129 and B02_130, which were mapped to E10 in the genetic map, located inside the bin interval of the major QTL and completely linked with B10_249 (Tables S1 and S2; Table 3). This explains the cause of the -log 10 (P) peak on E02.
The bin interval of the major QTL (B10_240-B10_249) was 1.68 Mb wide in the reference genome (Table 3), harboring 103 genes according to the annotation (Table S3). In addition, it is interesting that the bin B02_129 from E02 happened to be mapped between B10_240 and B10_249, completely linked with (0 cM to) B10_249 (Table S1). The size of B02_129 was 400 kb, in which only four genes were annotated (Table S3). Among these annotated genes, only one (Smechr1001624) met the criteria for reliable effective variation between the wild parent YQ and the cultivated parent YZQ. A SNP (A/G) between the two parents was found at 64,566,331 bp inside Smechr1001624, making the codon AGT in YQ altered into GGT in YZQ and therefore leading to a change of the 743rd amino acid in the encoded protein from Serine in YQ into Glycine in YZQ. Smechr1001624 is predicted to encode a WD repeat-containing (WDR) protein (Table S3). WDR proteins are a very large family. It has been found that WDR proteins play a critical role in the regulation of anthocyanin biosynthesis in many plant species [40,[48][49][50][51][52]. WDR physically interacts with MYB and bHLH to form a transcriptional complex, which participates the regulation of the flavonoid pigment pathway [50]. The above information suggests that Smechr1001624 could possibly be a candidate gene for the major QTL.

Discussion
In this study, we constructed a high-density recombination bin-based genetic map of eggplant using an interspecific (S. melongena × S. incanum) F 2 population consisting of 200 individuals (Table 1). Only one recombination bin-based genetic map of eggplant has been reported before [35] (named as Qian's map here). Qian's map is also constructed based on an interspecific F 2 population by means of low-depth resequencing (~1.4× for each individual on average), but the wild parent is a different species (S. insanum), and the population is smaller (only 100 individuals). Qian's map is 1384.6 cM in length (~32% shorter than ours) but contains 3918 bins (~4% more than those in our map). However, most (2370/3918 = 60.5%) of the bins in Qian's map are redundant; only 1548 are effective bins (~42% of the number in our map). Therefore, the effective average distance between adjacent bins in Qian's map is~0.90 cM, which is~60% larger than that in our map. So, our map is much longer and much denser than Qian's map. Since the most apparent difference between our study and that of Qian et al. is that the population we used is twice as large as that used by them, population size must be the major cause leading to the significant difference between our map and Qian's map in total length and effective density, indicating the importance of population size for high-density genetic map construction. Theoretically, the number of recombinations occurring in a population is directly proportional to the population size, while the number of markers (mainly SNPs) revealed by sequencing in a population is usually large enough to identify every genetic recombination occurred. Therefore, population size is the key for achieving the high density of a genetic map. In principle, the population size should be as larger as possible as long as the budget allows.
The method of constructing recombination bin-based genetic map by low-depth resequencing was originally proposed for permanent (e.g., RIL) populations [15]. So, most of the bin maps reported were constructed using permanent populations [15][16][17][18][19]21]. Nevertheless, F 2 populations have also been used for constructing bin maps [20,22,35], although it suffers a problem that the heterozygous genotypes of markers are often miscalled as homozygous genotypes under low-depth sequencing. A way to deal with this problem is to predict the most possible marker genotypes by integrating the information of multiple nearby markers. Following this principle, the method of Huang et al. [15] was still adopted for F 2 populations after modification in those studies reported. In the present study, we developed a different method for the bin map construction based on F 2 populations. The results demonstrated the feasibility of our method. Although the current approaches appear to be effective, the high proportion of misjudgment on heterozygous genotypes due to low sequencing depth may still bring about genotyping mistakes, especially impeding precise identification of the exchange breakpoints of chromosomes, so as to affect the quality of the map. It has mentioned in Section 2.1 that the probability of a heterozygous genotype being misjudged as a homozygous genotype is p m = 1/2 k−1 , where k is sequencing depth. According to this formula, it can be found that the probability would be <1/16 when sequencing depth > 5×. So, a little higher (than low) sequencing depth is beneficial to the bin map construction based on F 2 populations. For example, Luo et al. constructed a bin map in radish using an average sequencing depth of 7.2× for each F 2 individual [22]. However, increasing sequencing depth will increase the cost.
Our study revealed that the relationship of genetic position with physical position in each eggplant chromosome usually appeared as a curve of three slopes, namely, steep slope → gentle slope → steep slope; but in chromosome E02 the curve only contained two slopes, namely, gentle slope → steep slope (Figure 3). Similar phenomenon was observed in the study of Qian et al. [35]. In particular, E02 also displays a curve of two slopes in their studies, suggesting that the exceptive performance of E02 is not an accidental result but a characteristic of E02. The nonlinear relationship between genetic position and physical position suggests that it is inappropriate to use genome-wise average RGP but region-wise average RGP in the studies of map-based cloning of genes.
Chromosome E04 was split into three linkage groups in the bin map constructed in this study. Physically, the three linkage groups were continuous according to the reference genome (Table S1). Each linkage group looked normal and they together could form a continuous three-slope relationship curve between genetic position and physical position similar to those of other chromosomes ( Figure 3). So, the breakage of E04 as three linkage groups did not seem to be caused by experimental errors or data analysis errors. The result was really puzzling. Perhaps, the breakpoints between LG4a and LG4b and between LG4b and LG4c were two recombination hotspots with very high recombination rates in this population. Anyway, the separated linkage groups of E04 may not greatly affect the application of the map to QTL mapping.
In this study, we mapped a major QTL for anthocyanin pigmentation on E10. A major QTL for anthocyanin pigmentation has also been mapped on E10 before [23,34,37], which is estimated to be located at about 94-95 Mb in E10 [34] in the reference genome of S. melongena 67/3 line [53]. According to alignment analysis, we found that this location approximately corresponded to 66-67 Mb in the reference genome of SME-HQ1315 [54], very close to location of the major QTL mapped in this study (Table 2). So, we suspect that the reported major QTL and the one mapped in this study might be the same.
We have seen in the results that the GWAS validated the existence of the major QTL on chromosome E10. Besides, it also proved the correctness of the result that the two bins of chromosome E02 (B02_129 and B02_130) according to the reference genome were mapped to E10 in this study. This demonstrated an additional usefulness of high-density bin-based genetic maps for correcting the mistakes of genome sequence assembly and discovering the false localization of a major QTL by GWAS due to the incorrect genome sequence assembly.

Plant Materials
An eggplant (S. melongena) cultivar Yanzhiqie (YZQ; accession ID: V06B0149) and a wild relative (S. incanum) germplasm Yeqie (YQ; accession ID: V06B0864) kindly provided by the Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences were used as parents. The two parents are distinct in many traits including anthocyanin pigmentation. YZQ shows purple color in leaf vein and fruit pericarp, while YQ does not (Figure 4). A YZQ plant (female parent) and a YQ plant (male parent) were crossed, and an F 1 plant was selfed to produce an F 2 population of 200 plants. Further, an F 2:3 population of 172 lines were developed by selfing the F 2 plants.

Detection of Marker Genotypes by Whole Genome Resequencing
Genomic DNA was extracted from young leaves of every F 2 plant and the two parents. The DNA samples were sequenced based on pair-end libraries using either the BGISeq500 platform (for the F 2 plants, conducted by BGI) or the Illumina NovaSeq platform (for the parents, conducted by Novagene). The sequencing depth for each F 2 plant was~2×, while those for parents YZQ and YQ were~53× and 42×, respectively. Adapter removing and quality control of reads were performed using the software Fastp [55] (https://github. com/OpenGene/fastp, Chen S., Shenzhen, China). Clean reads (150 bp in length each) were mapped to the eggplant reference genome SME-HQ1315 [54] (http://eggplant-hq.cn, accessed on 10 June 2020) using the software BWA-MEM [56] (http://maq.sourceforge. net, Li H., Cambridge, UK) and then sorted using the software SAMtools [57] (http:// samtools.sourceforge.net, Li H., Cambridge, UK). Variants (including SNPs and short InDels) and their genotypes in the parents and the F 2 plants were identified using the software FreeBayes [58] (v2, https://github.com/freebayes/freebayes, Garrison E., Boston, MA, USA), and the variants meeting the following criteria were selected as molecular markers: (1) diallelic; (2) homozygous in the parent YZQ; and (3) the minor allele frequency (MAF) ≥ 0.3 in the F 2 population. According to the marker genotype information of the two parents, the genotype of every marker in each F 2 plant was determined, denoted as 0 (for YZQ genotype), 1 (for heterozygous genotype) or 2 (for YQ genotype).

Construction of Recombination Bin-Based Genetic Map
A main problem of using F 2 population to construct recombination bin-based genetic map is that each marker has heterozygous genotype, which is easy to be misidentified as parental genotypes under low sequencing depth. For this reason, we designed a method different from that of Huang et al. [15] to determine recombination bins. We used a sliding window with fixed size (300 kb) to calculate a genotype index for judging the genotype of the window center in each F 2 plant. The genotype index was defined as: where p 0 , p 1 and p 2 were the proportions of markers with the YZQ genotype, the heterozygous genotype and the YQ genotype within the window, respectively. Obviously, GI varied between 0 and 2. It would be close to 0 when the window was in a YZQ-genotype region, but close to 2 when in a YQ-genotype region. In a heterozygous region, as a heterozygous marker had an equal probability to be misidentified as either the YZQ genotype or the YQ genotype, the genotyping errors of different markers could be partially eliminated after averaging in the sliding window, making the GI value vary around 1. So, in principle, genomic positions with different genotypes could be distinguished according to their GI values. In order to use GI to discriminate genotype explicitly, we established the following function after trial to transform GI into integer: let GI = 0 (YZQ genotype) when GI ≤ 0.2; 1 (heterozygous genotype) when 0.2 < GI ≤ 1.8; and 2 (YQ genotype) when GI > 1.8. To get reliable or valid GI estimates, a minimum number of 10 markers was required in a window. The window moved with a step length of 10 kb from one end to the other on each chromosome. Thus, a GI estimate was obtained every 10 kb in the genome in each F 2 plant.
Based on the integer GI data, co-segregations of adjacent positions in the F 2 population were examined. A genomic segment containing a string of consecutive positions that completely co-segregated was defined as a recombination bin. So, every chromosome was partitioned into many bins. Each bin behaved as a separate unit for genetic recombination like a marker in the population. Using chi-square test, bins showing very serious deviation from the theoretical segregation ratio (1:2:1) in the F 2 population (chi-square > 25, p-value < 105) were filtered out. Then, by fixing the order of the bins in each chromosome according to the reference genome sequence, the genetic distances between adjacent bins were calculated using the software QTL IciMapping v4.2 [59] (https://isbreedingen.caas. cn/software/qtllcimapping/294607.htm, accessed on 24 July 2019). For those bins that could not be mapped according to the expected order, each was taken as an inquiry bin to search for the target bin that was closely linked to it, using the option Rippling provided in the software. If a target bin was found, the inquiry bin was tried to be inserted into the map beside the target bin. After no more bins could be added into the map, the following procedure was conducted on each linkage group: (1) bin genotype errors detected by the linkage analysis were corrected according to the flanking bins; (2) bins that had the same genetic position and were physically consecutive were merged as a larger bin; and (3) linkage analysis was performed again. This procedure was repeated until no bin genotype errors could be found and no bins could be merged in the map.

Field Experiment and Phenotyping
Field experiments of the F 2:3 population with the two parents as controls were carried out twice in the farm of Fujian Agriculture and Forestry University at Yangzhong, Youxi (E118.485841, N26.287161) in two different cropping seasons in 2020. In the first season, seeds were sown in growing trays after pregermination on 20 February, and seedlings were transplanted into the field on 5 April. In the second season, seeds were sown on 9 May and seedlings were transplanted on 12 July. In both seasons, the seedlings to be transplanted were all at the age of five full leaves plus one emerging leaf. The seedlings were planted in a regime of 80 cm between rows and 100 cm between plants, under a randomized complete block design with two replicates. Five seedlings were planted for each F 2:3 line in each replicate.
For each plant, leaf vein color (LVC) was observed from the adaxial side of a mature leaf at the stage when the second inflorescence on the main stem began to flower, measured as 0 (green), 1 (light purple), or 2 (purple); fruit pericarp color (FPC) was investigated from the fruits produced by the second inflorescence on the main stem at the stage of horticultural maturity, measured as 0 (white), 1 (greenish white), 2 (light green), 3 (green), 4 (dark green), 5 (purple), 6 (brown), or 7 (dark brown).

QTL Analysis and Candidate Gene Prediction
Mapping of QTLs underlying the color traits in the F 2:3 population was performed based on the bin genotype data and the genetic map using the software QTL IciMapping v4.2 [59]. The LOD threshold at the overall (genome-wise) significance level of 0.05 was estimated by permutation test with 1000 replicates. Genome-wide association study (GWAS) was conducted to verify the mapped QTLs using the marker (SNP and short InDel) data and the method of mixed linear model (MLM) implemented by the R program rMVP [47]. Using the genome sequencing data of the two parents, possible candidate genes of a major QTL were predicted by examining reliable and effective coding sequence variations between the two parents within the QTL's bin interval based on the following criteria: (1) the polymorphic site was homozygous with sufficient sequencing depth (counts ≥ 6) in both parents; (2) the variation could cause protein change between the two parents; and (3) the genotype of the polymorphic site in YZQ (or YQ) was the same as (or different from) that in the reference genome of HQ-1315, which is similar to YZQ in anthocyanin pigmentation [54].

Conclusions
In this study, a high-density recombination bin-based genetic map of eggplant was constructed based on an F 2 population derived from an interspecific cross (S. melongena × S. incanum) using the whole genome resequencing strategy and was then applied to QTL analysis for anthocyanin pigmentation in leaf vein and fruit pericarp. A major QTL with pleiotropic effects on the two traits was steadily mapped in a small region on chromosome E10 in two cropping seasons, and a possible candidate gene of the QTL was identified. In addition, some mistakes of genome sequence assembly in the reference genome were found and corrected in the genetic map construction. Our results demonstrated the merit of high-density genetic map for high-resolution QTL mapping as well as for discovering and correcting the mistakes in genome sequence assembly.