QTL Mapping for Fiber Quality and Yield Traits Based on Introgression Lines Derived from Gossypium hirsutum × G. tomentosum

The tetraploid species Gossypium hirsutum is cultivated widely throughout the world with high yield and moderate fiber quality, but its genetic basis is narrow. A set of 107 introgression lines (ILs) was developed with an interspecific cross using G. hirsutum acc. 4105 as the recurrent parent and G. tomentosum as the donor parent. A specific locus amplified fragment sequencing (SLAF-seq) strategy was used to obtain high-throughput single nucleotide polymorphism (SNP) markers. In total, 3157 high-quality SNP markers were obtained and further used for identification of quantitative trait loci (QTLs) for fiber quality and yield traits evaluated in multiple environments. In total, 74 QTLs were detected that were associated with five fiber quality traits (30 QTLs) and eight yield traits (44 QTLs), with 2.02–30.15% of the phenotypic variance explained (PVE), and 69 markers were found to be associated with these thirteen traits. Eleven chromosomes in the A sub-genome (At) harbored 47 QTLs, and nine chromosomes in the D sub-genome (Dt) harbored 27 QTLs. More than half of the QTLs (44 QTLs, 59.46%) showed positive additive effects for fiber and yield traits. Five QTL clusters were identified, with three in the At, comprising thirteen QTLs, and two in the Dt, comprising seven QTLs. The ILs developed in this study and the identified QTLs will facilitate further molecular breeding for improvement of Upland cotton in terms of higher yield with enhanced fiber quality.


Introduction
Cotton is a major cash crop worldwide that provides superior fiber for the textile industry. Among the four cultivated cotton species, Upland cotton (Gossypium hirsutum) produces 95% of the world's cotton and is characterized by high yield, moderate fiber quality, and wide adaptability [1,2]. In the last few years, worldwide cotton cultivation has declined, mainly due to high production costs and strong market competition with other crops [3]. Upland cotton dominates world cotton production with its high yield, but its fiber quality is only moderate, whereas Sea Island cotton (G. barbadense) is known to have the best fiber quality but a lower yield. Consequently, one of the goals of cotton breeding programs worldwide is improving Upland cotton cultivars.
With the globally increasing demand for textile products and strong competition from synthetic fibers, the need for Upland cotton cultivars with high yield and improved fiber has never been more critical [4]. Both fiber quality and yield are complex traits that are controlled by a multitude of QTLs [5]. The genetic correlations among fiber and yield traits are complicated and vary with population type and parental lines [6][7][8][9]. Thus, improving both fiber quality and yield simultaneously is a long-term task for cotton breeders. Conventional breeding procedures are becoming increasingly difficult because of their long duration and low selection efficiency [10].
The presence of genetic variability in cultivated plants for any breeding program is sometimes restricted, particularly when gene pools are not easily accessible. Wild related cotton species have long been used as genetic sources for incorporating new desirable traits to enhance the potential of cotton cultivars [11]. Three allotetraploid cotton species that are known to exist only in the wild, originating from Hawaii (G. tomentosum), Brazil (G. mustelinum), and Galapagos (G. darwinii), have been widely and extensively used in cotton breeding to improve both yield and fiber quality traits. Wang et al. [12] detected 278 polymorphic loci among 105 ILs by crossing G. hirsutum and G. darwinii. Wang et al. [13] identified 29 QTLs for fiber quality traits from an advanced backcross population obtained from a cross between G. hirsutum and G. mustelinum. G. tomentosum is a wild tetraploid species that is very closely related to the cultivated allotetraploid species G. hirsutum [14]. Zhang et al. [15] exploited the QTL alleles for improved fiber quality traits from G. tomentosum, using an advanced backcross population developed from G. hirsutum × G. tomentosum. Zheng et al. [16] constructed a genetic map using 1295 simple sequence repeat (SSR) markers to locate QTLs for drought-related traits from an interspecific cross between G. tomentosum and Upland cotton CRI-12.
The advent of marker-based genetic maps and QTL mapping has encouraged the development of new molecular approaches to efficiently exploit the positive alleles present in wild species [17]. Introgression line libraries, characterized with molecular markers, carry segments from a wild relative in the genetic background of a cultivated species, so that together they can represent the entire genome of the wild species [18,19]. These lines are obtained by several rounds of continuous backcrossing to the recurrent parent. The development of ILs provides a promising opportunity to utilize the genetic potential of wild species in breeding. ILs are especially useful for mapping complex traits because they are permanent populations that can be evaluated in many environments. Additionally, allelic effects of the wild donor parent are evaluated in a homogeneous cultivated genetic background, which reduces interactions among donor alleles [20,21].
In this study, a mapping population with 107 ILs was developed from the cross between G. hirsutum acc. 4105 and G. tomentosum. Few reports on cotton introgression breeding using the wild species G. tomentosum are available; these addressed the construction of a high-density map [22] and the identification of QTLs for drought tolerance [16]. However, these studies were based on SSR markers. To the best of our knowledge, this is the first study on ILs in cotton breeding using SNP markers. The objectives of the present study were to identify elite lines and to identify favorable QTLs from wild G. tomentosum for fiber quality and yield traits. The ILs and QTLs identified in this study can facilitate future molecular breeding programs to improve the fiber quality and yield of Upland cotton.

Phenotypic Performance of Yield and Fiber Quality Traits in ILs
The IL population was developed through interspecific hybridization followed by backcrossing and a series of selfings (Figure 1). The derived IL population of 107 lines, together with the background parent G. hirsutum acc. 4105, was tested in three environments over two years. Mean squares revealed significant variation (p < 0.01) among ILs for all yield and fiber quality traits (Table 1). The variation among environments was also highly significant for all of the traits, indicating that the performance of the ILs differed across the three environments, and the genotype × environment interaction was likewise highly significant for all of the traits.
The descriptive statistics for the yield and fiber quality traits of the IL population along with the background parent across three environments are displayed in Table 2. The values of the eight yield and five fiber quality traits showed a large range of variation. The coefficient of variation (CV) values for BN (24.91-25.43) were much higher than those of the other traits, while the lowest CV values (1.27-1.52) were recorded for FU. The results showed that the ILs performed more or less equally to their background parent for most of the traits.
The skewness and kurtosis were also calculated, and the results revealed that these traits fit a normal distribution. Distribution analysis of the phenotypic values of the thirteen traits showed a continuous normal distribution in the IL population, suggesting that each trait was controlled by multiple genes (Figure S1a,b). The phenotypic trends of eight traits available in three environments are shown in Figure 2; HSW, PH, and BN are shown in Figure S2 (FBN and SI were recorded only in environment 3 and were therefore not included). Variation in these traits across locations within the same year was small and the traits were stable, whereas across years the traits were influenced by environmental conditions and were less stable. Correlation coefficients among yield and fiber quality traits were calculated, and the results are displayed in Table S1. Best linear unbiased predictor (BLUP) values were used to measure the phenotypic correlations among yield and fiber quality traits. The results indicated that FL, FU, and FS exhibited highly significant associations with each other, while MIC showed a significant negative correlation with FL and negative, non-significant associations with FU and FE. FE showed non-significant negative correlations with MIC and FL, and non-significant relationships with FU and FS. Most of the fiber quality traits showed a non-significant association with the yield traits, except MIC, which showed a highly significant association with LP. BW exhibited a highly significant correlation with LW and a significant correlation with HSW and SI. LW had a highly significant relationship with LP and HSW, while HSW showed a significant association with PH and SI. PH had significant correlations with BN and FBN.
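The coefficient of variation and skewness reported for each trait are simple summary statistics. The sketch below shows one common way to compute them; the function names and the example values are illustrative assumptions and are not data from this study, which used STATISTIX for these calculations.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100 * statistics.stdev(values) / statistics.mean(values)


def skewness(values):
    """Population skewness: third central moment over variance**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical trait values for three lines (not measurements from the paper).
example_trait = [10.0, 12.0, 14.0]
cv_percent = coefficient_of_variation(example_trait)
skew = skewness(example_trait)
print(f"CV = {cv_percent:.2f}%, skewness = {skew:.2f}")
```

A skewness close to zero, as in this symmetric example, is one of the indications used to judge whether a trait distribution is approximately normal.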

SLAF-seq and SNP Marker Development
In total, 265.20 million paired-end reads of 100 bp were obtained from the recurrent parent, the donor parent, and the 107 ILs. The average Q30 ratio was 91% (that is, on average 91% of the bases had a quality score of at least 30, which corresponds to a 0.1% chance of a base-calling error), and the guanine-cytosine (GC) content was 40% (Table S2). The numbers of reads in the recurrent parent and donor parent were 19,796,308 and 9,273,354, with Q30 ratios of 81.87% and 91.1% and GC contents of 41.31% and 40.12%, respectively. On average, 2,206,821 reads with a Q30 ratio of 90% and GC content of 37% were generated per IL. In the recurrent parent, the specific locus amplified fragment (SLAF) number was 595,419, and the average sequencing depth of each marker was 29.73-fold (Figure S3a; Table S3). For the donor parent, 359,255 SLAFs were obtained with an average depth of 21.12-fold. In the IL population, the SLAF number was 258,697, and the average depth of each SLAF marker was eight-fold. The average coverage in the parents and the population indicated that the sequencing results were reliable for marker discovery. Overall, 3157 polymorphic SNP markers were developed and used for genotyping the whole IL population and the parents. The marker number per IL ranged from 177 to 924 (Figure S3b), and the number of SNP markers per chromosome ranged from 24 to 535 (Figure 3). Chromosome A13 harbored the largest number of markers, followed by D06 and A01. The distribution of markers on most chromosomes was not uniform, as several large gaps were present between markers. The positions of the markers on the corresponding chromosomes are shown in Figure 3. More SNP markers were located in the At (1816) than in the Dt (1341). Statistics for the SNP number, hetero-loci, and homo-loci of each IL are presented in Table S4. The average hetero-loci number per IL was 1099.98, with a range of 482-5698, while the average homo-loci number per IL was 13,583.04, with a range of 9329-14,948. The hetero-loci ratio per IL ranged from 0.035 to 0.379 with an average of 0.073, and the average homo-loci ratio per IL was 0.926, ranging from 0.620 to 0.964.
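As a quick consistency check on the sequencing totals, the reported read counts can be summed. The sketch below assumes that the per-IL average applies to all 107 lines, so the result is an approximation based on the rounded average rather than a sum of the actual per-line counts; the variable names are illustrative.

```python
# Read counts reported in the text for the two parents.
recurrent_parent_reads = 19_796_308
donor_parent_reads = 9_273_354

# Reported average per IL; the individual per-line counts are not used here.
n_ils = 107
mean_reads_per_il = 2_206_821

total_reads = (
    recurrent_parent_reads + donor_parent_reads + n_ils * mean_reads_per_il
)
total_millions = total_reads / 1e6
print(f"{total_reads:,} reads, about {total_millions:.2f} million")
```

The total agrees with the overall figure of 265.20 when that figure is read as millions of reads.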

Genomic Component and Diversity of the ILs
The physical distance, the coverage of the introgressed segments in the genome, and the percentage of genome coverage were calculated using Microsoft Excel. The average physical distance in the At was 88.48 Mb, ranging from 62.91 to 103.63 Mb (Table 3), whereas the average physical distance in the Dt was 59.57 Mb, ranging from 46.69 to 67.28 Mb. The introgressed segments represented 35.13% of the genome of tetraploid cotton, with 35.68% and 34.59% in the At and Dt, respectively. The highest coverage of introgressed segments was 62 Mb on chromosome A13, while the lowest was 9 Mb on chromosome D09. The maximum percentage of genome coverage was on A13 (77.54%), and the minimum was on D11 (19.67%) (Figure 3 and Table 3). The introgression segments from the wild parent in each IL and on each chromosome are shown in

QTL Mapping for Fiber Quality and Yield Traits
A total of 74 QTLs for fiber quality and yield were detected on 20 chromosomes. Of these, 30 QTLs were detected for five fiber quality traits, and 44 QTLs for eight yield traits. From the 74 QTLs, 47 QTLs (63.51%) were identified in the At, and 27 QTLs (36.49%) were in the Dt. Overall, 69 SNP markers were found to be associated with 74 QTLs. Twenty-nine markers were associated with fiber quality traits, and forty markers were associated with yield. The logarithm of odds (LOD), position, percentages of PVE, and additive effects of QTLs are presented in Table S5. On average, the PVE in 13 traits was 10.22%, ranging from 2.02% to 30.15%.
Only one QTL for FU (qFU-A06-1) was identified on chromosome A06, with decreasing effects on fiber uniformity. The PVE of this QTL was 17.94%.
Only one QTL for FBN (qFBN-D02-1) was identified on chromosome D02 with decreasing effects. The PVE of this QTL was 13.07%, and the LOD score was 3.02.
Two QTLs for SI on two chromosomes (A02 and A13) were detected with PVE values of 10.83 and 18.10% for each QTL. LOD scores of these QTLs were 3.62 and 6.17, respectively. Both QTLs (qSI-A02-1 and qSI-A13-1) had positive additive effects.
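The percentages quoted for the distribution of QTLs follow directly from the counts. A minimal sketch of the arithmetic, using only the counts given in the text (the dictionary names are illustrative):

```python
# QTL counts reported in the text.
qtls_by_subgenome = {"At": 47, "Dt": 27}
qtls_by_category = {"fiber quality": 30, "yield": 44}


def percent_shares(counts):
    """Return each count as a percentage of the total, rounded to 2 decimals."""
    total = sum(counts.values())
    return {key: round(100 * value / total, 2) for key, value in counts.items()}


subgenome_shares = percent_shares(qtls_by_subgenome)
category_shares = percent_shares(qtls_by_category)
print(subgenome_shares)
print(category_shares)
```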

QTL Hotspot Analysis
To identify significant genomic regions that harbor multiple QTLs associated with important fiber quality and yield traits, the positions of all of the QTLs were located on the chromosomes. Said et al. [23] conducted a meta-QTL analysis and suggested that a QTL cluster can be assigned to a 20 cM region containing more than two stable QTLs. Moreover, integration of an interspecific genetic map [24] indicated that, on average, 1 cM corresponds to ~0.5 Mb of physical distance in the cotton genome. Here, we considered a 10 Mb (~20 cM) physical region enclosing two or more QTLs as a cluster. Five QTL clusters on five chromosomes were detected, with at least three QTLs in each cluster (Table 4). Three clusters were in the At, and two were in the Dt. All of these clusters contained QTLs for more than one fiber quality or yield trait. The highest number of QTLs was five, found in both the A01 cluster and the A13 cluster in the At, for FS, LP, BN, HSW, PH, and SI. These QTLs with increasing effects could simultaneously improve yield with acceptable fiber quality in Upland cotton.
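The cluster rule described above (two or more QTLs within a 10 Mb physical window on the same chromosome) can be sketched as a simple greedy grouping. This is an illustration under stated assumptions, not the procedure actually used in the study: the function name, the greedy windowing anchored at the first QTL of each group, and the example QTL names and positions are all hypothetical.

```python
from collections import defaultdict

WINDOW_MB = 10.0  # ~20 cM at ~0.5 Mb per cM, as stated in the text
MIN_QTLS = 2      # "two or more QTLs"


def find_clusters(qtls, window_mb=WINDOW_MB, min_qtls=MIN_QTLS):
    """Group QTLs on the same chromosome that fit inside one window.

    qtls: iterable of (name, chromosome, position_mb) tuples.
    Returns a list of (chromosome, [names]) for each cluster found.
    """
    by_chrom = defaultdict(list)
    for name, chrom, pos in qtls:
        by_chrom[chrom].append((pos, name))

    clusters = []
    for chrom in sorted(by_chrom):
        hits = sorted(by_chrom[chrom])
        start = 0
        while start < len(hits):
            # Extend the group while the next QTL lies within window_mb
            # of the first QTL in the group.
            end = start
            while (end + 1 < len(hits)
                   and hits[end + 1][0] - hits[start][0] <= window_mb):
                end += 1
            group = hits[start:end + 1]
            if len(group) >= min_qtls:
                clusters.append((chrom, [name for _, name in group]))
            start = end + 1
    return clusters


# Hypothetical QTL positions (Mb); not taken from Table 4 or Table S5.
demo_qtls = [
    ("qA", "A01", 3.0),
    ("qB", "A01", 8.5),
    ("qC", "A01", 12.0),
    ("qD", "A01", 40.0),
    ("qE", "D02", 5.0),
]
demo_clusters = find_clusters(demo_qtls)
print(demo_clusters)
```

In this example only the three A01 QTLs between 3.0 and 12.0 Mb form a cluster; the isolated QTLs are not reported.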

Advantage of the Permanent Population Derived from Wild Species
Simultaneous improvement of cotton yield and fiber quality is highly desired in cotton breeding. Nonetheless, the success of such breeding programs is restricted by the absence of promising alleles for outstanding fiber quality and yield in the gene pool of Upland cotton [25]. The use of introgression lines with better yield potential and superior fiber quality is one of the key strategies for improving Upland cotton in terms of both higher yield and good fiber quality. Although wild species are morphologically inferior, they possess superior alleles that were left behind during domestication and are frequently masked by the poor genetic background. The fiber quality potential of elite cotton varieties can be further improved when these superior alleles are transferred into elite cotton cultivars. Thus, an IL population offers an opportunity for efficient utilization of the genetic potential of the wild species. Zamir [18] described that a complete set of ILs is expected to represent the whole exotic genome, while each single line carries chromosome segments from the exotic parental line and the rest of its genome comes from an elite variety.
In the present research, an IL population with 107 lines was developed with five rounds of backcrossing, each theoretically reducing the exotic genomic portion by 50% in the following generation. Five successive selfings were then carried out to obtain homozygous lines that are a stable genetic resource for further evaluation. The wild allotetraploid species G. tomentosum was used as the donor parent, and the main reason for selecting G. tomentosum as the male parent was to utilize its desirable exotic genes for the improvement of Upland cotton. The IL population is a rich genetic material with great diversity, carrying introgressed segments from wild G. tomentosum that represent 35.13% of the genome of tetraploid cotton. The coverage of introgressed segments was slightly higher in the At (35.68%) than in the Dt (34.59%). The IL population showed a large range of variability for the fiber quality and yield traits. Introgression segments from G. tomentosum greatly affected both fiber quality and yield in the ILs. More than half of the ILs had better fiber quality traits and improved yield. This IL population was used to dissect the genetic architecture of the fiber quality and yield traits in this study, and it will serve as a vital genomic resource for cotton breeding and QTL fine mapping.
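The theoretical halving of the exotic genome with each backcross can be made concrete with a short calculation. The sketch below assumes no selection and no linkage drag, so it gives only the textbook expectation per line and is not an estimate for this population; the function names are illustrative.

```python
def expected_donor_fraction(n_backcrosses):
    """Expected donor share of the genome after n backcrosses.

    The F1 carries 50% donor genome, and each backcross to the
    recurrent parent halves that share on average.
    """
    return 0.5 ** (n_backcrosses + 1)


def expected_heterozygous_loci(n_backcrosses, n_selfings):
    """Expected share of polymorphic loci still heterozygous.

    All polymorphic loci are heterozygous in the F1; each backcross and
    each selfing halves the heterozygous share on average.
    """
    return 0.5 ** (n_backcrosses + n_selfings)


for generation in range(6):
    label = "F1" if generation == 0 else f"BC{generation}F1"
    print(f"{label}: {100 * expected_donor_fraction(generation):.4f}% donor genome")

donor_bc5 = expected_donor_fraction(5)
het_bc5s5 = expected_heterozygous_loci(5, 5)
print(f"BC5S5 expected heterozygous loci: {100 * het_bc5s5:.4f}%")
```

Under these assumptions a single BC5 line is expected to carry about 1.56% donor genome, which is why a whole set of lines is needed to cover a large share of the donor genome collectively.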

Characteristics of SLAF-Sequencing Strategy in Genotyping ILs
Previous studies in cotton using ILs were usually based on SSR markers. Numerous studies about QTL mapping for the improvement of fiber quality and yield traits in the Gossypium genus using SSRs have been reported using germplasm having introgressed genomic segments from their wild relative species [12,26]. For high resolution of QTL mapping, traditional markers such as SSRs are not sufficient when their linkage distance is zero, and thus, they cannot meet the requirements [27].
Polyploid crops have large genomes with large-scale repetitive sequences; as a result, there are many challenges in developing SNPs for polyploid crops. With the rapid progress of whole-genome sequencing technologies, sequencing-based marker discovery and genotyping have made it possible to develop high-throughput, large-scale SNP markers for many genetic studies [28]. SNPs are abundant and are a very stable type of genetic variation, characterized by lower mutation rates, higher numbers, and higher accuracy [29,30]. This has led to the development of high-density SNP gene-chip technology, which is a superior method for linkage mapping and QTL detection. It is now used extensively to detect QTLs in bi-parental populations of many crop species [31].
SLAF sequencing is a recently developed high-resolution technique for obtaining SNPs in large numbers and for genotyping by high-throughput sequencing [32]; it combines high-throughput sequencing with specific locus amplification, and it has been used successfully and widely in several plant species [32][33][34][35]. These studies have shown that SLAF-seq is a powerful high-throughput method for developing large numbers of SNP markers in a limited time. Compared with other methods of SNP marker development, SLAF-seq has several significant advantages, such as uniformity, high accuracy, stability, a high success rate, and low cost. Additionally, it does not necessarily require a full genome sequence or known genome-wide SNPs [32]. In cotton, few reports of SLAF-seq are available; they concern the construction of high-density genetic maps [36], identification of favorable alleles and candidate genes [37], and comprehensive analysis of polymorphisms among tetraploid cotton species [38]. In this study, a SLAF-seq strategy was used in the IL population to identify QTLs for fiber quality and yield from a wild relative. Taking advantage of massively parallel sequencing technology, 265.20 million paired-end reads were generated by the SLAF method. Following the pre-designed scheme, a pilot experiment was performed to ensure marker density, uniformity, and efficiency, low-depth SLAF tags were filtered out, and eventually 3157 polymorphic SNP markers were identified. The integrity and precision of the SLAF markers were high. The distribution of markers was uneven, as more markers were identified in the At (1816) than in the Dt (1341). A few chromosomes, namely A13 followed by D06 and A01, harbored large numbers of markers, which might be due to the large number of introgressed segments on these chromosomes. In the initial steps of marker development using SLAF sequencing, the numbers of SLAFs were more or less similar among the chromosomes. After several steps of screening and stringent SNP filtration, the number of SNP markers varied greatly and the SNP marker ratio was reduced on some chromosomes, which led to an uneven distribution of markers among the chromosomes. The unbalanced chromosomal distribution of marker loci is possibly the result of the larger genome size of the At. The results obtained in this study clearly demonstrated that SLAF sequencing is a suitable tool for rapid development of large numbers of efficient markers and for large-scale genotyping.

QTL Mapping Using SNP Markers in ILs
Fiber quality and yield traits are controlled by polygenes, and the QTLs identified tend to vary among environments. To identify more stable and convincing QTLs in multiple environments, permanent populations such as ILs are needed. ILs are an ideal population for QTL mapping of complex traits because they have the potential to reveal new alleles from wild relatives, to identify genes, and to develop genome-wide genetic resources [39]. It is reasonable to assume that the higher yield and superior fiber quality of some ILs are associated with introgressed genomic components or individual introgressed alleles. Therefore, identifying the QTLs for yield- and fiber quality-related traits associated with introgressed genomic components by molecular markers could provide a better understanding of the genetic effects of the introgressed segments on yield and fiber quality traits.
In this study, 74 QTLs were detected using ICIM software 4.1 (http://www.isbreeding.net). Of these, 30 QTLs were detected for five fiber quality traits, and 44 QTLs were detected for eight yield traits. The QTLs were not distributed uniformly between the At and Dt: of the 74 QTLs, 47 (63.51%) were identified in the At and 27 (36.49%) in the Dt. Previous meta-analyses [5,40,41] have reported that, in cotton, a higher number of QTLs for fiber traits are located on the Dt chromosomes. Yu et al. [7] also observed 35% more QTLs in the Dt in an interspecific backcross inbred line (BIL) population. Our results are inconsistent with these previous reports. This difference might be due to the use of different populations and different markers; in addition, genes related to adaptation or domestication may be more common in the At because these ILs need to produce more seed for survival. Two QTLs (qFS-A09 and qLP-A07) were detected in two environments. Significant environmental effects were observed for fiber quality and yield traits, and different climatic conditions are known to affect the fiber quality and yield of cotton. This might be the reason why we were unable to detect more stable QTLs in this study. The average PVE across the 13 traits was relatively low (10.22%), ranging from 2.02% to 30.15%. A low average PVE (3.18%), ranging from 1.05% to 12.67%, for six yield and fiber traits in introgression lines was also reported by Si et al. [42]. More than half of the QTLs (44 QTLs, 59.46%) showed positive additive effects for fiber quality and yield traits, suggesting that segments from the donor parent improved both fiber quality and yield significantly, and that these ILs could be used to improve yield and fiber quality in G. hirsutum cultivars. Because this is the first study to employ high-throughput SLAF sequencing in a cotton IL population, it is difficult to compare our results directly with previous reports.
Co-localized QTLs on chromosomes, referred to as QTL clusters, were detected for fiber quality and yield traits in this study, indicating that pleiotropic loci may control these traits. QTL co-localization, also referred to as "QTL clusters/hotspots", has previously been reported in cotton [5,42]. In the present research, a few genomic regions containing QTL clusters were found, on chromosomes A01, A09, A13, D02, and D10. These QTL clusters affected two or more different fiber quality and yield traits. The highest number of QTLs was observed in the A01 cluster and the A13 cluster, where five QTLs were detected in each cluster for FS, LP, BN, HSW, PH, and SI. These co-localized QTLs might explain the phenotypic correlations observed. The QTL clusters provide valuable information for defining genome regions associated with different traits. Based on the analysis of clusters in this study, breeding programs targeting higher yields with superior fiber quality can focus on the hotspot regions and select around them. The existence of hotspots and QTL clusters indicates that genes related to certain traits are more heavily concentrated in certain areas of the genome than in others [5,43].

Plant Materials
In this study, an interspecific BC5S5 introgression population with 107 lines (designated as 4M ILs) was developed. G. hirsutum acc. 4105, a high-yield Upland cotton line, was used as the recurrent parent, while G. tomentosum was used as the donor parent. The F1 plants were crossed with the recurrent parent as the female parent to produce BC1F1 individuals. The BC5F1 population was developed by a series of backcrosses with the recurrent parent as the female parent. Thereafter, BC5F1 individuals were continuously self-fertilized to produce the BC5S5 introgression population (Figure S3). All of the ILs along with the background parent were planted in Huanggang and Jinzhou, Hubei province, in 2015, and in Jinzhou and Ezhou, Hubei province, in 2016 (the crop in Ezhou was destroyed by flood and waterlogging). A randomized complete block design with two replications was applied in each environment. Ten plants were maintained in one row for each line per replication. Standard agronomic practices, including cultivation and weed and insect control, were applied uniformly and at the proper time in all plots throughout the growing season.

Phenotypic Traits Collection and Analysis
The phenotypic performance of the ILs was assessed for eight yield and five fiber quality traits across the three environments. Twenty mature bolls from the middle fruiting branches of the plants of each line per replication in each environment were collected, and yield traits were determined by weighing the samples and applying the corresponding formulas. The yield traits included boll weight (BW), lint weight (LW), lint percentage (LP), 100-seed weight (HSW), plant height (PH), number of bolls per plant (BN), number of fruiting branches (FBN), and seed index (SI). Data for HSW, PH, and BN were missing in E1, and FBN and SI were only available in E3 and were analyzed accordingly. For the measurement of fiber quality traits, 10-15 g fiber samples after ginning from every line along with the parental line were sent to the Institute of Cotton Research, Shihezi Academy of Agricultural Sciences, Xinjiang, for testing. The fiber quality traits were tested with an HVI-1000 Automatic Fiber Determination System (USTER® HVI 1000, Uster Technologies, Uster, Switzerland) at 20 °C and a relative humidity of approximately 65%. Traits included the micronaire value (MIC), fiber length (FL), fiber uniformity (FU), fiber strength (FS), and fiber elongation (FE). The analysis of variance and basic statistics such as the mean squares, standard deviation, standard error, skewness, and kurtosis for yield and fiber quality traits were calculated using the STATISTIX 8.1 package (Analytical Software 2005, v8.1, Tallahassee, FL, USA). In addition, correlation coefficients (r) among yield and fiber quality traits were calculated. The frequency distribution of traits was analyzed using SPSS version 20.0 (SPSS, Chicago, IL, USA), and the phenotypic trends and relationships of yield and fiber quality traits were shown in box plots produced with the R program.

DNA Extraction, SLAF-Library Construction, and High throughput Sequencing
Young, healthy, fresh leaf samples from G. hirsutum acc. 4105 and each IL were collected and stored at −70 °C. Total genomic DNA was extracted using a TIANGEN Plant Genomic Kit (TIANGEN Biotech, Beijing, China). The DNA concentration was measured with a NanoDrop-2000 Spectrophotometer (NanoDrop, Wilmington, DE, USA), and DNA quality was checked by 1% agarose gel electrophoresis. The SLAF library was constructed according to Shen et al. [38]. Initially, a pilot experiment was performed to evaluate the enzymes and to determine the restriction fragment sizes that would yield the maximum number of high-quality SLAFs. Four criteria were applied: (i) the number of SLAFs in repeated regions must be as low as possible; (ii) SLAFs must be evenly distributed across the whole genome; (iii) the SLAF length should suit the specific experimental system; and (iv) the final number of SLAFs must meet expectations. The SLAF library was then constructed based on the results of the pilot experiment. A cotton reference genome [8] was used. The clean DNA was digested into fragments of 314–344 bp with the enzyme combination Hae+RsaI (NEB, Ipswich, MA, USA). Afterwards, fragment end repair, ligation of indexed paired-end adapters, and preparation of the adjusted ends were performed step by step. Fragments of the target size were selected on a 2% agarose gel and then amplified. Finally, high-throughput sequencing was performed using an Illumina HiSeq™ 2500 (Illumina, Inc., San Diego, CA, USA) at the Biomarker Technologies Corporation in Beijing.
Sequencing quality was monitored in real time. The proportion of high-quality reads in the raw reads, defined as reads with quality scores higher than Q30 (a Q30 score corresponds to an error probability of approximately 0.1%, that is, 99.9% confidence in the base call), was calculated, and the guanine-cytosine (GC) content was measured for quality control. The SLAF sequences of G. tomentosum were taken from Shen et al. [38].
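The relationship between a Phred quality score and the base-calling error probability mentioned above is Q = −10·log10(p), so Q30 corresponds to p = 0.001. The following Python sketch illustrates that conversion together with a simple Q30 ratio and GC-content calculation. The function names and example values are hypothetical, and counting scores of 30 or above (rather than strictly above 30) is an assumption based on common convention, not a detail stated in the text.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability back to a Phred quality score."""
    return -10 * math.log10(p)


def fraction_at_least(qualities, threshold=30):
    """Fraction of quality scores at or above the threshold (assumed: >= 30)."""
    return sum(q >= threshold for q in qualities) / len(qualities)


def gc_content(sequence):
    """Fraction of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Hypothetical per-base qualities and a hypothetical read sequence.
print(phred_to_error_prob(30))
print(fraction_at_least([35, 28, 31, 30]))
print(gc_content("GGCATA"))
```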

SLAF-seq Data Grouping and Genotyping
SLAF markers were identified and genotyped according to the procedures described by Shen et al. [38]. In brief, raw reads were assigned to the progenies according to the duplex barcode sequences, and low-quality reads (quality score < 20 e) were filtered out. Then, 5 bp were trimmed from the terminal end of each high-quality read [36]. The TM-1 reference genome sequence was downloaded from the CottonGen database (https://www.cottongen.org). Clean reads from each sample were mapped to the G. hirsutum TM-1 genome [8] using Burrows-Wheeler Aligner (BWA, v0.7.10) software [44]. Mapped reads with high mapping quality (MQ ≥ 20) and high base quality (Q ≥ 30) were retained for downstream analysis [38]. The Genome Analysis Toolkit (GATK) [45] and SAMtools/BCFtools [46,47] were used to detect SNPs with default parameters. Furthermore, to minimize false-positive SNP calls, stringent software parameters were used. SNPs with a minimum read depth of less than 10 and an average base quality of less than 30 were filtered out [38]. Between the two parents, 25,659 SNPs were identified, of which 20,370 were located on chromosomes. Then, the SNPs in each IL that had different positions between the parents were further filtered. Finally, only 3157 SNPs that were located on chromosomes and consistent in the parents were retained. CIRCOS 0.66 software was used to display the positions of the markers on the physical map [48]. The SLAF-seq sequence data of the ILs are available in the NCBI Sequence Read Archive database under BioProject accession numbers PRJNA421265 and PRJNA316549 [38].
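The depth and base-quality filter described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the pipeline used in the study: it reads the criteria as keeping an SNP only when its read depth is at least 10 and its average base quality is at least 30, and it also requires the SNP to be anchored on a chromosome. The `SnpCall` record, its field names, and the example calls are hypothetical.

```python
from dataclasses import dataclass

MIN_DEPTH = 10           # threshold taken from the text
MIN_BASE_QUALITY = 30.0  # threshold taken from the text


@dataclass(frozen=True)
class SnpCall:
    """Hypothetical minimal record for one candidate SNP."""
    chrom: str
    pos: int
    depth: int
    mean_base_quality: float
    on_chromosome: bool  # False for calls on unanchored scaffolds


def passes_filters(call, min_depth=MIN_DEPTH, min_quality=MIN_BASE_QUALITY):
    """Keep chromosome-anchored SNPs that meet both thresholds (assumed reading)."""
    return (
        call.on_chromosome
        and call.depth >= min_depth
        and call.mean_base_quality >= min_quality
    )


# Hypothetical candidate calls; labels and positions are illustrative only.
calls = [
    SnpCall("A01", 10512, 14, 36.2, True),
    SnpCall("A01", 20877, 7, 38.0, True),          # read depth too low
    SnpCall("D05", 99310, 22, 27.5, True),         # base quality too low
    SnpCall("scaffold_12", 450, 30, 37.1, False),  # not on a chromosome
    SnpCall("D05", 120044, 11, 30.0, True),
]

kept = [c for c in calls if passes_filters(c)]
print([(c.chrom, c.pos) for c in kept])
```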

QTL Analysis for Yield and Fiber Quality Traits
A likelihood ratio test based on stepwise regression (RSTEP-LRT) was used to identify QTLs in the ILs [49]. The software QTL IciMapping 4.1 (http://www.isbreeding.net) was used to detect QTL effects. An LOD threshold of 3.0 was used to declare significant additive QTLs. QTLs were named as follows: q + trait abbreviation + chromosome number + QTL number [50].
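The significance threshold and the naming rule above can be illustrated with a short Python sketch. An LOD score is the base-10 logarithm of a likelihood ratio, so LOD = 3.0 corresponds to odds of 1000:1 and to a likelihood ratio test statistic of 2·ln(10)·LOD ≈ 13.8. The hyphen separator, the chromosome label format, and the function names are assumptions made for illustration; the text specifies only the order of the name components.

```python
import math

LOD_THRESHOLD = 3.0  # threshold taken from the text


def lod_to_lrt(lod):
    """Convert an LOD score to the equivalent likelihood ratio test statistic."""
    return 2 * math.log(10) * lod


def is_significant(lod, threshold=LOD_THRESHOLD):
    """Declare a QTL significant when its LOD score reaches the threshold."""
    return lod >= threshold


def qtl_name(trait, chromosome, index):
    """Build a name as q + trait + chromosome + QTL number (hyphens assumed)."""
    return f"q{trait}-{chromosome}-{index}"


# Hypothetical example: first fiber length QTL on a chromosome labeled A01.
print(qtl_name("FL", "A01", 1))
print(is_significant(3.4), is_significant(2.6))
print(round(lod_to_lrt(3.0), 2))
```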

Conclusions
In conclusion, we used a population of 107 ILs and confirmed its potential for identifying QTLs for fiber quality and yield traits. Seventy-four QTLs related to different fiber quality and yield traits were identified. Five QTL clusters were also identified that could be used in further breeding programs to improve fiber quality together with high yield in Upland cotton. This IL population can be used for mapping and cloning novel QTLs/genes that control the corresponding traits, and it will serve as a rich source of plant material for the cotton research community.