QTL Mapping of Fiber Quality and Yield-Related Traits in an Intra-Specific Upland Cotton Using Genotype by Sequencing (GBS)

Fiber quality and yield improvement are crucial for cotton domestication and breeding. With the transformation in spinning techniques and multiplicity needs, the development of cotton fiber quality and yield is of great importance. A genetic map of 5178 Single Nucleotide Polymorphism (SNP) markers were generated using 277 F2:3 population, from an intra-specific cross between two upland cotton accessions, CCRI35 a high fiber quality as female and Nan Dan Ba Di Da Hua (NH), with good yield properties as male parent. The map spanned 4768.098 cM with an average distance of 0.92 cM. A total of 110 Quantitative Traits Loci (QTLs) were identified for 11 traits, but only 30 QTLs were consistent in at least two environments. The highest percentage of phenotypic variance explained by a single QTL was 15.45%. Two major cluster regions were found, cluster 1 (chromosome17-D03) and cluster 2 (chromosome26-D12). Five candidate genes were identified in the two QTL cluster regions. Based on GO functional annotation, all the genes were highly correlated with fiber development, with functions such as protein kinase and phosphorylation. The five genes were associated with various fiber traits as follows: Gh_D03G0889 linked to qFM-D03_cb, Gh_D12G0093, Gh_D12G0410, Gh_D12G0435 associated with qFS-D12_cb and Gh_D12G0969 linked to qFY-D12_cb. Further structural annotation and fine mapping is needed to determine the specific role played by the five identified genes in fiber quality and yield related pathway.


Introduction
Cotton is one of the most important natural fibers and oil crops in the world. Its annual global market value was estimated to be $630.6 billion in 2011 [1]. Cotton fiber is the primary raw material in the textile industry [2]. The advancements in techniques and diversified methods of spinning have made cotton fiber quality and related yield traits of paramount significance in breeding and production of cotton [3]. Fiber quality is determined by a number of factors such as fiber strength, fiber length, fiber micronaire and fiber color, while yield is mainly determined by lint quantity [4]. However, lint yield and fiber quality have been found to be negatively correlated [5,6], which has long been a critical issue in cotton breeding [7]. Recently, Shang et al. [8] identified 20 QTLs for fiber quality-related traits, however, four QTLs were validated. Moreover, five fiber quality traits were linked to 59 QTLs in an earlier report across five environments [9]. So far, few numbers of QTLs have been employed in marker-assisted selection (MAS) which is one of the enhanced breeding methods [10]. In all the identified and documented QTLs related to fiber and yield traits, most of them have been localized in a wide range of genomic regions and are often not stable across a wide genetic backgrounds [11]. Therefore, a dense interspecific map was generated, which included 2316 loci on the 26 cotton chromosomes in order to reduce and enhance accuracy in mapping [12]. However, these maps developed from interspecific hybridization have limited use in breeding due to limitation in controlling defective genes [2,5].
To overcome the inefficiency of maps developed from interspecific hybridization, it is therefore imperative to generate molecular maps based on an intraspecific population due to their ability to reduce the wide genome gap [2]. The employment of molecular marker techniques in cotton breeding through MAS and more advanced approaches such as genomic selection (GS) [13] would help break the bottleneck and, in turn, development of genetically advantaged genotypes. A small part of a DNA can be archived by reducing the complexity of the genome by restriction enzymes, such as genotyping-by-sequencing (GBS), the reduced-representation libraries (RRLs), restriction-site-associated DNA sequencing (RAD-seq) and next generation sequencing (NGS) [14].
The next-generation sequencing (NGS) of crop plant genomes have transformed the field of plant breeding. In the recent past, a lot of data generated has facilitated the discovery and use of large scale of single nucleotide polymorphisms (SNPs) in different genomes [15,16]. One of which was, genotype by sequencing (GBS), which holds the potential to narrow down the genotyping gap between references of large interest and mapping or breeding populations of local or specific interest [17]. GBS protocol techniques with their sample multiplicity have kept molecular research costs low while their output has diverse applications in many research areas, ranging from gene discovery to genomic-assisted breeding [18]. The ability of generating large amounts of unbiased markers in an inexpensive methods, has enabled GBS to become a more attractive approach to genotype and to construct high-resolution genetic maps, genomic selection and facilitated QTL mapping [19].
Mapping of QTLs has become an important technique to facilitate quantitative trait research and has been largely used in agricultural crops to map a number of beneficial agronomic traits including fiber quality and related yield traits.
In this investigation, a genetic map of 5178 SNP markers was generated using a 277 F 2:3 intraspecific population developed from two tetraploid upland cotton accessions, mainly cultivated in China. CCRI35 with good fiber quality as female parent and Nan Dan Ba Di Da Hua (NH) known for high yield fiber as male parent. The map generated was employed to analyze QTLs related to fiber quality and yield related traits using QTL cartographer [20]. The aim of this study was to identify QTLs related to fiber quality, yield component traits, localize their position within the cotton genome and to identify the genes tightly linked to those QTLs. Findings of this research could provide valuable insights for breeders to develop cultivars with both traits, yield and quality fiber and enhance selection in cotton breeding.

Phenotypic Variation between the Two Parents
In the determination of phenotypic variation of the 11 measured traits, Boll weight (BW), lint percentage (LP), fiber reflectance (FR), fiber yellowness (FY), spinning consistency index (SCI), and mature index (MI) were not used in analysis of the phenotypic variation between the parental lines due to the huge missing data throughout the phenotyping periods. The five traits used were fiber length (FL), fiber uniformity (FU), fiber strength (FS), fiber micronaire (FM) and fiber elongation (FE). FL, FU and FS showed significant differences between the parental lines. All traits were higher in CCRI35 than NH with exemption of FE which was higher in NH. In addition, no significant difference was noted between the two parental lines for fiber micronaire (FM) and fiber elongation (FE), Figure 1. However, there was a wide range of phenotypic variation among the F 2:3 population, with respect to all the measured traits; BW, LP, FL, FU, FM, FS, FE, FR, FY, SCI and MI. Across the three environments, 2014, 2015 and 2016 all the traits showed normal segregation with normalized distribution patterns ( Figure 2). Int. J. Mol. Sci. 2018, 19, x FOR PEER REVIEW 3 of 25 in CCRI35 than NH with exemption of FE which was higher in NH. In addition, no significant difference was noted between the two parental lines for fiber micronaire (FM) and fiber elongation (FE), Figure 1. However, there was a wide range of phenotypic variation among the F2:3 population,  with respect to all the measured traits; BW, LP, FL, FU, FM, FS, FE, FR, FY, SCI and MI. Across the  three environments, 2014, 2015 and 2016 all the traits showed normal segregation with

Correlation Analysis
To determine the correlations among different traits, a Pearson's correlation coefficient on yield-related and fiber quality trait was done using "Performance Analytics" package with Chart correlation function in R software version 3.4.2 [21]. Significant and positive correlations were noted between: BW with FL, FU, FM, FS, FE, FR, and MI; LP with FM and MI;FL with FU, FS, FE, FR, and SCI; FU with FS, FE, and SCI; FM with MI; FS with FE and SCI; FE with SCI and finally FR with SCI. Negative correlations were observed between: LP with FR and SCI; FL with FM; FM with FR and SCI; FY with SCI and finally SCI with MI ( Figure 3). However, no significant correlation was noted between the other traits.

Correlation Analysis
To determine the correlations among different traits, a Pearson's correlation coefficient on yield-related and fiber quality trait was done using "Performance Analytics" package with Chart correlation function in R software version 3.4.2 [21]. Significant and positive correlations were noted between: BW with FL, FU, FM, FS, FE, FR, and MI; LP with FM and MI;FL with FU, FS, FE, FR, and SCI; FU with FS, FE, and SCI; FM with MI; FS with FE and SCI; FE with SCI and finally FR with SCI. Negative correlations were observed between: LP with FR and SCI; FL with FM; FM with FR and SCI; FY with SCI and finally SCI with MI ( Figure 3). However, no significant correlation was noted between the other traits.

ANOVA, Broad Sense Heritability and Phenotypic Analysis of Fiber Quality for the Two Parents and the F2:3 Population
The ANOVA result revealed significant differences between the genotypes, environment and their interactions for all the traits ( Table 1).
The broad sense heritability was much higher for the fiber quality traits as opposed to yield-related traits. The highest broad sense heritability was observed with fiber micronaire (FM), with 92.4% while the lowest broad sense heritability was observed in fiber elongation (FE) with 61.8%.  The ANOVA result revealed significant differences between the genotypes, environment and their interactions for all the traits ( Table 1).
The broad sense heritability was much higher for the fiber quality traits as opposed to yield-related traits. The highest broad sense heritability was observed with fiber micronaire (FM), with 92.4% while the lowest broad sense heritability was observed in fiber elongation (FE) with 61.8%.

GBS Genotyping, SNP Detection and Annotation
The genotypic data for the entire population was developed by use of the genotyping by sequencing (GBS) technique. Fifteen (15) individuals of each of the parents were sequenced and mapped on to the reference genome, which we obtained from the cotton research institute (available online: http://mascotton.njau.edu.cn). We obtained a total of 20,542,731 and 20,244,825 reads for CCRI35 and NH, respectively. An average of 80,372 and 112,128 SNPs were eventually identified for the female parent (CCRI35) and the male parent (NH), respectively, with an enzyme digestion efficiency of 99%. In genotyping the F 2:3 population, the enzyme efficiency was slightly lower compared to its efficiency in the parents, with efficiency of 98.9%. The overall mapped reads for the population and the two parents were 1,507,193,217, with an average of 4,909,424.16 mapped reads per individual which correspond to nearly 180.889 Gb of clean bases. The clean reads obtained were equivalent to 80.42-fold haploid genome coverage of raw paired-end Illumina reads by sequencing whole genome shotgun (WGS) libraries of homozygous cv. "TM-1" compared to Li et al. [22] in their study which generated a total of 445.7 Gb of clean reads translating to about 181-fold haploid genome coverage of raw paired-end Illumina reads by sequencing whole genome shotgun (WGS) libraries of homozygous cv. "TM-1" with fragment lengths ranging from 250 bp to 40 kb. The average GC content of the sequences was 38.25%, with a Q20 score of 94.66%. The parental lines were genotypes such as AC and AA, in which the female parent CCRI35 was heterozygous while the male parent (NH) was homozygous. The total resulting SNPs markers were 103,381 markers which were used to carry out further analysis. We assessed the distribution of the alleles across the F 2:3 population, and those markers which had a coverage threshold of 75% were filtered out, eventually, 34,090 markers were used. Markers with significant distortion (p < 0.001) were filtered and 6405 markers were retained with the purpose of determining bin markers.

Construction of the Linkage Maps
In the construction of the linkage groups, we used 6405 markers (Table S1) and phenotypic data of the F 2:3 population developed from an intra-specific cross of two tetraploid upland cottons were utilized for developing the intra-specific linkage map. A total of 5178 GBS markers were used for mapping the F 2:3 population, all the distorted markers were filtered out, the linkage groups were generated by the use of Join Map 4.0 [23]. Twenty six (26) LGs were generated from 5178 markers ( Figure 4A and Figure S1, Table 2 and Table S2). Markers in linkage groups were ordered, rippled, and re-ordered according to pairwise recombination fractions, LOD scores (Logarithm of Odds) and linkage group length ( Figure 4B). The 26 LGs were designated as A01 to A13 for A t sub-genome and D01 to D13 for D t sub-genome. The map generated had a map distance of 4768.098 cM, higher than the most current upland cotton linkage map with a map distance of 4450 cM [24]. The average distance between adjacent markers was 0.92 cM, the marker distances were narrowed in the map generated compared to earlier maps with 1.7 cM between adjacent markers [24]. The At sub-genome spanned 2611.43 cM, with a total of 3313 markers in the 13 linkage groups, with an average distance of 0.79 cM, while in D t sub-genome, thirteen linkage groups comprised of 1865 markers spanning a distance of 2156.67 cM, with an average of 1.156 cM. The maximum gap between adjacent loci was 26.598 cM and 30.082 cM in A t and D t respectively, affirming the genome lengths between A t and D t [24] (Table 2). Chromosomes; A02, D02, A01, A05, A03, D01 and A10 exhibited higher marker loci with higher recombination frequency compared to the rest of the chromosomes such as D06 and D13 ( Figure 4A,B). The chromosome with the highest marker loci was chromosome A02, 705 loci with map distance of 346.314 cM and an average distance of 0.49 cM, while the lowest marker loci was detected in chromosome D06 with only 16 markers, and a total length of 79.084 cM ( Figure 4B).

Identification of Consistent and Clustering QTLs for Yield Related and Fiber Quality Traits
Thirty (30) QTLs were consistent among all the 110 QTLs identified for 11 traits in at least two environments (Table 3 and Figure S1). The 30 consistent QTLs were located on 16 chromosomes; (4), and D13 (1). The distribution of the QTLs within the identified chromosomes, exhibited multiple position as illustrated in Figure S1 and Table S2 and Table 3. Of the 30 detected QTLs, 11 were localized on A t sub-genome while the remaining 19 were mapped on the D t sub-genome. The contributions of the parents toward the QTLs: 19 QTLs were linked to the good fiber quality parent (CCRI35) while only 11 QTLs were contributed by the high yield fiber parent (NH). Only16 chromosomes out of 26 were found to harbor consistent QTLs for ten traits except MI (Mature Index) for yield-related and fiber quality (Table S2 and Table 3).
Four types of gene actions were revealed by the genetic effects of which one gene exhibited dominant effects (De), four partial dominances (PD), 20 over dominances (OD) and five additive effects (Ae). OD was observed for most of the traits in response to yield-related and fiber quality traits.
The highest percentage of phenotypic variance explained by a single QTL was 15.45%. The highest percentage of phenotypic variance was noted in lint percentage (LP), with a range of 10.03-15.46%. The distribution of the QTLs within the identified chromosomes, exhibited multiple positions in some chromosomes; A02, A03, A09, A10, D01, D03, D05, D08, D12, and D13 as illustrated in Table S2 and  Table 3 and Figure S1. Moreover, a total of two important clusters with more than three traits per region, with high broad sense heritability and high percentage of phenotypic variation were identified as D03 (c17) and D12 (c26), which we designated as cluster 1 and cluster 2, respectively (Table 3, Figures 5 and 6).

The Gene Ontology Enrichment Analysis Based on QTL Clusters
Based on phenotype variation and QTL frequency, D t -sub genome of the whole tetraploid chromosomes, harbored the highest number of stable QTLs with the highest level of phenotypic variation. In lieu of this, chromosome 17 (D03) and chromosome 26 (D12) had two clusters with four QTLS in each. Within the two cluster regions, we were able to mine the putative genes which could be having a role in fiber and yield-related traits. In cluster 1 (Chr17, D03), 136 genes were obtained, in which 14 were found to be highly expressed based on the RNA sequence while in cluster 2 (Chr26, D12), a total of 1280 genes were mined, out of which 153 were highly expressed at various stages of fiber development, 5 DPA, 10 DPA, 20 DPA and 25 DPA.
Moreover, in order to identify the set of the most robust candidate genes for yield-related traits and fiber quality; we mainly focused on the 153 highly expressed genes as obtained from "TM-1"_RNA-seq data (available online: http://mascotton.njau.edu.cn). Out of 153 highly expressed genes, five showed high level of expression across the various stages of fiber development, and therefore, the five genes could be the potential candidate genes with greater roles in the regulation of various fiber traits Table S3. Furthermore, all the five genes were localized in different positions of the genome: one gene (Gh_D03G0889) was located in cluster 1 (D03 (Chr 17)) within the marker mk12119_D03 (30,535,745 bp) to marker mk12123_D03 (30,566,883 bp), the trait localized in this region was fiber micronaire (FM); while the other three genes: Gh_D12G0093, Gh_D12G0410, and Gh_D12G0435 were localized in cluster 2 (D12 (c26)) within the marker mk19853 (101,319 bp) to marker mk17913_D12 (13,479,261 bp), the trait localized in the genome region was fiber strength (FS). Finally, the fifth gene, Gh_D12G0969 was also mapped in cluster 2 (D12 (Chr 26)), from marker mk1009 (18,989 bp) to marker mk17992_D12 (37,732,030 bp), the trait localized in that area was fiber yellowness (FY). Based on the expression profile and GO functional annotation, these five genes were therefore found to be the most robust and possibly the putative candidate genes for fiber quality and yield related traits (Table S3, Figures 7 and 8).
Based on GO enrichment analysis, the five highly up regulated genes were as follows: Gh_D03G0889 was mainly involved in molecular function and biological processes, such as, up regulation of translational elongation (GO: 0003746), poly-A RNA binding (GO: 0003723), ribosome receptor activity (GO: 0043022), hypusine anabolism (GO: 0008612), translation elongation factor (GO: 0003746), regulation of translation elongation (GO: 0045901) and regulation of translation termination (GO: 0045905). The second gene, Gh_D12G0093 was involved only in molecular function, protein amino acid binding (GO: 0005515). The third gene, Gh_D12G0410 was involved in all the GO functional annotation, in biological process, it was mainly involved in translation elongation (GO: 0006414), molecular function, it was mainly involved in translation elongation factor activity (GO: 0003746) and protein binding (GO: 0005515) while in cellular component, it was found to be involved in eukaryotic translation elongation factor 1 complex (GO: 0005853). The fourth gene, Gh_D12G0435, had no functional annotation, however it was found to function in nucleoside diphosphate kinase activity and the last gene, Gh_D12G0969, functions both in biological process and molecular function, nucleoside diphosphate kinase activity (GO: 0004550), nucleoside diphosphate phosphorylation (GO: 0006165), GTP biosynthetic process (GO: 0006183), UTP biosynthetic process (GO: 0006228), CTP biosynthetic process (GO: 0006241) and ATP binding (GO: 0005524). In relation to gene action analysis, the five putative and robust genes with direct role in fiber development in cotton were all contributed by the female parent, CCRI35, known for its superior fiber quality (Table S3 and Figures 7 and 8). The five genes had similar sequences based on phylogenetic tree analysis; the same was affirmed by their expression profile and all from D t -sub genome. High quality fiber attributes are highly linked to the D-genome of the diploid cotton such as G. barbadense, and being tetraploid cotton originated from the polyploidization of the A and D genomes of the diploid cotton.

Discussion
The determination of stable QTLs for superior agronomical traits and the construction of a high-resolution map are crucial for MAS. Several intra-specific genetic maps have been generated and used for QTL detection related to fiber and yield components [2]. Even though these maps have

Discussion
The determination of stable QTLs for superior agronomical traits and the construction of a high-resolution map are crucial for MAS. Several intra-specific genetic maps have been generated and used for QTL detection related to fiber and yield components [2]. Even though these maps have

Discussion
The determination of stable QTLs for superior agronomical traits and the construction of a high-resolution map are crucial for MAS. Several intra-specific genetic maps have been generated and used for QTL detection related to fiber and yield components [2]. Even though these maps have been used, they are limited in scope and accuracy due to huge marker intervals and narrow genome coverage. The greatest impediment in the construction of a high-resolution map in intraspecific crosses is due to low rate of polymorphism within G. hirsutum and the presence of fixed homozygous genetic blocks [11,25]. Therefore, there is a need to find additional markers to fill in the gaps in the genetic map [11]. In this current research, a genetic map consisting of 5178 SNP markers obtained through the GBS technique was developed using a 277 F 2:3 population derived from an intra-specific cross. In addition, the contrasting difference between the two parental lines used in this investigation could be explained based on inherent genetic characteristics. The male parent is known for superior agronomic traits such as early flowering and the ability to generate a high percentage of fruits with large size, while the female parent is known for superior fiber traits. Fiber length (FL), fiber uniformity (FU), fiber micronaire (FM), and fiber strength (FS) showed significant differences between the two parental lines. These traits were attributed to CCRI35, except FE which was linked to NH. There was no significant difference noted between the two parents for FE and FM. This result confirmed the good quality fiber trait of the female parent, CCRI35, compared to the male, NH.
In addition, there was a wide range of phenotypic variation among the F 2:3 population, with respects to the following measured traits: BW, LP, FL, FU, FM, FS, FE, FR, FY, SCI, and MI. In the three environments, all traits exhibited normal segregation patterns, with equal distribution. The low absolute values for skewness and kurtosis showed that these traits had normal distribution. In addition, in the F 2:3 population, the maximum phenotypic data values in all the variables were much higher than in CCRI35, the parent known for superior fiber traits, fiber length (FL), fiber uniformity (FU), fiber micronaire (FM), fiber strength (FS), and fiber elongation (FE). This finding showed that all traits were transgressively segregated in the F 2:3 population. Previous research reported that transgression was the difference observed between the mapping parents of upland cotton [11,[26][27][28].
Furthermore, positive correlations were noted between the following traits: boll weight (BW) with fiber length (FL), fiber uniformity (FU), fiber micronaire (FM), fiber strength (FS), fiber elongation (FE), fiber reflectance (FR), and mature index (MI); lint percentage (LP) with FM, FL with FU, FS, FE, and spinning consistency index (SCI); FU with FS, FE, and SCI; FM with MI; FS with FE, and SCI; FE with SCI and finally FR with SCI. However negative correlations were observed in the following traits: LP with FR, and SCI; FL with FM; FM with FR, and SCI; finally, SCI with MI. This result is consistent with those from Jamshed et al. [11] which showed that positive correlations were observed between: fiber elongation (FE), fiber length (FL), fiber strength (FS), and fiber uniformity (FU), with a significance level of 0.01. Moreover, FL and FS were both negatively correlated with fiber micronaire (FM). In this study, the correlations between FM with and FS were found to be negative but were not significant, which does not agree with previous findings. This deviation could be attributed to the population background used in this study.
A total map distance of 4768.098 cM was generated, higher than the most current linkage map with a map distance of the 4450 cM of cotton genome [24]. This is the densest intra-specific map developed in upland cotton. This map could be helpful for further studies in MAS, especially in fine mapping. The average distance of the adjacent markers was 0.92 cM. A t sub-genome spanned 2611.43 cM, and consisted of 3313 markers with 13 LGs. The average marker distance in A t sub-genome was 0.79 cM with a maximum gap of 26.598 cM of the adjacent markers. In D t sub-genome, 13 LGs were assigned which comprised of 1865 markers spanning 2156.67 cM, with an average of 1.156 cM. The maximum gap was 30.082 cM between adjacent loci. Due to the nature of upland cotton genome, mapping QTLs not only for fiber as in this research but for other agronomic traits has been difficult. This is because of the narrow genetic background, which resulted in low diversity of alleles with a significant role in fiber quality traits between two given varieties [30]. Therefore, only few QTLs could be mapped based on two parent crossing populations, which has been verified by previous reports [3,6,[31][32][33][34][35][36][37][38][39]. In this current study, a total of 110 QTLs were identified for 11 traits, but only 30 QTLs were consistent in at least two environments. The 30 consistent QTLs were located on 16 chromosomes; A02, A03, A05, A09, A10, A12, D01, D02, D03, D04, D05, D08, D10, D11, D12, and D13 with 2, 1, 2, 3, 2, 1, 1, 1, 4, 1, 2, 2, 2, 1, 4, and 1 QTL respectively. Of the 30 detected QTLs, 11 were located on A t sub-genome while the remaining 19 were located on the D t sub-genome. This finding is consistent with previous reports in which 58 QTLs were found on the A t sub-genome, whereas 107 QTLs were localized on the D t sub-genome [11]. Fifty-eight QTLs were located on the A t sub-genome (Chr01-Chr13), and 73 QTLs on the D t sub-genome [25]. These QTLs explained from 2.03 to 16.85% of phenotypic variation, with an average of 6.26% explained in all five fiber quality traits [40].
Most of the QTLs distributed in the cotton genome revealed the complexity of the cotton genome and arduousness of QTL mapping in cotton. Therefore, comparing our QTLs with other QTLs mapped from previous studies could be of great help in determining the reliability of the QTLs detected [41]. Up to now, 4268 QTLs from 140 publications of cotton have been documented in the collected Cotton QTL Database (available online: http://www2.cottonqtldb.org:8081/index). In this study, the GBS-SNP markers are unique and thus lack common identity with the SSR-based markers. However, five QTL clusters in this investigation were found to have a common bearing to those documented by Said et al. [42], which have been known as one of the strongest references in QTL mapping in recent years. The five QTL clusters were: cluster A07 was identical to c7-cluster-Gh × Gb-4:55-79 cM; cluster A08 had an approximate position of 4.81-110.81 cM, which was similar to c8-cluster-Gh-2:21-31 cM; cluster D01, had an approximate position of 2.21-139.31 cM, similar to c15-cluster-Gh-3:49-68 cM; cluster D02, had an approximate position of 0.01-206.11 cM, similar to c14-cluster-Gh-2:76-91 cM and lastly cluster D08, had an approximate position of 100.71-208.61 cM, nearby to c24-cluster-Gh-2:41-62 cM. The high correlation of the QTLs detected in this study to the previous finding, provides the opportunity for the utilization of these QTLs in MAS to improve the fiber quality of Upland cotton.
Gene Ontology enrichment analysis revealed five genes with very high expression and were linked to three fiber quality traits, FM, FS, and FY. Interestingly, the five genes took their alleles from the parental line known for superior fiber quality CCRI35. This result supported our study. Our findings provide an opportunity in the improvement for fiber quality especially fiber color (FY: fiber yellowness). Cotton fiber development occurs through various stages, namely fiber initiation, elongation, secondary cell wall formation and maturation [43]. Cotton fiber development is controlled by a multi-complex of genes interactions rather than a single gene effect [44,45]. GhD12G0969 was mainly found to have a functional role in phosphorylation; phosphorylation is a process mediated by protein kinases to activate critical cellular pathways such as metabolism, cell division and cell differentiation during initiation stages in cotton fiber development [46]. In addition, Gh_D12G0435 was found to be involved in kinase activity; protein kinase activity plays an important role in signal transduction through the phosphorylation process during cotton fiber development [47]. Therefore, the five highly up regulated genes could possibly be the key genes with major functional roles in fiber development and in turn superior quality as evident in the CCRI35, female parent.

Plant Materials, Growth Conditions and Trait Data Collection
The accessions used in this research were, Nan Dan Ba Di Da Hua in Chinese annotation, but for simplicity, we abbreviated the name as (NH), the male parent; it has moderate fiber quality traits but high yielding in fiber [48,49]. The female parent was Zhong35, also with the Chinese name, was then abbreviated as CCRI35; it is known for high fiber quality traits but with moderate yield [9]. The parental lines and 277 F 2:3 population were evaluated for fiber quality traits and yield components in Anyang research station (36 • 100 N, 114 • 350 E), Henan province, Yellow River. The field experiment was carried out during summer periods in three consecutive years, 2014, 2015 and 2016. The experimental layout adopted, was complete randomized block design (CRBD) with three replicates. The plot sizes were 5 m long with row spacing of 0.75 m. Fiber quality and yield component traits were collected following the laid down scheme as described by [41]. Fully opened bolls in each sampled plant were collected within the middle region of the plant, 25 bolls were collected from each line for fiber quality and yielded component determination. The balls were ginned for the determination of lint percentage (LP), fiber length (FL), fiber uniformity (FU), fiber micronaire (FM), fiber elongation (FE), fiber strength (FS), fiber reflectance (FR), fiber yellowness [50], spanning consistency index (SCI) and mature index (MI) by the HVI 900 fiber testing system, which was done in our cotton fiber quality testing unit, cotton research institute, Anyang, China. The test conditions were set with temperature at 20 • C and relative humidity of 65%.

DNA Extraction, Quantification and Quality Determination
Fresh leaf samples were obtained from each line, together with the two parents and immediately frozen in liquid nitrogen then stored under −80 • C before DNA extraction. DNA of the F 2:3 populations of 277 individuals and 10 samples for individual parents was extracted by the CTAB method as described by Paterson et al. [51]. Each sample was then crushed separately in liquid nitrogen to fine powder, then immediately added to CTAB solution. In every 100 mg ground tissues, we added 500 µL of CTAB Buffer. The samples were then shaken for 15 min then centrifuged. The centrifuged mixture was then put in a water bath at 60 • C for 30 min. Then, samples were centrifuged for 5 min at 12,000 revolutions per minute (rpm) for 5 min. After centrifuging, the supernatant transferred to a new tube. Then, 5 µL of RNase solution was added to digest RNA and then incubated for 20 min at 32 • C. Equal amount in volume of chloroform/isoamyl alcohol (24:1) was added then shaken for 5 s before centrifuging the samples for 1 min to separate the phases. We pippeted the aqueous upper phase to a new tube; the method was then redone until the upper phase was clear. The upper clear phase was then pipetted into a new tube. DNA samples were later precipitated by adding 70% by volume of ice-cold isopropanol and incubated for 15 min at −20 • C. The condensed DNA samples were then centrifuged at 12,000 rpm for 10 min. The supernatant was then decanted and subsequently washed with 500 µL ice cold 70% ethanol twice then absolute alcohol. DNAs were later dissolved in 20 µL TE buffer (10 mM Tris, pH 8, 1 mM EDTA) [52]. The degradation and contamination of DNA was checked through 1% agarose gels. The purity of DNA was determined by using a Nano Photometer ® spectrophotometer (IMPLEN, Westlake Village, CA, USA). The ratio of absorbance at 260 and 280 nm was used to assess the purity of DNA. The DNA samples with the ratio of~1.8 were then qualified as pure [53]. The concentration of DNA was done by using Qubit ® DNA Assay Kit in Qubit ® 2.0 Fluorimeter (Life Technologies, Carlsbad, CA, USA). The Qubit ® dsDNA HS (High Sensitivity) Assay Kits make DNA quantitation easy and accurate. The kits contain concentrated assay reagent, dilution buffer, and prediluted DNA standards. The reagents were mixed with the buffer solution, and then added 1-20 µL of each DNA samples.

GBS Library Preparation, Sequencing and SNP Genotyping
GBS is a low cost and an efficient method of large-scale genotyping, which is based on high-throughput sequencing but with a reduced-representation library (RRL). The following were step by step processes in GBS technique; firstly, we carried out a GBS pre-design experiment to test the accuracy of the GBS protocol and quality of the output data. The enzymes and sizes of restriction fragments were examined by using training data. Three basic criteria were followed: (a) the suitability of the number of tags to the project needs; (b) the homogenous distribution of the enzymatic tags throughout the examined sequences; (c) elimination of redundant tags (repeated tags must be avoided). This was to ensure the effectiveness and accuracy of data obtained from GBS reads; 50 bp was the selection criterion to ensure sequence depth uniformity.
Secondly, we constructed the GBS library using the pre-designed scheme. The genomic DNA of the F 2:3 population were incubated at 37 • C with MseI Restriction Enzyme obtained from New England BioLabs (Ipswich, MA, USA), NEB, T4 DNA ligase and ATP. MseI Y adapter N containing barcode. Restriction-ligation reactions were activated at 65 • C, followed by digestion for additional restriction enzyme NlaIII at a temperature of 37 • C. The samples were then purified by using Agencourt AMPure XP (Beckman, Brea, CA, USA). Then carried out polymerization chain reaction (PCR) using the purified samples, Phusion Master Mix universal primer and index primer were used to add index, complete i5 and i7 sequence. The Agencourt AMPure XP (Beckman) was used to purify the PCR products, which were pooled then ran through 2% agarose gels. Fragments with 375-400 bp (with indexes and adaptors) in size were obtained by using a Gel Extraction Kit (Qiagen, Hilden, Germany). The isolated fragment products were then purified using Agencourt AMPure XP (Beckman), and finally diluted for sequencing.
GBS analysis was strictly carried out as outlined by Elshire et al. (2011) [54]; integrating 3 of 96-well plates across 288 barcodes for library preparation and sequencing. For SNP calling, the raw sequence data for the 277 F 2:3 population together with the F 1 generation was processed through the TASSEL 3.0 Genotype By Sequencing (GBS) pipeline [55] using the Gossypium_hirsutum_v1.1.fa as the reference genome [56] which was obtained from Cotton research institute (available online: http://mascotton.njau.edu.cn/info/1054/1118.htm), for alignment and the Burrows-Wheeler Aligner (BWA) mem [57] with default parameters. The output consisted of variant call format (VCF) file version 4.1 [58] including single nucleotide polymorphisms (SNPs) present in at least 40% of the progeny and with a minor allele frequency (MAF) 0.1. Subsequently, the data in variant call format (VCF) was filtered using VCF tools version.1.12a [58] and TASSEL [59] versions 3.0 and 4.0. A total of 93,384 single nucleotide polymorphisms (SNPs) were identified in 277 F 2:3 population by TASSEL 3.0, then a custom filtering process was applied for alignment. The filtering was based on maintaining sites with a minimum read depth of 6% and 75% completeness by site across progeny and by progeny across sites. Results were obtained as a TASSEL hapmap file.
Finally, using a custom perl script marker heterozygous in the F 1 generations and with a co-dominant segregation ratio of 1:2:1 among the F 2:3 population were identified using a chi-squared (χ 2 ) goodness-of-fit test at α ≤ 0.01. These were reconverted and imported in JoinMap ® 4.1 for linkage group generation. A total of 26 LGs were obtained, each linkage group was assigned to its corresponding chromosome by using BLASTN-search (available online: https://blast.ncbi.nlm.nih. gov/Blast.cgi), for the marker sequence.

Data Analysis and Linkage Map Construction
Analysis of variance (ANOVA) was performed by using field phenotype data of the three consecutive seasons 2014, 2015, 2016, and the combine analysis (cb). A mixed procedure was used; the genotypes and the environments were fixed as factors in order to detect the heritability [60]. Post hoc test (Turkey's) to compare means was done [60]. The broad-sense heritability percentage, H b (%), was calculated for each trait using the formula described by [61]. H = σ 2 G/σ 2 G + (σ 2 e/r) With σ 2 G is the genotypic variance; σ 2 e: phenotypic variance and r: replication.
Most of the data were analyzed using R software version 3.4.2 (R Foundation for Statistical Computing, Vienna, Austria) [21]. Markers were ordered, rippled, and re-ordered according to pairwise recombination fractions, LOD scores and linkage group length [62]. Linkage group analyses were conducted using Join Map 4.0 [23] with a recombination frequency of 0.40 and a logarithm of odds (LOD) score of 2.5 for the F 2:3 population. The Kosambi mapping function was employed in the conversion of the recombination frequencies to map distances. Each data point represented the mean of three replications. Fiber quality and yield-related traits such as boll weight (BW), lint percentage (LP), fiber length (FL), fiber uniformity (FU), fiber micronaire (FM), fiber strength (FS), fiber elongation (FE), fiber reflectance (FR), fiber yellowness (FY), spinning consistency index (SCI) and mature index (MI) were used to conduct QTL analysis. The quantitative trait loci (QTLs) were detected using composite interval mapping (CIM) [63] by WinQTL Cartographer version 2.5 [20]. In the CIM mapping method, version 6, forward-reverse regression method with 1 cM walking speed, a probability into and out of the model of p = 0.01 and window size set at 10 cM. The LOD [64] threshold value was determined by 1000 permutation tests for all traits and was used to declare the significant QTLs with a significance level of p = 0.05. In addition, QTLs with LOD threshold of 2.5 in more than one environment were considered as common QTLs based on the explanation by Lander and Kruglyak [65].
QTL nomenclature was done based on the description by Liang et al. [2]. The proportion of observed phenotypic variance explained by each QTL was estimated by the coefficient of determination R 2 (%) as a percentage. The additive and dominance effects from QTL cartographer results were used to calculate genetic effects (|d/a|). The results were used to classify the QTL as additive effect (Ae) (0-0.20), partial dominant (PD) (0.21-0.80), dominant effect (De) (0.81-1.20) and over dominant (OD) >1.20 according to Stuber et al. (1987) [66]. The graphic presentation of the linkage group and QTLs marked were created by R software version 3.4.2 [21] and Map Chart 2.2 [67], respectively

Gene Mining and Expression Analysis
In this study, only segments of linkage groups associated with significantly detected QTLs were presented. The detected consistent QTLs were used to identify the crucial candidate genes for fiber yield and fiber quality-related traits. The genes identified were searched through the available resources [68] (available online: https://cottonfgd.org). The physical position of the GBS-SNP markers flanking major QTLs for fiber quality and yield-related traits were used to find the gene located in each QTL region. The function of the identified genes was determined through gene annotation. Furthermore, the expression profile of the candidate genes was analyzed by mapping it in the "TM-1"_RNA-seq transcriptome data of cotton (available online: https://cottonfgd.org). The expression values for each gene mined were used to generate the heat map using R-software script [21].

The Gene Ontology Enrichment Analysis BaseD on QTL Clusters
In order to determine the functions of the identified genes, we carried out gene ontology enrichment analysis through online software, Blast2GO (available online: https://www.blast2go. com/). Gene ontology describes the genes in three functional annotations, namely cellular component (CC, biological process (BP) and molecular functions (MF); three functions provide information on the possible roles played by the genes in the plant; of interest were the genes responsible or fiber qualities and yield-related traits. The choice of genes used for GO analysis was based on the genes mined from the two clusters, cluster 1 (D03) and 2 (D12), which had high percentage of phenotypic variation (PV) and heritability (Hb).

Conclusions
A genetic linkage map comprising of 5178 SNP markers, obtained by the GBS genotyping method, was generated using a 277 F 2:3 population derived from an intra-specific cross of two tetraploid upland cotton. The map constructed in this study is the highest dense genetic map ever developed from an intra-specific population of the tetraploid upland cotton. The average distance of 0.92 cM was observed between adjacent markers. A total of 110 QTLs were obtained for 11 traits, however, only 30 QTLs were consistent in more than one environment. In addition, we identified 1709 genes that were found in the two main hot spot regions, named as cluster 1 and 2, with four QTLs in each. Out of the 1709 genes, 153 genes exhibited higher expression levels while the rest showed lower expression levels in all stages of fiber development. We further identified five key genes: Gh_D03G0889, Gh_D12G0093, Gh_D12G0969, Gh_D12G0410, and Gh_D12G0435 to be the candidate genes involved in fiber development. This research provides the very first foundation in which future molecular work can be done, such as cloning of the identified genes and/or saturation of the genes to boost the current elite cultivated cotton cultivars.