Application of a High-Density Single Nucleotide Polymorphism Genetic Map in Mapping Quantitative Trait Loci of Early-Maturing Traits in Upland Cotton

: (1) Background: Mapping QTLs for early-maturing traits is necessary for the development of early-maturing variety breeding. (2) Methods: In this research, a high-density genetic map (HDGM) was constructed using an F 2 population with 100 individuals and single nucleotide polymorphism markers (SNPs) developed using the genotyping-by-sequencing (GBS) method. (3) Results: The HDGM, which covered a total distance of 3167.14 cM, harbored 5454 SNPs with an average marker interval of 0.58 cM. In total, 18 QTLs for four early-maturing characters were detected and explained 11.6–46.4% of phenotypic variation (PV). Two QTLs of the whole growing period (WGP) and height of the node of the ﬁrst fruiting branch (HNFFB) were identiﬁed as stable QTLs. In total, 125 candidate genes were identiﬁed in the conﬁdence intervals of these stable QTLs. Presumably, Gh_D03G0857 may play an important role in regulating earliness. (4) Conclusions: This research will provide new information about ﬁne mapping of QTLs for earliness traits, molecular marker assisted selection (MAS) of earliness traits, and pyramiding breeding as well.


Introduction
Allotetraploid upland cotton (AD) 1 (Gossypium hirsutum L., 2n = 4x = 52) with a genome size of 2.5 Gb is a very important cash crop in the world, providing most of the natural fiber for the textile industry [1,2].With the growth of the industrial economy, and national planning and construction, the area of arable land available for agriculture is decreasing and the competition between grain and cotton for land is becoming increasingly prominent [3].The area of arable land used to plant cotton shows a downward trend, and cotton production is facing a very serious situation in China [4].Short-season cotton is one type of upland cotton, and its obvious characteristics are a short growth period, a fast growth and development process, and early maturation [5].These characteristics are not only beneficial for the innovation of the cultivation system and the increase of the multiple crop index, but are alsoeffective in reducing or avoiding the loss caused by the natural environment of low average temperature or short frost-free season.Therefore, earliness is one of the main breeding objectives and the key to improving cotton breeding.
The study on the genetic rule of characters related to earliness can provide a theoretical basis for breeding and improving early-maturing characters in upland cotton.The research results reported by Zhao et al. 1974 showed that broad-sense heritability of the time of a FT, YPBF, FBP, NFFB, and HNNFB were mapped [32][33][34][35].In the results mentioned above, higher density molecular markers in the chromosomes in cotton were used to correlate QTLs of early traits, which will further promote MAS breeding, fine mapping of QTLs determining early maturity, cloning, and functional studies of genes related to early traits in upland cotton.
Here, an F 2 family, containing 100 individuals, and its F 2: 3 family were developed from two upland cotton lines, vsp and TM-1.We attempted to use the F 2 population to construct an HDGM to identify QTLs for precocious traits, and possibly candidate genes for WGP and HNNFB traits.Finally, 100 individuals in the F 2 population were genotyped using 5454 SNP markers on the HDGM.This map was utilized to identify QTLs for FT, WGP, NFFB, and HNFFB across two environments.

Plant Materials and Mapping Populations
The upland cotton line TM-1 (genetic standard line) and virescent mutant by space mutation (vsp) were selected as crossing parents.The vsp was obtained by space mutagenesis of Zhongmian Suo 58 seeds (CCRI58) [36,37].CCRI58, an early-maturing cotton variety, was developed from the cross (SGKZhong27 × 92-047) [38].In 2015, an intraspecific F 2 population including 100 plants was developed by crossing TM-1 as a female with vsp as a male parent and self-producing F 1 plants in the winter field in Sanya, Hainan Province, China [39].In 2016, F 2:3 was developed by selfing the 100 plants of the F 2 population.

Field Experiments and Phenotypic Observations
In 2015, the three cotton lines (TM-1, CCRI58, vsp) and the F 2 population were planted in the experimental field in Anyang, China (36 • 08 N, 114 • 48 E).In field experiments, the three cotton lines TM-1, CCRI58, and vsp were planted sequentially in adjacent plots.At least 20 plants were planted in each of the three cotton lines and at least 10 individuals were selected to detect the FT (days from sowing to first flowering), WGP (days from sowing to first boll opening), and NFFB.In 2016, the F 2:3 family and the three cotton lines were cultivated under the same environmental conditions in Anyang, China (36 • 08 N, 114 • 48 E).Field experiments followed a complete block design with three biological replications.FT (days from sowing to first flowering), WGP (days from sowing to first boll opening), NFFB, and HNFFB were investigated.At least five plants in the three cotton lines for each replicate were randomly selected and used to investigate the four traits; the average was used in one replicate.In the F 2 family, a single plant was investigated.In the F 2:3 family, three independent biological replicates were carried out and each replicate included at least 10 plants.

DNA Extraction
The modified CTAB method was used to isolate genomic DNA from young leaf tissues of the two parents and the 100 individuals in the F 2 population [40,41], and the method described by Dong et al. 2018 [42] was utilized to detect the concentration, purity, contamination, and degradation of extracted DNA.

GBS Library Construction and Illumina Sequencing
The GBS technique was used for the library construction of the two parents (vsp and TM-1) and 100 F 2 progeny.After the distribution and number of restriction enzyme cutting sites were evaluated, various combinations of endonucleases were used to cut the reference genome of Gossypium hirsutum L. [1]; the genomic DNA of the samples was digested with MseI, NlaIII, and EcoRI endonucleases (New England Biolabs, Ipswich, MA, USA).The digested genomic DNA was then amplified in multiplex and 375-400 bp fragments were selected to construct the GBS library.Paired-end sequencing (PE) 150 was performed using the Illumina HiSeq TM2500 platform (Illumina, San Diego, CA, USA).The specific steps of the GBS refer to the method reported by Zhou et al. 2016 [43] and Zhang et al. 2016 [44].

SNP Development
To obtain clean data, raw paired reads with adapter contamination, low-quality bases, and unmeasured bases were removed in accordance with quality control (QC) standards.QC procedures were as follows: (1) removing reads with adapters; (2) removing reads with >50% bases having low quality (Q ≤ 5); and (3) removing reads with ≥10% unidentified nucleotides (N).By using BWA software (version: 0.7.12), clean reads of all samples were aligned to the reference genome [1,45].Using SAM Tools software (version: 1.2), alignment files were converted to BAM files and alignment results were sorted [46].The Perl script was used for the statistics of alignment rate and coverage.The reads that were aligned to a unique location on the reference genome were selected for subsequent analysis by filtering the results of the BWA alignment.SNP calling was carried out for each sample using GATK software (version: 3.4) [47] to detect SNPs between the TM-1 line and the vsp line, and to genotype 100 individuals from the F 2 population, and the base support number of SNP was no less than 3 and no more than 1000.The segregation patterns of the SNPs included nn × np, lm × ll, hk × hk, ef × eg, cc × ab, ab × cd, ab × cc, and aa × bb.

Genetic Map Construction
The SNPs with aa × bb segregation pattern were used to construct the genetic map.These genetic markers were further screened by the following standards: (1) remove base types that were not present in the parents as a result of the progeny genotyping; (2) remove SNPs with integrity < 75% of all offspring; and (3) remove SNP markers with segregation distortion (p < 0.05).The SNP markers were sorted and divided into each LG according to the chromosomes of the reference genome.JoinMap ® 4.0 and Perl SVG (https://metacpan.org/pod/SVG) were used to build the linkage map [48].

QTL Mapping
Based on the linkage map and the phenotypic data, QTLs were detected using Win-QTLCart2.5 [49].The permutation test of the MapQTL software (version: 5.0) was used to determine the logarithm of odds (LOD) thresholds for QTLs.A LOD threshold of 2.5 was utilized to declare the presence of QTLs.Composite interval mapping (CIM) was used to detect any significant association between marker loci and phenotypic traits in the data sets [50,51].

Comparison of the Physical Positions of SNP Markers Linked to QTLs
The 200 bp DNA sequences flanking the SNP markers were selected from the reference genome [1,22] using BLAST (NCBI).The selected DNA sequences were mapped on the reference genome in the upland cotton [23], and the subject identification (ID) corresponding to the query ID was identified using the selection criteria with the highest percentage identity, the longest alignment length, the minimum e-value, and the highest bit score.

Candidate Gene Annotation
The SNPs that flank the confidence intervals, including QTLs detected in the F 2 and F 2:3 populations, were used to identify candidate genes.According to the physical location of these markers, the identifications of the candidate genes were obtained and the coding sequences (CDS) of the candidate genes were extracted from the reference genome [1].A gene ontology (GO) analysis was carried out to categorize candidate genes [52].The hypergeometric test was carried out and the Benjamini and Hochberg correction was performed for the FDR correction; the terms were considered enriched terms according to the corrected p-value < 0.05.

Phenotypic Characteristics of Traits Related to
Early Maturity in the Two Parents, the F 2 Population, and the F 2:3 Population On 25 May 2015 and on 19 May 2016, the three cotton lines were grown in the experimental fields in Anyang, China and the traits FT, WGP, NFFB, and HNFFB in TM-1, CCRI58, and vsp were analyzed (Supplementary File S1).The results of the phenotypic analysis showed that FT, WGP, NFFB, and HNFFB did not have obvious differences between CCRI58 and vsp, but significant differences were observed between TM-1 and CCRI58 or between TM-1 and vsp (Table 1).Therefore, TM-1 and vsp can be used to construct genetic populations that were used to map QTLs of early-maturity traits.  1 according to the mean ± SD form.( 2) In 2016, three biological replicates were obtained from the TM-1, the CCRI58, and the vsp, respectively; each biological replicate included at least 5 individuals and the mean data from these individuals was used for analysis.The mean and standard deviation (SD) of the three biological replicates are listed in Table 1 according to the mean ± SD form.(3) "--" in Table 1 denotes that the data was lost.
Significant variation in FT, WGP, NFFB, and HNNFB traits was revealed in the F 2:3 and F 2 populations (Table 2).The data distributions of four early-maturity traits in the F 2 population and the F 2:3 population are approximately symmetrical and the data distribution curve is high in the middle and low at both ends (Table 2, Figure 1).The average value of the four early-maturity traits fell between the values of the TM-1 and vsp lines; the absolute values of kurtosis and skewness for the traits of FT, WGP, and HNFFB in the F 2 population, and the absolute values of kurtosis and skewness for the four traits in the F 2:3 population were less than 1; the skewness value for the NFFB trait was less than 1 and the absolute kurtosis value for the NFFB was 1.21 in the F 2 population (Tables 1 and 2, Figure 1).The results of the hypothesis tests for the kurtosis coefficient and the skewness coefficient showed that the data distributions of FT and HNFFB in the population F 2 and FT, WGP and NFFB in the population F 2:3 corresponded to the normal distribution (Table 2).The results of a K-S test showed that the data distribution of HNFFB in the F 2 population and in the F 2:3 population corresponded to the normal distribution (Table 2).The results of a K-S test showed that the distribution of the FT, WGP, and NFFB data in the F 2 population and in the F 2:3 population was abnormal (Table 2).The results of hypothesis tests for the kurtosis coefficient showed that the WGP and NFFB data in the F 2 population corresponded to the normal distribution, but the hypothesis tests for the skewness coefficient did not correspond to the normal distribution (Table 2).The distribution of WGP and NFFB data in the F 2 population is roughly symmetrical (Table 2, Figure 1).Pearson's correlation coefficients of the four traits were analyzed in the F 2 population and in the F 2:3 population (Table 3).Except for the correlation coefficient between the WGP in the F 2 population and the HNNFB in the F 2:3 population, other correlation coefficients were positively correlated with each other (Table 3).The above-mentioned results showed that the data can be used for QTL mapping.Note: (1) In 2015, the four traits were investigated from 100 individuals in the F 2 population; the mean and standard deviation (SD) listed in Table 2 were calculated according to the investigation.( 2) In 2016, the four traits of the F 2:3 families were investigated and three biological replicates were carried out; the mean of the three replicates was used for analysis.The mean and standard deviation (SD) of the F 2:3 families were listed in Table 2 according to the mean ± SD form.

Genotyping by Sequencing (GBS)
In total, 152.20 Gb of clean data containing 922.02 Mb of clean reads from 100 individuals and two parents were obtained using GBS and the Illumina HiSeq TM2500 platform; each of the pair reads was approximately 150 bp (Table S1).Among these clean bases, on average 94.94% of them were of high quality with Q20 and 87.77% of them were of high quality with Q30; effective rate and error rate of the bases were 99.58% and 0.04%, respectively; guanine cytosine content of the bases was 36.58%(Table S1).The sequencing depth corresponded to 9.64-fold in the 100 progeny of the F2 population, 30.99-fold in the vsp cotton line, and 30.91-fold in the TM-1 cotton line (Table S1).The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive (Genomics, Proteomics & Bioinformatics 2021) [53] in the National Genomics Data Center (Nucleic Acids Res 2022) [54], China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (GSA: CRA012341) and are publicly accessible at https://ngdc.cncb.ac.cn/gsa accessed on 26 October 2023.
Clean reads were aligned with the reference genome to call SNP [1].The mean mapping rate of clean reads was 99.83% in the two parents and 99.76% in the F2 progenies; the coverage (1×) and coverage (4×) were averagely 19.10% and 8.63% in the two parents, respectively, and were averagely 4.82% and 2.47% in the progenies, respectively (Table S1).The comparison results were normal, which could be used for subsequent SNP detection.In total, 94,758 polymorphic SNPs were detected between the two parents.The SNPs con-

Genotyping by Sequencing (GBS)
In total, 152.20 Gb of clean data containing 922.02 Mb of clean reads from 100 individuals and two parents were obtained using GBS and the Illumina HiSeq TM2500 platform; each of the pair reads was approximately 150 bp (Table S1).Among these clean bases, on average 94.94% of them were of high quality with Q20 and 87.77% of them were of high quality with Q30; effective rate and error rate of the bases were 99.58% and 0.04%, respectively; guanine cytosine content of the bases was 36.58%(Table S1).The sequencing depth corresponded to 9.64-fold in the 100 progeny of the F 2 population, 30.99-fold in the vsp cotton line, and 30.91-fold in the TM-1 cotton line (Table S1).The raw sequence data reported in this paper have been deposited in the Genome Sequence Archive (Genomics, Proteomics & Bioinformatics 2021) [53] in the National Genomics Data Center (Nucleic Acids Res 2022) [54], China National Center for Bioinformation/Beijing Institute of Genomics, Chinese Academy of Sciences (GSA: CRA012341) and are publicly accessible at https://ngdc.cncb.ac.cn/gsa accessed on 26 October 2023.
Clean reads were aligned with the reference genome to call SNP [1].The mean mapping rate of clean reads was 99.83% in the two parents and 99.76% in the F 2 progenies; the coverage (1×) and coverage (4×) were averagely 19.10% and 8.63% in the two parents, respectively, and were averagely 4.82% and 2.47% in the progenies, respectively (Table S1).The comparison results were normal, which could be used for subsequent SNP detection.In total, 94,758 polymorphic SNPs were detected between the two parents.The SNPs consisted mainly of aa × bb, nn × np, lm × ll, and hk × hk (Table 4).Among the four SNP types, the number of the aa × bb SNPs that were selected to construct the genetic map was 28,684 (Table 4).

Construction of SNP-Based High Density Genetic Map
After removing abnormal base types and removing SNP markers with integrity < 75% of all offspring, 6724 SNP markers were obtained from the 28,684 SNPs detected.After removing SNPs with segregation distortion (p value < 0.05), 6072 SNPs were obtained.To eliminate the large gap of LG, 494 SNPs with segregation distortion (0.001 < p value < 0.05) and with integrity >65% and <75% of all offspring were kept.In total, 6566 SNPs could be used to build the genetic map.Then, JoinMap ® 4.0 was used to sort the SNP markers, count GDs between markers, and remove SNP markers not linked to the LGs, and finally 5454 SNP markers could be utilized to construct the genetic map (Supplementary File S2).The overall length of the map with 26 LGs was 3167.14 cM, 3233 SNPs in the At subgenome and 2221 SNPs in the Dt subgenome (Figure 2, Table 5).The average GD between 5454 SNP markers was approximately 0.58 cM (Figure 2, Table 5).The GD of each LG ranged from 26.97 cM (LG23) to 265.47 cM (LG08), and the number of SNPs of each LG ranged from 31 SNPs on chromosome D05 to 667 SNPs on chromosome A13, with a mean of 210 SNP markers per LG (Table 5).Collinearity analysis between genetic and physical maps indicated that the linkages (LG03, LG04, LG05, LG06, LG08, LG09, LG10, LG11, LG14, LG16, LG17, LG24, and LG25) had good collinearity with the reference genome (A03, A04, A05, A06, A08, A09, A10, A11, D01, D03, D04, D11, and D12) and that the other linkage groups had partial collinearity with the reference genome in upland cotton (Figure 3, Supplementary File S2) [1]. LG23

Candidate Gene Annotation
In total, 125 genes were identified within the CIs of the stable QTLs.The overlap between the CIs of qWGP-D03-2 and qWGP-D03-3 harbored 22 candidate genes; the overlap between the CI of qHNFFB-D03-3 and the CI of qHNFFB-D03-4 harbored 103 candidate genes (Figure S1).Of all candidate genes, 121 genes had annotation information and among them 114 genes had annotation information in GO (Table S2) [52].The results of the GO analysis showed 73 genes identified in the category of cellular components (CC), 46 genes identified in the category of molecular functions (MF), and 54 genes identified in the category of biological process (BP); some of the genes had more than one function and could be divided into two or three baskets of functions.According to p-value < 0.05, 26 genes were significantly related to CC; 16 genes were significantly related to MF; and 27 genes were significantly related to BP.According to the corrected p-value < 0.05, four genes (Gh_D03G0841, Gh_D03G0842, Gh_D03G0857, and Gh_D03G0887) were significantly related to CC, which was involved in the R2TP complex, Swr1 complex, Smc5-Smc6 complex, Ino80 complex, and the NuA4 histone acetyltransferase complex; two genes (Gh_D03G0857 and Gh_D03G0887) were significantly related to MF, which was involved in DNA helicase activity; four genes (Gh_D03G0841, Gh_D03G0842, Gh_D03G0857, and Gh_D03G0887) were significantly related to BP, which was involved in positive regulation of response to stimulus of DNA damage, box C/D snoRNP assembly, histone acetylation, and chromatin remodeling (Table S2, Figure 5).Note: R 2 represents the proportion in which QTL explains the phenotypic variation.

Candidate Gene Annotation
In total, 125 genes were identified within the CIs of the stable QTLs.The overlap between the CIs of qWGP-D03-2 and qWGP-D03-3 harbored 22 candidate genes; the overlap between the CI of qHNFFB-D03-3 and the CI of qHNFFB-D03-4 harbored 103 candidate genes (Figure S1).Of all candidate genes, 121 genes had annotation information and among them 114 genes had annotation information in GO (Table S2) [52].The results of the GO analysis showed 73 genes identified in the category of cellular components (CC), 46 genes identified in the category of molecular functions (MF), and 54 genes identified in the category of biological process (BP); some of the genes had more than one function and could be divided into two or three baskets of functions.According to p-value < 0.05, 26 genes were significantly related to CC; 16 genes were significantly related to MF; and 27 genes were significantly related to BP.According to the corrected p-value < 0.05, four genes (Gh_D03G0841, Gh_D03G0842, Gh_D03G0857, and Gh_D03G0887) were significantly related to CC, which was involved in the R2TP complex, Swr1 complex, Smc5-Smc6 complex, Ino80 complex, and the NuA4 histone acetyltransferase complex; two genes (Gh_D03G0857 and Gh_D03G0887) were significantly related to MF, which was involved in DNA helicase activity; four genes (Gh_D03G0841, Gh_D03G0842, Gh_D03G0857, and Gh_D03G0887) were significantly related to BP, which was involved in positive regulation of response to stimulus of DNA damage, box C/D snoRNP assembly, histone acetylation, and chromatin remodeling (Table S2, Figure 5).

Discussion
For mapping QTLs of quantitative characters, parental selection is critical.In this research, the two upland cotton lines (TM-1, vsp) were selected as parents.The vsp is an early-maturing cotton line with a growth period of about 111 days and the growth period of the TM-1 was about 125 days [55].Wang et al. 2022 reported that the growth period of TM-1 was 135 days [56].No obvious differences were observed in the appearance traits (FT, NFFB, and WGP) between the CCRI58 line and the vsp line, and obvious differences were observed in the three traits between TM-1 and CCRI58, or between the TM-1 and the vsp [55].The sowing to flowering, the flowering to boll opening, the PH, the boll weight and the location of the first branch all showed significant differences between the cotton line TM-1 and the cotton line vsp [55].The phenotype value of the NFFB, the FT, and the WGP of the F 2 population developed by crossing the two parents TM-1 and vsp exhibited a wide range of variations [41].In this research, the four traits had significant differences between the TM-1 and vsp cotton lines (Table 1), and had no obvious differences between vsp and CCRI58.So, the selection of parents should be suitable.
The data distribution of quantitative characters is related to genes that control quantitative traits.For traits controlled by a few major genes or by both major genes and minor genes, the distribution of trait data does not necessarily conform to the normal distribution.In this study, based on the K-S test, only the HNFFB data distribution conformed to the normal distribution (Table 2).The reason may be that the inheritance of the traits FT, WGP, and NFFB is determined by major genes or minor polygenes [11,12].According to the data distribution in Figure 1, NFFB traits are more likely to be controlled by a few major genes, and FT and WGP traits may be determined by both major genes and minor polygenes, and the HNFFB trait is probably determined by both a few major genes and many minor genes or by minor polygenes.In our research, the kurtosis of the data distribution of FT, WGP, and NFFB traits conforms to a normal distribution and data symmetry (Table 2, Figure 1).The data in this investigation can be used for QTL mapping [57].
To successfully map QTLs for precocious characters, it is necessary to construct genetic maps with high-density SNP markers.The assemblies of multiple cotton genomes and highthroughput sequencing can meet the need for the development of SNPs in large quantities.Compared with resequencing technology, simplified genome sequencing is a cost-effective method for the massive development of SNPs.The GBS-seq strategy, a simplified genome sequencing and an effective method based on second generation sequencing technology for high-density SNP development, has been successfully applied in a variety of species and was used in this research.Due to the genome specificity, the selection of suitable restriction endonuclease(s) is the key to making the simplified genome represent the whole genome.In this research, electron enzyme digestion was first used to evaluate the combinations of restriction endonucleases to obtain an even distribution of the cleavage sites in the reference genome [1], and eventually the combinations of MseI, NlaIII, and EcoRI were used to digest the upland cotton genome and SNP markers showed a relatively even distribution in the reference genome (Figure 6).Furthermore, the fragments digested by MseI, NlaIII, and EcoRI were connected to the P1 and P2 adapters, and the quality of sequencing data was rigorously evaluated (Table S1).These operations ensured simplified genome representation and sequencing data quality.Eventually, 94,758 SNP markers were developed and 28,684 SNPs with aa × bb type were selected to build a genetic map.
In cotton, most of the genetic maps without SNP markers had a low polymorphic rate and contained an insufficient number of molecular markers; this caused a low resolution and a low coverage of the reference genome.In most cases, possibly due to insufficient numbers and uneven distribution of molecular markers, the genetic maps have large and wide gaps, and the chromosome was divided into two or more LGs by the gap(s) [13,14,19,20].In this research, using the SNPs mentioned above an HDGM with 5454 SNP markers was constructed, with an average GD of 0.58 cM.The resolution and coverage of the genetic map could be efficiently improved by using SNP markers.Although the number and distribution of SNP markers in the linked groups are uneven (Table 5, Figure 2), probably due to the GBS technique, the use of the aa × bb type of SNP markers with non-uniform distribution, gaps in the reference genome, and inhibition of exchange between non-sister chromatids [58], the genetic map did not contain gaps >20 cM, and the length of 98.70% of all gaps was less than 5 cM.Comparison of the position of the SNP markers between the physical map and the genetic map showed good collinearity, except for parts of several chromosomes according to the information in Figure 3, and the SNP marker positions in linkage groups and in physical maps are displayed in Supplementary File S2.In summary, HDGM with high-density SNPs could be used to map QTLs of early-maturing characters in this research.In cotton, most of the genetic maps without SNP markers had a low polymorphic rate and contained an insufficient number of molecular markers; this caused a low resolution and a low coverage of the reference genome.In most cases, possibly due to insufficient numbers and uneven distribution of molecular markers, the genetic maps have large and wide gaps, and the chromosome was divided into two or more LGs by the gap(s) [13,14,19,20].In this research, using the SNPs mentioned above an HDGM with 5454 SNP markers was constructed, with an average GD of 0.58 cM.The resolution and coverage of the genetic map could be efficiently improved by using SNP markers.Although the number and distribution of SNP markers in the linked groups are uneven (Table 5, Figure 2), probably due to the GBS technique, the use of the aa × bb type of SNP markers with nonuniform distribution, gaps in the reference genome, and inhibition of exchange between non-sister chromatids [58], the genetic map did not contain gaps >20 cM, and the length of 98.70% of all gaps was less than 5 cM.Comparison of the position of the SNP markers between the physical map and the genetic map showed good collinearity, except for parts of several chromosomes according to the information in Figure 3, and the SNP marker positions in linkage groups and in physical maps are displayed in Supplementary File S2.In summary, HDGM with high-density SNPs could be used to map QTLs of early-maturing characters in this research.
In the results, 8 QTLs detected in generation F2 and 10 QTLs detected in generation F2:3 were mapped on chromosomes A08, D03, and D05 (Table 6).Two possible reasons for the change in the number of QTLs detected between generations F2 and F2:3 are as follows: environmental differences occurred in different years, especially temperature differences, which possibly affected the distribution of the phenotypic data; observations of phenotypic data may cause the change; earliness traits of a single individual were investigated in the F2 population and earliness traits of individuals in three biological repeats were investigated in the F2:3 population.Due to precocious traits controlled by multiple genes, In the results, 8 QTLs detected in generation F 2 and 10 QTLs detected in generation F 2:3 were mapped on chromosomes A08, D03, and D05 (Table 6).Two possible reasons for the change in the number of QTLs detected between generations F 2 and F 2:3 are as follows: environmental differences occurred in different years, especially temperature differences, which possibly affected the distribution of the phenotypic data; observations of phenotypic data may cause the change; earliness traits of a single individual were investigated in the F 2 population and earliness traits of individuals in three biological repeats were investigated in the F 2:3 population.Due to precocious traits controlled by multiple genes, only qWGP-D03-2, qWGP-D03-3, qHNFFB-D03-3, and qHNFFB-D03-4 appeared in the two environments and were considered stable QTLs; the qWGP-D03-2 in the F 2:3 population corresponded to qWGP-D03-3 in the F 2 population; the qHNFFB-D03-3 in the F 2:3 population corresponded to qHNFFB-D03-4 in the F 2 population.The intervals of the other QTLs did not overlap each other, possibly due to the differences in the data distribution of the traits under different environments (Figure 1).Using the upland cotton genome [23] as a reference genome, the physical positions of the SNPs linked to the QTLs in this study were compared with the physical positions of the SNPs linked to the QTLs reported by Jia et al. 2016 [28] and Su et al. 2016 [33].Using the reference genome [1], SNPs linked to the four earliness traits in this research were compared with the SNPs reported by Huang et al. 2017 [32] and Li et al. 2021 [35].The comparison results showed that some markers linked to the QTLs in this study were closely related to the reported markers linked to the QTLs of the four traits, and others had no link or had no obvious link mainly due to differences in experimental materials, selections of the reference genome and high-throughput sequencing methods (Table S3).The markers linked to qWGP-D03-1, qFT_D03-1, qFT_D03-2 and qNFFB-D03-2 were tightly linked to the reported markers linked to the loci for WGP, FT and NFFB, and qWGP-D03-3, qWGP-D03-4, qFT-D03-3, qHNFFB_D03-1, qHNNFB_D03-3, qHNNFB_D03-4 and qNFFB-D03-1 were tightly linked to the reported markers linked to the loci for WGP, FT, HNFFB and NFFB (Table S3).Markers linked to qWGP-A08-1, qWGP-D03-2, qWGP-D05-5, qWGP-D05-6, qFT_D03-4, qNFFB-D03-2, qNFFB-D03-3 and qHNNFB_D03-2 had no or no obvious link to the reported markers (Table S3).In summary, we uncover two new stable QTLs for WGP and HNFFB, new SNP markers related to the four traits, and further identified the reported QTLs for WGP, FT, and NFFB.Our study further enriched the molecular markers linked to early-maturity traits.Combined with phenotypic data of precocious traits, cotton breeding efficiency could be improved through the selection of these linkage markers.
One hundred and twenty-five candidate genes were predicted and annotated in the overlap interval (Figure S1).Among them, Gh_D03G0841, Gh_D03G0842, Gh_D03G0857, and Gh_D03G0887 caught our attention because they were significantly related to CC, BP, and MF in the GO analysis (Table S2).The functions of the four genes were annotated in TAIR, and the results showed that the function of Gh_D03G0887 was unknown.The protein encoded by Gh_D03G0841 and Gh_D03G0842 is homologous to the NSE4 encoded by AT1G51130 in Arabidopsis thaliana.The NSE4 is a component of the Smc5-Smc6 complex that positively regulates the response to a stimulus of DNA damage [59].The protein encoded by Gh_D03G0857 is homologous to the nucleoside triphosphate hydrolase superfamily protein encoded by AT5G67630 in Arabidopsis thaliana.The nucleoside triphosphate hydrolase superfamily protein was involved in the biological processes of box C/D snoRNP assembly, chromatin remodeling, histone acetylation, and regulation of transcription by RNA polymerase II, and was a component of the Ino80 complex, NuA4 histone acetyltransferase complex, R2TP complex, and Swr1 complex, and its molecular function had ATP hydrolysis activity, ATP-dependent activity, acting on DNA, DNA helicase activity, and protein binding.Chromatin modifiers play an essential role in controlling gene expression and establishing epigenetic marks that can be inherited, and in controlling plant development, such as regulation of flowering time [60].In Arabidopsis, homologs of components of the SWR1 complex, a member of the SWI2/SNF2 superfamily, have been reported to regulate flowering and plant development [61].In this research, the homolog encoded by Gh_D03G0857 is a component of the SWR1 complex.So, we speculate that Gh_D03G0857 may play an important role in regulating earliness.

Conclusions
This research reported an HDGM constructed by using an F 2 population and SNPs developed by GBS-seq.The genetic map contained 5454 SNPs and a total GD of 3167.14 cM with an average marker interval of 0.58 cM.We also identified QTLs for NFFB, FT, WGP, and HNNFB in two environments using the F 2 family and the F 2:3 family, and identified candidate genes for the WGP and HNNFB traits.In total, 18 QTLs for the four precocious traits were identified and in the two environments 2 were considered stable QTLs.One hundred and twenty-five candidate genes for the WGP and HNNFB traits were identified in the CIs of stable QTLs.We speculated that Gh_D03G0857 may play an important role in regulating earliness.The research results will provide significant information for further research such as fine mapping QTLs for earliness traits, gene cloning, gene functional research, MAS, and pyramiding breeding as well.

( 3 ) 18 Figure 1 .
Figure 1.Frequency distribution and fit curve of four early-maturity traits in the F2 population and the F2:3 population.

Figure 1 .
Figure 1.Frequency distribution and fit curve of four early-maturity traits in the F 2 population and the F 2:3 population.

Figure 2 .
Figure 2. Construction of the genetic map using 5454 SNP markers.The vertical direction indicates the cumulative distance of the LGs (cM), and the horizontal direction indicates the LGs of the At subgenome (A01-A13) and the Dt subgenome (D01-D13).LGs are represented in green and SNP markers are represented in blue.

Figure 2 .
Figure 2. Construction of the genetic map using 5454 SNP markers.The vertical direction indicates the cumulative distance of the LGs (cM), and the horizontal direction indicates the LGs of the At subgenome (A01-A13) and the Dt subgenome (D01-D13).LGs are represented in green and SNP markers are represented in blue.

Figure 3 .
Figure 3. Collinearity analysis based on chromosome level.The x-axis indicates physical maps (A01-A13, D01-D13) and linkage maps (LG01-LG26), and the y-axis indicates the genetic length (cM).The LGs are represented in red and the physical maps are represented in blue.

Figure 3 .
Figure 3. Collinearity analysis based on chromosome level.The x-axis indicates physical maps (A01-A13, D01-D13) and linkage maps (LG01-LG26), and the y-axis indicates the genetic length (cM).The LGs are represented in red and the physical maps are represented in blue.

Figure 4 .
Figure 4. Mapping of QTLs controlling FT, WGP, NFFB, and HNFFB in the F2 and F2:3 populations.(A-J) The curves in the plots indicate the GD coordinate (X-axis) and LOD score (Y-axis) of identified QTLs in Chr D03, D05, and A08; red lines present the LOD threshold.The confidence level is 0.95.

Figure 4 .
Figure 4. Mapping of QTLs controlling FT, WGP, NFFB, and HNFFB in the F 2 and F 2:3 populations.(A-J) The curves in the plots indicate the GD coordinate (X-axis) and LOD score (Y-axis) of identified QTLs in Chr D03, D05, and A08; red lines present the LOD threshold.The confidence level is 0.95.

Figure 5 .
Figure 5. Annotation of candidate genes in the confidence intervals of stable QTLs through GO analysis.Statistical method: hypergeometric test/Fisher's exact test.Significance level: corrected p-value < 0.05.

Figure 5 .
Figure 5. Annotation of candidate genes in the confidence intervals of stable QTLs through GO analysis.Statistical method: hypergeometric test/Fisher's exact test.Significance level: corrected p-value < 0.05.

Figure 6 .
Figure 6.Genome-wide distribution of SNPs throughout the vsp and TM-1 genomes.For detecting SNPs between the TM-1 line and the vsp line, the base support number of SNP was not less than 2 and not more than 2000.

Figure 6 .
Figure 6.Genome-wide distribution of SNPs throughout the vsp and TM-1 genomes.For detecting SNPs between the TM-1 line and the vsp line, the base support number of SNP was not less than 2 and not more than 2000.

Table 1 .
Analysis of early-maturity traits in the three cotton lines (TM-1, vsp, and CCRI58).

Table 2 .
Analysis of early-maturity traits in the F 2 population and the F 2:3 population.

Table 3 .
Correlation analysis of early-maturity traits in the F 2 and F 2:3 populations.

Table 4 .
Statistics of marker type in the two parents.

Table 5 .
Summary of the high-density SNP map based on the F 2 population.

Table 6 .
QTLs for early maturity identified in the F2 and F2:3 populations.

Table 6 .
QTLs for early maturity identified in the F 2 and F 2:3 populations.