Combined Linkage Mapping and BSA to Identify QTL and Candidate Genes for Plant Height and the Number of Nodes on the Main Stem in Soybean

Soybean is one of the most important food and oil crops in the world. Plant height (PH) and the number of nodes on the main stem (NNMS) are quantitative traits closely related to soybean yield. In this study, we used 208 chromosome segment substitution lines (CSSL) populations constructed using “SN14” and “ZYD00006” for quantitative trait locus (QTL) mapping of PH and NNMS. Combined with bulked segregant analysis (BSA) by extreme materials, 8 consistent QTLs were identified. According to the gene annotation of the QTL interval, a total of 335 genes were obtained. Five of which were associated with PH and NNMS, potentially representing candidate genes. RT-qPCR of these 5 candidate genes revealed two genes with differential relative expression levels in the stems of different materials. Haplotype analysis showed that different single nucleotide polymorphisms (SNPs) between the excellent haplotypes in Glyma.04G251900 and Glyma.16G156700 may be the cause of changes in these traits. These results provide the basis for research on candidate genes and marker-assisted selection (MAS) in soybean breeding.


Introduction
Soybean is an important food and oil crops because it is an important source of plant protein and edible oil [1,2]. Breeding both high quality and high-yield soybeans species remains an important goal pursued by breeders. Attaining high yield is not only determined by yield traits; plant types are also important aspects of yield potential [3]. Plant height (PH) and number of nodes on the main stem (NNMS) are two important quantitative traits affecting soybean plant type. In the 1950s, since the Green Revolution, great success has been achieved in reducing the height of plants to increase actual crop yield [4]. However, many studies have shown a significant positive correlation between PH and NNMS, reporting that decreasing PH reduces NNMS and weakens the increased yield to some extent [5,6]. Developing the application potential of two traits to cultivate the ideal soybean plant type cultivar with suitable plant height and a large number of nodes is a continued concern for breeders.

Phenotypic Data for PH and NNMS
The PH and NNMS phenotype of the CSSL population from 2013 to 2016 are shown in Table 1 and Figure 1. PH in 2013 was significantly higher than that in 2014-2016, exhibiting a minimum skewness of 0.03. In 2014, the phenotypic information of PH had a minimum standard deviation (SD) and a minimum kurtosis of 11.50 and 0.08, respectively. The broad-sense heritability of PH ranged from 0.69 to 0.77, and the higher broad-sense heritability indicated that most of the phenotypic variation was mainly controlled by genotypes. The differences in NNMS of the maternal parent were significant. The broad-sense heritability of NNMS was between 0.20 and 0.27, and the lower heritability indicated that NNMS was susceptible to environmental factors, with gene and environment together potentially affecting phenotypic variation. Correlation analysis results of the two trait phenotypes are shown in Supplementary Table S1. In 2013-2016 (except for 2015), the Pearson correlation coefficient for PH and NNMS was between 0.364 and 0.398, indicating a weak correlation. The Pearson correlation coefficient between the BLUE is 0.495, which is moderately correlated. The phenotypic frequency distribution of the two traits exhibits an approximately normal distribution, and the heritability is highly reproducible, which is suitable for QTL mapping.

QTL Mapping Analysis for PH and NNMS in the CSSL Population
In total, 24 QTLs associated with the PH were detected from 2013 to 2016 and distributed over 13 chromosomes ( Table 2). The PVE% (Phenotypic Variance Explained of the QTL) for all of the QTLs ranged from 3.05% to 21.70%, with the logarithm of odds (LOD) values between 3.08 and 15.89. qPH-k-1 had a maximum PVE of 21.70% with a LOD value of 9.66. qPH-j-1 had a maximum LOD of 15.89 while explained 17.04% of the variation. qPH-j-1, which had a minimum map distance of 0.02 Mb, was mapped between 32.76 and 32.78 Mb on Gm16, explaining approximately 17.04% of the variation and having an additive effect of −7.65. qPH-b1-1 had a maximum map distance of 1.03 Mb and was mapped between 24.41 and 25.44 Mb, explaining 3.89% of the variation with an additive effect of 9.80.

Correlation Analysis by the Euclidean Distance (ED)
In this study, a total of 3,717,836 and 2,664,165 single nucleotide polymorphisms (SNPs) of PH and NNMS were used for correlation analysis by the Euclidean distance (ED) method. According to the rules of this method, we excluded SNP loci containing multiple alleles (those with more than one SNP within 5 bp) and SNP loci with identical genotypes between the two pools. Finally, we obtained 3,150,387 and 2,215,742 high-quality SNPs for PH and NNMS, respectively, and the support in both mixed pools was greater than 4 (Supplementary Table S2). After calculating the ED value, the correlation value was obtained using the local linear regression LOESS (locally weighted regression) method, and the candidate interval associated with the trait was determined by using 0.0129 and 0.0006 (median plus three standard deviations) as a standard threshold for PH and NNMS ( Figure 2). Among them, there are six candidate intervals related to PH, which are distributed on four chromosomes: 3, 4, 9 and 10. There are eight related regions related to NNMS, which are distributed on four chromosomes: 4, 6, 16 and 17 (

Consistency Interval Determination Combined with BSA and ICIM
For these QTLs, we obtained several relatively stable QTLs and narrowed the confidence interval of QTLs by comparing the overlapping intervals of QTLs from the BSA and ICIM (Inclusive Composite Interval Mapping) methods. In terms of PH, two stable QTLs were identified by comparing the overlapping intervals of both methods, and they were located on chromosome 4 from 51.16 Mb to 51.25 Mb and on chromosome 9 from 39.48 Mb to 39.81 Mb. In terms of NNMS, a significant QTL interval was also obtained after comparison, it was located on chromosome 16 from 3.26 Mb to 3.56 Mb. Interestingly, when we integrated the QTL of PH and NNMS for a comprehensive analysis, there were multiple significant QTLs identified that may affect both two traits at the same position. A region from 50.35 Mb to 52.38 Mb on chromosome 4 was detected multiple times by these two methods. From 30.78 Mb to 32.78 Mb on chromosome 16, it is an important region where the QTLs were repeatedly positioned five times for the two traits. Ultimately, we obtained a total of 8 carefully selected QTLs for subsequent analysis (Figure 3). Figure 3. Distribution of consistent QTLs and candidate genes. Note: red represents the QTL of PH; light blue represents the QTL of NNMS; navy blue represents the consensus QTL; yellow represents the candidate gene.

Mining Candidate Genes
A total of 335 candidate genes were screened in 8 major QTL intervals (Supplementary Table  S3), 5 of which were related to plant growth and development were found based on the annotation information (Supplementary Table S4).
Glyma.04G252300 is in the KEGG pathway (pathway ID K13946) associated with the auxin influx carrier, which was involved in the transport of auxin. The homologous gene in Arabidopsis is AT1G77690, which encodes an auxin influx carrier LAX3, (Like Aux1), that promotes lateral root emergence [20]. Glyma.04G254200 belongs to the auxin response factor family and affects plant growth by activating or inhibiting transcription to regulate expression of auxin genes. Its homologous gene in Arabidopsis, AT1G77850, is associated with the formation of auxin response factor 17, the transcription factor that regulates auxin-responsive gene expression [21]. Glyma.04G244200 occurs in the KEGG pathway (pathway ID K05282) and is associated with the synthesis of gibberellin 20, which encodes gibberellin 20-oxidase. The SD1 gene in the rice caused mutations in the encoded GA20ox-2, reducing the amount of active GA in the leaves and resulting in a short plant [22]. Glyma.16G156700 and Glyma.04G251900 belong to the GRAS family (gibberellic-acid insensitive (GAI), repressor of GAI (RGA), and scarecrow (SCR)), which are major players in gibberellin (GA) signaling and regulate various aspects of plant growth and development. AT3G54220 and AT4G08250 are their homologous genes in Arabidopsis thaliana, respectively. AT4G08250 encodes nodulation-signaling pathway 2 protein-like (NSP2), which is closely related to nodulation of legumes [23].

Expression Analysis of Candidate Genes
Three of the candidate genes (Glyma.04G244200, Glyma.16G156700, and Glyma.04G252300) were barely expressed in soybean stems. This result was consistent with the relative expression levels of genes found in Phytozome (https://phytozome.jgi.doe.gov/pz/portal.html). The four-year average PH and NNMS of these five materials are shown in Figure 4A,B. R19 and R200 had outstanding phenotypic values compared with the other three materials, while in Figure 4C,D, Glyma.04G254200 and Glyma.04G251900 also had higher expression levels in these lines. In contrast, R120 and R155 had lower phenotypic values, and the relative expression of candidate genes was also significantly lower than in other materials. The higher expression levels of the genes in the stems matched the higher PH and NNMS of the soybean material. The reverse was also true, revealing that these genes may play a positive role in soybean PH and NNMS at this stage. Glyma.04G244200 had the highest relative expression compared to other genes, suggesting that it may play a more significant role at this stage. These results indicate that the expression of candidate genes is closely related to the development of soybean PH and NNMS.

Haplotype Analysis
Haplotype analysis of candidate genes in ninety-two germplasm resources using Dnasp5.0 software was performed next. Phenotypic data from the germplasm resources followed the test results in 2018 (Supplementary Tables S5 and S6). The PH of germplasm resources ranged from 62.4 cm to 140.4 cm, and the range of NNMS was from 12.6 to 24.8. There were abundant variations in both PH and NNMS in the natural resource, meeting the candidate haplotype analysis requirements. The results showed that Glyma.04G244200, Glyma.04G251900, and Glyma.16G156700 had two or more excellent haplotypes (>5% of the population) (Supplementary Table S7), while Glyma.04G252300 and Glyma.04G254200 had only one excellent haplotype in the germplasm resource. Based on phenotypic data of ninety-two germplasm resource populations in PH and NNMS, the ANOVA method was used to analyze the differences between the phenotypes of the excellent haplotypes. Significant differences in PH and NNMS were showed only between the excellent haplotypes of Glyma.04G251900 and Glyma.16G156700 (Supplementary Table S8). Glyma.04G251900 had two excellent haplotypes, Hap-2 and Hap-5, which exhibited significant differences in PH and NNMS ( Figure 5B,C). Hap-2 contained 9 resources with a reduced average of PH and NNMS, and Hap-5 included 55 resources and the average PH and NNMS were significantly higher than those in Hap-2. SNP-190 was the only distinct SNP between Hap-2 and Hap-5 ( Figure 5A), and SNP-190 carries a cis element (ARR1) [24] that changes according to PLACE analysis. According to linkage disequilibrium (LD) analysis, there is linkage between SNP-190 and SNP-225 ( Figure 5D), and SNP-225 encodes the cis element ARR1 as well, potentially representing the cause for differential in gene function. Glyma.16G156700 has four excellent haplotypes with significant differences between Hap-19 with Hap-19 and Hap-13 and between Hap-19 and Hap-28 with respect to PH according to ANOVA based on phenotypic values and haplotypes ( Figure 5F). Analysis of SNPs in the promoter and CDS (Coding sequence) regions revealed that there were two SNP differences between the Hap-19 and Hap-28 in the CDS region ( Figure 5E). The change in SNP1469 was a nonsense mutation, and SNP754 resulted in amino acid changes (W > L), potentially leading to changes in the structure and function of the gene. There are multiple SNP differences between Hap-19 and Hap-13 ( Figure 5E), of which SNP-1803 and SNP-1689 lead to cis element ethylene responsive element [25] and CAAT-box [26] changes according to PLACE and Plant CARE analysis, and linkage disequilibrium analysis shows linkage imbalance between SNP-1803 and SNP-1689 ( Figure 5G). This may be the cause of differences in traits but requires more in-depth research to confirm.

Discussion
CSSL is a high-generation backcross population constructed by using the parental hybrid F 1 generation and the recurrent parent to perform multiple backcrossing and combining molecular marker-assisted selection of the donor parent chromosome fragment. The genetic background of CSSL is usually derived from the recurrent parent and contains only one or a few introduced fragments of the donor parent, which greatly reduces the interference from the genetic background. It is an ideal material for QTL positioning due to improved QTL accuracy. At present, many useful results have been achieved in rice [27,28], maize [29], cucumber [30], and Gossypium hirsutum [31]. There are also extensive studies on 100-seed weight [32,33], seed shape [34], and drug resistance [35] in soybean. During the long process of crop domestication, each generation has only the best seeds to form the next generation, which reduces the genetic diversity of the entire genome in the long run [36]. In this study, the wild soybean variety ZYD00006 was used as the donor parent to construct the CSSL population, and genes were introduced to broaden the genetic resources within the population. Each strain in the population contains only one or a few introduced fragments, facilitating rapid and more accurate localization of the main QTL due to the small genetic difference.
PH and NNMS play major roles in controlling plant type and affecting crop yield. Previous studies have shown a positive correlation between PH and NNMS [6]. Thus, the same QTL may affect both traits due to pleiotropic effects. In the present study, we identified multiple QTLs that affect both PH and NNMS, for example, qPH-C1-3 and qMS-C1-1 have coincident positions at 50.35 Mb-52.38 Mb on Gm04. Among them, the area between 51.16 Mb and 51.24 Mb was identified multiple times and maybe an important site related to growth and development. Lee et al. also located a PH QTL near this location [37]. However, most QTLs only work on PH or NNMS, similar to qPH-d1a-1, which only affects PH and qMS-d1a-1, which is only related to NNMS. These results suggest that the genetic mechanisms of PH and NNMS may be similar but not identical.
Currently, the number of QTLs for PH and NNMS that can be queried in the Soybase database (http://www.soybase.org) is 230 and 39, respectively. Yin et al. interrogated 159 QTLs for PH on the Soybase database and 23 QTLs determined by the 8-year phenotypic data of the RIL population for meta-analysis, obtaining 36 Meta QTLs [38]. In the present study, we identified a total of 24 and 10 QTLs for PH and NNMS, including 10 and 6 QTLs of that completely coincided or intersected with previous studies and 4 QTLs for PH that matched Meta QTL [39][40][41][42]. In addition, the resulting QTLs had smaller confidence intervals, illustrating the reliability and accuracy of these results. Among these, 14 and 4 QTLs did not appear in previous results for PH and NNMS, indicating that they are newly identified QTLs and providing an important basis for understanding plant genetic structure.
Currently, there have been multiple reports concerning the control of PH, Dwarf5 (d5) has defects in terpene synthase (TPS) enzymes involved in the early step of GA biosynthesis that result in reduced PH, and Dwarf18 causes a mutation in the GA3ox gene, resulting in the dwarfism [43,44]. Both of these mutations reduce the GA content in plants by affecting the GA biosynthetic pathway or signal transduction that dwarfs the plants. Glyma.04G244200, Glyma.16G156700 and Glyma.04G251900 are involved in the GAs pathway. Glyma.04G244200 encodes the key enzyme GA20ox in the GA synthesis pathway. GA20ox, GA3ox, GA2ox and EUI1 together form a feedback transcriptional control mechanism to maintain the homeostasis of GA content in plants. Inhibition or loss of function of the GA20ox gene results in dwarfing of the plant [45,46]. This may be an important candidate gene for PH in soybean. Both Glyma.16G156700 and Glyma.04G251900 contain a common GRAS domain. Zhou et al. isolated a novel GRAS transcription factor, SlGRAS26, in Solanum lycopersicum, illustrating that its downregulation generated pleiotropic phenotypes, including reduced PH [47]. The homologous gene of Glyma.16G156700 is AT3G54220, which encodes the SCARECROW-like protein, and Zhang et al. found that SCL3 can promote gibberellin signaling by antagonizing the master growth repressor DELLA in Arabidopsis [48]. These results suggest that these three candidate genes may affect PH and NNMS by regulating the synthesis of gibberellin. In recent years, results have shown that PIN [49,50], Actin7 [51], and BZR1 [52] control the synthesis and transport of auxin and brassinolide to affect PH and NNMS. Glyma.04G252300 and Glyma.04G254200 encode the auxin influx carrier LAX3, and auxin response factors are involved in the regulation of auxin expression. AUX1 and LAX3 participate in the interaction of auxin and ethylene in Arabidopsis, affecting the expression of auxin [53]. ARFs (Auxin Response Factor) are a family of functionally distinct response factors that regulate auxin to complete various plant growth and development processes [54]. MiR160 is a negative regulator of ARF17, which regulates the expression of auxin and affects the production of adventitious roots [55,56]. These results suggest that PH and NNMS may be affected by complex plant hormone regulatory networks, laying the foundation for our study of the molecular mechanisms of PH and NNMS.
Herein, we identified candidate genes that may affect PH and NNMS and discussed gene functions that are involved in the synthetic transcription pathway of plant hormones and play an important role in PH and NNMS. Further work is required to elucidate the molecular mechanisms of these candidate genes.

Plant Material Planting and Phenotypic Determination
A total of 194 whole-genome chromosome segment substitution lines (BC 3 F n , BC 4 F n ) were constructed by crossing "suinong14" (the main cultivars of Heilongjiang, China) and "ZYD00006" (wild soybean resources) and subsequent backcrossing with suinong14 (Supplementary Figure S1) [33]. Based on this study, the number of CSSL populations was increased to 208 by using SSR markers to identify uncovered fragments. During 2012-2016, the CSSL populations were planted in Xiangyang Farm, Harbin, Heilongjiang Province (Harbin, latitude 45 • 450" N, longitude 126 • 380" E). A randomized complete block design with three replicates was used over 4 years. The lines were grown in a one-row plot with 60-cm row spacing and a 6-cm space between plants. The rows were 5 m long, with approximately 80 plants per row. In October of each year, five individuals with the same growth status were selected in each line for PH and NNMS measurements in the field, and field management followed the common agricultural practice. PH (cm) is the average length from the cotyledonary node to the top of the mature plant [57]. NNMS indicates the number of nodes from the cotyledonary node to the top of the main stem. The linear mixed model is used to integrate the phenotypic data from different years, the individual's genotype effect is used as the fixed effect and the year change is used as the random effect. The lme4 in R software is used to calculate the best linear unbiased estimator (BLUE) of each individual. The mixed linear model is calculated according to the following formula: where "X" represents the fixed effect; "β" represents fixed-effects parameter estimates; "Z" represents the random effect; "γ" represents random-effects parameter estimates; "ε" represents the errors. Based on the analysis of variance, the AOV model in QTL ICIMapping Version 4.1 (Beijing, China) was used to determine the broad-sense heritability [58,59].
where "r" represents the number of repetitions; "σ 2 g "represents the genetic variance; "σ 2 e "represents the environmental variance.

QTL Analysis in the CSSL Population
ICIMapping 4.1 was used for QTL positioning by using the CSSL module. The presence of QTL based on a LOD value greater than 2.5 was determined. QTL naming was based on the method of McCouch [60]. The QTL name is constructed as follows: q+ trait + LG or LG number + QTL number.

DNA Extraction and Pool Construction
Combining the phenotypic data of the four years, 30 (>10% of the total population [8,10]) extreme phenotype individuals were selected from the CSSL population according to the four extreme performances (high PH, low PH, high NNMS and low NNMS) (Supplementary Table S9). Fresh leaves were cryopreserved using liquid nitrogen, DNA of leaves was extracted using the CTAB (cetyltrimethylammonium Ammonium Bromide) plant tissue DNA extraction method, DNA concentration and purity were measured using a Nano Drop 2000C (Sunnyvale, California, USA) ultra-micro spectrophotometer and 1.5% agarose gel electrophoresis. The DNA was randomly disrupted by ultrasonic disruption, and the DNA fragment was end-repaired, the 3 end was filled with a base, and the sequencing linker was added. The DNA fragment was enriched by PCR amplification, and the product was purified to remove the contamination of the linker. The construction of the sequencing pool library was completed. The sequencing library constructed above was used for pair-end sequencing (each end 150 bp) on an Illumina HiSeq 2500 platform (San Diego, CA, USA) using the normal protocol at Beijing Biomarker Technologies Corporation (http://www.biomarker.com.cn).

Data Analysis and Filtering
To ensure the quality of information analysis, the following methods were adopted: the sequence containing the adapter was removed; the paired-end reads of N > 10% on the sequence were removed (the specific base type cannot be determined); the low-quality reads were removed (the number of bases with a mass value of Q ≤ 10 is more than 50% of the entire read); and the raw data was filtered. The genome of the soybean variety Williams 82 was used as a reference genome, and the short sequence obtained from the second-generation high-throughput sequencing was compared with the reference genome using BWA software (https://sourceforge.net/projects/bio-bwa/) [61]. According to the positioning results of Clean Reads on the reference genome, the Picard tool was used (http: //sourceforge.net/projects/picard/) to remove the duplicates. After the local realignment and correction of base quality values were performed by GATK software (Cambridge, Massachusetts, USA) [62], SNPs were detected to ensure the accuracy of the detected SNP. Finally, SNP clusters were filtered out (filtered out if there are 2 SNPs within 5 bp); SNPs near indels were filtered out (SNPs filtered within 5 bp near an indel); Filtering adjacent indel was used (two Indel distances less than 10 bp filtered) [63]; and the SNPs were rigorously filtered to obtain the final SNP locus set.

Correlation Analysis by the Euclidean Distance (ED)
The Euclidean Distance (ED) algorithm is a method for identifying significant differences between mixed pools using sequencing data and evaluating areas associated with traits [64]. In theory, there are differences in the target trait related sites between the two pools constructed by the BSA, while other sites tend to be consistent, and the ED value should tend to zero. The formula for the ED method is as follows, and the larger the ED value, the larger the difference between the two mixed pools.
A mut is the frequency of the A base in the mutation pool, and A wt is the frequency of the A base in the wild pool; C mut is the frequency of the C base in the mutation pool and C wt is the frequency of the C base in the wild pool; G mut is the frequency of the G base in the mutation pool and G wt is the frequency of the G base in the wild pool; T mut is the frequency of the T base in the mutation pool and T wt is the frequency of the T base in the wild pool. In the analysis, SNP sites with different genotypes between the two pools were used to calculate the depth of each base in different pools, and the ED value of each locus was calculated. This study used the median plus three standard deviations as the correlation value to eliminate background interference. A region above the threshold is selected as the region related to the trait according to the threshold (median plus three standard deviation).

Gene prediction of Major QTL Intervals
The soybean gene in the target segment was identified using the "Williams 82.a2.v1" genome as the reference genome. All genes in the QTL interval were extracted, and gene functions were annotated by the Kyoto Encyclopedia of Genes and Genomes (KEGG) (https://www.kegg.jp/), gene ontology (GO) (https://www.ebi.ac.uk/QuickGO/), and Pfam (https://pfam.xfam.org/) databases. Finally, the candidate genes were obtained.

RT-qPCR Analysis of Candidate Genes
Based on the phenotypes of PH and NNMS, we selected extremely high materials R19 (the PH is 118.25 cm, the NNMS is 19.25) and R200 (122.4 cm, 22.6) and extremely small materials R120 (56.2 cm, 15) and R155 (56.4 cm, 15.8) and SN14 (81.0 cm, 18.1). There are significant phenotypic differences at the P = 0.05 level between extremely high line and extremely small lines on PH and NNMS (Supplementary Tables S10 and S11). Extraction of shoot tip tissue from the ternate compound-leaf stage for RNA extraction using TRIzol Reagent (Invitrogen, 15596-026, Carlsbad, CA, USA) was performed. Reverse transcription of extracted RNA into cDNA using the Tianhe Real-time quantitative PCR (RT-qPCR) kit was performed using SYBR qPCR Mix (Vazyme, Q711, Vazyme biotech, Nanjing, China) on the Light Cycler 480 System (Roche, Roche Diagnostics, Basel, Switzerland). The expression levels of candidate genes were calculated with GmUKN1 as an internal reference according to the following formula [65]: The qRT-PCR primer sequences specific for candidate genes were designed using Primer Premier 5.0 (http://www.premierbiosoft.com/primerdesign/index.html).

Haplotype Analysis
Ninety-two germplasm resources from all over China were used for haplotype analysis due to their rich genetic variation and large fluctuation range of PH and NNMS. Ninety-two germplasm resources were also planted in Xiangyang Farm, Harbin. Planting methods and field management were the same as for the CSSL population. The genomic sequence of the candidate gene was proposed on the Phytozome12 website (https://phytozome.jgi.doe.gov/pz/portal.html), including the 5 UTR, the upstream 3000 bp promoter region sequence, CDS sequence, intron sequence, and 3 UTR sequence information. The sequence information of the candidate gene and the 92 soybean germplasm resources resequencing genomic sequence information were subjected to local BLAST analysis to obtain SNP information of the candidate gene in the germplasm resource population. In this study, Dnasp5.0 software was used to analyze the haplotype distribution of candidate gene SNP sequences in germplasm resource populations, and the excellent haplotypes (the number of cultivars belonging to the haplotype exceeded 5% of the total) were screened out (https://haploview.software.informer.com/4.2/). Using the Haps Format module in Haploview 4.2 software (Cambridge, Massachusetts, USA) to analyze the excellent haplotype sequence information of candidate genes, ANOVA analysis of candidate gene haplotypes and phenotypes using SPSS 17.0 software (Armonk, New York, USA) was used to determine the effect of each haplotype on phenotype. PLACE (Plant cis-acting regulatory DNA elements (https://www.dna.affrc.go.jp/PLACE/)) and Plant CARE (Plant promoters and cis-acting regulatory elements (http://bioinformatics.psb.ugent.be/webtools/plantcare/html/)) softwares were used to query the function of plant promoter elements.

Conclusions
We combined linkage mapping and BSA to identify QTL and detected 8 consistent intervals. According to the gene annotation of the QTL interval, five of which were associated with PH and NNMS. RT-qPCR and haplotype analysis showed that they may be candidate genes that affect PH and NNMS. These results provide the basis for research on candidate genes and marker-assisted selection (MAS) in soybean breeding.

Conflicts of Interest:
The authors declare that they have no conflict of interest.