Construction of Genetic Map and QTL Mapping for Seed Size and Quality Traits in Soybean (Glycine max L.)

Soybean (Glycine max L.) is the main source of vegetable protein and edible oil for humans, with an average content of about 40% crude protein and 20% crude fat. Soybean yield and quality traits are mostly quantitative traits controlled by multiple genes. Mapping quantitative trait loci (QTLs) for yield and quality traits, and mining the related candidate genes, is of great significance for molecular breeding and for understanding the underlying genetic mechanisms. In this study, 186 individual plants of the F2 generation derived from a cross between Changjiangchun 2 and Yushuxian 2 were selected as the mapping population to construct a molecular genetic linkage map. A genetic map containing 445 SSR markers with an average distance of 5.3 cM and a total length of 2375.6 cM was obtained. Based on the constructed genetic map, QTLs for 11 yield and quality traits, including hundred-seed weight (HSW), seed length (SL), seed width (SW), seed length-to-width ratio (SLW), oil content (OIL), protein content (PRO), oleic acid (OA), linoleic acid (LA), linolenic acid (LNA), palmitic acid (PA), and stearic acid (SA), were detected by the multiple QTL model (MQM) mapping method across generations F2, F2:3, F2:4, and F2:5. A total of 71 QTLs related to seed size traits and 113 QTLs related to quality traits were obtained in four generations. From these QTLs, 19 QTL clusters for seed size traits and 20 QTL clusters for quality traits were summarized. Two promising clusters, one related to seed size traits and the other to quality traits, have been identified. The cluster associated with seed size traits spans from position 27876712 to 29009783 on Chromosome 16, while the cluster linked to quality traits spans from position 12575403 to 13875138 on Chromosome 6. Within these intervals, the Williams 82 reference genome was used for gene searching. A total of 36 candidate genes that may be involved in the regulation of soybean seed size and quality were screened by gene functional annotation and GO enrichment analysis. The results will lay a theoretical and technical foundation for molecular-assisted breeding in soybean.


Introduction
Soybean (Glycine max L.) is one of the most important crops and esteemed foods, rich in protein, carbohydrates, lipids, minerals, vitamins, and bioactive substances [1][2][3], and has the highest level of crude protein among plant-based protein sources [4,5]. It is a rich source of both edible oil and plant-based protein because of its atmospheric nitrogen-fixing capability, which arises through a symbiotic interaction with soil microorganisms [6]. Given this irreplaceable role, soybean is widely grown and consumed globally and constitutes nearly 28% of vegetable oil and 70% of protein in meals worldwide [7]. Since the demand for soybean has been increasing globally, soybean yield enhancement is now receiving significant attention, and breeding high-yield and high-quality soybean is an important and urgent task [5,8]. Constructing a relatively high-density map and mapping QTLs for seed size and quality traits to search for related genes is helpful for improving the yield potential of soybean.
In crop breeding, seed size is one of the most important agronomic traits and needs to be considered first. It is an important factor in determining soybean production, seed consumption, and evolutionary fitness [8][9][10]. Seed size is a quantitative trait controlled by multiple genes and is constrained by environmental factors [11]. Furthermore, seed size is significantly correlated with 100-seed weight, and it is not only a component of seed yield but also an important factor affecting morphological traits [12]. Seed traits of soybean include seed length, seed width, seed thickness, and 100-seed weight; of these, seed length, seed width, and seed thickness describe seed size, and 100-seed weight is closely related to seed size [13][14][15].
Soybean is one of the major sources of seed protein and oil around the world, with an average composition of 40% protein and 20% oil [16][17][18]. Moreover, it is a source of essential amino acids and metabolizable energy for both human and animal consumption [19]. Seed protein and oil content are quantitatively inherited traits and are considerably affected by environmental conditions [20,21].
To date, only a few papers have focused on mapping QTLs for seed size and quality using high-density maps in various genetic backgrounds of soybean [22]. The present study aims to construct a relatively high-density map and to map QTLs for seed size and quality traits in a population derived from a cross between ChangJiangChun2 (CJC2) and YuShuXian2 (YSX2) in four environments. The results are expected to be useful for marker-assisted selection (MAS) and to improve our understanding of the genetic mechanisms underlying seed size and quality traits in soybean.

Trait Phenotype Analysis
The results of the phenotypic data analysis for the four environments are presented in Table 1. For seed size traits, the HSW and SW of YSX2 were higher than those of CJC2, while the SL and SLW of CJC2 were higher than those of YSX2. For seed quality traits, the contents of OIL, PRO, OA, LNA, PA, and SA in CJC2 were higher than those in YSX2, whereas the content of LA in YSX2 was higher than that in CJC2.
All eleven seed size and quality traits segregated to some extent, with coefficients of variation ranging from 2.59% to 23.09%, and transgressive segregation was observed for each trait. The frequency distribution histograms showed that the four traits were approximately normally distributed in the three environments, which is consistent with the inheritance pattern of quantitative traits (Figures S1 and S2).
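The two summary statistics mentioned above can be illustrated with a minimal sketch. It assumes the coefficient of variation is computed as the sample standard deviation divided by the mean (the text does not state which standard deviation was used), and it treats a trait as transgressive when at least one progeny value falls outside the range spanned by the two parents. The function names and the toy numbers are hypothetical and are not taken from Table 1.

```python
import statistics


def coefficient_of_variation(values):
    """Return the CV in percent: 100 * sample SD / mean (assumed definition)."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a mean of zero")
    return 100.0 * statistics.stdev(values) / mean


def is_transgressive(progeny_values, parent_a, parent_b):
    """True if any progeny value lies outside the range of the two parents."""
    low, high = sorted((parent_a, parent_b))
    return min(progeny_values) < low or max(progeny_values) > high


# Toy hundred-seed weights in grams (illustrative values only).
toy_hsw = [20.0, 25.0, 30.0]
cv_percent = coefficient_of_variation(toy_hsw)
beyond_parents = is_transgressive(toy_hsw, parent_a=22.0, parent_b=28.0)
```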

Correlation Analysis of Seed Size Traits and Quality Traits
As shown in Figure 1, within the seed size traits there was a strong positive correlation among HSW, SL, and SW. SL and SLW were negatively correlated, but the correlation was not significant at the 0.05 level; SW and SLW, on the other hand, were significantly negatively correlated.
For the quality traits, previous studies verified the strong negative correlation between soybean oil and protein [23,24], and this was confirmed in the present study. There was an extremely significant negative correlation between OA and LA. Additionally, OIL and LNA were negatively correlated at the 0.01 significance level, as were OA and LNA. Furthermore, an extremely significant negative correlation was found between LA and PA, while PA and OA were positively correlated at the 0.01 significance level.
Between quality and seed size traits, HSW was positively correlated with OIL and negatively correlated with LNA. SW was positively correlated with OIL and negatively correlated with LNA. SLW was negatively correlated with OIL and positively correlated with LNA.

Genetic Map Construction
A total of 3780 SSR primer pairs were screened for polymorphism between ChangJiangChun2 (CJC2) and YuShuXian2 (YSX2), and 465 polymorphic pairs were obtained. Using the obtained marker loci, a linkage map containing 27 linkage groups was constructed, covering the 20 chromosomes of soybean. The genetic map was 2375.6 cM in length, with an average map distance of 5.3 cM (Table 2 and Figure 2). The longest linkage group was 200.3 cM, on chromosome 13, and the shortest was 29.8 cM, on chromosome 5. The maximum number of markers was 44, on chromosome 2, and the minimum was 7, on chromosome 5. The longest average distance between markers was 8.91 cM, on chromosome 12, and the shortest was 2.58 cM, on chromosome 15.

QTL Mapping for Seed Size Traits
Based on the constructed linkage map and using the MQM mapping method, a total of 62 QTLs related to seed size traits were mapped in 4 environments (Figure 3 and Table 3).
For hundred-seed weight, 11 QTLs were identified and mapped on ten chromosomes, explaining 7.40% to 17.00% of the phenotypic variation. qHSW13.1 and qHSW16.1 were identified in two environments, with maximum explained phenotypic variation of 10.20% and 12.10%, respectively. The favorable alleles of seven QTLs originated from YSX2, and those of three QTLs originated from CJC2.
For seed length, 15 QTLs were identified and mapped on twelve chromosomes, explaining between 7.50% and 15.10% of the phenotypic variation. qSL13.1 and qSL16.2 were identified in two environments, with maximum explained phenotypic variation of 10.30% and 12.00%, respectively. The favorable alleles of nine QTLs were derived from CJC2, and those of five QTLs were derived from YSX2.
For seed width, 17 QTLs were identified and mapped on fourteen chromosomes, explaining between 7.20% and 20.10% of the phenotypic variation. qSW06.1, qSW14.1, and qSW16.1 were identified in two environments. qSW19.1 on chromosome 19 explained the largest share of phenotypic variation, 20.10%. The favorable alleles of six QTLs were derived from CJC2, and those of the other nine QTLs were derived from YSX2.
For seed length-to-width ratio, 19 QTLs were identified and mapped on fourteen chromosomes, explaining between 7.30% and 17.80% of the phenotypic variation. qSLW16.1 was detected in two environments, with a phenotypic contribution of 11.10%. The favorable alleles of eleven QTLs were derived from CJC2, and those of the other eight QTLs were derived from YSX2.

QTL Mapping for Seed Quality Traits
A total of 104 QTLs related to soybean quality traits were detected in 4 environments (Figure 4 and Table 4).
For oil content, 21 QTLs were identified and mapped on sixteen chromosomes, explaining 7.50% to 17.10% of the phenotypic variation. qOIL04.1 was identified in two environments, explaining 10.80% and 8.10% of the phenotypic variation, respectively. The favorable alleles of nine QTLs were derived from CJC2, and those of twelve QTLs were derived from YSX2.
For protein content, 13 QTLs were identified and mapped on nine chromosomes, explaining between 7.20% and 17.70% of the phenotypic variation. All of the QTLs were identified in only one environment. The favorable alleles of eight QTLs were derived from CJC2, and those of five QTLs were derived from YSX2.
For palmitic acid, 15 QTLs were identified and mapped on fourteen chromosomes, explaining between 7.50% and 13.50% of the phenotypic variation. qPA04.1 was identified in two environments, explaining 9.10% and 11.00% of the phenotypic variation, respectively. The favorable alleles of six QTLs were derived from CJC2, and those of eight QTLs were derived from YSX2.
For stearic acid, 27 QTLs were identified and mapped on seventeen chromosomes, explaining between 7.20% and 16.40% of the phenotypic variation. qSA14.2 was detected in two environments, with a phenotypic contribution of 9.30%. The favorable alleles of eighteen QTLs were derived from CJC2, and those of the other eight QTLs were derived from YSX2.
For linolenic acid, 6 QTLs were identified and mapped on five chromosomes, explaining 7.30% to 10.10% of the phenotypic variation. qLNA13.2 and qLNA16.1 were identified in two environments, with maximum explained phenotypic variation of 10.10% and 7.80%, respectively. The favorable alleles of qLNA14.1 were derived from CJC2, while the favorable alleles of qLNA06.1, qLNA11.1, and qLNA13.1 were derived from YSX2.

Identification and Analysis of QTL Clusters
Following the principle of stability and effectiveness, a total of 7 QTL clusters were located on 4 chromosomes in this study (Table 5). A total of 4 QTL clusters contained QTLs related to seed size traits, and 3 QTL clusters contained QTLs related to seed quality traits. In terms of the number of controlled traits and environments, the two most important QTL clusters, covering four seed size traits and three quality traits, were LociS16.1 and LociQ06.1.

Candidate Gene Prediction
In the promising intervals of their respective chromosomes, the physical location of LociS16.1 ranges from 27.87 Mb to 29.00 Mb, while that of LociQ06.1 ranges from 12.57 Mb to 13.87 Mb. The search yielded 52 genes for seed size traits and 114 genes for quality traits. GO analysis was conducted on all of these genes using the GO enrichment tools of SoyBase (http://www.soybase.org, accessed on 6 December 2023) and the Wm82 genome assemblies (Figure 5). Of the 52 genes corresponding to seed size traits, 7 were not found in any GO ontology. Of the 114 genes corresponding to quality traits, 13 were not found in any GO ontology.

Discussion
YSX2 is a typical vegetable soybean. It has a higher HSW and SW, but lower SL and SLW, than CJC2, which means that YSX2 produces heavier and shorter seeds than the normal soybean CJC2. Interestingly, SL and SW are both positively correlated with HSW; which of the two contributes more to HSW deserves further consideration.
The correlation analysis is also interesting with regard to SL. As mentioned, the parents show counter-intuitive data for HSW and SL. The correlation analysis sheds light on SL during seed growth and development, suggesting that as SL increases, SW also tends to increase. This implies that seeds with higher HSW may exhibit lower SLW, a conclusion that needs further investigation.
Considering the importance of soybean, improvement of its seed size and quality traits is in high demand. QTL mapping in soybean has made great progress recently. Kulkarni et al. identified 9 QTLs for HSW in 2017, localized on eight linkage groups, using recombinant inbred lines (RILs) constructed from a cross of Williams 82 and PI366121 [25]. Kumar et al. used seed-derived F2 and F2:3 vegetable soybean populations to map QTLs; a total of 42 QTLs were identified, distributed on 13 chromosomes [26]. For quality traits, a total of 13 QTLs for the traits studied have been mapped on 3 chromosomes of the soybean genome, and one major QTL for oil content (qOIL001) explained approximately 76% of the total phenotypic variation in that population [27]. Sun et al. used a RIL population to detect QTLs for seed size traits in four environments [28]. Ten QTLs controlling related traits were identified, of which five, distributed on chromosomes 02, 04, 06, 13, and 16, were detected in at least two environments, with PVE ranging from 3.6% to 9.4%. Previous results also showed that nine minor-effect QTLs for protein content and seven minor-effect QTLs for fat content were detected [29].
A total of 11 QTLs related to HSW were detected in this study, explaining 7.40% to 17.00% of the phenotypic variation. Most of the favorable alleles were from YSX2, and qHSW13.1 and qHSW16.1 were detected in two environments. Among them, qHSW13.1 has been reported in previous studies [30]. A total of 15 QTLs related to seed length were detected on twelve chromosomes, explaining between 7.50% and 15.10% of the phenotypic variation, and most of the favorable alleles were from CJC2. Of these, qSL13.1 and qSL16.2 were identified in two environments, with maximum explained phenotypic variation of 10.30% and 12.00%, respectively. A total of 17 QTLs related to seed width were detected on fourteen chromosomes, explaining between 7.20% and 20.10% of the phenotypic variation, and most of the favorable alleles were from YSX2; among these, qSW03.1 was consistent with Zhang et al. [31] and Hu et al. [32], while qSW09.1 was consistent with Hina et al. [33]. A total of 19 QTLs related to seed length-to-width ratio were detected on fourteen chromosomes, explaining between 7.30% and 17.80% of the phenotypic variation.
A total of 21 QTLs associated with oil were identified and mapped on sixteen chromosomes, explaining 7.50% to 17.10% of the phenotypic variation. qOIL04.1 was identified in two environments, explaining 10.80% and 8.10% of the phenotypic variation, and was consistent with Li et al. [34]. A total of 13 QTLs associated with protein were identified and mapped on nine chromosomes, explaining between 7.20% and 17.70% of the phenotypic variation. The favorable alleles of eight QTLs were derived from CJC2, and those of five QTLs were derived from YSX2. Of these, qPRO13.3 was consistent with Whiting et al. [35] and Bandillo et al. [36]. A total of 15 QTLs associated with palmitic acid were identified and mapped on fourteen chromosomes, explaining between 7.50% and 13.50% of the phenotypic variation. qPA04.1 was identified in two environments, explaining 9.10% and 11.00% of the phenotypic variation, respectively. qPA13.1 was consistent with Yao et al. [37]. A total of 27 QTLs associated with stearic acid were identified and mapped on seventeen chromosomes, explaining between 7.20% and 16.40% of the phenotypic variation, of which qSA14.2 was detected in two environments with a phenotypic contribution of 9.30%. A total of 13 QTLs associated with oleic acid were identified and mapped on eleven chromosomes, explaining between 7.30% and 11.40% of the phenotypic variation. qOA04.1 and qOA06.1 were detected in two environments, with maximum phenotypic contributions of 9.40% and 8.00%. A total of 9 QTLs associated with linoleic acid were identified and mapped on eight chromosomes, explaining 7.20% to 12.00% of the phenotypic variation. qLA07.1 and qLA13.1 were identified in two environments, with maximum explained phenotypic variation of 10.40% and 8.80%. qLA13.2 was consistent with Priolli et al. [38]. A total of 6 QTLs associated with linolenic acid were identified and mapped on five chromosomes, explaining 7.30% to 10.10% of the phenotypic variation. qLNA13.2 and qLNA16.1 were identified in two environments, with maximum explained phenotypic variation of 10.10% and 7.80%. In summary, 62 QTLs for seed size traits and 104 QTLs for quality traits were located in this study, providing valuable information for improving soybean quality.
The QTL intervals related to seed size traits and quality traits that we detected were compared with the soybean public database, and many QTLs were found to overlap with regions for days to flowering and maturity. It is therefore hypothesized that genes regulating protein and oil synthesis or other metabolic pathways may be associated with genes regulating the entire developmental process of soybean, suggesting the potential for common genetic factors for these traits and the need for further research on these regions.
We detected overlapping QTLs for multiple traits, with 7 QTL clusters located on chromosomes 6, 7, 13, and 16, each associated with two or more traits related to seed size, oil content, protein, and fatty acids. A total of 4 QTL clusters contained QTLs related to seed size traits, and 3 QTL clusters contained QTLs related to seed quality traits. In terms of the number of controlled traits and environments, the two most important QTL clusters, covering four seed size traits and three quality traits, were LociS16.1 and LociQ06.1. QTL clusters may represent gene/QTL linkage or pleiotropic effects of a single QTL within the same genomic region. These QTL clusters can lay a foundation for further exploration of target genes controlling seed size and quality traits. Within the promising intervals of LociS16.1 and LociQ06.1, the physical locations range from 27.87 Mb to 29.00 Mb and from 12.57 Mb to 13.87 Mb, respectively, on the corresponding chromosomes. After gene function annotation screening, 14 candidate genes for seed size traits and 22 for quality traits of soybean were obtained.
In the course of the study, orthologous genes from other crops were found for the genes in our candidate intervals, and some of them were related to the traits we studied. To support the next step, the discovery of the molecular mechanisms of those genes, we list them here as a reference for further study. Arabidopsis thaliana (AT) is a well-studied plant in which the approximate function of most genes can be found. The candidate genes we identified and their respective homologous genes are as follows. Glyma.16g128600, whose homologous gene in AT is AT5G66210.1, was found to be related to the function of calcium-dependent protein kinase 28. Glyma.16g129700, AT4G36130.1 in AT, was related to the function of the ribosomal protein L2 family [39]. Glyma.16g133300 could be related to the function of SEC14-like 12 in AT [40]. Glyma.16g131700, AT4G36250.1 in AT, could be related to the function of aldehyde dehydrogenase 3F1 [41]. Glyma.16g131500, AT4G08850.1 in AT, was found to be possibly related to the function of leucine-rich repeat receptor-like protein kinase family protein [42]. Glyma.16g127200, AT4G36020.1 in AT, was associated with the function of cold shock domain protein 1; Glyma.16g127500, AT5G07090.1 in AT, with the function of ribosomal protein S4 (RPS4A) family protein; Glyma.16g129700, AT4G36130.1 in AT, with the function of the ribosomal protein L2 family; and Glyma.06g164300, AT5G61960.1 in AT, with the function of MEI2-like protein 1. Further details can be found in [43]. Glyma.16g127400 (AT5G66200.1) is related to the function of Armadillo repeat only 2, and Glyma.16g130400 (AT4G36180.1) is related to the function of leucine-rich receptor-like protein kinase family protein. Glyma.16g131800 (AT4G36360.1) is related to beta-galactosidase 3, and Glyma.06g158100 (AT1G77580.2) is related to the function of a plant protein of unknown function (DUF869), which is mentioned in [44]. Glyma.16g129900 (AT2G18040.1) relates to the function of peptidylprolyl cis/trans isomerase, NIMA-interacting 1; Glyma.16g131200 (AT4G36220.1) to the function of ferulic acid 5-hydroxylase 1; and Glyma.06g155800 (AT5G09260.1) to the vacuolar protein sorting-associated protein 20.2. Information on these functions can be found in [45]. Glyma.16g129200 (AT2G17990.1) was related to the function of calcium-dependent protein kinase 1 adaptor protein involved in vacuolar transport and lytic vacuole biogenesis [46]. Glyma.06g155900 (AT5G09250.1) was related to the function of ssDNA-binding transcriptional regulator [47]. Glyma.06g156000 (AT5G09230.7) was related to the function of Arabidopsis thaliana sirtuin 2 (SRT2) [48]. Glyma.06g157400 (AT1G01720.1) was related to the function of NAC (No Apical Meristem) domain transcriptional regulator superfamily protein [49]. Glyma.06g156300 (AT3G52430.1) was related to the function of alpha/beta-hydrolases superfamily protein [50]. Glyma.06g156400 (AT5G63860.1) was related to the function of the regulator of chromosome condensation (RCC1) family protein [51]. Glyma.06g157800 (AT1G07400.1) was related to the function of HSP20-like chaperones superfamily protein [52]. Glyma.06g160100 (AT1G10940.1) was related to the function of protein kinase superfamily protein [53]. Glyma.06g162100 (AT4G00650.1) was related to the function of FRIGIDA-like protein [54]. Glyma.06g162300 (AT5G47910.1) was related to the function of respiratory burst oxidase homologue D [55]. Glyma.06g164600 (AT5G27620.1) was related to the function of cyclin H;1 [56].

Plant Materials
An intraspecific F2 population containing 186 individual plants was generated from the parents CJC2 and YSX2. Changjiang Chun 2 (CJC2) is a high-yielding, high-protein cultivar with a hundred-seed weight of around 25 g, which was released in Chongqing, China. Yushuxian 2 (YSX2) is a regular vegetable soybean cultivar with a larger hundred-seed weight of about 30 g. The F2, F2:3, F2:4, and F2:5 populations (21CQ, 22CQ, 22YN, and 23CQ) were planted in summer 2021 in Chongqing, summer 2022 in Chongqing, winter 2022 in Yunnan, and summer 2023 in Chongqing, China, respectively. The F2 population was sown as single plants. The F2:3, F2:4, and F2:5 families were sown in single rows, with a row length of 1 m, row width of 0.5 m, and plant spacing of 0.2 m, with 2 seedlings in each plot. All populations received general field management. All the plants were harvested after maturity for examination of seed size and quality traits.

DNA Extraction and SSR Genotyping
Genomic DNA was extracted from young leaves collected from the 186 single plants of the F2 population, the two parent plants, and F1 plants [57]. A total of 3780 SSR primer pairs derived from the soybean database SoyBase (http://www.soybase.org/, accessed on 7 January 2023) [58] were synthesized by Biotech Bioengineering Co., Ltd. (Shanghai, China). Some of these BARCSOYSSR primers were renamed as SWU in this study (as detailed in Supplementary Table S1). PCR amplification was performed as described by Zhang et al. [59]. Primers showing polymorphism between the two mapping parents were used to genotype the single plants of the F2 population. The band type identical to CJC2 was recorded as A, the band type identical to YSX2 was recorded as B, the heterozygous band type was recorded as H, and the deletion was recorded as U. The results were then gathered for further analysis. Accordingly, additive effects were defined with respect to the CJC2 allele, so that positive genetic effects indicate that alleles of CJC2 increase phenotypic values.

Size Traits
The assessed seed size traits were hundred-seed weight (HSW), seed length (SL), seed width (SW), and seed length-to-width ratio (SLW). HSW, SL, SW, and SLW were measured using an automatic seed testing system (SC-A1, Hangzhou Wanshen Detection Technology Co., Ltd., Hangzhou, China). An image analysis method was used to determine the soybean seed traits. About 40 soybean seeds were spread on the white plate of a flatbed scanner (Eloam Technology Co., Ltd., Shenzhen, China). The scanner was set to inverse scanning and positive film mode, 24-bit color, and a resolution of 300 dpi. The image was processed with SC-E software (V2.1.2.8, Hangzhou Wanshen Detection Technology Co., Ltd., Hangzhou, China). Firstly, the image was converted to a 24-bit grayscale image immediately after scanning and stored automatically in PNG format for further analysis. The image obtained was 3410 × 2400 pixels in size. Secondly, the background was subtracted to remove the effect of background texture, and any overlapping soybean seeds were segmented [60]. After that, seed parameters were extracted and stored, and the soybean seeds were differently mapped. Finally, the SL, SW, SLW, and HSW of soybean were displayed based on the stored parameters.
A FOSS NIRS DS2500 (Foss Analytical A/S, Hilleroed, Denmark) was used to determine OIL and PRO, from 400 to 2500 nm, in transmittance mode with a 1 mm pathlength. A reference scan was taken once in every 10 sample scans. To increase the signal-to-noise ratio, both reference and sample spectra were averaged from 32 scans. Samples were temperature-equilibrated at 33 °C (approximately 3 min) in the instrument before scanning and kept at that temperature for the rest of the measurement.
Gas chromatography (GC) was used to determine OA, LA, LNA, PA, and SA. In practice, we took 0.2 g of seeds, ground them, put them into 5 mL test tubes, added 2 mL of petroleum ether-ether (1:1) solution, shook them slightly, and left the mixture for 40 min. Then, we added 1 mL of potassium hydroxide-methanol (0.4 mol/L) solution and mixed it well, and the methyl esterification time was 30 min. Then, we added distilled water along the wall of the vials and left the mixture to stand for a while. After layering, 1 mL of the supernatant was aspirated into the autosampling vial. The chromatographic column was DB-WAX (30 mm × 0.246 mm × 0.25 µm), and the stationary phase was polyethylene glycol. The operating conditions of the chromatograph were as follows: the column temperature was 185 °C, the temperature of the vaporization chamber was 250 °C, the temperature of the detection chamber was 250 °C, the flow rate of the carrier gas (nitrogen) was 60 mL/min, the flow rate of the hydrogen was 40 mL/min, the flow rate of the air was 400 mL/min, the retention time of the peaks was 13 min, and the injection volume was 2 µL. The composition of the unknown samples was determined based on the retention times of standard samples of soybean fatty acids. The area normalization method was used to calculate the percentage content of the five fatty acid components. Each measurement was repeated 3 times, and the average value was taken as the final data.
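The area normalization step can be sketched as follows. This is a minimal illustration assuming that each fatty acid's percentage is its peak area divided by the summed area of the five identified peaks, followed by averaging over replicate injections; no response factors are applied. The function names and the peak areas are hypothetical and do not come from the measured data.

```python
def area_normalize(peak_areas):
    """Convert peak areas to percentages of the total identified peak area."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {acid: 100.0 * area / total for acid, area in peak_areas.items()}


def mean_percentages(replicates):
    """Average the percentage of each fatty acid over replicate measurements."""
    acids = replicates[0].keys()
    n = len(replicates)
    return {acid: sum(rep[acid] for rep in replicates) / n for acid in acids}


# Hypothetical peak areas (arbitrary units) for three replicate injections.
runs = [
    {"PA": 110.0, "SA": 40.0, "OA": 220.0, "LA": 540.0, "LNA": 90.0},
    {"PA": 112.0, "SA": 41.0, "OA": 218.0, "LA": 538.0, "LNA": 91.0},
    {"PA": 109.0, "SA": 39.0, "OA": 222.0, "LA": 541.0, "LNA": 89.0},
]
final_composition = mean_percentages([area_normalize(run) for run in runs])
```

Because each replicate is normalized to 100% before averaging, the averaged composition also sums to 100%.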
The phenotypic data underwent statistical analysis using Excel 2019 for data manipulation and Origin 2019 for plotting.

Map Construction and QTL Detection
Marker linkage analysis was performed using JoinMap 4.0. The genetic linkage map was constructed with an LOD score of 4.0, and recombination frequencies were converted to map distances with the Kosambi mapping function [61]. QTLs for all traits were localized with the multiple QTL model (MQM) in MapQTL 6.0. The presence of a QTL was declared using 1000 permutation tests at a significance level of p = 0.05, with LOD = 3.0 as the threshold. The graphic representation of the QTLs on the linkage groups was created using MapChart 2.2 [62].
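For reference, the Kosambi mapping function converts a recombination frequency r into a map distance of 25 ln[(1 + 2r)/(1 − 2r)] cM. The sketch below shows this standard formula and its inverse; it is not taken from JoinMap, and the function names are hypothetical.

```python
import math


def kosambi_cm(r):
    """Map distance in centimorgans for a recombination frequency r (0 <= r < 0.5)."""
    if not 0 <= r < 0.5:
        raise ValueError("recombination frequency must satisfy 0 <= r < 0.5")
    return 25.0 * math.log((1 + 2 * r) / (1 - 2 * r))


def kosambi_recombination(distance_cm):
    """Inverse Kosambi function: recombination frequency for a distance in cM."""
    if distance_cm < 0:
        raise ValueError("map distance must be non-negative")
    return 0.5 * math.tanh(distance_cm / 50.0)


for r in (0.01, 0.05, 0.10, 0.20):
    print(f"r = {r:.2f} -> {kosambi_cm(r):.2f} cM")
```

For small r the map distance is close to 100r cM (r = 0.10 gives about 10.14 cM), and the difference grows as r approaches 0.5 because the function accounts for crossover interference.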
Each interval that met the threshold was then designated a QTL. The QTLs were named with the letter "q", the trait name, the chromosome number, and the order number. For example, the first QTL found on Chromosome 1 related to SL was named qSL01.1.
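The naming rule can be written as a small helper. This is only a sketch of the convention described above, assuming a two-digit, zero-padded chromosome number as in qSL01.1; the function name is hypothetical.

```python
def qtl_name(trait, chromosome, order):
    """Build a QTL name: 'q' + trait + two-digit chromosome + '.' + order number."""
    if chromosome < 1 or order < 1:
        raise ValueError("chromosome and order must be positive integers")
    return f"q{trait}{chromosome:02d}.{order}"


print(qtl_name("SL", 1, 1))    # qSL01.1
print(qtl_name("HSW", 16, 2))  # qHSW16.2
```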

QTL Clusters Identification
A QTL cluster is a chromosomal region densely populated with QTLs associated with various traits [63]. All QTLs were sorted by chromosome first and by physical location second. QTLs with overlapping physical locations on the same chromosome were grouped together, and the group was identified as a QTL cluster if it was associated with at least two traits. The QTL clusters were labeled with "Loci". For example, in the QTL cluster denoted Loci01.1, "Loci" indicates a QTL cluster, "01" indicates the chromosome on which the cluster was detected, and the suffix ".1" indicates the order of the cluster on that chromosome.
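The grouping procedure can be sketched as an interval-merging routine. This is a minimal illustration rather than the exact procedure used in the study: it assumes that intervals sharing at least one base pair count as overlapping and that only groups meeting the two-trait condition are numbered. The `QTL` record, the function name, and all coordinates are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class QTL(NamedTuple):
    name: str
    trait: str
    chromosome: int
    start: int  # physical start position (bp)
    end: int    # physical end position (bp)


def find_clusters(qtls, min_traits=2):
    """Group overlapping QTLs per chromosome; keep groups covering >= min_traits traits."""
    by_chromosome = defaultdict(list)
    for qtl in qtls:
        by_chromosome[qtl.chromosome].append(qtl)

    clusters = {}
    for chromosome in sorted(by_chromosome):
        ordered = sorted(by_chromosome[chromosome], key=lambda q: (q.start, q.end))
        groups = []
        current, current_end = [ordered[0]], ordered[0].end
        for qtl in ordered[1:]:
            if qtl.start <= current_end:  # overlaps the running interval
                current.append(qtl)
                current_end = max(current_end, qtl.end)
            else:                         # gap: close the group and start a new one
                groups.append(current)
                current, current_end = [qtl], qtl.end
        groups.append(current)

        order = 0
        for group in groups:
            if len({q.trait for q in group}) >= min_traits:
                order += 1
                clusters[f"Loci{chromosome:02d}.{order}"] = group
    return clusters


# Hypothetical QTL intervals (illustrative coordinates only).
example = [
    QTL("qSL16.1", "SL", 16, 100, 200),
    QTL("qSW16.1", "SW", 16, 150, 300),
    QTL("qHSW16.1", "HSW", 16, 500, 600),
    QTL("qOIL06.1", "OIL", 6, 10, 50),
    QTL("qPRO06.1", "PRO", 6, 40, 90),
]
for label, members in find_clusters(example).items():
    print(label, [q.name for q in members])
```

In this example the isolated HSW interval on Chromosome 16 is not reported, because a group containing a single trait does not satisfy the cluster definition.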

Candidate Gene Prediction
Candidate genes were searched in SoyBase (http://www.soybase.org, accessed on 12 December 2023) within the intervals of promising QTL clusters, that is, intervals containing QTLs related to more than one trait. Moreover, only intervals that were repeatedly mapped in more than one environment were retained. In short, a promising interval had to meet two conditions: stability and effectiveness.
Once the genes within the promising intervals were identified, they were analyzed with GO (Gene Ontology) to reveal their general functions and corresponding proteins. Candidate genes were then selected based on this functional analysis.
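The two filtering conditions and the interval search can be sketched as follows. This is a simplified illustration that does not query SoyBase: the gene records, identifiers, trait and environment labels, and function names are all hypothetical, and only the Chromosome 16 interval (27876712 to 29009783) is taken from this study.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    gene_id: str
    chromosome: int
    start: int
    end: int


def is_promising(traits, environments):
    """A cluster qualifies if it covers more than one trait and more than one environment."""
    return len(set(traits)) > 1 and len(set(environments)) > 1


def genes_in_interval(genes, chromosome, start, end):
    """Return genes on the chromosome that overlap the interval [start, end]."""
    return [
        g for g in genes
        if g.chromosome == chromosome and g.start <= end and g.end >= start
    ]


# Hypothetical gene annotations (placeholder identifiers and coordinates).
annotations = [
    Gene("gene_A", 16, 27_900_000, 27_903_500),
    Gene("gene_B", 16, 29_500_000, 29_502_000),
    Gene("gene_C", 6, 12_600_000, 12_604_000),
]

# Hypothetical trait and environment labels for one cluster.
if is_promising(traits=["HSW", "SL", "SW"], environments=["F2", "F2:3"]):
    hits = genes_in_interval(annotations, 16, 27_876_712, 29_009_783)
    print([g.gene_id for g in hits])
```

The genes returned by such a search would then be passed to the GO analysis for functional annotation.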
Author Contributions: J.Z. (Jian Zhang) designed and supervised the experiments and contributed to the final editing of the manuscript; W.G. and R.M. analyzed and summarized the data, generated the figures, and wrote the manuscript; X.L., A.J., J.L., P.T., G.X., C.D., J.Z. (Jijun Zhang), X.Z., X.F. and Z.Y. conducted field trials, phenotypic evaluation, and data collection. All authors have read and agreed to the published version of the manuscript.
Institutional Review Board Statement: This study did not involve any endangered or protected species and followed all relevant ethical guidelines. The samples examined in this study were used as agricultural plants in China.
Informed Consent Statement: Not applicable.

Funding: This study was supported by the Chongqing Technology Innovation and Application Development Special Key Project (cstc2021jscx-gksbX0011); the Collection, Utilization and Innovation of Germplasm Resources by Research Institutes and Enterprises of Chongqing (cqnyncwkqlhtxm); and the National College Students Innovation and Entrepreneurship Training Program from the Ministry of Education (S202310635404).

Table 1. Characteristics of seed size traits in the F2 population in four environments.

Table 2. Distribution of markers on chromosomes on a map developed from the F2 population.

Table 3. QTLs identified for seed size traits in four environments.

Table 4. QTLs identified for seed quality traits in four environments.
b PVE, phenotypic variance explained.

Table 5. QTL clusters associated with seed size and quality traits in soybean.

Table 6. Candidate genes for seed size and quality traits of soybean.