Genetic Basis and Simulated Breeding Strategies for Enhancing Soybean Seed Protein Content Across Multiple Environments

Sun, Xu; Hu, Bo; Li, Wen-Xia; Ning, Hai-Long

doi:10.3390/plants14142117

¹ Key Laboratory of Soybean Biology, Ministry of Education, Northeast Agricultural University, Harbin 150038, China

² Key Laboratory of Soybean Biology and Breeding/Genetics, Ministry of Agriculture, Northeast Agricultural University, Harbin 150038, China

* Authors to whom correspondence should be addressed.

† These authors contributed equally to this work.

Plants 2025, 14(14), 2117; https://doi.org/10.3390/plants14142117

Submission received: 15 April 2025 / Revised: 4 June 2025 / Accepted: 26 June 2025 / Published: 9 July 2025

(This article belongs to the Special Issue Crop Genetics and Breeding)

Abstract

Soybeans are a primary source of plant-based protein, with seeds containing approximately 40% protein—a key quality trait. Selecting superior hybrid combinations and managing progeny effectively are crucial for developing high-protein soybean varieties. Using a recombinant inbred line population (RIL3613) derived from Dongnong L13 × Heihe 36 and its previously constructed high-density genetic linkage map, QTLs and QTL × environment interactions (QEIs) associated with seed protein content (SPC) were identified through the bi-parental population (BIP) model and multi-environment trials (MET) model in QTL IciMapping v4.2. Candidate genes were then predicted via sequence alignment and haplotype analysis between the parents. Finally, simulated breeding was conducted using the B4L function in the In Silico Breeding (ISB) module of the Blib platform to determine optimal breeding strategies across diverse environments. The analysis identified 19 QTLs associated with SPC and 97 QEIs linked to SPC. These QTLs collectively explained 84.442% of the phenotypic variance, with four QTLs exhibiting significant contributions. A key candidate gene, Glyma.12G231400, associated with soybean SPC, was predicted within the 38,995,090–39,293,825 bp interval on chromosome 12. Across 11 environments, three to six optimal breeding schemes were selected, all employing modified pedigree selection. These findings enhance our understanding of the genetic basis of soybean protein formation and provide technological support for molecular breeding for seed quality improvement.

Keywords: soybean; protein content; QTL; design breeding; candidate genes

1. Introduction

Soybean is one of the world’s most important crops and a primary source of protein and oil for humans [1,2]. China has a long history of soybean production. With the rapid development of modern soybean cultivation in Northeast China, soybeans have become a hallmark of the region’s agricultural economic history [3]. The genetic improvement of soybean seed protein is crucial in terms of meeting the demands of the growing global population [4]. A primary objective in soybean breeding is to enhance protein content [5].

Soybean protein content is a complex quantitative trait influenced by environmental conditions and controlled by multiple genes [6]. Most studies indicate that while seed composition is primarily governed by genetic factors, it is also affected by abiotic and biotic factors [7]. The protein and oil content in seeds of the same variety can vary across years or under different environmental conditions within the same year [8]. Molecular geneticists and breeders commonly use populations derived from biparental crosses to select new varieties and map quantitative trait loci (QTL) for target traits [9]. Recent research has focused on identifying QTLs for protein content and mining genes using linkage analysis and genome-wide association studies (GWAS) with high-density genetic maps. These approaches have been instrumental in elucidating the genetic architecture of soybean seed protein and facilitating variety improvement [10]. According to the SoyBase database, 241 QTLs influencing soybean SPC had been identified by 2018, with additional QTLs discovered in subsequent years. Karikari et al. [11] constructed a genetic map using 2267 bin markers, identifying 25 protein QTLs. By integrating these findings with transcriptomic data, they pinpointed four candidate genes associated with protein synthesis. Zhong et al. [12] used a high-density genetic map to evaluate QTLs for protein content, detecting 44 major and stable QTLs. Cunicelli et al. [13] analyzed 138 recombinant inbred lines across six environments, identifying 21 QTLs for traits including yield, protein, oil content, methionine, and threonine, with four linked to protein. Seo et al. [14] identified 12 seed protein content-related QTLs. Further investigation of candidate genes within major-effect QTLs could provide deeper insights into the genetic basis of SPC. Lee et al. [15] identified 192 co-linear protein QTLs, forming six hotspot regions, and detected eight genes that are highly expressed during seed maturation. 
Yang et al. [16] mapped protein content to a 15 kb interval using 195 chromosome segment substitution lines and, in conjunction with transcriptomic data, designated Glyma.15G049200 as a candidate gene. Salas et al. [17] mapped two stable protein QTLs, i.e., qPro-10-1 and qPro-14-1. Fliege et al. [9] performed fine mapping of the cq-Seed protein-003 QTL on chromosome 20, identifying Glyma.20G85100 as a gene related to soybean seed protein content.

Large-scale genomic sequencing, high-density linkage analysis, genome-wide association studies, and extensive functional genomics research have made designed breeding a tangible reality. Breeders are gradually moving away from traditional methods and adopting the “priori design” concept [18]. This shift is driven by the long breeding cycle of soybeans. Currently, parental line selection still heavily relies on breeders’ experience and intuition. Designed breeding effectively addresses these challenges. It aims to control all allelic variations of genes which are essential for agronomic traits. This control becomes possible through precise genetic maps, high-resolution chromosome single-nucleotide analysis, and extensive phenotypic evaluations [19].

Designed breeding has proven effective in plant breeding. Wei et al. [20] combined marker-assisted selection with multiple resistance screening. After several rounds of hybridization, they aggregated six target genes and developed a promising restorer line: Guihui5501. This line exhibited heavy grain, good quality, and tolerance to both biotic and abiotic stresses. To develop high oleic acid soybean varieties, Nan et al. [21] analyzed the FAD2-1A and FAD2-1B haplotypes—key factors in increasing oleic acid content—in 1250 soybean materials and developed two molecular markers. Using marker-assisted selection, they identified line 435, which had an oleic acid content of 91.03%. Line 435 was then used as the donor parent, with the superior soybean variety Hainong 51 serving as the recurrent parent. After three backcrosses, a single plant with high oleic acid content (75%) and high yield was obtained. These case studies highlight how parental line selection and breeding strategies determine the success of breeding objectives [22]. In recent years, methods such as BLUP and genomic selection (GS) have been used to estimate parental breeding potential and guide selection in crop improvement [23]. Additionally, parental selection can be based on predicted performance. Zhong et al. [24] proposed selecting inbred line parents based on the projected performance of the best offspring from a cross, termed “superior progeny value.” When designing breeding schemes, breeders must choose the optimal strategy from multiple options before initiating actual breeding. Computer simulations effectively compare multiple breeding methods and identify the most efficient scheme for generating the target genotype, thereby saving time, land, and labor costs. These simulations incorporate assumptions about population and quantitative genetics, influencing the final breeding plan [25]. 
Additionally, they generate extensive data that may be difficult to obtain through empirical experiments or theoretical models, helping validate proposed theories or models [26]. By leveraging parental molecular data and genomic prediction models, simulations can create segregating populations from virtual crosses, enabling the prediction of the most promising populations before conducting actual field crosses [27]. Bančič et al. [28] recently developed AlphaSimR, a software package that allows breeders to design and simulate breeding schemes independently. Zhang et al. [29] recently developed Blib, a multi-module simulation platform capable of handling more complex genetic effects and models than existing tools, making it suitable for modeling, simulating, and predicting genetic breeding processes in diploid species. Building on the Blib platform, Wang et al. [30] proposed a wheat breeding design method that integrates known QTL information with computer simulations. Potential crosses within a GWAS panel can be evaluated based on the relative frequency of target genotypes, trait correlations in simulated progeny, and genetic gains in selected progeny. By optimizing parental selection, progeny population size, and selection schemes, both yield and grain quality can be improved simultaneously. Applying this design method enables the identification of the most promising crosses and selection strategies before field trials, enhancing the predictability and efficiency of breeding programs.

Based on previous studies, research on identifying QTLs and potential candidate genes associated with protein content is extensive. However, the effects of different haplotypes of candidate genes on soybean SPC remain underexplored. Additionally, the use of genetic information to develop models and assess the breeding potential of soybean SPC is limited.

In our previous studies, a recombinant inbred line (RIL) population, RIL3613, was constructed and used to map QTLs for SPC primarily based on an SSR linkage map [31,32,33]. However, due to a lack of fine mapping, these QTLs could not be applied in molecular breeding. To identify optimal breeding strategies for SPC in RIL3613, the present study conducted a linkage analysis using a high-density SNP linkage map to map SPC-related QTLs and QEI across the whole genome. Key candidate genes were identified through parental sequence comparison and haplotype analysis. A genetic model was constructed using the ISB plant breeding simulation platform, incorporating QEI data. Breeding simulations were then conducted with the RIL3613 population as parental lines to determine optimal breeding strategies for diverse environments.

This study aims to provide a theoretical foundation and technical support for the genetic improvement of soybeans.

2. Results

2.1. Phenotypic Variation Analysis

The phenotypic data of 120 lines from the RIL3613 population across 22 environments were analyzed. A descriptive analysis showed that the absolute values of kurtosis and skewness were <1 in all environments except E09, E10, E17, E19, E20, and E21, indicating an approximately normal distribution of protein content in those environments (Table 1, Figure S1). The protein content of the RILs extended beyond the parental range, indicating transgressive segregation. The coefficient of variation ranged from 2.04% to 5.34%. Analysis of variance revealed highly significant effects of environment, genotype, and genotype × environment interaction, demonstrating that protein content is influenced by both genetic and environmental factors (Table 2). The broad-sense heritability was high (82.3%), indicating that genetic effects primarily drive variation in soybean protein content.
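As a rough illustration of the normality screen described above, the following Python sketch computes the descriptive statistics for a single environment and applies the |skewness| < 1 and |kurtosis| < 1 rule. The SPC values are invented for illustration, and the moment-based estimators are an assumption; the software behind Table 1 may apply different small-sample corrections.

```python
import statistics


def describe(values):
    """Return mean, SD, CV (%), skewness and excess kurtosis of a sample."""
    n = len(values)
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    # Central moments with an n denominator (moment-based estimators, an assumption)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "sd": sd,
        "cv_pct": 100.0 * sd / mean,
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2 - 3.0,  # excess kurtosis (0 for a normal distribution)
    }


def roughly_normal(stats):
    """Screening rule from the text: |skewness| < 1 and |kurtosis| < 1."""
    return abs(stats["skewness"]) < 1 and abs(stats["kurtosis"]) < 1


# Hypothetical SPC values (%) for ten lines in one environment
spc = [39.8, 40.6, 41.2, 41.9, 42.3, 42.7, 43.1, 43.8, 44.4, 45.5]
summary = describe(spc)
print(summary, roughly_normal(summary))
```

In practice the same function would be applied to each of the 22 environments in turn, using the per-line means of the three replicates.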

2.2. QTL Mapping for Candidate Gene Prediction

Using the bin map developed previously [34], the ICIM method in the BIP module identified 19 QTLs associated with protein content, distributed across 13 chromosomes, with each chromosome harboring one to four QTLs. Individually, these QTLs explained 0.35% to 15.83% of the phenotypic variance; collectively, they accounted for 84.442% of the total phenotypic variation (Figure 1, Table S1). Notably, four QTLs (qPR-1-1, qPR-1-3, qPR-12-1, and qPR-14-1) each contributed over 10% to phenotypic variation and are considered major-effect QTLs for soybean SPC (Table 3).

2.3. Candidate Gene Prediction

Potential candidate genes within these four QTL regions (qPR-1-1, qPR-1-3, qPR-12-1, and qPR-14-1) were identified, corresponding to genomic intervals of 38.76–41.33 Mb on Chr01, 42.34–43.01 Mb on Chr01, 39.00–39.29 Mb on Chr12, and 9.90–10.29 Mb on Chr14. A total of 147 genes were identified. Resequencing data from parental lines revealed 52 genes with non-synonymous mutations. Functional annotation of these 52 genes led to the identification of three candidate genes (see Table 4 and Table 5).

2.4. Haplotype Analysis and Validation

To further validate the function of the candidate genes, two haplotypes were defined for each of the three candidate genes based on missense mutations in the parental CDS. For a list of primers targeting these mutation sites, see Table S2. The corresponding fragments were then amplified via PCR and sequenced from the DNA of 92 individual lines in the RIL3613 population. Haplotype analysis of the candidate genes was performed using the sequencing results and SPC phenotypic data from the RIL3613 population. This analysis revealed a significant difference in the SPC trait between haplotypes only for the Glyma.12G231400 gene (Figure 2A). This suggests that HapII of Glyma.12G231400 may be a superior haplotype for enhancing soybean protein content. Consequently, Glyma.12G231400 was designated as the final candidate gene.

To assess the regulatory effect of the candidate gene on protein content across diverse genetic backgrounds, a haplotype analysis of Glyma.12G231400 was conducted using genotypic and phenotypic data from 2898 soybean germplasm resources in the SoyOmics database. The results showed that the HapI haplotype was present in 1612 accessions, with SPC phenotypic data available for 330 accessions, whereas the HapII haplotype was found in 453 accessions, with SPC data available for 111 accessions. The SPC of HapII remained significantly higher than that of HapI (Figure 2B), further confirming the regulatory role of Glyma.12G231400 in soybean SPC.

2.5. QEI Mapping for Breeding Simulation

Using the BIP model in QTL IciMapping v4.2 [35], we identified 19 QTLs across 11 of the 22 analyzed environments (E08, E09, E11, E14, E15, E16, E17, E18, E19, E20, and E21). To apply QEIs to simulated breeding, we employed the MET model to map QEIs for soybean SPC across these 11 environments in the RIL3613 population. A total of 97 QEIs associated with SPC, exhibiting additive-by-environment (A × E) effects, were detected across 20 chromosomes. LOD scores for these QEIs ranged from 2.5153 to 7.3106, with each QEI explaining 0.63% to 2.49% of the phenotypic variance (Figure 3 and Figure S2, Table S3). The phenotypic contribution of each QEI was relatively low, indicating that most were minor-effect QEIs. The phenotypic variance explained by additive-by-environment interaction effects (PVE (A × E)) was also small, suggesting that these QEIs are relatively stable and suitable for breeding simulations.

2.6. Analysis of Simulated Breeding Results

Based on the additive effects of the QEIs in each environment, the allele that increased SPC at each QEI was designated as the superior allele (SA). For the same line, the number of SAs and the genotypic value varied across environments. Moreover, significant differences in SA numbers and genotypic values were observed among lines within each environment (Table S4). Consequently, breeding simulations were conducted separately for each environment.
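The bookkeeping behind the SA counts and genotypic values can be sketched as follows. This is a minimal illustration under stated assumptions, not the computation used by the ISB module: genotypes of homozygous lines are coded +1 or -1 per locus, the genotypic value is taken as a purely additive sum, and all effect sizes and the example line are hypothetical.

```python
def superior_allele_count(genotype, effects):
    """Count loci where the line carries the SPC-increasing allele.

    genotype: list of +1 / -1 codes (assumed coding for homozygous RILs)
    effects:  additive effect of the +1 allele at each locus in one environment
    """
    return sum(1 for g, a in zip(genotype, effects) if g * a > 0)


def genotypic_value(genotype, effects, mean=0.0):
    """Additive genotypic value: population mean plus summed allele effects."""
    return mean + sum(g * a for g, a in zip(genotype, effects))


# Hypothetical additive effects of three QEIs in two environments
effects_by_env = {
    "E08": [0.30, -0.20, 0.10],
    "E09": [-0.15, 0.25, 0.10],
}
line = [1, -1, 1]  # hypothetical RIL genotype at the three loci

for env, effects in effects_by_env.items():
    n_sa = superior_allele_count(line, effects)
    value = genotypic_value(line, effects)
    print(env, n_sa, round(value, 2))
```

Because the sign of an effect can differ between environments, the same line carries three SAs in the first hypothetical environment but only one in the second, which is why the simulations were run per environment.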

Using the acquired QEI data and genotypic information of individual lines in the RIL3613 population, the initial parental population and genetic models were constructed. Breeding schemes were simulated using the B4L function of ISB. The simulation results included the number of hybrid combinations retained after each selection round at the end of a breeding cycle, the parents of these hybrid combinations, the number of plants and families in each breeding scheme per generation, the genotypes and genotypic values of the output population, and the population mean of genotypic values. The number of hybrid combinations retained after a breeding cycle primarily depended on the selection method. Simulation results showed that the bulk method retained more hybrid combinations (e.g., F2 = 500, Figure 4). The number of plants and families per generation in each breeding scheme was influenced by both the selection method and the planting scale of the F2 generation. Under the modified pedigree method, the number of plants and families per generation increased with an increase in the F2 planting scale. In contrast, under the bulk method, these numbers remained unchanged despite variations in the F2 planting scale (e.g., single cross, Table S5). This suggests that while the bulk method offers a stable workload and lower economic costs, the pedigree method offers greater controllability over these factors. Under identical environmental conditions, the mean genotypic value of the offspring population selected by the pedigree method was slightly higher than that of the bulk method. However, the bulk method produced a wider dispersion of genotypic values. In all simulations, the mean genotypic values of the progeny populations exceeded those of the initial RIL3613 population (Figure 5), highlighting its strong breeding potential and suitability for developing high-protein soybean varieties (e.g., F2 = 500, Table S6).

2.7. Formulation of Breeding Program

The simulation results were strongly influenced by the environment, and this influence cannot be ignored in actual breeding. Therefore, breeding schemes must be tailored to each environment. Target genotypes were screened based on the genotypic values of the simulated output population, and the number of target genotypes obtained per simulation and the corresponding hybrid combinations were recorded (Table S7). The number of target genotypes obtained per simulation ranged from 0 to 1064, while the number of hybrid combinations producing these genotypes ranged from 0 to 419. The choice of breeding scheme affects the acquisition of target genotypes. Thus, in our study, the top three breeding schemes yielding the highest number of target genotypes under each environmental condition were selected as the best for that environment (Table 6). The screening results indicate that all the best schemes employed the pedigree method for selection, with an F2 planting scale predominantly of 800, though some used 500. Different environmental conditions require distinct recurrent parents for optimal breeding outcomes, with varieties possessing a higher number of SAs being more likely to serve as recurrent parents (for instance, HN2 in E01, HN12 in E05, HN87 in E06, and HN115 in E07). In E03 and E06, the number of SAs in the offspring did not increase significantly compared to the initial population, but the genotypic value did increase, indicating that these schemes can aggregate SAs with larger effect values. These findings suggest that when breeding resources are not a limiting factor, using the pedigree method and maximizing the F2 generation scale can enhance the acquisition of target genotypes.
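The selection rule described above (count the simulated progeny that reach the target genotypic value, then keep the three schemes with the most such progeny) can be sketched in Python. All genotypic values below are hypothetical, and the dictionary layout keyed by (scheme, F2 scale) is an assumption for illustration rather than the output format of the simulation platform.

```python
def count_targets(progeny_values, target_value):
    """Number of simulated progeny whose genotypic value meets the target."""
    return sum(1 for v in progeny_values if v >= target_value)


def top_schemes(simulated, target_value, k=3):
    """Rank (scheme, F2 scale) runs by the number of target genotypes obtained."""
    counts = {run: count_targets(vals, target_value) for run, vals in simulated.items()}
    ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Hypothetical initial population; the target is its highest genotypic value
initial_population = [41.2, 42.8, 43.5, 44.1]
target = max(initial_population)

# Hypothetical simulated progeny values for one environment
simulated = {
    ("PedSC", 800): [44.3, 44.9, 45.2, 43.0],
    ("PedBC1P1", 800): [44.1, 44.6, 42.9, 43.7],
    ("PedSC", 500): [44.2, 43.1, 42.0, 41.8],
    ("BlkSC", 800): [43.9, 43.2, 42.5, 41.0],
}

best = top_schemes(simulated, target)
print(best)
```

With these invented numbers the bulk run yields no target genotypes and drops out of the top three, mirroring the qualitative pattern reported in the text.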

3. Discussion

In this study, the RIL3613 population was used to identify QTLs and QEIs associated with soybean SPC. The BIP model identified 19 QTLs across 22 environments. Four QTLs had a phenotypic contribution exceeding 10%, classifying them as major-effect QTLs. Using the SoyBase database, these 19 QTLs were compared with 241 previously mapped seed protein-related QTLs. Eight QTLs overlapped with or were included in prior findings [36,37,38,39,40,41,42], supporting the reliability of the QTL mapping results, while the remaining 11 were identified as novel. Through parental sequence comparisons and haplotype analysis, a key gene, Glyma.12G231400, associated with soybean protein content, was predicted within the 38,995,090–39,293,825 bp region on chromosome 12. This gene is annotated as BEH4 (BES1/BZR1 homolog 4), a homolog of the bHLH-type transcription factors BRI1-EMS-SUPPRESSOR 1 (BES1) and BRASSINAZOLE RESISTANT 1 (BZR1), which are critical in brassinosteroid (BR) signaling. BRs are common plant hormones, and previous studies indicated that BEH1 and BEH2 are regulated by brassinolide (BL) in Arabidopsis [43]. BL, the most prevalent BR, has been shown to increase SPC in common beans [44,45]. The overexpression of BEH4 in tomatoes enhances the expression of genes involved in nitrogen uptake and assimilation [44]. Therefore, BEH4 is likely to promote protein synthesis and accumulation in soybean seeds by modulating BL signaling and nitrogen absorption.

Breeders are increasingly leveraging the expanding wealth of published gene and QTL data, along with the widespread adoption of marker-assisted selection, to accelerate crop improvement. While most QTL mapping efforts have focused on single-environment QTL detection, multi-environment trial (MET) QTL mapping and the detection and modeling of QEIs have received less attention [46]. QEIs can be studied when genetic populations are grown across multiple locations or years, providing invaluable insights for both breeders and geneticists. Based on QTL mapping results, breeders can design optimal genotypes with favorable alleles and implement marker-assisted selection more effectively. Stable QTLs for agronomic traits are applicable across diverse environments, whereas environment-specific QTLs are useful for targeted environments [47].

In this study, 97 QEIs were identified using the ICIM method within the MET module. These QEIs were compared with 241 previously mapped seed protein-related QTLs in the SoyBase database. Forty QEIs overlapped with or were included in prior research findings [36,37,39,40,42,48,49,50,51,52,53,54,55,56,57,58], validating the reliability of the QEI mapping results (Figure S2).

Additionally, ISB, an application module within Blib, was used to simulate pure-line variety development in plants. Key elements for these simulations included environmental and breeding target trait genetic models, parental populations, and breeding methods [59]. Genetic models were primarily constructed based on previous genetic studies. The QEIs identified in this study are minor in effect, stable, and widely distributed across the soybean genome, with reliable localization results, making them suitable for genetic model construction. The SPC of the RIL3613 population used in this study exhibited a normal or near-normal distribution across all 11 simulated environments, with transgressive segregation observed within the population, indicating its suitability for breeding based on soybean SPC traits. To maximize the breeding potential of the RIL3613 population, single cross, backcross, pedigree, and bulk selection methods were applied simultaneously, and a half-diallel cross design was used to simulate all possible cross combinations within the population. Consequently, the breeding simulation results and the proposed optimal breeding strategies in this study are considered reliable.

Environmental factors can significantly influence breeding outcomes, making genotype selection for local conditions essential to enhancing soybean protein content [60,61]. Therefore, a key objective for breeders is to develop genotypes suited to a specific set of environments, termed the “Target Population of Environments” (TPE), which includes a defined range of farms and expected growing seasons [62]. This study designed optimal breeding schemes for individual environments within the TPE, ensuring that each scheme produced target genotypes suited to its respective conditions. However, the study characterized environments solely based on heritability. To improve breeding strategies and variety recommendations, future research should focus on precisely describing climatic stress patterns that may influence environments [63]. Analyzing 25 years of data across 35 regions, Beillouin et al. used historical yield records and weather databases to identify four climatic factor combinations affecting barley crops in the French barley belt, with important implications for local genotype adaptation strategies [64]. Similarly, Heinemann et al. integrated a generalized additive model (GAM), environmental covariates (ECs), and grain yield (GY) data from 18 years of historical breeding trials to develop an “environmental forecasting” approach. This approach predicts the optimal EC thresholds for each production scenario (four regions, three seasons, and two grain types) and their respective contributions to GY adaptation, revealing strong interactions between developmental stages, seasons, and regions due to the nonlinear effects of air temperature, solar radiation, and rainfall [65]. Precise environmental characterization of this kind could likewise enhance breeding simulations, leading to more reliable breeding schemes based on simulation results.

The breeding strategy developed in this study could be applied to various environments. This would involve evaluating parental populations in the target environment, performing QTL mapping, and using QTL and population data for breeding simulations. The simulation results could then guide the design of breeding strategies. This study offers new insights for designing soybean breeding programs.

4. Materials and Methods

4.1. Plant Populations

Two soybean varieties with large differences in seed protein content were used as parents: Dongnong L13 (45.50%), derived from a cross between Heinong40 and Jiujiao5640, and Heihe 36 (39.80%), derived from a cross between Bei 89-7 and Jiusan90-66. The cross was made in 2008 in Harbin, Heilongjiang Province (E 126.63°, N 45.75°). The F1 generation was planted in Yacheng, Hainan Province (E 109.00°, N 17.50°) in the winter of the same year. After five consecutive generations (2010–2014) of alternate planting in Harbin and Yacheng, a population of 120 recombinant inbred lines (RILs) was obtained and used for linkage analysis [66].

4.2. Field Trials and Phenotypic Measurement

In this study, the RIL3613 population was grown under 22 different environmental conditions. Field experiments followed a randomized block design with three replications. Each plot consisted of a single row, measuring 5 m in length and 0.67 m in width. Details of sowing dates, planting density, and fertilization rates are provided in Table 7. All other field management practices adhered to local soybean production standards. At maturity, 10 uniformly grown plants from the middle of each row were manually harvested. Once the threshed soybean seeds reached ~13% moisture content (seed dry weight), soybean SPC was measured using a near-infrared grain analyzer (FOSS Infratec 1241, Denmark). The average of three replicates per sample was used as phenotypic data for subsequent analyses (Table 7).

4.3. Statistical Analysis of Phenotype Data

Frequency distribution histograms were generated from the mean protein content of the three replicates in each environment, and descriptive statistics (mean, standard deviation, minimum, maximum, skewness, kurtosis, and coefficient of variation) were calculated. An analysis of variance was conducted on the replicated protein content values across multiple environments, and broad-sense heritability was estimated. The statistical model for multi-environment variance analysis is as follows:

x_{ij} = μ + G_{i} + E_{j} + GE_{ij} + ε_{ij}

where x_{ij} is the phenotypic value of the ith genotype in the jth environment, μ represents the overall population mean, G_{i} signifies the effect of the ith genotype, E_{j} denotes the effect of the jth environment, GE_{ij} represents the genotype × environment interaction effect, and ε_{ij} is the error effect, following a distribution of N(0, σ²).

The broad-sense heritability in multiple environments was calculated via the following formula:

h^{2} = \frac{σ_{G}^{2}}{σ_{G}^{2} + \frac{σ_{GE}^{2}}{e} + \frac{σ^{2}}{er}}

where h² is the broad-sense heritability across multiple environments, σ_{G}^{2} is the genotypic variance, σ_{GE}^{2} is the variance due to genotype × environment interaction, σ² is the error variance, e is the number of environments, and r is the number of replicates within each environment. The data were analyzed using the PROC MIXED procedure in SAS 9.4M9 (SAS Institute, Cary, NC, USA).
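To make the heritability formula concrete, the short Python sketch below evaluates it for a set of variance components. The values e = 22 and r = 3 follow the trial design described in this article, whereas the three variance components are hypothetical placeholders, not estimates from this study.

```python
def broad_sense_heritability(var_g, var_ge, var_error, n_env, n_rep):
    """Multi-environment broad-sense heritability on a line-mean basis.

    h2 = var_g / (var_g + var_ge / e + var_error / (e * r))
    """
    return var_g / (var_g + var_ge / n_env + var_error / (n_env * n_rep))


# Hypothetical variance components; e = 22 environments and r = 3 replicates
h2 = broad_sense_heritability(var_g=1.0, var_ge=2.0, var_error=3.0, n_env=22, n_rep=3)
print(round(h2, 2))
```

Dividing the interaction variance by e and the error variance by e × r reflects that line means are averaged over environments and replicates, so both noise terms shrink as more environments are tested.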

4.4. SNP Genotyping and Genetic Map Construction

In a previous study, all RIL3613 lines were genotyped using the SoySNP660K BeadChip, yielding 108,342 SNP markers across 20 chromosomes. A bin-based linkage map was then constructed using QTL IciMapping v4.2. The total map length was 2969.84 cM, with an average marker spacing of 1.33 cM. On average, each chromosome spanned 148.50 cM and carried 111.25 markers [34].

4.5. QTL Localization

This study employed the BIP and MET models from QTL IciMapping v4.2 for a multi-environment joint analysis. Inclusive composite interval mapping (ICIM-ADD) was used to detect additive effects of SPC. The scan step was set to 1.00 cM, the logarithm of odds (LOD) threshold to 2.50, and the p-value for entering variables (PIN) to 0.001. QTLs were named following the method of McCouch [67]. QTLs identified by the BIP model were used to predict candidate genes, while QEIs identified by the MET model were used for breeding simulation.
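The thresholding logic applied to the mapping output can be sketched as a simple filter. The LOD threshold of 2.50 and the 10% cut-off for major-effect QTLs come from this article; the QTL names, LOD scores, and PVE values below are hypothetical, and treating the LOD threshold as inclusive is an assumption.

```python
LOD_THRESHOLD = 2.50  # LOD threshold stated in the text
MAJOR_PVE = 10.0      # percent of phenotypic variance; cut-off for major-effect QTLs

# Hypothetical scan results: (name, LOD score, PVE in %)
scan = [
    ("qA", 1.9, 0.8),
    ("qB", 3.4, 4.2),
    ("qC", 7.1, 12.6),
    ("qD", 2.5, 0.4),
]

# Keep peaks that reach the LOD threshold (inclusive comparison is assumed)
detected = [q for q in scan if q[1] >= LOD_THRESHOLD]

# Among detected QTLs, flag those explaining more than 10% of the variance
major = [name for name, lod, pve in detected if pve > MAJOR_PVE]

print([q[0] for q in detected], major)
```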

4.6. Candidate Gene Prediction

For the QTLs identified by the BIP model, candidate genes were screened within QTL intervals where phenotypic contributions exceeded 10%, using the Phytozome database. A sequence comparison of all genes between parental lines was conducted based on the resequencing data of the RIL3613 population, retaining genes with nonsynonymous mutations. Genes potentially associated with soybean SPC were then selected for haplotype analysis to determine the final candidate genes.

4.7. Haplotype Analysis of Candidate Genes

Based on the CDS missense mutation sites of the candidate gene, two haplotypes, i.e., HapI and HapII, were identified, and primers were designed. Leaf DNA was extracted from homozygous lines within the RIL3613 population, and the target fragment was cloned and sequenced. Correlation analysis was then performed using phenotypic and sequencing data from the test samples. Furthermore, haplotype analysis was conducted on the candidate gene with superior haplotypes using genotypic data from 2898 soybean germplasm resources in the SoyOmics database and the corresponding phenotypic data, thereby validating haplotype function. Haplotype typing of the germplasm resources was performed using VCFtools v0.1.16 in the Linux system terminal.
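One way to compare SPC between the two haplotype groups is a two-sample test on the phenotypes of lines carrying each haplotype. The article does not state which test was used, so the Python sketch below uses Welch's unequal-variance t statistic purely as an assumed example; the SPC values are hypothetical, and only the statistic and its approximate degrees of freedom are computed (no p-value).

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom (b minus a)."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of the two group means
    se2_a = statistics.variance(sample_a) / n_a
    se2_b = statistics.variance(sample_b) / n_b
    t = (statistics.fmean(sample_b) - statistics.fmean(sample_a)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Hypothetical SPC values (%) for lines carrying each haplotype
hap_i = [40.1, 41.0, 39.8, 40.6]
hap_ii = [42.3, 43.0, 42.1, 42.8]

t_stat, dof = welch_t(hap_i, hap_ii)
print(round(t_stat, 2), round(dof, 2))
```

A positive statistic here means the HapII group has the higher mean; the statistic would then be compared against a t distribution with the computed degrees of freedom.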

4.8. Breeding Simulations Based on the Blib Platform

A half-diallel cross was designed, where 120 parental lines were paired, yielding 7140 hybrid combinations. Hybridization methods included single crosses and backcrosses, while selection methods involved either pedigree or bulk selection. Six breeding schemes were developed based on different hybridization and selection methods: PedSC, BlkSC, PedBC1P1, PedBC1P2, BlkBC1P1, and BlkBC1P2 (Figure 6). The initial parental population and genetic model were constructed using QEI information and the genotypes of various RIL lines. The ISB’s B4L functionality was employed to simulate these breeding schemes. A virtual QEI, whose genotype value equalled the total effect of all remaining quantitative trait loci associated with protein content, was incorporated into the simulation to enhance accuracy in predicting soybean protein levels. The number of F2 generation planting lines significantly influenced breeding outcomes; therefore, the F2 planting scale was set at 200, 500, or 800 plants per line. Simulations for all six breeding schemes were conducted across multiple environments to assess differences among progeny populations under varying conditions. The design of breeding schemes required specifying a target genotype. Given the large number of identified QEIs, the breeding target was adjusted to a target genotypic value. The breeding objective was defined as the highest genotypic value within the initial population across different environments. Virtual progeny with genotypic values meeting or exceeding this target were considered the target genotype.
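The size of the simulation grid follows directly from the design described above, as the Python sketch below shows. The line identifiers are hypothetical placeholders; the scheme names and F2 scales are those listed in the text.

```python
from itertools import combinations, product

# Hypothetical identifiers for the 120 parental RILs
lines = [f"RIL{i:03d}" for i in range(1, 121)]

# Half-diallel: every unordered pair of distinct parents, no reciprocals or selfs
crosses = list(combinations(lines, 2))

schemes = ["PedSC", "BlkSC", "PedBC1P1", "PedBC1P2", "BlkBC1P1", "BlkBC1P2"]
f2_scales = [200, 500, 800]

# Each breeding scheme is run at each F2 planting scale within an environment
runs_per_environment = list(product(schemes, f2_scales))

print(len(crosses), len(runs_per_environment))
```

The count of 7140 is simply 120 × 119 / 2, and the six schemes at three F2 scales give 18 scheme-by-scale runs per environment.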

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/plants14142117/s1, Figure S1: Frequency histogram of protein content of RIL3613 in 22 environments; Figure S2: Distribution of QTLs for protein content of RIL3613 on the genome map (blue font indicates protein content QTLs located in previous studies); Table S1: Details of 19 QTLs from RIL3613 linkage analysis; Table S2: Haplotype sequencing primers; Table S3: Details of 97 QTLs from RIL3613 linkage analysis; Table S4: The number of advantageous alleles and genotypic values across 120 lines in 11 environments; Table S5: The number of families and plants selected before and after each breeding generation; Table S6: The population mean, genotypic value variance, range, and broad-sense heritability of the progeny populations and the initial population; Table S7: Information on target genotypes obtained from various breeding schemes.

Author Contributions

Conceptualization, H.-L.N. and W.-X.L.; formal analysis, X.S., H.-L.N. and W.-X.L.; investigation, X.S. and B.H.; resources, H.-L.N. and W.-X.L.; data curation, X.S. and B.H.; writing—original draft preparation, X.S.; writing—review and editing, H.-L.N., W.-X.L. and B.H.; visualization, X.S. and B.H.; project administration, W.-X.L.; funding acquisition, H.-L.N. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Opening Competition Mechanism to Select the Best Candidates Project of Science and Technology Department of Heilongjiang Province (2023ZXJ02B02) to H.-L.N.

Data Availability Statement

The original contributions presented in the study are included in the article/Supplementary Materials; further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

BIP	Bi-parental populations
MET	Multi-environment trials
QTL	Quantitative trait locus
QEI	QTL × environment interactions
GWAS	Genome-wide association studies
LOD	Logarithm of odds
CDS	Coding sequence
ICIM	Inclusive composite interval mapping

Figure 1. Distribution of QTLs for protein content identified in RIL3613 on the genome map.

Figure 2. (A) Haplotype analysis of Glyma.12G231400 in the RIL3613 population. (B) Haplotype validation of Glyma.12G231400 in germplasm resource populations. *** indicates p < 0.001.

Figure 3. LOD and additive effect curves of QTL for protein content.

Figure 4. Selection of hybrid combinations across seven breeding generations. The x axis represents the breeding schemes, the y axis denotes the number of hybrid combinations retained in each breeding generation, and the bars represent the individual breeding generations. The six breeding schemes, from left to right, are Pedigree Single Cross (PedSC), Bulk Single Cross (BlkSC), Pedigree Backcross 1 Parent 1 (PedBC1P1)/Pedigree Backcross 1 Parent 2 (PedBC1P2), and Bulk Backcross 1 Parent 1 (BlkBC1P1)/Bulk Backcross 1 Parent 2 (BlkBC1P2). Labels such as E08 indicate the environment under which the simulation was run.

Figure 5. Comparison of the population means of the progeny populations with that of the initial population. (A–C) represent the different F2 planting scales. IP denotes the initial population. Labels such as E08 indicate the environment under which the simulation was run.

Figure 6. Flowchart of six breeding schemes. The six breeding schemes, from left to right, are PedSC, BlkSC, PedBC1P1/PedBC1P2, and BlkBC1P1/BlkBC1P2. “Pedigree” means that, after within-family selection, each retained individual in a family was propagated and harvested separately to form one pedigree family in the next breeding generation. “Bulk” means that, after within-family selection, all retained individuals in a family were propagated and harvested together to form one bulk family in the next breeding generation. In the F1 generation, ten plants were grown for each hybrid combination. In the F2 generation, each line was planted with 200, 500, or 800 individuals. In the F3 and F4 generations, thirty plants were grown per line, and in the F5 to F7 generations, fifty plants were grown per line. AF denotes selection among families and WF denotes selection within families, with the fractional values representing the selection ratios. In each generation, the families with the highest phenotypic values are selected according to the specified ratios.
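The difference between pedigree and bulk propagation described in the caption can be sketched as follows. This is a simplified illustration, not the Blib implementation: each individual is represented only by a phenotypic value, the selection ratio of 0.5 and the example values are hypothetical, and the actual ratios are those shown in the figure.

```python
# Plants grown per line in each generation, as stated in the caption.
PLANTS_PER_LINE = {
    "F1": (10,), "F2": (200, 500, 800), "F3": (30,), "F4": (30,),
    "F5": (50,), "F6": (50,), "F7": (50,),
}


def select_within_family(family, ratio):
    """Keep the top `ratio` fraction of individuals by phenotype (at least one)."""
    n_keep = max(1, round(len(family) * ratio))
    return sorted(family, reverse=True)[:n_keep]


def propagate_pedigree(families, ratio):
    """Pedigree: each retained individual founds its own family next generation."""
    next_gen = []
    for family in families:
        for individual in select_within_family(family, ratio):
            next_gen.append([individual])
    return next_gen


def propagate_bulk(families, ratio):
    """Bulk: all retained individuals of a family are harvested as one family."""
    return [select_within_family(family, ratio) for family in families]


# Hypothetical phenotypic values for two families (illustration only).
families = [[41.0, 43.5, 42.2, 40.1], [44.0, 39.9, 42.8, 41.7]]
pedigree_next = propagate_pedigree(families, ratio=0.5)
bulk_next = propagate_bulk(families, ratio=0.5)
print(len(pedigree_next), len(bulk_next))
```

With the same within-family selection, pedigree propagation multiplies the number of families (four here), whereas bulk propagation keeps it constant (two here), which is why the two methods differ in how many families are tracked per generation.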

Table 1. Descriptive statistics of protein content in the parents and the RIL3613 population.

	Parent		RIL3613 Population
Environment	P₁ ^a (%)	P₂ (%)	Mean (%)	STD (%)	Minimum (%)	Maximum (%)	Skewness	Kurtosis	CV ^b (%)
E01	42.11	39.77	43.48	2.05	38.00	47.37	−0.39	−0.37	4.70
E02	42.63	39.06	42.30	2.26	36.74	46.14	−0.27	−0.69	5.34
E03	43.59	40.02	42.64	1.28	38.60	44.80	−0.53	−0.28	3.01
E04	43.15	38.56	41.74	1.45	37.59	44.77	−0.21	−0.24	3.48
E05	43.06	38.16	41.82	1.38	37.50	45.00	−0.54	0.88	3.31
E06	42.72	38.65	41.44	1.21	38.20	44.20	−0.20	−0.26	2.91
E07	42.77	39.55	41.54	1.48	38.00	45.30	−0.19	−0.19	3.57
E08	42.03	38.97	41.53	1.46	37.50	44.60	−0.23	−0.05	3.52
E09	42.95	37.64	41.31	1.31	36.20	44.00	−1.12	2.71	3.18
E10	43.36	39.12	42.06	1.28	37.80	45.00	−0.84	1.11	3.03
E11	43.10	39.99	42.22	1.12	38.80	44.60	−0.35	−0.11	2.66
E12	43.30	39.23	42.22	1.49	37.40	44.70	−0.82	0.69	3.52
E13	41.99	38.25	41.44	1.39	36.60	44.20	−0.65	0.50	3.37
E14	42.32	39.48	41.15	1.35	36.90	44.50	−0.44	0.80	3.28
E15	42.03	37.99	41.62	1.19	37.40	44.00	−0.79	0.84	2.86
E16	42.22	37.56	41.21	1.46	36.60	46.10	−0.35	0.93	3.54
E17	43.21	39.26	42.54	0.96	38.10	44.30	−1.06	2.62	2.27
E18	43.99	39.15	42.59	0.87	40.50	44.40	−0.32	−0.26	2.04
E19	44.00	39.68	42.41	0.93	38.90	44.90	−0.46	1.39	2.18
E20	43.26	38.12	42.09	0.99	38.60	44.20	−0.50	1.02	2.34
E21	42.96	38.62	41.91	1.36	36.50	44.30	−1.10	2.05	3.24
E22	42.58	39.25	41.92	1.32	37.80	45.20	−0.19	0.51	3.16

Specific environmental data for E01 to E22 are presented in Table 7. ^a Parents: P₁, female cultivar “Dongnong L13”; P₂, male cultivar “Heihe 36”. ^b Coefficient of variation.
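The coefficient of variation in Table 1 is the standard deviation expressed as a percentage of the mean. The short check below recomputes it for three rows of the table; small deviations from the reported values are expected because the tabulated means and standard deviations are already rounded. The function name is a hypothetical helper.

```python
def coefficient_of_variation(std, mean):
    """CV (%) = 100 * standard deviation / mean."""
    return 100.0 * std / mean


# (mean, STD, reported CV) for three environments, copied from Table 1.
rows = {
    "E01": (43.48, 2.05, 4.70),
    "E02": (42.30, 2.26, 5.34),
    "E18": (42.59, 0.87, 2.04),
}

recomputed = {env: coefficient_of_variation(std, mean) for env, (mean, std, _) in rows.items()}
for env, (_, _, reported) in rows.items():
    print(env, round(recomputed[env], 2), reported)
```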

Table 2. Analysis of variance of protein content across 22 environments.

Source	DF	SS	MS	F	Pr (>F)	Significance	Variance
Environment (E)	21	2340	111.45	77.777	<2 × 10⁻¹⁶	***
Genotype (G)	119	2989	25.12	17.531	<2 × 10⁻¹⁶	***	0.331
Block	39	59	1.50	1.050	0.386
G × E	2345	10,998	4.69	3.273	<2 × 10⁻¹⁶	***	1.086
Residuals	4933	7069	1.43				1.433
h²							0.823

*** indicates significance at p < 0.001. h² represents heritability.
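The reported heritability can be related to the variance components in Table 2 with a commonly used entry-mean formula, h² = σ²G / (σ²G + σ²GE/e + σ²ε/(e·r)), where e is the number of environments and r the number of replicates. The sketch below is an assumption-laden check rather than the authors' stated procedure: the formula and the value r = 3 are not given in this section, but together they reproduce the tabulated value.

```python
def entry_mean_heritability(var_g, var_ge, var_e, n_env, n_rep):
    """Heritability on an entry-mean basis across environments."""
    return var_g / (var_g + var_ge / n_env + var_e / (n_env * n_rep))


# Variance components from Table 2; n_rep = 3 is an assumption (not stated here).
h2 = entry_mean_heritability(
    var_g=0.331,   # genotype
    var_ge=1.086,  # genotype x environment
    var_e=1.433,   # residual
    n_env=22,
    n_rep=3,
)
print(round(h2, 3))
```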

Table 3. Four QTLs detected in multiple environments.

QTL Name	Env.	Chr.	Marker Interval	LOD ^a	PVE (%) ^b	ADD ^c	Physical Region (bp)
qPR-1-1	E21	1	36c01042~36c01043	3.0802	12.3745	−0.4787	38,760,401~41,326,475
qPR-1-3	E20	1	36c01048~36c01049	3.3665	12.2999	−0.3506	42,339,555~43,008,300
qPR-12-1	E08	12	36c12076~36c12077	2.8396	12.2795	0.5115	38,995,090~39,293,825
qPR-14-1	E19	14	36c14059~36c14058	4.8113	15.8298	0.4002	9,898,619~10,293,987

^a LOD, logarithm of odds. ^b PVE, phenotypic variation explained by QTL. ^c ADD, additive effect.

Table 4. Detailed information on candidate genes related to protein content.

QTL Name	Gene Name	Chr.	Position	Annotation
qPR-1-1	Glyma.01G114100	1	39,440,337–39,440,694	rubisco activase
qPR-1-3	Glyma.01G123300	1	42,526,283–42,535,486	BCL-2-associated athanogene 4
qPR-12-1	Glyma.12G231400	12	39,138,373–39,142,508	BES1/BZR1 homolog 4
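A quick consistency check between Tables 3 and 4 is to confirm that each candidate gene lies inside the physical region of its QTL. The sketch below copies the coordinates from the two tables; the dictionary layout and function name are hypothetical.

```python
# QTL physical regions from Table 3: (chromosome, start bp, end bp).
QTL_REGIONS = {
    "qPR-1-1": (1, 38_760_401, 41_326_475),
    "qPR-1-3": (1, 42_339_555, 43_008_300),
    "qPR-12-1": (12, 38_995_090, 39_293_825),
}

# Candidate genes from Table 4: (QTL, chromosome, start bp, end bp).
CANDIDATE_GENES = {
    "Glyma.01G114100": ("qPR-1-1", 1, 39_440_337, 39_440_694),
    "Glyma.01G123300": ("qPR-1-3", 1, 42_526_283, 42_535_486),
    "Glyma.12G231400": ("qPR-12-1", 12, 39_138_373, 39_142_508),
}


def gene_within_qtl(gene):
    """True if the gene lies entirely inside its QTL region on the same chromosome."""
    qtl, chrom, start, end = CANDIDATE_GENES[gene]
    q_chrom, q_start, q_end = QTL_REGIONS[qtl]
    return chrom == q_chrom and q_start <= start and end <= q_end


results = {gene: gene_within_qtl(gene) for gene in CANDIDATE_GENES}
print(results)
```

All three genes fall within their respective QTL intervals, consistent with their selection as positional candidates.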

Table 5. SNP variation in the CDS region of candidate genes.

Gene	Chr.	Position (bp)	Dongnong L13	Heihe 36
Glyma.01G114100	1	39,440,429	T (Lys)	A (Met)
Glyma.01G114100	1	39,440,646	C (Met)	A (Ile)
Glyma.01G123300	1	42,529,392	G (Glu)	A (Lys)
Glyma.12G231400	12	39,141,373	A (His)	C (Pro)

Table 6. Optimal breeding strategies for each environment.

Environment	Selection Method	F2 Generation Planting Scale	Cross Combination	Number of Target Genotypes	Number of Superior Alleles	Genotypic Value Interval
E01	Ped	800	(HN2 × HN118) × HN2	24	52~77	42.8318~45.954
			HN2 × HN120	23	68~74	44.093~45.3332
			(HN4 × HN120) × HN120	19	67~74	43.882~45.5186
E02	Ped	800	(HN14 × HN114) × HN14	10	59~67	41.4378~42.8189
			HN14 × HN117	9	61~66	41.6547~42.5099
			(HN12 × HN117) × HN117	9	59~67	41.8867~42.8811
			(HN14 × HN117) × HN117	9	59~64	41.6028~42.1985
E03	Ped	800	HN34 × HN116	6	61~67	42.4901~43.5157
			HN12 × HN116	6	54~65	42.0977~43.5478
			HN44 × HN105	6	61~67	42.4901~43.5157
			(HN32 × HN116) × HN32	6	58~66	42.6208~43.9469
E04	Ped	800	(HN38 × HN119) × HN38	8	66~70	46.4045~48.0717
		500	HN3 × HN12	7	67~74	46.1507~46.8833
		800	HN3 × HN12	6	66~73	45.9351~47.1123
E05	Ped	800	HN1 × HN12	12	75~78	43.1178~43.4624
		800	(HN1 × HN12) × HN12	12	74~78	42.991~43.4624
		500	(HN1 × HN12) × HN12	11	76~78	43.2866~43.4624
E06	Ped	800	(HN87 × HN118) × HN87	21	54~63	38.4959~42.3515
			(HN4 × HN118) × HN118	16	41~57	37.7403~41.5515
			(HN3 × HN118) × HN118	16	46~57	38.3227~41.3913
E07	Ped	800	(HN61 × HN115) × HN115	7	74~78	44.0076~44.71
			(HN48 × HN115) × HN115	4	70~74	42.8634~44.343
			(HN115 × HN118) × HN115	4	62~75	41.9844~45.4756
E08	Ped	800	(HN27 × HN118) × HN27	6	60~67	42.9358~44.1906
			(HN74 × HN104) × HN74	5	63~71	42.3392~44.3778
			(HN11 × HN21) × HN11	5	65~72	42.8532~43.6847
			(HN11 × HN108) × HN11	5	62~69	42.0788~43.6776
			(HN11 × HN107) × HN11	5	68~69	43.0674~43.3582
			(HN16 × HN80) × HN16	5	63~68	42.4404~43.2367
E09	Ped	800	(HN115 × HN118) × HN115	23	64~73	42.3694~44.471
			(HN115 × HN120) × HN115	16	68~75	42.1194~44.5158
			HN115 × HN118	10	57~73	40.7384~44.6431
E10	Ped	800	(HN67 × HN118) × HN118	12	64~69	41.3527~43.0921
			HN107 × HN118	10	62~70	40.7067~43.8515
			(HN107 × HN114) × HN107	9	56~64	41.2065~43.0447
			(HN80 × HN118) × HN80	9	54~68	39.8393~42.6609
			(HN103 × HN115) × HN103	9	62~69	41.4689~42.0991
E11	Ped	800	(HN2 × HN118) × HN2	17	72~80	45.6911~48.2653
			(HN2 × HN120) × HN2	14	72~79	46.0565~47.7315
			(HN2 × HN116) × HN2	11	74~78	45.9251~47.0337
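The cross notation in Table 6 encodes the breeding scheme: a plain pair is a single cross, and a bracketed pair followed by one of its parents is a backcross to that parent. The sketch below maps each cross string to a scheme name. It assumes pedigree selection throughout (as listed in the table) and that P1 is the first-listed parent of the original cross; both the mapping and the function name are illustrative assumptions.

```python
import re

# Matches backcross strings of the form "(A × B) × C".
BACKCROSS = re.compile(r"^\((\w+) × (\w+)\) × (\w+)$")


def classify_cross(combo):
    """Return (scheme, (p1, p2)) for a Table 6 cross-combination string."""
    match = BACKCROSS.match(combo)
    if match is None:
        p1, p2 = combo.split(" × ")
        return "PedSC", (p1, p2)
    p1, p2, recurrent = match.groups()
    if recurrent == p1:
        return "PedBC1P1", (p1, p2)
    if recurrent == p2:
        return "PedBC1P2", (p1, p2)
    raise ValueError(f"recurrent parent {recurrent!r} is not a parent of the cross")


examples = ["(HN2 × HN118) × HN2", "HN2 × HN120", "(HN4 × HN120) × HN120"]
classified = [classify_cross(combo) for combo in examples]
print(classified)
```

Applied to the three E01 entries, this yields one backcross to the first parent, one single cross, and one backcross to the second parent.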

Table 7. Field management methods in 22 environments.

Environment	Year	Location	Sowing Date	Planting Density (×10⁴ plants/hm²)	Fertilizer (N/P₂O₅/K₂O) (kg/hm²)
E01	2013	Keshan	13-May	30	75/150/75
E02	2014	Harbin	10-May	22	75/150/75
E03	2015	Harbin	10-May	30	75/150/75
E04	2015	Keshan	12-May	35	75/150/75
E05	2016	Acheng	25-May	22	75/150/75
E06	2016	Acheng	10-May	22	75/150/75
E07	2016	Acheng	10-May	30	75/150/75
E08	2016	Acheng	10-May	22	0/150/75
E09	2016	Shuangcheng	28-May	22	75/150/75
E10	2016	Shuangcheng	12-May	22	75/150/75
E11	2016	Shuangcheng	12-May	30	75/150/75
E12	2016	Shuangcheng	12-May	22	0/150/75
E13	2016	Harbin	23-May	22	75/150/75
E14	2016	Harbin	10-May	22	75/150/75
E15	2016	Harbin	10-May	30	75/150/75
E16	2016	Harbin	10-May	22	0/150/75
E17	2017	Shuangcheng	8-May	22	75/150/75
E18	2018	Acheng	9-May	22	75/150/75
E19	2019	Harbin	10-May	22	75/150/75
E20	2019	Shuangyashan	13-May	25	75/150/75
E21	2020	Harbin	10-May	22	75/150/75
E22	2020	Shuangyashan	14-May	25	75/150/75

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
