1. Introduction
Soybean is one of the world’s most important crops and a primary source of protein and oil for humans [1,2]. China has a long history of soybean production. With the rapid development of modern soybean cultivation in Northeast China, soybeans have become a hallmark of the region’s agricultural economic history [3]. The genetic improvement of soybean seed protein is crucial for meeting the demands of a growing global population [4], and increasing protein content is a primary objective in soybean breeding [5].
Soybean seed protein content (SPC) is a complex quantitative trait controlled by multiple genes and influenced by environmental conditions [6]. Most studies indicate that although seed composition is primarily governed by genetic factors, it is also affected by abiotic and biotic factors [7]; the protein and oil content of a single variety can vary across years, and across environments within the same year [8]. Molecular geneticists and breeders commonly use populations derived from biparental crosses to select new varieties and to map quantitative trait loci (QTLs) for target traits [9]. Recent research has focused on identifying protein content QTLs and mining candidate genes through linkage analysis and genome-wide association studies (GWAS) based on high-density genetic maps. These approaches have been instrumental in elucidating the genetic architecture of soybean seed protein and in facilitating variety improvement [10]. According to the SoyBase database, 241 QTLs influencing soybean SPC had been identified by 2018, and additional QTLs have been reported since. Karikari et al. [11] constructed a genetic map with 2267 bin markers, identified 25 protein QTLs, and, by integrating these results with transcriptomic data, pinpointed four candidate genes associated with protein synthesis. Zhong et al. [12] used a high-density genetic map to evaluate QTLs for protein content and detected 44 major and stable QTLs. Cunicelli et al. [13] analyzed 138 recombinant inbred lines across six environments and identified 21 QTLs for traits including yield, protein, oil content, methionine, and threonine, four of which were associated with protein. Seo et al. [14] identified 12 QTLs related to seed protein content. Further investigation of candidate genes within major-effect QTLs could provide deeper insights into the genetic basis of SPC. Lee et al. [15] identified 192 co-linear protein QTLs forming six hotspot regions and detected eight genes that are highly expressed during seed maturation. Yang et al. [16] mapped protein content to a 15 kb interval using 195 chromosome segment substitution lines and, in combination with transcriptomic data, designated Glyma.15G049200 as a candidate gene. Salas et al. [17] mapped two stable protein QTLs, qPro-10-1 and qPro-14-1. Fliege et al. [9] fine-mapped the cq-Seed protein-003 QTL on chromosome 20 and identified Glyma.20G85100 as a gene related to soybean seed protein content.
Large-scale genome sequencing, high-density linkage analysis, genome-wide association studies, and extensive functional genomics research have made designed breeding a tangible reality. Breeders are gradually moving away from traditional methods toward the “priori design” concept [18]. This shift is driven by the long soybean breeding cycle and by the fact that parental line selection still relies heavily on breeders’ experience and intuition. Designed breeding addresses these challenges by aiming to control all allelic variation at genes that are essential for agronomic traits, which becomes possible through precise genetic maps, high-resolution single-nucleotide analysis of chromosomes, and extensive phenotypic evaluation [19].
Designed breeding has proven effective in plant breeding. Wei et al. [20] combined marker-assisted selection with screening for multiple resistances and, after several rounds of hybridization, pyramided six target genes to develop a promising restorer line, Guihui5501, which combined heavy grains, good quality, and tolerance to both biotic and abiotic stresses. To develop high oleic acid soybean varieties, Nan et al. [21] analyzed haplotypes of FAD2-1A and FAD2-1B, key genes for increasing oleic acid content, in 1250 soybean accessions and developed two molecular markers. Using marker-assisted selection, they identified line 435, with an oleic acid content of 91.03%. Line 435 was then used as the donor parent and the elite variety Hainong 51 as the recurrent parent; after three backcrosses, a single plant with high oleic acid content (75%) and high yield was obtained. These case studies highlight how parental line selection and breeding strategy determine whether breeding objectives are achieved [22].
In recent years, methods such as best linear unbiased prediction (BLUP) and genomic selection (GS) have been used to estimate parental breeding potential and guide selection in crop improvement [23]. Parental selection can also be based on predicted progeny performance: Zhong et al. [24] proposed selecting inbred parents based on the projected performance of the best offspring of a cross, termed the “superior progeny value.”
When designing breeding schemes, breeders must choose the optimal strategy from multiple options before initiating actual breeding. Computer simulations can compare multiple breeding methods and identify the most efficient scheme for generating the target genotype, thereby saving time, land, and labor. These simulations rest on assumptions from population and quantitative genetics, which shape the final breeding plan [25]. They also generate extensive data that may be difficult to obtain through empirical experiments or theoretical models, helping to validate proposed theories or models [26]. By leveraging parental molecular data and genomic prediction models, simulations can create segregating populations from virtual crosses, so that the most promising populations can be predicted before any field crosses are made [27]. Bančič et al. [28] recently described AlphaSimR, a software package that allows breeders to design and simulate breeding schemes on their own. Zhang et al. [29] developed Blib, a multi-module simulation platform that can handle more complex genetic effects and models than existing tools, making it suitable for modeling, simulating, and predicting genetic breeding processes in diploid species. Building on Blib, Wang et al. [30] proposed a wheat breeding design method that integrates known QTL information with computer simulations. In this method, potential crosses within a GWAS panel are evaluated based on the relative frequency of target genotypes, trait correlations in simulated progeny, and genetic gains in selected progeny. By optimizing parental selection, progeny population size, and selection schemes, yield and grain quality can be improved simultaneously. Applying this design method makes it possible to identify the most promising crosses and selection strategies before field trials, enhancing the predictability and efficiency of breeding programs.
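To make the idea of ranking crosses by the predicted performance of their best offspring more concrete, the following is a minimal sketch of a virtual cross under deliberately simple assumptions: unlinked biallelic loci, purely additive effects, fully inbred (RIL-like) progeny, and no environmental noise. The function names, the 10% top fraction, and the example parents and effects are all hypothetical; this is not the procedure of Zhong et al. [24], nor the Blib or AlphaSimR implementation.

```python
import random
from statistics import mean

# Hypothetical toy model (not Zhong et al.'s method, not Blib/ISB/AlphaSimR):
# unlinked biallelic loci, purely additive effects, fully inbred progeny,
# no environmental noise. Alleles are coded 1 (favorable) or 0.


def derive_inbred_progeny(parent_a, parent_b, n_progeny, rng):
    """Simulate RIL-like progeny of parent_a x parent_b.

    With unlinked loci, each progeny line is homozygous for parent A's
    or parent B's allele at every segregating locus with probability 0.5.
    """
    progeny = []
    for _ in range(n_progeny):
        line = [a if a == b or rng.random() < 0.5 else b
                for a, b in zip(parent_a, parent_b)]
        progeny.append(line)
    return progeny


def genotypic_value(line, effects, mu=0.0):
    """Additive genotypic value: mu plus the effect of every favorable allele carried."""
    return mu + sum(e * allele for allele, e in zip(line, effects))


def superior_progeny_value(parent_a, parent_b, effects, n_progeny=1000,
                           top_fraction=0.1, seed=1):
    """Mean genotypic value of the best `top_fraction` of simulated progeny."""
    rng = random.Random(seed)
    progeny = derive_inbred_progeny(parent_a, parent_b, n_progeny, rng)
    values = sorted((genotypic_value(p, effects) for p in progeny), reverse=True)
    n_top = max(1, int(n_progeny * top_fraction))
    return mean(values[:n_top])


if __name__ == "__main__":
    effects = [0.30, 0.25, 0.20, 0.15, 0.10]  # hypothetical additive effects
    parents = {  # hypothetical parental genotypes
        "P1": [1, 1, 0, 0, 1],
        "P2": [0, 0, 1, 1, 0],
        "P3": [1, 1, 1, 0, 0],
    }
    names = sorted(parents)
    crosses = [(a, b) for i, a in enumerate(names) for b in names[i + 1:]]
    ranked = sorted(crosses, reverse=True,
                    key=lambda c: superior_progeny_value(parents[c[0]], parents[c[1]], effects))
    for a, b in ranked:
        spv = superior_progeny_value(parents[a], parents[b], effects)
        print(f"{a} x {b}: superior progeny value = {spv:.3f}")
```

A cross between complementary parents can outrank a cross involving the better single parent here, because segregation at more loci widens the upper tail of the progeny distribution; adding linkage, QTL-by-environment effects, and selection would be needed before such a ranking could inform real crosses.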
As these studies show, QTLs and candidate genes associated with protein content have been investigated extensively. However, the effects of different haplotypes of candidate genes on soybean SPC remain underexplored, and genetic information has rarely been used to build models that assess the breeding potential for soybean SPC.
In our previous studies, a recombinant inbred line (RIL) population, RIL3613, was constructed and used to map QTLs for SPC, primarily based on an SSR linkage map [31,32,33]. However, because these QTLs were not fine-mapped, they could not be applied in molecular breeding. To identify optimal breeding strategies for SPC in RIL3613, the present study used a high-density SNP linkage map to map SPC-related QTLs and QTL-by-environment interactions (QEIs) across the whole genome. Key candidate genes were identified through parental sequence comparison and haplotype analysis. A genetic model incorporating the QEI data was constructed on the ISB plant breeding simulation platform, and breeding simulations were then conducted using lines of the RIL3613 population as parents to determine optimal breeding strategies for diverse environments.
This study aims to provide a theoretical foundation and technical support for the genetic improvement of soybeans.
3. Discussion
In this study, the RIL3613 population was used to identify QTLs and QEIs associated with soybean SPC. The BIP model identified 19 QTLs across 22 environments, four of which explained more than 10% of the phenotypic variance and were therefore classified as major-effect QTLs. Using the SoyBase database, these 19 QTLs were compared with 241 previously mapped seed protein-related QTLs. Eight QTLs overlapped with or were contained within previously reported QTLs [36,37,38,39,40,41,42], supporting the reliability of the mapping results, while the remaining 11 were identified as novel. Through parental sequence comparison and haplotype analysis, Glyma.12G231400 was predicted as a key gene associated with soybean protein content within the 38,995,090–39,293,825 bp region on chromosome 12. This gene is annotated as BEH4 (BES1/BZR1 homolog 4), a homolog of BRI1-EMS-SUPPRESSOR 1 (BES1) and BRASSINAZOLE RESISTANT 1 (BZR1), transcription factors that are critical in brassinosteroid (BR) signaling. BRs are a widespread class of plant steroid hormones, and previous studies indicated that BEH1 and BEH2 are regulated by brassinolide (BL) in Arabidopsis [43]. BL, the most biologically active BR, has been shown to increase SPC in common beans [44,45]. Overexpression of BEH4 in tomato enhances the expression of genes involved in nitrogen uptake and assimilation [44]. Therefore, BEH4 may promote seed protein synthesis and accumulation in soybean by modulating BL signaling and nitrogen uptake.
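The comparison against SoyBase described at the start of this paragraph amounts to an interval test on the same chromosome, combined with a phenotypic-variance threshold for calling major-effect QTLs. The sketch below assumes that QTLs are stored as (chromosome, start, end) coordinates in base pairs on a single reference assembly, and it separates "included" (fully contained) from "overlapping" QTLs; the record layout, function names, and all coordinates and PVE values in the example are hypothetical, not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """A QTL confidence interval on one reference assembly (hypothetical record layout)."""
    name: str
    chrom: int
    start: int          # bp, inclusive
    end: int            # bp, inclusive
    pve: float = 0.0    # phenotypic variance explained, in percent


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the intervals share at least one base pair on the same chromosome."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def contained_in(inner: Interval, outer: Interval) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (inner.chrom == outer.chrom
            and outer.start <= inner.start and inner.end <= outer.end)


def classify(new_qtls, known_qtls, major_pve=10.0):
    """Label each new QTL as included/overlapping/novel and major/minor effect."""
    labels = {}
    for q in new_qtls:
        if any(contained_in(q, k) for k in known_qtls):
            status = "included"
        elif any(overlaps(q, k) for k in known_qtls):
            status = "overlapping"
        else:
            status = "novel"
        effect = "major" if q.pve > major_pve else "minor"
        labels[q.name] = (status, effect)
    return labels


if __name__ == "__main__":
    # Hypothetical coordinates and PVE values, for illustration only.
    known = [Interval("prior-1", 12, 38_000_000, 40_000_000),
             Interval("prior-2", 20, 31_000_000, 32_500_000)]
    new = [Interval("qA", 12, 39_000_000, 39_300_000, pve=12.4),
           Interval("qB", 20, 32_400_000, 33_100_000, pve=4.1),
           Interval("qC", 5, 1_000_000, 1_800_000, pve=6.0)]
    for name, (status, effect) in classify(new, known).items():
        print(name, status, effect)
```

In practice, QTLs reported on older genetic or physical maps would first need to be projected onto the same assembly before a base-pair comparison like this is meaningful.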
Breeders are increasingly leveraging the growing body of published gene and QTL data, together with the widespread adoption of marker-assisted selection, to accelerate crop improvement. While most QTL mapping efforts have focused on single-environment QTL detection, multi-environment trial (MET) QTL mapping and the detection and modeling of QEIs have received less attention [46]. QEIs can be studied when genetic populations are grown across multiple locations or years, providing valuable insights for both breeders and geneticists. Based on QTL mapping results, breeders can design optimal genotypes with favorable alleles and implement marker-assisted selection more effectively. Stable QTLs for agronomic traits are applicable across diverse environments, whereas environment-specific QTLs are useful for targeted environments [47].
In this study, 97 QEIs were identified using the inclusive composite interval mapping (ICIM) method in the MET module. These QEIs were compared with 241 previously mapped seed protein-related QTLs in the SoyBase database. Forty QEIs overlapped with or were contained within previously reported QTLs [36,37,39,40,42,48,49,50,51,52,53,54,55,56,57,58], supporting the reliability of the QEI mapping results (Figure S2).
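QEIs from a multi-environment analysis are often summarized as an additive effect plus environment-specific additive-by-environment deviations. The sketch below shows how such effects could be combined into per-environment genotypic values for an inbred line, assuming fully homozygous lines (so no dominance term), genotype codes of +1/-1 for the two parental alleles (so the additive effect is half the difference between homozygotes), and effects that add across QTLs without epistasis. The function name and example effects are hypothetical, and this is a generic model rather than the exact parameterization used by ICIM or ISB.

```python
def genotypic_values_by_env(line, qtls, env_means):
    """Genotypic value of an inbred line in each environment.

    line      : +1/-1 per QTL (+1 = homozygous for parent 1's allele,
                -1 = homozygous for parent 2's allele)
    qtls      : list of (a, ae) per QTL, where a is the additive effect and
                ae maps environment -> additive-by-environment deviation
    env_means : environment -> environment mean (mu_j)

    G_j = mu_j + sum_i x_i * (a_i + ae_ij); environments missing from ae
    are treated as having no QEI deviation at that QTL.
    """
    if len(line) != len(qtls):
        raise ValueError("one genotype code is needed per QTL")
    values = {}
    for env, mu in env_means.items():
        g = mu
        for x, (a, ae) in zip(line, qtls):
            g += x * (a + ae.get(env, 0.0))
        values[env] = g
    return values


if __name__ == "__main__":
    # Hypothetical effects (% protein) for two QTLs in two environments.
    qtls = [(0.5, {"E1": 0.25, "E2": -0.25}),  # QTL whose effect depends on environment
            (0.25, {})]                        # QTL with a stable additive effect
    env_means = {"E1": 40.0, "E2": 42.0}
    for line in ([1, 1], [1, -1], [-1, 1]):
        print(line, genotypic_values_by_env(line, qtls, env_means))
```

Separating the stable additive part from the environment-specific deviations is what lets the same set of QTLs describe both stable and environment-specific behavior in a simulation.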
In addition, ISB, an application module within Blib, was used to simulate pure-line variety development. The key inputs for these simulations were the environment and the genetic models of the target breeding traits, the parental population, and the breeding methods [59]. Genetic models were primarily constructed based on previous genetic studies. The QEIs identified in this study have minor, stable effects, are widely distributed across the soybean genome, and were reliably localized, making them suitable for constructing the genetic model. SPC in the RIL3613 population showed a normal or near-normal distribution in all 11 simulated environments, with transgressive segregation within the population, indicating that the population is suitable for breeding for SPC. To make full use of the breeding potential of RIL3613, single-cross, backcross, pedigree, and bulk selection methods were all simulated, and a half-diallel design was used to simulate all possible cross combinations within the population. We therefore consider the breeding simulation results and the proposed optimal breeding strategies to be reliable.
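A half-diallel design crosses every pair of parents once, excluding selfs and reciprocal crosses, so n parental lines give n(n - 1)/2 crosses. The minimal sketch below enumerates these crosses and the resulting cross-by-method scenarios. The line names are hypothetical, and treating each breeding method as an independent scenario label is an illustrative assumption; how ISB combines crossing and selection methods is not described here.

```python
from itertools import combinations, product


def half_diallel(parents):
    """All pairwise crosses among `parents`, excluding selfs and reciprocals.

    For n parents this yields n * (n - 1) / 2 crosses.
    """
    return list(combinations(parents, 2))


def simulation_grid(parents, methods):
    """Every (cross, breeding method) scenario to be simulated."""
    return list(product(half_diallel(parents), methods))


if __name__ == "__main__":
    lines = [f"RIL{i:03d}" for i in range(1, 11)]   # hypothetical line names
    methods = ["single cross", "backcross", "pedigree", "bulk"]
    crosses = half_diallel(lines)
    print(len(crosses), "crosses")                   # 10 * 9 / 2 = 45
    print(len(simulation_grid(lines, methods)), "scenarios")
```

Because the number of crosses grows quadratically with population size, simulating a full half-diallel of a large RIL population is far cheaper in silico than in the field, which is the practical argument for simulating first.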
Environmental factors can significantly influence breeding outcomes, making genotype selection for local conditions essential for enhancing soybean protein content [60,61]. Therefore, a key objective for breeders is to develop genotypes suited to a specific set of environments, termed the “Target Population of Environments” (TPE), which comprises a defined range of farms and expected growing seasons [62]. This study designed optimal breeding schemes for individual environments within the TPE, so that each scheme produced target genotypes suited to its respective conditions. However, this study characterized environments solely by heritability. To improve breeding strategies and variety recommendations, future research should characterize the climatic stress patterns that shape each environment more precisely [63]. For example, using historical yield records and weather databases covering 25 years and 35 regions, Beillouin et al. [64] identified four combinations of climatic factors affecting barley in the French barley belt, with important implications for local genotype adaptation strategies. Similarly, Heinemann et al. [65] combined a generalized additive model (GAM), environmental covariates (ECs), and grain yield (GY) data from 18 years of historical breeding trials to develop an “environmental forecasting” approach. This approach predicts the optimal EC thresholds for each production scenario (four regions, three seasons, and two grain types) and their respective contributions to GY adaptation, revealing strong interactions among developmental stages, seasons, and regions due to the nonlinear effects of air temperature, solar radiation, and rainfall. Precise environmental characterization of this kind could likewise improve breeding simulations and thus make simulation-based breeding schemes more reliable.
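When environments are characterized only by heritability, a common convention in breeding simulations is to add environmental noise whose variance is set relative to the genetic variance, V_E = V_G (1 - h²) / h², so that V_G / (V_G + V_E) = h². The sketch below applies that rule to a list of genotypic values. It is a generic convention assumed for illustration, not a description of how ISB implements its environments, and the example values and heritabilities are hypothetical.

```python
import random
from statistics import pvariance


def simulate_phenotypes(genotypic_values, h2, seed=None):
    """Add environmental noise so that expected heritability equals h2.

    Uses Var(E) = Var(G) * (1 - h2) / h2, so that
    Var(G) / (Var(G) + Var(E)) = h2 when G and E are independent.
    """
    if not 0.0 < h2 <= 1.0:
        raise ValueError("h2 must lie in (0, 1]")
    var_g = pvariance(genotypic_values)
    sd_e = (var_g * (1.0 - h2) / h2) ** 0.5
    rng = random.Random(seed)
    return [g + rng.gauss(0.0, sd_e) for g in genotypic_values]


def realized_h2(genotypic_values, phenotypes):
    """Share of phenotypic variance explained by genotypic variance."""
    return pvariance(genotypic_values) / pvariance(phenotypes)


if __name__ == "__main__":
    rng = random.Random(7)
    g = [40.0 + rng.gauss(0.0, 1.0) for _ in range(5000)]  # hypothetical genotypic values
    for h2 in (0.3, 0.6, 0.9):   # hypothetical environments differing in heritability
        p = simulate_phenotypes(g, h2, seed=11)
        print(f"target h2 = {h2:.1f}, realized h2 = {realized_h2(g, p):.2f}")
```

A single heritability value fixes only the amount of noise, not its structure; richer environment descriptions of the kind cited above would instead change which genotypes perform best, which a variance ratio cannot capture.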
The breeding strategy developed in this study could be applied to various environments. This would involve evaluating parental populations in the target environment, performing QTL mapping, and using QTL and population data for breeding simulations. The simulation results could then guide the design of breeding strategies. This study offers new insights for designing soybean breeding programs.