Peer-Review Record

Identification of Loci and Candidate Genes Associated with Arginine Content in Soybean

Agronomy 2025, 15(6), 1339; https://doi.org/10.3390/agronomy15061339

by Jiahao Ma¹,², Qing Yang¹, Cuihong Yu¹,³, Zhi Liu¹, Xiaolei Shi¹, Xintong Wu¹, Rongqing Xu¹,², Pengshuo Shen¹,², Yuechen Zhang², Ainong Shi⁴ and Long Yan¹,*

Reviewer 1: Frédéric Marsolais

Reviewer 2: Ivana Varga

Reviewer 3: Anonymous

Submission received: 23 April 2025 / Revised: 21 May 2025 / Accepted: 27 May 2025 / Published: 30 May 2025

(This article belongs to the Special Issue Evaluation of Germplasm Resources, Molecular Breeding, and Utilization in Soybean)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The authors report on a GWAS analysis of arginine content in soybean seed. Two SNP markers with a strong signal for the trait were identified, along with some candidate genes in the vicinity of these markers. Different models for genomic selection of arginine content were evaluated. While the study was well performed and the findings are interesting, there are some suggestions and concerns outlined below.

Why focus on arginine content? Especially if arginine is non-nutritionally essential? Why focus on this particular group of 290 accessions, and subset of 164 accessions? Why is the distribution of arginine content bimodal in the 290 accessions? Is this normally observed in soybean populations or is it unique?

What is the rationale for selecting only a 5 kb region next to the marker to investigate candidate genes?

Introduction

Reference 1 should be general in relation with this sentence, rather than about the role of a single transcription factor during drought.
Sentence starting on line 47 is circular with ornithine being mentioned twice. Arginase belongs to catabolism rather than biosynthesis. This deserves more explanation. Next sentence, how is it different from urea excretion in mammals? Provide more details.
Line 78, Zhang et al. is omitted from reference list.
Line 83, Spindel et al. is about rice, not maize.
Next sentence, Crossa et al. is a general review, not related to wheat in particular.
Line 87, reference 20 is not Grenier et al.

Materials and Methods

Line 21, Wang & Zhang 2021 is not numbered.

Author Response

Comments 1: Why focus on arginine content, especially since arginine is non-essential?
Response 1: Thanks for your suggestion. Although arginine is considered non-essential in healthy adults, it is conditionally essential for infants, the elderly, and individuals under stress or recovering from trauma. Additionally, in plants, arginine plays a critical role in nitrogen metabolism and signal transduction, influencing both plant function and seed nutritional quality. We have clarified this in the Introduction (lines 54–93, page 2-3).

Comments 2: Why focus on this group of 290 accessions and the 164 subset?
Response 2: Thanks for your suggestion. The group of 290 accessions and the subset of 164 were selected based on the availability of both phenotypic data (arginine content) and genotypic data (SNP markers), including accessions with the highest and lowest arginine content. These selections were made to capture broad phenotypic diversity while maintaining a balanced population structure. This strategy supports robust genome-wide association studies (GWAS) and genomic prediction (GP) by enabling the evaluation of model generalizability and predictive performance across diverse genetic backgrounds (lines 140–149, page 4).

Comments 3: Why is arginine content bimodally distributed? Is this common in soybean?
Response 3 : Thank you for this important observation. The original arginine content across the broader soybean panel was not bimodally distributed. For the purposes of this study, we selected accessions with either high or low arginine content to better capture phenotypic extremes and facilitate the identification of major QTLs. As a result, the bimodal distribution observed reflects this intentional selection rather than the natural distribution in the population. Additionally, the bimodal pattern may suggest the presence of subpopulations with distinct genetic backgrounds or a major-effect locus influencing arginine content. While many soybean quality traits typically show normal or skewed distributions, bimodal distributions can occur when traits are controlled by major loci. This has been clarified in the revised manuscript (lines 280–283, page 7).

Comments 4: Why use a 5 kb region for candidate gene analysis?
Response 4 : Thanks for your suggestion. The 5 kb window was chosen based on the typical LD decay in soybean, which occurs within 100 kb but is often concentrated within 10 kb. A narrower window helps reduce false positives and increases the precision of candidate gene identification. This is now clarified in the Methods section.
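The window-based candidate gene search described in Response 4 can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the 5 kb window is applied symmetrically on both sides of the SNP, and the `Gene` record, the `genes_near_snp` helper, and the gene names and coordinates are all hypothetical.

```python
from typing import NamedTuple


class Gene(NamedTuple):
    name: str
    chrom: str
    start: int  # 1-based start coordinate (bp)
    end: int    # inclusive end coordinate (bp)


def genes_near_snp(genes, chrom, pos, window_bp=5_000):
    """Return genes on `chrom` that overlap [pos - window_bp, pos + window_bp].

    Assumption: a symmetric window around the SNP; the manuscript may define
    the region differently.
    """
    lo, hi = pos - window_bp, pos + window_bp
    return [g for g in genes if g.chrom == chrom and g.start <= hi and g.end >= lo]


# Hypothetical annotation records, for illustration only.
genes = [
    Gene("geneA", "Gm06", 1_000_000, 1_003_000),  # overlaps the window
    Gene("geneB", "Gm06", 1_020_000, 1_022_000),  # same chromosome, too far away
    Gene("geneC", "Gm11", 1_001_000, 1_002_000),  # different chromosome
]

hits = genes_near_snp(genes, "Gm06", 1_006_000)
print([g.name for g in hits])
```

Widening `window_bp` toward the LD decay distance would return more candidates at the cost of more false positives, which is the trade-off the response describes.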

Comments 5: Reference 1 should be general in relation with this sentence, rather than about the role of a single transcription factor during drought.

Response 5: Thanks for your suggestion. Reference [1] has been revised to cite a more general source (line 44, page 2).

Comments 6: Sentence starting on line 47 is circular with ornithine being mentioned twice. Arginase belongs to catabolism rather than biosynthesis. This deserves more explanation. Next sentence, how is it different from urea excretion in mammals? Provide more details.

Response 6: Thanks for your suggestion. The sentence on line 47 has been rewritten to clarify the catabolic role of arginase and to contrast it with mammalian urea metabolism (lines 60–67, page 2).

Comments 7: Line 78, Zhang et al. is omitted from reference list.

Response 7: Thanks for your suggestion. Zhang et al. has been added to the reference list (line 115, page 3).

Comments 8: Line 83, Spindel et al. is about rice, not maize.

Response 8: Thanks for your suggestion. The citation of Spindel et al. has been corrected to refer to rice (line 120, page 3).

Comments 9: Next sentence, Crossa et al. is a general review, not related to wheat in particular.

Response 9: Thanks for your suggestion. The citation of Crossa et al. has been corrected to describe it as a general review (line 122, page 3).

Comments 10: Line 87, reference 20 is not Grenier et al.

Response 10: Thanks for your suggestion. Reference 20 has been corrected (line 125, page 3).

Comments 11: Line 21, Wang & Zhang (2021) not numbered.
Response 11 : Thanks for your suggestion. This citation was redundant and has been removed to avoid repetition.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

Dear authors,

Your manuscript, “Identification of Loci and Candidate Genes Associated with Arginine Content in Soybean,” will be very interesting to readers.

This study analyzed 290 soybean accessions using four GWAS models and identified two SNPs significantly associated with arginine content, linked to candidate genes on chromosomes 6 and 11. Genomic prediction using multiple models showed high accuracy, supporting the use of genomic selection for breeding high-arginine soybean cultivars.

I suggest checking the guidelines for writing the references list. Also, be careful that the table does not split across the pages.

In the abstract, it should be emphasized that the commercial importance of this study lies in its potential to improve the nutritional value and market competitiveness of soybean cultivars. By identifying genetic markers linked to high arginine content and enabling accurate genomic selection, breeders can develop soybean varieties with enhanced protein quality, appealing to health-conscious consumers and high-value food and feed markets.

Please refer to the tables in the text (lines 250–256).

Author Response

Comments 1: Reference formatting and table layout issues.
Response 1 : Thanks for your suggestion. We have carefully formatted all references according to Agronomy guidelines and ensured that tables do not span across pages. Tables have been reorganized for clarity and consistency.

Comments 2: Emphasize the commercial relevance in the Abstract.
Response 2 : Thanks for your suggestion. We have revised the Abstract to emphasize the study's commercial implications:

"This study has important commercial implications, as it provides genetic resources and molecular tools for improving the nutritional quality and market value of soybean cultivars. By identifying SNP markers associated with high arginine content and achieving high accuracy in genomic prediction, this research facilitates the breeding of soybean accessions with enhanced protein quality, catering to health-conscious consumers and the high-value food and feed markets." (lines 33–40, page 1)

Comments 3: Refer to tables in the text (lines 250–256).
Response 3 : Thanks for your suggestion. All tables have been referenced appropriately in the text, and formatting has been revised for clarity and alignment with journal requirements.

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

I have reviewed the manuscript entitled “Identification of Loci and Candidate Genes Associated with Arginine Content in Soybean,” which presents a genome-wide association study (GWAS) to uncover the genetic basis of arginine content in soybean. The study analyzes 290 soybean accessions, identifies two significant SNPs, and proposes two candidate genes related to arginine accumulation. Furthermore, the authors employ genomic prediction models to assess the potential for breeding applications.
While the manuscript addresses a relevant and timely topic in soybean genetics and breeding, it is not suitable for publication in its current form. The study has merit and presents novel findings; however, several critical areas need improvement to meet the standards of scientific rigor and clarity, particularly in enhancing the discussion, clarifying methodologies, and improving data presentation. Once these concerns are addressed, the manuscript may be reconsidered for publication.

The manuscript lacks a strong justification for the focus on arginine. The authors should clearly explain why arginine is an important amino acid, particularly in the context of human nutrition, animal feed, and functional food development. This would better frame the study’s importance and relevance to modern breeding objectives. The Discussion is not sufficiently developed. There is no comparison with previous literature and results, which weakens the interpretation of the findings. The authors should integrate relevant studies and discuss whether the identified loci have been implicated previously or are entirely novel. This would greatly enhance the credibility and scientific contribution of the work. Overall, the discussion currently lacks depth and persuasiveness. Additionally, the overall clarity of the manuscript needs to be improved, as there are discrepancies between certain parts of the methodology and the presented results. These inconsistencies should be carefully revised and clarified to enhance manuscript coherence and scientific integrity.

Below are my other recommendations and comments for the authors, which I suggest addressing throughout the manuscript.

Line 102: What criteria were used to select the 290 soybean accessions from the USDA germplasm collection?

Line 112: The section on phenotypic evaluation is vague. Please describe the phenotyping protocol, including sample preparation, quantification method, replication, and environment conditions.

Line 138: Although GLM is listed among the GWAS models, its results are not discussed.

Line 176: The subsection 2.7.3.1 appears redundant with 2.7.1. Please clarify how they differ.

Line 200: Inconsistent number of models presented—there are four models listed in the text, while five were stated in the methods.

Line 244–246: The relationship between the number of loci in Table 1 and the stated criteria is unclear. Clarify the inclusion criteria for SNPs.

Line 304: The rationale for using genomic prediction in the context of traits governed by major genes should be better justified.

Figure 1: The x-axis lacks units. Please indicate whether it represents frequency or number of genotypes.

Line 230: The grouping of genotypes is mentioned, but no description is provided of the characteristics or rationale behind these groups.

Table 1: Inclusion criteria for the listed SNPs are unclear. Some SNPs were detected using only one model despite a stated multi-model approach.

Figure/Table Consistency: There is inconsistency between SNPs shown in Manhattan plots, tables, and discussed in the text. Clarify which models detected which SNPs and standardize the results across all visuals and descriptions.

Line 318: Choose a consistent format (table or image) for presenting genomic prediction results.

Table 2: There’s a discrepancy in the number of models presented here versus those mentioned in the methods.

Supplementary Table S3: Not cited in the text. It should also be expanded to indicate which GWAS model identified each SNP.

Figure 8 legend: Only four markers are listed in Table 1, creating confusion. Review and align legends and table content.

Supplementary Table S3 should be extended with information on which GWAS model identified which SNP.

Line 370: Aren’t these markers identified by BLINK?

Line 394: Elaborate to strengthen the claim regarding the role of arginine in human health, feed, etc.

Line 401: Have these or related markers been identified in previous studies? A brief review of existing literature would contextualize novelty.

Table S3 is not cited in the text.

Author Response

Comments 1: Weak justification for arginine focus.
Response 1 : Thanks for your suggestion. The Introduction has been expanded to better justify the focus on arginine, highlighting its nutritional and functional significance in both plant and human systems. (lines 68–93, page 2-3)

Comments 2: Discussion lacks depth and comparison.
Response 2: Thanks for your suggestion. A new paragraph has been added to the Discussion section comparing our findings with previous GWAS and QTL studies on amino acid traits in soybean (lines 460–470, page 16).

Comments 3: Inconsistencies between methods and results.
Response 3 : Thanks for your suggestion. We have reviewed the entire manuscript to ensure consistency in model descriptions, SNP numbers, and cross-references to figures and tables.

Comments 4 : Line 102: Criteria for selecting 290 accessions.
Response 4 : Thanks for your suggestion. We clarified in Section 2.1 that the 290 accessions were selected for their geographic diversity (12 countries) and arginine data availability, aiming to capture a broad genetic background. (lines 140-142, page 4)

Comments 5 : Line 112: Phenotyping protocol unclear.

Response 5 : Thanks for your suggestion. We added a detailed description of the phenotyping process, including sample preparation, quantification by NIR, replication details, and calibration protocol. (lines 154–160, page 4)

Comments 6: Line 138: GLM listed but not discussed.
Response 6 : Thanks for your suggestion. As the GLM model was not used in the final analysis, we have removed references to it to maintain consistency.

Comments 7: Line 176: The subsection 2.7.3.1 appears redundant with 2.7.1. Please clarify how they differ.
Response 7 : Thank you for the suggestion. We have clarified in Section 2.7 that Subsection 2.7.1 describes genomic prediction using randomly selected SNP sets for comparative purposes, while Subsection 2.7.3.1 focuses on self-prediction using GWAS-identified significant SNPs. These two analyses serve different purposes—random SNPs for model benchmarking and GWAS-based SNPs for trait-specific prediction—and utilize different methodologies. The text has been rephrased accordingly to eliminate redundancy and enhance clarity (lines 199–221, pages 5–6).

Comments 8: Line 200: Inconsistent number of models presented—there are four models listed in the text, while five were stated in the methods.
Response 8 : Thank you for catching this inconsistency. The GLM model was originally mentioned but not ultimately used in the final analysis. We have corrected the number of models used to four in the text (line 221, page 6).

Comments 9: Line 244–246: The relationship between the number of loci in Table 1 and the stated criteria is unclear. Clarify the inclusion criteria for SNPs.
Response 9 : We appreciate this observation. The Results section now clarifies that SNPs listed in Table 1 were identified by one or more GWAS models with a LOD score greater than 5.78 (lines 316–319, page 9).
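The inclusion rule stated in Response 9 (a SNP is listed if at least one GWAS model reports a LOD score above 5.78) can be sketched as below. This is only an illustration under stated assumptions: LOD is taken to mean -log10(p), as is common in GWAS reporting; the `significant_snps` helper, the SNP names, and the p-values are hypothetical; and apart from BLINK, the model names are placeholders rather than the models confirmed in the manuscript.

```python
import math

LOD_THRESHOLD = 5.78  # threshold quoted in the response


def significant_snps(results, threshold=LOD_THRESHOLD):
    """Map each significant SNP to the sorted list of models that detected it.

    `results` has the shape {model_name: {snp_id: p_value}}.
    Assumption: LOD = -log10(p).
    """
    detected = {}
    for model, pvalues in results.items():
        for snp, p in pvalues.items():
            if -math.log10(p) > threshold:
                detected.setdefault(snp, []).append(model)
    return {snp: sorted(models) for snp, models in detected.items()}


# Hypothetical p-values; "FarmCPU" and "MLM" are placeholder model names.
results = {
    "BLINK": {"snp_1": 1e-8, "snp_2": 1e-4},
    "FarmCPU": {"snp_1": 1e-7, "snp_3": 1e-6},
    "MLM": {"snp_1": 1e-3},
}

table = significant_snps(results)
print(table)
```

Recording the detecting models alongside each SNP also gives the per-model annotation that the reviewer requests for Table 1 and Supplementary Table S3.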

Comments 10: Line 304: The rationale for using genomic prediction in the context of traits governed by major genes should be better justified.
Response 10 : Thank you for the thoughtful suggestion. While arginine content is influenced by a major-effect locus, it remains a complex trait modulated by multiple minor-effect loci. We have added to the Discussion that genomic prediction is valuable in integrating both major and minor genetic effects, especially for improving prediction accuracy across populations (lines 495–509, page 16-17).

Comments 11: Figure 1: The x-axis lacks units. Please indicate whether it represents frequency or number of genotypes.
Response 11 : Thank you for pointing this out. The x-axis label in Figure 1 has been updated to include the appropriate unit: “(g/100g)” (line 284, page 7).

Comments 12: Line 230: The grouping of genotypes is mentioned, but no description is provided of the characteristics or rationale behind these groups.
Response 12 : Thank you. We have added in Section 3.3 that genotypes were grouped based on the 50K SNP genotyping array data from the USDA database. SNPs significantly associated with arginine content (Table 1) were further analyzed by ANOVA to assess their effects, and statistical significance was indicated (lines 335-336, page 10).

Comments 13: Table 1: Inclusion criteria for the listed SNPs are unclear. Some SNPs were detected using only one model despite a stated multi-model approach.
Response 13: Thank you for your comment. It has been clarified in the Results section that SNPs in Table 1 were detected by one or more GWAS models, each with a LOD score > 5.78 (lines 316–319, page 9).

Comments 14: Figure/Table Consistency: There is inconsistency between SNPs shown in Manhattan plots, tables, and discussed in the text. Clarify which models detected which SNPs and standardize the results across all visuals and descriptions.
Response 14 : Thank you for the valuable feedback. All figures and tables have been revised for consistency. Table 1, Figure 3, and Figure 4 now align with the text, and each SNP is annotated with the corresponding GWAS model(s) used for its detection.

Comments 15: Line 318: Choose a consistent format (table or image) for presenting genomic prediction results.
Response 15 : Thank you. We have revised the presentation to adopt a consistent format for all genomic prediction results, enhancing clarity and interpretability.

Comments 16: Table 2: There’s a discrepancy in the number of models presented here versus those mentioned in the methods.
Response 16 : Thank you for pointing this out. We have corrected the number of SNPs and prediction models used in Table 2 to match those described in the Methods section.

Comments 17: Supplementary Table S3: Not cited in the text. It should also be expanded to indicate which GWAS model identified each SNP.
Response 17 : Thank you. Supplementary Table S3 has now been properly cited in both the Methods and Results sections (line 460-470, page 16), and expanded to specify which GWAS model identified each SNP.

Comments 18: Figure 8 legend: Only four markers are listed in Table 1, creating confusion. Review and align legends and table content.
Response 18 : Thank you for your observation. Table 1 has been expanded to include all 10 SNPs used in the analysis, ensuring consistency with the content of Figure 8 and its legend.

Comments 19: Supplementary Table S3 should be extended with the information which GWAS model identified which SNP

Response 19: Thanks for your suggestion. We have labeled the corresponding GWAS models in Supplementary Table S3.

Comments 20: Line 370: Aren’t these markers identified by BLINK?
Response 20 : Yes, thank you for the clarification. We have now explicitly stated that the GAGBLUP model in Figure 10 uses SNP markers identified through the BLINK analysis (line 422-435, page 14-15).

Comments 21: Line 394: Elaborate to strengthen the claim regarding the role of arginine in human health, feed, etc.
Response 21 : Thank you. The Introduction has been expanded to discuss the scientific rationale and biological importance of arginine in human health, animal nutrition, and feed efficiency, supported by relevant literature references (lines 68–93, page 2-3).

Comments 22: Line 401: Have these or related markers been identified in previous studies? A brief review of existing literature would contextualize novelty.
Response 22 : Thank you for the insightful suggestion. We have added a brief review of related findings from previous studies to the Discussion section to better contextualize the novelty of our results (lines 460–470, page 16).

Comments 23: Table S3 is not cited in the text.
Response 23: Thank you. Table S3 is now properly cited in both the Methods and Results sections, and has been expanded to indicate which GWAS model identified each SNP (lines 348–349, page 11; lines 460–461, page 16).

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors completed all of the suggested revisions and added explanations where requested.
