Detection of Candidate Loci and Genes Related to Phosphorus Efficiency at Maturity through a Genome-Wide Association Study in Soybean

: Phosphorus (P) deficiency is one of the major factors limiting soybean production, and approximately 90% of P absorbed by plants occurs during the reproductive stage. Thus, it is important to understand the genetic mechanism underlying soybean low-P tolerance, especially in the mature period. Here, we evaluated six P-efficiency-related traits at maturity of 219 soybean accessions, namely, plant height (PH), node number of the main shoot (NN), branch number of the main shoot (BN), pod number per plant (PN), 100-seed weight (100SW), and seed yield per plant (SY), under normal-phosphorus (NP) and low-phosphorus (LP) conditions across two environments. Then, a genome-wide association study (GWAS) in conjunction with a high-density NJAU 355 K SoySNP array was performed. As a result, 27 P-efficiency-related single nucleotide polymorphisms (SNPs) were identified. Furthermore, two repeated SNPs, AX-93897192 and AX-93897200, located on chromosome 19 that were associated with both PH and NN were considered as stable SNPs associated with P deficiency, and the candidate gene GmABCG39 was identified. This work will be helpful in breeding high-P-efficiency soybean varieties.


Introduction
Phosphorus (P) is one of the essential mineral elements for plant growth and development. It not only is an important component of ATP, nucleic acids, and phospholipids but also plays important roles in plant signal transmission, energy transfer, respiration, and photosynthesis [1]. However, phosphate (Pi), the only form that can be absorbed and utilized by soybean plants, is relatively low in abundance in cultivated soils, and low P has become an important factor limiting soybean yield. Although the application of Pi fertilizer could solve this problem, the large use of Pi fertilizer causes a series of environmental problems [2,3]. Importantly, Pi fertilizer mainly comes from the mining of Pi rock resources, which are nonrenewable resources, and the world's Pi rock resources will be exhausted in the next 50-200 years [4]. Therefore, analysing the genetic mechanism of low-P tolerance and exploring P-efficiency-related genes in soybean could be effective in preventing P pollution, as well as improving soybean yield.
Compared with traditional quantitative trait locus (QTL) mapping approach, the genome-wide association study (GWAS) is based on linkage disequilibrium (LD) and can

Plant Materials and Growth Conditions
A natural soybean population was used in this study. The population consisted of 219 soybean accessions with different geographical origins, which were collected from 26 provinces in China and from America, Japan, and Brazil [19]. The natural population with abundant genetic variations has been successfully applied for the genetic analysis of complex quantitative traits in soybean [14,20,21]. All soybean materials were provided by the National Center for Soybean Improvement of China.
For the natural population, the 219 soybean accessions were sown in 15 L plastic pots at Jiangpu Experimental Station of Nanjing Agricultural University in 2012 (designated as E1) and 2013 (designated as E2). Before sowing, the concentrations of nitrogen (N), phosphorus (P), and potassium (K) in dry soil at the Jiangpu Experimental Station were measured to ensure that the P concentration was below 10 mg/kg. Then, the dry soil was designated as low-P soil, and monopotassium phosphate (KH2PO4) was added into low-P soil to bring the P concentration to 20 mg/kg, which was designated as normal-P soil. To satisfy the demands of plants growth, urea (H2NCONH2) was added to both low-P soil and normal-P soil to bring the N concentration to 60 mg/kg, and potassium chloride (KCl) was added to low-P soil so that the K concentration was consistent with that of the normal-P soil. In general, the 219 soybean accessions were sown in 15 L plastic pots containing 10 kg of normal-P dry soil or low-P dry soil, with four or six seeds per pot, in accordance with a completely randomized block design with three replications. Then, the soybean seedlings were thinned to two in each plot approximately two weeks after germination. The normal-P dry soil was considered the NP condition, and the low-P dry soil was considered the LP condition. That is, each accession was subjected to NP and LP conditions, with six seedlings in three plastic pots.
To test the relative expression levels of GmABCG39, a candidate gene identified in this study, when the plants were exposed to low-P stress, seedlings of the low-P tolerant soybean accession "Kefeng No.1" and the low-P sensitive soybean accession "Nannong 1138-2" [12] were germinated in plastic pots in the greenhouse of Nanjing Agricultural University according to a previous study [22]. The treatment involving 1/2 Hoagland solution with 0.5 mM KH2PO4 was designated as the +P condition, and the treatment involving 1/2 Hoagland solution with 0.5 mM KCl was designated as the -P condition.

Evaluation of Soybean P-Efficiency-Related Traits at Maturity
When the soybean plants matured, three plants with similar growth conditions were harvested for each accession under NP and LP conditions. Then, the PH (from the cotyledonary node to the top of the main stem), NN, BN, and PN were evaluated. The soybean seeds of each plant were dried at 35 °C in an oven to a constant weight, and then the total weight of the seeds was measured, which represented the SY. At the same time, 100 seeds were taken from those harvested from each plant randomly and weighed. This step was repeated three times, and the average weight was designated as the 100SW.

Statistical Analysis of Phenotypic Data
The descriptive statistical analysis and correlation analysis were conducted using SAS 9.2 (SAS Institute Inc., Cary, NC, USA). Origin 8.5 software (OriginLab, Northampton, Massachusetts, USA) was used to construct histograms, and Manhattan and quantilequantile (QQ) plots were generated using R software (https://www.r-project.org/, accessed on 26 August, 2022).

GWAS of Soybean P-Efficiency-Related Traits
A GWAS with the high-density NJAU 355 K SoySNP array [19] was conducted on Pefficiency-related traits, including PH, NN, BN, PN, 100SW, and SY, under NP and LP conditions across environments E1 and E2. The multiple mixed linear model (MLMM) in the GAPIT package was used, and the threshold was set to 1/n (n is the number of SNP markers, P ≤ 4.82×10 −6 or -log10(P) ≥ 5.32) to identify significant SNPs [19].

Expression Analysis of Candidate Genes
SoyBase (https://www.soybase.org/, accessed on 10 May, 2022) was used to determine the expression levels of candidate genes in different tissues based on RNA sequencing (RNA-seq) data, and the low-P induced transcriptome analysis of candidate genes were searched in NCBI database (https://www.ncbi.nlm.nih.gov, accessed on 10 May, 2022).
For induced expression analysis of GmABCG39, total RNA was isolated from the roots of "Kefeng No.1" and "Nannong 1138-2" grown under +P and -P conditions for 15 days by the use of a kit according to the instructions (Tiangen, Beijing, China). After the RNA was reverse-transcribed into cDNA (Takara, Kyoto, Japan), Real Universal SYBR Green (Tiangen, Beijing, China) was used to generate a 20-μL reaction system to perform quantitative real-time PCR on a Bio-Rad CFX96 Real-Time System (Bio-Rad, CA, USA). The primers used for GmABCG39 were 5′-TCATCAACCAAGCATAGACA-3′and 5′-CCTCAAGATTAGCCTCCATT-3′, and Gmtubulin (GenBank accession number: AY907703) was used as a reference gene [22]. The expression level of GmABCG39 was calculated by the 2 -∆∆CT method [23].

Bioinformatic Analysis of Candidate Genes
The 2-kb genomic sequence located upstream of the start codon of GmABCG39 was identified as the promoter region, and prediction of cis-acting elements was conducted online via the New PLACE database (https://www.dna.affrc.go.jp/PLACE/?action=newplace, accessed on 10 January, 2022). The protein sequence of GmABCG39 was used as a query sequence to predict the likely interacting proteins by the STRING database (https://cn.string-db.org, accessed on 30 June, 2022).

Soybean P-Efficiency-Related Traits at Maturity Exhibited Wide Genetic Variation
To evaluate the genetic variation of P-efficiency-related traits, six P-efficiency-  Table 1). The mean values of the six P-efficiency-related traits under NP condition were higher than those under LP condition, and the results of variance analysis indicated that P levels significantly affected PH, NN, BN, PN, 100SW, and SY in E1 and E2 (Table 1). In addition, with values ranging from 92.97-96.25%, PH and 100SW showed higher broad-sense heritability (h 2 ) than the other five traits did (Table 1). The frequency distribution analysis revealed that PH, NN, BN, PN, 100SW, and SY in E1 and E2 appeared to exhibit normal or approximately normal distributions ( Figure  1), which indicated that P-efficiency-related traits were quantitative characteristics that were controlled by multiple genes.

Significant Correlations Were Detected among the Six P-Efficiency-Related Traits
Correlation analysis was conducted to further determine whether significant correlations occurred among the six P-efficiency-related traits in soybean. As shown in Table  2, PH, NN, BN, PN, and SY all exhibited significant positive correlations with each other under both NP and LP conditions. In addition, PN and 100SW showed significant negative correlations under NP and LP conditions ( Table 2). Taken together, these results demonstrated that significant correlations occurred among soybean P-efficiency-related traits at maturity.

SNPs Related to P-Efficiency Were Identified by a GWAS
To understand the genetic mechanism underlying soybean P efficiency, a GWAS was performed for PH, NN, BN, PN, 100SW, and SY under NP and LP conditions across E1 and E2, and a total of 27 significant SNPs were identified (Table 3, Figures 2 and S1). Among these SNPs, eight, six, two, six, two, and three were found to be associated with PH, NN, BN, PN, 100SW, and SY, respectively, under both NP and LP conditions (Table  3 and Figure 2).  (Table 3).

Prediction of Candidate Genes Associated with Soybean P-Efficiency
Among the repeated SNPs, both AX-93897192 and AX-93897200, which were associated with PH and NN, were located on chromosome 19, and the two SNPs were only approximately 14 kb apart (Table 3). Then, the two SNPs were identified as stable SNPs related to soybean P efficiency, and candidate genes around them were searched. As shown in Table 4, a total of 32 candidate genes were identified. Among them, the homologous gene of Glyma.19g192700 in Arabidopsis was At3g52910, which belongs to the growth regulating factor (GRF) family. A previous study demonstrated that overexpression of Arabidopsis GRF9 in tomato plants regulated resistance to P deficiency [24]. Glyma.19g192900 was named GmABCG39 in soybean [25], and members of ABCGs (subfamily G of the ATP-binding cassette family) were found to play important roles in defences against various biotic and abiotic stresses [26,27]. Homology of Glyma.19g193400 was At1g19490 (AtbZIP62), which regulated responses to drought stress [28], and both Glyma.19g193900 and Glyma.19g194000 encoded purple acid phosphatases. Glyma.19g194500 and Glyma.19g195200 were defined as "protein abscisic acid-insensitive 5" and "auxin responsive protein", respectively, in the Phytozome database, and abscisic acid [29] and auxin [30] were shown to be involved in P signalling pathways in plants. Overall, the above seven genes were identified as candidate genes related to P efficiency in soybean.

Expression Analysis of Candidate P-Efficiency Related Genes
To further identify the functions of candidate P-efficiency-related genes, we detected the expression levels of candidate genes based on low-P induced transcriptome analysis in the NCBI database. Glyma.19g192900 (GmABCG39) was induced by P deficiency in the roots of the low-P tolerant soybean accession "B20" [16] and nodules of "YC03-3" [31]. Glyma.19g193900 and Glyma.19g194000 were induced by P deficiency in the roots and leaves of "B20" [16]. Glyma.19g193900 was also upregulated by P deficiency in the nodules of "YC03-3" [31], and Glyma.19g195200 was induced by P deficiency in the nodules of "YC03-3" [31].
Based on these results, the expression profiles of Glyma.19g192900 (GmABCG39), Glyma.19g193900, Glyma.19g194000, and Glyma.19g195200 in different tissues were analysed based on RNA-seq data available in SoyBase. As shown in Figure 3a, GmABCG39 was expressed only in the roots. Given that roots play vital roles in conferring tolerance to low-P stress, GmABCG39 was chosen as the P-efficiency-related gene for further study.
Induced expression analysis of GmABCG39 in the low-P-tolerant accession "Kefeng No.1" and the low-P-sensitive accession "Nannong 1138-2" revealed that GmABCG39 was significantly upregulated under -P condition compared with +P condition in "Kefeng No.1" and "Nannong 1138-2", with ratios corresponding to 5.14 and 16.95, respectively (Figure 3b). Overall, GmABCG39 might act as a positive regulatory element involved in soybean P metabolism.

Bioinformatic Analysis of the P-Efficiency-Related Gene GmABCG39
The coding DNA sequence (CDS) of GmABCG39 was 4,365 bp, and it encoded 1455 amino acid residues. In addition, two PHR1-binding sequence (P1BS; GNATATNC) elements, which were demonstrated to be involved in responses to low-P stress through binding high-P-efficiency transcription factors in plants [32], were present in the promoter region of GmABCG39.

Significant Correlations Were Detected among Soybean P-Efficiency-Related Traits at Maturity
The GWAS on soybean P efficiency has always focused on traits at the seedling stage [11,12,14]. In this study, however, we evaluated six P-efficiency-related traits at maturity and found that soybean P-efficiency was a complex quantitative trait that was controlled by multiple genes as all the six soybean P-efficiency-related traits, namely, PH, NN, BN, PN, 100SW, and SY, showed wide genetic variants and appeared to exhibit normal or approximately normal distributions (Table 1 and Figure 1). Different from the significant positive correlations observed among PH, NN, BN, PN, and SY under both NP and LP conditions, PN and 100SW showed significant negative correlations under both NP and LP conditions (Table 2). Similarly, the significant negative correlations were found between PN and 100SW under NP condition in previous studies [35,36], which might be caused by supply competition between them.
In addition, we noticed that the correlations among the six P-efficiency-related traits at maturity under NP and LP conditions were consistent. However, the correlations among P-efficiency-related traits at the seedling stage always showed the opposite results under NP and LP conditions. For example, root dry weight and the root-to-shoot ratio showed a significant positive correlation under NP condition but showed a significant negative correlation under LP condition [11]. Similarly, the correlation between shoot P concentration and shoot Mg concentration [22] and the correlation between quantum efficiency of photosystem II and nonphotochemical quenching [14] under NP and LP conditions also showed the opposite results. These results might suggest that using traits at maturity as indexes to evaluate soybean P efficiency is reliable and stable.
Furthermore, the correlation coefficient between PH and NN was the highest compared with those of the others under NP (r = 0.89, P < 0.001) and LP (r = 0.87, P < 0.001) conditions (Table 2). These results were further verified by the GWAS results. For example, two SNPs, AX-93897192 and AX-93897200, were associated with both PH and NN (Table 3).

Novel P-Efficiency-Related Loci in Soybean at Maturity Were Identified
Through our GWAS of the six P-efficiency-related traits at maturity, a total of 27 SNPs were identified, and 9 SNPs were identified repeatedly (Table 3, Figure 2, and Figure S1). Among the nine repeated SNPs, AX-94111943, AX-93815278, and AX-94287407 on chromosome 13 were all located within the interval of QTL qPC-F-1, which was related to the P content of whole soybean plants at the seedling stage [37]; AX-94139280 on chromosome 15 was located within the interval of QTL cqFARLPE-06, which was related to flower abscission rate under LP condition [38]; and AX-93897192, AX-93897200, and AX-93659142 on chromosome 19 were located within the interval of QTL q19-2, which was related to P use efficiency, P uptake, P concentration, biomass yield, chlorophyll content, intercellular carbon dioxide concentration, and transpiration rate [39]. Taken together, these results suggested that the GWAS results concerning soybean P-efficiency-related traits at maturity are reliable. However, the significant markers identified in this study were environment-specific, which could be due to the fact that the traits at maturity were affected greatly by environments.

New P-Efficiency-Related Genes Were Identified at Maturity
Two repeated SNPs AX-93897192 and AX-93897200 on chromosome 19 were associated with two P-efficiency-related traits (PH and NN) and were considered as stable SNPs for subsequent searches for candidate P-efficiency-related genes.
Glyma.19g195200 was identified as encoding an auxin responsive protein, and auxin signalling had been found to be associated with changes in root architecture caused by P deficiency [44,45]. In soybean, the endogenous indole-3-acetic acid (IAA) content in the roots was shown to increase in response to P deficiency, and the application of exogenous IAA promoted the activity of plasma membrane H + -ATPase and P uptake [46]. Liu et al. predicted target sites of microRNAs that were differentially expressed in two soybean varieties with different P efficiencies under NP and LP conditions and found auxin transcription factors were among the main targets [47], indicating that Glyma.19g195200 could be involved in responses to soybean P deficiency regulated by auxin signalling.
Glyma.19g192900, a member of the ABCG subfamily, was named GmABCG39 [25]. L.albABCG36s and L.albABCG37s in white lupin were found to promote cluster root formation through the regulation of indole-3-butyric acid (IBA) transport, thus contributing to adaptation to low-P stress [48]. Moreover, L.albABCG29 improved low-P tolerance by improving root growth [49]. Interestingly, GmABCG39 was expressed only in the roots (Figure 3a), and its expression levels in roots were regulated by both P deficiency ( Figure  3b) and Fe deficiency [33]. Two P1BS elements were present in the promoter region of GmABCG39, and three (Glyma04g40591, Glyma06g14210, and Glyma18g13610) genes encoding three of ten predicted interacting proteins were associated with Fe deficiency in soybean ( Figure 4). Previous studies had demonstrated that P and Fe functioned dependently to maintain ion homeostasis [50], and GmABCG39 might be involved in complex regulatory networks underlying P and Fe homeostasis in soybean roots.

Conclusions
In conclusion, a total of 27 SNPs were identified through a GWAS of six soybean Pefficiency-related traits at maturity across two environments, and two stable P-efficiencyrelated SNPs that were associated with both plant height and node number of the main shoot were identified. Furthermore, a candidate gene, GmABCG39, that was upregulated in soybean roots in response to P deficiency was identified. This work enriches our understanding of the genetic mechanism underlying P-efficiency, which will be helpful in breeding high-P-efficient soybean accessions.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agronomy12092031/s1, Figure S1: QQ plots of the GWAS results of phenotypic values of the six soybean P-efficiency-related traits at maturity under NP and LP conditions in E1 and E2.