Genome-Wide Association Study of Partial Resistance to P. sojae in Wild Soybeans from Heilongjiang Province, China

Phytophthora root rot (PRR) is a destructive disease of soybeans (Glycine max (L.) Merr) caused by Phytophthora sojae (P. sojae). The most effective way to prevent the disease is growing resistant or tolerant varieties. Partial resistance provides a more durable resistance against the pathogen compared to complete resistance. Wild soybean (Glycine soja Sieb. & Zucc.) seems to be an extraordinarily important gene pool for soybean improvement due to its high level of genetic variation. In this study, 242 wild soybean germplasms originating from different regions of Heilongjiang province were used to identify resistance genes to P. sojae race 1 using a genome-wide association study (GWAS). A total of nine significant SNPs were detected, repeatedly associated with P. sojae resistance and located on chromosomes 1, 10, 12, 15, 17, 19 and 20. Among them, seven favorable allelic variations associated with P. sojae resistance were evaluated by a t-test. Eight candidate genes were predicted to explore the mechanistic hypotheses of partial resistance, including Glysoja.19G051583, which encodes an LRR receptor-like serine/threonine protein kinase protein, Glysoja.19G051581, which encodes a receptor-like cytosolic serine/threonine protein kinase protein. These findings will provide additional insights into the genetic architecture of P. sojae resistance in a large sample of wild soybeans and P. sojae-resistant breeding through marker-assisted selection.


Introduction
Phytophthora root rot (PRR), caused by the Phytophthora sojae pathogen, is one of the most destructive diseases of soybeans in world [1]. In China, PRR was first detected in Heilongjiang province in 1989. Subsequently, PRR spread to most soybean-producing areas, which caused significant yield losses each year [2,3]. Currently, the most effective ways to control PRR is to grow soybean cultivars, which confer resistance genes to P. sojae [4].
Two types of resistance to PRR have been reported in soybeans, including partial resistance, which is controlled by multiple genes, and complete resistance, which is mediated by the single dominant Rps resistance gene [5]. The management of PRR has primarily relied on single dominant resistance genes. A large amount of research on resistance-gene mapping has been conducted since the first resistance gene to P. sojae was identified in the 1950s [6]. To date, more than 33 Rps genes/alleles on 9 different soybean linkage groups/chromosomes have been identified and mapped: Rps1a, Rps1b, Rps1c, Rps1d, Rps1k, RpsYu25, and Rps7 were in linkage group N; Rps2 was in group J; Rps3a, Rps3b, Rps3c, and Rps8 were in group F; Rps4, Rps5, and Rps6 were in group G; and Rps1Su was in group O. Moreover, Rps12, RpsHN, RpsQ, RpsGZ, Rps14, and Rps11 were reported as resistance genes to P. sojae [7][8][9][10][11][12][13]. The Rps11 gene was located from Pi 594527 in chromosome 7 by using SNP genotyping [14]. Recently, a Phytophthora resistance gene RpsWy was mapped on chromosome 3 by high-throughput genome-wide sequencing [15]. Two candidate genes, Glyma.03G033700 and Glyma.03G033800, conferring PRR against race 1 were also identified on chromosome 3 using the Specific Locus Amplified Fragment Sequencing (SLAF-seq) approach [16]. A novel Phytophthora resistance gene, RpsZS18, was detected on chromosome 2 of the soybean cultivar Zaoshu18 [17]. RpsYD25 was predicted as a candidate gene and validated to be a diagnostic marker for P. sojae resistance breeding [18].
Complete resistance and partial resistance were not completely independent, the varieties with a complete resistance gene also had higher partial resistance levels. Partial resistance, known as horizontal resistance, was a quantitative trait controlled by QTL. Partial resistance could limit the spread of P. sojae in plant tissues and reduce the degree of root rot [19]. Recently, more than 70 quantitative trait loci (QTL) related to soybean partial resistance against P. sojae have been identified by genome-wide association studies (GWAS) [20]. Glyma.13g32980, Glyma.13g33900, and Glyma.13g33512 were identified on chromosome 13 by GWAS based on naturally occurring variations of 279 accessions from Yangtze-Huai soybean breeding germplasms [21]. A major QDRL (quantitative diseaseresistance locus) on chromosome 18 (QDRL-18) was identified in PI 427105B and PI 427106, which represents a valuable resistance source for breeding programs [22]. Moreover, some genes, such as Glyma.01g32800, Glyma.01g32855, and Glyma.14g087500, which are likely involved in PRR resistance, were identified. These works lay a foundation for exploring the mechanism of P. sojae resistance [23][24][25]. However, even though the rapid shift in quantities of P. sojae limits the effectiveness of resistance genes, the durability of an Rps gene is generally only 8-15 years [3,26]. Therefore, researchers must continuously search for more valuable resistance sources to identify new resistance genes.
Wild genetic resources play a significant role in transferring traits of interest, such as disease and insect resistance, improved quality, abiotic stress tolerance, and manipulation in modes of reproduction [27,28]. China is the origin and diversification center of the wild soybean that possesses many agronomically beneficial traits, such as high protein and lipid contents, adaptation to harsh conditions, and resistance to insects and disease [29]. Serving as valuable genetic resources, wild soybean harbors a high level of genetic variation and is certainly an extraordinarily important gene pool for soybean improvement [30]. However, many soybean collections, but few wild soybeans, were screened for exploiting novel resistance or tolerance sources [31][32][33].
Hence, the objectives of this study are to (i) detect the genetic resource presenting resistance and possibly carrying candidate genes or alleles by screening 242 wild soybeans from different regions in Heilongjiang province, (ii) map the resistance gene through a genome-wide association study (GWAS), and (iii) identify the candidate genes and their functional markers for marker-assisted selection.

Plant Materials
The resistance evaluation and correlation analysis of 242 soybean germplasms were conducted. These resources were obtained from 13 different ecological regions in Heilongjiang province and sampled according to the principle of representative and balanced sampling. All the materials were self-bred.

Medium and P. sojae Strain Preparation
Preparation of medium and strains are shown as follows: Formula for medium: carrots (200 g) were juiced, boiled (30 min), and filtered; the volume was fixed to 1000 mL; and then agar (20 g) was added. This was then sterilized at 120 • C for 20 min. The prepared medium was poured into culture dishes with a thickness of 0.6 cm.
Culture of strain: race 1 of P. sojae was placed in the incubator at 25 • C for dark culture for 7 days.

Pot Experiment
Before planting, the soil was packed into small bags for sterilization at 121 • C for 1h. The seed coat was gently cut and broken with a knife, then sterilized with 75% alcohol for 50 s, and rinsed with sterile water 3-5 times. A total of 5 seeds from each material were planted in a pot and this was repeated 3 times. After emergence, 3 seedlings with consistent growths were kept in each pot. After the first three compound leaves were fully spread, inoculation and identification could be started. The materials were planted in batches every 7 days, which was repeated 3 times, After identification, fresh leaves were sampled and stored in a −80 • C refrigerator for DNA extraction. The CTAB method was used for DNA extraction [34].

Inoculation Identification
The leaf inoculation method was used in this study [35]. The three compound leaves that spread first were obtained and placed in a tray pre-arranged with sterile gauze. Distilled water was sprayed onto the bottom of the tray to keep the gauze moist. Cut a 0.5 cm wound in the center of each leaf with a blade. Then, cut the cultivated fungal medium into 0.5 × 0.5 cm pieces and inoculate it on the leaf wound with the growth side facing upward. The culture medium with no fungus was taken as the control. The inoculation test was repeated 3 times. The culture conditions were 24 • C, 12 h of light, and the relative humidity was 100%. After 5 days of inoculation, the disease status of the inoculated leaves was investigated. The standard of resistance identification is shown in Table 1 [35]. The formula for calculating the infection rate is as follows: infection rate = number of infected leaves (yellowing, browning, or yellowing)/total number of inoculated leaves × 100%.

Genotype Data and Quality Control
Data analysis was performed via R software (3.6.1 the 1me4 packages were loaded). The best linear unbiased prediction (BLUP) was obtained from the three-batch resistance to P. sojae of 242 accessions. The BLUP was used as the phenotypic value for association analysis. The calculation of the broad sense heritability (h 2 ) was obtained using the equation h 2 = δ g 2 /(δ g 2 + δ e 2 ), the variance of genetic variation and residual was calculated by the covariance of the genetic kinship matrix between individuals. Significant differences were evaluated by using one-way ANOVA and Duncan's test at p ≤ 0.05. Tukey's honest significant difference tests were conducted.
Genomic DNA was extracted by random disruption, DNA fragments were recovered, cluster was prepared by splicing, and enrichment amplification and sequencing were performed on Hiseq4000. The sequence data were compared with reference genome sequence by BWA software (0.7.17). When the level of mapping rate was below 70%, elimination was performed. SNPs were identified by Samtools (1.10) and Genome Analysis Toolkit (GATK4.0) [36]. SNP markers were excluded with a missing rate of >50% and a minor allele frequency (MAF) < 0.05.

Population Structure and Linkage Disequilibrium Analysis
The population heterozygosity (He), polymorphism information (PIC), and genetic diversity (π, θ) were calculated by VCF tools. The neighbor-joining tree was constructed using Phylip(3.5c). Population structure was calculated by fast structure [37]. Principal component analysis (PCA), which determines the population structure of G. soja accessions, was calculated using the R software package. The number of subgroups can be estimated by calculating the marginal likelihood. The pairwise linkage disequilibrium (LD) between SNP markers was calculated by using squared allele frequency correlations (r 2 ) with PopLD decay.

Genome-Wide Association Analysis
The association analysis was performed with a general linear model (GLM) in GAPIT (3.0) [38]. The population structure was explained by PCA and the kinship was calculated by the Vanraden method [39].

Fluorescence Quantitative PCR Detection
The resistant and susceptible G. soja accessions were selected to screen differential expressions of candidate genes by fluorescence quantitative PCR. Samples were obtained from the stem 0.5 cm above and below the hypocotyl inoculation site at 0 h, 6 h, 12 h, 24 h, 36 h, and 48 h after inoculation, and were immediately frozen in liquid nitrogen and stored at −80 • C. RNA isolation was performed on each sample using a plant RNA Extraction Kit (Tiangen). The first cDNA strand was synthesized using the transcript RT Kit (Tiangen), according to the manufacturer's instructions. Real-time fluorescent quantitative PCR was performed on a LightCycler480 II (Roche, Rotkreuz, Switzerland). Primer sequences of the candidate gene Glysoja.19G051583 (F: CACCACCAAATCCCAGTT; R: AAGCACCAAAGACCAACAAAA), Glysoja. 15 g042014 (F: AAAAGTTGCTGACCCATTG-GTAAAT; R: TACCATACTGATGCTTACACGCT) were used for fluorescence amplification. The relative levels of transcript abundance were calculated using the underlying comparative threshold method 2 −(∆ Ct) with GmActin4 (GenBank accession no. AF049106) as the internal standard.

Variation in Resistant Levels among G. soja Accessions
A total of 242 G. soja accessions originating from Heilongjiang province were evaluated for their response to one virulent isolate of P. sojae, race 1, using the leaf inoculation method. The susceptible rate for the inoculated accessions ranged from the lowest at 3.70% to the highest at 91.36%. The result's phenotypic evaluation revealed a broad range of P. sojae resistance among the screened accessions. A total of 47 susceptible accessions, 167 intermediate accessions, and 27 resistant accessions were identified according to the standard of resistance; the percentage of genotyping was 19.50%, 69.29%, and 11.20%, respectively, among the tested accessions (Table 2). The phenotypic variations of three batches of P. sojae resistance were analyzed, including descriptive statistics, significance analysis, and generalized heritability ( Table 3). The results show that there is a wide variation of resistance to P. sojae in the population, and the distribution is continuous (Figure 1). The kurtosis and skew of the first and third batches were all greater than 0, and the kurtosis and skew of the second batches were less than 0.

SNP Data
A total of 2.27 TB of data was acquired by high-throughput sequencing. According to the variation of SNPs, we calculated that the population's He was 0.2898, PIC was 0.2389, diversity π was 1.45 × 10 −3 , and theta was 0.1576. A total of 4,152,769 SNPs were polymorphic in our data set (1 SNP/228 bp); a minimum minor allele frequency (MAF) of ≥5% was employed. Of the polymorphic SNPs, 999,800 had an MAF ≥ 5% and missing rate ≤ 50%, and these were evaluated in the present study for associations with P. sojae resistance.

Population Structure
The structure and relevance of soybean populations were analyzed. The marginal likelihood of population composition was estimated from 2 to 9 subgroups in turn. The 242 accessions were divided into two sub-populations (clusters), K1 (red) and K2 (blue), based on structure analysis, as the maximal delta K value was observed when K = 2 ( Figure 2). Sub-population K1 represented typical wild soybean accessions with smaller seeds, and sub-population K2 was predominantly composed of semi-wild soybeans with larger seeds. From the principal component analysis (PCA) (Figure 3), the total amounts of genetic variations explained by the first two principal components were 12.58% and 10.24%. The first two principal components visually differentiated accessions into wild soybeans and semi-wild soybeans, which were consistent with the structure analysis. Sub-populations K1 and K2 were clustered together in the phylogenetic tree analysis (Figure 4), respectively, which displayed consistent results in agreement with the population structure analysis.

Linkage Disequilibrium
The linkage disequilibrium (LD) was calculated using 999,800 SNPs with a minor allele frequency ≥ 5% covering the 20 chromosomes. LD decayed to an r 2 of 0.2 at approximately 50 kb for the whole population. While the LD value of subgroup 1 was 60 kb, the decrement distance of subgroup 2 could not be obtained ( Figure 5).

Genome-Wide Association Analysis
A total of 79 SNPs were identified to be significantly associated with resistance; at least one batch tested P. sojae race 1 at the level of −log 10 (p) ≥ 4.5 in the GLM analysis ( Figure 6, Table 4). We identified 14 SNPs associated with race 1 for the first batch, 15 SNPs were associated with race 1 for the second batch, and 25 SNPs were associated with race 1 for the third batch. Moreover, 25 SNPs were identified to be associated with race 1 for the BLUP. To evaluate the potential resistance gene of P. sojae, the methods used in this study applied several approaches to avoid Rps-mediated responses. A total of 9 SNPs were found to be repeatedly associated with race 1 for both the batch and the BLUP, and were located on chromosomes 1, 10,12,15,17,19, and 20 (three sites on chromosome 19). The phenotypic variation explanation of 9 association SNPs ranged from 7.43% to 10.05% (Table 5).  To confirm the reliability of resistance-associated markers identified by the GLM method, as shown in Table 6, for each variation, the accessions were divided into two groups based on the variations of SNPs. A t-test was performed for the mean value of the susceptible rate between the two groups. The susceptible rate of accessions with alleles that were significantly lower in disease resistance can be selected as reliable variants for screening favorable germplasm resources; meanwhile, alleles with no significant differences in disease resistance were unreliable. A total of seven favorable allelic variations (rs10641-T, rs532502-T, rs718743-C, rs922217-G, rs938638-G, rs940996-C, and rs1958957-T) were detected in nine significant associated alleles ( Table 6). The typical carrier accessions were HAAS_077 and HAAS_264.

Prediction of Candidate Genes for PRR Resistance
As the stable resistance-associated SNPs located on different chromosomes were consistently identified to be associated with the P. sojae resistance in all batches, we performed the candidate gene prediction analysis in the genomic region surrounding the even associated SNPs (Table 6). According to the LD distance, we extended and selected the region of 500 kb upstream and downstream of the peak SNP marker on both sides. We found that four SNPs were located within the gene, the other three SNPs were located in intergenic regions. A total of 30 candidate genes were predicted within the search region (Table 7). Based on the detailed annotations for soybean reference genomes in SoyBase (http://www.soybase.org, accessed on 8 April 2020), or wild soybean candidate genes in NCBI (http://www.ncbi.nlm.nih.gov, accessed on 8 April 2020), 8 candidate genes were predicted from these 30 genes for possibly regulating P. sojae resistance in soybeans and considered to be candidate genes associated with PRR resistance (Table 8). This candidate list included genes encoding resistance to Phytophthora-related proteins, receptor-like kinase proteins, a caffeoyl-CoA O-methyl transferase, and glutathione S-transferase. Two identified RKF3 genes (Glysoja.19G051583 and Glysoja.19G051582) and one RBK2 gene (Glysoja.19G051581) were close to SNP rs938638. The gene Glysoja.19G051582 was only 1 kb away from SNP rs938638, the gene Glysoja.19G051581 was 4 kb away from it, and the gene Glysoja.19G051583 was 7.6 kb away from SNP rs938638.

Discussion
PRR is one of the most serious diseases in soybeans and has caused a great reduction in soybean production in recent years. The application of resistant varieties seemed to be the most effective way to control PRR. However, the widespread use of complete resistance genes can lead to the adaptation of P. sojae populations to the deployed resistance. Searching for more valuable resistance sources has become important to develop cultivars with increased levels of partial resistance. A large quantity of soybean germplasms have been screened for PRR resistance [31][32][33]40,41]. Wild soybean is an extraordinarily important gene pool for soybean breeding. In this study, 27 resistant wild soybeans were identified in response to race 1 of P. sojae, the dominant race of PRR; these works could be useful for breeding and the genetic research on resistance to P. sojae. The intention of this investigation was to identify SNPs by GWAS and candidate genes that play an important role in the PRR resistance variation in our wild soybean population.
For the GWAS analysis, we used the GLM method to identify the markers associated with PRR resistance. By using the genotypic data of 999,800 SNPs with MAF ≥ 5%, a total of 79 SNPs were identified to be significantly associated with the resistance to P. sojae race 1 of at least one tested batch at the level of −log10 (p) ≥ 4.5. Among these SNPs, 9 SNPs were detected to be associated with race 1 for both the batch and BLUP. Compared to previous GWAS studies on P. sojae resistance, our resistance-associated regions were either not on the same chromosomes or were at various distances from the reported alleles and QTLs. Niu et al. (2018) used 337 accessions to identify resistance regions associated with PRR resistance by GWAS, 26 significant SNPs associated with Phytophthora resistance were detected on chromosome 1, and no previous studies have reported resistance loci in this 441 kb region [23]. Schneider et al. (2016) used 1395 Korea accessions to identify seven QTLs on Chr. 3, 13, and 19 associated with partial resistance to P. sojae [42]. The SNPs we identified for race 1 were on Chr. 1, 10, 12, 15, 17, 19, and 20, while Qin et al. (2017) identified six SNPs located on Chr. 3, 5, 13, and 18 associated with race 1. The result show that ss715614943 on Chr. 13 has the highest significant association with P. sojae race 1 with an LOD value of 4.46 in the GLM analysis [24]. Li et al. (2016) also identified a resistance-associated region containing three candidate genes (Glyma.13g32980, Glyma.13g33900, and Glyma.13g33512) on chromosome 13 [21]. The highest significant association rs718743 was identified on Chr. 15 with an LOD value of 5.68 in the GLM analysis in our result. Interestingly, a relatively major effect P. sojae resistance QTL was identified on Chr. 15 through whole-genome resequencing using a diverse panel of 357 soybean accessions in the previous study [20]. Moreover, in our study, a total of seven favorable allelic variations (rs10641-T, rs532502-T, rs718743-C, rs922217-G, rs938638-G, rs940996-C, and rs1958957-T) were identified to be candidate regions for resistance to P. sojae in wild soybeans on chromosomes 1, 12, 15, 19, and 20, which had not been reported. It indicated that these seven favorable materials, including HAAS_077, which carries six tightly resistant associated alleles, could be useful for germplasm innovation and molecular marker-assisted breeding. For candidate gene identification, we discovered even significant SNPs on five different chromosomes that were associated with P. sojae resistance in our wild soybean sample. SNPs in clusters, especially those on chromosomes 15 and 19, are probably the most interesting and worth further investigation. Eight candidate genes involved in plant defense-related reactions were identified here. Glysoja.15G042020 and Glysoja.15G042019, which encode Glutathione Stransferase, were detected in the 48099201-48129356 region on chromosomes 15. A number of studies have reported that Glutathione S-transferase, caffeoyl-CoA O-methyltransferase, LRR receptor-like serine/threonine protein kinase, and receptor-like protein kinases play important roles in plant defense-related reactions against fungal attacks [43][44][45][46][47][48]. Jing et al. (2015) found that the expression of the GST family in soybeans was down-regulated following P. sojae infections [49]. These results imply that Glysoja.15G042020 or Glysoja.15G042019 are likely candidate genes conferring resistance to P. sojae. Receptor-like protein kinases (RLKs) and other stress-related plant protein kinases have been found to be involved in signal transduction. The RLKs located on plant cell membranes have attracted considerable attention in the study of plant signal pathways [50,51]. Interestingly, three of the candidate genes (Glysoja.19G051582, Glysoja.19G051583) were identified close to SNP rs938638, which encode LRR receptor-like serine/threonine protein kinase located on chromosome 19 in the region of 41190378-41199245. Four serine/threonine protein kinase-coding genes are mapped and annotated in the region that is a well-known location for Rps1 and Rps7 in a previous study [52]. Furthermore, the expression of Glysoja.19G051583 reached the highest level at 6 h after inoculation. Importantly, the gene expression was 10-fold greater in resistant germplasms compared with susceptible germplasms. Li et al. (2017) found that Glyma.03g27200 encoding a protein with a typical serine/threonine protein kinase structure and the expression pattern analysis showed that this gene was induced by P. sojae infection, which was suggested as the best candidate gene for RpsQ [9]. Further research revealed that RpsX and RpsQ share common nonsynonymous SNPs and a 144-bp insertion in the Glyma.03g027200 sequence encoding a leucine-rich repeat (LRR) region, which may be important for PRR resistance in soybeans [3]. The results of the present study provide foundational knowledge for researchers who are interested in soybean-P. sojae interactions. A further characterization should focus on validating the role of candidate genes against P. sojae and modulating the resistance between the accessions carrying the R or S alleles in this population.

Conclusions
In the present study, a GWAS was performed to detect genomic regions contributing to partial resistance to P. sojae using wild soybean accessions obtained from Heilongjiang province China. Nine SNPs were detected to be repeatedly associated with race 1 and were located on chromosomes 1, 10, 12, 15, 17, 19, and 20. Some SNPs that coincided with previously reported QTLs for resistance to P. sojae were identified. A total of eight candidate genes were predicted to explore mechanistic hypotheses of partial resistance, including RKF3 and RBK2, which was involved in morphology and development, basal defense, and signal transduction. Some of these SNPs may be useful for P. sojae resistance breeding. Our results also provide additional insights into the genetic architecture of P. sojae resistance in a large sample of wild soybeans.

Conflicts of Interest:
The authors declare no conflict of interest.