Genome-Wide Association Study of Six Forage Traits in Ramie (Boehmeria nivea L. Gaud)

Genome-wide association study (GWAS) of six forage traits using whole-genome sequencing data generated from 301 ramie accessions found that traits were continuously distributed; the maximum variant coefficient was fresh weight per clump (FWPC) (2019) and individual plant height (IPH) (2019) minimum. Correlation analysis demonstrated that 2019 and 2020 results were similar; all traits were correlated. GWAS analysis demonstrated that six traits exhibited consistent and precise association signals. Of the latter, 104 were significant and detected in 43 genomic regions. By screening forage trait-associated single nucleotide polymorphisms and combining Manhattan map with genome annotation, signals were categorized according to functional annotations. One loci associated with fresh weight per plant (FWP) (chromosome 5; Bnt05G007759), two associated with FWPC (chromosome 13; Bnt13G018582, and Bnt13G018583), and two associated with leaf dry weight per plant (LDWP) and dry weight per plant (DWP) (chromosome 4; Bnt04G005779 and Bnt04G005780), were identified. We describe forage trait candidate genes that are highly correlated with FWP and FWPC; Bnt05G007759 may be involved in nitrogen metabolism, while Bnt13G018582 and Bnt13G018583 may encode TEOSINTE branch 1/CYCLOIDEA/proliferating cytokine 1 (TCP) domains. Bnt04G005779 and Bnt04G005780, which may regulate growth and development, are highly related to LDWP and DWP. These genomic resources will provide a basis for breeding varieties.


Introduction
Ramie (China grass) is one of the oldest vegetable fiber crops in China, and its fibers have been used as textiles for over 4700 years in China [1]. However, as a fibrous crop application, only 5% of the whole plant is used, with the remaining 95% abandoned [2]. Consequently, many byproducts of ramie are not utilized well every year, except for fibers. Therefore, it is necessary to evaluate the feeding value of ramie to promote multifunctional development.
The demand for animal products has increased markedly and protein-rich feed shortages have consequently increased sharply [3]. To meet the ever-increasing requirement for feed, there is an urgent need to search for domestic plant resources with high nutrient content, such as ramie. Many studies have shown that fresh ramie leaves and shoot tips, which can be used as animal feed, have low fiber content and are rich in nutrients, such as crude protein (CP), lysine, and carotene [4][5][6][7]. Ramie is an unconventional forage resource with high biomass and rich in carbohydrates and minerals [8,9]. Toledo et al. reported that ramie could replace alfalfa and improve rabbit growth [6]. Tang et al. showed that there were no negative impacts on the growth of black sheep when ramie was used instead of alfalfa hay [10]. Therefore, it is necessary to study the forage characteristics of ramie and ramie varieties with high biological yields and protein content.
Individual plant height (IPH) and fresh weight per plant (FWP) are important factors influencing ramie biomass yield. CP and fiber components are two critical indices of forage quality that affect digestibility and nutritional intake during animal feeding [11]. In recent years, despite the use of ramie as feed, little progress has been made in research on feeding traits of ramie.
The aim of genome-wide association studies (GWAS) is to discern a large number of SNPs and identify trait-associated SNPs to reveal their biological mechanisms [12]. Compared to traditional genetic mapping methods, GWAS uses thousands of SNPs and linkage disequilibrium in the genome to identify quantitative trait loci (QTL) related to traits of interest. GWAS facilitates the analysis of complex trait loci in plants [13,14] and has been successfully used to detect plant morphological genetic loci [14], disease resistance [15,16], grain size [17], and many other agronomic traits [18]. SNPs significantly related to quantitative traits were obtained by GWAS and were associated with target candidate genes, which have been widely used in many plants. Twenty-two candidate genes were identified for resistance against Verticillium wilt in cotton using GWAS [19]. GWAS helped identify six candidate genes with pleiotropic effects on maize stem cell wall components [20]. Furthermore, a gene associated with arsenic accumulation in the leaves of Arabidopsis was determined by conducting GWAS [21]. GWAS has also been successfully applied to the detection of genetic loci associated with ramet number in ramie [22]. However, the molecular marker-selective breeding associated with the forage traits of ramie remains largely unclear, especially for GWAS analysis. Here, we present a comprehensive genomic variation map of ramie obtained by sequencing 301 landraces and cultivar accessions. Furthermore, to identify genetic markers associated with forage quality traits in ramie germplasm resources, which was a major objective of this study, we performed a GWAS with respect to six forage traits. Furthermore, candidate genes related to forage quality traits of ramie are also discussed. This study helps clarify the forage characteristics of ramie germplasms and provides candidate genes controlling the feeding quality traits of ramie.

Phenotypic Trait Analysis
A total of 301 ramie varieties were harvested on 10 June 2019, and 2020, and the traits related to forage were recorded for assessment. The results indicated that all traits showed tremendous phenotypic variation and continuous distribution in the population with the coefficient of variation ranging from 18.0% to 57.0% and were close to fitting a normal distribution (Figures 1 and 2). Therefore, we considered these traits to be quantitative. In particular, the variation coefficient of fresh weight per clump (FWPC) was the highest in 2019 and that of IPH was the lowest in 2019. In addition, all trait variation coefficients were slightly lower in 2020 than they were in 2019 ( Table 1). The coefficient of variation (CV) is the main index used to measure the degree of dispersion of population traits. A better estimate of the correlation between markers and traits can be obtained with a high degree of trait dispersion and large trait separation. In general, the coefficients of variation for these traits were relatively high and suitable for GWAS analysis.       Correlation analysis was performed to gain insight into the relationships between these traits (Tables 2 and 3). The results for 2019 and 2020 are very similar, and all features are highly correlated. The correlations between traits were extremely significant (Tables 2 and 3).

GWAS Analysis for Traits RELATED to Forage
The genetic structure of a population can markedly influence the power of a GWAS; thus, before performing the GWAS, we inferred the population structure using a subset of 1,433,616 SNPs without close linkages [23]. Our previous studies, through Bayesian cluster analysis and principal component analysis (PCA), show that our group has no obvious grouping, suitable for GWAS. We also analyzed the linkage disequilibrium (LD), and discovered that the average coverage density of SNP in this study was 45.45 bp, less than 0.65 kb of LD decay distance [24]. This indicated that the SNP density used in this study was sufficient to capture all genetic variations in the population.
To identify potential candidate genes related to forage traits in natural populations of ramie accessions, according to 5,951,826 SNPs (Table 4) identified from 301 accession and 6 feeding traits of ramie, we performed GWAS of traits related to forage in ramie using Efficient Mixed-Model Association eXpedited (EMMAX). We determine the threshold according to the Q-Q plot (Figures 3-8) on the right side of the figure. When −log 10 p-value is 6, the statistical model conforms to the expected value. In total, 78 significant association signals were detected in 43 genomic regions (Figures 3-8; Tables S2-S4). Of the loci identified, 18 were LDWP-associated, 17 were IPH-associated, 15 were FWP-associated, 12 were DWP-associated, 9 were SDWP-associated, and 7 were FWPC-associated. Notably, the SNP from chromosome 4 (position 9,261,790) was significantly associated with both the traits of LDWP and DWP, and it showed the largest signal associated with the LDWP (p = 1.5 × 10 −8 ) among all association SNPs. In addition, there was another significant SNP on chromosome 4 (position 9,064,066), which was significantly correlated with the traits of LDWP and DWP, and was located between Bnt04G005779 and Bnt04G005780 genes.    Notes: Blue and green dots represented SNP (single nucleotide polymorphism) sites on chrom somes, respectively. When the value of −log10 p was above 6 (gray line), the selected SNP site w significant aboriginality. Orange dotted lines represented the location of SNP loci.     Notes: Blue and green dots represented SNP (single nucleotide polymorphism) sites on chro somes, respectively. When the value of −log10 p was above 6 (gray line), the selected SNP site w significant aboriginality. Orange dotted lines represented the location of SNP loci.     Notes: Blue and green dots represented SNP (single nucleotide polymorphism) sites on chrom somes, respectively. When the value of −log10 p was above 6 (gray line), the selected SNP site w significant aboriginality. Orange dotted lines represented the location of SNP loci.  Notes: Blue and green dots represented SNP (single nucleotide polymorphism) sites on chromosomes, respectively. When the value of −log 10 p was above 6 (gray line), the selected SNP site with significant aboriginality. Orange dotted lines represented the location of SNP loci.

Identification of Candidate Genes
When a −log 10 p-value was > 6, SNPs above this threshold were deemed to be significantly associated with the target traits. We examined genes containing SNPs associated with forage traits and revealed that 78 forage trait-associated SNPs were located in 43 gene regions (including the genic region and upstream/downstream region within 20 kb) (Table S2-4). We categorized the 63 candidate genes according to functional annotations by screening SNPs closely associated with forage traits through the Manhattan map and combined these with genome annotation (Table S5). Next, we selected ten signals that may be related to forage traits. Our results revealed that among the genes containing SNPs associated with these six traits, one was associated with IPH, two with FWPC, three with FWP, four with DWP, three with LDWP, and two with SDWP (Table 5).
Annotated genes were searched within the interval associated with the SNP linkage disequilibrium block and within the physical position of each significant SNP. A peak strongly associated with FWP, identified on chromosome 5, was located in Bnt05G007759 (position: 11,395,767 bp, Figure 9). Through genetic annotation, we found that Bnt05G007759 has a critical domain, the MOSC N-terminal beta barrel (Table 5). Molybdenum cofactor sulfurase (MCSU) contains a key domain in the MOSC N-terminal β-barrel. Its main structural features are the β-barrel structure at the N-terminus of the MOSC domain and the strictly conserved cysteine residues at the extreme C-terminus [25]. MCSUs participate in processes such as nitrogen metabolism and the regulation of abscisic acid levels in plant tissues and are important enzymes for plant development and response to environmental queues [26]. Nitrogen is an essential nutrient for plant growth and development, and is a major factor limiting crop yield. Ramie is a crop with a very high nitrogen requirement. Biological yield is positively correlated with soil nitrogen levels, and FWP is an important factor in ramie biological yield. Therefore, there was a strong correlation between genes related to nitrogen metabolism and the FWP of ramie. Thus, we speculated that there is a correlation between MCSUs and ramie growth and development, and that Bnt05G007759 might be a logical candidate for FWP.   Among the signals associated with the FWPC trait, we found a particularly interesting association locus on chromosome 5 that was associated with the FWPC trait located between the genes Bnt13G018582 and Bnt13G018583 (position: 7866103bp, Figure 10). Bnt13G018582 and Bnt13G018583 belong to the TEOSINTE branch 1/CYCLOIDEA/proliferating cytokine 1 (TCP) transcription factor family, which participates in cell growth, leaf tissue proliferation, and branch growth [6,27,28]. The TCP family is a small plant-specific gene family that was first identified in 1999. The TCP domain is a specific transcription factor with a 59-amino acid basic helix-loop-helix motif [29]. FWPC is an important indicator of the biological yield of ramie. An increase in leaves and branches increases the biological yield of ramie. The TCP gene affects leaf and branching growth of plants; therefore, we speculate that Bnt13G018582 and Bnt13G018583 are potential candidate genes for the traits of FWPC.   In addition, we found two genes, Bnt04G005779 and Bnt04G005780 (position: 9,261,790 bp, 9,264,066 bp, Figure 11), which were significantly associated with both LDWP and DWP traits on chromosome 4. Bnt04G005779 encodes a protein containing an F-box domain, whereas Bnt04G005780 belongs to the F-box gene family. At the N-terminus of the F-box protein, there is an F-box domain of approximately 40-50 amino acids, that is mainly associated with the binding protein Skp1. The skeleton proteins Cullin1 and Rbx1 form the SCF complex, and at the C-terminus of the F-box protein, the SCF complex is usually the secondary structure that mediates the specificity of substrate recognition [30,31]. The majority of F-box gene-encoded proteins that play a role in the ubiquitin-protease pathway are SCF complexes. However, a small number of F-box proteins in the form of non-SCF complexes are also involved [32]. LDWP and DWP are important ramie biomass indicators. F-box genes are widely present in plants and are among the largest and fastest-evolving gene families. The biological functions of these gene families are diverse and are distributed among various plants. They play an important role in plant growth and development, biotic and abiotic stresses, and other processes. Therefore, we speculate that Bnt04G005779 and Bnt04G005780 are potential candidate genes for LDWP and DWP traits.

Analyzing the Characteristics of Ramie and Related Indexes of Feeding Characters
Ramie, a perennial bast fiber crop widely grown throughout China, has been used as animal fodder because of its high fresh yield and leaves rich in nutrients including protein

Analyzing the Characteristics of Ramie and Related Indexes of Feeding Characters
Ramie, a perennial bast fiber crop widely grown throughout China, has been used as animal fodder because of its high fresh yield and leaves rich in nutrients including protein and eight amino acids essential for animal health [33]. It can be made into fresh fodder or dried into leaf meal; fresh leaves are used in the United States, Japan, Brazil, Spain, Colombia, and other countries as feed for cattle, chickens, and pigs [34]. The addition of ramie leaves to livestock and poultry diets has been shown to improve their production [5,35]. Toledo et al. found that using ramie as feed to replace 15% alfalfa enhanced the growth of rabbits [6]. There are an increasing number of reports on ramie feeding; however, to date, ramie feeding characteristic indices have not been standardized and there are very few research reports on the same. In this study, we investigated and analyzed six feeding traits of 301 ramie resources over a period of two years. Abundant phenotypic variation and heritability estimates are significant for successfully detecting the genetic basis of target traits. The CVs of six biological yield traits of forage ramie were all greater than 0.20, and three of them had CVs exceeding 0.40 ( Table 1). The CVs observed in this study were generally higher than those reported in a previous study on ramie [22], suggesting that ramie has more abundant phenotypic variation in forage traits. This will provide a clear target for breeding forage ramie varieties.

Application of GWAS in Forage Ramie
It is important to study ramie genetic resources to breed new varieties and for global biodiversity conservation. GWAS of ramie traits is rare; candidate ramie genes were obtained using GWAS analysis for the first time [22]. However, our knowledge of the genetic variation and architecture of forage ramie traits is still limited. Forage characteristics are typical quantitative traits in ramie; CP, IPH, FWP, DWP, LDWP, SDWP, and FWPC are controlled by multiple genes. A previous study detected an association between SNPs and quantitative traits in forage ramie using an association population consisting of 301 core germplasms [22]. In this study, the core collections used for GWAS were obtained from different regions in China and represent the typical characteristics of various feed ramie varieties. In addition, all traits were continuously distributed in this population and were close to a normal distribution (Figures 1 and 2), with high levels of genotypic and phenotypic diversity (Table 1) suitable for GWAS analysis.

Putative Candidate Genes Associated with Forage Quality
GWAS analysis was used to determine the number of target regions that putatively controlled important agronomic traits. Among the nine annotated genes associated with forage quality traits, three-Bnt13G018582, Bnt13G018583, and Bnt05G007759-were identified by LD block and GWAS analyses ( Table 5). The genes Bnt13G018582 and Bnt13G018583 contain a TCP domain. It has been postulated that TCP transcription factors regulate plant growth and development by mediating environmental factors and hormone signaling pathways [36][37][38]. TCPs have been studied extensively with regard to the regulation and control of leaf morphology by promoting the differentiation of leaf pavement cells via both auxin-and auxin-independent pathways. TCP4 restricts the cell number and final size of leaves in Arabidopsis [39]. He et al. demonstrated that TCP2 regulates leaf morphogenesis via a mechanism involving cell proliferation in Arabidopsis [40]. Previous studies have shown that DELLAs directly regulate the activity of the plant-specific class I TCP transcription factor family, which are the key regulators of cell proliferation [41]. Daviere et al. demonstrated that class I TCP factors directly bind to the promoters of core cell-cycle genes in Arabidopsis inflorescence shoot apices, while DELLAs block TCP function by binding to their DNA-recognition domain [38]. In our study, we used GWAS analysis to obtain two candidate genes, Bnt13G018582 and Bnt13G018583, which are associated with critical forage traits, FWPC of ramie, and contain a TCP domain. Therefore, we suggest that these genes may be involved in leaf growth and plant height of forage ramie; however, their specific function needs to be tested further and verified.
MCSUs participate in processes such as nitrogen metabolism and the regulation of abscisic acid levels in plant tissues and are important enzymes for plant development and response to environmental cues. There are three critical domains in all MCSUs: aminotransferase class V, the MOSC N-terminal beta barrel, and MOSC [26]. In our study, we identified the candidate gene Bnt05G007759 using GWAS analysis. This gene contains an MOSC N-terminal beta barrel domain, may be involved in nitrogen metabolism in ramie, and is related to forage traits of ramie.
Plant F-box proteins regulate many key plant physiological processes, including hormone regulation, light signal transduction, self-incompatibility, floral organ development, pathogen invasion resistance, and specific recognition of degraded proteins [38]. The first Fbox protein discovered was UFO, and UFO genes play an essential role in flower meristem and organ development [42]. The Arabidopsis F-box gene At PP2-B11 plays an important role in drought stress as a negative regulator, reducing plant tolerance to drought [43]. The At2g02360 (F-box-Nicta) gene in Arabidopsis thaliana encodes a sugar-condensing F-box protein. Overexpression of F-box-Nicta in transgenic A. thaliana can reduce the bacterial colonization rate after Pst DC3000 infection and reduce leaf damage [44], and FBX92 affects plant leaf size [45]. Therefore, we believe that the Bnt04G005779 and Bnt04G005780 genes may ensure the normal operation of life activities through multiple coordinated nodal mechanisms; however, their specific functions require further testing and verification.
The identification of target genes can be used in future functional and molecular breeding studies. These new genomic resources will promote research on genome evolution, perennation mechanisms, and other traits of forage ramie. This, in turn, will facilitate the breeding of improved forage to benefit human society.

Plant Materials
In the present study, 301 core germplasms of ramie used for SNP development were cultivated at the Institute of Bast Fiber Crops, Chinese Academy of Agricultural Sciences. The germplasms were collected from China, India, and Cuba (Table S1). All accessions were used for asexual propagation by tender stem cutting in the testing field of Yuanjiang, Hunan Province, China (112 • 33 N, 28 • 16 E), in 2016. We conducted field planting trials according to a randomized block design. There are 2 rows (2 repetitions) for each variety, 6 plants in each row, the row length is 3.6 m, and the row spacing is 1 m. We selected core germplasms with abundant phenotypic variation that could represent ramie germplasm resources as suitable accessions for GWAS.
In May 2019 and 2020, fresh leaves of 301 ramie accessions were collected twice, placed in liquid nitrogen in an icebox, and frozen at −80 • C until genomic DNA was extracted for sequencing.

Phenotype Evaluation
The traits related to forage, such as IPH (straight length from ground to main stem tip), FWP (fresh weight of each plant cut off from the ground), DWP (dry weight measured per plant), LDWP (dry weight of leaves per plant after drying), stem dry weight per plant (SDWP: dry weight of stem after drying), and FWPC (total fresh weight of several individual plants of one ramie clump), were recorded for assessment. According to the normal cultivation method, 301 accessions were harvested on 10 June 2019, and 10 June 2020. Afterwards, the traits were investigated in the second growing season of each year, when the ramie grew to about a meter high.

Statistical Analyses
Multiple comparisons of all parameters were conducted using one-way analysis of variance (ANOVA) and Tukey's honest significant difference test.
Descriptive statistics were computed using SPSS 22.0. Microsoft Excel 2010 was used to determine the frequency distribution for each trait.

Population Structure and LD Analysis
All SNPs from ramie accessions were filtered using the PLINK software (version: 1.9; parameters -indep-pairwise 50 5 0.5) [46], generating a subset of SNPs without close linkages. Investigation of population structures was carried out using ADMIXTURE software (version: 1.3.0) [47], and it was run 10 times for each K value with varying random seeds; the Q-matrices were aligned using pong software and were clustered based on similarity [48]. Thereafter, the matrices from the largest cluster were averaged to produce a final matrix of admixture proportions. A PCA was performed using the smartPCA program of EIGENSOFT software (version: 6.1.4) [49]. To measure LD levels in groups of landraces and the bred cultivars, r 2 of each allele was estimated using PopLDdecay (version: 3.34) with the following parameters: −MAF 0.05 −Miss 0.2 −MaxDist 300 [50]. Average r 2 values were calculated for each distance.

GWAS Analysis
Genome sequence was performed for 301 accessions using the next-generation Illumina HiSeq platform. Sequencing and analysis were conducted by OE Biotech Co., Ltd. (Shanghai, China). Raw sequence reads were filtered to produce clean reads that were aligned to the ramie genome (downloaded from NCBI; accession no. PRJNA663427) using BWA (version 0.7.15), allowing no more than 4% mismatch and one gap [51]. The alignment results were converted to bam format using SAMtools (version: 1.9) [52], and duplicated reads were removed using the picard package. GATK (version 3.8.1) [53] was used to identify SNPs and indels. GWAS analysis was performed based on 5,951,826 SNP data of the population (accession number GVM000103 in the Big Data center at the BIG). All SNPs from ramie accessions were filtered using the PLINK software (version: 1.9; parametersindep-pairwise 50 5 0.5; Purcell et al., 2007), generating a subset of SNPs without close linkages for population structure analysis [46]. Efficient Mixed-Model Association eXpedited (EMMAX) software was used to perform GWAS using SNP genotypes and phenotype matrices under principal component analysis and kinship as covariations [54,55], and GEC software was used to calculate the effective number of independent markers and significant p-value threshold [56]. Manhattan and quantile-quantile plots were created using the R package CMplot (https://github.com/YinLiLin/RCMplot (accessed on 8 December 2020).). According to genome annotation, SNPs were divided into five regions: exon, intron, upstream, downstream, and intergenic. After screening for counterfeit SNPs, high-quality and significantly correlated SNPs were obtained, and the threshold was log 10 p-value ≥ 6.

Prediction of Candidate Genes
BLAST analysis was conducted on the ramie genome using favorable SNP sequences [57], and determined the SNPs related to forage traits. The size of the site interval was estimated based on LD block analysis, and the genes in the region near the site were obtained, and the functional annotation of these genes was performed. According to the functional annotation, we can roughly determine which genes may be related to the traits we are concerned with and enter the subsequent analysis.