1. Introduction
Legumes are plants from the family Fabaceae. Consisting of more than 750 genera and 19,500 species, this family makes up about 7% of all flowering plant species [
1,
2]. This widely distributed family is the third-largest flowering plant family by number of species. From an economic point of view, it is the second-most important after Poaceae (grasses). Owing to their great diversity and abundance, legumes include a number of agronomic crops (grain and fodder legumes), while others serve as genetic model organisms (
Medicago truncatula,
Lotus japonicus) [
3,
4,
5]. In the context of sustainable agriculture, many legume species can establish symbiosis with nitrogen-fixing bacteria and thereby obtain nitrogen through biological nitrogen fixation (BNF). BNF is a process whereby plants acquire atmospheric nitrogen by interacting with bacteria capable of converting molecular nitrogen to ammonium. This symbiotic relationship, in which the plant provides the bacteria with organic compounds used as carbon and energy sources and the bacteria supply the plant with fixed nitrogen, gives plants a significant competitive advantage in colonizing nitrogen-poor soils [
6].
The Fabaceae genus
Trifolium includes more than 250 species with a cosmopolitan distribution [
7,
8], with the greatest diversity occurring in the temperate Northern Hemisphere. The economic importance of this genus relates especially to those species used extensively as fodder crops for livestock (
T. pratense,
T. hybridum,
T. repens) or as a green manure plant to enhance soil fertility [
9]. Due to their high content of secondary metabolites, such as isoflavonoids, some species are also being studied for potential pharmacological use [
10,
11,
12]. Soil enrichment with nitrogen via growing plants utilizing BNF, such as clover, is more sustainable than using synthetic nitrogen-based fertilizers. However, not all genotypes within the fixation-capable species have the same nitrogen fixation efficiency [
13]. Plant breeding aimed at enhancing nitrogen-fixing ability is complicated by the complexity of this phenotypic trait, as an estimated several hundred genes are involved in nodulation and nitrogen fixation [
14,
15].
The early phase of plant–bacteria interaction depends upon a dialog between the host and microbes [
16] when bacteria begin to produce their own lipochitooligosaccharide signals, termed Nod factors, in response to released plant flavonoids [
17]. These signal molecules determine the specificity of the interaction itself [
18,
19] and are recognized by Nod factor receptors on the root surface, such as NFR1 and NFR5 [
20]. This recognition causes morphological alterations on the root surface and induces two root-specific pathways and one systemic pathway. While the systemic reaction, known as autoregulation of nodulation, controls the number of nodules on the roots according to the number of nodules already formed and the availability of nitrogen in the soil [
21], the signal pathways in the roots enable nodulation initiation and nodule formation using calcium-dependent kinases and transcription factors [
22] or cytokinins [
23].
Nodulating bacteria use infection threads to enter the root [
24,
25]. The bacteria are then taken up into plant cells by endocytosis, forming symbiosomes in which they gradually differentiate into nitrogen-fixing bacteroids as the root nodules develop. The nodules are specialized organs consisting of bacteroids, meristems, and vascular bundles. Nitrogen fixation is enabled by a complex of nitrogenase-nitrate reductase enzymes [
26] supported by leghaemoglobin proteins located in the nodules, which supply oxygen for respiratory processes on the bacteroid membrane while reducing the oxygen concentration inside bacteroids [
27]. Two types of root nodules are distinguished: determinate and indeterminate [
24]. Meristems of indeterminate nodules remain functional (genus
Medicago or
Trifolium); determinate nodules, however, lose their meristematic character in later stages of development (genus
Glycine or
Phaseolus) [
28].
Indeterminate nodules are usually formed by legumes belonging to the inverted repeat-lacking clade. In this clade, bacteria released into the plant cells terminally differentiate into bacteroids that cannot be cultured, show endoreduplication of their genomes, and exhibit lasting changes in the cell wall or in expression patterns [
29,
30,
31]. Many of these changes are mediated by small defensin-like peptides, especially nodule-specific cysteine-rich (NCR) peptides, which are typical of legumes with indeterminate nodules and which induce bacteroid differentiation [
32]. In the best-studied legume plant,
M. truncatula, more than 600 NCRs have been identified [
33], but there are large differences in the numbers of NCR peptides among various legumes, ranging from just a few NCRs to hundreds [
34].
It is estimated today that hundreds of genes with differing impacts on the phenotype are associated with the BNF process [
14], and nearly 200 important genes have been identified in model legume plants using both forward and reverse genetics [
35]. Originally, chemical and physical mutagens (γ-rays, ethyl methanesulfonate, fast neutron bombardment) were used to increase the frequency of mutants and to accelerate the discovery of genes connected with BNF in model legumes such as
M. truncatula [
36,
37],
L. japonicus [
38], or
Glycine max [
39]. In addition, transposon mutagenesis has broadened the possibilities for obtaining mutant populations by using the Ac transposon (
L. japonicus; [
40]), transfer DNA insertions (
L. japonicus; [
41],
M. truncatula; [
42]), retrotransposon
Tnt1 (
M. truncatula; [
43,
44]) or endogenous
Lotus retrotransposon 1 [
45,
46] for both forward and reverse genetics. Antisense RNA/RNAi methods began contributing to a better understanding of BNF’s genetic background at the beginning of the 21st century [
47,
48], and over the years these have enabled the identification of many genes associated with BNF (see review by Arthikala et al. [
49]). In recent years, CRISPR/Cas9-mediated genome editing has been established in legumes such as
G. max [
50],
L. japonicus [
51],
M. truncatula [
52], and
Cicer arietinum [
53], enabling targeted mutagenesis of BNF-associated genes. Moreover, given the range of methods now available for studying genes participating in fixation, several papers have demonstrated the advantages of combining different approaches [
13,
54]. In the field of synthetic biology, there are efforts to introduce nitrogen fixation into plants that lack this ability, such as cereal crops, by transferring a system of multicistronic genes connected with nitrogen fixation [
55,
56].
Since their development in the early 21st century, next-generation sequencing technologies have significantly accelerated genome research, identification of gene polymorphisms, and phylogenetic analyses. Since then, RNA sequencing (RNA-seq) has also become an important and versatile tool for transcriptome assembly, quantification of gene expression, identification of splice variants and fusion genes, and analysis of differentially expressed genes (DEGs) [
57,
58,
59,
60]. The last of these, identifying gene expression changes between different experimental conditions or different cell populations, is the most popular application of RNA-seq and has been applied to a wide variety of questions, such as detecting genes connected with resistance to stress factors [
61,
62], genes regulating development [
63,
64], or the genes involved in a symbiotic relationship, such as BNF, where it is used for gene expression analysis in symbionts [
15,
65], transcriptome profiling of nodules [
66,
67], and detection of expression changes during nodule development [
68]. The downstream analyses of DEGs aim at functional characterization and annotation of DEGs or finding possible common patterns among them, including enrichment of certain biosynthetic pathways or Gene Ontology (GO) terms.
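The quantity at the heart of a DEG comparison between two groups can be illustrated with a minimal sketch. The function name, the pseudocount, and the expression values below are hypothetical and serve only as illustration; dedicated RNA-seq tools additionally model count dispersion and correct for multiple testing, which this sketch does not attempt.

```python
import math
from statistics import mean

def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean normalized expression (group_a over group_b).

    The pseudocount is an illustrative choice that avoids division by zero
    for genes with no reads in one group.
    """
    return math.log2((mean(group_a) + pseudocount) / (mean(group_b) + pseudocount))

# Hypothetical normalized expression of one gene in nodules of two groups
high_fixers = [31.0, 29.0, 33.0]
low_fixers = [7.0, 8.0, 6.0]

lfc = log2_fold_change(high_fixers, low_fixers)
print(f"log2 fold change (high vs. low): {lfc:.2f}")
```

A positive value indicates higher mean expression in the first group; in practice, a gene is called differentially expressed only when such a change is also statistically supported.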
The red clover genome has been de novo sequenced for the varieties
Tatra [
69] and
Milvus [
70]. In the context of BNF, Ištvánek et al. [
69] identified 542 potential NCR peptides and 11 leghaemoglobin genes, and De Vega et al. [
70] anchored 22,042 out of a total of 40,868 annotated genes to seven pseudomolecules and constructed a physical map enabling large-scale genomic and phylogenetic studies of traits having biological and agronomic importance. Several studies sequencing red clover transcriptomes have also been published that focus on the stress response [
71,
72], leaf senescence [
73], splice isoforms, fusion genes, and non-coding RNAs [
74,
75], and leaf variegation [
76]. Owing to the complexity of this trait, and even though red clover has a high level of BNF heritability [
77], phenotypic-level understanding of nitrogen fixation is insufficient. Trněný et al. [
13] identified candidate genes associated with BNF efficiency as well as polymorphisms associated with BNF and reflecting phenotype variability. Our knowledge of the genetic variation within BNF must be expanded on the level of gene expression and transcriptomic analysis.
The goals of our experiments, therefore, were to obtain red clover populations with different levels of nitrogen fixation and to perform differential gene expression analysis using RNA sequencing of root nodules from red clover genotypes with contrasting nitrogen fixation levels. The annotation of genes differentially expressed between genotypes with high and low nitrogen fixation efficiency was aimed at finding their functions and thereby connecting them with biosynthetic pathways associated with BNF. NCR peptides in nodule transcripts were identified and characterized to evaluate their connection to BNF efficiency, and evolutionary analyses were aimed at revealing the roles of different modes of gene duplication in BNF.
4. Discussion
The
Rhizobium–legume symbiosis has received much attention in recent decades because soil enrichment with nitrogen through BNF has environmental and ecological advantages over the use of synthetic nitrogen fertilizers. Expression of this phenotypic trait, however, depends on the interaction of two genomes (plant × bacteria) together with the influence of the environment. These conditions, combined with the involvement of hundreds of genes connected with nodulation and nitrogen fixation, impede research into BNF, the identification of genes with a major influence on BNF efficiency, and their utilization for agronomic purposes [
14,
35]. Therefore, the amount of fixed nitrogen acquired by current nitrogen-fixing plants is far below its potential. It has been estimated that the amounts of fixed nitrogen could be increased by as much as 300% through plant breeding and utilizing genotypes highly efficient in BNF [
112]. Moreover, BNF efficiency is a highly variable trait differing not only between species [
113] but also among individuals within a given species [
13].
Due to the estimated high broad-sense heritability of this trait in relatively stable field conditions (more than 0.8 in
G. max [
114] and 0.9 in inbred lines of
T. incarnatum [
115]), the potential for selecting highly effective BNF genotypes is high, although it has been reported that the efficiency of particular genotypes is greatly influenced by both environmental conditions (soil acidity, phosphorus availability) and the symbiotic partner [
116,
117,
118]. For this reason, we followed up on the conclusions of Trněný et al. [
13]; we evaluated the BNF efficiency in the next generation of the strong- and weak-fixing red clover genotypes analyzed in their publication. Because there is no consensus regarding the effect of ploidy on BNF [
119,
120], diploid and tetraploid red clover genotypes of different varieties were included equally in both contrasting groups (strong and weak fixers) to minimize the effect of ploidy on BNF efficiency, and all analyzed plants were planted and maintained under the same conditions to reduce the environmental effect.
Among several methods developed for assessing BNF [
79], the acetylene reduction assay (ARA) is one of the most widespread and is favored for its high sensitivity and high-throughput potential, especially for comparative purposes in manipulative experiments [
121]. Because many factors influence the measured BNF rate, such as temperature [
122], light [
123], ecosystem successional stage [
124] or seasonal/diurnal variations [
125,
126], ARA is less suitable for obtaining absolute values. As in our case, however, uniform measurement conditions at a specific time enable acquiring relative rates of BNF [
127] and thus ARA was a method well suited to our purposes.
RNA-seq and the subsequent differential gene expression analysis focused on discovering genes whose expression within nodules differed to a statistically significant extent between genotypes with high and low BNF efficiency, regardless of ploidy and red clover variety and while controlling for the effects of environmental conditions. Nodules served as the target tissue for evaluating nitrogen fixation. The expression profiles obtained reflected the involvement of numerous genes in processes such as legume–rhizobia interaction and nodule development, and almost 500 DEGs were identified from the RNA-seq data. Our results are the first to report an assessment of genes influencing the efficiency of BNF in red clover.
Because a number of genes were annotated not at all or only in part, annotation of DEGs across genotypes was a necessary step to find their functions and allow their connection to biosynthetic pathways. As red clover is not a genetic model plant, its first genome assembly was published only in 2014 [
69], nine years later than the draft genome sequences of legumes
M. truncatula and
L. japonicus [
128]. One year later, another assembly was published [
70] together with the construction of a physical map. Although we used both available annotations to decipher the functions of DEGs, this approach was not sufficient because about a quarter of the DEGs lacked any annotation, which hindered the disclosure of their functions. Thus, we attempted to improve annotation using recently published annotation files of closely related species. This approach allowed at least one functional annotation category to be assigned to each of more than 90% of DEGs. Even improved annotation, however, is not sufficient to identify the functions of many genes detected as differentially expressed, and limited assignment to some functional annotation category may merely suggest rather than reveal a possible function.
In our analyses, the largest number of DEG-encoded enzymes was associated with sesquiterpenoid and triterpenoid biosynthesis. Terpenoids constitute a highly diverse and widely distributed group of secondary metabolites in plants, playing various roles in plant defense, determination of membrane fluidity, and plant growth [
129,
130,
131]. In the context of BNF and nodulation, it has been demonstrated that terpenoids are able to induce the expression of Nod factors or genes involved in the Nod signaling pathway [
132]. Moreover, strigolactones, a group of terpenoid lactones acting as hormones, exhibit various roles in root growth and formation of root nodules in legumes [
133,
134], and strigolactone genes influence nodulation by inducing the expression of Nod factors of rhizobial bacteria [
135]. Among other enriched pathways, several “sugar-related” pathways were found: pentose and glucuronate interconversions (PGI), starch and sucrose metabolism (SSM), galactose metabolism (GM), and amino sugar and nucleotide sugar metabolism (ASNSM). Akbar et al. reported the activation of PGI and SSM pathways during salt stress in cotton [
136], and those authors hypothesized that the modification of these pathways could lead to significant tolerance to the salt stress. Similarly, shifting concentrations of metabolites within the PGI pathway were found during a study of stress response and host defense against plant herbivores [
137]. GM and ASNSM pathways are well studied in fungal pathogens and pathogen–plant interactions because the metabolites of these pathways serve as components of fungal and/or plant cell walls or as virulence factors [
138,
139,
140]. Among enriched pathways was also phenylpropanoid biosynthesis. Metabolites of this pathway then enter into multiple other pathways, such as lignin and flavonoid biosynthesis, and contribute to the response to both biotic and abiotic stimuli. They are indicators of various stress factors and mediators of particular stress tolerance [
141]. They help plants to invade new habitats [
142], or, through phenylpropanoid-based polymers, they influence the stability and robustness of plants against mechanical or environmental factors such as drought [
143]. Flavonoids, secondary metabolites of one of the branches of the phenylpropanoid pathway, are known to have multiple roles during the processes of nodulation and nitrogen fixation. They act as signal molecules during the early phases of the rhizobia and plant interaction [
144] or serve as polar auxin transport inhibitors leading to nodule organogenesis [
145].
Taken together, some of the pathways enriched in DEG-encoded enzymes are directly connected with nodulation and BNF (terpenoids, flavonoids), whereas the metabolites of the others may influence BNF performance through several possible effects. For instance, metabolites of the enriched “sugar-related” pathways are reported to shift in concentration under various stress conditions, indicating that these compounds could be involved in stress response mechanisms. Although colonization by symbiotic rhizobia usually does not elicit plant defense mechanisms [
146,
147], a particular step during nodulation could trigger a defense response under some circumstances because the plant controls every aspect of the nodulation process. If any problem or defect arises, a defense response can occur and the plant can experience stress. The enrichment of stress-related pathways between genotypes with high and low fixing efficiency could therefore reflect nodulation that has not developed correctly, probably in weak-fixing genotypes, resulting in a plant stress response. Alternatively, some genotypes could have experienced stress (e.g., infection, mechanical damage) before they were analyzed, even though all plants were planted and maintained in the same way, and these stress stimuli could have affected nodulation and BNF efficiency.
Table 5 summarizes the top 10 enriched GO biological process (BP) terms; stress response is among the most enriched, which supports this hypothesis. Other enriched GO BP terms include several responses to stimulus, developmental processes, and interaction with different organisms, all of which relate to biological processes expected in the context of legume symbiosis and nodulation.
Leghaemoglobin genes were among the genes with the highest expression in the nodule transcriptome (
Table 6). The same finding has been reported in
M. truncatula, where genes for leghaemoglobin were also among the most strongly expressed genes in the nodule transcriptome [
66], and the two species also have similar numbers of leghaemoglobin genes [
69]. Leghaemoglobin proteins are necessary for the activity of the enzyme nitrogenase [
148]. Because nitrogenase is irreversibly inactivated by oxygen [
149], leghaemoglobins reduce free oxygen levels inside the bacteroids while allowing ATP production by transporting oxygen for respiratory processes on the bacteroid membrane [
150].
Because red clover has nodules of the indeterminate type, whose bacteroids are terminally differentiated, NCR peptides play an important role in nodule development, especially in bacteroid differentiation. Therefore, we sought to identify NCR peptides expressed in nodule transcripts and to evaluate their predicted functions using in silico approaches. Ištvánek et al. [
69] predicted 542 genes for NCR peptides during the first red clover assembly using tblastx searches against NCR peptides of
M. truncatula, and that number is comparable with those identified in this model legume [
151]. In contrast to this prediction, we identified only 33 genes within the nodule transcriptome that met the criteria set for genes encoding NCR peptides (structure, conserved cysteines, length). Only 33 out of 37,000 genes detected in nodules had a conserved structure with 4 or 6 cysteines and length <150 bp, and for only 4 of these 33 sequences was the function supported by in silico analyses assessing, for example, signal sequence, subcellular location, and BLAST searches. This difference between the numbers of predicted and detected NCR peptides arose mostly from the use of different methods. In the earlier prediction, BLAST searches were performed regardless of structure, length, or other aspects that were considered in identifying NCR peptides in this study. Moreover, not all similar genes are necessarily genuine NCR genes. They can be pseudogenes or can have different functions, such as encoding defensins rather than functioning in root nodule symbiosis. As a result, many genes predicted as NCR during red clover assembly lack the typical NCR structure with conserved cysteines. The resulting number of sequences found was considerably lower than that in
M. truncatula, but the abundance of NCR peptides among legume species has been reported to be highly variable [
34]. The high number of NCR peptides can be due to: (1) constrained rhizobial growth in nodules, (2) selection against cheaters, (3) control of bacteroid development and metabolism, or (4) a combination of these points. Lower numbers of NCR peptides have been identified in several other legumes, such as 63 in chickpea (
C. arietinum) [
152] and 7 in
Glycyrrhiza uralensis [
32].
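The structural screening criteria described above (conserved cysteines, length) can be sketched as a simple filter. This is an illustrative sketch only and not the pipeline used in this study: the function name, the example sequences, and the 50-residue cutoff (reading 150 bp as roughly 50 codons) are assumptions, and the sketch counts cysteines without checking their spacing, signal sequence, or subcellular location.

```python
def looks_like_ncr(peptide, allowed_cysteines=(4, 6), max_length=50):
    """Crude structural screen for a candidate NCR peptide.

    Keeps sequences shorter than max_length residues (an assumed cutoff)
    that contain exactly 4 or 6 cysteines. Cysteine spacing is not checked.
    """
    seq = peptide.strip().upper()
    return 0 < len(seq) < max_length and seq.count("C") in allowed_cysteines

# Made-up peptide sequences, for illustration only
candidates = {
    "pepA": "MKTCAAGDCPKGSCLLNKCRVA",  # 22 residues, 4 cysteines
    "pepB": "MKTCAAGDSPKGSALLNKAR",    # only 1 cysteine
    "pepC": "C" * 4 + "A" * 60,        # 4 cysteines but too long
}

hits = sorted(name for name, seq in candidates.items() if looks_like_ncr(seq))
print(hits)
```

A screen of this kind is deliberately stricter than a similarity search alone, which is one reason why the two approaches can yield very different candidate counts.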
Gene duplication is considered to be one of the most important evolutionary mechanisms generating plentiful raw materials for processes such as speciation or neofunctionalization [
153]. Gene duplication occurs through several mechanisms, to varying degrees, including single-gene duplication and whole-genome duplication (WGD). Single-gene duplication comprises four types: tandem (TD), proximal (PD), transposed (RD), and dispersed duplication (DSD) [
106]. In the context of BNF, WGD has been extensively studied in connection with an ancient polyploidy event that occurred in a Papilionoideae lineage of legumes approximately 58 Ma ago [
151]. Although it is generally supposed that this event did not precede BNF, it might have facilitated and refined the BNF system using genetic materials provided by this polyploidy event [
154]. Here, we classified DEGs into five groups according to duplication mode. We inspected the distribution of each mode among the DEGs and then compared it with the distribution across all genes detected in nodules. While we observed no statistically significant difference in the representation of WGD, PD, and RD duplicates, TD and DSD duplicates were significantly overrepresented in DEGs and non-duplicated genes were significantly underrepresented in DEGs. The results showed a non-random distribution of duplication modes in DEGs and a preferential representation of duplicated genes among those connected with BNF efficiency. According to Qiao et al. [
106], TD together with PD showed no significant decrease in frequency over time, thus indicating that these modes of duplication offer a continuous supply of genetic material for evolution and for adaptation to rapidly changing environments [
155]. Dispersed duplicates are among the most prevalent duplication modes in genomes across different plant species [
156]. Expression divergence analysis showed that about 75–80% of duplicated gene pairs diverged from each other in all the duplication modes analyzed, but why only TDs and DSDs are overrepresented in DEGs remains unknown.
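Over-representation of a duplication mode among DEGs relative to all nodule-expressed genes can be assessed, for example, with a one-sided hypergeometric test. The sketch below assumes this test for illustration; it is not necessarily the statistic used in this study, and the function name and counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k) for a hypergeometric draw without replacement.

    N: all genes detected in nodules (background)
    K: background genes belonging to one duplication mode
    n: number of DEGs
    k: DEGs belonging to that duplication mode
    """
    upper = min(K, n)
    favourable = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return favourable / comb(N, n)

# Hypothetical toy counts: 20 genes, 5 tandem duplicates, 4 DEGs, 3 of them tandem
p_over = hypergeom_upper_tail(k=3, K=5, n=4, N=20)
print(f"P(X >= 3) = {p_over:.4f}")
```

Under-representation, as observed here for non-duplicated genes, would be tested with the complementary lower tail, and testing several modes at once calls for a multiple-testing correction.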
Weighted gene co-expression network analysis (WGCNA) complements DEG analysis and enables other transcripts to be organized as well. WGCNA classifies genes according to their expression profiles. Genes with similar expression patterns may form clusters (modules) [
157]. Transcripts in one module have a similar transcription pattern across all RNA-seq samples. The Blue, Red, and Salmon modules are negatively correlated with nitrogen fixation, and the Greenyellow, Turquoise, and Purple modules are positively correlated. Of 491 DEGs, 51% belong to the Blue module and 15%, 14%, 8%, and 7%, respectively, to the Turquoise, Brown, Red, and Yellow modules. The remaining DEGs are spread across other modules or were filtered out prior to WGCNA. Among putative NCR genes, 3 (gene18074, gene23764, and gene38999) of 8 such genes were part of the Blue module and 1 (gene33781) was part of the Turquoise module. The other 4 identified NCR genes did not fulfill the WGCNA filter criteria for minimal expression level and expression variance across samples.
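The sign of a module–trait relationship can be illustrated with a simplified stand-in for the WGCNA procedure. WGCNA itself summarizes each module by its eigengene (first principal component); the sketch below instead averages z-scored gene profiles, which is an assumption made to keep the example dependency-free. All function names and values are hypothetical.

```python
from statistics import mean, pstdev

def zscore(values):
    """Standardize a profile; assumes the values are not all identical."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def pearson(x, y):
    """Pearson correlation computed as the mean product of z-scores."""
    return mean([a * b for a, b in zip(zscore(x), zscore(y))])

def module_profile(expression_rows):
    """Average the z-scored profiles of all genes in a module, per sample."""
    standardized = [zscore(row) for row in expression_rows]
    return [mean(column) for column in zip(*standardized)]

# Hypothetical data: four samples with increasing fixation rates
fixation = [10.0, 20.0, 30.0, 40.0]
module_up = [[1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]]    # rises with fixation
module_down = [[4.0, 3.0, 2.0, 1.0], [9.0, 7.0, 5.0, 3.0]]  # falls with fixation

r_up = pearson(module_profile(module_up), fixation)
r_down = pearson(module_profile(module_down), fixation)
print(f"r_up = {r_up:.2f}, r_down = {r_down:.2f}")
```

In this toy case the first module is perfectly positively and the second perfectly negatively correlated with the trait; real modules show intermediate values whose significance must be tested.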
An interesting question is where the known core genes of the root symbiotic nitrogen fixation process appear. To answer this, we took a list of 19 core predisposition genes collected in other studies of closely related species, in particular
M. truncatula [
158]. We identified their red clover orthologues and then located them in the WGCNA network and in the DEG list (
Supplementary Table S6). Fifteen core genes are captured in the most common Turquoise module, 2 core genes in the Brown module, and 1 each in the Blue, Green, Greenyellow, and Yellow modules.
Interestingly, no core gene was identified among the DEGs, indicating that the difference in nitrogen fixation phenotype arises not at the level of symbiosis establishment and symbiotic structure formation but rather at the level of fixation regulation. This is consistent with the fact that we compared lower with higher fixation levels rather than zero with non-zero fixation.
Nitrogen fixation through root nodule symbiosis is an essential process by which diazotrophic organisms make otherwise unavailable nitrogen available for their life needs and, through themselves, make it available to other living organisms. The phenomenon of symbiotic nitrogen fixation has evolved multiple times independently in one evolutionary branch of angiosperms that has been termed the “Nitrogen-fixing clade”. We can assume that, prior to the actual development of the ability to fix nitrogen, plants of this clade must have been predisposed through a support mechanism already in place [
158,
159]. It is probable that a broad and very complex transcriptomic background allowed nitrogen fixation to evolve while enabling the preservation of transcriptomic diversity in fixing nodules.
In red clover, an important non-model plant and forage crop, we found 491 differentially expressed genes connected with BNF efficiency. Subsequent annotation of genes in the nodule transcriptome revealed more than 800 genes not yet experimentally confirmed. We were able to confirm only four nodule-specific cysteine-rich (NCR) peptides in the nodule transcriptome. In addition, we found an unequal distribution of gene duplication modes in DEGs, with genes originating from tandem and dispersed duplication being significantly overrepresented. Finally, using WGCNA we organized the expression profiles of the transcripts into 16 modules linked to the analyzed traits, such as nitrogen fixation efficiency, or specific to particular samples. Nodule transcriptomics is a rewarding topic. A series of transcriptomic studies have revealed transcripts associated with the root nodule symbiotic process [
15,
160,
161,
162,
163,
164,
165,
166,
167,
168,
169,
170,
171,
172,
173]. The DEGs identified in this study and their analyses allowed a comparison of nodule transcriptomes among genotypes with different BNF efficiency and provide a valuable resource for further investigation of the genetic basis of this trait.