Evolution of RAD- and DIV-Like Genes in Plants

Developmental genetic studies of Antirrhinum majus demonstrated that two transcription factors from the MYB gene family, RADIALIS (RAD) and DIVARICATA (DIV), interact antagonistically to regulate floral dorsoventral asymmetry. Interestingly, a similar antagonistic interaction between the proteins FSM1 (RAD-like) and MYBI (DIV-like) in Solanum lycopersicum is involved in fruit development. Here, we report the reconstruction of the phylogenies of the I-box-like and R-R-type clades, to which RAD- and DIV-like genes belong, respectively. We also examined the homology of these antagonistic MYB proteins using these phylogenies. The results show that there are likely three paralogs of RAD-/I-box-like genes, RAD1, RAD2, and RAD3, which originated in the common ancestor of the core eudicots. In contrast, R-R-type sequences fall into two major clades, RR1 and RR2, the result of a gene duplication in the common ancestor of monocots and dicots. RR1 is divided into clades RR1A, RR1B, and RR1C, while RR2 is divided into clades RR2A/DIV1, RR2B/DIV2, and RR2C/DIV3. We demonstrate that, in the similar antagonistic interactions in An. majus and So. lycopersicum, the RAD-like genes originate from the RAD2 clade, while the DIV-like genes originate from distantly related paralogs of the R-R-type lineage. The phylogenetic analyses of these two MYB clades lay the foundation for future comparative studies, including tests of the evolution of the antagonistic relationship between these proteins.


Introduction
The MYB gene family comprises three members, A-, B-, and c-MYB [1,2], found in many vertebrates, that are involved in the regulation of cell proliferation, differentiation, and apoptosis [3]. Homologs of MYB genes have also been identified in insects, fungi, and slime molds [4]. The first plant MYB gene, C1, was isolated from Zea mays, and it encodes a c-MYB-like transcription factor involved in anthocyanin biosynthesis [5]. Plant MYB proteins were found to be involved in the regulation of many processes, including the biosynthesis of anthocyanins and flavonoids, trichome differentiation, the determination of cell shape, and the regulation of cell proliferation and the cell cycle [5-9].
In plants, MYB genes have also been found to regulate the development of floral symmetry in the Lamiales [10]. In the zygomorphic flowers of Antirrhinum majus L., the two dorsal petals are significantly enlarged compared to the lateral and ventral petals, and the single dorsal stamen is aborted [11]. Two genes, CYCLOIDEA (CYC) and DICHOTOMA (DICH), belonging to the CYC/TB1 clade of the TCP transcription factor family, were found to promote the dorsal identity of zygomorphic flowers [11-14]. RADIALIS (RAD), a member of the MYB gene family, was found to be the downstream target of CYC and DICH [15-17]. Double cyc/dich mutants and single rad mutants produce flowers that have entirely or partially lost their dorsal identity [11,16]. The dorsal petals assume the ventral petal identity and the aborted dorsal stamen becomes functional [11]. DIVARICATA (DIV),

RAD-Like Genes from Solanaceae
Sixteen sequences of RAD-like genes were discovered in this study (GenBank numbers MF398572-MF398587) (Table 1). We show that our cloning method can recover all of the RAD2 paralogs identified from the genome data of P. hybrida and So. lycopersicum (Table 1).
A phylogeny of RAD-like genes was constructed based on 53 sequences from four species of Arabidopsis (A. thaliana, A. halleri, A. lyrata, and A. salsuginea), six species of Solanum (So. melongena, So. pennellii, So. lycopersicum, So. pimpinellifolium, So. peruvianum, and So. tuberosum), and Oryza sativa (Figure 1). The phylogeny indicated that sequences from O. sativa form a monophyletic clade. However, the phylogenetic relationships among the three previously identified RAD1, RAD2, and RAD3 clades [21] were not fully resolved. The RAD2 clade is likely monophyletic, while the RAD1 and RAD3 clades are not (Figure 1, also see below). The RAD2 clade consists of Arabidopsis thaliana RL1 and RL2 and sequences from species of Solanum, the latter of which were further divided into two Solanum-specific clades, RAD2A and RAD2B. The FSM1 of So. lycopersicum was placed in the RAD2A clade. It is unclear, however, how the other sequences of Solanum should be placed relative to the RAD1 clade, represented by Arabidopsis thaliana RL3 and RL4, and the RAD3 clade, represented by Arabidopsis thaliana RL5 and RL6 (Figure 1) [21].
Another phylogeny of RAD-like genes was reconstructed based on 274 CDSs, including 258 from BLAST results and 16 from this study (Figure 2, Table S1). Sequences from all eight monocot species, representing seven families, form a monophyletic clade that was used to root the phylogeny. RAD2 forms a monophyletic clade, while neither RAD1 nor RAD3 was fully resolved (Figures 1 and 2). RAD2 comprises representatives from eleven orders: Vitales, Rosales, Malvales, Fabales, Cucurbitales, Sapindales, Malpighiales, Brassicales, Solanales, Lamiales, and Dipsacales (Figure 2 and Table S1). Most of the solanaceous and convolvulaceous RAD-like sequences fell into the RAD2 clade, which is further divided into two clades, RAD2A and RAD2B (Figures 1 and 2; Figure S1). The unrooted phylogeny including only RAD2 of Solanaceae and Convolvulaceae further indicates that the two paralogs likely arose no later than in the common ancestor of the two families (Figure S1). Further gene duplications and gene losses likely also occurred, which led to Nicotiana and Petunia having additional paralogs in RAD2A (Figure 2, Figure S1). RAD2 sequences from the two species of Schizanthus, the earliest-branching clade of Solanaceae [23], are more closely related to the sequences from Convolvulaceae, which might be due to the limited sampling. The FSM1 of So. lycopersicum, expressed in fruit, is grouped in the RAD2A clade, while the RAD of An. majus is also in the RAD2 clade.
An R-R-type gene phylogeny was first reconstructed based on 52 CDSs from O. sativa japonica, A. thaliana, and five species of Solanum (So. melongena, So. lycopersicum, So. pennellii, So. peruvianum, and So. tuberosum) (Figure 3). All sequences fell into two clades, RR1 and RR2/DIV. The RR2/DIV clade represented the DIV clade identified by Howarth and Donoghue [22]. Each of these two clades contained sequences from O. sativa, A. thaliana, and Solanum.

Figure 1. [...] [21], only the RAD2 clade was monophyletic and contained sequences from Arabidopsis and Solanum. There are two paralogs in the RAD2 clade, i.e., RAD2A and RAD2B, which resulted from a gene duplication that did not involve Arabidopsis. RAD1 and RAD3 are paraphyletic. Bayesian posterior probabilities and bootstrap frequencies ≥40% are depicted close to the branches, respectively.

Figure 2. Phylogeny of I-box-binding/RADIALIS-like genes based on Bayesian and ML inferences. 274 CDSs of I-box-binding/RADIALIS-like genes from both monocots and dicots were analyzed. All sequences from monocots formed a monophyletic group that was used to root the phylogeny. RAD2 formed a monophyletic clade. At least one gene duplication was identified in the common ancestor of Solanaceae and Convolvulaceae. RAD1 and RAD3 clades are paraphyletic. Bayesian posterior probabilities and bootstrap frequencies ≥40% are depicted close to the branches, respectively.

Figure 3. Phylogeny of R-R-type genes of five species of Solanum, Arabidopsis thaliana, and Oryza sativa based on Bayesian and ML inferences. Two major clades, RR1 and RR2, were identified, each of which includes sequences from Arabidopsis, Oryza, and Solanum. Bayesian posterior probabilities and bootstrap frequencies ≥40% are depicted close to the branches, respectively.
For Arabidopsis, At5g04760 was placed in the RR1A clade, At5g08520 and At5g23650 in the RR1B clade, and At1g49010 in the RR1C clade. For the RR2/DIV clade, the previously identified DIV2 and DIV3 clades were monophyletic [22]. The sequences of A. thaliana At2g38090, At5g01200, and At5g58900 belonged to DIV1, while At3g11280 and At5g05790 belonged to DIV2. Arabidopsis lacked the DIV3 copy based on previous work [22]. The MYBI of So. lycopersicum, expressed in fruit, was grouped in the RR1A subclade of the RR1 clade, while the DIV of An. majus was likely in RR2A/DIV1 of the RR2 clade.

Figure 4. Phylogeny of R-R-type genes based on Bayesian and ML inferences. Two hundred and ninety-eight CDSs of R-R-type genes from both monocots and dicots were analyzed. They formed two major clades, RR1 and RR2/DIV, each of which contained sequences from monocots and dicots. The RR1 clade was further divided into three groups, RR1A, RR1B, and RR1C. Of the three RR2/DIV clades identified by Howarth and Donoghue [22], only DIV2 and DIV3 are monophyletic. The Arabidopsis sequences include AT2G38090, AT5G01200, and AT5G58900, identified as DIV1, which is not a clade in this phylogeny. Bayesian posterior probabilities and bootstrap frequencies ≥40% are depicted close to the branches, respectively.

Testing the Tree Topology for R-R-Type Genes
We further examined whether each of the two clades of R-R-type genes, RR1 (including RR1A, RR1B, and RR1C) and RR2 (including RR2A, RR2B, and RR2C), is monophyletic. Our results indicate that tree topology number one, in which the subclades RR1A, RR1B, and RR1C form a monophyletic RR1 clade and the subclades RR2A, RR2B, and RR2C form a monophyletic RR2 clade, is the most likely phylogeny [p > 0.5; Kishino-Hasegawa test (KH) = 0.794, Shimodaira-Hasegawa test (SH) = 0.997, and Approximately Unbiased test (AU) = 0.872] (Figure S2, Table 2). All the other tree topologies were rejected, except for tree topology number nine, which has the RR2A subclade grouped within the RR1 clade. However, tree topology number nine is not strongly supported (0.2 < p < 0.5; KH = 0.206, SH = 0.343, and AU = 0.213) compared to tree topology number one (Figure S2, Table 2).

Motif Analyses
We also analyzed the nucleotide sequences that cover the diverse lineages of the I-box-like and R-R-type genes from A. thaliana and O. sativa, together with the representatives from An. majus and So. lycopersicum, to identify protein motifs. Our results largely agree with the study of Chen et al. [24], which analyzed the motifs of the I-box-like, R-R-type, and CCA1-like MYB genes. We found that the I-box-like genes have only one motif, while the R-R-type genes have two motifs, i.e., R-R (A) and R-R (B). R-R (A) is located at the N-terminus of the R-R-type proteins and shows high similarity to the motif of the I-box-like genes (Figure 5). R-R (B) is located at the C-terminus of the R-R-type proteins, and its amino acid sequence is distinct from both R-R (A) and the single motif of the I-box-like genes.
On the other hand, the results of our motif analyses also differ from those of Chen et al. [24]. For I-box-like genes, our analyses identified a single motif that is 33 amino acids in length. In contrast, the same motif based on Chen et al. [24] contains 56 amino acids, with eight and 15 extra amino acids at the N- and C-termini, respectively. For R-R-type genes, our results indicate that R-R (A) is 21 amino acids in length, while the results of Chen et al. [24] include 59 amino acids for the same motif, with eight and 30 extra amino acids at the N- and C-termini, respectively. Our results suggest that R-R (B) is 50 amino acids in length. However, the same motif based on the work of Chen et al. [24] has 53 amino acids, with five extra amino acids at the N-terminus but two fewer amino acids at the C-terminus. One possible reason for the discrepancy between the two studies is that the Multiple EM for Motif Elicitation (MEME) method applied in our study considers only contiguous amino acid sequences for a motif and allows no gaps in the sequences.
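The length comparison with Chen et al. [24] can be checked with simple arithmetic. The following Python sketch recomputes each motif length reported by Chen et al. [24] from the length found in this study plus the residues gained or lost at each terminus. The numbers are taken from the text; the dictionary layout and the helper names `expected_length` and `check_all` are illustrative assumptions only.

```python
# Motif lengths in amino acids, as stated in the text.
# Each entry: (length in this study,
#              N-terminal difference, C-terminal difference,
#              length reported by Chen et al. [24]).
# A negative difference means the motif of Chen et al. is shorter at that end.
MOTIFS = {
    "I-box-like": (33, 8, 15, 56),
    "R-R (A)": (21, 8, 30, 59),
    "R-R (B)": (50, 5, -2, 53),
}


def expected_length(own_length, n_diff, c_diff):
    """Length implied by adding the terminal differences to our motif length."""
    return own_length + n_diff + c_diff


def check_all(motifs=MOTIFS):
    """Return, per motif, whether the implied length matches the reported one."""
    return {
        name: expected_length(own, n_diff, c_diff) == reported
        for name, (own, n_diff, c_diff, reported) in motifs.items()
    }


if __name__ == "__main__":
    for name, consistent in check_all().items():
        print(name, "consistent" if consistent else "inconsistent")
```

All three motifs are internally consistent under this reading of the terminal differences.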

Phylogenetic Positions of RAD- and DIV-Like Genes in the Plant MYB Lineage
MYB proteins contain a conserved MYB domain, which usually comprises one to three imperfect repeats, namely R1, R2, and R3 [4,24]. Each of these repeats comprises about 52 amino acid residues that form a helix-turn-helix structure involved in DNA binding [4,25]. MYB genes have been found in all eukaryotes [4,26].
Phylogenetic analysis indicates that the MYB genes of plants, which are sister to all animal MYB genes, form a monophyletic clade [25]. MYB genes in plants are structurally and functionally more variable than MYB genes in vertebrates [25,27]. Based on the MYB domain structures, the MYB proteins of plants can be classified into three major groups: R1R2R3-MYB with three adjacent repeats, R2R3-MYB with two adjacent repeats, and MYB-related proteins, a heterogeneous group often containing a single MYB repeat [7,24,25,28-30]. The R2R3-MYB group is thought to be derived from the R1R2R3-MYB group, which occurs in all major lineages of land plants [27]. Based on the phylogenetic analysis and the protein domain structure, MYB-related proteins were further divided into five subfamilies: CCA1-like, CPC-like, TBP-like, I-box-binding-like (abbreviated I-box-like), and R-R-type [24,30]. Based on Chen et al. [24], A. thaliana has five I-box-like genes, i.e., At1g75250, At1g19510, At2g21650, At4g39250, and At4g36570, and nine R-R-type genes, i.e., At1g49010, At2g38090, At3g11280, At5g01200, At5g05790, At5g08520, At5g58900, At5g23650, and At5g04760. Boyden, Donoghue, and Howarth [21] indicated that RAD-like genes belong to the I-box-like clade. Our analyses further indicate that the I-box-like lineage is synonymous with RAD-like genes (Figures 1 and 2). Furthermore, Howarth and Donoghue [22] focused on the evolution of DIV-like genes in core eudicots, especially in Dipsacales, and indicated that the DIV-like genes belong to an R-R-type lineage. Our analysis of R-R-type genes showed that a gene duplication occurred no later than in the common ancestor of dicots and monocots, giving rise to two paralogs, the RR1 and RR2 clades (Figures 3 and 4), of which the RR2 clade is synonymous with the DIV-like lineage [22].

Evolution of the I-Box-Like Subfamily
Boyden, Donoghue, and Howarth [21] indicated that RAD-like genes consist of three major clades, RAD1, RAD2, and RAD3, which were speculated to result from genome duplications associated with the origin of core eudicots. The RAD1 clade has Arabidopsis AT4G36570 and DQ395345 of Clade I, defined in Reference [24], and RAD2 and RAD3 have the Arabidopsis sequences from Clade III (AT2G21650 and AT4G39250 belong to RAD2, and AT1G19510 and AT1G75250 belong to RAD3). Our analysis recognized RAD2 as a monophyletic clade (Figures 1 and 2). Furthermore, there are two RAD2 paralogs involving Solanaceae and Convolvulaceae, RAD2A and RAD2B, which likely resulted from a gene duplication no later than in the common ancestor of these two plant families. On the other hand, the RAD1 and RAD3 clades were not fully resolved based on our analyses. Our phylogenetic analyses indicated that the RAD of An. majus belongs to the RAD2 clade, while FSM1 is placed in the RAD2A clade, suggesting that RAD and FSM1 belong to the same orthologous lineage.

Evolution of the R-R-Type Subfamily
The R-R-type genes have two imperfect repeats of the MYB domain, namely R-R (A) and R-R (B) [24]. The N-terminal MYB repeat R-R (A) was found to be closely related to the MYB repeats of the I-box-like genes, and the C-terminal MYB repeat R-R (B) was closely related to those of certain CCA1-like genes based on the positions of the introns and shared motifs [24]. The phylogeny of R-R-type genes based on nine sequences of A. thaliana and seven of O. sativa japonica suggested several gene duplications in the common ancestor of the monocots and dicots, but the phylogenetic relationships of the predicted paralogs were unresolved in that study [24]. The work by Howarth and Donoghue [22] focused on the evolution of DIV-like genes in core eudicots, especially in Dipsacales, and showed duplications giving rise to three DIV-like clades in the core eudicots, DIV1, DIV2, and DIV3. Our BLAST and phylogenetic analyses indicated that most of the sequences named DIV-like genes belong to the R-R-type subfamily, while most of the sequences named MYB1R1-like genes belong to the CCA1-like gene family (Table S2). Each of the two R-R-type subclades, RR1 and RR2, was further divided into three paralogs, which likely resulted from genome duplication in the common ancestor of core eudicots [22]. RR1 consists of RR1A, RR1B, and RR1C, while RR2/DIV is composed of RR2A/DIV1, RR2B/DIV2, and RR2C/DIV3 (Figures 3 and 4) [22]. We found that the DIV of An. majus belongs to the DIV1 subclade of the RR2/DIV clade [22], while the MYBI of tomato belongs to the RR1A subclade of the RR1 clade.

Evolution of the Antagonism among RAD-DRIF-DIV and FSM1-FSB1-MYBI in An. majus and So. lycopersicum, Respectively
An analysis of amino acid sequences suggested that the two MYB domains of DIV have different functions: the C-terminal domain is similar to those of known DNA-binding MYB proteins, while the N-terminal domain is associated with protein-protein interactions (Figure 5) [19,31]. In contrast, RAD has a single MYB domain that is predicted to act through a mechanism involving protein-protein interactions (Figure 5) [16]. As members of MYB-related subfamilies, I-box-like and R-R-type genes were previously placed in the same clade by Riechmann and Ratcliffe [30], which suggested that they might be closely related paralogs. One hypothesis proposed for the evolution of these two MYB-related subfamilies is that I-box-like genes evolved through the loss of the MYB domain at the C-terminal end [24,32]. RAD-DRIF-DIV and FSM1-FSB1-MYBI therefore represent the recruitment of homologous genes from similar MYB lineages in the development of floral zygomorphy in An. majus and the development of fruit in So. lycopersicum [10].
In summary, the I-box-like and R-R-type lineages have experienced extensive gene duplication that predated the diversification of the core eudicots. Our work further clarifies the evolution of these two MYB subfamilies, which will facilitate future functional studies of the paralogs of the I-box-like and R-R-type genes that may have been involved in the evolution of molecular antagonism.

Cloning RAD-Like Genes from Species of Solanaceae and Convolvulaceae
Primers incorporating degenerate sites, designed from the alignment of RAD-like sequences, especially those of the RAD2 clade from Solanaceae and Lamiales, were used for amplifying the genes from species of Solanaceae and representatives of Convolvulaceae. The locations of our primers followed the study by Boyden, Donoghue, and Howarth [21]. These primers, i.e., forward primer 5′-AACAAGGCITTTGARARGGCWTYRGC-3′ and reverse primer 5′-GGRAARGGBAYIMYACCAIDITCAAT-3′, successfully amplified RAD-like genes from both the basal and derived clades of Solanaceae (Schizanthus pinnatus Ruiz & Pav, Schizanthus grahamii Gillies, Petunia sp., Nicotiana obtusifolia M. Martens & Galeotti, Solanum lycopersicum L., Lycium ruthenicum Murray, and Atropa belladonna L.) and species of Convolvulaceae (Evolvulus sp. and Ipomoea tricolor Cav.) (Table 1). PCR reactions were performed using GoTaq® G2 Hot Start Polymerase (Promega, Madison, WI, USA), as follows: 95 °C for 5 min, 95 °C for 45 s, 55 °C for 45 s, and 72 °C for 1 min and 30 s, repeated for 39 cycles, with a final step at 72 °C for 10 min. PCR products were then purified through gel extraction using the Wizard SV Gel and PCR Clean-Up System from Promega. The purified PCR products were used as a template for the second round of PCR following the same PCR program described above. The purified second-round PCR products were used in ligation and transformation with the pGEM-T Easy Vector System I from Promega. At least 50 clones were screened for each species. The sequences of the clones were determined using Sanger sequencing by GENEWIZ (115 Corporate Boulevard, South Plainfield, NJ, USA).
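To illustrate what the degenerate sites in the two primers imply, the following Python sketch counts how many distinct oligonucleotides each primer mix contains. It relies on the standard IUPAC ambiguity codes and on the assumption that "I" in the primer sequences denotes inosine, which is synthesized as a single base and therefore does not increase the number of oligonucleotide species. The function names are illustrative and are not part of the original protocol.

```python
from math import prod

# Standard IUPAC nucleotide codes mapped to the number of bases each represents.
IUPAC_CHOICES = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "S": 2, "W": 2, "K": 2, "M": 2,
    "B": 3, "D": 3, "H": 3, "V": 3,
    "N": 4,
}

# Assumption: "I" stands for inosine, a single base that pairs broadly,
# so it contributes a factor of one to the degeneracy.
INOSINE = "I"

FORWARD = "AACAAGGCITTTGARARGGCWTYRGC"
REVERSE = "GGRAARGGBAYIMYACCAIDITCAAT"


def degeneracy(primer):
    """Number of distinct oligonucleotides in the synthesized primer mix."""
    return prod(
        1 if base == INOSINE else IUPAC_CHOICES[base]
        for base in primer.upper()
    )


def inosine_count(primer):
    """Number of inosine positions in the primer."""
    return primer.upper().count(INOSINE)


if __name__ == "__main__":
    for name, seq in (("forward", FORWARD), ("reverse", REVERSE)):
        print(name, len(seq), "nt,", degeneracy(seq), "species,",
              inosine_count(seq), "inosine(s)")
```

Under these assumptions, both primers are 26 nt long, and the reverse primer mix is more degenerate than the forward one.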

Alignment and Phylogenetic Analyses
The DNA matrices of the coding sequences were aligned using Geneious version 7.1.9 (PO Box 5677, Wellesley St, Auckland 1010, New Zealand). The MUSCLE algorithm, which uses the protein sequence alignment to build the nucleotide sequence alignment, was applied. Each DNA matrix was analyzed using ML and Bayesian inferences, which were implemented in RAxML-HPC2 and MrBayes version 3.2.6 on XSEDE, respectively, at the CIPRES Science Gateway V. 3.3 [33-36]. For ML analyses, a random seed value for rapid ML bootstrapping was generated for each dataset. The GTRCAT model was chosen for the bootstrapping analysis based on the program recommendation, because GTRCAT has lower computational costs and memory consumption for the ML method [34]. The models used for the Bayesian analyses were estimated using jModelTest 2.1.10 [37,38]. The Akaike Information Criterion (AIC) [39] was used to determine the best-fit model for each DNA sequence matrix, i.e., the K80 (K2P) + G model for the I-box-like/RAD gene phylogeny including Arabidopsis, Solanum, and Oryza alone, the JC + G model for the large RAD phylogeny, the GTR + I + G model for the R-R-type gene phylogeny including Arabidopsis, Solanum, and Oryza alone, and the GTR + I + G model for the large R-R-type gene phylogeny. We used the Metropolis-coupled Markov chain Monte Carlo method as implemented in MrBayes to run four chains. We ran five million generations for each chain and sampled every 1000 generations, with a burn-in of the first 2000 trees.
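The Bayesian sampling settings can be translated into tree counts with a short calculation. The Python sketch below is only a bookkeeping aid under simple assumptions: it counts one sampled tree per sampling interval, ignores any tree recorded for the starting state, and treats the numbers as applying to a single run. The function name `mcmc_samples` is hypothetical.

```python
def mcmc_samples(generations, sample_every, burnin_trees):
    """Return (trees sampled, trees kept after burn-in, burn-in fraction).

    Assumes one tree is sampled every `sample_every` generations and that
    the tree for the starting state is not counted.
    """
    sampled = generations // sample_every
    kept = sampled - burnin_trees
    return sampled, kept, burnin_trees / sampled


if __name__ == "__main__":
    # Settings stated in the text: five million generations,
    # sampling every 1000 generations, burn-in of the first 2000 trees.
    sampled, kept, fraction = mcmc_samples(5_000_000, 1000, 2000)
    print(f"sampled={sampled}, kept={kept}, burn-in fraction={fraction:.0%}")
```

Under these assumptions, the burn-in discards 40% of the sampled trees.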

Phylogeny Assessment for R-R-Type Genes
We generated 14 tree topologies manually based on our Bayesian tree to test the alternative hypotheses. To produce these topologies, first the RR2A clade was set as monophyletic. Second, we collapsed the relationships among the subclades in the RR1 and RR2 clades. Finally, the subclades were moved around as indicated in Figure S2. The log-likelihoods for each tree topology were calculated using TREE-PUZZLE (ver. 5.3.rc16) [40] with the HKY model of evolution [41] and four rate categories for the discrete Gamma distribution. The log-likelihood information estimated for the 14 topologies by TREE-PUZZLE was then entered into CONSEL (ver. 0.20) [42] to generate the bootstrap replicates for each tested tree. The p-values of the KH [43], SH [44], and AU tests [45] were subsequently calculated based on the bootstrap samples [42]. The confidence of the 14 trees was then assessed by the p-values [42]. If the p-value estimated for a tree was <0.05, the topology was rejected; if the p-value was >0.5, the topology was preferred [45,46].
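The decision rule above can be sketched in Python using the p-values reported in the Results for tree topologies one and nine. How the three tests are combined is an assumption made here for illustration (a topology is rejected if any test gives p < 0.05 and preferred only if all tests give p > 0.5); the source does not state a combination rule, and the function name `classify` is hypothetical.

```python
# p-values reported in the Results for two of the 14 tested topologies.
P_VALUES = {
    1: {"KH": 0.794, "SH": 0.997, "AU": 0.872},
    9: {"KH": 0.206, "SH": 0.343, "AU": 0.213},
}


def classify(p_values, reject_below=0.05, prefer_above=0.5):
    """Classify a topology from its KH, SH, and AU p-values.

    Assumption: "rejected" if any test falls below `reject_below`,
    "preferred" if all tests exceed `prefer_above`, otherwise "not rejected".
    """
    values = list(p_values.values())
    if any(p < reject_below for p in values):
        return "rejected"
    if all(p > prefer_above for p in values):
        return "preferred"
    return "not rejected"


if __name__ == "__main__":
    for topology, p_values in P_VALUES.items():
        print(f"topology {topology}: {classify(p_values)}")
```

With this rule, topology one is preferred, while topology nine is neither rejected nor preferred, which matches the interpretation given in the Results.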
The nucleotide sequences of these CDSs were translated into amino acid sequences using Mesquite version 3.2 [47]. We used the MEME algorithm, which extends the Expectation Maximization (EM) algorithm for identifying motifs in unaligned amino acid sequences [48]. The MEME algorithm is designed to discover novel, ungapped motifs in a set of homologous sequences [48]. To use this function, we uploaded and analyzed the I-box-like and R-R-type amino acid sequences at http://meme-suite.org/index.html [49]. For the MEME options, we set the number of motifs to be found to three, and each motif was allowed to occur only once in each tested sequence.