Mining of Cloned Disease Resistance Gene Homologs (CDRHs) in Brassica Species and Arabidopsis thaliana

Simple Summary: Developing cultivars with resistance genes (R genes) is an effective strategy to support high yield and quality in Brassica crops. The availability of cloned R gene and genomic sequences in Brassica species and Arabidopsis thaliana provides the opportunity to compare genomic regions and survey R genes across genomic databases. In this paper, we aim to identify genes related to cloned genes through sequence identity, providing a repertoire of species-wide related R genes in Brassica crops. The comprehensive list of candidate R genes can be used as a reference for functional analysis.

Abstract: Various diseases severely affect Brassica crops, leading to significant global yield losses and a reduction in crop quality. In this study, we used the complete protein sequences of 49 cloned resistance genes (R genes) that confer resistance to fungal and bacterial diseases known to impact species in the Brassicaceae family. Homology searches were carried out across Brassica napus, B. rapa, B. oleracea, B. nigra, B. juncea, B. carinata and Arabidopsis thaliana genomes. In total, 660 cloned disease R gene homologs (CDRHs) were identified across the seven species, including 431 resistance gene analogs (RGAs) (248 nucleotide binding site-leucine rich repeats (NLRs), 150 receptor-like protein kinases (RLKs) and 33 receptor-like proteins (RLPs)) and 229 non-RGAs. Based on the position and distribution of specific homologs in each of the species, we observed a total of 87 CDRH clusters composed of 36 NLR, 16 RLK and 3 RLP homogeneous clusters and 32 heterogeneous clusters. The CDRHs detected consistently across the seven species are candidates that can be investigated for broad-spectrum resistance, potentially providing resistance to multiple pathogens. The R genes identified in this study provide a novel resource for the future functional analysis and gene cloning of Brassicaceae R genes towards crop improvement.

The large demand and intensified cultivation of Brassica crops have made them vulnerable to abiotic and biotic stresses, particularly to diseases. While the most common control methods for managing pathogens are specific cultural practices and chemical application, the deployment of disease-resistant crops is more environmentally friendly and cost-effective. Brassica crops have two types of disease resistance: qualitative and quantitative. While quantitative resistance relies on several minor genes with partial resistance expressed at the later crop stages, qualitative resistance is governed by major genes or resistance genes (R genes), largely expressed from the early crop stages through to maturity. Both resistance types are useful; however, qualitative resistance is widely utilised in Brassica cultivar development because its effect is readily apparent and can be identified at the cotyledon stage. For instance, a set of differential blackleg isolates containing avirulence (Avr) genes is used to screen R genes in B. napus lines via the assessment of a hypersensitive response (HR) observed in the cotyledons [18,19].
Resistance gene analogs (RGAs) play an important role in host resistance [20] and are generally categorized into three main classes: nucleotide-binding site-leucine-rich repeats (NLRs), receptor-like protein kinases (RLKs), and receptor-like proteins (RLPs). The NLR family, which is the most common class of RGAs, carries cytoplasmic receptors for recognising specific pathogens and is involved in effector-triggered plant immunity (ETI) [21][22][23][24]. RLKs and RLPs are involved in pattern-triggered immunity (PTI), which relies on pattern recognition receptors (PRRs) to elicit the first line of defence by recognising pathogen elicitors [25,26].
Examining gene homology among plant species is important for inferring the possible functions of a gene. Several studies have exploited gene homology for crop improvement. For instance, the homolog of an A. thaliana R gene, At_NDR1, was cloned and functionally characterised in Coffea arabica, conferring R-gene-mediated resistance to coffee leaf rust caused by Hemileia vastatrix [27]. Further, homologs of the Hordeum vulgare Mla gene, TmMla in Triticum monococcum, Sr33 in Aegilops tauschii, and Sr50 in Secale cereale, were introgressed in T. aestivum, providing disease resistance [28][29][30].
Here, we used the sequences of 49 cloned R genes with a confirmed function against fungal and bacterial diseases to identify cloned disease resistance gene homologs (CDRHs) across six Brassica crops and A. thaliana. The evolutionary events including the loss, retention, and diversification of RGA domains in the CDRHs were also investigated. The outcome of this study could facilitate the identification and cloning of functional RGAs and their application in Brassica breeding programs towards disease resistance improvement.

Collection of Gene and Genomic Data
A comprehensive search was conducted to identify cloned R genes that provide qualitative resistance to fungal and bacterial diseases in all six Brassica species and Arabidopsis. A total of 49 cloned R genes were identified and included in this study (42 in Arabidopsis and 7 in the Brassica species) based on the following 3 criteria: (1) has a known gene-for-gene interaction with a corresponding pathogen Avr gene or (2) confers resistance in the form of a HR, indicating that it is involved in a gene-for-gene interaction or (3) acts as a helper or accessory gene necessary for the gene-for-gene interaction (Table 1). The complete protein (amino acid, aa) sequence of each gene was extracted from the UniProtKB (https://www.uniprot.org/uniprot/, accessed on 10 October 2020) [31] or NCBI (https://www.ncbi.nlm.nih.gov/, accessed on 10 October 2020) website (Table 1). The genome used for each of the seven species is listed in Table 2.

Homolog Identification and Classification
To perform the homology search, the protein sequence of each of the cloned genes was aligned across the seven genomes with the translated Basic Local Alignment Search Tool (tBLASTn) in CoGeBlast [93]. Following the criteria used by previous studies identifying homologous genes in plants, tBLASTn hits with an E value outside the range E0 to E-45 [94][95][96] or which did not have >70% similarity [96][97][98][99] were removed from further analyses. Since the smallest reference gene used in the study, At_Rpw8.1, has 148 aa [75], any tBLASTn hits with <148 aa were also removed from further analyses.
The lists of predicted RGAs derived from the RGAugury pipeline [100] in A. thaliana, B. rapa, B. nigra, B. oleracea, B. juncea and B. napus were extracted from a previous study [101] and used to classify homologs. The RGAugury pipeline was also used to predict RGAs in B. carinata, and RLPs with LysM domains (LysM-RLPs) were identified for each species. We further classified the RGAs according to whether their predicted domains were the same as or different from those of their homologous cloned counterpart.
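The three filtering criteria can be written as a small predicate. This is a minimal sketch, not the authors' pipeline: the `Hit` record and its field names are hypothetical, and "E0 to E-45" is assumed to mean an E value between 0 and 1e-45 inclusive.

```python
from dataclasses import dataclass

# Thresholds are taken from the text; everything else is illustrative.
MAX_EVALUE = 1e-45      # assumed reading of "E0 to E-45"
MIN_SIMILARITY = 70.0   # hits must have >70% similarity
MIN_LENGTH_AA = 148     # length of At_Rpw8.1, the shortest reference protein


@dataclass
class Hit:
    """One tBLASTn hit (hypothetical field names)."""
    query: str
    subject: str
    evalue: float
    similarity: float   # percent
    length_aa: int


def passes_filters(hit: Hit) -> bool:
    """Return True if the hit meets all three criteria from the text."""
    return (
        0.0 <= hit.evalue <= MAX_EVALUE
        and hit.similarity > MIN_SIMILARITY
        and hit.length_aa >= MIN_LENGTH_AA
    )


def filter_hits(hits):
    """Keep only the hits that would be carried into further analyses."""
    return [h for h in hits if passes_filters(h)]
```

A hit is kept only when all three conditions hold, so a long, highly similar hit with a weak E value is still discarded.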
Homolog types, such as paralog (homologous genes within the same species) or ortholog (homologous genes in different species), were also determined for each of the 49 cloned R genes. Paralogs were further classified as tandem, when a paralog exists within 5 Mb of the cloned R gene, or segmented, when a paralog is >5 Mb away from the cloned gene or the paralog is located on another chromosome [102]. Lastly, genes that were homologous to two or more cloned RGAs were also identified.
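As an illustration of these definitions, the sketch below assigns a homolog type from species, chromosome and position. The function name and arguments are hypothetical, and measuring distance between gene start coordinates is an assumption; the text does not say which coordinates were compared.

```python
TANDEM_MAX_DISTANCE_BP = 5_000_000  # 5 Mb threshold from the text


def classify_homolog(ref_species, ref_chrom, ref_pos,
                     hit_species, hit_chrom, hit_pos):
    """Classify a homolog relative to its cloned R gene.

    Positions are base-pair coordinates (assumed to be gene starts).
    """
    if hit_species != ref_species:
        return "ortholog"
    same_chrom = hit_chrom == ref_chrom
    if same_chrom and abs(hit_pos - ref_pos) <= TANDEM_MAX_DISTANCE_BP:
        return "tandem paralog"
    # Same species, but >5 Mb away or on another chromosome.
    return "segmented paralog"
```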

Gene Cluster Analysis
Two types of gene clusters were identified in this study. The first was a homogeneous RGA cluster, which is defined as a cluster with at least 2 (but no more than 8) RGAs of the same class, either NLR, RLK or RLP, located within a 200 kb region on the same chromosome [103,104]. The second was a heterogeneous cluster, which refers to clusters containing different classes of RGAs or containing both an RGA and a non-RGA (for example, a homolog that has not been identified using the RGAugury pipeline).
We recorded a total of 75 CDRHs, including 60 RGAs, for the cloned R genes (At_BAK1, At_RLP23, At_RLP30, and At_SOBIR1) against the fungal pathogen Sclerotinia sclerotiorum, the causal agent of Sclerotinia stem rot (SSR) disease (Tables 3 and 4, Figure S1). The C and A genomes/sub-genomes of the Brassica species carried 22 and 18 RGAs, respectively, more than the other genomes/sub-genomes and A. thaliana (Table S1). For the cloned R genes against Fusarium oxysporum (also a fungus), the causal agent of Fusarium wilt (FW) disease (Bol_FocBo1, At_RFO1, At_RFO2, and At_RFO3), 50 CDRHs (34 RGAs, 16 non-RGAs) were obtained (Tables 3, 4 and S1). B. carinata with 9 RGAs (4 in the B sub-genome and 5 in the C sub-genome) had the highest number, while B. juncea with 2 RGAs (1 each in the B sub-genome and unplaced contigs) had the lowest RGA count across the studied species (Table S1).
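The cluster definitions can be sketched as follows. The grouping strategy (sorting genes and starting a new group once a gene lies more than 200 kb from the first gene of the current group) is an assumption made for illustration; the text gives only the definitions, not the procedure. Gene records are hypothetical `(chromosome, start_bp, class)` tuples.

```python
from itertools import groupby

WINDOW_BP = 200_000                 # 200 kb region from the text
RGA_CLASSES = {"NLR", "RLK", "RLP"}
MAX_HOMOGENEOUS_SIZE = 8            # "no more than 8" RGAs of one class


def group_by_window(genes):
    """Group genes on each chromosome into runs spanning at most WINDOW_BP."""
    groups = []
    ordered = sorted(genes, key=lambda g: (g[0], g[1]))
    for _, on_chrom in groupby(ordered, key=lambda g: g[0]):
        current = []
        for gene in on_chrom:
            # Start a new group once the window from the first gene is exceeded.
            if current and gene[1] - current[0][1] > WINDOW_BP:
                groups.append(current)
                current = []
            current.append(gene)
        groups.append(current)
    return groups


def classify_group(group):
    """Return 'homogeneous', 'heterogeneous', or None if not a cluster."""
    classes = {g[2] for g in group}
    if len(group) < 2 or not (classes & RGA_CLASSES):
        return None  # singletons and groups without any RGA are not clusters
    if len(classes) == 1:
        return "homogeneous" if len(group) <= MAX_HOMOGENEOUS_SIZE else None
    return "heterogeneous"  # mixed RGA classes, or RGA plus non-RGA
```

Groups of a single RGA class with more than 8 members return `None` here because the text does not say how such cases were handled.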

Identification of CDRH Types
This study identified 68 CDRHs that are homologous to more than 1 of the cloned R genes (Table S1). Of these, 12 RGAs were previously identified and functionally characterised disease resistance genes such as At_NRG1a, At_NRG1b, At_RAC1, At_WRR4b, At_WRR9, At_RLM1a, At_RPP4, At_RPP5, At_RPP2a, At_WRR8, Bra_Crr1a, and Bra_cRa/cRb (Table S1). For instance, At_WRR4b, a WR R gene, and At_RLM1b, a BL R gene, were homologous to each other in this study. This was also the case with At_RPP2a, a DM R gene, and Bol_FocBo1, a FW R gene.
In terms of RGA domain retention and losses, this study found that 431 and 229 out of 660 CDRHs have retained (as RGA) and lost (as non-RGA) resistance domains and motifs from the original gene, respectively (Tables 3 and 4). In some cases, the RGA class of a CDRH tends to be different from that of its corresponding cloned gene because the RGA domain has been contracted/truncated. For example, At_RPP8, encoding a CNL, had 4 NL and 1 CN CDRHs, while At_WRR8, encoding a TNL, had 1 TN, 1 NL, and 1 NBS CDRH (Tables 3 and 4). At_RPP8 and its CDRHs had a common NBS domain, while At_WRR8 and its CDRHs had a common domain of either TIR, NBS or LRR (Tables 3 and 4). On the other hand, 2 CDRHs did not have a common RGA domain with their homologous cloned R gene. These included Bra_cRa/cRb (TNL) and Bju_WRR1 (CNL), which both had at least one Other-RLK CDRH (Tables 3 and 4).
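The domain comparison above amounts to a set intersection. The sketch below assumes the conventional reading of the NLR class labels (C = CC, T = TIR, N = NBS, L = LRR); the mapping and function names are illustrative and are not taken from the RGAugury output format. Classes absent from the mapping, such as Other-RLK, are treated as sharing no NLR domain.

```python
# Assumed domain composition of each NLR-type class label.
DOMAINS = {
    "CNL": {"CC", "NBS", "LRR"},
    "TNL": {"TIR", "NBS", "LRR"},
    "NL": {"NBS", "LRR"},
    "CN": {"CC", "NBS"},
    "TN": {"TIR", "NBS"},
    "NBS": {"NBS"},
}


def shared_domains(cloned_class, homolog_class):
    """Domains shared by a cloned R gene and one homolog."""
    return DOMAINS[cloned_class] & DOMAINS.get(homolog_class, set())


def common_to_all(cloned_class, homolog_classes):
    """Domains present in the cloned gene and in every listed homolog."""
    common = set(DOMAINS[cloned_class])
    for homolog_class in homolog_classes:
        common &= DOMAINS.get(homolog_class, set())
    return common
```

Under these assumptions, a CNL gene with NL and CN homologs shares only the NBS domain with all of them, which matches the At_RPP8 example, and a TNL gene shares no NLR domain with an Other-RLK homolog.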

Discussion
RGAs are the most important genes that need to be discovered and cloned for the improvement of Brassica crop disease resistance. The availability of Brassica genomic resources, along with the model species A. thaliana, and the aid of computational and bioinformatic tools have led to their widespread identification. Across A. thaliana and Brassica species, an approach utilising homology can reveal associations between functionally characterised R genes and RGAs, and how each species' genetic repertoire differs (for example, RGA and non-RGA content).
The larger number of total CDRHs in Brassica polyploids over Brassica diploids is likely due to polyploidisation [11,14,15]. It has previously been shown that the total number of genes, RGAs and glucosinolate-related genes in Brassica polyploids were higher than in the Brassica diploid/progenitor species [11,101]. The number of DNA transposable elements, a major factor in plant genome expansion [106], was also found to be higher in polyploid B. napus compared to the diploids B. rapa and B. oleracea [107], which could likely be the case for B. carinata and B. juncea when compared to their corresponding diploid progenitors. On the other hand, the lower counts of CDRHs in Arabidopsis than in Brassica species are probably due to the whole genome triplication events that occurred in ancestral Brassica but not in ancestral Arabidopsis [7,108]. As a result, it is expected that the increased genome size of Brassicas also increased their gene number compared to A. thaliana [11]. Furthermore, Brassica crops have undergone a long history of extensive breeding to improve disease resistance, which may have led to an increase in their RGA content [109].
The fewer RGAs in the individual sub-genomes of the Brassica polyploids compared to their diploid genome progenitors that we found in this study is consistent with other Brassica RGA studies [11,96,109,110,111,112]. Duplicated disease R genes or RGAs are preferentially lost in the sub-genomes of polyploid Brassicas after a duplication event compared to their diploid genome progenitors [109,110,113,114]. This event was also observed in other species such as legumes [115], maize [116,117], and wheat [118]. In B. napus, the loss of RGAs is thought to be a result of homoeologous exchange between the A and C subgenomes [107,119].
The total number of CDRHs, particularly the RGAs, per disease was also determined. Limited genetic resistance towards BLS, PW, and GM disease has been identified in Brassica species, making the RGAs obtained here a valuable starting point for future studies to explore potential BLS, PW, and GM R genes. For WR, it has been reported that B. rapa and the A sub-genome of B. napus are a good source of resistance [120][121][122]. The majority of markers associated with WR resistance that have been utilised for resistance exploration were also derived from the A-genome [80,123,124,125,126]. For BL, previous investigations showed that B-genome Brassica species have high levels of phenotypic resistance to BL compared to Brassicas containing the A and C genomes, and A. thaliana [127][128][129]. However, the association of phenotypic BL resistance with the RGAs identified in this study is yet to be confirmed. In addition, QTL against FW and BR have previously been identified in the Brassica C genome/sub-genome [89,130,131,132,133,134].
Our results showed that there are a considerable number of CDRHs throughout Brassica crops and A. thaliana. CDRHs, especially those with resistance domains (RGAs), play important roles in disease resistance responses, and their subsequent application in breeding programs will help to improve disease resistance. However, RGAs are not the only genes that may confer disease resistance in Brassica crops and this is particularly true for diseases whose resistance response is quantitatively controlled, such as SSR [147]. Therefore, the non-RGAs identified in this study may be useful, but still need further analyses and confirmation.
A CDRH can be homologous to more than one cloned R gene because some of the genes may share the same resistance domains. A considerable number of collinear genes were obtained between Arabidopsis and Brassica species as they originated from one ancestral species [11,150]. It is also possible that homology to more than one gene could imply multiple resistance functions. The At_RPP8 gene, which confers resistance to DM disease, was later found to have two alleles, HRT and RCY1, which confer resistance to turnip crinkle virus and the yellow strain of cucumber mosaic virus, respectively [151][152][153]. The At_RRS1 gene, initially associated with the avirulence gene popP2, which triggers resistance against Ralstonia solanacearum [154,155], was later found to also mediate a resistance response against P. syringae and Colletotrichum higginsianum [77]. However, this assumption needs thorough investigation and multiple functional characterisation to be confirmed.
The large number of tandem duplicates or paralogs in A. thaliana over Brassica species is consistent with findings in previous studies [13,156,157]. Tandem duplication may have occurred more frequently in A. thaliana because its ancestors did not undergo whole genome triplication, hence there was no extensive genome fractionation [108]. Conversely, the large number of segmented paralogs in Brassica species over A. thaliana could also be due to genome fractionation and block reshuffling which separated the homologous RGAs during the process of these evolutionary events [108,158]. Segmented paralogs act as gene-buffers in forms of structural variation such as copy number variation (CNV), which has been found abundantly in B. napus and B. oleracea, and is associated with SSR, CR and BL resistance [109,112].
Homology analysis is useful in elucidating gene gains and losses, and verifying retained resistance domains or function of genes [159]. From an ancestral gene, homologs could undergo neofunctionalisation, subfunctionalisation or duplication-degeneration-complementation (DDC), non-functionalisation or pseudogenisation, escape from adaptive conflict (EAC) and other routes involving gene dosage and redundancy [160][161][162][163][164][165][166]. In plants, RGAs are prone to rapid gene expansion during evolutionary events, as well as gene loss and contraction, as they respond to environmental stress such as disease pressure [167,168]. Nevertheless, truncated RGAs such as NL and TN have been cloned and functionally characterised with disease resistance in A. thaliana [53,61,62]. While an NBS gene has been reported as a signalling component in disease resistance [20], genes with TX domains were able to interact with different R and Avr genes to elicit disease resistance in A. thaliana [169,170], and a CC domain gene has been reported as a candidate for the blackleg R genes Rlm1, LepR2, and LepR4 in B. oleracea [171][172][173].
In gene clustering, the greater number of clusters in A. thaliana could possibly be due to its smaller genome size compared to the Brassica species (Table 2), where the positions of the genes or RGAs tend to be closer to each other. However, between Brassica species, the presence of RGAs could be a factor in gene clustering, as it was observed in this study that the higher the total number of CDRHs with an RGA domain in each species, the higher the likelihood that these specific RGAs were part of a gene cluster.
Earlier studies have suggested that homogeneous gene clusters may have evolved via tandem duplication [174,175]. The function of tandem paralogs is yet to be confirmed; however, their co-existence with cloned genes in a cluster suggests a "balancing" model in which genetic variation in disease resistance is maintained despite the presence of selection pressure [176]. In previous A. thaliana studies, the NLR Suppressor of Non-expressor of Pathogenesis-Related Genes 1-1, CONSTITUTIVE 1 (SNC1) requires its co-clustered NLRs SIDEKICK SNC1 1 (SIKIC1), SIKIC2 and SIKIC3 to mediate defence signalling [177], while the NLRs Chilling Sensitive 1 (CHS1) or TN2 pair with Suppressors of CHS (CHS1 or CHS2) gene 3 (SOC3) to monitor the homeostasis of Senescence-Associated E3 Ubiquitin Ligase 1 (SAUL1) [178], which is a positive regulator of PTI in plants [179]. Thus, clustering of these RGAs may be maintaining variation in disease resistance, but when selection pressure occurs, RGAs could act as the accessory, helper or sensor needed in disease resistance [180,181].
The "birth and death" model could also be the fate of the RGAs in a gene cluster. The "birth and death" model indicates that when an RGA function is overcome by a pathogen, the duplication process facilitates DNA sequence exchanges of homologous genes via crossover, leading to sequence mispairing, loss of the original sequence, converting the gene, and eventually generating a novel RGA with possible altered pathogen specificities [182]. The emergence of cloned genes At_RPP13 in A. thaliana, Pm3 in wheat, L in flax and eIF4E in capsicum was said to follow the "birth and death" model [183][184][185][186]. The same mechanism of "birth and death" has likely occurred within a blackleg resistance gene Bna_Rlm9/4/7 which contains three alleles Rlm9, Rlm4 and Rlm7 on chromosome A07 of B. napus [85,187] because their corresponding avirulence genes have been found to have an epistatic interaction, indicating an evolutionary arms race between the host and pathogen [188][189][190].
For the heterogeneous clusters, this clustering occurs because of random ectopic recombination, chromosomal translocation, gene transposition and co-localisation of the genes [191][192][193]. However, the function of genes with different domains in a cluster is yet to be confirmed. The same can be said for homogeneous clusters, where there is a high chance that the distribution and position of CDRHs is not random; however, this assumption requires further research for confirmation.

Conclusions
The identification of RGAs throughout the genome and underlying QTL is one of the breakthroughs that has accelerated disease resistance improvement in crops. While the process of identifying and functionally testing R genes has shortened, QTLs can have numerous candidates, which results in time-consuming validation. Hence, the use of cloned R gene sequences to search for RGA homologs can provide a basis for narrowing down candidates for functional characterisation.
The findings in this study can also be useful in studying the evolution and mechanisms of resistance in these genes, which can later help to guide appropriate crop methodologies to develop disease-resistant and resilient Brassica cultivars. Additionally, gene-specific markers from these specific RGAs can be used as diagnostic markers in determining Brassica lines with disease resistance and possibly to explore new QTL not only in Brassica species but also in other members of the Brassicaceae family.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/biology11060821/s1, Table S1: Results from the BLAST analysis (Comparative Genomics website) along with the identification of resistance gene analogs (RGAs) using RGAugury pipeline in Brassicaceae species and the underlying types of homolog; Table S2: List of resistance gene analogs (RGAs) in Brassica carinata zd-1 v1.0 identified by RGAugury pipeline.
Author Contributions: A.Y.C. and J.B. conceptualized the paper; A.Y.C. wrote the original draft along with formal analyses; T.X.N., S.T. and W.J.W.T. helped improve the paper by suggesting additional ideas and by thorough revision/editing; P.E.B. analysed the Brassica carinata genes using the RGAugury pipeline; D.E. and J.B. supervised, reviewed, and suggested revisions to the paper. All authors have read and agreed to the published version of the manuscript.
Funding: This study is funded by the Australian Research Council projects (DP200100762, and DP210100296) and Grains Research and Development Corporation UWA1905-006RTX.
Institutional Review Board Statement: Not applicable.