Predicting Cloned Disease Resistance Gene Homologs (CDRHs) in Radish, Underutilised Oilseeds, and Wild Brassicaceae Species

Brassicaceae crops, including Brassica, Camelina and Raphanus species, are among the most economically important crops globally; however, their production is affected by several diseases. To predict cloned disease resistance (R) gene homologs (CDRHs), we used the protein sequences of 49 cloned R genes that confer resistance to fungal and bacterial diseases in Brassicaceae species. Using 20 Brassicaceae genomes (17 wild and 3 domesticated species), we identified 3172 resistance gene analogs (RGAs): 2062 nucleotide binding-site leucine-rich repeats (NLRs), 497 receptor-like protein kinases (RLKs) and 613 receptor-like proteins (RLPs). In Arabis alpina, Camelina sativa and Cardamine hirsuta, whose genomes have assigned chromosomes, CDRHs were also organised into 72 clusters: 62 homogeneous RGA clusters (38 NLR, 17 RLK and 7 RLP clusters) and 10 heterogeneous RGA clusters. This study highlights the prevalence of CDRHs in the wild relatives within the Brassicaceae family, which may lay the foundation for the rapid identification of functional genes and for genomics-assisted breeding of improved disease-resistant Brassicaceae crop cultivars.

Gene diversification could occur through truncation (one or two RGA domains omitted), addition (one or two RGA domains added) or a combination of truncation and addition of RGA domains. Among the diversified CDRHs, all 130 CDRHs (100%) of the RNL cloned R genes lacked the RNL domain architecture. Diversification was also observed in CDRHs of cloned R genes that were TN (62%, or 8 out of 13 CDRHs), TNL (55%, or 752 out of 1356 CDRHs), NL (51%, or 121 out of 236 CDRHs), CNL (49%, or 128 out of 332 CDRHs), LRR-RLP (5%, or 29 out of 628 CDRHs) and LRR-RLK (2%, or 3 out of 170 CDRHs). For the LRR-RLP cloned R genes, all 29 diversified CDRHs had one or two additional RGA domains, while for the NL cloned R genes, 59% (71 out of 121 diversified CDRHs) had one or two additional RGA domains. The combination of truncation and addition of RGA domains was observed in CDRHs of the TN (63%, or 5 out of 8 diversified CDRHs), TNL (55%, or 411 out of 752 diversified CDRHs) and RNL (54%, or 70 out of 130 diversified CDRHs) cloned R genes.
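The three diversification categories can be expressed as a comparison of domain sets. The sketch below is a minimal illustration, not the pipeline used in this study: it assumes each gene is reduced to a set of hypothetical domain labels (e.g. "TIR", "CC", "RPW8", "NBS", "LRR") and that diversification is judged only by which labels are missing from, or added to, the homolog relative to its cloned R gene.

```python
from collections import Counter

def classify_diversification(reference, homolog):
    """Compare a CDRH's domains with those of its cloned R gene.

    `reference` and `homolog` are iterables of domain labels (hypothetical
    labels such as "TIR", "NBS", "LRR"); order and copy number are ignored.
    """
    ref, hom = set(reference), set(homolog)
    missing = ref - hom  # domains omitted in the homolog -> truncation
    extra = hom - ref    # domains gained by the homolog -> addition
    if missing and extra:
        return "truncation+addition"
    if missing:
        return "truncation"
    if extra:
        return "addition"
    return "same domains"  # no diversification

def tally(pairs):
    """Count diversification categories over (reference, homolog) pairs."""
    return Counter(classify_diversification(r, h) for r, h in pairs)

# Hypothetical examples built around a TNL domain set.
tnl = {"TIR", "NBS", "LRR"}
examples = [
    (tnl, {"TIR", "NBS"}),          # LRR lost
    (tnl, {"TIR", "NBS", "LRR"}),   # unchanged
    ({"RPW8", "NBS", "LRR"}, tnl),  # RPW8 lost, TIR gained
]
print(tally(examples))
```

Under this assumption, a percentage such as those reported above would be the count in one category divided by the number of CDRHs (or diversified CDRHs) for the corresponding cloned R gene class.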

Identification of CDRH Clusters in Arabis alpina, Camelina sativa and Cardamine hirsuta
The organisation of CDRHs with RGA domains across chromosomes of A. alpina, C. sativa and C. hirsuta was studied to investigate the gene clustering of CDRHs in Brassica crop relatives. We identified a total of 72 gene clusters, consisting of 62 homogeneous RGA clusters (38 NLR, 17 RLK and 7 RLP clusters) and 10 heterogeneous RGA clusters (Figures 4-6). C. sativa contained the highest number of gene clusters with 28 (Figure 5), followed by C. hirsuta with 24 gene clusters (Figure 6) and A. alpina with 20 gene clusters (Figure 4).

Discussion
By aligning the 49 cloned R genes, which confer resistance to 11 diseases, with the RGAs of 20 Brassicaceae genomes (crop species C. sativa, R. sativus and S. alba and wild species A. halleri, A. lyrata, A. alpina, B. vulgaris, B. stricta, B. cretica, C. grandiflora, C. bursa-pastoris, C. rubella, C. hirsuta, E. salsugineum, L. alabamica, L. meyenii, R. raphanistrum, Sisymbrium irio, S. parvula and T. arvense), we compiled an inventory of specific RGAs associated with cloned R genes. This inventory provides an opportunity to search for novel CDRHs that may confer disease resistance, especially in wild species, and that could be used for future crop improvement once their function is established in the crop species. Once such genes are cloned, molecular markers can be developed as diagnostic tools for screening additional germplasm and characterising further lines for resistance.
Brassica crops have undergone extensive breeding to improve disease resistance, and their long history of domestication may have contributed to the expansion of RGA numbers [63]. A previous study reported an average of 1563 RGAs in 11 genomes of domesticated species compared with an average of 863 RGAs in 19 genomes of wild species [51]; a similar trend between domesticated and wild species was observed in this study. The number of RGAs in B. cretica (a wild species) in this study was lower than the number of RGAs found in domesticated Brassica crops. The same pattern held for the specific RGAs (CDRHs in this study) of R. sativus and R. raphanistrum and for the RGAs obtained in a previous study [51], in which domesticated radish had more RGAs than wild radish. However, this is not always the case, as B. macrocarpa (a wild cabbage species) had more RGAs than 10 domesticated cabbage species in a pangenome analysis [58]. Here, the lower RGA numbers in B. cretica and R. raphanistrum relative to their domesticated counterparts may also reflect differences in genome quality, as domesticated crops often have higher-quality genome assemblies.
The domesticated Brassicaceae members used in this study have also been reported as excellent sources of disease resistance. For instance, C. sativa has been reported to carry R genes providing resistance against Alternaria black spot, blackleg, downy mildew and Sclerotinia stem rot [40,64,65]; R. sativus has resistance against black rot [66], clubroot [67,68], downy mildew [69,70], Fusarium wilt [71], white rust [72] and Turnip mosaic virus [73,74]; and S. alba has resistance to blackleg [39,75], Turnip mosaic virus [76] and Sclerotinia stem rot [77,78]. However, further investigation is needed to determine whether the RGAs we identified in these three species are associated with the resistant phenotypes. Nevertheless, our findings are consistent with these previous reports, and the RGAs we identified are a valuable reference for future studies.
Unlike for cultivated crops, information on genetic disease resistance in wild Brassicaceae species is limited. A few of the wild species we included have previously been reported as potential sources of R genes against particular diseases, for instance, B. vulgaris against Alternaria black spot and black rot [79]; B. cretica against Verticillium wilt [80]; C. bursa-pastoris against clubroot [81], Sclerotinia stem rot [82] and Alternaria black spot [83]; R. raphanistrum against blackleg [38], clubroot [84], downy mildew [85] and Sclerotinia stem rot [86]; and T. arvense against blackleg [42]. However, the association between the reported disease resistance phenotypes in these species and the RGAs identified here requires further research.
The retention and diversification of RGA domains in the Brassicaceae family result from evolutionary events such as whole-genome triplication and duplication [87-91]. Homologs may have a function similar to or different from that of the reference gene [92,93]. A functional study revealed that the A. lyrata homologs AL.MTP11A and AL.MTP11B are functionally redundant with AT.MTP11 in A. thaliana [94], a gene involved in Mn2+ transport and tolerance [95]. Similarly, AL.TSO2A and AL.TSO2B in A. lyrata are homologous to AT.TSO2 in A. thaliana [94], a gene functionally related to ribonucleotide reductase [96]. On the other hand, diversification in domains may indicate a function that differs from that of the original gene. For instance, the At_RPP1 homolog At_RPP1-Nd (from the Nd accession) recognises a single allele of the Avr gene ATR1, ATR1-NdWsB, whereas At_RPP1-WsB (from the WsB accession) detects ATR1-NdWsB plus three additional alleles with divergent sequences, conferring resistance against downy mildew [97].
RGA domains have also been reported to be prone to alteration, such as truncation or even loss of function, in response to selection pressure (e.g., the presence of virulent pathogens) [98,99]. Truncated R genes encoding two-part proteins, such as CN, TN and NL, serve as evolutionary gene reservoirs that readily allow the formation of new genes through duplication, translocation and fusion [100-102]. In an RGA, added LRR domains may determine pathogen specificity. For instance, the LRR domain of At_RPP1 directly interacts with Avr ATR1 [103], much like the recognition of AvrL567 by L6 and of AvrL11 by L11 [104,105]. The LRR domain is also important for gene/protein stability [106]. Solo RGA domains could also confer resistance: overexpression of the NBS domain of the potato R gene Rx (CNL) resulted in a hypersensitive response (HR) [107]. In contrast, overexpression of the CC domain of At_RPS5 alone did not trigger an HR, whereas co-overexpression of the CC and NBS domains did [108].
In the gene cluster analysis, C. sativa contained the highest total number of CDRH clusters, which likely reflects its larger number of chromosomes (20) compared with A. alpina and C. hirsuta (8 each). RGA clusters are more prone to evolutionary processes such as sequence exchange, insertion or duplication, followed by neofunctionalisation [109-112]. The NLRs in a gene cluster can undergo mono- or polymerisation, which can greatly expand pathogen recognition [111]. For instance, an NLR cluster with eight members contained two functionally characterised R genes, At_RPP4 and At_RPP5, which recognise the Avr genes ATR4 and ATR5, respectively, in the downy mildew resistance response [113]. Furthermore, RLPs in a gene cluster have been shown to be most likely pathogen responsive [114]. Two cloned RLP genes, At_RLP30 and At_RLP32, which are involved in bacterial leaf spot resistance, form a gene cluster on At03 in A. thaliana [56,115,116], while a gene cluster on A10 in B. napus consists of LepR3/Rlm2 (two alleles of a cloned RLP gene that confers blackleg resistance [117,118]) and a homolog of At_PBS1 [56]. In addition, 16 RLK clusters associated with disease resistance have been found in A. thaliana and Brassica crops [56]. Heterogeneous gene clusters whose members have RGA domains and include secreted peptides associated with blackleg and clubroot were also observed in B. napus [119]. Thus, the CDRHs obtained here, especially those that were clustered, are putative R genes that may confer disease resistance.

Mining the Protein Sequences of the Cloned Genes
In total, 49 cloned R genes identified in Brassica crop species and A. thaliana that confer resistance against fungal and bacterial diseases affecting Brassicaceae species (Table 2) were selected based on the following criteria set in a previous study [56]: (1) the R gene pairs with an effector or Avr gene in gene-for-gene resistance; (2) the R gene confers resistance in the form of a hypersensitive response (usually observed at an early stage), indicating its involvement in a gene-for-gene interaction; or (3) the R gene acts as a helper or accessory gene to an existing R-Avr interaction. The protein sequences of the 49 cloned R genes were retrieved from the UniProtKB (https://www.uniprot.org/uniprot/, verified and accessed on 8 August 2022) [120] or NCBI (https://www.ncbi.nlm.nih.gov/, verified and accessed on 8 August 2022) websites.

Identification of Homologs
The RGAs from the 20 Brassicaceae genomes were aligned with the 49 cloned R genes using the Protein Basic Local Alignment Search Tool (BLASTp) [178]. Following the criteria of previous studies for identifying homologous genes in plants, hits with an E-value greater than 1E-45 [56,179-181] or an alignment length (coverage) of less than 148 amino acids (aa) [56] were removed from further analyses. We applied an additional criterion, removing any BLASTp hits with lower than 60% similarity, because the homology search was conducted between crop R genes and several wild species. The retained RGAs were further classified according to whether they had the same resistance domains as their homologous cloned R gene counterpart or different ones [56].
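The three BLASTp thresholds can be applied as a simple post-processing step on tabular output. This is an illustration under stated assumptions rather than the authors' script: it assumes BLASTp was run with `-outfmt "6 qseqid sseqid evalue length ppos"`, that "coverage" corresponds to the alignment length column, and that "similarity" corresponds to the percentage of positive-scoring matches (`ppos`). The sequence IDs in the example are hypothetical.

```python
import csv
import io

# Assumed column order, matching:
#   blastp ... -outfmt "6 qseqid sseqid evalue length ppos"
FIELDS = ["qseqid", "sseqid", "evalue", "length", "ppos"]

MAX_EVALUE = 1e-45     # discard hits with an E-value above 1E-45
MIN_ALIGN_LEN = 148    # discard alignments shorter than 148 aa
MIN_SIMILARITY = 60.0  # discard hits below 60% positives (assumed "similarity")

def passes_filters(hit):
    """Return True if a parsed hit survives all three thresholds."""
    return (hit["evalue"] <= MAX_EVALUE
            and hit["length"] >= MIN_ALIGN_LEN
            and hit["ppos"] >= MIN_SIMILARITY)

def filter_blastp(handle):
    """Parse tab-separated BLASTp output and keep hits passing the filters."""
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(FIELDS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["length"] = int(hit["length"])
        hit["ppos"] = float(hit["ppos"])
        if passes_filters(hit):
            kept.append(hit)
    return kept

# Hypothetical hits: only RGA_1 passes all three filters.
sample = (
    "RGA_1\tR_gene_A\t1e-120\t900\t72.5\n"  # passes
    "RGA_2\tR_gene_A\t1e-30\t400\t80.0\n"   # E-value too high
    "RGA_3\tR_gene_B\t1e-60\t120\t75.0\n"   # alignment too short
    "RGA_4\tR_gene_B\t1e-80\t300\t55.0\n"   # similarity too low
)
print([hit["qseqid"] for hit in filter_blastp(io.StringIO(sample))])
```

Keeping each threshold as a named constant makes it straightforward to test how sensitive the number of retained homologs is to the cut-offs.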

Gene Cluster Analysis
Among the 20 Brassicaceae species used in this study, only three genomes, A. alpina [182], C. hirsuta [11] and C. sativa [183], were used for gene cluster analysis, because their pseudo-chromosomes (assigned chromosomes), from which gene clusters were derived, were accessible. Two types of gene clusters were then identified. The first was defined as a homogeneous RGA cluster (having at least 2-8 RGAs of the same class, either NLR, RLK or RLP) situated within a 200 kb region on the same chromosome [184,185]. The second was defined as a heterogeneous cluster, containing different classes of RGAs [184,185].
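The window-based cluster definition can be illustrated with a simple greedy scan along each chromosome. This sketch assumes that RGA coordinates are available as (chromosome, start, end, class) records and that a cluster needs at least two RGAs whose combined span fits within 200 kb; it does not attempt to model the 2-8 RGA range stated above. Gene IDs and chromosome names are hypothetical, and published clustering tools may group genes differently.

```python
from dataclasses import dataclass
from itertools import groupby

WINDOW_BP = 200_000  # maximum span of a cluster (200 kb)
MIN_RGAS = 2         # assumption: at least two RGAs form a cluster

@dataclass(frozen=True)
class RGA:
    gene_id: str
    chrom: str
    start: int
    end: int
    rga_class: str  # "NLR", "RLK" or "RLP"

def find_clusters(rgas):
    """Greedily group RGAs on each chromosome into windows of <= WINDOW_BP.

    Returns (chromosome, cluster type, gene IDs) tuples, where the cluster
    type is "homogeneous" if all members share one RGA class and
    "heterogeneous" otherwise.
    """
    clusters = []
    ordered = sorted(rgas, key=lambda g: (g.chrom, g.start))
    for chrom, group in groupby(ordered, key=lambda g: g.chrom):
        genes = list(group)
        i = 0
        while i < len(genes):
            # Extend the window while the span from the first gene's start
            # to the candidate gene's end stays within WINDOW_BP.
            j = i + 1
            while j < len(genes) and genes[j].end - genes[i].start <= WINDOW_BP:
                j += 1
            members = genes[i:j]
            if len(members) >= MIN_RGAS:
                classes = {g.rga_class for g in members}
                kind = "homogeneous" if len(classes) == 1 else "heterogeneous"
                clusters.append((chrom, kind, [g.gene_id for g in members]))
            i = j  # continue scanning after the current window
    return clusters

rgas = [
    RGA("g1", "chr1", 1_000, 4_000, "NLR"),
    RGA("g2", "chr1", 50_000, 53_000, "NLR"),
    RGA("g3", "chr1", 500_000, 503_000, "RLK"),
    RGA("g4", "chr1", 560_000, 565_000, "RLP"),
    RGA("g5", "chr2", 10_000, 12_000, "NLR"),
]
for cluster in find_clusters(rgas):
    print(cluster)
```

Here g1 and g2 form a homogeneous NLR cluster, g3 and g4 form a heterogeneous cluster, and the isolated g5 is not clustered. A greedy scan is only one way to partition genes into windows; a sliding-window approach could report overlapping clusters instead.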

Conclusions
Crop wild relatives (CWRs), with their exotic genetic libraries, provide rare RGAs, which could offer an alternative to genetically modified organisms (GMOs) for improving disease resistance in Brassicaceae crops. This study suggests that several domesticated and wild species could be potential R gene sources for resistance to particular diseases. Based on their CDRHs having RGA domains, A. alpina and B. stricta are good sources of resistance against white rust, C. hirsuta and C. bursa-pastoris against black rot, and C. sativa against Sclerotinia stem rot. Although gene transfer remains a challenge, several methodologies, such as bridging crosses, chromosome doubling after hybrid crossing and somatic hybridization, have been used successfully in Brassicaceae crop breeding. Several CDRHs have also been found for less-explored diseases of Brassicaceae crops, such as Alternaria black spot, bacterial leaf spot, black rot, grey mould and powdery mildew, and the RGAs obtained are a valuable starting reference for future studies. Lastly, the current findings of CDRHs in the crops C. sativa, R. sativus and S. alba and in the 17 wild Brassicaceae species, together with the previous findings of CDRHs in A. thaliana and Brassica crops [56], provide an opportunity to study the evolutionary differences among the 49 cloned R genes (the reference set in this study) and their homologs throughout the Brassicaceae family.
Author Contributions: A.Y.C. and J.B. conceptualized the paper; A.Y.C. wrote the original draft along with formal analyses; W.J.W.T. helped improve the paper by suggesting additional ideas and by thorough revision/editing; P.E.B. analysed the Brassica cretica, Capsella bursa-pastoris and Sinapis alba genes using the RGAugury pipeline; D.E. and J.B. supervised, reviewed and suggested revisions to the paper. All authors have read and agreed to the published version of the manuscript.
Funding: This study is funded by the Australian Research Council projects (DP200100762 and DP210100296) and Grains Research and Development Corporation UWA1905-006RTX.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: The data used in this research are publicly available. The protein sequences of each cloned gene can be found at https://www.uniprot.org/uniprot/ (accessed on 10 October 2020) and https://www.ncbi.nlm.nih.gov/ (accessed on 10 October 2020). The data (results) presented in this research are available in the Supplementary Materials.