Copy Number Variation among Resistance Genes Analogues in Brassica napus

Copy number variations (CNVs) are defined as deletions, duplications and insertions among individuals of a species. There is growing evidence that CNV is a major factor underlying various autoimmune disorders and diseases in humans; however, in plants, especially oilseed crops, the role of CNVs in disease resistance is not well studied. Here, we investigate the genome-wide diversity and genetic properties of CNVs in resistance gene analogues (RGAs) across eight Brassica napus lines. A total of 1137 CNV events (704 deletions and 433 duplications) were detected across 563 RGAs. The results show CNVs are more likely to occur across clustered RGAs compared to singletons. In addition, 112 RGAs were linked to a blackleg resistance QTL, of which 25 were affected by CNV. Overall, we show that the presence and abundance of CNVs differ between lines, suggesting that in B. napus, the distribution of CNVs depends on genetic background. Our findings advance the understanding of CNV as an important type of genomic structural variation in B. napus and provide a resource to support breeding of advanced canola lines.

A CNV is defined as a genomic sequence variant ranging from 50 bp [20] to several Mbp in size [21], consisting of deletions, insertions, duplications or translocations [22]. Gene CNVs arise from errors in homologous recombination events [23] and are observed in many organisms, resulting in dozens to hundreds of differences in their number of functional genes [24].
CNVs affect gene and protein expression levels and eventually influence the phenotype [25] and evolutionary adaptation [26]. There are increasing reports associating CNV with major traits in different crop species, but the extent and role of CNVs in plants are not yet fully understood [27]. CNVs may have broad implications for model organism research, evolutionary biology, and genomics-assisted breeding approaches to improve crop adaptation and yield [28,29].
Since CNVs are ubiquitous and encompass more nucleotides per genome than the total number of SNPs [21,30], more attention has recently been paid to their role.
For phylogeny analysis, SNP calling was performed using bcftools and only the biallelic SNPs were kept. A Neighbour Joining tree was made using vcfkit.
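The biallelic SNP filter described above can be illustrated with a minimal Python sketch. This is not the bcftools implementation; it only assumes the standard VCF column order (REF in the fourth column, ALT in the fifth), and the function names `is_biallelic_snp` and `filter_biallelic_snps` are hypothetical.

```python
def is_biallelic_snp(vcf_line: str) -> bool:
    """Return True if a tab-separated VCF data line describes a biallelic SNP."""
    fields = vcf_line.rstrip("\n").split("\t")
    ref, alt = fields[3], fields[4]
    alt_alleles = alt.split(",")
    # Biallelic: exactly one alternate allele.
    # SNP: REF and ALT are both a single nucleotide.
    return (
        len(alt_alleles) == 1
        and len(ref) == 1
        and len(alt_alleles[0]) == 1
        and ref in "ACGT"
        and alt_alleles[0] in "ACGT"
    )


def filter_biallelic_snps(lines):
    """Keep header lines and biallelic SNP records from an iterable of VCF lines."""
    return [ln for ln in lines if ln.startswith("#") or is_biallelic_snp(ln)]
```

Multi-allelic sites (ALT such as `A,T`) and indels (REF or ALT longer than one base) are dropped, which mirrors the "only the biallelic SNPs were kept" step.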

RGA Prediction and Physical Clustering
The RGAugury pipeline (v 2017-10-21) [78] was used to automate RGA (NLR, RLK, and RLP) prediction in the B. napus Darmor-bzh NRGene v9 annotation. RGA candidates were classified into subclasses based on the presence or absence of specific domains. The NLR candidates were divided into classes based on domain presence. Proteins carrying only an NB-ARC domain were classified as NBS, proteins carrying TIR, NB-ARC, and Leucine-Rich-Repeat (LRR) domains were classified as TNLs, or TN if the LRR domain was missing. Proteins carrying Coiled-Coils, NB-ARC, and LRR domains were classified as CNLs, or CN if the LRR domain was missing, or NL if the Coiled-Coils domain was missing. Proteins carrying a TIR domain with additionally unknown domains were classified as TX. Other combinations (e.g., CNL + RPW8) were classified as OTHER. RGAs were joined into physical clusters if they were located within ±10 genes of each other.
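The classification and clustering rules above can be sketched in Python as follows. This is an illustration, not the RGAugury code: the domain labels, the function names, and the treatment of "TX" as a protein whose only recognised domain is TIR are assumptions, as is the reading of "within ±10 genes" as a difference of at most 10 in gene order, chained between neighbouring RGAs on the same chromosome.

```python
from collections import defaultdict

# Exact domain combinations mapped to the NLR classes named in the text.
NLR_CLASSES = {
    frozenset({"NB-ARC"}): "NBS",
    frozenset({"TIR", "NB-ARC", "LRR"}): "TNL",
    frozenset({"TIR", "NB-ARC"}): "TN",
    frozenset({"CC", "NB-ARC", "LRR"}): "CNL",
    frozenset({"CC", "NB-ARC"}): "CN",
    frozenset({"NB-ARC", "LRR"}): "NL",
    frozenset({"TIR"}): "TX",  # assumption: unknown domains are not listed
}


def classify_nlr(domains):
    """Return the NLR class for a collection of domain names, else 'OTHER'."""
    return NLR_CLASSES.get(frozenset(domains), "OTHER")


def physical_clusters(rga_positions, window=10):
    """Group RGAs lying within `window` genes of the previous RGA.

    rga_positions maps gene id -> (chromosome, gene order index).
    Returns a list of clusters (lists of gene ids) with two or more members.
    """
    by_chrom = defaultdict(list)
    for gene, (chrom, index) in rga_positions.items():
        by_chrom[chrom].append((index, gene))

    clusters = []
    for chrom in sorted(by_chrom):
        current, last_index = [], None
        for index, gene in sorted(by_chrom[chrom]):
            # Start a new group when the gap to the previous RGA is too large.
            if last_index is not None and index - last_index > window:
                if len(current) > 1:
                    clusters.append(current)
                current = []
            current.append(gene)
            last_index = index
        if len(current) > 1:
            clusters.append(current)
    return clusters
```

RGAs that do not end up in any returned group correspond to the singletons discussed later in the text.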

CNV Analysis
To investigate the role of CNVs in RGA diversity in B. napus, we generated whole-genome sequencing data to search for CNVs among RGAs of eight B. napus morphotypes: Ascona, English Giant, Hansen × Gaspard, Milena, Pacific, Pirola, Tina and Wilhelmsburger. While all lines are winter type and blackleg resistant, they are of interest for other characteristics, including canola quality (widely cultivated) and resistance to diseases other than blackleg. The phylogeny analysis of the lines is shown in Figure 1.
The CNV events detected had an average size of 3.29 kb across the eight cultivars. Out of the 1,137 CNV events, 704 (61.92%, 2.58 Mbp) were deletions and 433 (38.08%, 1.16 Mbp) were duplications, with an average of 88 and 54 events per cultivar, respectively (Table 1). We found 1.6× more deletion than duplication events, and on average deletions were larger (3.67 kb) than duplications (2.66 kb). The largest deletion and duplication percentages were found in the cultivars Tina (68.20%) and Pacific (50%), respectively (Figure 2 and Table 1). We identified 188 CNV events (16.53%) that showed deletion in one cultivar but duplication in another, termed "both deletion and duplication". These "both deletion and duplication" events were detected on all chromosomes except A07, A08, A10, C01, C02 and C05 (Figure 3 and Figure S1). Based on the number of CNV events detected in each cultivar, Hansen × Gaspard (26.98%) and Pirola (11.86%) contained the highest and lowest percentages of these "both deletion and duplication" events, respectively (Table 1).
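The proportions reported above can be rechecked with a few lines of Python. The counts and Mbp totals are taken from the text; the variable names are illustrative. Because the Mbp totals are themselves rounded, the mean sizes recomputed here come out close to, but not exactly equal to, the reported 3.67 kb and 2.66 kb.

```python
TOTAL_EVENTS = 1137
DELETIONS, DUPLICATIONS = 704, 433
BOTH_DEL_AND_DUP = 188
N_CULTIVARS = 8
DELETED_BP, DUPLICATED_BP = 2.58e6, 1.16e6  # rounded totals from the text


def percent(part, whole):
    """Percentage of `part` in `whole`, rounded to two decimals."""
    return round(100 * part / whole, 2)


summary = {
    "deletion_pct": percent(DELETIONS, TOTAL_EVENTS),
    "duplication_pct": percent(DUPLICATIONS, TOTAL_EVENTS),
    "both_pct": percent(BOTH_DEL_AND_DUP, TOTAL_EVENTS),
    "deletions_per_cultivar": DELETIONS / N_CULTIVARS,
    "duplications_per_cultivar": DUPLICATIONS / N_CULTIVARS,
    "deletion_to_duplication_ratio": round(DELETIONS / DUPLICATIONS, 2),
    "mean_deletion_kb": round(DELETED_BP / DELETIONS / 1000, 2),
    "mean_duplication_kb": round(DUPLICATED_BP / DUPLICATIONS / 1000, 2),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```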

Distribution along Chromosomes and Sub-Genomes
The average number of CNV events per chromosome ranged from 19.37 on chromosome A09 to 1.25 on chromosome A10 (Figure 4 and Table S1). In cases where both deletion and duplication events were observed, the largest deletion and duplication percentages (relative to the total number of CNV events on each chromosome) were found on chromosomes C08 (30 deletions out of 31 CNVs; 96.77%) and A03 (12 duplications out of 14 CNVs; 85.71%) in the cultivars Tina and Pirola, respectively (Figure 4 and Table S1). Overall, deletions were more abundant than duplications in both the A (317 vs. 251) and C (387 vs. 182) sub-genomes (Table S1).
Out of the 1,137 CNV events, 905 CNVs (79.59%) were found to be larger than 1 kb (Table S2). The average size of the CNVs identified varied from 1.91 kb in Ascona to 4.90 kb in Milena, with an average size of 3.29 kb across the eight cultivars (Table S2). In all the cultivars, except for Hansen × Gaspard, deletions were larger than duplications (Table S2 and Figure 2). The size distributions of observed CNVs were also very similar between the eight cultivars. Only Milena and Pacific had more CNVs larger than 10 kb than CNVs between 5 kb and 10 kb (Figures 2 and 5).
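The size classes used in this comparison (below 1 kb, 1 to 5 kb, 5 to 10 kb, above 10 kb) can be tallied with a short Python sketch. The function names and the handling of values falling exactly on a boundary are assumptions, and the lengths in the usage example are invented for illustration only.

```python
from collections import Counter


def size_class(length_bp):
    """Assign a CNV length in bp to one of four size classes."""
    if length_bp < 1_000:
        return "<1 kb"
    if length_bp < 5_000:
        return "1-5 kb"
    if length_bp < 10_000:
        return "5-10 kb"
    return ">=10 kb"  # assumption: exactly 10 kb counts as the top class


def size_distribution(lengths_bp):
    """Count CNVs per size class for one cultivar."""
    return Counter(size_class(n) for n in lengths_bp)


# Hypothetical lengths, not data from the study.
example = size_distribution([800, 1_500, 4_200, 7_000, 12_000, 30_000])
print(dict(example))
```

Comparing the ">=10 kb" and "5-10 kb" counts per cultivar reproduces the kind of contrast drawn above for Milena and Pacific.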

CNVs across RGAs
We identified 563 RGAs overlapping with CNVs, including 164 NLR, 319 RLK and 80 RLP genes. The largest classes of RGAs affected by CNV across the eight cultivars were RLK and RLP (on average 50.21% RLKs and 16.86% RLPs in each cultivar) (Table S3). Among the NLR sub-families, NL and TNL were the most abundant RGAs affected by CNV events (Table S3). Out of 563 RGAs, 310, 196 and 57 genes showed deletion, duplication and "both deletion and duplication", respectively (Table S4). No "both deletion and duplication" events were detected on chromosomes A07, A08, A10, C01, C02 and C05 (Figure 3). Across all eight cultivars, multiple RGAs overlapping CNV were shared between two or more cultivars (Table 2). The highest and lowest pairwise overlaps were 126 RGAs between Tina and Wilhelmsburger and 11 RGAs between English Giant and Hansen × Gaspard (Table 2). The number of RGAs with CNV in common between the cultivars is depicted in Table 2 and Figure 6. Out of 563 RGAs showing CNV, 262 (46.54%) were detected in only one cultivar and two (0.36%) were shared by all cultivars (Table 3).
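The pairwise and all-cultivar sharing counts described above amount to set intersections. A minimal Python sketch, assuming each cultivar is represented by the set of RGA identifiers overlapping a CNV, is given below; the function names and the toy gene identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pairwise_shared(rgas_by_cultivar):
    """Number of CNV-affected RGAs shared by each pair of cultivars."""
    return {
        (a, b): len(rgas_by_cultivar[a] & rgas_by_cultivar[b])
        for a, b in combinations(sorted(rgas_by_cultivar), 2)
    }


def sharing_spectrum(rgas_by_cultivar):
    """Map 'number of cultivars carrying the RGA' to 'number of such RGAs'."""
    per_gene = Counter(
        gene for genes in rgas_by_cultivar.values() for gene in genes
    )
    return Counter(per_gene.values())


# Toy input, not data from the study.
toy = {
    "Tina": {"g1", "g2", "g3"},
    "Pacific": {"g2", "g3"},
    "Ascona": {"g3", "g4"},
}
print(pairwise_shared(toy))
print(dict(sharing_spectrum(toy)))
```

In the spectrum, the entry for 1 corresponds to RGAs detected in only one cultivar, and the entry equal to the number of cultivars corresponds to RGAs shared by all of them.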

Gene-Physical Clustering
Out of 1,768 RGAs previously identified in the B. napus Darmor-bzh NRGene v9 annotation, 793 RGAs were clustered in 306 physical clusters, of which 284 RGAs    (Table S4). The distribution and number of the singletons and clustered resistance genes affected by CNV across the chromosomes are presented in Table S5.

Investigation of RGAs Affected by CNV Events across Known Genomic Regions for Blackleg Resistance Genes
The RGA positions were compared with known regions for blackleg resistance to identify possible candidate genes affected by CNV. Positions were predicted for 14 markers from genetic mapping of seven loci: LepR1 (A02), LepR2 (A10), Rlm1, Rlm3, Rlm4, Rlm7 and Rlm9 (A07) in the Darmor-bzh v9 assembly (Table 4). Overall, we identified 100 RGAs within previously known regions for blackleg resistance, of which 22 RGAs were affected by CNV events. There were 64 RGAs overlapping the Rlm1, Rlm3, Rlm4, Rlm7 and Rlm9 QTL on chromosome A07, of which 16 were affected by CNV events: 12 RLKs and 1 TNL were deleted, and 2 RLKs and 1 TNL were duplicated.
On chromosome A02, out of 7 RLKs, two were deleted; on chromosome A10, out of 29 RGAs, three RLKs were deleted and one RLK was duplicated (Table 4).
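Comparing RGA positions with known resistance regions is an interval-overlap problem. The sketch below shows one way to do it in Python; the `Region` type, the function names and all coordinates are hypothetical, and closed intervals (both ends inclusive) are an assumption.

```python
from typing import NamedTuple


class Region(NamedTuple):
    chrom: str
    start: int
    end: int


def overlaps(a: Region, b: Region) -> bool:
    """True if two closed intervals on the same chromosome share a base."""
    return a.chrom == b.chrom and a.start <= b.end and b.start <= a.end


def candidates_in_regions(rgas, regions, cnv_affected):
    """Return (RGAs inside any region, subset of those affected by CNV)."""
    inside = sorted(
        name
        for name, position in rgas.items()
        if any(overlaps(position, region) for region in regions)
    )
    return inside, [name for name in inside if name in cnv_affected]


# Hypothetical coordinates, not the marker positions from Table 4.
qtl = [Region("A07", 1_000, 5_000), Region("A10", 200, 900)]
rga_positions = {
    "rga1": Region("A07", 4_800, 5_600),
    "rga2": Region("A07", 6_000, 7_000),
    "rga3": Region("A10", 300, 450),
    "rga4": Region("A02", 300, 450),
}
inside, inside_with_cnv = candidates_in_regions(rga_positions, qtl, {"rga3", "rga2"})
print(inside, inside_with_cnv)
```

Counting the two returned lists per chromosome would give figures of the same kind as those reported above.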

Discussion
Recently, several studies have reported CNV events across various crop species, including rice [27,36], wheat [85], barley [86], maize [52,87], soybean [46], melon [88] and cannabis [89]. Most of these studies have linked CNV analysis with agronomic traits. Given that canola is a major crop and CNVs are among the major genomic structural variations and hotspots for genetic and phenotypic variation during environmental adaptation and population differentiation, we performed genome-wide analysis of CNV events of RGAs across eight canola cultivars. In total, 563 RGAs overlapped with 1,137 CNV events, of which the majority were deletions (704 deletions, 433 duplications). The higher number of deletions than duplications is consistent with other B. napus studies. Schiessl et al. [29] showed that deletions are more abundant than duplications in B. napus, as genomes are known to reduce their gene space after polyploidisation [90].
Deletions abolish gene function, whereas duplications can alter gene expression levels [91] and thereby affect gene dosage. Kopec et al. (2021) showed, in B. napus lines resistant and susceptible to clubroot, that the transcript level of the two TNL copies in the resistant line was twice that of the single copy in the susceptible line, and that this upregulation was most likely involved in the resistance response [55]. Therefore, duplications are more likely to change traits than point mutations or InDels [92].
We found more deletions in the C sub-genome than in the A sub-genome and more duplications in the A sub-genome than in the C sub-genome. These findings are consistent with earlier B. napus studies [29]. This might be because the A sub-genome copies have been selected over the C sub-genome copies. For example, CNVs concerning copies of Bna.FLC, Bna.PHYA and Bna.GA3ox1 involve duplications in the A sub-genome and corresponding homoeologous deletions in the C sub-genome [93]. Another possible explanation for this sub-genome bias is the high transposon content and more active transposons in the C sub-genome [5,94]. Generally, due to high gene redundancy [29] and inter-sub-genomic homology [95], genomic rearrangements are common events in polyploid genomes. Our data suggest that CNVs larger than 1 kb but smaller than 5 kb are more frequent than other CNV sizes. Similar results were found in rice and maize, where smaller CNVs (shorter than 10 kb) are more frequent than larger ones [36,96].
CNV numbers differ between species and between individuals of the same species. In this study, the chromosomes of all eight cultivars exhibited different numbers and patterns of CNV events. Similarly, Springer et al. (2009) identified more than 400 putative CNVs between Mo17 and B73 maize inbred lines distributed across all maize chromosomes [31]. Furthermore, Demeke and Eng (2018) investigated CNVs among three canola cultivars and found variability in gene copy numbers [97].
Although CNVs frequently overlap with protein-coding regions in plant genomes [95], little is known about the presence and phenotypic effects of CNVs in plants. Nevertheless, the nature of CNVs detected in maize suggests that they may have a significant impact on plant phenotypes, including disease response and heterosis [36]. We found that the majority of RGAs associated with CNV events are RLKs, which reflects RLKs being the most abundant class of RGAs. RLKs and RLPs are primary components of the first line of plant immune response and mediate pathogen/microbe-associated molecular pattern (PAMP/MAMP)-triggered immunity (PTI/MTI) [98] to recognize broad spectra of pathogens [99]. In addition to defense mechanisms, RLKs and RLPs are also involved in developmental processes [98], including meristem and stomatal development [100,101], which can explain their abundance across the genomes.
It has been reported that the CNV of RGAs differs between and within species [102,103], and this variability allows RGAs to recognize a wide range of effector proteins [104]. Therefore, a high copy number of RGAs should be beneficial to guard against the genetic diversity of pathogens.
We found that genes localized in physical clusters exhibit more CNV than singletons, which is consistent with a previous study in soybean [105]. RGAs in plants tend to be physically clustered in genomes [106]. For example, approximately 66% of resistance genes in Arabidopsis [107] and 76% in rice [108] were found in physical clusters. In addition, Yr genes responsible for resistance against wheat yellow rust were found to be physically clustered [109]. Similar to our findings, it has been previously reported that the majority of RGAs within a cluster belong to the same subfamily [110,111] and can have different rates and patterns of variation [112]. Genes in physical clusters may have adaptive advantages derived from rapid evolution due to rearrangement [52]. Our results revealed that CNVs are distributed throughout the genome and that CNV-affected genes were more likely to be found in physical clusters. Thus, gene clustering may be a critical feature of the generation of novel resistance specificities through gene deletion or duplication.
Several regions that carry blackleg resistance genes have been identified in B. napus cultivars [80,83,113,114]. We identified 22 RGAs within the regions associated with blackleg resistance that were affected by CNV events, potentially leading to different levels of disease resistance in cultivars. Identification of RGA candidates and their structural variation will assist with RGA mapping and a better understanding of RGA evolution and functionality, which is beneficial for gene identification and application in breeding programs.
To conclude, whole-genome sequencing was used to investigate CNV events of RGAs across eight blackleg resistant B. napus cultivars. The outcomes reveal that CNV events are a key type of genomic variation that may play an important role in disease resistance. The results constitute a valuable genome-wide variation resource of B. napus for future research on phenotypic variation and breeding. The results also provide insights into the evolution, formation and distribution of resistance genes in B. napus.

Supplementary Materials:
The following supporting information can be found at https://www.mdpi.com/article/10.3390/genes13112037/s1. Figure S1: The position of CNV events (red and blue lines represent deletions and duplications, respectively) across the chromosomes of eight B. napus cultivars. The tracks from outer to inner show chromosomes, Ascona, English Giant, Hansen × Gaspard, Milena, Pacific, Pirola, Tina and Wilhelmsburger; Table S1: Chromosomal distribution of CNV events in eight B. napus cultivars; Table S2: Characteristics of CNVs including CNV number, deletion to duplication ratio, average CNV size and percentage of CNVs larger or smaller than average in eight B. napus cultivars; Table S3: The number of RGAs affected by CNV events in eight B. napus lines; Table S4: The number of singletons and clustered RGAs affected by CNV across 563 RGAs; Table S5: Distribution and number of the singletons and clustered RGAs affected by CNV across the chromosomes.

Conflicts of Interest:
The authors declare no conflict of interest.