agronomy Fine Mapping and Functional Analysis of Major QTL, CRq for Clubroot Resistance in Chinese Cabbage ( Brassica rapa ssp. pekinensis )

: Clubroot disease caused by Plasmodiophora brassicae is one of the major threats to Brassica crops. New clubroot resistant varieties of Chinese cabbage ( B. rapa ssp. pekinensis ) have been developed through breeding, but the underlying genetic mechanism of clubroot resistance is still unclear. In this study, two Chinese cabbage DH lines, clubroot-resistant Y635-10 and susceptible Y177-47 were crossed to develop F 2 population for ﬁne mapping and cloning resistance gene CRq . After sequence analysis, the expression vector was constructed by gateway technology and transferred into Arabidopsis thaliana for functional characterization. Bulked segregant analysis sequencing (BSA-seq) conﬁrmed that CRq is located in the 80 kb genomic region on chromosome A03 between markers GC30-FW/RV and BGA. In silico tools conﬁrmed that the gene length was 3959 bp with 3675 bp coding sequences (CDs), and it has three exons and two introns. In addition, we found 72bp insertion in the third exon of CRq in the susceptible line. We developed and veriﬁed functional marker Br-insert1, by which genotyping results showed that 72bp insertion might lead to the destruction of the LRR region of Y177-47, resulting in a loss of resistance relative to clubroot. The results of genetic transformation showed that the roots for wild-type Arabidopsis thaliana were signiﬁcantly enlarged compared with T 2 generation transgenic Arabidopsis after treatment by P. brassicae spores, and transgenic Arabidopsis had certain resistance. Therefore, CRq is a candidate gene of clubroot disease resistance in Chinese cabbage, which could be used as a reference for elucidating disease resistance mechanisms and the marker-assisted breeding of clubroot resistant varieties.


Introduction
Brassica crops are affected by clubroot disease, resulting in a global reduction in yields of 10-15%. Currently, more than 20 provinces, municipalities and autonomous regions in China are affected by clubroot disease with varying morbidity, and the disease is more frequent in the southwest and central regions [1]. Plasmodiophora brassicae is a biotrophic

Plant Materials
A set of 340 QJF 2 population was developed by crossing between clubroot resistant line Y635-10 (derived from hybrid 'Qiulihuang') and susceptible line Y177-47 (derived from hybrid 'Jianchun') for mapping. P. brassicae race 4 was collected from the roots of infected Chinese cabbage field and characterized by William's system [13]. In addition, we also used 30 each of the highly resistant and susceptible plants from the F 2 population to analyze the possible mutations in the candidate resistant gene (Table S1).

Isolation of Pathogen and the Pathogenicity Test
Seeds were planted in 50-well plates and kept in the greenhouse with a 16 h photoperiod at 23-26 • C. The suspension of 2 mL of P. brassica resting spores (10 7 spores/mL) were injected at the bottom of stem base of 10-day-old seedlings [22]. The infected roots were collected on 30 days post-inoculation. The sampled roots were pooled from three individual plants. The samples were immediately frozen in liquid nitrogen and stored at −80 • C until use [23]. The incidence of clubroot disease was determined by combining the pathogenesis of the diseased plant and the size of the root. Disease severity was scored as 0: no infection; 1: small root nodules in lateral roots; 3: the main root was swollen and the diameter was less than 2 times of the stem; 5: the main root was swollen and the diameter was 2-4 times of the stem; 7: the main root was swollen and the diameter was4-times larger than the stem [24]. The disease index (DI) was calculated according to the following formula: DI = [(n 1 + 3n 3 + . . . + 7n 7 )/N T × 7] × 100, where the number of plants with the symptom of I and N T is the total number of plants tested [25]. The DI for each F 2 individual was calculated from the mean grade of 2 replicates.

Bulked Segregant Analysis (BSA) by Resequencing
The young leaves of Chinese cabbage were harvested from F 2 plants and parents inoculated with P. brassicae. DNA was extracted from the leaves of parents and F 2 (Y635-10×Y177-47) plants by the modified cetyltrimethyl ammonium bromide (CTAB) method [26]. For BSA-seq, two DNA pools were constructed, of which one contained DNA from 30 highly resistant (DI = 0; R-pool) and the other contained DNA from 30 highly susceptible (DI = 5-7; S-pool) F 2 plants. Sequence data of two bulks and two parents were generated by Illumina HiSeq 2500 and analyzed by Novogene (http://www.novogene.com accessed on 1 March 2015). In order to ensure the reliability of reads, raw data (raw reads) were first processed through a series of quality control (QC) procedures, and low quality paired reads and adapter contamination were removed [27].
The filtered clean reads from the two DNA bulks were aligned to the B. rapa Chiifu-401 genome version 1.5 (http://brassicadb.org/brad/index.php accessed on 15 March 2015) using BWA software. The SAM tool was used to convert alignment files to SAM/BWA files and to make SNP calls. The SNP-index and ∆(SNP-index) were calculated for all positions to identify candidate regions associated with the clubroot resistance gene [28,29]. The SNP-index was calculated by using the total number of reads harboring an SNP compared to the reference genome sequence and dividing this total by the entire numbers of reads. The average SNP-index was calculated in a 1 Mb interval with a 10 kb sliding window. ∆(SNP-index) was calculated by subtracting the SNP-index of the S-pool from that of the R-pool [27]. The homozygous SNP-index was defined as '0' and '1' when the total reads contained genomic fragments derived from 'Y177-47' and 'Y635-10', respectively. SNPindex graphs for the S-pools and R-pools were plotted using the average SNP-index against the positions of each sliding window in the Chiifu-401 genome, and ∆(SNP-index) was plotted against positions of the genome using the SNP-indexes of the S-and R-pools [29].

Marker Analysis and Linkage Map Development
The candidate region for the clubroot resistance gene was identified by BSA-seq and was verified using SSR marker. The interval sequences were downloaded from the Brassica database (http://brassicadb.org/brad/downloadOverview.php accessed on 18 May 2021). InDel markers numbering 365 and 298 SSR primers were designed with Primer Premier 5.0 software. They were used for identifying polymorphism of two parents and two bulk DNA pools [30]. The segregation ratios of markers were tested for expected Mendelian ratios using the chi-square test by JoinMap4.0 at a significance level of α = 0.05. The linked loci were identified and completely removed from the data set. Linkage groups were established with a minimum logarithm of odds (LOD) threshold of 4.0, and the genetic linkage map was constructed following the Kosambi mapping function [31,32].

Cloning and Sequence Analysis of the CRq Gene
In order to clone the CRq gene, the candidate region was compared with the Chiifu-401 genome and 4 R genes were found (Bra019409, Bra019410, Bra019412 and Bra019413). The local Blastn alignment between the R genes and the assembled sequence showed the similarity of candidate fragments as CRq. Then, Clontech's Smarter ® Race 5 /3 Kit was used to clone the complete cDNA 5 and 3 ends of the CRq gene (Table S6). The full-length CRq gene was cloned from the clubroot resistant line Y635-10 and susceptible line Y177-47 (Table S6). DNAMAN software was used for analyzing the sequences. The complete coding sequence of the CRq (CRb) gene was listed as MW (LC155799.1) in GenBank [18]. PCR amplification was performed in a total volume of 30 µL according to the manual supplied with Phanta Max Master Mix (Vazyme Biotech Co., Ltd., Nanjing, China). The PCR products were sequenced by Sunya Biotech Co., Ltd. (Zhengzhou, China). The physicochemical properties of protein were determined following Expasy (http://web. expasy.org/protparam/accessed on 15 September 2021).

Quantitative Real-Time PCR (qRT-PCR) and Semi-Quantitative RT-PCR Analyses
Samples were collected at 0, 2, 5, 8, 13, and 22 days after inoculation, and total RNA was extracted using RNAprep Pure Plant Kit (Beijing Tien, China) according to the manufacturer's instructions [22]. The first strand of cDNA was synthesized in a 20 µL volume containing approximately 7 µg RNA and oligo (dT) primers and TransScript One- Step gDNA removal was used (Trans, Beijing, China). The resulting cDNA was diluted 10-fold, for qRT-PCR in Roche LightCycler 480-II system (Roche Applied Sciences, Beijing, China). qRT-PCR was conducted as previously described [33]. GAPDH was used as an internal reference gene for Chinese cabbage [34]. Each qRT-PCR experiment was performed in 3 technical replications, and the mean value was used for qRT-PCR analysis [18]. Relative expression levels were calculated using the 2 ∆Ct method [35]. Sequence information from B. rapa was used for designing primers of RT-PCR (Table S6). Standard PCR was performed with 5 min denaturation at 94 • C followed by 25 cycles of 94 • C for 30 s, 55 • C for 30 s and 72 • C for 60 s. The PCR products were visualized following electrophoresis on a 1% agarose gel [36].

Construction of Expression Vectors
The gateway technique used the site-specific recombinant system of λ-phage to construct the clone vector through BP and LR reactions (https://www.thermofisher.cn/ accessed on 10 October 2021). In the BP reaction, attB1 site was added at the 5 end of the PCR product, and attB2 site was added at the 3 end. On the other hand, attP1 and attP2 sites were added at both ends of the donor carrier. Under the catalysis of BP reactivity enzyme, new sites such as attL1 and attL2 were formed by a homologous recombination with the donor carrier. At the same time, an entry cloning vector carrying the target gene was generated [37]. The target gene fragment was put into the donor vector and in the way of the gene recombination to form an entry clone vector. In the LR reaction, the same principle of BP reaction was followed to transfer the target gene from the entry vector into the expression vector by means of gene recombination through the catalysis of LR reactivity enzyme [38]. To detect whether the target genes were inserted into the target vector, monoclonal colonies were amplified with primers (Table S6).

Genetic Transformation in Arabidopsis thaliana
The CRq gene was transformed in situ in Arabidopsis floral organs mediated by Agrobacterium tumefaciens [39]. Arabidopsis seeds were sterilized and sown evenly on 1/2 MS solid plates, incubated in the dark at 4 • C for 2 days and placed in a plant growth incubator ((20 ± 2) • C, 16 h/8 h(L/D)) medium culture [22]. At the four-leaf stage, Arabidopsis thaliana seedlings were transplanted into nutrient soil and cultivated to the flowering stage. The main inflorescence is cut off, and the lateral inflorescence with more flower buds was dipped into Agrobacterium solution by removing the open flower buds and pods. The preserved Agrobacterium strain was activated by picking a single colony and inoculated in YEB liquid medium containing 50 mg/L rifampicin and 50 mg/L kanamycin and incubated at 28 • C, 240 r/min, to an OD 600 of approximately 0.8. P. brassicae was collected by centrifugation and used for transformation solution (1/2 MS medium + 50 g/L sucrose + 100 µL/L Silwet L-77) by resuspending. Arabidopsis flower buds were soaked in the dip solution for 30 s and then wrapped in plastic wrap, incubated in the dark for 24 h and then cultured in normal light. The seeds were collected from individual plants at maturity and recorded as T 0 generation.

Screening, Identification and Phenotype Observation of Transgenic Arabidopsis
T 0 generation seeds were sterilized according to the strains, snf, and they were evenly sown on resistant plates containing kanamycin (50 mg/L); moreover, non-transgenic wildtype seeds were used as controls [40]. Two weeks later, the resistant seedlings with normal root and leaf development were transplanted into nutrient soil and cultivated in a plant growth incubator. When 10 rosette leaves were obtained, the leaves were cut to extract genomic DNA. For amplifying the target gene, a PCR was performed using gene specific primers; false positive plants were eliminated, and seeds were harvested only from the positive plants, which was recorded as T 1 generation. Subsequently, the Kanamycin resistance plate was used to screen T 1 generation seeds, and the mature seeds from T 1 resistant plants were harvested per plant and recorded as T 2 generation. T 2 generation seeds from each plant were screened with hygromycin resistance plates. After being airdried, the seeds were stored at 4 • C for subsequent test analysis. Each strain of T 0 -T 2 generation was observed and recorded.

Phenotypic and Genetic Analysis
The QJ F 2 segregating populations were screened by artificial inoculation with clubroot spores (10 7 spores/mL), and the plants were categorized following the scale described in materials and methods. The F 2 plants with grade 0 and 1 were considered as resistant, but those with grades 3, 5 and 7 were considered as susceptible ( Figure 1). The screening experiments were repeated in two seasons. In spring, out of 340 F 2 plants, 259 were identified as resistant and 81 as susceptible. In autumn, out of 340 F 2 plants 266 were identified as resistant and 74 were identified as susceptible (Table 1). The F 2 population showed a 3:1 separation in both spring and autumn, χ 2 3:1 = 0.192 and 1.878, respectively, which confirmed that the resistance trait is under Mendelian inheritance and governed by a single dominant gene, which was contributed from the resistance to clubroot in Y635-10.

Mapping of CRq Gene Based on BSA-Seq Analysis
Genomic DNA of two parents (Y635-10 and Y177-47) and 30 plants from each of R-pool and S-pool was sequenced by illumine HiSeq 2500 sequencer. We obtained high quality reads with Q30 ≥ 87.50% (Table S2). We filtered the predicted SNP set by removing SNPs that had mass value ≤20. After filtering SNP indice sites ≤0.3 or ≥0.7 in both pools, we retrieved 182,425 SNPs for analysis. The distribution curve of the SNP index on the genome was achieved by using a 1 Mb window, where a 10 kb sliding window was moved each time for coordinating the representative values of SNP-index ( Figure S1) and potential candidate gene regions ( Figure S2). With the results of SNP-exponential curve and Chi-square test, we identified three potential candidate regions on chromosome A03 (Table S3) located in the 23.80-28.10 Mb region of the chromosome. A total of 365 InDel and 298 SSR markers were designed in the candidate region with the highest probability of representation A03 chromosome. Primer polymorphism was screened by using Y635-10, Y177-47 and F1 as templates (Table S4). The selected polymorphic primers were linked by JoinMap 4.0, and the partial linkage genetic map of A03 chromosome was constructed. CRq gene from 'Qiulihuang' was mapped between GC30-FW/RV and BGA06 with a genetic distance of 0.2 cM and 0.4 cM, respectively ( Figure 2A). Results indicated that the range of candidate region was successfully narrowed to 24.35-24.43 Mb (0.08 Mb) ( Figure 2B).

Mapping of CRq Gene Based on BSA-Seq Analysis
Genomic DNA of two parents (Y635-10 and Y177-47) and 30 plants from each of R-pool and S-pool was sequenced by illumine HiSeq 2500 sequencer. We obtained high quality reads with Q30 ≥ 87.50% (Table S2). We filtered the predicted SNP set by removing SNPs that had mass value ≤20. After filtering SNP indice sites ≤0.3 or ≥0.7 in both pools, we retrieved 182,425 SNPs for analysis. The distribution curve of the SNP index on the genome was achieved by using a 1 Mb window, where a 10 kb sliding window was moved each time for coordinating the representative values of SNP-index ( Figure S1) and potential candidate gene regions ( Figure S2). With the results of SNP-exponential curve and Chi-square test, we identified three potential candidate regions on chromosome A03 (Table S3) located in the 23.80-28.10 Mb region of the chromosome. A total of 365 InDel and 298 SSR markers were designed in the candidate region with the highest probability of representation A03 chromosome. Primer polymorphism was screened by using Y635-10, Y177-47 and F 1 as templates (Table S4). The selected polymorphic primers were linked by JoinMap 4.0, and the partial linkage genetic map of A03 chromosome was constructed. CRq gene from 'Qiulihuang' was mapped between GC30-FW/RV and BGA06 with a genetic distance of 0.2 cm and 0.4 cm, respectively (Figure 2A). Results indicated that the range of candidate region was successfully narrowed to 24.35-24.43 Mb (0.08 Mb) ( Figure 2B).

Candidate Gene Analysis
Based on the fine mapping of CRq region, we compared the 80 kb range of DNA sequences from the Brassica database with annotated genes of Arabidopsis thaliana, which confirmed 11 genes present in the candidate region ( Figure 2C). Functional annotation analysis showed that Bra019409, Bra019410, Bra019412 and Bra019413 are potential candidate genes, which are encoded in the TIR-NBS-LRR protein ( Table 2). The expression levels of R genes were analyzed by RT-PCR and qRT-PCR, which showed differential expression levels at 0 DAI, 2 DAI, 5 DAI, 8 Figure 3A). The expression of Bra019410 in resistant material was higher than that in susceptible material ( Figure 3B). On 0 DAI, the expression of Bra019412 and Bra019413 in Y6350-10 was higher than in Y177-47. The content of Bra019412 and Bra019413 in Y177-47 increased sharply in 2 DAI and decreased sharply in Y635-10. At this time, the expression levels of Bra019412 and Bra019413 in the susceptible materials were higher than those in resistant materials, and the expression levels of Bra019412 and Bra019413 in Y635-10 gradually increased at 5 DAI and 8 DAI, while the expression levels in Y177-47 gradually decreased. The expression levels of Bra019412 and Bra019413 in 8 DAI,13 DAI,22 DAI and Y635-10 were higher than those of Y177-47 and showed a stable trend ( Figure 3C,D). Therefore, we assumed that Bra019412 and Bra019413 are the most likely candidate genes regulating clubroot resistance.

Cloning and Sequencing of CRq Gene
The candidate genes sequences and the RNA-Seq de novo assembly sequences were analyzed by local Blastn, and the data of RNA-seq were deposited at NCBI under ac-cession numbers SRR3571023 and SRR3571024. The CRq gene fragment of resistant clubroot of Chinese cabbage was found. Then, the complete cDNA of the CRq gene was cloned using SMARTER ® RACE 5 /3 Kit and isolated by electrophoresis. After sequencing, SeqMan software was used to splice the 5 and 3 ends, respectively ( Figure 4A,B). The gene of CRq was amplified by PCR with primers designed in Premier 5.0. In silico tools confirmed that the gene length was 3959 bp with 3675 bp coding sequences (CDs), and it has three exons and two introns ( Figures 4C and 3). Sequence analysis confirmed that  Figure 5B). Clubroot resistance genes CRa, CRb and Crr1a have been cloned, and the TIR-NB-LRR protein was also found. We performed a blast comparison of the CRq gene CDS in NCBI and found 100% similarity of CRq relative to the CRb sequence ( Figure S5 and Table S5). We hypothesized that CRq might be an allele of the CRb gene.
The CRq gene was amplified and sequenced in the resistant and susceptible lines using specific primers (Table S6). Sequence analysis showed that the candidate genes of Y177-47 and Y635-10 were 4031 bp and 3959 bp, respectively, and contained three exons and two introns. The CDS lengths of candidate genes Y177-47 and Y635-10 were 3747 bp and 3675 bp, respectively ( Figure 4D). There were four SNP variations and one InDel variation between the genomic sequences of Y635-10 and Y177-47' among them, the 72 bp insertion at 2327 bp in the third exon was the most significant sequence variation in susceptible parents ( Figure 5A and Figure S4). As a result, functional marker Br-insert1 was designed, by which 24 extremely resistant and 24 susceptible materials were tested in F 2 population. The results showed that Br-insert1 co-segregated with clubroot phenotype in the F 2 population. Twenty-four resistant materials showed the same genotype as Y635-10, and eight susceptible materials showed the same genotype as Y177-47 ( Figure 5C). In summary, the base insertion in the LRR region may lead to a frame-shift mutation, which destroys the structure of the domain and leads to the loss of CRq protein.

Cloning and Sequencing of CRq Gene
The candidate genes sequences and the RNA-Seq de novo assembly sequences were analyzed by local Blastn, and the data of RNA-seq were deposited at NCBI under ac-cession numbers SRR3571023 and SRR3571024. The CRq gene fragment of resistant clubroot of Chinese cabbage was found. Then, the complete cDNA of the CRq gene was cloned using SMARTER ® RACE 5′/3′ Kit and isolated by electrophoresis. After sequencing, SeqMan software was used to splice the 5′ and 3′ ends, respectively ( Figure 4A,B).  Figure 5B). Clubroot resistance genes CRa, CRb and Crr1a have been cloned, and the TIR-NB-LRR protein was also found. We performed a blast comparison of the CRq gene CDS in NCBI and found 100% similarity of CRq relative to theCRb sequence ( Figure S5 and Table S5). We hypothesized that CRq might be an allele of the CRb gene. The CRq gene was amplified and sequenced in the resistant and susceptible lines using specific primers (Table S6). Sequence analysis showed that the candidate genes of Y177-47 and Y635-10 were 4031 bp and 3959 bp, respectively, and contained three exons and two introns. The CDS lengths of candidate genes Y177-47 and Y635-10 were 3747 bp and 3675 bp, respectively ( Figure 4D). There were four SNP variations and one InDel

Construction of CRq Gene in Expression Vector
After attB-PCR products were connected to the entry vector, the clonal P. brassicae solution was selected for PCR reactions and detected by 2% agarose gel electrophoresis. Meanwhile, vectors were selected as the positive control. Electrophoresis results showed that the CRq gene was detected in all 15 mono-clones, but some of the PCR products were weak, which might be a false positive. Therefore, after comparing with the positive control, three samples of lanes 1, 2 and 3 were selected for sequencing verification, and the feedback results of the three samples were consistent with the target gene sequence ( Figure S6), indicating that the obtained clones were positive clones. To detect whether the CRq gene was inserted correctly into the two sites of the target vector at the same time, the selected 13 monoclonal colonies were amplified with three pairs of primers (Table S6). Electrophoresis results showed that the clones were positive clones, and the plasmid was extracted from these clones to obtain aCRq expression vector.

Analysis of Genetic Transformant of Arabidopsis thaliana
Transgenic T1 generation and the wild seeds of Arabidopsis thaliana were disinfected and identified using ½ MS medium containing kanamycin. Transgenic Arabidopsis thaliana of T1 generation could be grown and developed normally in kanamycin-containing

Construction of CRq Gene in Expression Vector
After attB-PCR products were connected to the entry vector, the clonal P. brassicae solution was selected for PCR reactions and detected by 2% agarose gel electrophoresis. Meanwhile, vectors were selected as the positive control. Electrophoresis results showed that the CRq gene was detected in all 15 mono-clones, but some of the PCR products were weak, which might be a false positive. Therefore, after comparing with the positive control, three samples of lanes 1, 2 and 3 were selected for sequencing verification, and the feedback results of the three samples were consistent with the target gene sequence ( Figure S6), indicating that the obtained clones were positive clones. To detect whether the CRq gene was inserted correctly into the two sites of the target vector at the same time, the selected 13 monoclonal colonies were amplified with three pairs of primers (Table S6). Electrophoresis results showed that the clones were positive clones, and the plasmid was extracted from these clones to obtain a CRq expression vector.

Analysis of Genetic Transformant of Arabidopsis thaliana
Transgenic T 1 generation and the wild seeds of Arabidopsis thaliana were disinfected and identified using 1 2 MS medium containing kanamycin. Transgenic Arabidopsis thaliana of T 1 generation could be grown and developed normally in kanamycin-containing medium.
After screening, the transgenic Arabidopsis thaliana of T 1 generation was grown in a growth chamber. Then, we extracted genomic DNA of wild and T 1 generation plants and used them as templates to conduct PCR detection with specific primers of the CRq gene ( Figure S7). The transgenic plants that amplified the target band were considered as positive and were cultured to maturity for T 2 seeds in order to further validate the role of CRq gene in the process of clubroot resistance. We infected wild and T 2 seeds of A. thaliana with clubroot spores (10 7 spores/mL) and cultured in growth chamber for screening out the resistant one. The roots of wild type A. thaliana were swollen due to P. brassicae infection (Figure 6), while the roots of transgenic A. thaliana were normal. medium. After screening, the transgenic Arabidopsis thaliana of T1 generation was grown in a growth chamber. Then, we extracted genomic DNA of wild and T1 generation plants and used them as templates to conduct PCR detection with specific primers of the CRq gene ( Figure S7). The transgenic plants that amplified the target band were considered as positive and were cultured to maturity for T2 seeds in order to further validate the role of CRq gene in the process of clubroot resistance. We infected wild and T2 seeds of A. thaliana with clubroot spores (10 7 spores/mL) and cultured in growth chamber for screening out the resistant one. The roots of wild type A. thaliana were swollen due to P. brassicae infection (Figure 6), while the roots of transgenic A. thaliana were normal.

Discussion
Different advanced techniques such as BSA-seq technologies, combined Bulked Segregant Analysis (BSA) and next-generation sequencing (NGS) are more efficient in marker-assisted selection by establishing the association of agronomic traits and molecular markers or candidate genes [21,41]. In most cases, the genes controlling the target agronomic traits are located on the candidate regions detected by BSA-seq analysis [33]. In our study, genetic analysis showed that resistance to clubroot in Y635-10 is controlled by a single dominant gene. However, our BSA-seq analysis identified three adjacent regions on A03 instead of one, and the results were inconsistent with genetic analysis. This phenomenon also occurred in other studies in our laboratory [33]. We suspected that the structural differences between the parental genome and the reference genome reduce the efficiency of BSA-seq. Although the results of BSA-seq were not perfect, we used SNP-index curves and chi-square distribution to rapidly locate the clubroot resistance gene on A03. The gene was named as CRq [21].
In this study, we determined that CRq is located in the 80 kb genomic region between GC30-FW/RV and BGA06 markers (24350761-24426905 bp on A03 based on information of BRAD). The functional annotation analysis of 11 genes in the candidate region showed four R genes (Bra019409, Bra019410, Bra019412 and Bra019413) homologous to Arabidopsis, which are carried typical TIR-NBS-LRR protein domains responsible for disease resistance [42,43]. In previous reports, three CR genes (CRa, CRb and Rcr1) of B. rapa are located at around 23-24 Mb on chromosome A03 of B. rapa, and CRa and CRb are a pair of alleles [19]. In this study, the CRq gene coding sequence was compared with the reported CRb in NCBI, and it was found that the nucleotide sequence of coding area between CRq and CRb had 100.00% similarity (Table S5). The CRq gene we reported is likely to be an allele of the CRb gene, but the CRq gene contains two introns and three

Discussion
Different advanced techniques such as BSA-seq technologies, combined Bulked Segregant Analysis (BSA) and next-generation sequencing (NGS) are more efficient in markerassisted selection by establishing the association of agronomic traits and molecular markers or candidate genes [21,41]. In most cases, the genes controlling the target agronomic traits are located on the candidate regions detected by BSA-seq analysis [33]. In our study, genetic analysis showed that resistance to clubroot in Y635-10 is controlled by a single dominant gene. However, our BSA-seq analysis identified three adjacent regions on A03 instead of one, and the results were inconsistent with genetic analysis. This phenomenon also occurred in other studies in our laboratory [33]. We suspected that the structural differences between the parental genome and the reference genome reduce the efficiency of BSA-seq. Although the results of BSA-seq were not perfect, we used SNP-index curves and chi-square distribution to rapidly locate the clubroot resistance gene on A03. The gene was named as CRq [21].
In this study, we determined that CRq is located in the 80 kb genomic region between GC30-FW/RV and BGA06 markers (24350761-24426905 bp on A03 based on information of BRAD). The functional annotation analysis of 11 genes in the candidate region showed four R genes (Bra019409, Bra019410, Bra019412 and Bra019413) homologous to Arabidopsis, which are carried typical TIR-NBS-LRR protein domains responsible for disease resistance [42,43]. In previous reports, three CR genes (CRa, CRb and Rcr1) of B. rapa are located at around 23-24 Mb on chromosome A03 of B. rapa, and CRa and CRb are a pair of alleles [19]. In this study, the CRq gene coding sequence was compared with the reported CRb in NCBI, and it was found that the nucleotide sequence of coding area between CRq and CRb had 100.00% similarity (Table S5). The CRq gene we reported is likely to be an allele of the CRb gene, but the CRq gene contains two introns and three exons, while the CRb gene contains three introns and four exons. We speculate that this may be caused by an alternate splicing of gene sequences in different materials.
The tandem repeats of the NBS-LRR gene may promote the unequal recombination of paralogs, thus changing the gene number and producing new resistance gene recognition specificity [44]. CRa and CRb genes have been reported, which are verified in studies [18,19]. In this study, we identified four genes (Bra019409, Bra019410, Bra019412 and Bra019413) in the candidate regions. RT-PCR and qRT-PCR analyses showed that the expression of Bra019412 and Bra019413 in Y635-10 was higher than in Y177-47 at 0 DAI. The content of Bra019412 and Bra019413 in Y177-47 increased sharply in 2 DAI and decreased sharply in Y635-10. At this time, the expression levels of Bra019412 and Bra019413 in the susceptible materials were higher than those in the resistant materials, and the expression levels of Bra019412 and Bra019413 in Y635-10 gradually increased at 5 DAI and 8 DAI, while the expression levels in Y177-47 gradually decreased. The expression levels of Bra019412 and Bra019413 in 8 DAI,13 DAI, 22 DAI and Y635-10 were higher than those of Y177-47 and showed a stable trend ( Figure 3C,D). These results were consistent with the previously published results of MIR1885 regulating TNL genes [45]. We assumed that CRq may be formed by the duplication and crossover of Bra019412 and Bra019413 genes.
In the NBS-LRR gene family, the change of length of the LRR domain in the R gene seems to be an important factor for diversification [46]. For example, the genes of the tomato Cf4/9 locus are mainly changed due to multiple nucleotide substitutions. The related genes of the unlinked Cf2/5 locus have also undergone deletion/duplication events involving a single LRR domain repeat unit [47]. In addition, these events are limited to the amino-terminal LRR region of the protein, which is a Cf protein region that determines the specific difference between paralogs [48]. In our study, sequence alignment showed 72 bp insertions between 2326 bp in the third exon of susceptible line. A functional marker Br-insert1 was co-segregated with clubroot phenotype in the F 2 population. In summary, the base insertion in the LRR region may lead to frameshift mutation, which destroys the structure of the domain and leads to a loss of CRq proteins.
In traditional methods, the construction of plant expression vector is the enzyme digestion method, which is used specific restriction enzymes to confront the corresponding loci for the purpose of a specific enzyme. The enzyme digestion is reused after connection of the plasmid gene by ligase enzyme. However, this process is very complex and requires long experiment cycles [49]. Invitrogen has invented a new entry vector-Gateway technologythat uses specific homologous recombination between sites for construction. Compared with traditional methods, this technology does not require restriction enzymes and ligases for carrying out digestion and connection. Moreover, after completing BP reaction and obtaining the entry clone, it can be transferred to a variety of different destination vectors. Moreover, this technology uses the principle of recombination and exchange to construct vectors. It will not affect the normal reading frame and orientation of the gene, and it will not have a great impact on sequencing and expression. It can facilitate the construction of expression vector to the greatest extent. Currently, the use of genetic transformation technology to perform functional identification of target genes is very common. As a model plant, Arabidopsis is a cruciferous crop similarly to Chinese cabbage, in which the genetic transformation of the target CRq gene had great significance in the functional identification of this gene [50]. CRq at the T 2 generation of transgenic Arabidopsis showed clubroot resistance after 28 days of fungal inoculation, and the roots showed no signs of the disease, while the roots of wild type Arabidopsis thaliana were significantly enlarged ( Figure 6). Therefore, it is speculated that this gene is a candidate resistance gene, which plays a role in the resistance of clubroot in Chinese cabbage.

Conclusions
We fine-mapped the CRq gene to an 80 kb interval following BSA-seq and InDel linkage analyses. Functional annotation, expression study and sequence variation analysis showed that Bra019412 and Bra019413 were most likely the resistance candidates. Due to the conserved nature of the R gene, we obtained the resistance CRq gene by RNA-seq and RACE techniques. The 72-base insertion occurred in the third exon of the CRq gene, and the frame-shift mutation may occur in the LRR region, damaging the structure of this region and leading to a loss of CRq function. Furthermore, genetic transformation results confirmed that the CRq gene played a vital role in resistance response. Those results not only verified and analyzed the biological function of CR gene but also provided a methodological reference for elucidating the disease resistance mechanism and the markerassisted breeding of clubroot resistant varieties.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/agronomy12051172/s1, Figure S1: Identification of the clubroot resistance gene CRq related candidate regions through SNP-index association analysis. SNP-index graphs of R-pool (a), S-pool (b) and ∆ (SNP-index) (c) were obtained from BSA-seq analysis. X-axis represents the position of ten chromosomes and Y-axis represents the SNP-index; Figure S2: SNPindex analysis of potential regions of candidate genes; Figure S3: The lengths of gDNA and CDS of CRq gene were 3959 bp and 3675 bp, respectively; Figure S4: Genomic DNA sequence alignment of CRq from Y635-10 and Y177-47. There were 4 SNP variation and 1 InDel variation between the genomic sequence of Y635-10 and Y177-47, among them, the 72 bp insertion at 2327 bp was the most significant sequence variation; Figure S5: The CRq gene CDS in NCBI found 100% similarity of CRq to the CRb sequence; Figure S6: Detection of PCR products for clones derived from LR reaction. Lanes 1~14 are 14 monoclone from LR reaction and the 15 lane is the positive control of CRq gene; Figure S7: Detection of PCR products for the CRq gene derived from T 1 generation of transgenic Arabidopsis. Lane 1 is the positive control and Lane 2 is the negative control for CRq gene. Lane 3 is wild type Arabidopsis. Lanes 4~13 are PCR products for CRq gene derived from T 1 generation of transgenic Arabidopsis; Table S1: Materials used in this study; Table S2: Summary of whole-genome re-sequencing data for BSA-seq; Table S3: Potential regions of the clubroot-resistant candidate genes in Chinese cabbage; Table S4: Primer sequences and genetic information for markers developed in this study; Table S5: Comparison of CRq cDNA sequence with other R gene; Table S6: The primer sequences in this study.