Genetic and Molecular Characterization of a Self-Compatible Brassica rapa Line Possessing a New Class II S Haplotype

Most flowering plants have evolved a self-incompatibility (SI) system to maintain genetic diversity by preventing self-pollination. The Brassica species possesses sporophytic self-incompatibility (SSI), which is controlled by the pollen- and stigma-determinant factors SP11/SCR and SRK. However, the mysterious molecular mechanism of SI remains largely unknown. Here, a new class II S haplotype, named BrS-325, was identified in a pak choi line ‘325’, which was responsible for the completely self-compatible phenotype. To obtain the entire S locus sequences, a complete pak choi genome was gained through Nanopore sequencing and de novo assembly, which provided a good reference genome for breeding and molecular research in B. rapa. S locus comparative analysis showed that the closest relatives to BrS-325 was BrS-60, and high sequence polymorphism existed in the S locus. Meanwhile, two duplicated SRKs (BrSRK-325a and BrSRK-325b) were distributed in the BrS-325 locus with opposite transcription directions. BrSRK-325b and BrSCR-325 were expressed normally at the transcriptional level. The multiple sequence alignment of SCRs and SRKs in class II S haplotypes showed that a number of amino acid variations were present in the contact regions (CR II and CR III) of BrSCR-325 and the hypervariable regions (HV I and HV II) of BrSRK-325s, which may influence the binding and interaction between the ligand and the receptor. Thus, these results suggested that amino acid variations in contact sites may lead to the SI destruction of a new class II S haplotype BrS-325 in B. rapa. The complete SC phenotype of ‘325’ showed the potential for practical breeding application value in B. rapa.


Introduction
In flowering plants, self-incompatibility (SI) evolved to maintain genetic diversity and prevent self-pollination, in which pistils inhibit pollen germination or pollen tube growth after self-pollination [1]. In Brassica species, the SI system is characterized by sporophytic self-incompatibility (SSI), in which the diploid parental genotype determines the pollen grain self-incompatible phenotype, and is controlled by a single Mendelian locus (named S locus) [2]. The S locus contains multiple physically linked genes and segregates with the SI phenotype [3]. The S locus genes have been identified, including S locus Glycoprotein Previous reports investigated leaf development, flowering, and LTR-RT expansion based on the assembled genomes in B. rapa [32,33,66]; meanwhile, we concentrated on performing comparative analysis of the S locus of two class II S haplotypes, BrS-60 and BrS-325, which are each other's closest relatives. The S locus of BrS-60 and BrS-325 showed high sequence polymorphism with few genes' distribution. Meanwhile, multiple sequence alignment of class II SCRs and SRKs was carried out to reveal amino acid variations in the contact regions (CR II and CR III) of BrSCR-325 and the hypervariable regions (HV I and HV II) of BrSRK-325s, which may influence the interaction between the ligand and the receptor. Here, a complete pak choi genome was obtained and many amino acid variations were identified in the contact regions of BrSCR-325 and the hypervariable regions of BrSRK-325s, which may lead to self-incompatibility destruction.

Determination of a New Class II S Haplotype in an SC B. rapa
In our breeding process, we found that the inbred line '325' was completely selfcompatible, the self-pollination of which involved normal pollen germination and produced normal siliques, harboring full seeds ( Figure 1A,B). To explore the mechanism of selfcompatibility in '325', we determined the S haplotype by cloning and analyzing S locus genes. The line '325' could be amplified by universal primer pairs of class II, but not class I, indicating that a class II S haplotype existed ( Figure 1C). Then, based on the sequences of class II SRK and SCR genes, partial sequences of SCR and SRK E1 (the first exon of SRK) were obtained by the means of PCR and sequenced ( Figures S1 and S2). Nucleotide sequence comparative analysis revealed that both the SCR and SRK E1 sequences in '325' were different from the reported class II S locus genes and showed the highest sequence similarity with that of BrS-60 (79% sequence similarity of SCR and 91% sequence similarity of SRK E1) ( Figures S1 and S2). Phylogenetic analysis results also showed that both the SCR and SRK E1 genes in '325' were the closest relatives to those of BrS-60 ( Figure 1D,E). Thus, a new class II S haplotype, named BrS-325, was determined.

Genetic Analysis of SC Trait in '325'
To reveal the genetic basis of the SC trait in '325', reciprocal cross F 1 hybrids were created from the SI parent '326'  and SC parent '325' (BrS-325), and pollination behavior displayed self-incompatibility ( Table 1). The aniline blue staining results showed that a large amount of pollen grains germinated and passed through the stigma in the SC mutant. However, few pollen grains germinated and passed through the stigma in the SI parent '326'and F 1 hybrids (Figure 2A). Siliques in SC '325' were dramatically longer than that of the SI parents and F 1 hybrids (1.29 and 1.48 cm per silique) ( Figure 2B,C). Average seed-set of SI parents and F 1 hybrids were 1.14 and 1.76 seeds per silique, respectively, which was dramatically less than the SC mutant (18.47 seeds per silique) ( Figure 2B,D). Based on the reciprocal cross F 1 hybrid phenotypes, this meant that '325' had a recessive nucleus mutation. In the F 2 population, originating from the self-pollination of the F 1 hybrid, pollination phenotypes were segregated into SI:SC = 70:22 (3:1, χ 2 = 0.094, p < 0.05), indicating that the SC trait in '325' was controlled by a single genetic locus. S haplotypes in the F 2 population were segregated into S-12S-12: S-12S-325: S-325S-325 =20:50:22 (1:2:1, χ 2 = 0.73, p < 0.05), and all the pollination phenotypes were consistent with the genotypes ( Figure 2E and Table 1). All the results show that BrS-325 was responsible for the SC in '325'.

Nanopore Sequencing and De Novo Assembly the Genome of SC Line '325'
To obtain complete genomic information for BrS-325 and discover the underlying mechanism of SC in '325', we carried out Nanopore sequencing and next de novo assembly to obtain the pak choi genome [67]. Here, we produced a 376. 70 Mb genome assembly, containing 314 contigs with an N50 size of 4.54 Mb, which was longer than previous B. rapa genome assemblies (Tables 2, S1 and S2) [66]. To determine the accuracy and (B) self-pollination of inbred line '325' by observing silique development, scale bar = 2 cm; (C) S haplotype identification with class I and class II universal primer pairs, PS5 and PS15 and PK1 and PK4 were used for class I S haplotypes identification, PS3 and PS21 and SRK IIE4L and E7R were used for class II S haplotypes identification, with Actin7 as the control. (D) and (E) phylogenetic analysis of class II S locus genes SCR and SRK E1. An unrooted phylogenetic tree was constructed using the neighbor-joining method. Bootstrap values from 1000 replicates are shown.

Genetic Analysis of SC trait in '325'
To reveal the genetic basis of the SC trait in '325', reciprocal cross F1 hybrids were created from the SI parent '326' (BrS-12) and SC parent '325' (BrS-325), and pollination behavior displayed self-incompatibility (Table 1). The aniline blue staining results showed that a large amount of pollen grains germinated and passed through the stigma in the SC mutant. However, few pollen grains germinated and passed through the stigma in the SI parent '326'and F1 hybrids (Figure 2A). Siliques in SC '325' were dramatically longer than that of the SI parents and F1 hybrids (1.29 and 1.48 cm per silique) ( Figure 2B,C). Average seed-set of SI parents and F1 hybrids were 1.14 and 1.76 seeds per silique, respectively, which was dramatically less than the SC mutant (18.47 seeds per silique) ( Figure 2B,D). (C) S haplotype identification with class I and class II universal primer pairs, PS5 and PS15 and PK1 and PK4 were used for class I S haplotypes identification, PS3 and PS21 and SRK IIE4L and E7R were used for class II S haplotypes identification, with Actin7 as the control. (D,E) phylogenetic analysis of class II S locus genes SCR and SRK E1. An unrooted phylogenetic tree was constructed using the neighbor-joining method. Bootstrap values from 1000 replicates are shown. To obtain complete genomic information for BrS-325 and discover th mechanism of SC in '325', we carried out Nanopore sequencing and next d (D) seeds per silique of the SC line '325', the SI line '326' and the hybrid ('325' × '326') after selfpollination, n = 10 siliques. Error bars indicate SE. The significance was analyzed by Student's t-test (** p < 0.01); (E) S genotypes identification and SI phenotypes analysis in partial representative plants of the F 2 population with class I (PK1and PK4) and class II (SRKII-E4Land &SRKII-E7R) S haplotype specific primer pairs. SC: self-compatible, SI: self-incompatible. H: class I and class II S haplotypes, I: class I S haplotype, II: class II S haplotype.  We then performed gene prediction in our assembled genome. Approx 52.65% of the genome sequences were annotated as repetitive sequences, including TE (15.60%), retrotransposons (42.41%), and unclassified elements (3.41%). The ann retrotransposons mainly consisted of the long terminal repeat retrotransposon (LT which covered 120 Mb and occupied 31.96% of the assembled genome (Table S5). I 43,855 genes of the coding protein were identified, which had an average size of 2 and consisted of an average of five exons (Table S6). About 42,500 genes could b tated by utilizing the foreign protein database, which could annotate 96.91% of t dicted genes (Table S7). Meanwhile, a total of 1431 tRNA, 1623 rRNA, 241 mic (miRNA), and 1397 small nuclear RNA (snRNA) genes were identified (Table S8). T uate the accuracy and completeness of the gene annotation, we carried out BUSCO ment. The results showed that approximately 97.4% of the embryophyte genes co We then performed gene prediction in our assembled genome. Approximately 52.65% of the genome sequences were annotated as repetitive sequences, including DNA-TE (15.60%), retrotransposons (42.41%), and unclassified elements (3.41%). The annotated retrotransposons mainly consisted of the long terminal repeat retrotransposon (LTR-TE), which covered 120 Mb and occupied 31.96% of the assembled genome (Table S5). In total, 43,855 genes of the coding protein were identified, which had an average size of 2650 bp and consisted of an average of five exons (Table S6). About 42,500 genes could be annotated by utilizing the foreign protein database, which could annotate 96.91% of the predicted genes (Table S7). Meanwhile, a total of 1431 tRNA, 1623 rRNA, 241 microRNA (miRNA), and 1397 small nuclear RNA (snRNA) genes were identified (Table S8). To evaluate the accuracy and completeness of the gene annotation, we carried out BUSCO assessment. The results showed that approximately 97.4% of the embryophyte genes could be detected in our pak choi genome (Table S9). These results indicate that we gained a complete and functionally annotated pak choi assembled genome.

S Locus Organization of BrS-325
Based on the cloned sequences of SCR and SRK from '325', homologous genes were blasted in our assembled genome. Finally, we obtained a 145 kb sequence from contig, named ctg00061, which covered the whole BrS-325 locus and boundary regions ( Figure 4). In ctg000061, the SCR homologous gene BrSCR-325 was found to be located 16,673,067 to 16,673,465 bp of the assembled genome, consisting of two exons and one intron. Meanwhile, we blasted the SLG homologous sequence of BrSLG-60 [32] in ctg000061, and BrSLG-325 was identified 41.173 kb downstream of BrSCR-325, which contained two exons and one intron. Interestingly, two SRK genes, named BrSRK-325a and BrSRK-325b, were found to be located in the intergenic region between BrSCR-325 and BrSLG-325. Both BrSRK-325a and BrSRK-325b consisted of seven exons and six introns, coding 855 AA and 856 AA, respectively. There were just 61 different amino acids in BrSRK-325a and BrSRK-325b ( Figure S4B). They showed opposite transcriptional directions and shared a 280 bp common promoter in a 1946 bp intergenic region ( Figure S3). Based on the precursor sequences of BrS60-Smi and BrS60-Smi2, BrS60-SMI and BrS60-SMI2 [21], respectively, we blasted the homologous sequence in ctg000061, and only BrS325-SMI2 was identified. In addition, many repeat sequences and genes were also annotated in the upstream and downstream regions of the S locus ( Figure 4). In a word, we successfully identified the SI-related genes in BrS-325, including one BrSCR-325, one BrSLG-325, and two SRK genes (BrSRK-325a and BrSRK-325b) ( Figure 4 and Table S10).
Based on the cloned sequences of SCR and SRK from '325', homologous genes were blasted in our assembled genome. Finally, we obtained a 145 kb sequence from contig, named ctg00061, which covered the whole BrS-325 locus and boundary regions ( Figure  4). In ctg000061, the SCR homologous gene BrSCR-325 was found to be located 16,673,067 to 16,673,465 bp of the assembled genome, consisting of two exons and one intron. Meanwhile, we blasted the SLG homologous sequence of BrSLG-60 [32] in ctg000061, and BrSLG-325 was identified 41.173 kb downstream of BrSCR-325, which contained two exons and one intron. Interestingly, two SRK genes, named BrSRK-325a and BrSRK-325b, were found to be located in the intergenic region between BrSCR-325 and BrSLG-325. Both BrSRK-325a and BrSRK-325b consisted of seven exons and six introns, coding 855 AA and 856 AA, respectively. There were just 61 different amino acids in BrSRK-325a and BrSRK-325b ( Figure S4B). They showed opposite transcriptional directions and shared a 280 bp common promoter in a 1946 bp intergenic region ( Figure S3). Based on the precursor sequences of BrS60-Smi and BrS60-Smi2, BrS60-SMI and BrS60-SMI2 [21], respectively, we blasted the homologous sequence in ctg000061, and only BrS325-SMI2 was identified. In addition, many repeat sequences and genes were also annotated in the upstream and downstream regions of the S locus ( Figure 4). In a word, we successfully identified the SIrelated genes in BrS-325, including one BrSCR-325, one BrSLG-325, and two SRK genes (BrSRK-325a and BrSRK-325b) ( Figure 4 and Table S10).

Comparative Analysis of the S locus of Two II Class S Haplotypes
As the SI genes in BrS-325 showed the highest sequence similarity with those of BrS-60 (Figures S1 and S2), we carried out S locus comparative analysis between them. By means of a homologous sequence alignment, two sequences covering the whole S locus of BrS-60 were obtained; one had a length of 147 kb from B. rapa cultivars 'Z1' [33] and the other had a length of 155 kb from B. rapa cultivars 'Chiifu' [32] (Table 3), in which contrast phenotypes were not attributed to the S locus variation [68]. Previous reports suggested that high sequences polymorphism existed in the S locus region, while sequences outside the boundary were highly conservative [69]. Through sequence alignment and collinearity analysis, the S locus was determined. The S locus of BrS-325 covered 67.777 kb, and that of BrS-60 covered 73.764 kb and 54.448 kb in 'Chiifu' and 'Z1', respectively (Table 3). In the S locus regions, except for BrSRK-325a, the distribution of SI-related genes was completely conserved, with SCR lying upstream of SRK, and SLG lying downstream of SRK ( Figure 5). Additionally, all the SI-related genes showed conserved transcription directions ( Figure 5). By comparing the intergenic region between SCR and SLG, we found that the spacing distance reached 41.173 kb, containing two duplicated SRK genes in the BrS-325 locus, while the distance reached 41.192 and 24.828 kb in the BrS-60 locus of 'Chiifu' and 'Z1', respectively. Referring to the BrSRK-325b gene location in the BrS-325 locus, the

Comparative Analysis of the S locus of Two Class II S Haplotypes
As the SI genes in BrS-325 showed the highest sequence similarity with those of BrS-60 (Figures S1 and S2), we carried out S locus comparative analysis between them. By means of a homologous sequence alignment, two sequences covering the whole S locus of BrS-60 were obtained; one had a length of 147 kb from B. rapa cultivars 'Z1' [33] and the other had a length of 155 kb from B. rapa cultivars 'Chiifu' [32] (Table 3), in which contrast phenotypes were not attributed to the S locus variation [68]. Previous reports suggested that high sequences polymorphism existed in the S locus region, while sequences outside the boundary were highly conservative [69]. Through sequence alignment and collinearity analysis, the S locus was determined. The S locus of BrS-325 covered 67.777 kb, and that of BrS-60 covered 73.764 kb and 54.448 kb in 'Chiifu' and 'Z1', respectively (Table 3). In the S locus regions, except for BrSRK-325a, the distribution of SI-related genes was completely conserved, with SCR lying upstream of SRK, and SLG lying downstream of SRK ( Figure 5). Additionally, all the SI-related genes showed conserved transcription directions ( Figure 5). By comparing the intergenic region between SCR and SLG, we found that the spacing distance reached 41.173 kb, containing two duplicated SRK genes in the BrS-325 locus, while the distance reached 41.192 and 24.828 kb in the BrS-60 locus of 'Chiifu' and 'Z1', respectively. Referring to the BrSRK-325b gene location in the BrS-325 locus, the intergenic region of SCR and SRK reached 19.706 kb in the BrS-325 locus, while an identical spacing distance was observed in the BrS-60 locus of 'Chiifu' and 'Z1'. There was a big difference in the intergenic region of SRK and SLG. The spacing distance reached 15.384 kb in the BrS-325 locus, while the distance reached 26.943 and 11.533 kb in the BrS-60 locus of 'Chiifu' and 'Z1', respectively (Table 3). LTR-TE, DNA-TE and other-TEs were widely distributed in the intergenic regions of S locus ( Figure 5). Meanwhile, a DNA-TE inserted into the intron of BrSLG-325 was identified. Taken together, similar S-intergenic regions with high polymorphism in the S locus and which were relatively conservative in boundary regions existed in different S haplotypes ( Figure 5). All of these results further indicate that BrS-325 is a new class II S haplotype.

Expression Analysis of SRK and SCR
Normal expression of SRK and SCR is a critical first step for the interaction between the receptor and the ligand, which could be supported by SRK and SCR expression suppression [27,39,40,70,71]. To detect whether the SC phenotype of '325' was caused by a change in the expression of SI genes, RT-qPCR was performed in the SC line '325', the SI line '326' (BrS-12 haplotype), and their F 1 hybrid. BrSCR-325 showed a higher expression level in the anther of the SC line '325' than that of BrSCR-12 in the SI line '326', while in the F 1 hybrid, the expression of BrSCR-325 was hardly detected ( Figure 6A). The silence of BrSCR-325 in the F 1 hybrid may be attributed to the suppression effect of the dominant class I BrS-12 haplotype in the SI line'326' [72]. In the stigma side of '325'and F 1 hybrid, the expression of BrSRK-325a was hardly detected, but BrSRK-325b was expressed normally ( Figure 6B). All the results indicate that, at the transcriptional level, the SI genes BrSCR-325 and BrSRK-325b are expressed normally in SC line '325'.  (Table 3). LTR-TE, DNA-TE and other-TEs were widely distributed in the intergenic regions of S locus ( Figure 5). Meanwhile, a DNA-TE inserted into the intron of BrSLG-325 was identified. Taken together, similar S-intergenic regions with high polymorphism in the S locus and which were relatively conservative in boundary regions existed in different S haplotypes ( Figure 5). All of these results further indicate that BrS-325 is a new class II S haplotype.    (Table S10) [73]. To understand whether amino acid sequence variations contribute to the phenotype change, we carried Plants 2021, 10, 2815 9 of 20 out the multiple sequence alignment of the class II SCR and SRK in B. rapa. The amino acid sequence identities among SCRs were 57% to 73% (Table S11), and among the class II SRKs, they were very conservative, from 86.67% to 94.62% (Table S12). Evolutionary analysis results showed that the phylogenetic tree was divided into two branches, Arabidopsis species and Brassica species. It meant that a closer relationship was identified in different species within the same genus. Furthermore, the branches could be divided into two groups, class I and class II, based on the sequences similarity of Brasssica species. The interspecific pairs were each other's closest relatives with higher sequence similarity, and BrS-325 was the relative closest to BrS-60 ( Figure 7E,F).
Normal expression of SRK and SCR is a critical first step for the interaction between the receptor and the ligand, which could be supported by SRK and SCR expression suppression [27,39,40,70,71]. To detect whether the SC phenotype of '325' was caused by a change in the expression of SI genes, RT-qPCR was performed in the SC line '325', the SI line '326' (BrS-12 haplotype), and their F1 hybrid. BrSCR-325 showed a higher expression level in the anther of the SC line '325' than that of BrSCR-12 in the SI line '326', while in the F1 hybrid, the expression of BrSCR-325 was hardly detected ( Figure 6A). The silence of BrSCR-325 in the F1 hybrid may be attributed to the suppression effect of the dominant class I BrS-12 haplotype in the SI line'326' [72]. In the stigma side of '325'and F1 hybrid, the expression of BrSRK-325a was hardly detected, but BrSRK-325b was expressed normally ( Figure 6B). All the results indicate that, at the transcriptional level, the SI genes BrSCR-325 and BrSRK-325b are expressed normally in SC line '325'.  (Table S10) [73]. To understand whether amino acid sequence variations contribute to the phenotype change, we carried out the multiple sequence alignment of the class II SCR and SRK in B. rapa. The amino acid sequence identities among SCRs were 57% to 73% (Table S11), and among the class II SRKs, they were very conservative, from 86.67% to 94.62% (Table S12). Evolutionary analysis results showed that the phylogenetic tree was divided into two branches, Arabidopsis species and Brassica species. It meant that a closer relationship was identified in different species within the same genus. Furthermore, the branches could be divided into two groups, class I and class II, based on the sequences similarity of Brasssica species. The interspecific pairs were each other's closest relatives with higher sequence similarity, and BrS-325 was the relative closest to BrS-60 ( Figure 7E,F). As we know, the hypervariable region of SRK and contact regions were necessary for the interaction between SCRs and SRKs with the same S haplotypes [73][74][75]. Based on class II eSRKs and SCRs structure models [75], we compared the contact regions (CR I, CR II, and CR III) of class II SCRs and the hypervariable regions (HV I, HV II, and HV III) of class II SRKs (Figures 7 and S4). Sequence alignment showed that some variable amino acids existed in CR II and CR III of BrSCR-325, in which the amino acid sites were identical in other known class II SCRs, including amino acid alteration (e.g., Met62, Thr75, Ser78 in BrS-325) and amino acid deletion (e.g., Pro83 in BrS-60 and Arg80 in BrS-44).
Referring to the BrSRK-8/BrSRK-9 structure [74,75] and hypervariable regions of class II SRKs [75], we carried out amino acid sequence alignment of class II SRKs ( Figure S4). The results showed that the twelve conserved cysteine residues were intact, and several amino acid variations presented in the S domain of BrSRK-325 ( Figure S4). The amino acid variation analysis of hypervariable regions showed that HV I and HV II of BrSRK-325 had very high polymorphism, while the HV III variations had very low polymorphism ( Figures 7B-D and S4). It is worth noticing that three continuous amino acid deletions (e.g., Phe219, Leu220, Asn221) and two amino acid alterations (e.g., Phe220 and Met222 in BrSRK-325a and Tyr220 and Val222 in BrSRK-325b) were observed in the HV I, (Figures 7B and S4B). Many variation residues were distributed in the HV II, including Ile277, Ser286, Arg291, Gln292, Gly300, Tyr303 and Phe305 in BrSRK-325b and Thr291, Gly299, Tyr302, and Phe304 in BrSRK-325a ( Figures 7C and S4B), which may be involved in the interaction of BrSCR-325 and BrSRK-325 and BrSRK-325 homo-dimerization. These observations indicate that these amino acid variations may influence the binding of BrSRK-325 and BrSCR-325. As we know, the hypervariable region of SRK and contact regions were nece the interaction between SCRs and SRKs with the same S haplotypes [73][74][75]. class II eSRKs and SCRs structure models [75], we compared the contact regions II, and CR III) of class II SCRs and the hypervariable regions (HV I, HV II, and H class II SRKs (Figures 7 and S4). Sequence alignment showed that some variab acids existed in CR II and CR III of BrSCR-325, in which the amino acid sites were in other known class II SCRs, including amino acid alteration (e.g., Met62, Thr75 BrS-325) and amino acid deletion (e.g., Pro83 in BrS-60 and Arg80 in BrS- 44).
Referring to the BrSRK-8/BrSRK-9 structure [74,75] and hypervariable re class II SRKs [75], we carried out amino acid sequence alignment of class II SRK S4). The results showed that the twelve conserved cysteine residues were intact, eral amino acid variations presented in the S domain of BrSRK-325 ( Figure S4). T

Discussion
The self-incompatible system is one of the best-known physiological outbreeding devices, which can discriminate self-/non-self pollen to maintain genetic diversity and prevent self-inflicted recessions [1]. The SI phenomenon is very significant, not only for basic research, but also for crop breeding. In the past several decades, research has mainly focused on collecting materials, identifying S haplotypes, self-incompatible determinant factors, and self-incompatible signal cascade factors [10,11,36,[53][54][55][76][77][78]. However, the mysterious molecular mechanism of SI still remains largely unknown. In Brassica species, many S haplotypes have been identified, including more than100 S haplotypes in B. rapa [25], about 50 S haplotypes in B. oleracea [26] and several S haplotypes in B. napus [27,28]. However, there are just four class II S haplotypes in B. rapa  and three class II S haplotypes in B. oleracea (BoS-2b, BoS-5, and BoS-15) [29]. In this study, a new class II S haplotype, named BrS-325, was identified in a pak choi line'325' (Figure 1). Through Nanopore sequencing and de novo assembly strategy, we obtained the complete pak choi genome of'325' (Figure 3 and Table 2). The entire S locus of BrS-325 was extracted from the assembled genome, and functional annotation was performed (Figure 4 and Table S10). Sequence alignment of all the class II S haplotypes showed that BrS-325 had high sequence similarity with other class II S haplotypes. The phylogenetic tree showed that the S haplotypes of Arabidopsis species and Brassica species could be divided into independent branches, based on the sequence polymorphism of SCR and SRK, and BrS-60 was the closest relative to BrS-325 (Figures 7 and S4). As seen in a previous report [31], the distribution and transcription directions of SI related genes were completely conserved in class II S haplotypes ( Figure 5). Previous reports indicated that the genetic effects of the class II S haplotypes are masked, when making heterozygotes with any class I S haplotype [38]. In our study, BrSCR-325 showed a high level of expression in the anther of the SC line '325', while the expression of BrSCR-325 was hardly detected in the F 1 hybrids (created from the SI line '326'and the SC line '325', which possess class I BrS-12 and class II BrS-325, respectively) ( Figure 6A). The silence of BrSCR-325 in the F 1 hybrids may be attributed to the suppression effect of the dominant class I S haplotype BrS-12 in the SI line '326' [20]. All the evidence suggests that BrS-325 is a new class II S haplotype.
The rapid development of genomics technology will facilitate the exploration of the mechanisms of important trait formation in crops. In this study, the inbreed line '325' was a pak choi SC mutant and possessed a new class II S haplotype BrS-325 (Figure 1). Segregation analysis showed that the SC phenotype in '325' was related to BrS-325 ( Figure 2 and Table 1).
To obtain the S locus information and discover the mechanism of SC in '325', a complete pak choi assembled genome was acquired by carrying out nanopore sequencing and a de novo assembly strategy (Figure 3 and Tables S1-S9). There are only a few available genome assemblies for the B. rapa cultivars, including the heading Chinese cabbage type, the yellow sarson oilseed type, and the no-heading pak choi, which concentrated on documenting the morphotypes' phenotypic variations [32,33,66,79]. Some accessions were re-sequenced to investigate the domestication history in B. rapa [80]. Compared with the existed genome assemblies, our pak choi genome assembly had a longer N50 contig and higher coverage (Tables 2, S2 and S3). It provides a valuable resource for comparative genomics research and the genetic improvement of the vegetable and crop in Brassica [81][82][83]. According to the results, the candidate genes for leaf morphology and flowering were identified by comparing gene structure and expression in three morphotypes [66]. This is a good strategy to reveal the SI phenotype change by comparing with S locus variation. BrS-325 was the closest relative to BrS-60, and the available genome provided the chance to compare these two class II S haplotypes. Based on the SRK and SCR sequences and location in the genome, we determined the S locus and boundary regions of BrS-325 and BrS-60 (Figures 4 and 5). Comparative analysis showed that high collinearity emerged in the outer boundary region of S locus, while high sequence polymorphism existed in the S locus ( Figure 5), which was similar to the results in Arabidopsis thaliana [69].
Several SC mutation materials were collected, and the results showed that changing the S locus genes, SRK, and/or SCR, leads to SI phenotype transition [27,[38][39][40]42]. One SCR gene (BrSCR-325) and two duplicated SRK genes (BrSRK-325a and BrSRK-325b) exist in the BrS-325 locus; however, expression analysis results showed that the expression of BrSCR-325 and BrSRK-325b are normal, which may not be responsible for the SC phenotype ( Figure 6). Recently, studies with a SCR and SRK structural basis have helped us to understand the specific recognition response in Brassica [74,75]. The specific recognition of SCR and SRK is mediated through three hypervariable regions (HV I, HV II, HV III) of eSRK and contact regions (CR I, CR II, CR III) of SCR, which predominantly contribute to ∆G for their corresponding eSRK. The specific binding induces eSRK homo-dimerization, forming a 2:2 eSRK:SCR hetero-tetramer [74,75]. Referring to the known SRK and SCR structure [74,75], we aligned the class II SRK and SCR amino acid sequences, and especially focused on the hypervariable regions of SRKs and contact regions of SCRs. The results showed that a number of amino acid variations presented in the HV I and HV II of BrSRK-325 compared to other class II SRKs ( Figures 7B-D and S4), which may influence the interaction between BrSRK-325 and BrSCR-325. Sequence alignment of the class II SCRs showed that some amino acid variations existed in CR II and CR III of BrSCR-325, and these mutations may influence the interaction between BrSCR-325 and BrSRK-325 ( Figure 7). Thus, we speculated that amino acid variations in HV I and HV II of BrSRK-325 and CR II and CR III of BrSCR-325 may destroy the interaction between the receptor and the ligand, leading to the SI phenotype change in B. rapa (Figure 8). The model showed that specific recognition would occur in the contact regions of SCRs and hypervariable regions of SRKs, and lead to self-incompatibility in known class II S haplotypes, while the amino acid variations in contact regions and hypervariable regions may destroy the specific recognition between BrSCR-325 and BrSRK-325 in BrS-325.
x FOR PEER REVIEW 13 of 21 and the ligand, leading to the SI phenotype change in B. rapa (Figure 8). The model showed that specific recognition would occur in the contact regions of SCRs and hypervariable regions of SRKs, and lead to self-incompatibility in known class II S haplotypes, while the amino acid variations in contact regions and hypervariable regions may destroy the specific recognition between BrSCR-325 and BrSRK-325 in BrS-325.  -29), and an SI signal cascade is activated to reject self-pollination pollen. In the right upper panel, the specific recognition was destroyed for some amino acid variations, which exist in the HV I and HV II of BrSRK-325 and CR II and CR III of BrSCR-325, and SC is investigated. The red dotted box indicates the contact region of class II SCRs. Amino acids in contact with SCR and eSRK are indicated by red square boxes. Variable amino acids of BrSCR-325 compared with other class II SCR are shown by yellow dotted boxes. Above the hypervariable region sequence alignment, contact amino acids against the cognate SCR were shown red circles, based on the same sites' contact amino acids of BrSRK-8/BrSCR-8 and BrSRK-9/BrSCR-9, and yellow circles, based on the contact amino acids of class II SRK. The blue circles indicate the residues involved in SRK homo-dimerization. The variable amino acids of BrSRK-325 compared with other class II SRK are shown by purple dotted boxes.
In the Brassica vegetable crop breeding process, SI is one of the major obstructions for the development of inbred lines and the propagation of parent plants. SC inbred lines are very important for the seed production of male sterile hybrid, which can greatly reduce the cost. The manipulation of S locus genes is one of the most widely recognized ways so far to convert SI into SC in Brassica. In this study, we identified a new class II S haplotype BrS-325, which had high sequence similarity with other class II S haplotypes, and was responsible for the complete SC of '325'. We concluded that BrS-325 showed the potential for practical breeding application value in B. rapa. , and an SI signal cascade is activated to reject self-pollination pollen. In the right upper panel, the specific recognition was destroyed for some amino acid variations, which exist in the HV I and HV II of BrSRK-325 and CR II and CR III of BrSCR-325, and SC is investigated. The red dotted box indicates the contact region of class II SCRs. Amino acids in contact with SCR and eSRK are indicated by red square boxes. Variable amino acids of BrSCR-325 compared with other class II SCR are shown by yellow dotted boxes. Above the hypervariable region sequence alignment, contact amino acids against the cognate SCR were shown red circles, based on the same sites' contact amino acids of BrSRK-8/BrSCR-8 and BrSRK-9/BrSCR-9, and yellow circles, based on the contact amino acids of class II SRK. The blue circles indicate the residues involved in SRK homo-dimerization. The variable amino acids of BrSRK-325 compared with other class II SRK are shown by purple dotted boxes.

Plant Materials and Growth Conditions
In the Brassica vegetable crop breeding process, SI is one of the major obstructions for the development of inbred lines and the propagation of parent plants. SC inbred lines are very important for the seed production of male sterile hybrid, which can greatly reduce the cost. The manipulation of S locus genes is one of the most widely recognized ways so far to convert SI into SC in Brassica. In this study, we identified a new class II S haplotype BrS-325, which had high sequence similarity with other class II S haplotypes, and was responsible for the complete SC of '325'. We concluded that BrS-325 showed the potential for practical breeding application value in B. rapa.

Plant Materials and Growth Conditions
The self-incompatible line '326', which contains a class I S haplotype BrS-12 and the self-compatible line '325', was provided by the Vegetable Research Institute of the Wuhan Academy of Agricultural Sciences. The self-incompatible line '326' was crossed with the self-compatible line '325' to produce the segregation population. All the materials were grown in a greenhouse at Huazhong Agricultural University under a light intensity of 100 µmolm −2 s −1 with a 16/8 h light/dark photoperiod at 22 • C.

S Haplotype Determination and Genetic Analysis
Genomic DNA from all individuals was extracted from young leaves as described by Murray and Thompson [81]. The S haplotypes were classified by cloning with the universal primers PS5and PS15 [82] and PK1and PK4 [83] to identify class I S haplotypes and PS3 and PS21 [82] and SRK II E4L and SRK II E7R to identify class II S haplotypes. The polymerase chain reaction (PCR) was performed in a 10 µL reaction system, including 1 µL of DNA, 0.5 µL of each primer at 10 µM, 5 µL of 2× Taq Reaction Buffer (containing Mg 2+ , dNTPs, and DNA polymerase) (Vazyme, Nanjing, Jiangsu Province, China), and 3 µL of distilled water. The amplification program involved one cycle of 95 • C for 5 min, followed by 35 cycles of 95 • C for 30 s, 56 • C for 30 s, and 72 • C for 1 min. PCR products were electrophoretically-fractionated in a 1% agarose gel, stained with ethidium bromide, and then viewed under a UV illuminator. The SCR sequence was determined using the II SCR-1Land II SCR-1R primer pair, and the partial sequence (included first exon and four to seven exons) of SRK was determined using SRK II E1L and E1R and SRK II E4L&E7R in the self-compatible line "325". PCR products were sequenced to determine the S haplotype. To explain the mutation locus of the self-compatible inbred line '325', the F 1 hybrid was created by crossing the SI parent '326' (BrS-12) and the SC parent '325', and self-pollinated to produce the F 2 segregation population. S haplotypes were determined with the specific primer pairs: PK1and PK4 for detecting class I S haplotypes and SRK II E4L and SRK II E7R for detecting class II S haplotypes. PCR was performed under the same conditions mentioned above. Primer information for detecting S haplotypes is listed in Table S14. 4.3. Nanopore Sequencing and Genome Assembly 4.3.1. Genomic DNA Extraction, Nanopore Sequencing, and Genome Assembly Genomic DNA was extracted from the first young leaves of the self-compatible inbred line '325', according to the procedure described by Belser et al [33]. Genomic DNA quality was evaluated by detecting the purity, concentration, and integrity of genomic DNA, and a sequencing library was created according to the procedure described by Belser et al. [33]. Meanwhile, the library quality was evaluated by detecting the accurate quantification using Qubit 3.0 and the size of the library using Agilent 2100. Finally, the library was sequenced using Nanopore MinION according to the amount of target offline data [84]. After gaining sequencing data, we carried out the genome assembly using the next de novo assembly by NextDenovo tool [84][85][86][87], including, error correction, pruning, and assembly. Each step contains the following stages of data processing: (1) splitting the sequencing data according to the set number of files and converting them into bit2 format; (2) using minimap2 [88] for mutual comparison to find overlapping regions between reads and to remove redundant overlap regions; (3) calibrating the reads according to overlap area; (4) using minimap2 to compare the corrected reads again; and (5) based on the results of mutual comparison, adopting the string Graph algorithm for assembly. Next, the assembled contigs were polished using Nanopore reads with Racon 1.4.3 [89], followed by secondary polishing with Pilon 1.23 [90] utilizing the Illumina reads. Finally, the redundancy of genomes would be removed after initial assembly and error correction using the software Purge-Haplotigs. Meanwhile, based on the sequence similarity and the proportion of the redundant part in the total length of contigs, the redundant contigs were identified and removed.
The structure prediction of coding genes was usually combined with a variety of prediction methods, such as homolog prediction [96], de novo prediction (software: Augustuss [97], Genscan, GlimmerHMM, etc.), and cDNA/EST prediction, etc. Meanwhile, transcripts were obtained by comparing the RNA-seq data with Tophat and assembled by Cufflinks [72]. Then, a non-redundant and complete gene set was obtained by integrating predicted gene sets based on MAKER software [98] and integrating the CEGMA results based on the HiCESAP process. Finally, based on the foreign protein databases (SwissProt, TrEMBL, KEGG, InterPro and GO), proteins in the gene sets were functionally annotated. Based on tRNA structural characteristics, tRNA sequences in the genome were searched by using the tRNAscan-SE software [99]. Based on the highly conserved characteristics of rRNA, the rRNA sequence was blasted with related species' rRNA sequences. In addition, miRNA and snRNA sequence information on the genome was predicted using the covariance model of the Rfam family (http://rfam.xfam.org/ accessed on 21 March 2020) [100].

Aniline Blue Staining Assay
The aniline blue staining assay was performed according to the established procedure with minor modifications [54]. The anther was removed the day before flowering. The emasculated stigmas were pollinated with the corresponding pollen on the day of flowering. After 16 h, the pollinated pistils were put into the fixer buffer with anethanol:glacial acetic acid ratio of 3:1 for 2 h, and then the samples were incubated in 1 M NaOH at 65 • C water for 1 h. Next, the pistils were washed three times with distilled water and stained with the basic aniline blue staining (0.1% aniline blue staining in 0.1 M K 3 PO 4 ) for 3 h. The stained pistils were placed on the glass slide in distilled water to detect the pollen germination and pollen tube growth using the blue channel (UV 340-380 nm) of a fluorescence microscope (SP8; Leica, Wetzlar, Germany).

RNA Extraction and RT-qPCR
Anthers were cut from medium-size buds (2 to 3 days before flowering) and stigmas were gained from mature buds (on the day of flowering). Total RNA of stigmas and anthers was isolated from more than 30 stigmas and from six anthers of more than 20 buds using the SV Total RNA Isolation System kit (Promega, Madison, WI, USA) following the manufacturer's instructions. One microgram of total RNA was used for the cDNA synthesis using a PrimeScript TM RT reagent kit (Takara, Tokyo, Japan). The real-time quantitative polymerase chain reaction (RT-qPCR) was performed using 2× SYBR Green master mix (Toyobo, Osaka, Osaka Prefecture, Japan) with the 10 µL reaction system, including 5 µL of 2× SYBR Green master mix, 0.4 µL of each primer at 10 µM and 4.2 µL of 50× diluted cDNA. Amplification was performed using the CFX96 Touch Real-Time PCR Detection System (Bio-Rad, Hercules, CA, USA). The amplification program involved one cycle of 95 • C for 5 min, followed by 45 cycles at 95 • C for 15 s, 60 • C for 20 s, and 72 • C for 30 s. After each run, a melting curve was performed by heating up the samples from 60 to 95 • C. All analyses were repeated with three biological replicates. Actin gene (GenBank accession No: AF111812) was used as an internal control and actin was used to normalize transcript levels for all expression analyses [101]. Significant differences were calculated by Student's t-test. The primer pairs coming from S locus genes (SRK and SCR) are developed. All primers are listed in Table S15.

Sequence Alignment and Phylogenetic Analysis
CDS and amino acid sequences of class II SRK and SCR were obtained from the NCBI and their GenBank or entry name are listed in Table S13. The promoter sequence of BrSRK-60 came from the genome of 'Chiifu' [32]. MEGA-X [102] program was used to perform multiple sequence alignment, and the results were modified with geneDoc (http://nrbsc.org/gfx/genedoc 26 August 2015). Class II SCR drafts were drawn referring to the research results [75] and class II SRK drafts were drawn referring to the report [74]. Phylogenetic trees were constructed using MEGA-X software [102] using the neighborjoining method and a bootstrap test that was replicated 1000 times.

Synteny Assay of Two Class II S Locus
To display the synteny relationships of two class II S haplotypes, BrS-60 was obtained from the genome of 'Chiifu' [32] and 'Z1' [33], which possessed the same S haplotype and a contrasting SI phenotype, and BrS-325 was obtained from our assembled genome. Collinearity analysis of S locus sequences was performed using the software: MuMmer [103] and JCVI [104]. The S locus collinearity analysis of BrS-325 and BrS-60 was performed with the MuMmer tool. Then, visualizations of the S locus collinearity analysis were displayed using a JCVI tool.

Self-Incompatibility Phenotype Assay
The self-incompatibility phenotype was measured as in [105]. When three to five flowers were set on the major inflorescence, a bag was used to cover the major inflorescence and two or three secondary branches after cutting the opened flower and the apical buds for self-pollination. The bags were administered gently every two days to facilitate selfpollination. After two weeks, the bags were removed to allow the growth of seeds. After seeds were mature, the number of seeds was counted, and the self-compatibility index (SCI) was calculated as the ratio of the number of seeds to the number of flowers [106]. Plants with SCI ≥ 2 were referred to as self-compatible, and plants with SCI < 2 were considered self-incompatible. The significant difference was calculated by Student's t-test.

Conclusions
In conclusion, we identified a new class II S haplotype in Brassica, which displayed completely self-compatibility. It is a good resource for breeding practices due to its recessiveness to all class I S haplotypes [20]. Here, we gained a high-quality reference genome, which provides the opportunity to carry out comparative genomics research on different traits [66]. In this study, compared with other class II SRK and SCR, we found many amino acid variations in BrSRK-325 and BrSCR-325, which may influence the interaction between the ligand and the receptor. Based on the above results, we inference that amino acid variations in critical sites may lead to self-incompatibility destruction in a new class II S haplotype in B. rapa.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/plants10122815/s1: Figure S1: Multiple nucleotide sequence alignment of the class II SCR, Figure S2: Multiple nucleotide sequence alignment of the first exon in class II SRKs, Figure S3: Promoter sequence alignment of class II SRKs, Figure S4: Multiple amino acid sequence alignment of class II SRKs, Table S1: Statistics for the genome assembly results of the self-compatible line '325', Table S2: Statistics for the sequencing results of the genome assembly, Table S3: Blasting of the second-generation reads and third-generation reads of the genome assembly, Table S4: Statistics of genome completeness in the SC pak choi genome assemblies according to BUSCO, Table S5: Statistics of repetitive sequences in the self-compatible line '325', Table S6: Statistics of gene annotation in the self-compatible line '325', Table S7: Statistics the genes functional annotation in the self-compatible line '325', Table S8: Statistics of non-coding RNA annotation in the self-compatible line '325', Table S9: Statistics of gene annotation completeness in the SC pak choi assembled genome according to BUSCO, Table S10: Function prediction of S locus and boundary regions in BrS-325, Table S11: Amino acid sequence similarity of class II SCR, Table S12: Amino acid sequence similarity of class II SRK. Table S13: The gene information of the known class II SRK and SCR in B. rapa and B. oleracea, Table S14: Primers designed to detect the S haplotypes based on the SLG, SRK and SCR sequence, Table S15: Primers designed to detect the expression of SRK and SCR.

Conflicts of Interest:
The authors declare that they have no conflict of interest.