Comprehensive Analysis of Arabinogalactan Protein-Encoding Genes Reveals the Involvement of Three BrFLA Genes in Pollen Germination in Brassica rapa

Arabinogalactan proteins (AGPs) are a superfamily of hydroxyproline-rich glycoproteins that are massively glycosylated, widely implicated in plant growth and development. No comprehensive analysis of the AGP gene family has been performed in Chinese cabbage (Brassica rapa ssp. chinensis). Here, we identified a total of 293 putative AGP-encoding genes in B. rapa, including 25 classical AGPs, three lysine-rich AGPs, 30 AG-peptides, 36 fasciclin-like AGPs (FLAs), 59 phytocyanin-like AGPs, 33 xylogen-like AGPs, 102 other chimeric AGPs, two non-classical AGPs and three AGP/extensin hybrids. Their protein structures, phylogenetic relationships, chromosomal location and gene duplication status were comprehensively analyzed. Based on RNA sequencing data, we found that 73 AGP genes were differentially expressed in the floral buds of the sterile and fertile plants at least at one developmental stage in B. rapa, suggesting a potential role of AGPs in male reproductive development. We further characterized BrFLA2, BrFLA28 and BrFLA32, three FLA members especially expressed in anthers, pollen grains and pollen tubes. BrFLA2, BrFLA28 and BrFLA32 are indispensable for the proper timing of pollen germination under high relative humidity. Our study greatly extends the repertoire of AGPs in B. rapa and reveals a role for three members of the FLA subfamily in pollen germination.


Introduction
Arabinogalactan proteins (AGPs) are a class of hydroxyproline-rich proteoglycan compounds that are widely present in higher plants [1]. AGPs consist of variable core protein backbones covered by carbohydrate side chains rich in arabinose and galactose, with a molecular mass of about 60-300 kDa. For most of the AGPs, the core protein structure occupies less than 10% of the mature molecule, while more than 90% is related to the carbohydrate side chains [1,2]. The complexity arising from the incredible diversity of the glycans decorating the protein backbone makes AGPs a large complex family in higher plants. To date, a total of 85 and 282 putative AGPs have been identified in the model plant Arabidopsis thaliana (hereafter, Arabidopsis) and rice, respectively [3][4][5][6][7][8][9][10][11].
According to the domain constitutions of the core proteins, AGPs are divided into classical AGPs, AG-peptides, lysine (Lys)-rich AGPs, chimeric AGPs (CAGPs) and non-classical AGPs [10]. Classical AGPs usually consist of (i) an N-terminal signal peptide sequence, (ii) a central domain with variable length and rich in proline (Pro), alanine (Ala), serine

Comprehensive Analysis of Putative AGP Genes in B. rapa
All the annotated putative BrAGPs in B. rapa were gathered and summarized based on previous studies, including 33 classical AGPs, 28 AG-peptides, three Lys-rich classical AGPs, 33 FLAs and 48 PLAs (28 BrENODLs,11 BrUCLs,8 BrSCLs and 1 unknown PLA) (Table S1) [51,55,56]. Those CAGPs that were not defined as an AGP for lacking N-terminal signal peptide came back in our consideration. The orthologous genes of all the annotated AtAGPs were searched in the Brassica database (BRAD, http://brassicadb.cn, V3.0, accessed on 1 December 2021). In this study, the total number of AtAGPs was 151 and included 23 classical AGPs, three Lys-rich classical AGPs, 16 AG-peptides, 22 FLAs, 28 PLAs, 22 XYLPs, 32 other CAGPs, four HAEs and one non-classical AGP (Table S2) [3,6,10,16,57]. In addition, BLASTP was performed in the BRAD using AGP-like sequences (part of whole protein sequences) filtered in Arabidopsis and B. rapa, respectively [10]. An overview of our integrated strategy to identify whole AGP families is shown in Figure 1. high relative humidity. Our study provides a resource for studying AGPs in B. rapa and reveals a role for AGPs in male reproductive development.

Comprehensive Analysis of Putative AGP Genes in B. rapa
All the annotated putative BrAGPs in B. rapa were gathered and summarized based on previous studies, including 33 classical AGPs, 28 AG-peptides, three Lys-rich classical AGPs, 33 FLAs and 48 PLAs (28 BrENODLs,11 BrUCLs,8 BrSCLs and 1 unknown PLA) (Table S1) [51,55,56]. Those CAGPs that were not defined as an AGP for lacking N-terminal signal peptide came back in our consideration. The orthologous genes of all the annotated AtAGPs were searched in the Brassica database (BRAD, http://brassicadb.cn, V3.0, accessed on 4 Dec 2021). In this study, the total number of AtAGPs was 151 and included 23 classical AGPs, three Lys-rich classical AGPs, 16 AG-peptides, 22 FLAs, 28 PLAs, 22 XYLPs, 32 other CAGPs, four HAEs and one non-classical AGP (Table S2) [3,6,10,16,57]. In addition, BLASTP was performed in the BRAD using AGP-like sequences (part of whole protein sequences) filtered in Arabidopsis and B. rapa, respectively [10]. An overview of our integrated strategy to identify whole AGP families is shown in Figure 1. Orthologous genes and syntenic searches of all the annotated putative AGPs in Arabidopsis thaliana and B. rapa were performed in the Brassica database (BRAD) to filter out AGP candidates. Additional BLASTP searches using AGP-like sequences were conducted in BRAD. Classical AGPs were defined as having greater than 50% PAST coupled with the presence of AG glycomodules and an N-terminal signal sequence. AG-peptides were defined to be between 50 and 90 amino acids in length with biased amino acid compositions of at least 35% PAST coupled with the presence of AG glycomodules and an N-terminal signal sequence. The presence of a GPI anchor addition sequence was used to provide added support for the identification of classical AGPs and AG-peptides. Chimeric AGPs were defined as possessing conservative domains coupled with the localized distribution of putative AG glycomodules. AGP/extensin hybrids were defined as containing two or more SP3-5 repeats and a characteristic used to identify AGPs. The remaining AGP candidates with biased amino acid compositions of at least 50% PAST were deemed as non-classical AGPs.
Initially, 356 B. rapa AGP candidates were filtered out, among which 32 candidates were obtained by searching for proteins between 50 and 90 amino acids in length with biased amino acid compositions of at least 35% PAST (Table S3). Bra030409 (BrAGP42) was excluded here due to a PAST percentage below threshold and the absence of AG glycomodules. The presence of an N-terminal signal sequence provided additional supports for all AG-peptides except Bra040832 and Bra008021. In total, 30 potential AG-peptides Orthologous genes and syntenic searches of all the annotated putative AGPs in Arabidopsis thaliana and B. rapa were performed in the Brassica database (BRAD) to filter out AGP candidates. Additional BLASTP searches using AGP-like sequences were conducted in BRAD. Classical AGPs were defined as having greater than 50% PAST coupled with the presence of AG glycomodules and an N-terminal signal sequence. AG-peptides were defined to be between 50 and 90 amino acids in length with biased amino acid compositions of at least 35% PAST coupled with the presence of AG glycomodules and an N-terminal signal sequence. The presence of a GPI anchor addition sequence was used to provide added support for the identification of classical AGPs and AG-peptides. Chimeric AGPs were defined as possessing conservative domains coupled with the localized distribution of putative AG glycomodules. AGP/extensin hybrids were defined as containing two or more SP [3][4][5] repeats and a characteristic used to identify AGPs. The remaining AGP candidates with biased amino acid compositions of at least 50% PAST were deemed as non-classical AGPs.
Initially, 356 B. rapa AGP candidates were filtered out, among which 32 candidates were obtained by searching for proteins between 50 and 90 amino acids in length with biased amino acid compositions of at least 35% PAST (Table S3). Bra030409 (BrAGP42) was excluded here due to a PAST percentage below threshold and the absence of AG glycomodules. The presence of an N-terminal signal sequence provided additional supports for all AG-peptides except Bra040832 and Bra008021. In total, 30 potential AG-peptides were defined, the majority of which (28 of 30) were found to be GPI-anchored proteins.
Two newly identified AG-peptides, Bra008765 and Bra023457, were designated as BrAGP46 and BrAGP47, respectively.
The remaining candidates longer than 90 amino acids in length were subjected to confirm the existence of conversed domains for the identification of CAGPs. Searching of the HMMERScan, Pfam (PF02469), SMART (SM00554) and InterPro (IPR000782) identified 39 AGP candidates predicted to possess fasciclin (FAS) domains (Table S3). The proteins with FAS domains were next manually scanned for the presence of putative AG glycomodules. It is noteworthy that three proteins (Bra009905, Bra034511 and Bra036584) were excluded from the FLA subfamily due to the lack of AG glycomodules. Ultimately, 36 BrFLA genes were identified in B. rapa. Three newly identified BrFLA genes, Bra018207, Bra007193 and Bra023589, were named as BrFLA34, BrFLA35 and BrFLA36, respectively. A further algorithm predicted that two (BrFLA34 and BrFLA35) of these lack N-terminal secretion signal sequences and 23 of them (63.89%) are predicted to be GPI-anchored to the plasma membrane. In addition, 35 BrFLAs were predicted to have putative N-glycosylation sites.
Using the HMMERScan, Pfam (PF02298) and InterPro (IPR003245), a total of 65 plastocyanin-like (PCNL) domain-containing proteins were identified in B. rapa initially (Table S3). To check whether those identified proteins possess AG glycomodules, we manually analyzed Pro-containing sequence motifs in each of them, and discovered that all but six (Bra024208, Bra023459/BrENODL30, Bra009934/BrENODL16, Bra008763/BrENODL14, Bra021481 and Bra019625) contain these glycomodules. These 59 proteins with putative AG glycomodules were identified as BrPLAs, of which 53 were predicted to contain N-terminal signal sequences required for conventional protein secretion, but six (BrENODL7, BrEN-ODL40, BrENODL48, BrENODL52, BrUCL4 and BrSCL2) did not. In addition, 54 BrPLAs were expected to have GPI anchor signal sequences and 39 had putative N-glycosylation sites. Based on motif constitution, all BrPLAs described here could be divided into four subfamilies: 35 ENODLs (including a newly identified BrPLA gene, Bra025873, named as BrENODL53), nine SCLs, 13 UCLs and two unknown PCNL-containing proteins (Bra019044 and Bra018282).
To identify XYLPs in B. rapa, we performed orthologous gene search of all the annotated AtXYLPs and additional BLASTP searches in the BRAD. After confirming the presence of non-specific lipid transfer protein 2 (nsLTP2) domains by submission to the HMMERScan, Pfam (PF14368), SMART (SM00499) and InterPro (IPR016140), 41 proteins were yielded (Table S3). According to the presence of a well-conserved eight-cysteine motif [7,9], Bra030067 and Bra039574 were excluded from the list of XYLP subfamily but classified as other CAGPs, and named as BrCAGP1 and BrCAGP2. It is noteworthy that the residue is conserved as a hydrophobic residue, always leucine, isoleucine or valine between Cys5 (C5) and Cys6 (C6) (Table S4). We then manually scanned all these BrXYLP protein sequences to ensure the presence of AG glycomodules. Putative AG glycomodules in all BrXYLPs except for six (Bra030873, Bra012033, Bra015985, Bra015984, Bra008102 and Bra027016) were found to be distributed before and/or after the nsLTP2 domain (Table S4), suggesting that 33 BrXYLPs may be chimeric AGPs. We named them as BrXYLP1-BrXYLP33.
Analogously to already classified chimeric subfamilies, such as FLA, PLA and XYLP, extra types of CAGPs were explored. A total of 100 candidates with other conservative domains and indispensable AG glycomodules were obtained by the integrated screening method of AGP prediction (Table S3). Typically, there were 25 CAGPs with protein kinase (PKinase)-like domains, 11 CAGPs with glycosyl hydrolase (GH)-like domains, nine CAGPs with formin homology 2 (FH2)-like domains, seven CAGPs with glycerophosphoryl diester phosphodiesterase (GDPD)-like domains, seven CAGPs with pollen Ole e I (POeI)-like domains, four CAGPs with leucine-rich repeat (LRR)-like domains, five CAGPs with X8like domains and five CAGPs with pectin methylesterase inhibitor (PMEI)-like domains. It is noteworthy that BrAGP55, documented as a classical AGP in the literature [55], was classified into the list of CAGP subfamily here as it possessed a Nsp1-like C-terminal region. Among these newly identified CAGPs, seven members (Bra013339, Bra002969, Bra017734, Bra027650, Bra020846, Bra038210 and Bra022423) also have characteristics of EXTs with the presence of the SP 3 tetrapeptide and SP 4 pentapeptide motifs. We termed them as chimeric AGP-EXT hybrids (BrCHAE1-BrCHAE7). The remaining CAGPs are named as BrCAGP3-BrCAGP94. The majority (83 of 100) were predicted to have a signal peptide, whereas only some (34 of 100) were predicted to have a GPI anchor.
Besides the seven BrCHAEs identified above, additionally three HAEs (Bra030020, Bra009880 and Bra014023) were yield with a biased amino acid composition of PAST greater than 50% and sequence characteristics of both AGPs and EXTs, and termed as BrHAE1, BrHAE2 and BrHAE3 (Table S3).
After removing proteins belonging to AG-peptides, CAGPs and HAEs, the remining sequences were retained for further analysis. Initially, 25 classical AGP candidates were searched for biased amino acid compositions of at least 50% PAST and the presence of N-terminal signal sequences and putative AG glycomodules, among which three (BrAGP17, BrAGP18.1 and BrAGP18.2) are Lys-rich AGPs (Table S3). The presence of a GPI-modified anchor sequence provided additional supports for all classical AGPs except Bra011948 (BrAGP54.1). Four candidate proteins, Bra027649 (BrAGP52), Bra027648 (BrAGP53), Bra035700 (BrAGP54.2) and Bra038294 (BrAGP57), previously identified as AGPs by [55], were excluded from the list of classical AGPs here due to the biased amino acid composition below the PAST threshold. In addition, three additional classical AGPs, Bra032796 (BrAGP50.1), Bra010966 (BrAGP50.2) and Bra012505 (BrAGP50.3), are below the 50% PAST threshold but were identified by searching the Brassica protein database annotations and the presence of putative AG glycomodules, N-terminal signal sequences and GPI-modified anchor sequences. These three AGPs were added to the list of classical AGPs. Then, PAST threshold, signal peptide and AG glycomodule filtration resulted in two non-classical AGPs, Bra040103 (BrAGP58.1) and Bra021074 (BrAGP58.2).
In conclusion, a total of 293 putative BrAGP genes were screened out here, including 25 classical AGPs, three Lys-rich AGPs, 30 AG-peptides, 36 FLAs, 59 PLAs, 33 XYLP, 102 other CAGPs, two non-classical AGPs and three HAEs (Table S3). The classical AGPs ranged in size from 115 to 339 amino acids; the AG-peptides ranged in size from 58 to 90 amino acids; the FLAs ranged in size from 201 to 457 amino acids; the PLAs ranged in size from 118 to 713 amino acids; the XYLPs ranged in size from 146 to 387 amino acids; other CAGPs ranged in size from 141 to 1071 amino acids. Compared with the results in previous studies [51,55,56], 153 additional putative BrAGPs were screened out. These newly identified BrAGPs included two AG-peptides, three FLAs, 11 PLAs, 33 XYLP genes, 101 other CAGP genes and three HAEs (Table S3). The protein backbones of newly identified BrAGPs, including the predicted locations of their signal peptides, GPI anchor addition sequences and conversed domains, are displayed in Table S4.

Phylogenetic Analysis of BrAGPs and AtAGPs
Comparative sequence analysis of AGP genes in Arabidopsis and B. rapa was performed to explore their evolution and divergence. In total, 151 AGPs in Arabidopsis and 293 in B. rapa were aligned (Tables S2 and S3) [3,6,10,16,57]. An unrooted neighbor-joining phylogenetic tree was generated from the alignments of full-length protein sequences among BrAGPs and AtAGPs ( Figure 2). Based on their sequence homology, all of the AGPs were clustered into distinct clades according to different subfamilies. (E) Xylogen-like proteins (XYLPs). The neighbor-joining tree was constructed using MEGA X software with 1000 bootstraps. The OmicStudio tool was used to display and manipulate trees.
A previous study clustered 55 classical AGPs from B. rapa and Arabidopsis into six clades and 44 AG-peptides into three clades [55]. Similarly, our results showed that 25 classical BrAGPs, three Lys-rich BrAGPs, 23 classical AtAGPs and three Lys-rich AtAGPs can also be classified into six clades with high bootstrap support ( Figure 2A). All members belonging to Clade I contained motif 1, motif 2 and motif 3, except for AtAGP1 and AtAGP10, which only had motif 1 and motif 2. BrAGP54.1, AtAGP54 and AtAGP57 in Clade II possessed motif 1 and motif 3, while AtAGP52 and AtAGP53 in Clade VI only had motif 2. Clade III could be further mainly classified into three types: a type with motif 1, motif 2 and motif 5 (BrAGP6, BrAGP11.1, BrAGP11.2, AtAGP6 and AtAGP11), another type with motif 1, motif 2 and motif 4 (BrAGP50.1, BrAGP50.2, BrAGP50.3 and AtAGP50) and At3g27416 that only included motif 1. All Lys-rich AGPs are divided into Clade IV, most members of which contained motif 1, motif 2 and motif 3. Nine of ten AGPs in Clade V had motif 1. All predicted locations of motifs are presented in Figure S1.
Phylogenetic analysis of 30 AG-peptides from B. rapa together with 16 AG-peptides from Arabidopsis led to five clades, among which Clade IV and Clade V were new ( Figure  2B). Two members (BrAGP24 and AtAGP24) in Clade III, five members (BrAGP44, BrAGP45, AtAGP42, AtAGP44 and AtAGP45) in Clade IV and all four members (BrAGP15.1, BrAGP15.2, BrAGP15.3 and AtAGP15) in Clade V were previously defined as orphan genes in [55]. Motif analysis showed that almost all AG-peptides possessed motif 6 and motif 8, and most of them additionally had motif 7 ( Figure S2).
Protein-based phylogenetic analysis and pair-wise sequence comparison of 36 BrFLAs and 22 AtFLAs were conducted. Four clades could be distinguished ( Figure 2C), which was consistent with previous studies in Arabidopsis and Chinese cabbage [4,56]. Newly identified BrFLA34 was placed in Clade II, whereas BrFLA35 and BrFLA36 both belonged to Clade III. Depending on the motif analysis, all BrFLAs and AtFLAs possessed (E) Xylogen-like proteins (XYLPs). The neighbor-joining tree was constructed using MEGA X software with 1000 bootstraps. The OmicStudio tool was used to display and manipulate trees.
A previous study clustered 55 classical AGPs from B. rapa and Arabidopsis into six clades and 44 AG-peptides into three clades [55]. Similarly, our results showed that 25 classical BrAGPs, three Lys-rich BrAGPs, 23 classical AtAGPs and three Lys-rich AtAGPs can also be classified into six clades with high bootstrap support ( Figure 2A). All members belonging to Clade I contained motif 1, motif 2 and motif 3, except for AtAGP1 and AtAGP10, which only had motif 1 and motif 2. BrAGP54.1, AtAGP54 and AtAGP57 in Clade II possessed motif 1 and motif 3, while AtAGP52 and AtAGP53 in Clade VI only had motif 2. Clade III could be further mainly classified into three types: a type with motif 1, motif 2 and motif 5 (BrAGP6, BrAGP11.1, BrAGP11.2, AtAGP6 and AtAGP11), another type with motif 1, motif 2 and motif 4 (BrAGP50.1, BrAGP50.2, BrAGP50.3 and AtAGP50) and At3g27416 that only included motif 1. All Lys-rich AGPs are divided into Clade IV, most members of which contained motif 1, motif 2 and motif 3. Nine of ten AGPs in Clade V had motif 1. All predicted locations of motifs are presented in Figure S1.
Phylogenetic analysis of 30 AG-peptides from B. rapa together with 16 AG-peptides from Arabidopsis led to five clades, among which Clade IV and Clade V were new ( Figure 2B). Two members (BrAGP24 and AtAGP24) in Clade III, five members (BrAGP44, BrAGP45, AtAGP42, AtAGP44 and AtAGP45) in Clade IV and all four members (BrAGP15.1, BrAGP15.2, BrAGP15.3 and AtAGP15) in Clade V were previously defined as orphan genes in [55]. Motif analysis showed that almost all AG-peptides possessed motif 6 and motif 8, and most of them additionally had motif 7 ( Figure S2).
Protein-based phylogenetic analysis and pair-wise sequence comparison of 36 BrFLAs and 22 AtFLAs were conducted. Four clades could be distinguished ( Figure 2C), which was consistent with previous studies in Arabidopsis and Chinese cabbage [4,56]. Newly identified BrFLA34 was placed in Clade II, whereas BrFLA35 and BrFLA36 both belonged to Clade III. Depending on the motif analysis, all BrFLAs and AtFLAs possessed motif 11, and all members in Clade IV additionally included four motifs (motif 12-motif 15) ( Figure S3).
A previous study indicated that the 48 BrPCs, which were CAGPs with putative AG glycomodules, can be divided into seven groups [51]. Here, we found that 59 BrPLAs and 28 AtPLAs can also be classified into seven clades, named Clade I-Clade VII ( Figure 2D). There were 12, 10, 6, 8, 4, 6 and 13 BrPLAs in each clade, respectively ( Figure 2D). All Br-PLAs in Clade I, Clade II and Clade III belonged to the ENODL subfamily except an unknown PCNL-containing protein Bra019044 ( Figure 2D). Almost all of them (41 in 43) had motif 16-motif 19, the majority of which also contained motif 20 ( Figure S4). All BrU-CLs were clustered into Clade IV and Clade VI, where Clade IV had only UCL members and Clade VI contained an additional ENODL ( Figure 2D). Motif constitution analysis showed the presence of motif 16-motif 18 in most of the UCLs in Clade IV ( Figure S4). All members in Clade VI possessed motif 16-motif 19 except for BrENODL52, which only had motif 19 ( Figure S4). There were three BrSCLs with motif 16-motif 19 and one BrENODL in Clade V ( Figures 2D and S4). Clade VII consisted of 18 members, including 12 BrENODLs and one unknown PCNL-containing protein Bra018282. Except for BrSCL2, all members in Clade VII contained motif 19 ( Figure S4).
Previous studies indicated that the 11 AtXYLPs belonging to the AGP family can be divided into four clades [7,9]. Here, manual analysis of the phylogenetic tree revealed seven distinct clades of 33 BrXYLPs, two nsLTP2-containing BrCAGPs and 22 AtXYLPs ( Figure 2E). Clade I was composed of 14 members, each of which had motif 21, motif 22 and motif 23. Clade II consisted of six members with motif 22 and motif 23, and three members (BrXYLP31, AtXYLP10 and AtXYLP13) also had motif 21. All eight members in Clade III and all eleven members in Clade VII had motif 21 and motif 22. All BrXYLPs in Clade V and Clade VI contained motif 21, motif 22 and motif 24 except BrCAGP1 that lacked motif 22. All predicted locations of motifs are shown in Figure S5.
Phylogenetic analysis of the remaining BrCAGPs was also conducted and contributed to the study of evolutionary relationships among these proteins with other conservative domains, such as the PKinase-like domain, GH-like domain, FH2-like domain and GDPD-like domain ( Figure S6A,B). In addition, a protein-based phylogenetic tree of seven BrCHAEs, three BrHAEs and four AtHAEs was generated to aid the relationship investigation of putative AGPs with characteristics of EXTs ( Figure S6C).

Chromosomal Distribution of BrAGP Genes and Gene Duplication
According to the information of the approximate position obtained from the BRAD, each BrAGP gene was marked on the physical map of B. rapa chromosomes. A total of 286 BrAGP genes were randomly distributed among the 10 chromosomes of B. rapa, except for seven BrAGP genes (BrAGP9.2, BrAGP11.2, BrAGP12.2, BrUCL16, BrENODL52, BrXYLP30 and BrCAGP2), for which only scaffold information was available ( Figure 3). In total, 38 genes were located on chromosome A01, 20 genes on chromosome A02, 53 genes on chromosome A03, 26 genes on chromosome A04, 27 genes on chromosome A05, 21 genes each on chromosomes A06 and A08, 18 genes on chromosome A07, 40 genes on chromosome A09 and 22 genes on chromosome A10 (Table S3 and Figure 3). The BrAGP genes were distributed across the three sub-genomes, with 120 BrAGP genes in LF (the least fractionated blocks of B. rapa), 92 in MF1 (the medium fractionated blocks of B. rapa) and 78 in MF2 (the most fractionated blocks of B. rapa), while three members, BrENODL52, BrXYLP30 and BrCAGP2, had no hits in BRAD (Table S5). Among these BrAGPs, 42 genes had no syntenic orthologs in the Arabidopsis genome (Table S5), which could have been originated by gene transposition in B. rapa after its divergence from Arabidopsis [58]. In total, there were 244 BrAGPs syntenic to 156 genes in Arabidopsis (Table S5), suggesting that most BrAGP genes were retained in B. rapa after whole-genome triplication and fractionation. Furthermore, the retained homologous BrAGP gene copies in the three B. rapa sub-genomes had the same conserved collinear blocks (Figure 3 and Tables S3 and S5). The occurrence of genome triplication in B. rapa was clearly observed and well supported, as 24 AtAGP genes in the Arabidopsis genome had three syntenic copies in B. rapa after removing the redundancy of duplicated tandem genes (keeping one gene from each tandem array), such as AtAGP50, AtAGP15, AtFLA14, AtENOLD1 and AtXYLP12 ( Figure 3 and Table S5). A total of 138 Arabidopsis genes, including 131 AtAGPs, had one or two syntenic orthologs in B. rapa. Additionally, there were 24 AtAGPs with no syntenic counterparts in B. rapa, such as AtAGP7, AtAGP19, AtFLA2, AtENODL4 and AtHAE3 (Table S5).
In the light of the large size of the BrAGP gene family in B. rapa, the importance of segmental duplication of chromosomal regions and tandem duplication generating nearby gene copies to the size and evolution of the BrAGP gene family was investigated. We found that 194 (66.21%) of the 293 BrAGP genes resulted from duplications, of which 183 were located in the duplicated chromosomal segments of the B. rapa chromosomes, such as BrAGP1.1 and BrAGP1.2; BrFLA2, BrFLA28 and BrFLA32; BrXYLP4 and BrXYLP5 (Table S5 and Figure 3). These results showed that segmental gene duplication was the preferential reason for the expansion of this gene family. Additionally, 15 BrAGP genes were found to be tandemly duplicated: eight tandemly duplicated genes were distributed on chromosome A09, two genes each on A04 and A05 and five genes on A03 (Tables S3 and S5). In all cases of tandem duplications, the amino acid similarities were ≥70%, except for two pairs (BrSCL3 and BrSCL4, BrCAGP27 and BrCAGP28) that only shared 37.11% and 44.38% amino acid similarities. Interestingly, four groups of BrAGP genes (group I: BrXYLP15, BrXYLP16, BrXYLP17, BrXYLP18 and BrXYLP19; group II: BrCAGP6, BrCAGP7 and BrCAGP8; group III: BrCAGP27, BrCAGP28 and BrCAGP29; group IV: BrCAGP81, BrCAGP82, BrCAGP83, BrCAGP84 and BrCAGP85) were expanded through both segmental and tandem duplications (Table S5).

BrAGP Genes Are Developmentally Regulated and Male Fertility-Related
In view of our previous focus on pollen development and function and pollen tube growth, here we were more concerned with AGPs that may be candidates for these unique processes. To investigate whether the identified AGP-encoding genes are male fertilityrelated, we estimated the expression levels of each BrAGP gene in floral buds at five developmental stages by calculating fragments per kilobase of transcript per million mapped reads (FPKMs) based on the previous RNA sequencing datasets. We found that 73 BrAGP genes had at least differential expression in at least one stage of the fertile floral buds compared with the sterile ones (Table 1). These AGPs include nine classical AGPs, one Lys-rich AGP, eight AG-peptides, ten FLAs, 12 PLAs, 12 XYLPs, 17 CAGPs, two HAEs and two non-classical AGPs. These data show the dynamic changes of AGPencoding gene expression in response to floral bud development and suggest their roles in pollen ontogenesis.

Expression of Three BrFLA Genes Is Tissue-Specific and Developmentally Controlled
The finding that some AGP-encoding genes are dysregulated in the sterile floral buds relative to the fertile ones of some specific periods prompted us to study the biological importance of such a family. We focused on three FLA-encoding genes, BrFLA2 (Bra001464), BrFLA28 (Bra034746) and BrFLA32 (Bra038741). The full-length DNA and cDNA sequences were obtained by PCR amplification (Figure S7A-F). The result of the corresponding cDNA and DNA product sequencing confirmed that BrFLA2, BrFLA28 and BrFLA32 are 804 bp, 813 bp and 792 bp in length, respectively, with no introns in their sequences. Their protein sequences deduced from the DNA sequence are 267, 270 and 263 amino acids in length, with an amino acid similarity ≥73.04% to each other and high homology (≥82.07%) with Arabidopsis AtFLA14 ( Figure S7G). BrFLA2, BrFLA28 and BrFLA32 are expected to have an N-terminal signal peptide sequence and a C-terminal GPI anchor signal sequence ( Figure 4). The PAST content of their core protein backbone is about 40%. They all possess a FAS domain with lengths of 130 aa, 122 aa and 121 aa, which contain H1 and H2 conserved regions (Figures 4 and S7G).  According to the above analysis, these three BrFLA genes were specially expressed in floral buds at late microspore developmental stages and down-regulated in the flora buds of sterile plants (Table 1), suggesting their potential involvement in reproductive development. We validated the expression pattern of these three BrFLA genes in flora According to the above analysis, these three BrFLA genes were specially expressed in floral buds at late microspore developmental stages and down-regulated in the floral buds of sterile plants (Table 1), suggesting their potential involvement in reproductive development. We validated the expression pattern of these three BrFLA genes in floral buds at different developmental stages using quantitative real-time PCR. We found that BrFLA2, BrFLA28 and BrFLA32 showed similar patterns of expression, and they were only abundant in floral buds at the mature pollen stage, particularly in stamens ( Figure 5B,C). The determination of the expression pattern in various vegetative tissues such as roots, leaves, stems and other reproductive tissues such as germinal siliques, pollinated and unpollinated pistils 1, 3 and 10 h after pollination (HAP) further revealed that the expression of BrFLA2, BrFLA28 and BrFLA32 was stamen-specific ( Figure 5A-D). Overall, BrFLA2 expression levels were lower than those of BrFLA28 and BrFLA32.
Additionally, to investigate the expression of these three BrFLA genes at the tissue level, the expression of the GUS gene under the control of the BrFLA2, BrFLA28 and BrFLA32 promoter, respectively, was examined in transgenic Arabidopsis plants.

BrFLA2, BrFLA28 and BrFLA32 Proteins Are Localized at the Plasma Membrane and in the Hechtian Strands
To assess the subcellular distribution of BrFLA2, BrFLA28 and BrFLA32 proteins, these three BrFLA genes were truncated to form eGFP-fused constructs under the control

BrFLA2, BrFLA28 and BrFLA32 Proteins Are Localized at the Plasma Membrane and in the Hechtian Strands
To assess the subcellular distribution of BrFLA2, BrFLA28 and BrFLA32 proteins, these three BrFLA genes were truncated to form eGFP-fused constructs under the control of the constitutive CaMV 35S promoter. Onion epidermal cells harboring eGFP-BrFLA2, eGFP-BrFLA28 and eGFP-BrFLA32 showed an intense peripheral fluorescence ( Figure 6E,F,I,J,M,N). In plasmolyzed eGFP-BrFLA2 cells, the eGFP fluorescence occurred only at the surface of the protoplast rather than in the cell wall ( Figure 6G,H). Intriguingly, the eGFP-BrFLA2 signal was also obvious in the Hechtian strands (plasma membrane extensions connecting the cell wall and the plasma membrane) after the membrane was retracted during plasmolysis in 0.3 g·mL -1 sucrose. The cases in onion epidermal cells expressing eGFP-BrFLA28 and eGFP-BrFLA32 displayed a similar and clearly visible localization of eGFP at the plasma membrane and in Hechtian strands that remained attached to the cell wall ( Figure 6I-P), while the pFGC-eGFP (Control) was exclusively membrane-localized ( Figure 6A-D).
the eGFP-BrFLA2 signal was also obvious in the Hechtian strands (plasma membrane extensions connecting the cell wall and the plasma membrane) after the membrane was retracted during plasmolysis in 0.3 g·mL -1 sucrose. The cases in onion epidermal cells expressing eGFP-BrFLA28 and eGFP-BrFLA32 displayed a similar and clearly visible localization of eGFP at the plasma membrane and in Hechtian strands that remained attached to the cell wall ( Figure 6I-P), while the pFGC-eGFP (Control) was exclusively membranelocalized ( Figure 6A-D).

Suppression of BrFLA2, BrFLA28 and BrFLA32 Genes' Expression Promotes Pollen Grain Germination under High Humidity
Since BrFLA2, BrFLA28 and BrFLA32 share a high homology and a similar expression pattern with each other, to study the exact biological function of these three BrFLA genes, we generated RNA interference (RNAi) transgenic plants ( Figure 7A). We selected three lines (1, 2 and 3) that revealed marked down-regulation of BrFLA2, BrFLA28 and BrFLA32 expression in the transgenic plants ( Figures 7B and S8) for further investigation. The BrFLA2/28/32-RNAi transgenic plants and control plants showed no differences in vegetative tissues (data not shown). Additionally, RNAi transgenic plants did not show abnormal phenotypes in reproductive organs, such as stamens and gynoecium ( Figure S9). In standard growth conditions under long-day light (16 h of light/8 h of dark) with 50% relative humidity, dehiscent pollen ( Figure S10A-D,I-L) from RNAi plants had normal morphological appearance for microscopy, showed active pollen vitality and pollen germination frequencies in vitro and displayed normal fluorescence in the three nuclei detected by 4′,6-diamidino-1-phenylindole (DAPI) solution and normal cellulose distribution seen by calcofluor fluorescent white stain, compared with control pollen (Figure S10E-H,M-P).
A case in AGP40, a pollen-specific AG-peptide, in which the null mutant shows no alteration in pollen grain development but illustrates a reduction in pollen grain fitness [59], and an early pollen germination in agp6 agp11 null mutants inside the anther [40] gave us a clue and encouraged anther experiments to further investigate this phenotype.

Suppression of BrFLA2, BrFLA28 and BrFLA32 Genes' Expression Promotes Pollen Grain Germination under High Humidity
Since BrFLA2, BrFLA28 and BrFLA32 share a high homology and a similar expression pattern with each other, to study the exact biological function of these three BrFLA genes, we generated RNA interference (RNAi) transgenic plants ( Figure 7A). We selected three lines (1, 2 and 3) that revealed marked down-regulation of BrFLA2, BrFLA28 and BrFLA32 expression in the transgenic plants ( Figures 7B and S8) for further investigation. The BrFLA2/28/32-RNAi transgenic plants and control plants showed no differences in vegetative tissues (data not shown). Additionally, RNAi transgenic plants did not show abnormal phenotypes in reproductive organs, such as stamens and gynoecium ( Figure S9). In standard growth conditions under long-day light (16 h of light/8 h of dark) with 50% relative humidity, dehiscent pollen ( Figure S10A-D,I-L) from RNAi plants had normal morphological appearance for microscopy, showed active pollen vitality and pollen germination frequencies in vitro and displayed normal fluorescence in the three nuclei detected by 4 ,6-diamidino-1-phenylindole (DAPI) solution and normal cellulose distribution seen by calcofluor fluorescent white stain, compared with control pollen ( Figure S10E-H,M-P).
A case in AGP40, a pollen-specific AG-peptide, in which the null mutant shows no alteration in pollen grain development but illustrates a reduction in pollen grain fitness [59], and an early pollen germination in agp6 agp11 null mutants inside the anther [40] gave us a clue and encouraged anther experiments to further investigate this phenotype. We placed the BrFLA2/28/32-RNAi transgenic plants and control plants in chambers with an 85% relative humidity. After 3 d, dehiscent pollen was gathered and stained with aniline blue. Light microscopy revealed that approximately 9.5% of pollen grains germinated precociously in the anthers of BrFLA2/28/32-RNAi transgenic plants, with prominent callose fluorescence signals in the pollen tubes detected by aniline blue (Figure 7C,D), whereas dehiscent pollen from control plants had negligible callose staining within the grains ( Figure 7F,G). Scanning electron microscopy further confirmed the result of aniline blue staining that early germinated pollen tubes from BrFLA2/28/32-RNAi transgenic plants were discernible ( Figure 7E) and there was no pollen tube outgrowth in the anthers of control plants ( Figure 7H). This dramatic phenotype suggests that high humidity can affect the BrFLA2/28/32-RNAi phenotype, enhancing early pollen germination in the anthers. grains ( Figure 7F,G). Scanning electron microscopy further confirmed the result of aniline blue staining that early germinated pollen tubes from BrFLA2/28/32-RNAi transgenic plants were discernible ( Figure 7E) and there was no pollen tube outgrowth in the anthers of control plants ( Figure 7H). This dramatic phenotype suggests that high humidity can affect the BrFLA2/28/32-RNAi phenotype, enhancing early pollen germination in the anthers.

Discussion
As a first step in research on gene families and as a useful strategy in general, a genomewide analysis of the AGP gene family has been performed in a variety of higher plants, such as Arabidopsis, tobacco (Nicotiana tabacum), rice (Oryza sativa), tomato (Lycopersicon esculentum), as well as mosses, ferns and even brown algae [3][4][5][6][8][9][10][11][21][22][23][24][25][26][27]. Although some special subfamilies of AGPs in B. rapa have been identified, such as classical AGPs, AG-peptides, FLAs and PLAs [10,51,55,56], no comprehensive study has been carried out on the whole of the AGP gene family in B. rapa and the information was fragmented. In this study, after the calculation of PAST content and the length of amino acid sequence length, confirming the presence of conserved domains and N-terminal signal peptides and manual prediction of the putative AG glycomodules, we identified 293 AGP genes in B. rapa, including 25 classical AGPs, three Lys-rich AGPs, 30 AG-peptides, 36 FLAs, 59 PLAs, 33 XYLPs, 102 other CAGPs, two non-classical AGPs and three HAEs (Figure 1 and Table S3). We classified BrAGP genes into several distinct clades according to different subfamilies based on their sequence homology (Figures 2 and S1-S6).
Tandem and segmental duplications during evolution represent the major mechanism for gene family expansion [60]. BrAGP genes are randomly located on ten B. rapa chromosomes. The large size of members of the BrAGP gene family indicated that it was retained from frequent duplication events during evolution. Gene duplication analysis revealed that 183 (62.46%) of the 293 BrAGP genes were derived from segmental duplication and six clusters of tandemly duplicated genes (17 BrAGP genes) were present on chromosomes A03, A04, A05 and A09 (Table S5 and Figure 3). Our analyses indicated that chromosomal segment duplications contributed most to the expansion of this gene family.
Several studies have suggested that AGPs participate in a multitude of vital activities related to plant growth and development [18,[61][62][63][64][65][66][67][68][69][70]. Expression analysis of all BrAGP genes based on our previous RNA sequencing data revealed that 73 (24.91%) of the 293 BrAGP genes were differentially expressed in floral buds between the sterile and fertile plants at least at one developmental stage, including nine classical AGPs, one Lys-rich AGP, eight AG-peptides, ten FLAs, 12 PLAs, 12 XYLPs, 17 CAGPs, two HAEs and two nonclassical AGPs (Table 1). Thus, it is possible that these AGP members are involved in pollen ontogenesis.
The two sequential phases of angiosperm pollen ontogenesis, a developmental phase leading to mature pollen grain formation and a functional phase beginning with the activation of pollen grain by rehydration on the stigma surface and ending at double fertilization, have been the focus of substantial research because of their biological importance and uniqueness [71]. Among the previous studies on the molecules involved in pollen ontogenesis, several members in AGPs have been highlighted, especially during the crucial and complex process of male reproductive development and function [35,40,44,72]. For example, six AGPs in Arabidopsis, namely two classical AGPs (AGP6 and AGP11), two FLAs (FLA3 and FLA14) and two AG-peptides (AGP23 and AGP40), are specifically expressed in pollen and/or pollen tubes and found to be involved in microspore development as well as subsequent pollen germination [35,39,40,59,72,73]. This is also the case for several pollen-preferentially expressed AGPs derived from B. rapa, an AG-peptide (BAN102) and two classical AGPs (BcMF8 and BcMF18) [44,74]. In rice, several AGP-encoding genes are also dominantly expressed in anthers, among which OsAGP7, OsAGP10 and MTR1 are highly expressed in pollen and a mutation in MTR1 results in defects in both tapetum and microspore development, eventually causing complete male sterility [5,75]. In our study, we characterized three anther, pollen grain and pollen tube-specific BrFLA genes, BrFLA2, BrFLA28 and BrFLA32, and identified BrFLA2/28/32-RNAi transgenic plants that have an increased ratio of pollen grains germinated precociously in the anthers under high humidity compared with control plants (Figures 5 and 7). These results suggested that BrFLA2, BrFLA28 and BrFLA32 are required during the late stage of pollen ontogenesis.

Plant Materials and Growth Conditions
Homozygous male-sterile plants 'Bcajh97-01A' and heterozygous male-fertile plants 'Bcajh97-01B' were cultivated in the experimental farm of Zhejiang University, Hangzhou, China. Arabidopsis, ecotype Columbia-0, was grown on the potting mix at 22 ± 2 • C under long-day conditions with a 16 h light and 8 h dark cycle.

Identification of BrAGP Genes and Bioinformatics Analysis
All the annotated putative AGPs in Arabidopsis and B. rapa referred to previous studies [3,6,16,51,[55][56][57]. Orthologous genes and syntenic region search were performed in the BRAD. Additional BLASTP searches using AGP-like sequences (part of whole protein sequences) filtered by [10] were conducted. The characteristics of AGPs have been widely described, but the only controversy is whether a signal peptide sequence at the N-terminal should be optional. Some special FLA members in wheat (Triticum aestivum) and rice (O. sativa) lack signal peptides [76]. In addition, the identification of AGP32I, a CAGP, lacking signal peptides in Arabidopsis is also challenging our conventional concept to define an AGP [6]. Because of this reason, the candidate chimeric proteins that have been excluded from the AGP category for lacking an N-terminal signal peptide, such as some BrENODLs and BrFLAs [51,56], were reconsidered. However, the presence of an N-terminal secretion signal is constantly thought to be a prerequisite for classical AGPs and AG-peptides in this study.

Phylogenetic Analysis
Multiple alignments of AGP protein sequences in Arabidopsis and B. rapa were generated using Clustal X version 2.0 [81]. Phylogenetic trees were constructed using a neighbor-joining algorithm and drawn using MEGA X software. Bootstrapping was implemented with 1000 replicates. Then, the phylogenetic trees were visualized with the OmicStudio tools at https://www.omicstudio.cn/tool/ (accessed on 1 December 2021). Multiple Em for Motif Elicitation (MEME) (https://meme-suite.org/meme/tools/meme, accessed on 1 December 2021) was used for motif discovery and searching [82].

Chromosomal Localization
The 24 conserved collinear blocks on the ten chromosomes of the B. rapa genome and the five chromosomes of the Arabidopsis genome were based on previous studies [83,84]. These blocks are labeled A to X and are color coded by inferred ancestral chromosomes following established convention [85]. The position on the blocks and the chromosomal location of each AGP gene were obtained from TAIR and BRAD databases. The tool for syntenic gene analysis, SynOrths, embedded in the BRAD database (http://brassicadb.cn/ #/syntenic-gene/, accessed on 1 December 2021), was used to check the duplication of BrAGP genes by searching for homologous genes between Arabidopsis and three B. rapa subgenomes (LF, MF1 and MF2) [58]. The syntenic relationships were illustrated by the OmicStudio tools at https://www.omicstudio.cn/tool/ (accessed on 1 December 2021).

Expression Analysis of BrAGP Genes
Gene expression abundance of 72,169 unigenes (uploaded to the Sequence Read Archive of NCBI; accession number SRP149066) showed by FPKMs was generated from our previous study using RNA sequencing [54]. Differentially expressed BrAGP genes were screened out using the DESeq (2010) R Package. A Benjamini-Hochberg false discovery rate (FDR) <0.05 and a log 2 fold change (FC) ≥ 1 or ≤−1 in each pairwise comparison were assigned as criteria.

Expression Analysis of BrFLA2, BrFLA28 and BrFLA32
The promoter sequences of 1962 bp, 1957 bp and 1962 bp upstream from the first ATG of the coding open reading frame fragments of BrFLA2, BrFLA28 and BrFLA32 were cloned for the eGFP-GUS fusion with gene-specific primers (Table S6), respectively. The PCR product was cloned into the pBI101 vector with enhanced green fluorescent protein (GFP) and β-glucoronidase (GUS) and verified by sequencing. The plasmids were then transformed into Arabidopsis using the floral dip method via Agrobacterium tumefaciens [87]. Transformants were selected using 50 mg·L −1 kanamycin. The backgrounds of the transformants were verified by primers for NPT II (Table S6). GFP fluorescence and GUS signal were detected under an ECLIPSE 90i fluorescent microscope (Nikon, Tokyo, Japan).
For quantitative real-time PCR, RNA was extracted from floral buds at five developmental stages (stage 1: longitudinal diameter ≤1 mm, stage 2-5: longitudinal diameter approximately at 2.2, 2.6, 3.8 and 5.9 mm, respectively; stage 1-5 represent pollen mother cell stage, tetrad stage, uninucleate stage, binucleate stage and mature pollen stage) [88], roots, stems, young leaves, whole inflorescences, siliques, sepals, petals, stamens, gynoeciums and pistils at 1, 3, 10 and 24 HAP. First-strand cDNA was then synthesized using a PrimeScript ® RT reagent Kit with gDNA Eraser (TaKaRa, Beijing, China) as the template. UBC10 was used as the normalization control. Quantitative real-time PCR was performed to analyze the expression level of BrFLA2, BrFLA28 and BrFLA32 with specific primers (Table S6) on a CFX96 Real-time RT-PCR Detection System (Bio-Rad, Hercules, CA, USA). Three technical repeats and three biological replicates were carried out. The values represent means ± standard deviation (SD) of three biological replicates. The relative expression levels were calculated by the 2 −∆∆Ct method [89].
For subcellular localization, signal peptide sequences and the rest fragments of Br-FLA2, BrFLA28 and BrFLA32 full-length complementary DNA were cloned for the GFP fusion with specific primers (Table S6). The PCR products were cloned into the 3 -and 5 -terminuses of GFP in the pFGC vector and transiently transformed into onion epidermal cells by particle bombardment [90]. Twenty-four hours after introduction, onion epidermal cells were plasmolyzed in 0.3 g·mL −1 sucrose for 3 min; the GFP fluorescence of transgenic onion epidermal cells was observed under a LSM780 confocal microscope (ZEISS, Oberkochen, Germany).

Construction of RNAi Recombinant Vector and Plant Transformation
For the RNAi construct, a 518-bp specific sequence of BrFLA32 was amplified and cloned in the sense orientation followed by a 357-bp inverted repeat to create a hairpin transgene into the pBI121 vector with the NPT II reporter gene. The recombinant vector was verified by sequencing and then transformed into Chinese cabbage by the Agrobacterium tumefaciens-mediated method [91]. A binary vector pBI121 was introduced into Chinese cabbage as a control. PCR screening approach was performed with NPT II-specific primers to confirm the RNAi-positive transformants. Quantitative real-time PCR was performed to detect the mRNA levels of BrFLA2, BrFLA28 and BrFLA32 in RNAi transformants and control plants according to the method mentioned above. All primers used are listed in Table S6. Afterward, positive transformants were cultivated in a growth chamber at 22 ± 2 • C under long-day conditions (16 h of light/8 h of dark) with 50% relative humidity. A parallel experiment was carried out at 85% relative humidity with other culture conditions unchanged.

Conclusions
We identified 293 AGP genes from the B. rapa genome and divided them into different subfamilies based on the differences in the domain constituents of their core protein backbone sequences: 25 classical AGPs, three Lys-rich AGPs, 30 AG-peptides, 36 FLAs, 59 PLAs, 33 XYLPs, 102 other CAGPs, two non-classical AGPs and three HAEs. On this basis, we further classified BrAGP genes into several distinct clades based on their evolutionary relationships, and elucidated their genomic characteristics, protein structures, duplication status and expression patterns during five different microspore developmental stages in the 'Bcajh97-01A/B' system. Alterations in BrAGP gene expression levels in at least one stage of the fertile floral buds compared with the sterile ones indicated that BrAGP genes may be involved in floral bud development. RNAi transgenic plants with marked down-regulation of BrFLA2, BrFLA28 and BrFLA32 expression showed early germinated pollen tubes in the anthers under high humidity, suggesting the implication of BrFLAs in pollen ontogenesis. In conclusion, our study has provided insights into the characteristics of BrAGP genes and is a first step in the functional research of BrAGP genes.

Data Availability Statement:
The data presented in this study are openly available in the Sequence Read Archive of NCBI, accession number SRP149066 [54].